Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 13/16] include/asm-generic: prefer __section from compiler_attributes.h
From: Naveen N. Rao @ 2019-08-19 17:52 UTC (permalink / raw)
  To: akpm, Nick Desaulniers
  Cc: Anil S Keshavamurthy, Arnd Bergmann, Alexei Starovoitov, bpf,
	clang-built-linux, Daniel Borkmann, David S. Miller, jpoimboe,
	Martin KaFai Lau, linux-arch, linux-kernel, Masami Hiramatsu,
	miguel.ojeda.sandonis, netdev, sedat.dilek, Song Liu, yhs
In-Reply-To: <20190812215052.71840-13-ndesaulniers@google.com>

Nick Desaulniers wrote:
> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

- Naveen


^ permalink raw reply

* Re: [GIT] Networking
From: pr-tracker-bot @ 2019-08-19 17:45 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20190818.194615.2174476213333990592.davem@davemloft.net>

The pull request you sent on Sun, 18 Aug 2019 19:46:15 -0700 (PDT):

> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git refs/heads/master

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/06821504fd47a5e5b641aeeb638a0ae10a216ef8

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain
From: Marcelo Ricardo Leitner @ 2019-08-19 17:42 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Pravin B Shelar, netdev, David S. Miller, Justin Pettit,
	Simon Horman, Vlad Buslov, Jiri Pirko, Roi Dayan, Yossi Kuperman,
	Rony Efraim, Oz Shlomo
In-Reply-To: <1566144059-8247-1-git-send-email-paulb@mellanox.com>

On Sun, Aug 18, 2019 at 07:00:59PM +0300, Paul Blakey wrote:
> What do you guys say about the following diff on top of the last one?
> Use static key, and also have OVS_DP_CMD_SET command probe/enable the feature.
> 
> This will allow userspace to probe the feature, and selectivly enable it via the
> OVS_DP_CMD_SET command.

I'm not convinced yet that we need something like this. Been
wondering, skb_ext_find() below is not that expensive if not in use.
It's just a bit check and that's it, it returns NULL.

And drivers will only be setting this if they have tc-offloading
enabled (assuming they won't be seeing it for chain 0 all the time).
On which case, with tc offloading, we need this in order to work
properly.

Is the bit checking really that worrysome?

> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -853,8 +853,10 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>  	key->mac_proto = res;
>  
>  #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> -	tc_ext = skb_ext_find(skb, TC_SKB_EXT);
> -	key->recirc_id = tc_ext ? tc_ext->chain : 0;
> +	if (static_branch_unlikely(&tc_recirc_sharing_support)) {
> +		tc_ext = skb_ext_find(skb, TC_SKB_EXT);
> +		key->recirc_id = tc_ext ? tc_ext->chain : 0;
> +	}
>  #else
>  	key->recirc_id = 0;
>  #endif
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski @ 2019-08-19 17:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Thomas Gleixner, Jordan Glover, Andy Lutomirski,
	Daniel Colascione, Song Liu, Kees Cook, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
	Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190819172718.jwnvvotssxwhc7m6@ast-mbp.dhcp.thefacebook.com>



> On Aug 19, 2019, at 10:27 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
>> On Mon, Aug 19, 2019 at 11:15:11AM +0200, Thomas Gleixner wrote:
>> Alexei,
>> 
>>> On Sat, 17 Aug 2019, Alexei Starovoitov wrote:
>>>> On Fri, Aug 16, 2019 at 10:28:29PM +0200, Thomas Gleixner wrote:
>>>> On Fri, 16 Aug 2019, Alexei Starovoitov wrote:
>>>> While real usecases are helpful to understand a design decision, the design
>>>> needs to be usecase independent.
>>>> 
>>>> The kernel provides mechanisms, not policies. My impression of this whole
>>>> discussion is that it is policy driven. That's the wrong approach.
>>> 
>>> not sure what you mean by 'policy driven'.
>>> Proposed CAP_BPF is a policy?
>> 
>> I was referring to the discussion as a whole.
>> 
>>> Can kernel.unprivileged_bpf_disabled=1 be used now?
>>> Yes, but it will weaken overall system security because things that
>>> use unpriv to load bpf and CAP_NET_ADMIN to attach bpf would need
>>> to move to stronger CAP_SYS_ADMIN.
>>> 
>>> With CAP_BPF both load and attach would happen under CAP_BPF
>>> instead of CAP_SYS_ADMIN.
>> 
>> I'm not arguing against that.
>> 
>>>> So let's look at the mechanisms which we have at hand:
>>>> 
>>>> 1) Capabilities
>>>> 
>>>> 2) SUID and dropping priviledges
>>>> 
>>>> 3) Seccomp and LSM
>>>> 
>>>> Now the real interesting questions are:
>>>> 
>>>> A) What kind of restrictions does BPF allow? Is it a binary on/off or is
>>>>    there a more finegrained control of BPF functionality?
>>>> 
>>>>    TBH, I can't tell.
>>>> 
>>>> B) Depending on the answer to #A what is the control possibility for
>>>>    #1/#2/#3 ?
>>> 
>>> Can any of the mechanisms 1/2/3 address the concern in mds.rst?
>> 
>> Well, that depends. As with any other security policy which is implemented
>> via these mechanisms, the policy can be strict enough to prevent it by not
>> allowing certain operations. The more fine-grained the control is, it
>> allows the administrator who implements the policy to remove the
>> 'dangerous' parts from an untrusted user.
>> 
>> So really question #A is important for this. Is BPF just providing a binary
>> ON/OFF knob or does it allow to disable/enable certain aspects of BPF
>> functionality in a more fine grained way? If the latter, then it might be
>> possible to control functionality which might be abused for exploits of
>> some sorts (including MDS) in a way which allows other parts of BBF to be
>> exposed to less priviledged contexts.
> 
> I see. So the kernel.unprivileged_bpf_disabled knob is binary and I think it's
> the right mechanism to expose to users.
> Having N knobs for every map/prog type won't decrease attack surface.
> In the other email Andy's quoting seccomp man page...
> Today seccomp cannot really look into bpf_attr syscall args, but even
> if it could it won't secure the system.
> Examples:
> 1.
> spectre v2 is using bpf in-kernel interpreter in speculative way.
> The mere presence of interpreter as part of kernel .text makes the exploit
> easier to do. That was the reason to do CONFIG_BPF_JIT_ALWAYS_ON.
> For this case even kernel.unprivileged_bpf_disabled=1 was hopeless.
> 
> 2.
> var4 doing store hazard. It doesn't matter which program type is used.
> load/store instructions are the same across program types.
> 
> 3.
> prog_array was used as part of var1. I guess it was simply more
> convenient for Jann to do it this way :) All other map types
> have the same out-of-bounds speculation issue.
> 
> In general side channels are cpu bugs that are exploited via sequences
> of cpu instructions. In that sense bpf infra provides these instructions.
> So all program types and all maps have the same level of 'side channel risk'.
> 
>>> I believe Andy wants to expand the attack surface when
>>> kernel.unprivileged_bpf_disabled=0
>>> Before that happens I'd like the community to work on addressing the text above.
>> 
>> Well, that text above can be removed when the BPF wizards are entirely sure
>> that BPF cannot be abused to exploit stuff. 
> 
> Myself and Daniel looked at it in detail. I think we understood
> MDS mechanism well enough. Right now we're fairly confident that
> combination of existing mechanisms we did for var4 and
> verifier speculative analysis protect us from MDS.
> The thing is that every new cpu bug is looked at through the bpf lenses.
> Can it be exploited through bpf? Complexity of side channels
> is growing. Can the most recent swapgs be exploited ?
> What if we kprobe+bpf somewhere ?
> I don't think there is an issue, but we will never be 'entirely sure'.
> Even if myself and Daniel are sure the concern will stay.
> Unprivileged bpf as a whole is the concern due to side channels.
> The number of them are not yet disclosed. Who is going to analyze them?
> imo the only answer to that is kernel.unprivileged_bpf_disabled=1
> which together with CONFIG_BPF_JIT_ALWAYS_ON is secure enough.
> The other option is to sprinkle every bpf load/store with lfence
> which will make execution so slow that it will be unusable.
> Which is effectively the same as unprivileged_bpf_disabled=1.
> 
> There are other things we can do. Like kasan-style shadow memory
> for bpf execution. Auto re-JITing the code after it's running.
> We can do lfences everywhere for some time then re-JIT
> when kasan-ed shadow memory shows only clean memory accesses.
> The beauty of BPF that it is analyze-able and JIT-able instruction set.
> The verifier speculative analysis is an example that the kernel
> can analyze the speculative execution path that cpu will
> take before the code starts executing.
> Unprivileged bpf can made absolutely secure. It can be
> made more secure than the rest of the kernel.
> But today we should just go with unprivileged_bpf_disabled=1

I’m still okay with this.

> and CAP_BPF.
> 

I think this needs more design work.  I’m halfway through writing up an actual proposal. I’ll send it soon.

^ permalink raw reply

* Re: Help needed - Kernel lockup while running ipsec
From: Florian Westphal @ 2019-08-19 17:38 UTC (permalink / raw)
  To: Vakul Garg; +Cc: netdev@vger.kernel.org
In-Reply-To: <DB7PR04MB4620CD9AFFAFF8678F803DCE8BA80@DB7PR04MB4620.eurprd04.prod.outlook.com>

Vakul Garg <vakul.garg@nxp.com> wrote:
> Hi
> 
> With kernel 4.14.122, I am getting a kernel softlockup while running single static ipsec tunnel.
> The problem reproduces mostly after running 8-10 hours of ipsec encap test (on my dual core arm board).
> 
> I found that in function xfrm_policy_lookup_bytype(), the policy in variable 'ret' shows refcnt=0 under problem situation.
> This creates an infinite loop in  xfrm_policy_lookup_bytype() and hence the lockup.
> 
> Can some body please provide me pointers about 'refcnt'?
> Is it legitimate for 'refcnt' to become '0'? Under what condition can it become '0'?

Yes, when policy is destroyed and the last user calls
xfrm_pol_put() which will invoke call_rcu to free the structure.

^ permalink raw reply

* Re: [PATCH net-next 2/3] net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
From: Andrew Lunn @ 2019-08-19 17:34 UTC (permalink / raw)
  To: Hubert Feurstein
  Cc: netdev, lkml, Richard Cochran, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller
In-Reply-To: <CAFfN3gUNrnjdOt8bW2EugzjSZMb_vvdEaLN0yOv_06=roqcJYQ@mail.gmail.com>

On Mon, Aug 19, 2019 at 07:14:25PM +0200, Hubert Feurstein wrote:
> Hi Andrew,
> 
> Am Mo., 19. Aug. 2019 um 15:27 Uhr schrieb Andrew Lunn <andrew@lunn.ch>:
> >
> > > @@ -45,7 +45,8 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
> > >  {
> > >       int ret;
> > >
> > > -     ret = mdiobus_write_nested(chip->bus, dev, reg, data);
> > > +     ret = mdiobus_write_sts_nested(chip->bus, dev, reg, data,
> > > +                                    chip->ptp_sts);
> > >       if (ret < 0)
> > >               return ret;
> > >
> >
> > Please also make a similar change to mv88e6xxx_smi_indirect_write().
> > The last write in that function should be timestamped.
> Since it is already the last write it should be already ok (The
> mv88e6xxx_smi_indirect_write
> calls the mv88e6xxx_smi_direct_write which initiates the timestamping).

Hi Hubert

But you are also time stamping the first write as well. And it seems
like it is not completely for free. So it would be nice to limit it to
the write which actually matters.

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 3/6] net: dsa: enable and disable all ports
From: Marek Behun @ 2019-08-19 17:32 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, davem, f.fainelli, andrew
In-Reply-To: <20190818173548.19631-4-vivien.didelot@gmail.com>

On Sun, 18 Aug 2019 13:35:45 -0400
Vivien Didelot <vivien.didelot@gmail.com> wrote:

> Call the .port_enable and .port_disable functions for all ports,
> not only the user ports, so that drivers may optimize the power
> consumption of all ports after a successful setup.
> 
> Unused ports are now disabled on setup. CPU and DSA ports are now
> enabled on setup and disabled on teardown. User ports were already
> enabled at slave creation and disabled at slave destruction.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>

My original reason for enabling CPU and DSA ports is that enabling
serdes irq could not be done in .setup in mv88e6xxx, since the required
phylink structures did not yet exists for those ports.

The case after this patch would be that .port_enable is called for
CPU/DSA ports right after these required phylink structures are created
for this port. A thought came to me while reading this that some driver
in the future can expect, in their implementation of
port_enable/port_disable, that phylink structures already exist for all
ports, not just the one being currently enabled/disabled.

Wouldn't it be safer if CPU/DSA ports were enabled in setup after all
ports are registered, and disabled in teardown before ports are
unregistered?

Current:
  ->setup()
  for each port
      dsa_port_link_register_of()
      dsa_port_enable()

Proposed:
  ->setup()
  for each port
      dsa_port_link_register_of()
  for each port
      dsa_port_enable()

Marek

^ permalink raw reply

* [PATCH net-next v2 0/4] Improve phc2sys precision for mv88e6xxx switch in combination with imx6-fec
From: Hubert Feurstein @ 2019-08-19 17:28 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Vivien Didelot,
	Florian Fainelli, Heiner Kallweit, Vladimir Oltean, Fugang Duan,
	David S. Miller

With this patchset the phc2sys synchronisation precision improved to +/-555ns on
an IMX6DL with an MV88E6220 switch attached.

This patchset takes into account the comments from the following discussions:
- https://lkml.org/lkml/2019/8/2/1364
- https://lkml.org/lkml/2019/8/5/169

Patch 01 adds the required infrastructure in the MDIO layer.
Patch 02 adds additional PTP offset compensation
Patch 03 adds support for the PTP_SYS_OFFSET_EXTENDED ioctl in the mv88e6xxx driver.
Patch 04 adds support for the PTP system timestamping in the imx-fec driver.

The following tests show the improvement caused by each patch. The system clock 
precision was set to 15ns instead of 333ns (as described in https://lkml.org/lkml/2019/8/2/1364).

Without this patchset applied, the phc2sys synchronisation performance was very 
poor:

  offset: min -27120 max 28840 mean 2.44 stddev 8040.78 count 1236
  delay:  min 282103 max 386385 mean 352568.03 stddev 27814.27 count 1236
  (test runtime 20 minutes)

Results after appling patch 01-03:

  offset: min -12316 max 13314 mean -9.38 stddev 4274.82 count 1022
  delay:  min 69977 max 96266 mean 87939.04 stddev 6466.17 count 1022
  (test runtime 16 minutes)

Results after appling patch 03:

  offset: min -788 max 528 mean -0.06 stddev 185.02 count 7171
  delay:  min 1773 max 2031 mean 1909.43 stddev 33.74 count 7171
  (test runtime 119 minutes)

Hubert Feurstein (4):
  net: mdio: add support for passing a PTP system timestamp to the
    mii_bus driver
  net: mdio: add PTP offset compensation to mdiobus_write_sts
  net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
  net: fec: add support for PTP system timestamping for MDIO devices

 drivers/net/dsa/mv88e6xxx/chip.h          |  2 +
 drivers/net/dsa/mv88e6xxx/ptp.c           | 11 +--
 drivers/net/dsa/mv88e6xxx/smi.c           |  3 +-
 drivers/net/ethernet/freescale/fec_main.c |  7 +-
 drivers/net/phy/mdio_bus.c                | 88 +++++++++++++++++++++++
 include/linux/mdio.h                      |  5 ++
 include/linux/phy.h                       | 42 +++++++++++
 7 files changed, 152 insertions(+), 6 deletions(-)

-- 
2.22.1


^ permalink raw reply

* [PATCH net-next v2 3/4] net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
From: Hubert Feurstein @ 2019-08-19 17:28 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, David S. Miller
In-Reply-To: <20190819172827.9550-1-hubert.feurstein@vahle.at>

From: Hubert Feurstein <h.feurstein@gmail.com>

This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.h |  2 ++
 drivers/net/dsa/mv88e6xxx/ptp.c  | 11 +++++++----
 drivers/net/dsa/mv88e6xxx/smi.c  |  3 ++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index a406be2f5652..1bfde0d8a5a3 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -275,6 +275,8 @@ struct mv88e6xxx_chip {
 	struct ptp_clock_info	ptp_clock_info;
 	struct delayed_work	tai_event_work;
 	struct ptp_pin_desc	pin_config[MV88E6XXX_MAX_GPIO];
+	struct ptp_system_timestamp *ptp_sts;
+
 	u16 trig_config;
 	u16 evcap_config;
 	u16 enable_count;
diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
index 073cbd0bb91b..cf6e52ee9e0a 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.c
+++ b/drivers/net/dsa/mv88e6xxx/ptp.c
@@ -235,14 +235,17 @@ static int mv88e6xxx_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
 	return 0;
 }
 
-static int mv88e6xxx_ptp_gettime(struct ptp_clock_info *ptp,
-				 struct timespec64 *ts)
+static int mv88e6xxx_ptp_gettimex(struct ptp_clock_info *ptp,
+				  struct timespec64 *ts,
+				  struct ptp_system_timestamp *sts)
 {
 	struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
 	u64 ns;
 
 	mv88e6xxx_reg_lock(chip);
+	chip->ptp_sts = sts;
 	ns = timecounter_read(&chip->tstamp_tc);
+	chip->ptp_sts = NULL;
 	mv88e6xxx_reg_unlock(chip);
 
 	*ts = ns_to_timespec64(ns);
@@ -426,7 +429,7 @@ static void mv88e6xxx_ptp_overflow_check(struct work_struct *work)
 	struct mv88e6xxx_chip *chip = dw_overflow_to_chip(dw);
 	struct timespec64 ts;
 
-	mv88e6xxx_ptp_gettime(&chip->ptp_clock_info, &ts);
+	mv88e6xxx_ptp_gettimex(&chip->ptp_clock_info, &ts, NULL);
 
 	schedule_delayed_work(&chip->overflow_work,
 			      MV88E6XXX_TAI_OVERFLOW_PERIOD);
@@ -472,7 +475,7 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
 	chip->ptp_clock_info.max_adj    = MV88E6XXX_MAX_ADJ_PPB;
 	chip->ptp_clock_info.adjfine	= mv88e6xxx_ptp_adjfine;
 	chip->ptp_clock_info.adjtime	= mv88e6xxx_ptp_adjtime;
-	chip->ptp_clock_info.gettime64	= mv88e6xxx_ptp_gettime;
+	chip->ptp_clock_info.gettimex64	= mv88e6xxx_ptp_gettimex;
 	chip->ptp_clock_info.settime64	= mv88e6xxx_ptp_settime;
 	chip->ptp_clock_info.enable	= ptp_ops->ptp_enable;
 	chip->ptp_clock_info.verify	= ptp_ops->ptp_verify;
diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index 282fe08db050..abedd04ff2ae 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -45,7 +45,8 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
 {
 	int ret;
 
-	ret = mdiobus_write_nested(chip->bus, dev, reg, data);
+	ret = mdiobus_write_sts_nested(chip->bus, dev, reg, data,
+				       chip->ptp_sts);
 	if (ret < 0)
 		return ret;
 
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next v2 1/4] net: mdio: add support for passing a PTP system timestamp to the mii_bus driver
From: Hubert Feurstein @ 2019-08-19 17:28 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Florian Fainelli,
	Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <20190819172827.9550-1-hubert.feurstein@vahle.at>

From: Hubert Feurstein <h.feurstein@gmail.com>

In order to improve the synchronisation precision of phc2sys (from
the linuxptp project) for devices like switches which are attached
to the MDIO bus, it is necessary the get the system timestamps as
close as possible to the access which causes the PTP timestamp
register to be snapshotted in the switch hardware. Usually this is
triggered by an MDIO write access, the snapshotted timestamp is then
transferred by several MDIO reads.

This patch adds the required infrastructure to solve the problem described
above.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
Changes in v2:
 - Removed mdiobus_write_sts as there was no user
 - Removed ptp_sts_supported-boolean and introduced flags instead

 drivers/net/phy/mdio_bus.c | 76 ++++++++++++++++++++++++++++++++++++++
 include/linux/mdio.h       |  5 +++
 include/linux/phy.h        | 34 +++++++++++++++++
 3 files changed, 115 insertions(+)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index bd04fe762056..4dba2714495e 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -34,6 +34,7 @@
 #include <linux/phy.h>
 #include <linux/io.h>
 #include <linux/uaccess.h>
+#include <linux/ptp_clock_kernel.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/mdio.h>
@@ -697,6 +698,81 @@ int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val)
 }
 EXPORT_SYMBOL(mdiobus_write);
 
+/**
+ * __mdiobus_write_sts - Unlocked version of the mdiobus_write_sts function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ * @sts: the ptp system timestamp
+ *
+ * Write a MDIO bus register and request the MDIO bus driver to take the
+ * system timestamps when sts-pointer is valid. When the bus driver doesn't
+ * support this, the timestamps are taken in this function instead.
+ *
+ * In order to improve the synchronisation precision of phc2sys (from
+ * the linuxptp project) for devices like switches which are attached
+ * to the MDIO bus, it is necessary the get the system timestamps as
+ * close as possible to the access which causes the PTP timestamp
+ * register to be snapshotted in the switch hardware. Usually this is
+ * triggered by an MDIO write access, the snapshotted timestamp is then
+ * transferred by several MDIO reads.
+ *
+ * Caller must hold the mdio bus lock.
+ *
+ * NOTE: MUST NOT be called from interrupt context.
+ */
+int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			struct ptp_system_timestamp *sts)
+{
+	int retval;
+
+	WARN_ON_ONCE(!mutex_is_locked(&bus->mdio_lock));
+
+	if (!(bus->flags & MII_BUS_F_PTP_STS_SUPPORTED))
+		ptp_read_system_prets(sts);
+
+	bus->ptp_sts = sts;
+	retval = __mdiobus_write(bus, addr, regnum, val);
+	bus->ptp_sts = NULL;
+
+	if (!(bus->flags & MII_BUS_F_PTP_STS_SUPPORTED))
+		ptp_read_system_postts(sts);
+
+	return retval;
+}
+EXPORT_SYMBOL(__mdiobus_write_sts);
+
+/**
+ * mdiobus_write_sts_nested - Nested version of the mdiobus_write_sts function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ * @sts: the ptp system timestamp
+ *
+ * In case of nested MDIO bus access avoid lockdep false positives by
+ * using mutex_lock_nested().
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
+int mdiobus_write_sts_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			     struct ptp_system_timestamp *sts)
+{
+	int retval;
+
+	BUG_ON(in_interrupt());
+
+	mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
+	retval = __mdiobus_write_sts(bus, addr, regnum, val, sts);
+	mutex_unlock(&bus->mdio_lock);
+
+	return retval;
+}
+EXPORT_SYMBOL(mdiobus_write_sts_nested);
+
 /**
  * mdio_bus_match - determine if given MDIO driver supports the given
  *		    MDIO device
diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index e8242ad88c81..482388341c7b 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -9,6 +9,7 @@
 #include <uapi/linux/mdio.h>
 #include <linux/mod_devicetable.h>
 
+struct ptp_system_timestamp;
 struct gpio_desc;
 struct mii_bus;
 
@@ -305,11 +306,15 @@ static inline void mii_10gbt_stat_mod_linkmode_lpa_t(unsigned long *advertising,
 
 int __mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
 int __mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			struct ptp_system_timestamp *sts);
 
 int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
 int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int mdiobus_write_sts_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			     struct ptp_system_timestamp *sts);
 
 int mdiobus_register_device(struct mdio_device *mdiodev);
 int mdiobus_unregister_device(struct mdio_device *mdiodev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index d26779f1fb6b..0b33662e0320 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -205,6 +205,13 @@ struct device;
 struct phylink;
 struct sk_buff;
 
+/* MII-bus flags:
+ * @MII_BUS_F_PTP_STS_SUPPORTED: The driver supports PTP system timestamping
+ */
+enum mii_bus_flags {
+	MII_BUS_F_PTP_STS_SUPPORTED = BIT(0)
+};
+
 /*
  * The Bus class for PHYs.  Devices which provide access to
  * PHYs should register using this structure
@@ -252,7 +259,34 @@ struct mii_bus {
 	int reset_delay_us;
 	/* RESET GPIO descriptor pointer */
 	struct gpio_desc *reset_gpiod;
+
+	/* Feature flags */
+	u32 flags;
+
+	/* PTP system timestamping support
+	 *
+	 * In order to improve the synchronisation precision of phc2sys (from
+	 * the linuxptp project) for devices like switches which are attached
+	 * to the MDIO bus, it is necessary the get the system timestamps as
+	 * close as possible to the access which causes the PTP timestamp
+	 * register to be snapshotted in the switch hardware. Usually this is
+	 * triggered by an MDIO write access, the snapshotted timestamp is then
+	 * transferred by several MDIO reads.
+	 *
+	 * The switch driver can use mdio_write_sts*() to pass through the
+	 * system timestamp pointer @ptp_sts to the MDIO bus driver. The bus
+	 * driver simply has to do the following calls in its write handler:
+	 *	ptp_read_system_prets(bus->ptp_sts);
+	 *	writel(value, mdio-register)
+	 *	ptp_read_system_postts(bus->ptp_sts);
+	 *
+	 * The ptp_read_system_*ts functions already check the ptp_sts pointer.
+	 * The MII_BUS_F_PTP_STS_SUPPORTED-bit must be set in flags, when the
+	 * MDIO bus driver takes the timestamps as described above.
+	 */
+	struct ptp_system_timestamp *ptp_sts;
 };
+
 #define to_mii_bus(d) container_of(d, struct mii_bus, dev)
 
 struct mii_bus *mdiobus_alloc_size(size_t);
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next v2 4/4] net: fec: add support for PTP system timestamping for MDIO devices
From: Hubert Feurstein @ 2019-08-19 17:28 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Fugang Duan,
	Vladimir Oltean, David S. Miller
In-Reply-To: <20190819172827.9550-1-hubert.feurstein@vahle.at>

From: Hubert Feurstein <h.feurstein@gmail.com>

In order to improve the synchronisation precision of phc2sys (from
the linuxptp project) for devices like switches which are attached
to the MDIO bus, it is necessary the get the system timestamps as
close as possible to the access which causes the PTP timestamp
register to be snapshotted in the switch hardware. Usually this is
triggered by an MDIO write access, the snapshotted timestamp is then
transferred by several MDIO reads.

The ptp_read_system_*ts functions already check the ptp_sts pointer.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index c01d3ec3e9af..dd1253683ac0 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1815,10 +1815,12 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 	reinit_completion(&fep->mdio_done);
 
 	/* start a write op */
+	ptp_read_system_prets(bus->ptp_sts);
 	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
 		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
 		FEC_MMFR_TA | FEC_MMFR_DATA(value),
 		fep->hwp + FEC_MII_DATA);
+	ptp_read_system_postts(bus->ptp_sts);
 
 	/* wait for end of transfer */
 	time_left = wait_for_completion_timeout(&fep->mdio_done,
@@ -1956,7 +1958,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 	struct fec_enet_private *fep = netdev_priv(ndev);
 	struct device_node *node;
 	int err = -ENXIO;
-	u32 mii_speed, holdtime;
+	u32 mii_speed, mii_period, holdtime;
 
 	/*
 	 * The i.MX28 dual fec interfaces are not equal.
@@ -1993,6 +1995,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 	 * document.
 	 */
 	mii_speed = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg), 5000000);
+	mii_period = div_u64((u64)mii_speed * 2 * NSEC_PER_SEC, clk_get_rate(fep->clk_ipg));
 	if (fep->quirks & FEC_QUIRK_ENET_MAC)
 		mii_speed--;
 	if (mii_speed > 63) {
@@ -2034,6 +2037,8 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 		pdev->name, fep->dev_id + 1);
 	fep->mii_bus->priv = fep;
 	fep->mii_bus->parent = &pdev->dev;
+	fep->mii_bus->flags = MII_BUS_F_PTP_STS_SUPPORTED;
+	fep->mii_bus->ptp_sts_offset = 32 * mii_period;
 
 	node = of_get_child_by_name(pdev->dev.of_node, "mdio");
 	err = of_mdiobus_register(fep->mii_bus, node);
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next v2 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Hubert Feurstein @ 2019-08-19 17:28 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Florian Fainelli,
	Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <20190819172827.9550-1-hubert.feurstein@vahle.at>

From: Hubert Feurstein <h.feurstein@gmail.com>

The slow MDIO access introduces quite a big offset (~13us) to the PTP
system time synchronisation. With this patch the driver has the possibility
to set the correct offset which can then be compensated.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/phy/mdio_bus.c | 12 ++++++++++++
 include/linux/phy.h        |  8 ++++++++
 2 files changed, 20 insertions(+)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 4dba2714495e..50a37cf46f96 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -739,6 +739,18 @@ int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
 	if (!(bus->flags & MII_BUS_F_PTP_STS_SUPPORTED))
 		ptp_read_system_postts(sts);
 
+	/* PTP offset compensation:
+	 * After the MDIO access is completed (from the chip perspective), the
+	 * switch chip will snapshot the PHC timestamp. To make sure our system
+	 * timestamp corresponds to the PHC timestamp, we have to add the
+	 * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
+	 * takes the average of pre_ts and post_ts to calculate the final
+	 * system timestamp. With this in mind, we have to add ptp_sts_offset
+	 * twice to post_ts, in order to not introduce an constant time offset.
+	 */
+	if (sts)
+		timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);
+
 	return retval;
 }
 EXPORT_SYMBOL(__mdiobus_write_sts);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 0b33662e0320..615df9c7f2c3 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -283,8 +283,16 @@ struct mii_bus {
 	 * The ptp_read_system_*ts functions already check the ptp_sts pointer.
 	 * The MII_BUS_F_PTP_STS_SUPPORTED-bit must be set in flags, when the
 	 * MDIO bus driver takes the timestamps as described above.
+	 *
+	 * @ptp_sts_offset: This is the compensation offset for the system
+	 * timestamp which is introduced by the slow MDIO access duration. An
+	 * MDIO access consists of 32 clock cycles. Usually the MDIO bus runs
+	 * at ~2.5MHz, so we have to compensate ~12800ns offset.
+	 * Set the ptp_sts_offset to the exact duration of one MDIO frame
+	 * (= 32 * clock-period) in nano-seconds.
 	 */
 	struct ptp_system_timestamp *ptp_sts;
+	u32 ptp_sts_offset;
 };
 
 #define to_mii_bus(d) container_of(d, struct mii_bus, dev)
-- 
2.22.1


^ permalink raw reply related

* Re: [PATCH bpf-next] libbpf: relicense bpf_helpers.h and bpf_endian.h
From: Naveen N. Rao @ 2019-08-19 17:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev
  Cc: andrii.nakryiko, David S . Miller, Michael Holzheu,
	John Fastabend, kernel-team, Michal Rostecki, Sargun Dhillon
In-Reply-To: <20190816054543.2215626-1-andriin@fb.com>

Andrii Nakryiko wrote:
> bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper
> definitions essential to almost every BPF program. Which makes them
> useful not just for selftests. To be able to expose them as part of
> libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR
> BSD-2-Clause. This patch updates licensing of those two files.

Quite late, but FWIW:
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>


- Naveen


^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-19 17:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jordan Glover, Andy Lutomirski, Daniel Colascione, Song Liu,
	Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API,
	LSM List
In-Reply-To: <alpine.DEB.2.21.1908191103130.1923@nanos.tec.linutronix.de>

On Mon, Aug 19, 2019 at 11:15:11AM +0200, Thomas Gleixner wrote:
> Alexei,
> 
> On Sat, 17 Aug 2019, Alexei Starovoitov wrote:
> > On Fri, Aug 16, 2019 at 10:28:29PM +0200, Thomas Gleixner wrote:
> > > On Fri, 16 Aug 2019, Alexei Starovoitov wrote:
> > > While real usecases are helpful to understand a design decision, the design
> > > needs to be usecase independent.
> > > 
> > > The kernel provides mechanisms, not policies. My impression of this whole
> > > discussion is that it is policy driven. That's the wrong approach.
> > 
> > not sure what you mean by 'policy driven'.
> > Proposed CAP_BPF is a policy?
> 
> I was referring to the discussion as a whole.
>  
> > Can kernel.unprivileged_bpf_disabled=1 be used now?
> > Yes, but it will weaken overall system security because things that
> > use unpriv to load bpf and CAP_NET_ADMIN to attach bpf would need
> > to move to stronger CAP_SYS_ADMIN.
> > 
> > With CAP_BPF both load and attach would happen under CAP_BPF
> > instead of CAP_SYS_ADMIN.
> 
> I'm not arguing against that.
> 
> > > So let's look at the mechanisms which we have at hand:
> > > 
> > >  1) Capabilities
> > >  
> > >  2) SUID and dropping priviledges
> > > 
> > >  3) Seccomp and LSM
> > > 
> > > Now the real interesting questions are:
> > > 
> > >  A) What kind of restrictions does BPF allow? Is it a binary on/off or is
> > >     there a more finegrained control of BPF functionality?
> > > 
> > >     TBH, I can't tell.
> > > 
> > >  B) Depending on the answer to #A what is the control possibility for
> > >     #1/#2/#3 ?
> > 
> > Can any of the mechanisms 1/2/3 address the concern in mds.rst?
> 
> Well, that depends. As with any other security policy which is implemented
> via these mechanisms, the policy can be strict enough to prevent it by not
> allowing certain operations. The more fine-grained the control is, it
> allows the administrator who implements the policy to remove the
> 'dangerous' parts from an untrusted user.
> 
> So really question #A is important for this. Is BPF just providing a binary
> ON/OFF knob or does it allow to disable/enable certain aspects of BPF
> functionality in a more fine grained way? If the latter, then it might be
> possible to control functionality which might be abused for exploits of
> some sorts (including MDS) in a way which allows other parts of BBF to be
> exposed to less priviledged contexts.

I see. So the kernel.unprivileged_bpf_disabled knob is binary and I think it's
the right mechanism to expose to users.
Having N knobs for every map/prog type won't decrease attack surface.
In the other email Andy's quoting seccomp man page...
Today seccomp cannot really look into bpf_attr syscall args, but even
if it could it won't secure the system.
Examples:
1.
spectre v2 is using bpf in-kernel interpreter in speculative way.
The mere presence of interpreter as part of kernel .text makes the exploit
easier to do. That was the reason to do CONFIG_BPF_JIT_ALWAYS_ON.
For this case even kernel.unprivileged_bpf_disabled=1 was hopeless.

2.
var4 doing store hazard. It doesn't matter which program type is used.
load/store instructions are the same across program types.

3.
prog_array was used as part of var1. I guess it was simply more
convenient for Jann to do it this way :) All other map types
have the same out-of-bounds speculation issue.

In general side channels are cpu bugs that are exploited via sequences
of cpu instructions. In that sense bpf infra provides these instructions.
So all program types and all maps have the same level of 'side channel risk'.

> > I believe Andy wants to expand the attack surface when
> > kernel.unprivileged_bpf_disabled=0
> > Before that happens I'd like the community to work on addressing the text above.
> 
> Well, that text above can be removed when the BPF wizards are entirely sure
> that BPF cannot be abused to exploit stuff. 

Myself and Daniel looked at it in detail. I think we understood
MDS mechanism well enough. Right now we're fairly confident that
combination of existing mechanisms we did for var4 and
verifier speculative analysis protect us from MDS.
The thing is that every new cpu bug is looked at through the bpf lenses.
Can it be exploited through bpf? Complexity of side channels
is growing. Can the most recent swapgs be exploited ?
What if we kprobe+bpf somewhere ?
I don't think there is an issue, but we will never be 'entirely sure'.
Even if myself and Daniel are sure the concern will stay.
Unprivileged bpf as a whole is the concern due to side channels.
The number of them are not yet disclosed. Who is going to analyze them?
imo the only answer to that is kernel.unprivileged_bpf_disabled=1
which together with CONFIG_BPF_JIT_ALWAYS_ON is secure enough.
The other option is to sprinkle every bpf load/store with lfence
which will make execution so slow that it will be unusable.
Which is effectively the same as unprivileged_bpf_disabled=1.

There are other things we can do. Like kasan-style shadow memory
for bpf execution. Auto re-JITing the code after it's running.
We can do lfences everywhere for some time then re-JIT
when kasan-ed shadow memory shows only clean memory accesses.
The beauty of BPF that it is analyze-able and JIT-able instruction set.
The verifier speculative analysis is an example that the kernel
can analyze the speculative execution path that cpu will
take before the code starts executing.
Unprivileged bpf can made absolutely secure. It can be
made more secure than the rest of the kernel.
But today we should just go with unprivileged_bpf_disabled=1
and CAP_BPF.

^ permalink raw reply

* Re: [PATCH net-next 1/6] net: dsa: use a single switch statement for port setup
From: Vivien Didelot @ 2019-08-19 17:20 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, marek.behun, davem, andrew
In-Reply-To: <cb37b6b0-c6c4-6f77-3658-a5cf676fabfe@gmail.com>

Hi Florian,

On Mon, 19 Aug 2019 10:14:24 -0700, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 8/18/19 10:35 AM, Vivien Didelot wrote:
> > It is currently difficult to read the different steps involved in the
> > setup and teardown of ports in the DSA code. Keep it simple with a
> > single switch statement for each port type: UNUSED, CPU, DSA, or USER.
> > 
> > Also no need to call devlink_port_unregister from within dsa_port_setup
> > as this step is inconditionally handled by dsa_port_teardown on error.
> > 
> > Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
> > ---
> 
> [snip]
> 
> >  	case DSA_PORT_TYPE_CPU:
> > +		memset(dlp, 0, sizeof(*dlp));
> > +		devlink_port_attrs_set(dlp, DEVLINK_PORT_FLAVOUR_CPU,
> > +				       dp->index, false, 0, id, len);
> > +		err = devlink_port_register(dl, dlp, dp->index);
> > +		if (err)
> > +			return err;
> 
> This is shared between all port flavors with the exception that the
> flavor type is different, maybe we should create a helper function and
> factor out even more code. I don't feel great about repeating 3 times t
> the same code without making use of a fall through.

Nah I did not want to use an helper because the code is still too
noodly, if you look at user ports setup, we continue to setup devlink
after the slave creation, so here I prefered to keep things readable
and expose all steps first, before writing yet another helper.


Thanks,

	Vivien

^ permalink raw reply

* Re: [PATCH net-next 3/6] net: dsa: enable and disable all ports
From: Florian Fainelli @ 2019-08-19 17:17 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: marek.behun, davem, andrew
In-Reply-To: <20190818173548.19631-4-vivien.didelot@gmail.com>

On 8/18/19 10:35 AM, Vivien Didelot wrote:
> Call the .port_enable and .port_disable functions for all ports,
> not only the user ports, so that drivers may optimize the power
> consumption of all ports after a successful setup.
> 
> Unused ports are now disabled on setup. CPU and DSA ports are now
> enabled on setup and disabled on teardown. User ports were already
> enabled at slave creation and disabled at slave destruction.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 2/6] net: dsa: do not enable or disable non user ports
From: Florian Fainelli @ 2019-08-19 17:16 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: marek.behun, davem, andrew
In-Reply-To: <20190818173548.19631-3-vivien.didelot@gmail.com>

On 8/18/19 10:35 AM, Vivien Didelot wrote:
> The .port_enable and .port_disable operations are currently only
> called for user ports, hence assuming they have a slave device. In
> preparation for using these operations for other port types as well,
> simply guard all implementations against non user ports and return
> directly in such case.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
> ---

[snip]

>
diff --git a/drivers/net/dsa/b53/b53_common.c
b/drivers/net/dsa/b53/b53_common.c
> index 907af62846ba..1639ea7b7dab 100644
> --- a/drivers/net/dsa/b53/b53_common.c
> +++ b/drivers/net/dsa/b53/b53_common.c
> @@ -510,10 +510,15 @@ EXPORT_SYMBOL(b53_imp_vlan_setup);
>  int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
>  {
>  	struct b53_device *dev = ds->priv;
> -	unsigned int cpu_port = ds->ports[port].cpu_dp->index;
> +	unsigned int cpu_port;
>  	int ret = 0;
>  	u16 pvlan;
>  
> +	if (!dsa_is_user_port(ds, port))
> +		return 0;
> +
> +	cpu_port = ds->ports[port].cpu_dp->index;
> +
>  	if (dev->ops->irq_enable)
>  		ret = dev->ops->irq_enable(dev, port);
>  	if (ret)
> @@ -547,6 +552,9 @@ void b53_disable_port(struct dsa_switch *ds, int port)
>  	struct b53_device *dev = ds->priv;
>  	u8 reg;
>  
> +	if (!dsa_is_user_port(ds, port))
> +		return;
> +
>  	/* Disable Tx/Rx for the port */
>  	b53_read8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), &reg);
>  	reg |= PORT_CTRL_RX_DISABLE | PORT_CTRL_TX_DISABLE;

For b53, the changes are correct, for bcm_sf2, see comments below:

> diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
> index 49f99436018a..3d06262817bd 100644
> --- a/drivers/net/dsa/bcm_sf2.c
> +++ b/drivers/net/dsa/bcm_sf2.c
> @@ -157,6 +157,9 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int port,
>  	unsigned int i;
>  	u32 reg;
>  
> +	if (!dsa_is_user_port(ds, port))
> +		return 0;
> +
>  	/* Clear the memory power down */
>  	reg = core_readl(priv, CORE_MEM_PSM_VDD_CTRL);
>  	reg &= ~P_TXQ_PSM_VDD(port);
> @@ -222,6 +225,9 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, int port)
>  	struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
>  	u32 reg;
>  
> +	if (!dsa_is_user_port(ds, port))
> +		return;

in bcm_sf2_sw_suspend(), we do call bcm_sf2_port_disable() against the
user and CPU port.
-- 
Florian

^ permalink raw reply

* Re: What to do when a bridge port gets its pvid deleted?
From: Vladimir Oltean @ 2019-08-19 17:15 UTC (permalink / raw)
  To: Florian Fainelli, Ido Schimmel, roopa, nikolay
  Cc: netdev, Andrew Lunn, Vivien Didelot, stephen
In-Reply-To: <bb99eabb-1410-e7c2-4226-ee6c5fef6880@gmail.com>

On 6/28/19 7:45 PM, Florian Fainelli wrote:
> On 6/28/19 5:37 AM, Vladimir Oltean wrote:
>> On Fri, 28 Jun 2019 at 15:30, Ido Schimmel <idosch@idosch.org> wrote:
>>>
>>> On Tue, Jun 25, 2019 at 11:49:29PM +0300, Vladimir Oltean wrote:
>>>> A number of DSA drivers (BCM53XX, Microchip KSZ94XX, Mediatek MT7530
>>>> at the very least), as well as Mellanox Spectrum (I didn't look at all
>>>> the pure switchdev drivers) try to restore the pvid to a default value
>>>> on .port_vlan_del.
>>>
>>> I don't know about DSA drivers, but that's not what mlxsw is doing. If
>>> the VLAN that is configured as PVID is deleted from the bridge port, the
>>> driver instructs the port to discard untagged and prio-tagged packets.
>>> This is consistent with the bridge driver's behavior.
>>>
>>> We do have a flow the "restores" the PVID, but that's when a port is
>>> unlinked from its bridge master. The PVID we set is 4095 which cannot be
>>> configured by the 8021q / bridge driver. This is due to the way the
>>> underlying hardware works. Even if a port is not bridged and used purely
>>> for routing, packets still do L2 lookup first which sends them directly
>>> to the router block. If PVID is not configured, untagged packets could
>>> not be routed. Obviously, at egress we strip this VLAN.
>>>
>>>> Sure, the port stops receiving traffic when its pvid is a VLAN ID that
>>>> is not installed in its hw filter, but as far as the bridge core is
>>>> concerned, this is to be expected:
>>>>
>>>> # bridge vlan add dev swp2 vid 100 pvid untagged
>>>> # bridge vlan
>>>> port    vlan ids
>>>> swp5     1 PVID Egress Untagged
>>>>
>>>> swp2     1 Egress Untagged
>>>>           100 PVID Egress Untagged
>>>>
>>>> swp3     1 PVID Egress Untagged
>>>>
>>>> swp4     1 PVID Egress Untagged
>>>>
>>>> br0      1 PVID Egress Untagged
>>>> # ping 10.0.0.1
>>>> PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
>>>> 64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.682 ms
>>>> 64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.299 ms
>>>> 64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.251 ms
>>>> 64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=0.324 ms
>>>> 64 bytes from 10.0.0.1: icmp_seq=5 ttl=64 time=0.257 ms
>>>> ^C
>>>> --- 10.0.0.1 ping statistics ---
>>>> 5 packets transmitted, 5 received, 0% packet loss, time 4188ms
>>>> rtt min/avg/max/mdev = 0.251/0.362/0.682/0.163 ms
>>>> # bridge vlan del dev swp2 vid 100
>>>> # bridge vlan
>>>> port    vlan ids
>>>> swp5     1 PVID Egress Untagged
>>>>
>>>> swp2     1 Egress Untagged
>>>>
>>>> swp3     1 PVID Egress Untagged
>>>>
>>>> swp4     1 PVID Egress Untagged
>>>>
>>>> br0      1 PVID Egress Untagged
>>>>
>>>> # ping 10.0.0.1
>>>> PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
>>>> ^C
>>>> --- 10.0.0.1 ping statistics ---
>>>> 8 packets transmitted, 0 received, 100% packet loss, time 7267ms
>>>>
>>>> What is the consensus here? Is there a reason why the bridge driver
>>>> doesn't take care of this?
>>>
>>> Take care of what? :) Always maintaining a PVID on the bridge port? It's
>>> completely OK not to have a PVID.
>>>
>>
>> Yes, I didn't think it through during the first email. I came to the
>> same conclusion in the second one.
>>
>>>> Do switchdev drivers have to restore the pvid to always be
>>>> operational, even if their state becomes inconsistent with the upper
>>>> dev? Is it just 'nice to have'? What if VID 1 isn't in the hw filter
>>>> either (perfectly legal)?
>>>
>>> Are you saying that DSA drivers always maintain a PVID on the bridge
>>> port and allow untagged traffic to ingress regardless of the bridge
>>> driver's configuration? If so, I think this needs to be fixed.
>>
>> Well, not at the DSA core level.
>> But for Microchip:
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/dsa/microchip/ksz9477.c#n576
>> For Broadcom:
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/dsa/b53/b53_common.c#n1376
>> For Mediatek:
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/dsa/mt7530.c#n1196
>>
>> There might be others as well.
> 
> That sounds bogus indeed, and I bet that the two other drivers just
> followed the b53 driver there. I will have to test this again and come
> up with a patch eventually.
> 
> When the port leaves the bridge we do bring it back into a default PVID
> (which is different than the bridge's default PVID) so that part should
> be okay.
> 

Adding a few more networking people.
So my flow is something like this:
- Boot a board with a DSA switch
- Bring all interfaces up
- Enslave all interfaces to br0
- Enable vlan_filtering on br0

What VIDs should be installed into the ports' hw filters, and what 
should the pvid be at this point?
Should the switch ports pass any traffic?
At this point, 'bridge vlan' shows a confusing:
port    vlan ids
eth0     1 PVID Egress Untagged

swp5     1 PVID Egress Untagged

swp2     1 PVID Egress Untagged

swp3     1 PVID Egress Untagged

swp4     1 PVID Egress Untagged

br0      1 PVID Egress Untagged
for all ports, but the .port_vlan_add callback is nowhere to be found.

Whose responsibility is it for the switch to pass traffic without any 
further 'bridge vlan' command? What is the mechanism through which this 
should work?

What if I do:
sudo bridge vlan add vid 100 dev swp2 pvid untagged
echo 0 | sudo tee /sys/class/net/br0/bridge/vlan_filtering
echo 1 | sudo tee /sys/class/net/br0/bridge/vlan_filtering
What pvid should there be on swp2 now?
'bridge vlan' shows:
port    vlan ids
eth0     1 PVID Egress Untagged

swp5     1 PVID Egress Untagged

swp2     1 Egress Untagged
          100 PVID Egress Untagged

swp3     1 PVID Egress Untagged

swp4     1 PVID Egress Untagged

br0      1 PVID Egress Untagged
If the 'bridge vlan' output is correct, whose responsibility is it to 
restore this pvid?

More context: the sja1105 driver is somewhat similar to the mlxsw in 
that VLAN awareness cannot be truly disabled. Arid details aside, in 
both cases, achieving "VLAN-unaware"-like behavior involves manipulating 
the pvid in both cases. But it appears that the bridge core does expect:
(1) that the driver performs a default VLAN initialization which matches 
its own, without them ever communicating. But because switchdev/DSA 
drivers start off in standalone mode, vlan_filtering=0 comes first, 
hence the non-standard pvid. Through what mechanism is the 
bridge-expected pvid supposed to get restored upon flipping vlan_filtering?
(2) that toggling VLAN filtering off and on has no other state upon the 
underlying driver than enabling and disabling VLAN awareness. The VLAN 
hw filter table is assumed to be unchanged. Is this a correct assumption?

Thanks,
-Vladimir

^ permalink raw reply

* Re: [PATCH net-next 2/3] net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
From: Hubert Feurstein @ 2019-08-19 17:14 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, lkml, Richard Cochran, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller
In-Reply-To: <20190819132733.GE8981@lunn.ch>

Hi Andrew,

Am Mo., 19. Aug. 2019 um 15:27 Uhr schrieb Andrew Lunn <andrew@lunn.ch>:
>
> > @@ -45,7 +45,8 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
> >  {
> >       int ret;
> >
> > -     ret = mdiobus_write_nested(chip->bus, dev, reg, data);
> > +     ret = mdiobus_write_sts_nested(chip->bus, dev, reg, data,
> > +                                    chip->ptp_sts);
> >       if (ret < 0)
> >               return ret;
> >
>
> Please also make a similar change to mv88e6xxx_smi_indirect_write().
> The last write in that function should be timestamped.
Since it is already the last write it should be already ok (The
mv88e6xxx_smi_indirect_write
calls the mv88e6xxx_smi_direct_write which initiates the timestamping).

Hubert

^ permalink raw reply

* Re: [PATCH net-next 1/6] net: dsa: use a single switch statement for port setup
From: Florian Fainelli @ 2019-08-19 17:14 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: marek.behun, davem, andrew
In-Reply-To: <20190818173548.19631-2-vivien.didelot@gmail.com>

On 8/18/19 10:35 AM, Vivien Didelot wrote:
> It is currently difficult to read the different steps involved in the
> setup and teardown of ports in the DSA code. Keep it simple with a
> single switch statement for each port type: UNUSED, CPU, DSA, or USER.
> 
> Also no need to call devlink_port_unregister from within dsa_port_setup
> as this step is inconditionally handled by dsa_port_teardown on error.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
> ---

[snip]

>  	case DSA_PORT_TYPE_CPU:
> +		memset(dlp, 0, sizeof(*dlp));
> +		devlink_port_attrs_set(dlp, DEVLINK_PORT_FLAVOUR_CPU,
> +				       dp->index, false, 0, id, len);
> +		err = devlink_port_register(dl, dlp, dp->index);
> +		if (err)
> +			return err;

This is shared between all port flavors with the exception that the
flavor type is different, maybe we should create a helper function and
factor out even more code. I don't feel great about repeating 3 times t
the same code without making use of a fall through.
-- 
Florian

^ permalink raw reply

* [PATCH net-next 0/1] net: fec: add C45 MDIO read/write support
From: Marco Hartmann @ 2019-08-19 17:11 UTC (permalink / raw)
  To: Marco Hartmann, Andy Duan, davem@davemloft.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Christian Herber

As of yet, the Fast Ethernet Controller (FEC) driver only supports Clause 22
conform MDIO transactions. IEEE 802.3ae Clause 45 defines a modified MDIO
protocol that uses a two staged access model in order to increase the address
space.

This patch adds support for Clause 45 conform MDIO read and write operations to
the FEC driver.

Marco Hartmann (1):
  fec: add C45 MDIO read/write support

 drivers/net/ethernet/freescale/fec_main.c | 65 ++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 6 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH net-next 1/1] fec: add C45 MDIO read/write support
From: Marco Hartmann @ 2019-08-19 17:11 UTC (permalink / raw)
  To: Marco Hartmann, Andy Duan, davem@davemloft.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Christian Herber
In-Reply-To: <1566234659-7164-1-git-send-email-marco.hartmann@nxp.com>

IEEE 802.3ae clause 45 defines a modified MDIO protocol that uses a two
staged access model in order to increase the address space.

This patch adds support for C45 MDIO read and write accesses, which are
used whenever the MII_ADDR_C45 flag in the regnum argument is set.
In case it is not set, C22 accesses are used as before.

Co-developed-by: Christian Herber <christian.herber@nxp.com>
Signed-off-by: Christian Herber <christian.herber@nxp.com>
Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 65 ++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index c01d3ec3e9af..73f8f9a149a1 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -208,8 +208,11 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 
 /* FEC MII MMFR bits definition */
 #define FEC_MMFR_ST		(1 << 30)
+#define FEC_MMFR_ST_C45		(0)
 #define FEC_MMFR_OP_READ	(2 << 28)
+#define FEC_MMFR_OP_READ_C45	(3 << 28)
 #define FEC_MMFR_OP_WRITE	(1 << 28)
+#define FEC_MMFR_OP_ADDR_WRITE	(0)
 #define FEC_MMFR_PA(v)		((v & 0x1f) << 23)
 #define FEC_MMFR_RA(v)		((v & 0x1f) << 18)
 #define FEC_MMFR_TA		(2 << 16)
@@ -1767,7 +1770,7 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 	struct fec_enet_private *fep = bus->priv;
 	struct device *dev = &fep->pdev->dev;
 	unsigned long time_left;
-	int ret = 0;
+	int ret = 0, frame_start, frame_addr, frame_op;
 
 	ret = pm_runtime_get_sync(dev);
 	if (ret < 0)
@@ -1775,9 +1778,36 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 
 	reinit_completion(&fep->mdio_done);
 
+	if (MII_ADDR_C45 & regnum) {
+		frame_start = FEC_MMFR_ST_C45;
+
+		/* write address */
+		frame_addr = (regnum >> 16);
+		writel(frame_start | FEC_MMFR_OP_ADDR_WRITE |
+		       FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
+		       FEC_MMFR_TA | (regnum & 0xFFFF),
+		       fep->hwp + FEC_MII_DATA);
+
+		/* wait for end of transfer */
+		time_left = wait_for_completion_timeout(&fep->mdio_done,
+				usecs_to_jiffies(FEC_MII_TIMEOUT));
+		if (time_left == 0) {
+			netdev_err(fep->netdev, "MDIO address write timeout\n");
+			ret  = -ETIMEDOUT;
+		}
+
+		frame_op = FEC_MMFR_OP_READ_C45;
+
+	} else {
+		/* C22 read */
+		frame_op = FEC_MMFR_OP_READ;
+		frame_start = FEC_MMFR_ST;
+		frame_addr = regnum;
+	}
+
 	/* start a read op */
-	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
-		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
+	writel(frame_start | frame_op |
+		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
 		FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
 
 	/* wait for end of transfer */
@@ -1804,7 +1834,7 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 	struct fec_enet_private *fep = bus->priv;
 	struct device *dev = &fep->pdev->dev;
 	unsigned long time_left;
-	int ret;
+	int ret, frame_start, frame_addr;
 
 	ret = pm_runtime_get_sync(dev);
 	if (ret < 0)
@@ -1814,9 +1844,32 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 
 	reinit_completion(&fep->mdio_done);
 
+	if (MII_ADDR_C45 & regnum) {
+		frame_start = FEC_MMFR_ST_C45;
+
+		/* write address */
+		frame_addr = (regnum >> 16);
+		writel(frame_start | FEC_MMFR_OP_ADDR_WRITE |
+		       FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
+		       FEC_MMFR_TA | (regnum & 0xFFFF),
+		       fep->hwp + FEC_MII_DATA);
+
+		/* wait for end of transfer */
+		time_left = wait_for_completion_timeout(&fep->mdio_done,
+			usecs_to_jiffies(FEC_MII_TIMEOUT));
+		if (time_left == 0) {
+			netdev_err(fep->netdev, "MDIO address write timeout\n");
+			ret  = -ETIMEDOUT;
+		}
+	} else {
+		/* C22 write */
+		frame_start = FEC_MMFR_ST;
+		frame_addr = regnum;
+	}
+
 	/* start a write op */
-	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
-		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
+	writel(frame_start | FEC_MMFR_OP_WRITE |
+		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
 		FEC_MMFR_TA | FEC_MMFR_DATA(value),
 		fep->hwp + FEC_MII_DATA);
 
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH net-next 4/6] net: dsa: mv88e6xxx: do not change STP state on port disabling
From: Andrew Lunn @ 2019-08-19 16:44 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, marek.behun, davem, f.fainelli
In-Reply-To: <20190819122737.GB16144@t480s.localdomain>

On Mon, Aug 19, 2019 at 12:27:37PM -0400, Vivien Didelot wrote:
> On Mon, 19 Aug 2019 18:10:18 +0200, Andrew Lunn <andrew@lunn.ch> wrote:
> > > On Mon, 19 Aug 2019 15:40:57 +0200, Andrew Lunn <andrew@lunn.ch> wrote:
> > > > On Sun, Aug 18, 2019 at 01:35:46PM -0400, Vivien Didelot wrote:
> > > > > When disabling a port, that is not for the driver to decide what to
> > > > > do with the STP state. This is already handled by the DSA layer.
> > > > 
> > > > Putting the port into STP disabled state is how you actually disable
> > > > it, for the mv88e6xxx. So this is not really about STP, it is about
> > > > powering off the port. Maybe a comment is needed, rather than removing
> > > > the code?
> > > 
> > > This is not for the driver to decide, the stack already handles that.
> > > Otherwise, calling dsa_port_disable on a bridged port would result in
> > > mv88e6xxx forcing the STP state to Disabled while this is not expected.
> 
> [...]
> 
> > Are you saying the core already sets the STP to disabled, for ports
> > which are unused? I did not spot that in your previous patch?
> 
> Just look at dsa_port_disable Andrew:
> 
> 
>     void dsa_port_disable(struct dsa_port *dp)
>     {
>     	struct dsa_switch *ds = dp->ds;
>     	int port = dp->index;
>     
>     	if (!dp->bridge_dev)
>     		dsa_port_set_state_now(dp, BR_STATE_DISABLED);
>     
>     	if (ds->ops->port_disable)
>     		ds->ops->port_disable(ds, port);
>     }
> 

Ah, cool. I completely missed that.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH v5 03/17] nvmem: core: add nvmem_device_find
From: Thomas Bogendoerfer @ 2019-08-19 16:31 UTC (permalink / raw)
  To: Jonathan Corbet, Ralf Baechle, Paul Burton, James Hogan,
	Dmitry Torokhov, Lee Jones, David S. Miller, Srinivas Kandagatla,
	Alessandro Zummo, Alexandre Belloni, Greg Kroah-Hartman,
	Jiri Slaby, Evgeniy Polyakov, linux-doc, linux-kernel, linux-mips,
	linux-input, netdev, linux-rtc, linux-serial
In-Reply-To: <20190819163144.3478-1-tbogendoerfer@suse.de>

nvmem_device_find provides a way to search for nvmem devices with
the help of a match function simlair to bus_find_device.

Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
---
 Documentation/driver-api/nvmem.rst |  2 ++
 drivers/nvmem/core.c               | 62 ++++++++++++++++++++------------------
 include/linux/nvmem-consumer.h     |  9 ++++++
 3 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/Documentation/driver-api/nvmem.rst b/Documentation/driver-api/nvmem.rst
index d9d958d5c824..287e86819640 100644
--- a/Documentation/driver-api/nvmem.rst
+++ b/Documentation/driver-api/nvmem.rst
@@ -129,6 +129,8 @@ To facilitate such consumers NVMEM framework provides below apis::
   struct nvmem_device *nvmem_device_get(struct device *dev, const char *name);
   struct nvmem_device *devm_nvmem_device_get(struct device *dev,
 					   const char *name);
+  struct nvmem_device *nvmem_device_find(void *data,
+			int (*match)(struct device *dev, const void *data));
   void nvmem_device_put(struct nvmem_device *nvmem);
   int nvmem_device_read(struct nvmem_device *nvmem, unsigned int offset,
 		      size_t bytes, void *buf);
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index ac5d945be88a..e591ba54758f 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -76,36 +76,18 @@ static struct bus_type nvmem_bus_type = {
 	.name		= "nvmem",
 };
 
+#if IS_ENABLED(CONFIG_OF)
 static int of_nvmem_match(struct device *dev, const void *nvmem_np)
 {
 	return dev->of_node == nvmem_np;
 }
+#endif
 
-static struct nvmem_device *of_nvmem_find(struct device_node *nvmem_np)
+static int nvmem_match_name(struct device *dev, const void *data)
 {
-	struct device *d;
-
-	if (!nvmem_np)
-		return NULL;
-
-	d = bus_find_device(&nvmem_bus_type, NULL, nvmem_np, of_nvmem_match);
-
-	if (!d)
-		return NULL;
+	const char *name = data;
 
-	return to_nvmem_device(d);
-}
-
-static struct nvmem_device *nvmem_find(const char *name)
-{
-	struct device *d;
-
-	d = bus_find_device_by_name(&nvmem_bus_type, NULL, name);
-
-	if (!d)
-		return NULL;
-
-	return to_nvmem_device(d);
+	return sysfs_streq(name, dev_name(dev));
 }
 
 static void nvmem_cell_drop(struct nvmem_cell *cell)
@@ -537,13 +519,16 @@ int devm_nvmem_unregister(struct device *dev, struct nvmem_device *nvmem)
 }
 EXPORT_SYMBOL(devm_nvmem_unregister);
 
-static struct nvmem_device *__nvmem_device_get(struct device_node *np,
-					       const char *nvmem_name)
+static struct nvmem_device *__nvmem_device_get(void *data,
+			int (*match)(struct device *dev, const void *data))
 {
 	struct nvmem_device *nvmem = NULL;
+	struct device *dev;
 
 	mutex_lock(&nvmem_mutex);
-	nvmem = np ? of_nvmem_find(np) : nvmem_find(nvmem_name);
+	dev = bus_find_device(&nvmem_bus_type, NULL, data, match);
+	if (dev)
+		nvmem = to_nvmem_device(dev);
 	mutex_unlock(&nvmem_mutex);
 	if (!nvmem)
 		return ERR_PTR(-EPROBE_DEFER);
@@ -592,7 +577,7 @@ struct nvmem_device *of_nvmem_device_get(struct device_node *np, const char *id)
 	if (!nvmem_np)
 		return ERR_PTR(-ENOENT);
 
-	return __nvmem_device_get(nvmem_np, NULL);
+	return __nvmem_device_get(nvmem_np, of_nvmem_match);
 }
 EXPORT_SYMBOL_GPL(of_nvmem_device_get);
 #endif
@@ -618,10 +603,26 @@ struct nvmem_device *nvmem_device_get(struct device *dev, const char *dev_name)
 
 	}
 
-	return __nvmem_device_get(NULL, dev_name);
+	return __nvmem_device_get((void *)dev_name, nvmem_match_name);
 }
 EXPORT_SYMBOL_GPL(nvmem_device_get);
 
+/**
+ * nvmem_device_find() - Find nvmem device with matching function
+ *
+ * @data: Data to pass to match function
+ * @match: Callback function to check device
+ *
+ * Return: ERR_PTR() on error or a valid pointer to a struct nvmem_device
+ * on success.
+ */
+struct nvmem_device *nvmem_device_find(void *data,
+			int (*match)(struct device *dev, const void *data))
+{
+	return __nvmem_device_get(data, match);
+}
+EXPORT_SYMBOL_GPL(nvmem_device_find);
+
 static int devm_nvmem_device_match(struct device *dev, void *res, void *data)
 {
 	struct nvmem_device **nvmem = res;
@@ -715,7 +716,8 @@ nvmem_cell_get_from_lookup(struct device *dev, const char *con_id)
 		if ((strcmp(lookup->dev_id, dev_id) == 0) &&
 		    (strcmp(lookup->con_id, con_id) == 0)) {
 			/* This is the right entry. */
-			nvmem = __nvmem_device_get(NULL, lookup->nvmem_name);
+			nvmem = __nvmem_device_get((void *)lookup->nvmem_name,
+						   nvmem_match_name);
 			if (IS_ERR(nvmem)) {
 				/* Provider may not be registered yet. */
 				cell = ERR_CAST(nvmem);
@@ -785,7 +787,7 @@ struct nvmem_cell *of_nvmem_cell_get(struct device_node *np, const char *id)
 	if (!nvmem_np)
 		return ERR_PTR(-EINVAL);
 
-	nvmem = __nvmem_device_get(nvmem_np, NULL);
+	nvmem = __nvmem_device_get(nvmem_np, of_nvmem_match);
 	of_node_put(nvmem_np);
 	if (IS_ERR(nvmem))
 		return ERR_CAST(nvmem);
diff --git a/include/linux/nvmem-consumer.h b/include/linux/nvmem-consumer.h
index 8f8be5b00060..02dc4aa992b2 100644
--- a/include/linux/nvmem-consumer.h
+++ b/include/linux/nvmem-consumer.h
@@ -89,6 +89,9 @@ void nvmem_del_cell_lookups(struct nvmem_cell_lookup *entries,
 int nvmem_register_notifier(struct notifier_block *nb);
 int nvmem_unregister_notifier(struct notifier_block *nb);
 
+struct nvmem_device *nvmem_device_find(void *data,
+			int (*match)(struct device *dev, const void *data));
+
 #else
 
 static inline struct nvmem_cell *nvmem_cell_get(struct device *dev,
@@ -204,6 +207,12 @@ static inline int nvmem_unregister_notifier(struct notifier_block *nb)
 	return -EOPNOTSUPP;
 }
 
+static inline struct nvmem_device *nvmem_device_find(void *data,
+			int (*match)(struct device *dev, const void *data))
+{
+	return NULL;
+}
+
 #endif /* CONFIG_NVMEM */
 
 #if IS_ENABLED(CONFIG_NVMEM) && IS_ENABLED(CONFIG_OF)
-- 
2.13.7


^ permalink raw reply related

* [PATCH v5 01/17] w1: add 1-wire master driver for IP block found in SGI ASICs
From: Thomas Bogendoerfer @ 2019-08-19 16:31 UTC (permalink / raw)
  To: Jonathan Corbet, Ralf Baechle, Paul Burton, James Hogan,
	Dmitry Torokhov, Lee Jones, David S. Miller, Srinivas Kandagatla,
	Alessandro Zummo, Alexandre Belloni, Greg Kroah-Hartman,
	Jiri Slaby, Evgeniy Polyakov, linux-doc, linux-kernel, linux-mips,
	linux-input, netdev, linux-rtc, linux-serial
In-Reply-To: <20190819163144.3478-1-tbogendoerfer@suse.de>

Starting with SGI Origin machines nearly every new SGI ASIC contains
an 1-Wire master. They are used for attaching One-Wire prom devices,
which contain information about part numbers, revision numbers,
serial number etc. and MAC addresses for ethernet interfaces.
This patch adds a master driver to support this IP block.
It also adds an extra field dev_id to struct w1_bus_master, which
could be in used in slave drivers for creating unique device names.

Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
---
 drivers/w1/masters/Kconfig           |   9 +++
 drivers/w1/masters/Makefile          |   1 +
 drivers/w1/masters/sgi_w1.c          | 130 +++++++++++++++++++++++++++++++++++
 include/linux/platform_data/sgi-w1.h |  15 ++++
 include/linux/w1.h                   |   2 +
 5 files changed, 157 insertions(+)
 create mode 100644 drivers/w1/masters/sgi_w1.c
 create mode 100644 include/linux/platform_data/sgi-w1.h

diff --git a/drivers/w1/masters/Kconfig b/drivers/w1/masters/Kconfig
index 7ae260577901..24b9a8e05f64 100644
--- a/drivers/w1/masters/Kconfig
+++ b/drivers/w1/masters/Kconfig
@@ -65,5 +65,14 @@ config HDQ_MASTER_OMAP
 	  Say Y here if you want support for the 1-wire or HDQ Interface
 	  on an OMAP processor.
 
+config W1_MASTER_SGI
+	tristate "SGI ASIC driver"
+	help
+	  Say Y here if you want support for your 1-wire devices using
+	  SGI ASIC 1-Wire interface
+
+	  This support is also available as a module.  If so, the module
+	  will be called sgi_w1.
+
 endmenu
 
diff --git a/drivers/w1/masters/Makefile b/drivers/w1/masters/Makefile
index 18954cae4256..dae629b7ab49 100644
--- a/drivers/w1/masters/Makefile
+++ b/drivers/w1/masters/Makefile
@@ -11,3 +11,4 @@ obj-$(CONFIG_W1_MASTER_MXC)		+= mxc_w1.o
 obj-$(CONFIG_W1_MASTER_DS1WM)		+= ds1wm.o
 obj-$(CONFIG_W1_MASTER_GPIO)		+= w1-gpio.o
 obj-$(CONFIG_HDQ_MASTER_OMAP)		+= omap_hdq.o
+obj-$(CONFIG_W1_MASTER_SGI)		+= sgi_w1.o
diff --git a/drivers/w1/masters/sgi_w1.c b/drivers/w1/masters/sgi_w1.c
new file mode 100644
index 000000000000..1b2d96b945be
--- /dev/null
+++ b/drivers/w1/masters/sgi_w1.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * sgi_w1.c - w1 master driver for one wire support in SGI ASICs
+ */
+
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/io.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/platform_device.h>
+#include <linux/platform_data/sgi-w1.h>
+
+#include <linux/w1.h>
+
+#define MCR_RD_DATA	BIT(0)
+#define MCR_DONE	BIT(1)
+
+#define MCR_PACK(pulse, sample) (((pulse) << 10) | ((sample) << 2))
+
+struct sgi_w1_device {
+	u32 __iomem *mcr;
+	struct w1_bus_master bus_master;
+	char dev_id[64];
+};
+
+static u8 sgi_w1_wait(u32 __iomem *mcr)
+{
+	u32 mcr_val;
+
+	do {
+		mcr_val = readl(mcr);
+	} while (!(mcr_val & MCR_DONE));
+
+	return (mcr_val & MCR_RD_DATA) ? 1 : 0;
+}
+
+/*
+ * this is the low level routine to
+ * reset the device on the One Wire interface
+ * on the hardware
+ */
+static u8 sgi_w1_reset_bus(void *data)
+{
+	struct sgi_w1_device *dev = data;
+	u8 ret;
+
+	writel(MCR_PACK(520, 65), dev->mcr);
+	ret = sgi_w1_wait(dev->mcr);
+	udelay(500); /* recovery time */
+	return ret;
+}
+
+/*
+ * this is the low level routine to read/write a bit on the One Wire
+ * interface on the hardware. It does write 0 if parameter bit is set
+ * to 0, otherwise a write 1/read.
+ */
+static u8 sgi_w1_touch_bit(void *data, u8 bit)
+{
+	struct sgi_w1_device *dev = data;
+	u8 ret;
+
+	if (bit)
+		writel(MCR_PACK(6, 13), dev->mcr);
+	else
+		writel(MCR_PACK(80, 30), dev->mcr);
+
+	ret = sgi_w1_wait(dev->mcr);
+	if (bit)
+		udelay(100); /* recovery */
+	return ret;
+}
+
+static int sgi_w1_probe(struct platform_device *pdev)
+{
+	struct sgi_w1_device *sdev;
+	struct sgi_w1_platform_data *pdata;
+	struct resource *res;
+
+	sdev = devm_kzalloc(&pdev->dev, sizeof(struct sgi_w1_device),
+			    GFP_KERNEL);
+	if (!sdev)
+		return -ENOMEM;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	sdev->mcr = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(sdev->mcr))
+		return PTR_ERR(sdev->mcr);
+
+	sdev->bus_master.data = sdev;
+	sdev->bus_master.reset_bus = sgi_w1_reset_bus;
+	sdev->bus_master.touch_bit = sgi_w1_touch_bit;
+
+	pdata = dev_get_platdata(&pdev->dev);
+	if (pdata) {
+		strlcpy(sdev->dev_id, pdata->dev_id, sizeof(sdev->dev_id));
+		sdev->bus_master.dev_id = sdev->dev_id;
+	}
+
+	platform_set_drvdata(pdev, sdev);
+
+	return w1_add_master_device(&sdev->bus_master);
+}
+
+/*
+ * disassociate the w1 device from the driver
+ */
+static int sgi_w1_remove(struct platform_device *pdev)
+{
+	struct sgi_w1_device *sdev = platform_get_drvdata(pdev);
+
+	w1_remove_master_device(&sdev->bus_master);
+
+	return 0;
+}
+
+static struct platform_driver sgi_w1_driver = {
+	.driver = {
+		.name = "sgi_w1",
+	},
+	.probe = sgi_w1_probe,
+	.remove = sgi_w1_remove,
+};
+module_platform_driver(sgi_w1_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Thomas Bogendoerfer");
+MODULE_DESCRIPTION("Driver for One-Wire IP in SGI ASICs");
diff --git a/include/linux/platform_data/sgi-w1.h b/include/linux/platform_data/sgi-w1.h
new file mode 100644
index 000000000000..fc6b92e0b942
--- /dev/null
+++ b/include/linux/platform_data/sgi-w1.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * SGI One-Wire (W1) IP
+ */
+
+#ifndef PLATFORM_DATA_SGI_W1_H
+#define PLATFORM_DATA_SGI_W1_H
+
+#include <asm/sn/types.h>
+
+struct sgi_w1_platform_data {
+	char dev_id[64];
+};
+
+#endif /* PLATFORM_DATA_SGI_W1_H */
diff --git a/include/linux/w1.h b/include/linux/w1.h
index e0b5156f78fd..89843e9f634c 100644
--- a/include/linux/w1.h
+++ b/include/linux/w1.h
@@ -150,6 +150,8 @@ struct w1_bus_master {
 
 	void		(*search)(void *, struct w1_master *,
 		u8, w1_slave_found_callback);
+
+	char		*dev_id;
 };
 
 /**
-- 
2.13.7


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox