Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [Bridge] [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Stephen Hemminger @ 2019-08-24 12:05 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <1566500850-6247-1-git-send-email-horatiu.vultur@microchip.com>

On Thu, 22 Aug 2019 21:07:27 +0200
Horatiu Vultur <horatiu.vultur@microchip.com> wrote:

> Current implementation of the SW bridge is setting the interfaces in
> promisc mode when they are added to bridge if learning of the frames is
> enabled.
> In case of Ocelot which has HW capabilities to switch frames, it is not
> needed to set the ports in promisc mode because the HW already capable of
> doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
> HW has bridge capabilities. Therefore the SW bridge doesn't need to set
> the ports in promisc mode to do the switching.
> This optimization takes places only if all the interfaces that are part
> of the bridge have this flag and have the same network driver.
> 
> If the bridge interfaces is added in promisc mode then also the ports part
> of the bridge are set in promisc mode.
> 
> Horatiu Vultur (3):
>   net: Add HW_BRIDGE offload feature
>   net: mscc: Use NETIF_F_HW_BRIDGE
>   net: mscc: Implement promisc mode.
> 
>  drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
>  include/linux/netdev_features.h    |  3 +++
>  net/bridge/br_if.c                 | 29 ++++++++++++++++++++++++++++-
>  net/core/ethtool.c                 |  1 +
>  4 files changed, 56 insertions(+), 3 deletions(-)
> 

IMHO there are already enough nerd knobs in bridge to support this.
If you hardware can't do real bridging, it is only doing MACVLAN so that
is what you should support instead.

^ permalink raw reply

* Re: [PATCH spi for-5.4 0/5] Deterministic SPI latency with NXP DSPI driver
From: Vladimir Oltean @ 2019-08-24 12:13 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Mark Brown, Hubert Feurstein, Miroslav Lichvar, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev
In-Reply-To: <20190823052217.GD2502@localhost>

On Fri, 23 Aug 2019 at 08:22, Richard Cochran <richardcochran@gmail.com> wrote:
>
> On Thu, Aug 22, 2019 at 07:13:12PM +0300, Vladimir Oltean wrote:
> > You do think that I understand the problem? But I don't!
>
> ;^)
>
> > > And who generates Local_sync_resp?
> > >
> >
> > Local_sync_resp is the same as Local_sync_req except maybe with a
> > custom tag added by the switch. Irrelevant as long as the DSA master
> > can timestamp it.
>
> So this is point why it won't work.  The time stamping logic in the
> switch only recognizes PTP frames.
>
> Thanks,
> Richard

So to summarize the pros and cons of a PHC-to-PHC loopback sync:
Pros:
- At least two orders of magnitude improvement in offset compared to a
software timestamping solution, even in the situation where the
software timestamp is fully optimized
- Does not depend on the availability of PPS hardware
- In the case where both MACs support this, the synchronization can
simply reuse the DSA link with no dedicated circuitry
Cons:
- DSA framework for retrieving timestamps would need to be reworked
- The solution would have to be implemented in the kernel
- A separate protocol from PTP would have to be devised
- Not all DSA masters support hardware timestamping. Of those that do,
not all may support timestamping generic frames
- Not all PTP-capable DSA switches support timestamping generic frames
- Not all DSA switches may be able to loop back traffic from their CPU
port. I think this is called "hairpinning".
- The solution only covers the sync of the top-most switch in the DSA
tree. The hairpinning described above would need to be selective as
well, not just possible.

So at this point, the solution is not generic enough for me to be
compelled to prototype it. Taking system clock timestamps in the SPI
driver is "good enough". We'll just need to work out a way with Mark
that this can be added to the SPI subsystem, given the valid
objections already expressed.

Thanks,
-Vladimir

^ permalink raw reply

* pull-request: ieee802154 for net 2019-08-24
From: Stefan Schmidt @ 2019-08-24 12:19 UTC (permalink / raw)
  To: davem; +Cc: linux-wpan, alex.aring, netdev

Hello Dave.

An update from ieee802154 for your *net* tree.

Yue  Haibing fixed two bugs discovered by KASAN in the hwsim driver for
ieee802154 and Colin Ian King cleaned up a redundant variable assignment.

If there are any problems let me know.

regards
Stefan Schmidt

The following changes since commit 6c0afef5fb0c27758f4d52b2210c61b6bd8b4470:

  ipv6/flowlabel: wait rcu grace period before put_pid() (2019-04-29 23:30:13 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan.git ieee802154-for-davem-2019-08-24

for you to fetch changes up to 074014abdf2bd2a00da3dd14a6ae04cafc1d62cc:

  net: ieee802154: remove redundant assignment to rc (2019-08-14 01:10:41 +0200)

----------------------------------------------------------------
Colin Ian King (1):
      net: ieee802154: remove redundant assignment to rc

YueHaibing (2):
      ieee802154: hwsim: Fix error handle path in hwsim_init_module
      ieee802154: hwsim: unregister hw while hwsim_subscribe_all_others fails

 drivers/net/ieee802154/mac802154_hwsim.c | 8 +++++---
 net/ieee802154/socket.c                  | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next v3 1/3] net: ethernet: mediatek: Add basic PHYLINK support
From: René van Dorst @ 2019-08-24 12:33 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Frank Wunderlich, Stefan Roese
In-Reply-To: <20190824091106.GC13294@shell.armlinux.org.uk>

Hi Russell,

Quoting Russell King - ARM Linux admin <linux@armlinux.org.uk>:

> On Fri, Aug 23, 2019 at 03:45:14PM +0200, René van Dorst wrote:
>> This convert the basics to PHYLINK API.
>> SGMII support is not in this patch.
>>
>> Signed-off-by: René van Dorst <opensource@vdorst.com>
>> --
>> v2->v3:
>> * Make link_down() similar as link_up() suggested by Russell King
>
> Yep, almost there, but...
>
>> +static void mtk_mac_link_down(struct phylink_config *config,  
>> unsigned int mode,
>> +			      phy_interface_t interface)
>> +{
>> +	struct mtk_mac *mac = container_of(config, struct mtk_mac,
>> +					   phylink_config);
>> +	u32 mcr = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
>>
>> +	mcr &= (MAC_MCR_TX_EN | MAC_MCR_RX_EN);
>
> ... this clears all bits _except_ for the tx and rx enable (which will
> remain set) - you probably wanted a ~ before the (.

Yes that is what it should be.
I only want to clear the TX_EN en RX_EN bits.

Greats,

René

>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up




^ permalink raw reply

* Re: [PATCH spi for-5.4 2/5] spi: Add a PTP system timestamp to the transfer structure
From: Vladimir Oltean @ 2019-08-24 12:38 UTC (permalink / raw)
  To: Mark Brown
  Cc: Hubert Feurstein, Miroslav Lichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev
In-Reply-To: <20190822171137.GB23391@sirena.co.uk>

On Thu, 22 Aug 2019 at 21:19, Mark Brown <broonie@kernel.org> wrote:
>
> On Sun, Aug 18, 2019 at 09:25:57PM +0300, Vladimir Oltean wrote:
>
> > @@ -1391,6 +1402,13 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
> >               goto out;
> >       }
> >
> > +     if (!ctlr->ptp_sts_supported) {
> > +             list_for_each_entry(xfer, &mesg->transfers, transfer_list) {
> > +                     xfer->ptp_sts_word_pre = 0;
> > +                     ptp_read_system_prets(xfer->ptp_sts);
> > +             }
> > +     }
> > +
>
> We can do better than this for controllers which use transfer_one().
>

You mean I should guard this "if", and the one below, with "&&
!ctlr->transfer_one"?

> > +const void *spi_xfer_ptp_sts_word(struct spi_transfer *xfer, bool pre)
> > +{
>
> xfer can be const here too.
>
> > + * @ptp_sts_supported: If the driver sets this to true, it must provide a
> > + *   time snapshot in @spi_transfer->ptp_sts as close as possible to the
> > + *   moment in time when @spi_transfer->ptp_sts_word_pre and
> > + *   @spi_transfer->ptp_sts_word_post were transmitted.
> > + *   If the driver does not set this, the SPI core takes the snapshot as
> > + *   close to the driver hand-over as possible.
>
> A couple of issues here.  The big one is that for PIO transfers
> this is going to either complicate the code or introduce overhead
> in individual drivers for an extremely niche use case.  I guess
> most drivers won't implement it which makes this a bit moot but
> then this is a concern that pushes back against the idea of
> implementing the feature.
>

The concern is the overhead in terms of code, or runtime performance?
Arguably the applications that require deterministic latency are
actually going to push for overall less overhead at runtime, even if
that comes at a cost in terms of code size. The spi-fsl-dspi driver
does not perform worse by any metric after this rework.

> The other is that it's not 100% clear what you're looking to
> timestamp here - is it when the data goes on the wire, is it when
> the data goes on the FIFO (which could be relatively large)?  I'm
> guessing you're looking for the physical transfer here, if that's
> the case should there be some effort to compensate for the delays
> in the controller?

The goal is to timestamp the moment when the SPI slave sees word N of
the data. Luckily the DSPI driver raises the TCF (Transfer Complete
Flag) once that word has been transmitted, which I used to my
advantage. The EOQ mode behaves similarly, but has a granularity of 4
words. The controller delays are hence implicitly included in the
software timestamp.
But the question is valid and I expect that such compensation might be
needed for some hardware, provided that it can be measured and
guaranteed. In fact Hubert did add such logic to the v3 of his MDIO
patch: https://lkml.org/lkml/2019/8/20/195 There were some objections
mainly related to the certainty of those offset corrections. I don't
want to "future-proof" the API now with features I have no use of, but
such compensation logic might come in the future.

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH 12/16] arm64: prefer __section from compiler_attributes.h
From: Miguel Ojeda @ 2019-08-24 12:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nick Desaulniers, Andrew Morton, Sedat Dilek, Josh Poimboeuf,
	Yonghong Song, clang-built-linux, Catalin Marinas,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Andrey Konovalov, Greg Kroah-Hartman, Enrico Weigelt,
	Suzuki K Poulose, Thomas Gleixner, Masayoshi Mizuma,
	Shaokun Zhang, Alexios Zavras, Allison Randal, Linux ARM,
	linux-kernel, Network Development, bpf
In-Reply-To: <20190824112542.7guulvdenm35ihs7@willie-the-truck>

On Sat, Aug 24, 2019 at 1:25 PM Will Deacon <will@kernel.org> wrote:
>
> Which bit are you pinging about? This patch (12/16) has been in -next for a
> while and is queued in the arm64 tree for 5.4. The Oops/boot issue is
> addressed in patch 14 which probably needs to be sent as a separate patch
> (with a commit message) if it's targetting 5.3 and, I assume, routed via
> somebody like akpm.

I was pinging about the bit I was quoting, i.e. whether the Oops in
the cover letter was #14 indeed. Also, since Nick said he wanted to
get this ASAP through compiler-attributes, I assumed he wanted it to
be in 5.3, but I have not seen the independent patch.

Since he seems busy, I will write a better commit message myself and
send it to Linus next week.

Cheers,
Miguel

^ permalink raw reply

* Re: [RFC PATCH 1/2] gianfar: convert to phylink
From: Vladimir Oltean @ 2019-08-24 12:49 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Arseny Solokha, Claudiu Manoil, Ioana Ciornei, Andrew Lunn,
	netdev, Florian Fainelli
In-Reply-To: <20190730102329.GZ1330@shell.armlinux.org.uk>

Hi Russell,

On Tue, 30 Jul 2019 at 13:23, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Tue, Jul 30, 2019 at 02:39:58AM +0300, Vladimir Oltean wrote:
> > To be honest I don't have a complete answer to that question. The
> > literature recommends writing 0x01a0 to the MII_ADVERTISE (0x4)
> > register of the MAC PCS for 1000Base-X, and 0x4001 for SGMII.
>
> That looks entirely sane for both modes.
>
> 0x01a0 for 802.3z (1000BASE-X) is defined in 802.3 37.2.5.1.3 as:
>         bit 5 - full duplex = 1
>         bit 6 - half duplex = 0
>         bit 7 - pause = 1
>         bit 8 - asym_pause = 1
>
> The description of the bits match the config word that is sent in the
> link with the exception of bit 14, which is the acknowledgement bit.
> Normally, in 802.3z, bit 14 will not be set in the transmitted config
> word until we have received the config word from the other end of the
> link.
>
> For SGMII, 0x4001 is the acknowledgement code word, which is defined
> in the SGMII spec as "tx_config_Reg[15:0] sent from the MAC to the
> PHY" which requires:
>         bit 0 - must be 1
>         bit 1 .. 13, 15 - must be 0, reserved for future use
>         bit 14 - must be 1 (auto-negotiation acknowledgement in 802.3z)
>
> > The FMan driver which uses the TSEC MAC does exactly that:
> > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/ethernet/freescale/fman/fman_dtsec.c#n58
> > However I can't seem to be able to trace down the definition of bit 14
> > from 0x4001 - it's reserved in all the manuals I see. I have a hunch
> > that it selects the format of the Config_Reg base page between
> > 1000Base-X and SGMII.
>
> It could be that is what bit 14 is being used for, or it could just be
> that they've taken the values from the appropriate specs, and the
> hardware behaves the same way irrespective of whether it is in SGMII
> or 1000BASE-X mode.
>

Mystery solved - indeed it appears that there is no hardware logic for
propagating the AN results from the PCS to the MAC. Hence there is no
hardware distinction between 1000Base-X and SGMII. That must be
handled in PHYLINK.

> > > +static int gfar_mac_link_state(struct phylink_config *config,
> > > +                              struct phylink_link_state *state)
> > > +{
> > > +       if (state->interface == PHY_INTERFACE_MODE_SGMII ||
> > > +           state->interface == PHY_INTERFACE_MODE_1000BASEX) {
> >
> > What if you reduce the indentation level by 1 here, by just exiting if
> > the interface mode is not SGMII?
>
> It's only called for in-band negotiation modes anyway, so the above
> protection probably doesn't gain much.
>

Or that.

> > > +               struct gfar_private *priv =
> > > +                       netdev_priv(to_net_dev(config->dev));
> > > +               u16 tbi_cr;
> > > +
> > > +               if (!priv->tbi_phy)
> > > +                       return -ENODEV;
> > > +
> > > +               tbi_cr = phy_read(priv->tbi_phy, MII_TBI_CR);
> > > +
> > > +               state->duplex = !!(tbi_cr & TBI_CR_FULL_DUPLEX);
> >
> > Woah there. Aren't you supposed to first ensure state->an_complete is
> > ok, based on TBI_MII_Register_Set_SR[AN_Done]? There's also a
> > Link_Status bit in that register that you could retrieve.
>
> Indeed.
>
> > > +               if ((tbi_cr & TBI_CR_SPEED_1000_MASK) == TBI_CR_SPEED_1000_MASK)
> > > +                       state->speed = SPEED_1000;
> > > +       }
> >
> > See the Speed_Bit table from TBI_MII_Register_Set_ANLPBPA_SGMII for
> > the link partner (aka SGMII PHY) advertisement. You have to do a
> > logical-and between that and your own. Also please make sure you
> > really are in SGMII AN and not 1000 Base-X when interpreting registers
> > 4 & 5 as one way or another.
>
> From what you've said above, yes, this needs to do exactly that.
>
> > > -static noinline void gfar_update_link_state(struct gfar_private *priv)
> > > +static void gfar_mac_config(struct phylink_config *config, unsigned int mode,
> > > +                           const struct phylink_link_state *state)
> > >  {
> > > +       struct gfar_private *priv = netdev_priv(to_net_dev(config->dev));
> > >         struct gfar __iomem *regs = priv->gfargrp[0].regs;
> > > -       struct net_device *ndev = priv->ndev;
> > > -       struct phy_device *phydev = ndev->phydev;
> > > -       struct gfar_priv_rx_q *rx_queue = NULL;
> > > -       int i;
> > > +       u32 maccfg1, new_maccfg1;
> > > +       u32 maccfg2, new_maccfg2;
> > > +       u32 ecntrl, new_ecntrl;
> > > +       u32 tx_flow, new_tx_flow;
> >
> > Don't introduce new_ variables. Is there any issue if you
> > unconditionally write to the MAC registers?
>
> We do this in every driver, as mac_config() can be called with only a
> small number of changes in the settings, and it is important not to
> upset the MAC for minor updates.
>
> An example of this is when we are in SGMII mode with an attached PHY.
> SGMII will communicate the speed and duplex, but not the results of
> the pause negotiation.  We read that from the attached PHY and report
> it back by calling mac_config() - but at that point, we don't want to
> cause the established link to bounce.
>
> So, mac_config() should be implemented to avoid upsetting an already
> established link where possible (unless the configuration items that
> affect the link have changed.)
>

Ok.

> > > +static void gfar_mac_an_restart(struct phylink_config *config)
> > > +{
> > > +       /* Not supported */
> > > +}
> >
> > What about running gfar_configure_serdes again?
>
> The intention here is to cause the 802.3z link to renegotiate...
>

Yes, gfar_configure_serdes does this:

    phy_write(tbiphy, MII_BMCR,
          BMCR_ANENABLE | BMCR_ANRESTART | BMCR_FULLDPLX |
          BMCR_SPEED1000);

> >
> > > +
> > > +static void gfar_mac_link_down(struct phylink_config *config, unsigned int mode,
> > > +                              phy_interface_t interface)
> > > +{
> > > +       /* Not supported */
> > > +}
> > > +
> >
> > What about disabling RX_EN and TX_EN from MACCFG1?
> >
> > > +static void gfar_mac_link_up(struct phylink_config *config, unsigned int mode,
> > > +                            phy_interface_t interface, struct phy_device *phy)
> > > +{
> > > +       /* Not supported */
> > >  }
> > >
> >
> > What about enabling RX_EN and TX_EN from MACCFG1?
>
> Note that both of these functions must still allow the link to be
> established if we are using in-band negotiation - but they are
> expected to start/stop the transmission of packets.
>

I don't think AN pages count as Ethernet frames, and the RX_EN and
TX_EN bits are MAC settings anyway - it is the PCS below it that would
process them.

> > > @@ -149,8 +148,13 @@ extern const char gfar_driver_version[];
> > >  #define GFAR_SUPPORTED_GBIT SUPPORTED_1000baseT_Full
> > >
> > >  /* TBI register addresses */
> > > +#define MII_TBI_CR             0x00
> > >  #define MII_TBICON             0x11
> > >
> > > +/* TBI_CR register bit fields */
> > > +#define TBI_CR_FULL_DUPLEX     0x0100
> > > +#define TBI_CR_SPEED_1000_MASK 0x0040
> > > +
> >
> > I think BIT() definitions are preferred, even if that means you have
> > to convert existing code first.
>
> If MII_TBI_CR is the BMCR of the PCS, how about using the definitions
> that we already have in the kernel:
>
> BMCR_SPEED1000 is TBI_CR_SPEED_1000_MASK
> BMCR_FULLDPLX is TBI_CR_FULL_DUPLEX
>
> ?

Or that, yes.

>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up

Thanks,
-Vladimir

^ permalink raw reply

* Re: [PATCH 14/16] include/linux: prefer __section from compiler_attributes.h
From: Miguel Ojeda @ 2019-08-24 12:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nick Desaulniers, Andrew Morton, Sedat Dilek, Josh Poimboeuf,
	Yonghong Song, clang-built-linux, Luc Van Oostenryck,
	Lai Jiangshan, Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra (Intel), Nicholas Piggin, Jiri Kosina,
	Ard Biesheuvel, Michael Ellerman, Masahiro Yamada,
	Hans Liljestrand, Elena Reshetova, David Windsor, Marc Zyngier,
	Ming Lei, Dou Liyang, Julien Thierry, Mauro Carvalho Chehab,
	Jens Axboe, linux-kernel, linux-sparse, rcu, Network Development,
	bpf
In-Reply-To: <20190813083257.nnsxf5khnqagl46s@willie-the-truck>

On Tue, Aug 13, 2019 at 10:33 AM Will Deacon <will@kernel.org> wrote:
>
> -ENOCOMMITMESSAGE
>
> Otherwise, patch looks good to me.

Do you want Ack, Review or nothing?

Cheers,
Miguel

^ permalink raw reply

* Re: [PATCH net-next v3 2/3] net: ethernet: mediatek: Re-add support SGMII
From: René van Dorst @ 2019-08-24 13:11 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Frank Wunderlich, Stefan Roese
In-Reply-To: <20190824092156.GD13294@shell.armlinux.org.uk>

Hi Russell,

Quoting Russell King - ARM Linux admin <linux@armlinux.org.uk>:

> On Fri, Aug 23, 2019 at 03:45:15PM +0200, René van Dorst wrote:
>> +	switch (state->interface) {
>> +	case PHY_INTERFACE_MODE_SGMII:
>> +		phylink_set(mask, 10baseT_Half);
>> +		phylink_set(mask, 10baseT_Full);
>> +		phylink_set(mask, 100baseT_Half);
>> +		phylink_set(mask, 100baseT_Full);
>
> You also want 1000baseX_Full here - the connected PHY could have a fiber
> interface on it.

Ok, I shall add that mode too.

>
>> +		/* fall through */
>> +	case PHY_INTERFACE_MODE_TRGMII:
>>  		phylink_set(mask, 1000baseT_Full);
>
> I don't know enough about this interface type to comment whether it
> should support 1000baseX_Full - if this is connected to a PHY that may
> support fiber, then it ought to set it.

Mediatek calls it Turbo RGMII. It is a overclock version of RGMII mode.
It is used between first GMAC and port 6 of the mt7530 switch. Can be  
used with
an internal and an external mt7530 switch.

TRGMII speed are:
* mt7621: 1200Mbit
* mt7623: 2000Mbit and 2600Mbit.

I think that TRGMII is only used in a fixed-link situation in  
combination with a
mt7530 switch and running and maximum speed/full duplex. So reporting
1000baseT_Full seems to me the right option.

>
>> +		break;
>> +	case PHY_INTERFACE_MODE_2500BASEX:
>> +		phylink_set(mask, 2500baseX_Full);
>> +		/* fall through */
>> +	case PHY_INTERFACE_MODE_1000BASEX:
>> +		phylink_set(mask, 1000baseX_Full);
>
> Both should be set.  The reasoning here is that if you have a
> Fiberchannel 4Gbaud SFP plugged in and connected directly to the
> MAC, it can operate at either 2500Base-X or 1000Base-X.  If we
> decide to operate at 2500Base-X, then PHY_INTERFACE_MODE_2500BASEX
> will be chosen.  Otherwise, PHY_INTERFACE_MODE_1000BASEX will be
> used.
>
> The user can use ethtool to control which interface mode is used
> by adjusting the advertise mask and/or placing the interface in
> manual mode and setting the speed directly.  This will change
> the PHY_INTERFACE_MODE_xxxxBASEX (via phylink_helper_basex_speed())
> between the two settings.
>
> If we lose 2500baseX_Full when 1000Base-X is selected, the user
> will not be able to go back to 2500Base-X mode.
>
> Yes, it's a little confusing and has slightly different rules
> from the other modes - partly due to phylink_helper_basex_speed().
> These are the only interface modes that we dynamically switch
> between depending on the settings that the user configures via
> ethtool.

Thanks for this extra information.


I made a list for each mode what that mode should report back when chosen.

PHY_INTERFACE_MODE_SGMII:
	  10baseT_Half
	  10baseT_Full
	 100baseT_Half
	 100baseT_Full
	1000baseT_Full
	1000baseX_Full

PHY_INTERFACE_MODE_1000BASEX:
PHY_INTERFACE_MODE_2500BASEX:
	1000baseX_Full
	2500baseX_Full

PHY_INTERFACE_MODE_TRGMII:
	1000baseT_Full

PHY_INTERFACE_MODE_RGMII:
PHY_INTERFACE_MODE_RGMII_ID:
PHY_INTERFACE_MODE_RGMII_RXID:
PHY_INTERFACE_MODE_RGMII_TXID:
	  10baseT_Half
	  10baseT_Full
	 100baseT_Half
	 100baseT_Full
	1000baseT_Half
	1000baseT_Full
	1000baseX_Full

PHY_INTERFACE_MODE_GMII:
	  10baseT_Half
	  10baseT_Full
	 100baseT_Half
	 100baseT_Full
	1000baseT_Half
	1000baseT_Full

PHY_INTERFACE_MODE_MII:
PHY_INTERFACE_MODE_RMII:
PHY_INTERFACE_MODE_REVMII:
	 10baseT_Half
	 10baseT_Full
	100baseT_Half
	100baseT_Full

case PHY_INTERFACE_MODE_NA:
	  10baseT_Half
	  10baseT_Full
	 100baseT_Half
	 100baseT_Full
	1000baseT_Half
	1000baseT_Full
	1000baseX_Full
	2500baseX_Full

I think this is the full list.
Or do I miss something?

Greats,

René

>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up




^ permalink raw reply

* RE: [RFC 4/4] net: cdc_ncm: Add ACPI MAC address pass through functionality
From: Mario.Limonciello @ 2019-08-24 13:26 UTC (permalink / raw)
  To: bjorn, Charles.Hyde
  Cc: linux-usb, linux-acpi, gregkh, oliver, netdev, nic_swsd
In-Reply-To: <877e722691.fsf@miraculix.mork.no>

> -----Original Message-----
> From: Bjørn Mork <bjorn@mork.no>
> Sent: Saturday, August 24, 2019 5:44 AM
> To: Hyde, Charles - Dell Team
> Cc: linux-usb@vger.kernel.org; linux-acpi@vger.kernel.org;
> gregkh@linuxfoundation.org; Limonciello, Mario; oliver@neukum.org;
> netdev@vger.kernel.org; nic_swsd@realtek.com
> Subject: Re: [RFC 4/4] net: cdc_ncm: Add ACPI MAC address pass through
> functionality
> 
> 
> [EXTERNAL EMAIL]
> 
> <Charles.Hyde@dellteam.com> writes:
> 
> > @@ -930,11 +931,18 @@ int cdc_ncm_bind_common(struct usbnet *dev,
> struct usb_interface *intf, u8 data_
> >  	usb_set_intfdata(ctx->control, dev);
> >
> >  	if (ctx->ether_desc) {
> > +		struct sockaddr sa;
> > +
> >  		temp = usbnet_get_ethernet_addr(dev, ctx->ether_desc-
> >iMACAddress);
> >  		if (temp) {
> >  			dev_dbg(&intf->dev, "failed to get mac address\n");
> >  			goto error2;
> >  		}
> > +		if (get_acpi_mac_passthru(&intf->dev, &sa) == 0) {
> > +			memcpy(dev->net->dev_addr, sa.sa_data, ETH_ALEN);
> > +			if (usbnet_set_ethernet_addr(dev) < 0)
> > +				usbnet_get_ethernet_addr(dev, ctx-
> >ether_desc->iMACAddress);
> > +		}
> >  		dev_info(&intf->dev, "MAC-Address: %pM\n", dev->net-
> >dev_addr);
> >  	}
> 
> So you want to run a Dell specific ACPI method every time anyone plugs some
> NCM class device into a host supporing ACPI?  That's going to annoy the hell out
> of 99.9997% of the x86, ia64 and arm64 users of this driver.
> 
> Call ACPI once when the driver loads, and only if running on an actual Dell
> system where this method is supported.  There must be some ACPI device ID you
> can match on to know if this method is supported or not?
> 
> 
> Bjørn

I have to agree - this is missing an identifying factor of the D6000.  It shouldn't be
running on "just any" cdc_ncm device.

The code that is in get_acpi_mac_passthrough checks for a properly built ACPI method
though.

^ permalink raw reply

* Re: [PATCH net-next v3 2/3] net: ethernet: mediatek: Re-add support SGMII
From: Russell King - ARM Linux admin @ 2019-08-24 13:32 UTC (permalink / raw)
  To: René van Dorst
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Frank Wunderlich, Stefan Roese
In-Reply-To: <20190824131117.Horde.vSCF_CQ5jCMHcSTWkh7Woxm@www.vdorst.com>

Hi René,

On Sat, Aug 24, 2019 at 01:11:17PM +0000, René van Dorst wrote:
> Hi Russell,
> 
> Mediatek calls it Turbo RGMII. It is a overclock version of RGMII mode.
> It is used between first GMAC and port 6 of the mt7530 switch. Can be used
> with
> an internal and an external mt7530 switch.
> 
> TRGMII speed are:
> * mt7621: 1200Mbit
> * mt7623: 2000Mbit and 2600Mbit.
> 
> I think that TRGMII is only used in a fixed-link situation in combination
> with a
> mt7530 switch and running and maximum speed/full duplex. So reporting
> 1000baseT_Full seems to me the right option.

I think we can ignore this one for the purposes of merging this patch
set, since this seems to be specific to this setup.  Neither 1000BaseT
nor 1000BaseX fit very well, but we have to choose something.

> PHY_INTERFACE_MODE_GMII:
> 	  10baseT_Half
> 	  10baseT_Full
> 	 100baseT_Half
> 	 100baseT_Full
> 	1000baseT_Half
> 	1000baseT_Full

I think GMII can be connected to a PHY that can convert to 1000BaseX, so
should probably include that here too.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH net-next v3 2/8] MIPS: dts: mscc: describe the PTP register  range
From: Paul Burton @ 2019-08-24 14:18 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem@davemloft.net, richardcochran@gmail.com,
	alexandre.belloni@bootlin.com, UNGLinuxDriver@microchip.com,
	ralf@linux-mips.org, Paul Burton, jhogan@kernel.org,
	Antoine Tenart, netdev@vger.kernel.org,
	linux-mips@vger.kernel.org, thomas.petazzoni@bootlin.com,
	allan.nielsen@microchip.com, linux-mips@vger.kernel.org
In-Reply-To: <20190724081715.29159-3-antoine.tenart@bootlin.com>

Hello,

Antoine Tenart wrote:
> This patch adds one register range within the mscc,vsc7514-switch node,
> to describe the PTP registers.

Applied to mips-next.

> commit 048dc3abe827
> https://git.kernel.org/mips/c/048dc3abe827
> 
> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
> Signed-off-by: Paul Burton <paul.burton@mips.com>

Thanks,
    Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.burton@mips.com to report it. ]

^ permalink raw reply

* Re: [PATCH net-next v3 4/8] MIPS: dts: mscc: describe the PTP ready interrupt
From: Paul Burton @ 2019-08-24 14:18 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem@davemloft.net, richardcochran@gmail.com,
	alexandre.belloni@bootlin.com, UNGLinuxDriver@microchip.com,
	ralf@linux-mips.org, Paul Burton, jhogan@kernel.org,
	Antoine Tenart, netdev@vger.kernel.org,
	linux-mips@vger.kernel.org, thomas.petazzoni@bootlin.com,
	allan.nielsen@microchip.com, linux-mips@vger.kernel.org
In-Reply-To: <20190724081715.29159-5-antoine.tenart@bootlin.com>

Hello,

Antoine Tenart wrote:
> This patch adds a description of the PTP ready interrupt, which can be
> triggered when a PTP timestamp is available on an hardware FIFO.

Applied to mips-next.

> commit b4742e6682d5
> https://git.kernel.org/mips/c/b4742e6682d5
> 
> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
> Signed-off-by: Paul Burton <paul.burton@mips.com>

Thanks,
    Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.burton@mips.com to report it. ]

^ permalink raw reply

* Re: [PATCH] igb/igc: Don't warn on fatal read failures when the device is removed
From: Neftin, Sasha @ 2019-08-24 14:40 UTC (permalink / raw)
  To: Lyude Paul, intel-wired-lan, netdev
  Cc: Feng Tang, Jeff Kirsher, David S. Miller, linux-kernel
In-Reply-To: <20190822183318.27634-1-lyude@redhat.com>

On 8/22/2019 21:33, Lyude Paul wrote:
> Fatal read errors are worth warning about, unless of course the device
> was just unplugged from the machine - something that's a rather normal
> occurence when the igb/igc adapter is located on a Thunderbolt dock. So,
> let's only WARN() if there's a fatal read error while the device is
> still present.
> 
> This fixes the following WARN splat that's been appearing whenever I
> unplug my Caldigit TS3 Thunderbolt dock from my laptop:
> 
>    igb 0000:09:00.0 enp9s0: PCIe link lost
>    ------------[ cut here ]------------
>    igb: Failed to read reg 0x18!
>    WARNING: CPU: 7 PID: 516 at
>    drivers/net/ethernet/intel/igb/igb_main.c:756 igb_rd32+0x57/0x6a [igb]
>    Modules linked in: igb dca thunderbolt fuse vfat fat elan_i2c mei_wdt
>    mei_hdcp i915 wmi_bmof intel_wmi_thunderbolt iTCO_wdt
>    iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp joydev
>    coretemp crct10dif_pclmul crc32_pclmul i2c_algo_bit ghash_clmulni_intel
>    intel_cstate drm_kms_helper intel_uncore syscopyarea sysfillrect
>    sysimgblt fb_sys_fops intel_rapl_perf intel_xhci_usb_role_switch mei_me
>    drm roles idma64 i2c_i801 ucsi_acpi typec_ucsi mei intel_lpss_pci
>    processor_thermal_device typec intel_pch_thermal intel_soc_dts_iosf
>    intel_lpss int3403_thermal thinkpad_acpi wmi int340x_thermal_zone
>    ledtrig_audio int3400_thermal acpi_thermal_rel acpi_pad video
>    pcc_cpufreq ip_tables serio_raw nvme nvme_core crc32c_intel uas
>    usb_storage e1000e i2c_dev
>    CPU: 7 PID: 516 Comm: kworker/u16:3 Not tainted 5.2.0-rc1Lyude-Test+ #14
>    Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
>    Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>    RIP: 0010:igb_rd32+0x57/0x6a [igb]
>    Code: 87 b8 fc ff ff 48 c7 47 08 00 00 00 00 48 c7 c6 33 42 9b c0 4c 89
>    c7 e8 47 45 cd dc 89 ee 48 c7 c7 43 42 9b c0 e8 c1 94 71 dc <0f> 0b eb
>    08 8b 00 ff c0 75 b0 eb c8 44 89 e0 5d 41 5c c3 0f 1f 44
>    RSP: 0018:ffffba5801cf7c48 EFLAGS: 00010286
>    RAX: 0000000000000000 RBX: ffff9e7956608840 RCX: 0000000000000007
>    RDX: 0000000000000000 RSI: ffffba5801cf7b24 RDI: ffff9e795e3d6a00
>    RBP: 0000000000000018 R08: 000000009dec4a01 R09: ffffffff9e61018f
>    R10: 0000000000000000 R11: ffffba5801cf7ae5 R12: 00000000ffffffff
>    R13: ffff9e7956608840 R14: ffff9e795a6f10b0 R15: 0000000000000000
>    FS:  0000000000000000(0000) GS:ffff9e795e3c0000(0000) knlGS:0000000000000000
>    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    CR2: 0000564317bc4088 CR3: 000000010e00a006 CR4: 00000000003606e0
>    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>    Call Trace:
>     igb_release_hw_control+0x1a/0x30 [igb]
>     igb_remove+0xc5/0x14b [igb]
>     pci_device_remove+0x3b/0x93
>     device_release_driver_internal+0xd7/0x17e
>     pci_stop_bus_device+0x36/0x75
>     pci_stop_bus_device+0x66/0x75
>     pci_stop_bus_device+0x66/0x75
>     pci_stop_and_remove_bus_device+0xf/0x19
>     trim_stale_devices+0xc5/0x13a
>     ? __pm_runtime_resume+0x6e/0x7b
>     trim_stale_devices+0x103/0x13a
>     ? __pm_runtime_resume+0x6e/0x7b
>     trim_stale_devices+0x103/0x13a
>     acpiphp_check_bridge+0xd8/0xf5
>     acpiphp_hotplug_notify+0xf7/0x14b
>     ? acpiphp_check_bridge+0xf5/0xf5
>     acpi_device_hotplug+0x357/0x3b5
>     acpi_hotplug_work_fn+0x1a/0x23
>     process_one_work+0x1a7/0x296
>     worker_thread+0x1a8/0x24c
>     ? process_scheduled_works+0x2c/0x2c
>     kthread+0xe9/0xee
>     ? kthread_destroy_worker+0x41/0x41
>     ret_from_fork+0x35/0x40
>    ---[ end trace 252bf10352c63d22 ]---
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: 47e16692b26b ("igb/igc: warn when fatal read failure happens")
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Sasha Neftin <sasha.neftin@intel.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org
> ---
>   drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
>   drivers/net/ethernet/intel/igc/igc_main.c | 3 ++-
>   2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index e5b7e638df28..1a7f7cd28df9 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -753,7 +753,8 @@ u32 igb_rd32(struct e1000_hw *hw, u32 reg)
>   		struct net_device *netdev = igb->netdev;
>   		hw->hw_addr = NULL;
>   		netdev_err(netdev, "PCIe link lost\n");
> -		WARN(1, "igb: Failed to read reg 0x%x!\n", reg);
> +		WARN(pci_device_is_present(igb->pdev),
> +		     "igb: Failed to read reg 0x%x!\n", reg);
>   	}
>   
>   	return value;
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 28072b9aa932..f873a4b35eaf 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -3934,7 +3934,8 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
>   		hw->hw_addr = NULL;
>   		netif_device_detach(netdev);
>   		netdev_err(netdev, "PCIe link lost, device now detached\n");
> -		WARN(1, "igc: Failed to read reg 0x%x!\n", reg);
> +		WARN(pci_device_is_present(igc->pdev),
> +		     "igc: Failed to read reg 0x%x!\n", reg);
>   	}
>   
>   	return value;
> 
Thanks, for igc
Acked-by: Sasha Neftin <sasha.neftin@intel.com>

^ permalink raw reply

* Re: [PATCH net-next v2 03/10] net: sched: refactor block offloads counter usage
From: Vlad Buslov @ 2019-08-24 14:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vlad Buslov, netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	pablo@netfilter.org
In-Reply-To: <20190823172648.7777e2b6@cakuba.netronome.com>


On Sat 24 Aug 2019 at 03:26, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Fri, 23 Aug 2019 21:50:49 +0300, Vlad Buslov wrote:
>> @@ -1201,14 +1199,11 @@ static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
>>  			cls_u32.knode.link_handle = ht->handle;
>>  	}
>>
>> -	err = cb(TC_SETUP_CLSU32, &cls_u32, cb_priv);
>> -	if (err) {
>> -		if (add && tc_skip_sw(n->flags))
>> -			return err;
>> -		return 0;
>> -	}
>> -
>> -	tc_cls_offload_cnt_update(block, &n->in_hw_count, &n->flags, add);
>> +	err = tc_setup_cb_reoffload(block, tp, add, cb, TC_SETUP_CLSU32,
>> +				    &cls_u32, cb_priv, &n->flags,
>> +				    &n->in_hw_count);
>> +	if (err && add && tc_skip_sw(n->flags))
>> +		return err;
>
> Could this be further simplified by adding something along the lines of:
>
> 	if (!add || !tc_skip_sw(*flags))
> 		err = 0;
>
> to tc_setup_cb_reoffload() ?

Indeed, all the users of tc_setup_cb_reoffload() have same error check
that can be moved into the function. I will refactor it in V3.

>
>>
>>  	return 0;
>>  }

^ permalink raw reply

* Re: [PATCH net-next 0/1] Add BASE-T1 PHY support
From: Heiner Kallweit @ 2019-08-24 15:03 UTC (permalink / raw)
  To: Christian Herber, Andrew Lunn
  Cc: davem@davemloft.net, Florian Fainelli, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <AM6PR0402MB3798C702793071E34A5659ED86A50@AM6PR0402MB3798.eurprd04.prod.outlook.com>

On 22.08.2019 09:18, Christian Herber wrote:
> On 21.08.2019 20:57, Andrew Lunn wrote:
>>
>>> The current patch set IMO is a little bit hacky. I'm not 100% happy
>>> with the implicit assumption that there can't be devices supporting
>>> T1 and classic BaseT modes or fiber modes.
>>
>>> Andrew: Do you have an opinion on that?
>>
>> Hi Heiner
>>
>> I would also like cleaner integration. I doubt here is anything in the
>> standard which says you cannot combine these modes. It is more a
>> marketing question if anybody would build such a device. Maybe not
>> directly into a vehicle, but you could imaging a mobile test device
>> which uses T1 to talk to the car and T4 to connect to the garage
>> network?
>>
>> So i don't think we should limit ourselves. phylib should provide a
>> clean, simple set of helpers to perform standard operations for
>> various modes. Drivers can make use of those helpers. That much should
>> be clear. If we try to make genphy support them all simultaneously, is
>> less clear.
>>
>>       Andrew
>>
> 
> If you want to go down this path, then i think we have to ask some more 
> questions. Clause 45 is a very scalable register scheme, it is not a 
> specific class of devices and will be extended and extended.
> 
> Currently, the phy-c45.c supports 10/100/1000/2500/5000/10000 Mbps 
> consumer/enterprise PHYs. This is also an implicit assumption. The 
> register set (e.g. on auto-neg) used for this will also only support 
> these modes and nothing more, as it is done scaling.
> 
> Currently not supported, but already present in IEEE 802.3:
> - MultiGBASE-T (25/40 Gbps) (see e.g. MultiGBASE-T AN control 1 register)
> - BASE-T1
> - 10BASE-T1
> - NGBASE-T1
> 
> And surely there are some on the way or already there that I am not 
> aware of.
> 
> To me, one architectural decision point is if you want to have generic 
> support for all C45 PHYs in one file, or if you want to split it by 
> device class. I went down the first path with my patch, as this is the 
> road gone also with the existing code.
> 
> If you want to split BASE-T1, i think you will need one basic C45 
> library (genphy_c45_pma_read_abilities() is a good example of a function 
> that is not specific to a device class). On the other hand, 
> genphy_c45_pma_setup_forced() is not a generic function at this point as 
> it supports only a subset of devices managed in C45.
> 
> I tend to agree with you that splitting is the best way to go in the 
> long run, but that also requires a split of the existing phy-c45.c into 
> two IMHO.
> 
BASE-T1 seems to be based on Clause 45 (at least Clause 45 MDIO),
but it's not fully compliant with Clause 45. Taking AN link status
as an example: 45.2.7.2.7 states that link-up is signaled in bit 7.1.2.
If BASE-T1 uses a different register, then it's not fully Clause 45
compatible.
Therefore also my question for the datasheet of an actual BASE-T1 PHY,
as I would be curious whether it shadows the link-up bit from 7.513.2
to 7.1.2 to be Clause 45 compliant. Definitely reading bit 7.513.2
is nothing that belongs into a genphy_c45_ function.

The extension to genphy_c45_pma_read_abilities() looks good to me,
for the other parts I'd like to see first how real world BASE-T1 PHYs
handle it. If they shadow the T1-specific bits to the Clause 45
standard ones, we should be fine. Otherwise IMO we have to add
separate T1 functions to phylib.

Heiner















^ permalink raw reply

* Re: [RFC PATCH 1/2] gianfar: convert to phylink
From: Vladimir Oltean @ 2019-08-24 15:21 UTC (permalink / raw)
  To: Arseny Solokha
  Cc: Claudiu Manoil, Russell King, Ioana Ciornei, Andrew Lunn, netdev,
	Florian Fainelli
In-Reply-To: <87lfwfio13.fsf@eldim>

Hi Arseny.

On Tue, 30 Jul 2019 at 17:40, Arseny Solokha <asolokha@kb.kras.ru> wrote:
>
> > Hi Arseny,
> >
> > Nice project!
>
> Vladimir, Russell, thanks for your review. I'm on vacation now, so won't fully
> address your comments in a few weeks: while I can build the code, I won't have
> access to hardware to test.
>
> So it seems this patch will turn into a series where we'll have some cleanup
> patches preceding the actual conversion (and the latter will also contain a
> documentation change in Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> which I've overlooked in the first submission). I'll try to post trivial
> cleanups independently while still on vacation.
>

Yes, ideally the cleanup would be separate from the conversion.

>
> >> @@ -891,11 +912,21 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
> >>
> >>         err = of_property_read_string(np, "phy-connection-type", &ctype);
> >>
> >> -       /* We only care about rgmii-id.  The rest are autodetected */
> >> -       if (err == 0 && !strcmp(ctype, "rgmii-id"))
> >> -               priv->interface = PHY_INTERFACE_MODE_RGMII_ID;
> >> -       else
> >> +       /* We only care about rgmii-id and sgmii - the former
> >> +        * is indistinguishable from rgmii in hardware, and phylink needs
> >> +        * the latter to be set appropriately for correct phy configuration.
> >> +        * The rest are autodetected
> >> +        */
> >> +       if (err == 0) {
> >> +               if (!strcmp(ctype, "rgmii-id"))
> >> +                       priv->interface = PHY_INTERFACE_MODE_RGMII_ID;
> >> +               else if (!strcmp(ctype, "sgmii"))
> >> +                       priv->interface = PHY_INTERFACE_MODE_SGMII;
> >> +               else
> >> +                       priv->interface = PHY_INTERFACE_MODE_MII;
> >> +       } else {
> >>                 priv->interface = PHY_INTERFACE_MODE_MII;
> >> +       }
> >>
> >
> > No. Don't do this. Just do:
> >
> >     err = of_get_phy_mode(np);
> >     if (err < 0)
> >         goto err_grp_init;
> >
> >     priv->interface = err;
> >
> >>         if (of_find_property(np, "fsl,magic-packet", NULL))
> >>                 priv->device_flags |= FSL_GIANFAR_DEV_HAS_MAGIC_PACKET;
> >> @@ -903,19 +934,21 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
> >>         if (of_get_property(np, "fsl,wake-on-filer", NULL))
> >>                 priv->device_flags |= FSL_GIANFAR_DEV_HAS_WAKE_ON_FILER;
> >>
> >> -       priv->phy_node = of_parse_phandle(np, "phy-handle", 0);
> >> +       priv->device_node = np;
> >> +       priv->speed = SPEED_UNKNOWN;
> >>
> >> -       /* In the case of a fixed PHY, the DT node associated
> >> -        * to the PHY is the Ethernet MAC DT node.
> >> -        */
> >> -       if (!priv->phy_node && of_phy_is_fixed_link(np)) {
> >> -               err = of_phy_register_fixed_link(np);
> >> -               if (err)
> >> -                       goto err_grp_init;
> >> +       priv->phylink_config.dev = &priv->ndev->dev;
> >> +       priv->phylink_config.type = PHYLINK_NETDEV;
> >>
> >> -               priv->phy_node = of_node_get(np);
> >> +       phylink = phylink_create(&priv->phylink_config, of_fwnode_handle(np),
> >> +                                priv->interface, &gfar_phylink_ops);
> >
> > You introduced a bug here.
> > of_phy_connect used to take the PHY interface type (for good or bad)
> > from gfar_get_interface() (which is reconstructing it from the MAC
> > registers).
> > You are now passing the PHY interface type to phylink_create from the
> > "phy-connection-type" DT property.
> > At the very least, you are breaking LS1021A which uses phy-mode
> > instead of phy-connection-type (hence my comment above to use the
> > generic OF helper).
> > Actually I think you just uncovered a latent bug, in that the DT
> > bindings for phy-mode didn't mean much at all to the driver - it would
> > rely on what the bootloader had set up.
> > Actually DT bindings for phy-connection-type were most likely simply
> > bolt on on top of gianfar when they figured they couldn't just
> > auto-detect the various species of required RGMII delays.
> > But gfar_get_interface is a piece of history that was introduced in
> > the same commit as the enum phy_interface_t itself: e8a2b6a42073
> > ("[PATCH] PHY: Add support for configuring the PHY connection
> > interface"). Its time has come.
>
> <…>
>
> >>         }
> >>
> >> +       priv->tbi_phy = NULL;
> >> +       interface = gfar_get_interface(dev);
> >
> > Be consistent and just go for priv->interface. Nobody's changing it anyway.
>
> So if I get you right, I'm supposed to drop gfar_get_interface() and rely on DT
> bindings entirely?
>

Oof, I checked arch/powerpc/boot/dts/fsl and the following boards
using the gianfar driver are guilty of populating phy-handle but not
phy-connection-type:
mpc8540ads
mpc8541cds
mpc8548cds_32b
mpc8548cds_36b
mpc8555cds
mpc8560ads
mpc8568mds
ppa8548

I think a sane logic for populating priv->interface would be:
- Attempt of_get_phy_mode
- If phy-mode or phy-connection-type properties are not found, revert
to gfar_get_interface for the legacy blobs above.

>
> >> @@ -3387,23 +3384,6 @@ static irqreturn_t gfar_interrupt(int irq, void *grp_id)
> >>         return IRQ_HANDLED;
> >>  }
> >>
> >> -/* Called every time the controller might need to be made
> >> - * aware of new link state.  The PHY code conveys this
> >> - * information through variables in the phydev structure, and this
> >> - * function converts those variables into the appropriate
> >> - * register values, and can bring down the device if needed.
> >> - */
> >> -static void adjust_link(struct net_device *dev)
> >> -{
> >> -       struct gfar_private *priv = netdev_priv(dev);
> >> -       struct phy_device *phydev = dev->phydev;
> >> -
> >> -       if (unlikely(phydev->link != priv->oldlink ||
> >> -                    (phydev->link && (phydev->duplex != priv->oldduplex ||
> >> -                                      phydev->speed != priv->oldspeed))))
> >> -               gfar_update_link_state(priv);
> >> -}
> >
> > Getting rid of the cruft from this function deserves its own patch.
>
> How am I supposed to remove it without breaking the PHYLIB-based driver? Or do
> you mean making it call gfar_update_link_state() just before the conversion
> which will then remove adjust_link() altogether?
>

I don't know, if you can't refactor without breaking anything then ok.

>
> >>
> >>         if (unlikely(test_bit(GFAR_RESETTING, &priv->state)))
> >>                 return;
> >>
> >> -       if (phydev->link) {
> >> -               u32 tempval1 = gfar_read(&regs->maccfg1);
> >> -               u32 tempval = gfar_read(&regs->maccfg2);
> >> -               u32 ecntrl = gfar_read(&regs->ecntrl);
> >> -               u32 tx_flow_oldval = (tempval1 & MACCFG1_TX_FLOW);
> >> +       if (unlikely(phylink_autoneg_inband(mode)))
> >> +               return;
> >>
> >> -               if (phydev->duplex != priv->oldduplex) {
> >> -                       if (!(phydev->duplex))
> >> -                               tempval &= ~(MACCFG2_FULL_DUPLEX);
> >> -                       else
> >> -                               tempval |= MACCFG2_FULL_DUPLEX;
> >> +       maccfg1 = gfar_read(&regs->maccfg1);
> >> +       maccfg2 = gfar_read(&regs->maccfg2);
> >> +       ecntrl = gfar_read(&regs->ecntrl);
> >>
> >> -                       priv->oldduplex = phydev->duplex;
> >> -               }
> >> +       new_maccfg2 = maccfg2 & ~(MACCFG2_FULL_DUPLEX | MACCFG2_IF);
> >> +       new_ecntrl = ecntrl & ~ECNTRL_R100;
> >>
> >> -               if (phydev->speed != priv->oldspeed) {
> >> -                       switch (phydev->speed) {
> >> -                       case 1000:
> >> -                               tempval =
> >> -                                   ((tempval & ~(MACCFG2_IF)) | MACCFG2_GMII);
> >> +       if (state->duplex)
> >> +               new_maccfg2 |= MACCFG2_FULL_DUPLEX;
> >>
> >> -                               ecntrl &= ~(ECNTRL_R100);
> >> -                               break;
> >> -                       case 100:
> >> -                       case 10:
> >> -                               tempval =
> >> -                                   ((tempval & ~(MACCFG2_IF)) | MACCFG2_MII);
> >> -
> >> -                               /* Reduced mode distinguishes
> >> -                                * between 10 and 100
> >> -                                */
> >> -                               if (phydev->speed == SPEED_100)
> >> -                                       ecntrl |= ECNTRL_R100;
> >> -                               else
> >> -                                       ecntrl &= ~(ECNTRL_R100);
> >> -                               break;
> >> -                       default:
> >> -                               netif_warn(priv, link, priv->ndev,
> >> -                                          "Ack!  Speed (%d) is not 10/100/1000!\n",
> >> -                                          phydev->speed);
> >
> > Please 1. remove "Ack!" 2. treat SPEED_UNKNOWN here by setting the MAC
> > into a low-power state (e.g. 10 Mbps - the power savings are real).
> > Don't print that Speed -1 is not 10/100/1000, we know that.
>
> In my first conversion attempt I see "Ack!" when changing link speed on when
> shutting it down, so switching to 10 Mbps doesn't seem right for me—hence early
> return in this case. Maybe I'm doing something wrong here.
>

When mac_config calls with SPEED_UNKNOWN, the link is down, and you
can put the MAC in the lowest energy state it can go to (10 Mbps, in
this case). Or so I've been told. Maybe Russell can chime in. Anyway,
you don't need to print anything, there's lots of prints from PHYLINK
already.

>
> >> diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> >> index 3433b46b90c1..146b30d07789 100644
> >> --- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
> >> +++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> >> @@ -35,7 +35,7 @@
> >>  #include <asm/types.h>
> >>  #include <linux/ethtool.h>
> >>  #include <linux/mii.h>
> >> -#include <linux/phy.h>
> >> +#include <linux/phylink.h>
> >>  #include <linux/sort.h>
> >>  #include <linux/if_vlan.h>
> >>  #include <linux/of_platform.h>
> >> @@ -207,12 +207,10 @@ static void gfar_get_regs(struct net_device *dev, struct ethtool_regs *regs,
> >>  static unsigned int gfar_usecs2ticks(struct gfar_private *priv,
> >>                                      unsigned int usecs)
> >>  {
> >> -       struct net_device *ndev = priv->ndev;
> >> -       struct phy_device *phydev = ndev->phydev;
> >
> > Are you sure this still works? You missed a ndev->phydev check from
> > gfar_gcoalesce, where this is called from. Technically you can still
> > check ndev->phydev, it's just that PHYLINK doesn't guarantee you'll
> > have one (e.g. fixed-link interfaces).
>
> It still works for RGMII PHYs, SGMII and 1000Base-X in my testing. I didn't
> check it with fixed-link, though.
>
>
> >> @@ -1519,6 +1472,24 @@ static int gfar_get_ts_info(struct net_device *dev,
> >>         return 0;
> >>  }
> >>
> >> +/* Set link ksettings (phy address, speed) for ethtools */
> >
> > ethtool, not ethtools. Also, I'm not quite sure what you mean by
> > setting the "phy address" with ethtool.
>
> Well, I know where I've copied it from… Thanks for pointing it out.

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Andrew Lunn @ 2019-08-24 15:24 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

On Sat, Aug 24, 2019 at 04:42:47AM +0200, Marek Behún wrote:
> Hi,
> this is my attempt to solve the multi-CPU port issue for DSA.
> 
> Patch 1 adds code for handling multiple CPU ports in a DSA switch tree.
> If more than one CPU port is found in a tree, the code assigns CPU ports
> to user/DSA ports in a round robin way. So for the simplest case where
> we have one switch with N ports, 2 of them of type CPU connected to eth0
> and eth1, and the other ports labels being lan1, lan2, ..., the code
> assigns them to CPU ports this way:
>   lan1 <-> eth0
>   lan2 <-> eth1
>   lan3 <-> eth0
>   lan4 <-> eth1
>   lan5 <-> eth0

Hi Marek

That is what i've always argued is a good default. So i'm happy with
this.

> Patch 2 adds a new operation to the net device operations structure.
> Currently we use the iflink property of a net device to report to which
> CPU port a given switch port si connected to. The ip link utility from
> iproute2 reports this as "lan1@eth0". We add a new net device operation,
> ndo_set_iflink, which can be used to set this property. We call this
> function from the netlink handlers.

That is a new idea. Interesting.

I would like to look around and see what else uses this "lan1@eth0"
concept. We need to ensure it is not counter intuitive in general,
when you consider all possible users.

> Patch 3 implements this new ndo_set_iflink operation for DSA slave
> device. Thus the userspace can request a change of CPU port of a given
> port.

So this is all about transmit from the host out the switch. What about
receive? How do you tell the switch which CPU interface it should use
for a port?

    Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Vladimir Oltean @ 2019-08-24 15:40 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David Ahern, Stephen Hemminger
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>

Hi Marek,

On Sat, 24 Aug 2019 at 05:43, Marek Behún <marek.behun@nic.cz> wrote:
>
> Hi,
> this is my attempt to solve the multi-CPU port issue for DSA.
>
> Patch 1 adds code for handling multiple CPU ports in a DSA switch tree.
> If more than one CPU port is found in a tree, the code assigns CPU ports
> to user/DSA ports in a round robin way. So for the simplest case where
> we have one switch with N ports, 2 of them of type CPU connected to eth0
> and eth1, and the other ports labels being lan1, lan2, ..., the code
> assigns them to CPU ports this way:
>   lan1 <-> eth0
>   lan2 <-> eth1
>   lan3 <-> eth0
>   lan4 <-> eth1
>   lan5 <-> eth0
>   ...
>
> Patch 2 adds a new operation to the net device operations structure.
> Currently we use the iflink property of a net device to report to which
> CPU port a given switch port si connected to. The ip link utility from
> iproute2 reports this as "lan1@eth0". We add a new net device operation,
> ndo_set_iflink, which can be used to set this property. We call this
> function from the netlink handlers.
>
> Patch 3 implements this new ndo_set_iflink operation for DSA slave
> device. Thus the userspace can request a change of CPU port of a given
> port.
>
> I am also sending patch for iproute2-next, to add support for setting
> this iflink value.
>
> Marek
>

The topic is interesting.
This changeset leaves the reader wanting to see a driver
implementation of .port_change_cpu_port. (mostly to understand what
your hardware is capable of)
Will DSA assume that all CPU ports are equal in terms of tagging
protocol abilities? There are switches where one of the CPU ports can
do tagging and the other can't.
Is the static assignment between slave and CPU ports going to be the
only use case? What about link aggregation? Flow steering perhaps?
And like Andrew pointed out, how do you handle the receive case? What
happens to flooded frames, will the switch send them to both CPU
interfaces, and get received twice in Linux? How do you prevent that?

> Marek Behún (3):
>   net: dsa: allow for multiple CPU ports
>   net: add ndo for setting the iflink property
>   net: dsa: implement ndo_set_netlink for chaning port's CPU port
>
>  include/linux/netdevice.h |  5 +++
>  include/net/dsa.h         | 11 ++++-
>  net/core/dev.c            | 15 +++++++
>  net/core/rtnetlink.c      |  7 ++++
>  net/dsa/dsa2.c            | 84 +++++++++++++++++++++++++--------------
>  net/dsa/slave.c           | 35 ++++++++++++++++
>  6 files changed, 126 insertions(+), 31 deletions(-)
>
> --
> 2.21.0
>

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH RFC net-next 1/3] net: dsa: allow for multiple CPU ports
From: Andrew Lunn @ 2019-08-24 15:43 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger
In-Reply-To: <20190824024251.4542-2-marek.behun@nic.cz>

> +static int dsa_tree_setup_default_cpus(struct dsa_switch_tree *dst)
>  {
>  	struct dsa_switch *ds;
>  	struct dsa_port *dp;
> -	int device, port;
> +	int device, port, i;
>  
> -	/* DSA currently only supports a single CPU port */
> -	dst->cpu_dp = dsa_tree_find_first_cpu(dst);
> -	if (!dst->cpu_dp) {
> +	dsa_tree_fill_cpu_ports(dst);
> +	if (!dst->num_cpu_dps) {
>  		pr_warn("Tree has no master device\n");
>  		return -EINVAL;
>  	}
>  
> -	/* Assign the default CPU port to all ports of the fabric */
> +	/* Assign the default CPU port to all ports of the fabric in a round
> +	 * robin way. This should work nicely for all sane switch tree designs.
> +	 */
> +	i = 0;
> +
>  	for (device = 0; device < DSA_MAX_SWITCHES; device++) {
>  		ds = dst->ds[device];
>  		if (!ds)
> @@ -238,18 +249,20 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst)
>  		for (port = 0; port < ds->num_ports; port++) {
>  			dp = &ds->ports[port];
>  
> -			if (dsa_port_is_user(dp) || dsa_port_is_dsa(dp))
> -				dp->cpu_dp = dst->cpu_dp;
> +			if (dsa_port_is_user(dp) || dsa_port_is_dsa(dp)) {
> +				dp->cpu_dp = dst->cpu_dps[i++];
> +				if (i == dst->num_cpu_dps)
> +					i = 0;
> +			}

Hi Marek

For a single switch, i think this is O.K, but when you have a cluster,
maybe a different allocation should be considered? If this switch has
a local CPU port, use it. Only round robing between remote CPU ports
when there is no local CPU port?

For a two switch setup and each switch having its own CPU port, your
allocation will cause half the CPU traffic to go across the DSA link
between the two switches. But we really want to keep the DSA link for
traffic between user ports on different switches.

But i don't know if it is worth the effort. I've never seen a D in DSA
setup with multiple CPUs ports. I've only ever seen an single switch
with multiple CPU ports.

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Vladimir Oltean @ 2019-08-24 15:44 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David Ahern, Stephen Hemminger
In-Reply-To: <CA+h21hpBKnueT0QrVDL=Hhcp9X0rnaPW8omxiegq4TkcQ18EVQ@mail.gmail.com>

On Sat, 24 Aug 2019 at 18:40, Vladimir Oltean <olteanv@gmail.com> wrote:
>
> Hi Marek,
>
> On Sat, 24 Aug 2019 at 05:43, Marek Behún <marek.behun@nic.cz> wrote:
> >
> > Hi,
> > this is my attempt to solve the multi-CPU port issue for DSA.
> >
> > Patch 1 adds code for handling multiple CPU ports in a DSA switch tree.
> > If more than one CPU port is found in a tree, the code assigns CPU ports
> > to user/DSA ports in a round robin way. So for the simplest case where
> > we have one switch with N ports, 2 of them of type CPU connected to eth0
> > and eth1, and the other ports labels being lan1, lan2, ..., the code
> > assigns them to CPU ports this way:
> >   lan1 <-> eth0
> >   lan2 <-> eth1
> >   lan3 <-> eth0
> >   lan4 <-> eth1
> >   lan5 <-> eth0
> >   ...
> >
> > Patch 2 adds a new operation to the net device operations structure.
> > Currently we use the iflink property of a net device to report to which
> > CPU port a given switch port si connected to. The ip link utility from
> > iproute2 reports this as "lan1@eth0". We add a new net device operation,
> > ndo_set_iflink, which can be used to set this property. We call this
> > function from the netlink handlers.
> >
> > Patch 3 implements this new ndo_set_iflink operation for DSA slave
> > device. Thus the userspace can request a change of CPU port of a given
> > port.
> >
> > I am also sending patch for iproute2-next, to add support for setting
> > this iflink value.
> >
> > Marek
> >
>
> The topic is interesting.
> This changeset leaves the reader wanting to see a driver
> implementation of .port_change_cpu_port. (mostly to understand what
> your hardware is capable of)
> Will DSA assume that all CPU ports are equal in terms of tagging
> protocol abilities? There are switches where one of the CPU ports can
> do tagging and the other can't.

Just to be clear. You can argue that such switches are weird, and
that's ok. Just want to understand the general type of hardware for
which such a patch is intended.

> Is the static assignment between slave and CPU ports going to be the
> only use case? What about link aggregation? Flow steering perhaps?
> And like Andrew pointed out, how do you handle the receive case? What
> happens to flooded frames, will the switch send them to both CPU
> interfaces, and get received twice in Linux? How do you prevent that?
>
> > Marek Behún (3):
> >   net: dsa: allow for multiple CPU ports
> >   net: add ndo for setting the iflink property
> >   net: dsa: implement ndo_set_netlink for chaning port's CPU port
> >
> >  include/linux/netdevice.h |  5 +++
> >  include/net/dsa.h         | 11 ++++-
> >  net/core/dev.c            | 15 +++++++
> >  net/core/rtnetlink.c      |  7 ++++
> >  net/dsa/dsa2.c            | 84 +++++++++++++++++++++++++--------------
> >  net/dsa/slave.c           | 35 ++++++++++++++++
> >  6 files changed, 126 insertions(+), 31 deletions(-)
> >
> > --
> > 2.21.0
> >
>
> Regards,
> -Vladimir

^ permalink raw reply

* Re: [PATCH RFC net-next 3/3] net: dsa: implement ndo_set_netlink for chaning port's CPU port
From: Andrew Lunn @ 2019-08-24 15:47 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger
In-Reply-To: <20190824024251.4542-4-marek.behun@nic.cz>

On Sat, Aug 24, 2019 at 04:42:50AM +0200, Marek Behún wrote:
> Implement ndo_set_iflink for DSA slave device. In multi-CPU port setup
> this should be used to change to which CPU destination port a given port
> should be connected.
> 
> This adds a new operation into the DSA switch operations structure,
> port_change_cpu_port. A driver implementing this function has the
> ability to change CPU destination port of a given port.
> 
> Signed-off-by: Marek Behún <marek.behun@nic.cz>
> ---
>  include/net/dsa.h |  6 ++++++
>  net/dsa/slave.c   | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 64bd70608f2f..4f3f0032b886 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -545,6 +545,12 @@ struct dsa_switch_ops {
>  	 */
>  	netdev_tx_t (*port_deferred_xmit)(struct dsa_switch *ds, int port,
>  					  struct sk_buff *skb);
> +
> +	/*
> +	 * Multi-CPU port support
> +	 */
> +	int	(*port_change_cpu_port)(struct dsa_switch *ds, int port,
> +					struct dsa_port *new_cpu_dp);
>  };

Hi Marek

We need to see an actual implementation of this. We don't add new APIs
without having a user.

	Andrew

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Andrew Lunn @ 2019-08-24 15:56 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Marek Behún, netdev, Vivien Didelot, Florian Fainelli,
	David Ahern, Stephen Hemminger
In-Reply-To: <CA+h21hpBKnueT0QrVDL=Hhcp9X0rnaPW8omxiegq4TkcQ18EVQ@mail.gmail.com>

> Will DSA assume that all CPU ports are equal in terms of tagging
> protocol abilities? There are switches where one of the CPU ports can
> do tagging and the other can't.

Hi Vladimir

Given the current definition of what a CPU port is, we have to assume
the port is using tags. Frames have to be directed out a specific
egress port, otherwise things like BPDU, PTP will break. You cannot
rely on MAC address learning.

> Is the static assignment between slave and CPU ports going to be the
> only use case? What about link aggregation? Flow steering perhaps?
> And like Andrew pointed out, how do you handle the receive case? What
> happens to flooded frames, will the switch send them to both CPU
> interfaces, and get received twice in Linux? How do you prevent that?

I expect bad things will happen if frames are flooded to multiple CPU
ports. For this to work, the whole switch design needs to support
multiple CPU ports. I doubt this will work on any old switch.

Having a host interface connected to a user port of the switch is a
completely different uses case, and not what this patchset is about.

	   Andrew

^ permalink raw reply

* [PATCH net 1/2] openvswitch: Properly set L4 keys on "later" IP fragments.
From: Justin Pettit @ 2019-08-24 16:58 UTC (permalink / raw)
  To: netdev, Pravin Shelar; +Cc: Joe Stringer

When IP fragments are reassembled before being sent to conntrack, the
key from the last fragment is used.  Unless there are reordering
issues, the last fragment received will not contain the L4 ports, so the
key for the reassembled datagram won't contain them.  This patch updates
the key once we have a reassembled datagram.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
---
 net/openvswitch/conntrack.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 848c6eb55064..f40ad2a42086 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -524,6 +524,10 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
 		return -EPFNOSUPPORT;
 	}

+	/* The key extracted from the fragment that completed this datagram
+	 * likely didn't have an L4 header, so regenerate it. */
+	ovs_flow_key_update(skb, key);
+
 	key->ip.frag = OVS_FRAG_TYPE_NONE;
 	skb_clear_hash(skb);
 	skb->ignore_df = 1;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net 2/2] openvswitch: Clear the L4 portion of the key for "later" fragments.
From: Justin Pettit @ 2019-08-24 16:58 UTC (permalink / raw)
  To: netdev, Pravin Shelar; +Cc: Joe Stringer
In-Reply-To: <20190824165846.79627-1-jpettit@ovn.org>

Only the first fragment in a datagram contains the L4 headers.  When the
Open vSwitch module parses a packet, it always sets the IP protocol
field in the key, but can only set the L4 fields on the first fragment.
The original behavior would not clear the L4 portion of the key, so
garbage values would be sent in the key for "later" fragments.  This
patch clears the L4 fields in that circumstance to prevent sending those
garbage values as part of the upcall.

Signed-off-by: Justin Pettit <jpettit@ovn.org>
---
 net/openvswitch/flow.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index bc89e16e0505..0fb2cec08523 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -623,6 +623,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		offset = nh->frag_off & htons(IP_OFFSET);
 		if (offset) {
 			key->ip.frag = OVS_FRAG_TYPE_LATER;
+			memset(&key->tp, 0, sizeof(key->tp));
 			return 0;
 		}
 		if (nh->frag_off & htons(IP_MF) ||
@@ -740,8 +741,10 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 			return error;
 		}

-		if (key->ip.frag == OVS_FRAG_TYPE_LATER)
+		if (key->ip.frag == OVS_FRAG_TYPE_LATER) {
+			memset(&key->tp, 0, sizeof(key->tp));
 			return 0;
+		}
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
 			key->ip.frag = OVS_FRAG_TYPE_FIRST;

-- 
2.17.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox