* Re: [PATCH] net/skbuff: silence warnings under memory pressure
From: Vlastimil Babka @ 2019-09-02 14:24 UTC (permalink / raw)
To: Qian Cai, Eric Dumazet, davem
Cc: netdev, linux-mm, linux-kernel, Michal Hocko
In-Reply-To: <1567178728.5576.32.camel@lca.pw>
On 8/30/19 5:25 PM, Qian Cai wrote:
> On Fri, 2019-08-30 at 17:11 +0200, Eric Dumazet wrote:
>>
>> On 8/30/19 4:57 PM, Qian Cai wrote:
>>> When running heavy memory pressure workloads, the system is throwing
>>> endless warnings below due to the allocation could fail from
>>> __build_skb(), and the volume of this call could be huge which may
>>> generate a lot of serial console output and cosumes all CPUs as
>>> warn_alloc() could be expensive by calling dump_stack() and then
>>> show_mem().
>>>
>>> Fix it by silencing the warning in this call site. Also, it seems
>>> unnecessary to even print a warning at all if the allocation failed in
>>> __build_skb(), as it may just retransmit the packet and retry.
>>>
Well, __GFP_NOWARN would save me from explaining this warning to users
many times. OTOH usually it's an indication that min_free_kbytes should
be raised to better cope with network traffic.
>>
>> Same patches are showing up there and there from time to time.
>>
>> Why is this particular spot interesting, against all others not adding
>> __GFP_NOWARN ?
This one is interesting that it's a GFP_ATOMIC allocation triggered by
incoming packets, and has a fallback mechanism. I don't recall other so
notoric ones.
>> Are we going to have hundred of patches adding __GFP_NOWARN at various points,
>> or should we get something generic to not flood the syslog in case of memory
>> pressure ?
>>
>
> From my testing which uses LTP oom* tests. There are only 3 places need to be
> patched. The other two are in IOMMU code for both Intel and AMD. The place is
> particular interesting because it could cause the system with floating serial
> console output for days without making progress in OOM. I suppose it ends up in
> a looping condition that warn_alloc() would end up generating more calls into
> __build_skb() via ksoftirqd.
Regardless of this particular allocation, if the reporting itself makes
the conditions so much worse, then at least some kind of general
ratelimit would make sense indeed.
^ permalink raw reply
* Re: [PATCH v5] perf machine: arm/arm64: Improve completeness for kernel address space
From: Leo Yan @ 2019-09-02 14:15 UTC (permalink / raw)
To: Adrian Hunter
Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Song Liu, Yonghong Song, linux-kernel, netdev,
bpf, clang-built-linux, Mathieu Poirier, Peter Zijlstra,
Suzuki Poulouse, coresight, linux-arm-kernel
In-Reply-To: <20190826125105.GA3288@leoy-ThinkPad-X240s>
Hi Adrian,
On Mon, Aug 26, 2019 at 08:51:05PM +0800, Leo Yan wrote:
> Hi Adrian,
>
> On Fri, Aug 16, 2019 at 04:00:02PM +0300, Adrian Hunter wrote:
> > On 16/08/19 4:45 AM, Leo Yan wrote:
> > > Hi Adrian,
> > >
> > > On Thu, Aug 15, 2019 at 02:45:57PM +0300, Adrian Hunter wrote:
> > >
> > > [...]
> > >
> > >>>> How come you cannot use kallsyms to get the information?
> > >>>
> > >>> Thanks for pointing out this. Sorry I skipped your comment "I don't
> > >>> know how you intend to calculate ARM_PRE_START_SIZE" when you reviewed
> > >>> the patch v3, I should use that chance to elaborate the detailed idea
> > >>> and so can get more feedback/guidance before procceed.
> > >>>
> > >>> Actually, I have considered to use kallsyms when worked on the previous
> > >>> patch set.
> > >>>
> > >>> As mentioned in patch set v4's cover letter, I tried to implement
> > >>> machine__create_extra_kernel_maps() for arm/arm64, the purpose is to
> > >>> parse kallsyms so can find more kernel maps and thus also can fixup
> > >>> the kernel start address. But I found the 'perf script' tool directly
> > >>> calls machine__get_kernel_start() instead of running into the flow for
> > >>> machine__create_extra_kernel_maps();
> > >>
> > >> Doesn't it just need to loop through each kernel map to find the lowest
> > >> start address?
> > >
> > > Based on your suggestion, I worked out below change and verified it
> > > can work well on arm64 for fixing up start address; please let me know
> > > if the change works for you?
> >
> > How does that work if take a perf.data file to a machine with a different
> > architecture?
>
> Sorry I delayed so long to respond to your question; I didn't have
> confidence to give out very reasonale answer and this is the main reason
> for delaying.
Could you take chance to review my below replying? I'd like to get
your confirmation before I send out offical patch.
Thanks,
Leo Yan
>
> For your question for taking a perf.data file to a machine with a
> different architecture, we can firstly use command 'perf buildid-list'
> to print out the buildid for kallsyms, based on the dumped buildid we
> can find out the location for the saved kallsyms file; then we can use
> option '--kallsyms' to specify the offline kallsyms file and use the
> offline kallsyms to fixup kernel start address. The detailed commands
> are listed as below:
>
> root@debian:~# perf buildid-list
> 7b36dfca8317ef74974ebd7ee5ec0a8b35c97640 [kernel.kallsyms]
> 56b84aa88a1bcfe222a97a53698b92723a3977ca /usr/lib/systemd/systemd
> 0956b952e9cd673d48ff2cfeb1a9dbd0c853e686 /usr/lib/aarch64-linux-gnu/libm-2.28.so
> [...]
>
> root@debian:~# perf script --kallsyms ~/.debug/\[kernel.kallsyms\]/7b36dfca8317ef74974ebd7ee5ec0a8b35c97640/kallsyms
>
> The amended patch is as below, please review and always welcome
> any suggestions or comments!
>
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 5734460fc89e..593f05cc453f 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2672,9 +2672,26 @@ int machine__nr_cpus_avail(struct machine *machine)
> return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
> }
>
> +static int machine__fixup_kernel_start(void *arg,
> + const char *name __maybe_unused,
> + char type,
> + u64 start)
> +{
> + struct machine *machine = arg;
> +
> + type = toupper(type);
> +
> + /* Fixup for text, weak, data and bss sections. */
> + if (type == 'T' || type == 'W' || type == 'D' || type == 'B')
> + machine->kernel_start = min(machine->kernel_start, start);
> +
> + return 0;
> +}
> +
> int machine__get_kernel_start(struct machine *machine)
> {
> struct map *map = machine__kernel_map(machine);
> + char filename[PATH_MAX];
> int err = 0;
>
> /*
> @@ -2696,6 +2713,22 @@ int machine__get_kernel_start(struct machine *machine)
> if (!err && !machine__is(machine, "x86_64"))
> machine->kernel_start = map->start;
> }
> +
> + if (symbol_conf.kallsyms_name != NULL) {
> + strncpy(filename, symbol_conf.kallsyms_name, PATH_MAX);
> + } else {
> + machine__get_kallsyms_filename(machine, filename, PATH_MAX);
> +
> + if (symbol__restricted_filename(filename, "/proc/kallsyms"))
> + goto out;
> + }
> +
> + if (kallsyms__parse(filename, machine, machine__fixup_kernel_start))
> + pr_warning("Fail to fixup kernel start address. skipping...\n");
> +
> +out:
> return err;
> }
>
>
> Thanks,
> Leo Yan
^ permalink raw reply
* Re: BUG_ON in skb_segment, after bpf_skb_change_proto was applied
From: Shmulik Ladkani @ 2019-09-02 13:44 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Daniel Borkmann, Eric Dumazet, netdev, Alexander Duyck,
Alexei Starovoitov, Yonghong Song, Steffen Klassert,
Shmulik Ladkani, eyal
In-Reply-To: <CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com>
On Sun, 1 Sep 2019 16:05:48 -0400
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> One quick fix is to disable sg and thus revert to copying in this
> case. Not ideal, but better than a kernel splat:
>
> @@ -3714,6 +3714,9 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> sg = !!(features & NETIF_F_SG);
> csum = !!can_checksum_protocol(features, proto);
>
> + if (list_skb && skb_headlen(list_skb) && !list_skb->head_frag)
> + sg = false;
> +
Many thanks Willem for looking into this.
Indeed the above suggestion avoids the splat, at least with the
reproducer.
I'll look deeper into this, to make sure the generated skbs are indeed
legit and have correct content.
Shmulik
^ permalink raw reply
* Re: [PATCH] net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)
From: David Ahern @ 2019-09-02 13:37 UTC (permalink / raw)
To: Maciej Żenczykowski, Maciej Żenczykowski,
David S . Miller; +Cc: netdev
In-Reply-To: <20190901174759.257032-1-zenczykowski@gmail.com>
On 9/1/19 11:47 AM, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski <maze@google.com>
>
> There is a subtle change in behaviour introduced by:
> commit c7a1ce397adacaf5d4bb2eab0a738b5f80dc3e43
> 'ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create'
>
> Before that patch /proc/net/ipv6_route includes:
> 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
>
> Afterwards /proc/net/ipv6_route includes:
> 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80240001 lo
>
> ie. the above commit causes the ::/128 local (automatic) route to be flagged with RTF_ADDRCONF (0x040000).
The ADDRCONF flag is wrong.
>
> AFAICT, this is incorrect since these routes are *not* coming from RA's.
>
> As such, this patch restores the old behaviour.
>
> Fixes: c7a1ce397adacaf5d4bb2eab0a738b5f80dc3e43
> Cc: David Ahern <dsahern@gmail.com>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> ---
> net/ipv6/route.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 558c6c68855f..cee977e52d34 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4365,13 +4365,14 @@ struct fib6_info *addrconf_f6i_alloc(struct net *net,
> struct fib6_config cfg = {
> .fc_table = l3mdev_fib_table(idev->dev) ? : RT6_TABLE_LOCAL,
> .fc_ifindex = idev->dev->ifindex,
> - .fc_flags = RTF_UP | RTF_ADDRCONF | RTF_NONEXTHOP,
> + .fc_flags = RTF_UP | RTF_NONEXTHOP,
> .fc_dst = *addr,
> .fc_dst_len = 128,
> .fc_protocol = RTPROT_KERNEL,
> .fc_nlinfo.nl_net = net,
> .fc_ignore_dev_down = true,
> };
> + struct fib6_info *f6i;
>
> if (anycast) {
> cfg.fc_type = RTN_ANYCAST;
> @@ -4381,7 +4382,9 @@ struct fib6_info *addrconf_f6i_alloc(struct net *net,
> cfg.fc_flags |= RTF_LOCAL;
> }
>
> - return ip6_route_info_create(&cfg, gfp_flags, NULL);
> + f6i = ip6_route_info_create(&cfg, gfp_flags, NULL);
ip6_route_info_create can return NULL.
> + f6i->dst_nocount = true;
> + return f6i;
> }
>
> /* remove deleted ip from prefsrc entries */
>
^ permalink raw reply
* RE: [net-next 0/3] ravb: Remove use of undocumented registers
From: Chris Paterson @ 2019-09-02 13:16 UTC (permalink / raw)
To: Geert Uytterhoeven, Simon Horman, Biju Das, Fabrizio Castro
Cc: David Miller, Sergei Shtylyov, Magnus Damm, netdev, Linux-Renesas
In-Reply-To: <CAMuHMdXQypGJ_oqUndOcf02GCqxEGEOK15+jnS5ehqdUJ+A8aw@mail.gmail.com>
Hello Geert,
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org>
> On Behalf Of Geert Uytterhoeven
> Sent: 02 September 2019 09:16
>
> Hi Simon, Biju, Fabrizio,
>
> On Mon, Sep 2, 2019 at 10:06 AM Simon Horman
> <horms+renesas@verge.net.au> wrote:
> > this short series cleans up the RAVB driver a little.
> >
> > The first patch corrects the spelling of the FBP field of SFO register.
> > This register field is unused and should have no run-time effect.
> >
> > The remaining two patches remove the use of undocumented registers
> > after some consultation with the internal Renesas BSP team.
> >
> > All patches have been lightly tested on:
> > * E3 Ebisu
> > * H3 Salvator-XS (ES2.0)
> > * M3-W Salvator-XS
> > * M3-N Salvator-XS
>
> It would be good if someone could test this on an R-Car Gen2 board
> that uses ravb (iwg22d or iwg23s).
I've tried this series + net-next on the iwg23s haven't seen any issues from a quick sanity test.
Kind regards, Chris
>
> Thanks!
>
> > Kazuya Mizuguchi (2):
> > ravb: correct typo in FBP field of SFO register
> > ravb: Remove undocumented processing
> >
> > Simon Horman (1):
> > ravb: TROCR register is only present on R-Car Gen3
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-
> m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds
^ permalink raw reply
* [PATCH net-next v3 1/3] net: dsa: mt7530: Convert to PHYLINK API
From: René van Dorst @ 2019-09-02 13:02 UTC (permalink / raw)
To: Sean Wang, Andrew Lunn, Vivien Didelot, Florian Fainelli,
David S . Miller, Matthias Brugger
Cc: netdev, linux-arm-kernel, linux-mediatek, Russell King,
John Crispin, linux-mips, Frank Wunderlich, René van Dorst,
Russell King
In-Reply-To: <20190902130226.26845-1-opensource@vdorst.com>
Convert mt7530 to PHYLINK API
Signed-off-by: René van Dorst <opensource@vdorst.com>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
---
v2->v3:
* No change
* Add tags acked-by and tested-by
v1->v2:
* Refactor "unsupported" phy_interface part in
mt7530_phylink_mac_validate() suggested by Russell King
* Report and return when phylink tries to use autoneg_inband in
mt7530_phylink_mac_config() suggested by Russell King
* Refactor port 6 setup in mt7530_phylink_mac_config()
rfc->v1:
* Renamed P5_MODE_* to P5_INTF_SEL_*. fits the function more
* Convert if-statement for speed bits to a switch suggested by
Daniel Santos
* Refactor flow_control pause bits and don't use state->link in
mt7530_phylink_mac_config() suggested by Russell King
* Move MAC tx/rx en/disable to mt7530_phylink_mac_link_up/down()
suggested by Russell King
* Always support PHY_INTERFACE_MODE_NA in mt7530_phylink_validate()
suggested by Russell King
* Added phylink_set_port_modes() in mt7530_phylink_validate() suggested
by Russell King
* Remove dev_err on the end of mt7530_phylink_mac_config() suggested by
Russell King
drivers/net/dsa/mt7530.c | 266 +++++++++++++++++++++++++++++----------
drivers/net/dsa/mt7530.h | 32 +++--
2 files changed, 211 insertions(+), 87 deletions(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index c48e29486b10..ecc13b57e619 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -13,7 +13,7 @@
#include <linux/of_mdio.h>
#include <linux/of_net.h>
#include <linux/of_platform.h>
-#include <linux/phy.h>
+#include <linux/phylink.h>
#include <linux/regmap.h>
#include <linux/regulator/consumer.h>
#include <linux/reset.h>
@@ -633,63 +633,6 @@ mt7530_get_sset_count(struct dsa_switch *ds, int port, int sset)
return ARRAY_SIZE(mt7530_mib);
}
-static void mt7530_adjust_link(struct dsa_switch *ds, int port,
- struct phy_device *phydev)
-{
- struct mt7530_priv *priv = ds->priv;
-
- if (phy_is_pseudo_fixed_link(phydev)) {
- dev_dbg(priv->dev, "phy-mode for master device = %x\n",
- phydev->interface);
-
- /* Setup TX circuit incluing relevant PAD and driving */
- mt7530_pad_clk_setup(ds, phydev->interface);
-
- if (priv->id == ID_MT7530) {
- /* Setup RX circuit, relevant PAD and driving on the
- * host which must be placed after the setup on the
- * device side is all finished.
- */
- mt7623_pad_clk_setup(ds);
- }
- } else {
- u16 lcl_adv = 0, rmt_adv = 0;
- u8 flowctrl;
- u32 mcr = PMCR_USERP_LINK | PMCR_FORCE_MODE;
-
- switch (phydev->speed) {
- case SPEED_1000:
- mcr |= PMCR_FORCE_SPEED_1000;
- break;
- case SPEED_100:
- mcr |= PMCR_FORCE_SPEED_100;
- break;
- }
-
- if (phydev->link)
- mcr |= PMCR_FORCE_LNK;
-
- if (phydev->duplex) {
- mcr |= PMCR_FORCE_FDX;
-
- if (phydev->pause)
- rmt_adv = LPA_PAUSE_CAP;
- if (phydev->asym_pause)
- rmt_adv |= LPA_PAUSE_ASYM;
-
- lcl_adv = linkmode_adv_to_lcl_adv_t(
- phydev->advertising);
- flowctrl = mii_resolve_flowctrl_fdx(lcl_adv, rmt_adv);
-
- if (flowctrl & FLOW_CTRL_TX)
- mcr |= PMCR_TX_FC_EN;
- if (flowctrl & FLOW_CTRL_RX)
- mcr |= PMCR_RX_FC_EN;
- }
- mt7530_write(priv, MT7530_PMCR_P(port), mcr);
- }
-}
-
static int
mt7530_cpu_port_enable(struct mt7530_priv *priv,
int port)
@@ -698,9 +641,6 @@ mt7530_cpu_port_enable(struct mt7530_priv *priv,
mt7530_write(priv, MT7530_PVC_P(port),
PORT_SPEC_TAG);
- /* Setup the MAC by default for the cpu port */
- mt7530_write(priv, MT7530_PMCR_P(port), PMCR_CPUP_LINK);
-
/* Disable auto learning on the cpu port */
mt7530_set(priv, MT7530_PSC_P(port), SA_DIS);
@@ -731,9 +671,6 @@ mt7530_port_enable(struct dsa_switch *ds, int port,
mutex_lock(&priv->reg_mutex);
- /* Setup the MAC for the user port */
- mt7530_write(priv, MT7530_PMCR_P(port), PMCR_USERP_LINK);
-
/* Allow the user port gets connected to the cpu port and also
* restore the port matrix if the port is the member of a certain
* bridge.
@@ -742,7 +679,7 @@ mt7530_port_enable(struct dsa_switch *ds, int port,
priv->ports[port].enable = true;
mt7530_rmw(priv, MT7530_PCR_P(port), PCR_MATRIX_MASK,
priv->ports[port].pm);
- mt7530_port_set_status(priv, port, 1);
+ mt7530_port_set_status(priv, port, 0);
mutex_unlock(&priv->reg_mutex);
@@ -1232,10 +1169,10 @@ static int
mt7530_setup(struct dsa_switch *ds)
{
struct mt7530_priv *priv = ds->priv;
- int ret, i;
- u32 id, val;
- struct device_node *dn;
struct mt7530_dummy_poll p;
+ struct device_node *dn;
+ u32 id, val;
+ int ret, i;
/* The parent node of master netdev which holds the common system
* controller also is the container for two GMACs nodes representing
@@ -1305,6 +1242,8 @@ mt7530_setup(struct dsa_switch *ds)
val |= MHWTRAP_MANUAL;
mt7530_write(priv, MT7530_MHWTRAP, val);
+ priv->p6_interface = PHY_INTERFACE_MODE_NA;
+
/* Enable and reset MIB counters */
mt7530_mib_reset(ds);
@@ -1329,6 +1268,191 @@ mt7530_setup(struct dsa_switch *ds)
return 0;
}
+static void mt7530_phylink_mac_config(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ const struct phylink_link_state *state)
+{
+ struct mt7530_priv *priv = ds->priv;
+ u32 mcr_cur, mcr_new;
+
+ switch (port) {
+ case 0: /* Internal phy */
+ case 1:
+ case 2:
+ case 3:
+ case 4:
+ if (state->interface != PHY_INTERFACE_MODE_GMII)
+ return;
+ break;
+ /* case 5: Port 5 is not supported! */
+ case 6: /* 1st cpu port */
+ if (priv->p6_interface == state->interface)
+ break;
+
+ if (state->interface != PHY_INTERFACE_MODE_RGMII &&
+ state->interface != PHY_INTERFACE_MODE_TRGMII)
+ return;
+
+ /* Setup TX circuit incluing relevant PAD and driving */
+ mt7530_pad_clk_setup(ds, state->interface);
+
+ if (priv->id == ID_MT7530) {
+ /* Setup RX circuit, relevant PAD and driving on the
+ * host which must be placed after the setup on the
+ * device side is all finished.
+ */
+ mt7623_pad_clk_setup(ds);
+ }
+
+ priv->p6_interface = state->interface;
+ break;
+ default:
+ dev_err(ds->dev, "%s: unsupported port: %i\n", __func__, port);
+ return;
+ }
+
+ if (phylink_autoneg_inband(mode)) {
+ dev_err(ds->dev, "%s: in-band negotiation unsupported\n",
+ __func__);
+ return;
+ }
+
+ mcr_cur = mt7530_read(priv, MT7530_PMCR_P(port));
+ mcr_new = mcr_cur;
+ mcr_new &= ~(PMCR_FORCE_SPEED_1000 | PMCR_FORCE_SPEED_100 |
+ PMCR_FORCE_FDX | PMCR_TX_FC_EN | PMCR_RX_FC_EN);
+ mcr_new |= PMCR_IFG_XMIT(1) | PMCR_MAC_MODE | PMCR_BACKOFF_EN |
+ PMCR_BACKPR_EN | PMCR_FORCE_MODE | PMCR_FORCE_LNK;
+
+ switch (state->speed) {
+ case SPEED_1000:
+ mcr_new |= PMCR_FORCE_SPEED_1000;
+ break;
+ case SPEED_100:
+ mcr_new |= PMCR_FORCE_SPEED_100;
+ break;
+ }
+ if (state->duplex == DUPLEX_FULL) {
+ mcr_new |= PMCR_FORCE_FDX;
+ if (state->pause & MLO_PAUSE_TX)
+ mcr_new |= PMCR_TX_FC_EN;
+ if (state->pause & MLO_PAUSE_RX)
+ mcr_new |= PMCR_RX_FC_EN;
+ }
+
+ if (mcr_new != mcr_cur)
+ mt7530_write(priv, MT7530_PMCR_P(port), mcr_new);
+}
+
+static void mt7530_phylink_mac_link_down(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ phy_interface_t interface)
+{
+ struct mt7530_priv *priv = ds->priv;
+
+ mt7530_port_set_status(priv, port, 0);
+}
+
+static void mt7530_phylink_mac_link_up(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ phy_interface_t interface,
+ struct phy_device *phydev)
+{
+ struct mt7530_priv *priv = ds->priv;
+
+ mt7530_port_set_status(priv, port, 1);
+}
+
+static void mt7530_phylink_validate(struct dsa_switch *ds, int port,
+ unsigned long *supported,
+ struct phylink_link_state *state)
+{
+ __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
+
+ switch (port) {
+ case 0: /* Internal phy */
+ case 1:
+ case 2:
+ case 3:
+ case 4:
+ if (state->interface != PHY_INTERFACE_MODE_NA &&
+ state->interface != PHY_INTERFACE_MODE_GMII)
+ goto unsupported;
+ break;
+ /* case 5: Port 5 not supported! */
+ case 6: /* 1st cpu port */
+ if (state->interface != PHY_INTERFACE_MODE_NA &&
+ state->interface != PHY_INTERFACE_MODE_RGMII &&
+ state->interface != PHY_INTERFACE_MODE_TRGMII)
+ goto unsupported;
+ break;
+ default:
+ dev_err(ds->dev, "%s: unsupported port: %i\n", __func__, port);
+unsupported:
+ linkmode_zero(supported);
+ return;
+ }
+
+ phylink_set_port_modes(mask);
+ phylink_set(mask, Autoneg);
+
+ if (state->interface != PHY_INTERFACE_MODE_TRGMII) {
+ phylink_set(mask, 10baseT_Half);
+ phylink_set(mask, 10baseT_Full);
+ phylink_set(mask, 100baseT_Half);
+ phylink_set(mask, 100baseT_Full);
+ phylink_set(mask, 1000baseT_Half);
+ }
+
+ phylink_set(mask, 1000baseT_Full);
+
+ phylink_set(mask, Pause);
+ phylink_set(mask, Asym_Pause);
+
+ linkmode_and(supported, supported, mask);
+ linkmode_and(state->advertising, state->advertising, mask);
+}
+
+static int
+mt7530_phylink_mac_link_state(struct dsa_switch *ds, int port,
+ struct phylink_link_state *state)
+{
+ struct mt7530_priv *priv = ds->priv;
+ u32 pmsr;
+
+ if (port < 0 || port >= MT7530_NUM_PORTS)
+ return -EINVAL;
+
+ pmsr = mt7530_read(priv, MT7530_PMSR_P(port));
+
+ state->link = (pmsr & PMSR_LINK);
+ state->an_complete = state->link;
+ state->duplex = !!(pmsr & PMSR_DPX);
+
+ switch (pmsr & PMSR_SPEED_MASK) {
+ case PMSR_SPEED_10:
+ state->speed = SPEED_10;
+ break;
+ case PMSR_SPEED_100:
+ state->speed = SPEED_100;
+ break;
+ case PMSR_SPEED_1000:
+ state->speed = SPEED_1000;
+ break;
+ default:
+ state->speed = SPEED_UNKNOWN;
+ break;
+ }
+
+ state->pause &= ~(MLO_PAUSE_RX | MLO_PAUSE_TX);
+ if (pmsr & PMSR_RX_FC)
+ state->pause |= MLO_PAUSE_RX;
+ if (pmsr & PMSR_TX_FC)
+ state->pause |= MLO_PAUSE_TX;
+
+ return 1;
+}
+
static const struct dsa_switch_ops mt7530_switch_ops = {
.get_tag_protocol = mtk_get_tag_protocol,
.setup = mt7530_setup,
@@ -1337,7 +1461,6 @@ static const struct dsa_switch_ops mt7530_switch_ops = {
.phy_write = mt7530_phy_write,
.get_ethtool_stats = mt7530_get_ethtool_stats,
.get_sset_count = mt7530_get_sset_count,
- .adjust_link = mt7530_adjust_link,
.port_enable = mt7530_port_enable,
.port_disable = mt7530_port_disable,
.port_stp_state_set = mt7530_stp_state_set,
@@ -1350,6 +1473,11 @@ static const struct dsa_switch_ops mt7530_switch_ops = {
.port_vlan_prepare = mt7530_port_vlan_prepare,
.port_vlan_add = mt7530_port_vlan_add,
.port_vlan_del = mt7530_port_vlan_del,
+ .phylink_validate = mt7530_phylink_validate,
+ .phylink_mac_link_state = mt7530_phylink_mac_link_state,
+ .phylink_mac_config = mt7530_phylink_mac_config,
+ .phylink_mac_link_down = mt7530_phylink_mac_link_down,
+ .phylink_mac_link_up = mt7530_phylink_mac_link_up,
};
static const struct of_device_id mt7530_of_match[] = {
diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
index bfac90f48102..107dd04acede 100644
--- a/drivers/net/dsa/mt7530.h
+++ b/drivers/net/dsa/mt7530.h
@@ -198,26 +198,20 @@ enum mt7530_vlan_port_attr {
#define PMCR_FORCE_SPEED_100 BIT(2)
#define PMCR_FORCE_FDX BIT(1)
#define PMCR_FORCE_LNK BIT(0)
-#define PMCR_COMMON_LINK (PMCR_IFG_XMIT(1) | PMCR_MAC_MODE | \
- PMCR_BACKOFF_EN | PMCR_BACKPR_EN | \
- PMCR_TX_EN | PMCR_RX_EN | \
- PMCR_TX_FC_EN | PMCR_RX_FC_EN)
-#define PMCR_CPUP_LINK (PMCR_COMMON_LINK | PMCR_FORCE_MODE | \
- PMCR_FORCE_SPEED_1000 | \
- PMCR_FORCE_FDX | \
- PMCR_FORCE_LNK)
-#define PMCR_USERP_LINK PMCR_COMMON_LINK
-#define PMCR_FIXED_LINK (PMCR_IFG_XMIT(1) | PMCR_MAC_MODE | \
- PMCR_FORCE_MODE | PMCR_TX_EN | \
- PMCR_RX_EN | PMCR_BACKPR_EN | \
- PMCR_BACKOFF_EN | \
- PMCR_FORCE_SPEED_1000 | \
- PMCR_FORCE_FDX | \
- PMCR_FORCE_LNK)
-#define PMCR_FIXED_LINK_FC (PMCR_FIXED_LINK | \
- PMCR_TX_FC_EN | PMCR_RX_FC_EN)
+#define PMCR_SPEED_MASK (PMCR_FORCE_SPEED_100 | \
+ PMCR_FORCE_SPEED_1000)
#define MT7530_PMSR_P(x) (0x3008 + (x) * 0x100)
+#define PMSR_EEE1G BIT(7)
+#define PMSR_EEE100M BIT(6)
+#define PMSR_RX_FC BIT(5)
+#define PMSR_TX_FC BIT(4)
+#define PMSR_SPEED_1000 BIT(3)
+#define PMSR_SPEED_100 BIT(2)
+#define PMSR_SPEED_10 0x00
+#define PMSR_SPEED_MASK (PMSR_SPEED_100 | PMSR_SPEED_1000)
+#define PMSR_DPX BIT(1)
+#define PMSR_LINK BIT(0)
/* Register for MIB */
#define MT7530_PORT_MIB_COUNTER(x) (0x4000 + (x) * 0x100)
@@ -423,6 +417,7 @@ struct mt7530_port {
* @ports: Holding the state among ports
* @reg_mutex: The lock for protecting among process accessing
* registers
+ * @p6_interface Holding the current port 6 interface
*/
struct mt7530_priv {
struct device *dev;
@@ -435,6 +430,7 @@ struct mt7530_priv {
struct gpio_desc *reset;
unsigned int id;
bool mcm;
+ phy_interface_t p6_interface;
struct mt7530_port ports[MT7530_NUM_PORTS];
/* protect among processes for registers access*/
--
2.20.1
^ permalink raw reply related
* [PATCH net-next v3 3/3] net: dsa: mt7530: Add support for port 5
From: René van Dorst @ 2019-09-02 13:02 UTC (permalink / raw)
To: Sean Wang, Andrew Lunn, Vivien Didelot, Florian Fainelli,
David S . Miller, Matthias Brugger
Cc: netdev, linux-arm-kernel, linux-mediatek, Russell King,
John Crispin, linux-mips, Frank Wunderlich, René van Dorst,
Russell King
In-Reply-To: <20190902130226.26845-1-opensource@vdorst.com>
Adding support for port 5.
Port 5 can muxed/interface to:
- internal 5th GMAC of the switch; can be used as 2nd CPU port or as
extra port with an external phy for a 6th ethernet port.
- internal PHY of port 0 or 4; Used in most applications so that port 0
or 4 is the WAN port and interfaces with the 2nd GMAC of the SOC.
Signed-off-by: René van Dorst <opensource@vdorst.com>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
---
v2->v3:
* Change in mt7530_setup_port5() the port 5 setup message in to a debug
message. Suggested by David Miller
* Add tags acked-by and tested-by
v1->v2:
* Also report 1000base-x support for port 5 suggested by Russell King
* Reorder variable declaraiant in reverse christmas tree suggested by
Daved Miller
* Refactor phy-handle lookup for 2nd GMAC.
* Use of_mdio_parse_addr() instead of do it manualy suggested by
Florian Fainelli
* Refactor port 5 setup in mt7530_phylink_mac_config()
rfc->v1:
* Removed unnecessary info print suggested by Andrew Lunn
* Added support for MII mode for port 5
drivers/net/dsa/mt7530.c | 145 +++++++++++++++++++++++++++++++++++++--
drivers/net/dsa/mt7530.h | 29 ++++++++
2 files changed, 168 insertions(+), 6 deletions(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index ecc13b57e619..1d8d36de4d20 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -633,6 +633,77 @@ mt7530_get_sset_count(struct dsa_switch *ds, int port, int sset)
return ARRAY_SIZE(mt7530_mib);
}
+static void mt7530_setup_port5(struct dsa_switch *ds, phy_interface_t interface)
+{
+ struct mt7530_priv *priv = ds->priv;
+ u8 tx_delay = 0;
+ int val;
+
+ mutex_lock(&priv->reg_mutex);
+
+ val = mt7530_read(priv, MT7530_MHWTRAP);
+
+ val |= MHWTRAP_MANUAL | MHWTRAP_P5_MAC_SEL | MHWTRAP_P5_DIS;
+ val &= ~MHWTRAP_P5_RGMII_MODE & ~MHWTRAP_PHY0_SEL;
+
+ switch (priv->p5_intf_sel) {
+ case P5_INTF_SEL_PHY_P0:
+ /* MT7530_P5_MODE_GPHY_P0: 2nd GMAC -> P5 -> P0 */
+ val |= MHWTRAP_PHY0_SEL;
+ /* fall through */
+ case P5_INTF_SEL_PHY_P4:
+ /* MT7530_P5_MODE_GPHY_P4: 2nd GMAC -> P5 -> P4 */
+ val &= ~MHWTRAP_P5_MAC_SEL & ~MHWTRAP_P5_DIS;
+
+ /* Setup the MAC by default for the cpu port */
+ mt7530_write(priv, MT7530_PMCR_P(5), 0x56300);
+ break;
+ case P5_INTF_SEL_GMAC5:
+ /* MT7530_P5_MODE_GMAC: P5 -> External phy or 2nd GMAC */
+ val &= ~MHWTRAP_P5_DIS;
+ break;
+ case P5_DISABLED:
+ interface = PHY_INTERFACE_MODE_NA;
+ break;
+ default:
+ dev_err(ds->dev, "Unsupported p5_intf_sel %d\n",
+ priv->p5_intf_sel);
+ goto unlock_exit;
+ }
+
+ /* Setup RGMII settings */
+ if (phy_interface_mode_is_rgmii(interface)) {
+ val |= MHWTRAP_P5_RGMII_MODE;
+
+ /* P5 RGMII RX Clock Control: delay setting for 1000M */
+ mt7530_write(priv, MT7530_P5RGMIIRXCR, CSR_RGMII_EDGE_ALIGN);
+
+ /* Don't set delay in DSA mode */
+ if (!dsa_is_dsa_port(priv->ds, 5) &&
+ (interface == PHY_INTERFACE_MODE_RGMII_TXID ||
+ interface == PHY_INTERFACE_MODE_RGMII_ID))
+ tx_delay = 4; /* n * 0.5 ns */
+
+ /* P5 RGMII TX Clock Control: delay x */
+ mt7530_write(priv, MT7530_P5RGMIITXCR,
+ CSR_RGMII_TXC_CFG(0x10 + tx_delay));
+
+ /* reduce P5 RGMII Tx driving, 8mA */
+ mt7530_write(priv, MT7530_IO_DRV_CR,
+ P5_IO_CLK_DRV(1) | P5_IO_DATA_DRV(1));
+ }
+
+ mt7530_write(priv, MT7530_MHWTRAP, val);
+
+ dev_dbg(ds->dev, "Setup P5, HWTRAP=0x%x, intf_sel=%s, phy-mode=%s\n",
+ val, p5_intf_modes(priv->p5_intf_sel), phy_modes(interface));
+
+ priv->p5_interface = interface;
+
+unlock_exit:
+ mutex_unlock(&priv->reg_mutex);
+}
+
static int
mt7530_cpu_port_enable(struct mt7530_priv *priv,
int port)
@@ -1169,7 +1240,10 @@ static int
mt7530_setup(struct dsa_switch *ds)
{
struct mt7530_priv *priv = ds->priv;
+ struct device_node *phy_node;
+ struct device_node *mac_np;
struct mt7530_dummy_poll p;
+ phy_interface_t interface;
struct device_node *dn;
u32 id, val;
int ret, i;
@@ -1260,6 +1334,40 @@ mt7530_setup(struct dsa_switch *ds)
mt7530_port_disable(ds, i);
}
+ /* Setup port 5 */
+ priv->p5_intf_sel = P5_DISABLED;
+ interface = PHY_INTERFACE_MODE_NA;
+
+ if (!dsa_is_unused_port(ds, 5)) {
+ priv->p5_intf_sel = P5_INTF_SEL_GMAC5;
+ interface = of_get_phy_mode(ds->ports[5].dn);
+ } else {
+ /* Scan the ethernet nodes. look for GMAC1, lookup used phy */
+ for_each_child_of_node(dn, mac_np) {
+ if (!of_device_is_compatible(mac_np,
+ "mediatek,eth-mac"))
+ continue;
+
+ ret = of_property_read_u32(mac_np, "reg", &id);
+ if (ret < 0 || id != 1)
+ continue;
+
+ phy_node = of_parse_phandle(mac_np, "phy-handle", 0);
+ if (phy_node->parent == priv->dev->of_node->parent) {
+ interface = of_get_phy_mode(mac_np);
+ id = of_mdio_parse_addr(ds->dev, phy_node);
+ if (id == 0)
+ priv->p5_intf_sel = P5_INTF_SEL_PHY_P0;
+ if (id == 4)
+ priv->p5_intf_sel = P5_INTF_SEL_PHY_P4;
+ }
+ of_node_put(phy_node);
+ break;
+ }
+ }
+
+ mt7530_setup_port5(ds, interface);
+
/* Flush the FDB table */
ret = mt7530_fdb_cmd(priv, MT7530_FDB_FLUSH, NULL);
if (ret < 0)
@@ -1284,7 +1392,16 @@ static void mt7530_phylink_mac_config(struct dsa_switch *ds, int port,
if (state->interface != PHY_INTERFACE_MODE_GMII)
return;
break;
- /* case 5: Port 5 is not supported! */
+ case 5: /* 2nd cpu port with phy of port 0 or 4 / external phy */
+ if (priv->p5_interface == state->interface)
+ break;
+ if (!phy_interface_mode_is_rgmii(state->interface) &&
+ state->interface != PHY_INTERFACE_MODE_MII &&
+ state->interface != PHY_INTERFACE_MODE_GMII)
+ return;
+
+ mt7530_setup_port5(ds, state->interface);
+ break;
case 6: /* 1st cpu port */
if (priv->p6_interface == state->interface)
break;
@@ -1324,6 +1441,10 @@ static void mt7530_phylink_mac_config(struct dsa_switch *ds, int port,
mcr_new |= PMCR_IFG_XMIT(1) | PMCR_MAC_MODE | PMCR_BACKOFF_EN |
PMCR_BACKPR_EN | PMCR_FORCE_MODE | PMCR_FORCE_LNK;
+ /* Are we connected to external phy */
+ if (port == 5 && dsa_is_user_port(ds, 5))
+ mcr_new |= PMCR_EXT_PHY;
+
switch (state->speed) {
case SPEED_1000:
mcr_new |= PMCR_FORCE_SPEED_1000;
@@ -1379,7 +1500,13 @@ static void mt7530_phylink_validate(struct dsa_switch *ds, int port,
state->interface != PHY_INTERFACE_MODE_GMII)
goto unsupported;
break;
- /* case 5: Port 5 not supported! */
+ case 5: /* 2nd cpu port with phy of port 0 or 4 / external phy */
+ if (state->interface != PHY_INTERFACE_MODE_NA &&
+ !phy_interface_mode_is_rgmii(state->interface) &&
+ state->interface != PHY_INTERFACE_MODE_MII &&
+ state->interface != PHY_INTERFACE_MODE_GMII)
+ goto unsupported;
+ break;
case 6: /* 1st cpu port */
if (state->interface != PHY_INTERFACE_MODE_NA &&
state->interface != PHY_INTERFACE_MODE_RGMII &&
@@ -1396,15 +1523,21 @@ static void mt7530_phylink_validate(struct dsa_switch *ds, int port,
phylink_set_port_modes(mask);
phylink_set(mask, Autoneg);
- if (state->interface != PHY_INTERFACE_MODE_TRGMII) {
+ if (state->interface == PHY_INTERFACE_MODE_TRGMII) {
+ phylink_set(mask, 1000baseT_Full);
+ } else {
phylink_set(mask, 10baseT_Half);
phylink_set(mask, 10baseT_Full);
phylink_set(mask, 100baseT_Half);
phylink_set(mask, 100baseT_Full);
- phylink_set(mask, 1000baseT_Half);
- }
- phylink_set(mask, 1000baseT_Full);
+ if (state->interface != PHY_INTERFACE_MODE_MII) {
+ phylink_set(mask, 1000baseT_Half);
+ phylink_set(mask, 1000baseT_Full);
+ if (port == 5)
+ phylink_set(mask, 1000baseX_Full);
+ }
+ }
phylink_set(mask, Pause);
phylink_set(mask, Asym_Pause);
diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
index 107dd04acede..ccb9da8cad0d 100644
--- a/drivers/net/dsa/mt7530.h
+++ b/drivers/net/dsa/mt7530.h
@@ -186,6 +186,7 @@ enum mt7530_vlan_port_attr {
/* Register for port MAC control register */
#define MT7530_PMCR_P(x) (0x3000 + ((x) * 0x100))
#define PMCR_IFG_XMIT(x) (((x) & 0x3) << 18)
+#define PMCR_EXT_PHY BIT(17)
#define PMCR_MAC_MODE BIT(16)
#define PMCR_FORCE_MODE BIT(15)
#define PMCR_TX_EN BIT(14)
@@ -245,6 +246,7 @@ enum mt7530_vlan_port_attr {
/* Register for hw trap modification */
#define MT7530_MHWTRAP 0x7804
+#define MHWTRAP_PHY0_SEL BIT(20)
#define MHWTRAP_MANUAL BIT(16)
#define MHWTRAP_P5_MAC_SEL BIT(13)
#define MHWTRAP_P6_DIS BIT(8)
@@ -402,6 +404,30 @@ struct mt7530_port {
u16 pvid;
};
+/* Port 5 interface select definitions */
+enum p5_interface_select {
+ P5_DISABLED = 0,
+ P5_INTF_SEL_PHY_P0,
+ P5_INTF_SEL_PHY_P4,
+ P5_INTF_SEL_GMAC5,
+};
+
+static const char *p5_intf_modes(unsigned int p5_interface)
+{
+ switch (p5_interface) {
+ case P5_DISABLED:
+ return "DISABLED";
+ case P5_INTF_SEL_PHY_P0:
+ return "PHY P0";
+ case P5_INTF_SEL_PHY_P4:
+ return "PHY P4";
+ case P5_INTF_SEL_GMAC5:
+ return "GMAC5";
+ default:
+ return "unknown";
+ }
+}
+
/* struct mt7530_priv - This is the main data structure for holding the state
* of the driver
* @dev: The device pointer
@@ -418,6 +444,7 @@ struct mt7530_port {
* @reg_mutex: The lock for protecting among process accessing
* registers
* @p6_interface Holding the current port 6 interface
+ * @p5_intf_sel: Holding the current port 5 interface select
*/
struct mt7530_priv {
struct device *dev;
@@ -431,6 +458,8 @@ struct mt7530_priv {
unsigned int id;
bool mcm;
phy_interface_t p6_interface;
+ phy_interface_t p5_interface;
+ unsigned int p5_intf_sel;
struct mt7530_port ports[MT7530_NUM_PORTS];
/* protect among processes for registers access*/
--
2.20.1
^ permalink raw reply related
* [PATCH net-next v3 2/3] dt-bindings: net: dsa: mt7530: Add support for port 5
From: René van Dorst @ 2019-09-02 13:02 UTC (permalink / raw)
To: Sean Wang, Andrew Lunn, Vivien Didelot, Florian Fainelli,
David S . Miller, Matthias Brugger
Cc: netdev, linux-arm-kernel, linux-mediatek, Russell King,
John Crispin, linux-mips, Frank Wunderlich, René van Dorst,
devicetree, Rob Herring
In-Reply-To: <20190902130226.26845-1-opensource@vdorst.com>
MT7530 port 5 has many modes/configurations.
Update the documentation how to use port 5.
Signed-off-by: René van Dorst <opensource@vdorst.com>
Cc: devicetree@vger.kernel.org
Cc: Rob Herring <robh@kernel.org>
---
v2->v3:
* Remove 'status = "okay";' lines, suggested by Rob Herring
v1->v2:
* Adding extra note about RGMII2 and gpio use.
rfc->v1:
* No change
.../devicetree/bindings/net/dsa/mt7530.txt | 214 ++++++++++++++++++
1 file changed, 214 insertions(+)
diff --git a/Documentation/devicetree/bindings/net/dsa/mt7530.txt b/Documentation/devicetree/bindings/net/dsa/mt7530.txt
index 47aa205ee0bd..c5ed5d25f642 100644
--- a/Documentation/devicetree/bindings/net/dsa/mt7530.txt
+++ b/Documentation/devicetree/bindings/net/dsa/mt7530.txt
@@ -35,6 +35,42 @@ Required properties for the child nodes within ports container:
- phy-mode: String, must be either "trgmii" or "rgmii" for port labeled
"cpu".
+Port 5 of the switch is muxed between:
+1. GMAC5: GMAC5 can interface with another external MAC or PHY.
+2. PHY of port 0 or port 4: PHY interfaces with an external MAC like 2nd GMAC
+ of the SOC. Used in many setups where port 0/4 becomes the WAN port.
+ Note: On a MT7621 SOC with integrated switch: 2nd GMAC can only connected to
+ GMAC5 when the gpios for RGMII2 (GPIO 22-33) are not used and not
+ connected to external component!
+
+Port 5 modes/configurations:
+1. Port 5 is disabled and isolated: An external phy can interface to the 2nd
+ GMAC of the SOC.
+ In the case of a build-in MT7530 switch, port 5 shares the RGMII bus with 2nd
+ GMAC and an optional external phy. Mind the GPIO/pinctl settings of the SOC!
+2. Port 5 is muxed to PHY of port 0/4: Port 0/4 interfaces with 2nd GMAC.
+ It is a simple MAC to PHY interface, port 5 needs to be setup for xMII mode
+ and RGMII delay.
+3. Port 5 is muxed to GMAC5 and can interface to an external phy.
+ Port 5 becomes an extra switch port.
+ Only works on platform where external phy TX<->RX lines are swapped.
+ Like in the Ubiquiti ER-X-SFP.
+4. Port 5 is muxed to GMAC5 and interfaces with the 2nd GAMC as 2nd CPU port.
+ Currently a 2nd CPU port is not supported by DSA code.
+
+Depending on how the external PHY is wired:
+1. normal: The PHY can only connect to 2nd GMAC but not to the switch
+2. swapped: RGMII TX, RX are swapped; external phy interface with the switch as
+ a ethernet port. But can't interface to the 2nd GMAC.
+
+Based on the DT the port 5 mode is configured.
+
+Driver tries to lookup the phy-handle of the 2nd GMAC of the master device.
+When phy-handle matches PHY of port 0 or 4 then port 5 set-up as mode 2.
+phy-mode must be set, see also example 2 below!
+ * mt7621: phy-mode = "rgmii-txid";
+ * mt7623: phy-mode = "rgmii";
+
See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of additional
required, optional properties and how the integrated switch subnodes must
be specified.
@@ -94,3 +130,181 @@ Example:
};
};
};
+
+Example 2: MT7621: Port 4 is WAN port: 2nd GMAC -> Port 5 -> PHY port 4.
+
+ð {
+ gmac0: mac@0 {
+ compatible = "mediatek,eth-mac";
+ reg = <0>;
+ phy-mode = "rgmii";
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ pause;
+ };
+ };
+
+ gmac1: mac@1 {
+ compatible = "mediatek,eth-mac";
+ reg = <1>;
+ phy-mode = "rgmii-txid";
+ phy-handle = <&phy4>;
+ };
+
+ mdio: mdio-bus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ /* Internal phy */
+ phy4: ethernet-phy@4 {
+ reg = <4>;
+ };
+
+ mt7530: switch@1f {
+ compatible = "mediatek,mt7621";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ reg = <0x1f>;
+ pinctrl-names = "default";
+ mediatek,mcm;
+
+ resets = <&rstctrl 2>;
+ reset-names = "mcm";
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ label = "lan0";
+ };
+
+ port@1 {
+ reg = <1>;
+ label = "lan1";
+ };
+
+ port@2 {
+ reg = <2>;
+ label = "lan2";
+ };
+
+ port@3 {
+ reg = <3>;
+ label = "lan3";
+ };
+
+/* Commented out. Port 4 is handled by 2nd GMAC.
+ port@4 {
+ reg = <4>;
+ label = "lan4";
+ };
+*/
+
+ cpu_port0: port@6 {
+ reg = <6>;
+ label = "cpu";
+ ethernet = <&gmac0>;
+ phy-mode = "rgmii";
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ pause;
+ };
+ };
+ };
+ };
+ };
+};
+
+Example 3: MT7621: Port 5 is connected to external PHY: Port 5 -> external PHY.
+
+ð {
+ gmac0: mac@0 {
+ compatible = "mediatek,eth-mac";
+ reg = <0>;
+ phy-mode = "rgmii";
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ pause;
+ };
+ };
+
+ mdio: mdio-bus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ /* External phy */
+ ephy5: ethernet-phy@7 {
+ reg = <7>;
+ };
+
+ mt7530: switch@1f {
+ compatible = "mediatek,mt7621";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ reg = <0x1f>;
+ pinctrl-names = "default";
+ mediatek,mcm;
+
+ resets = <&rstctrl 2>;
+ reset-names = "mcm";
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ label = "lan0";
+ };
+
+ port@1 {
+ reg = <1>;
+ label = "lan1";
+ };
+
+ port@2 {
+ reg = <2>;
+ label = "lan2";
+ };
+
+ port@3 {
+ reg = <3>;
+ label = "lan3";
+ };
+
+ port@4 {
+ reg = <4>;
+ label = "lan4";
+ };
+
+ port@5 {
+ reg = <5>;
+ label = "lan5";
+ phy-mode = "rgmii";
+ phy-handle = <&ephy5>;
+ };
+
+ cpu_port0: port@6 {
+ reg = <6>;
+ label = "cpu";
+ ethernet = <&gmac0>;
+ phy-mode = "rgmii";
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ pause;
+ };
+ };
+ };
+ };
+ };
+};
--
2.20.1
^ permalink raw reply related
* [PATCH net-next v3 0/3] net: dsa: mt7530: Convert to PHYLINK and add support for port 5
From: René van Dorst @ 2019-09-02 13:02 UTC (permalink / raw)
To: Sean Wang, Andrew Lunn, Vivien Didelot, Florian Fainelli,
David S . Miller, Matthias Brugger
Cc: netdev, linux-arm-kernel, linux-mediatek, Russell King,
John Crispin, linux-mips, Frank Wunderlich, René van Dorst
1. net: dsa: mt7530: Convert to PHYLINK API
This patch converts mt7530 to PHYLINK API.
2. dt-bindings: net: dsa: mt7530: Add support for port 5
3. net: dsa: mt7530: Add support for port 5
These 2 patches adding support for port 5 of the switch.
v2->v3:
* Removed 'status = "okay"' lines in patch #2
* Change a port 5 setup message in a debug message in patch #3
* Added ack-by and tested-by tags
v1->v2:
* Mostly phylink improvements after review.
rfc -> v1:
* Mostly phylink improvements after review.
* Drop phy isolation patches. Adds no value for now.
René van Dorst (3):
net: dsa: mt7530: Convert to PHYLINK API
dt-bindings: net: dsa: mt7530: Add support for port 5
net: dsa: mt7530: Add support for port 5
.../devicetree/bindings/net/dsa/mt7530.txt | 214 ++++++++++
drivers/net/dsa/mt7530.c | 371 +++++++++++++++---
drivers/net/dsa/mt7530.h | 61 ++-
3 files changed, 573 insertions(+), 73 deletions(-)
--
2.20.1
^ permalink raw reply
* Re: [PATCH v4] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Kalle Valo @ 2019-09-02 12:18 UTC (permalink / raw)
To: Tony Chuang
Cc: Jian-Hong Pan, David S . Miller, linux-wireless@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux@endlessm.com
In-Reply-To: <F7CD281DE3E379468C6D07993EA72F84D18AE2DA@RTITMBSVM04.realtek.com.tw>
Tony Chuang <yhchuang@realtek.com> writes:
>> From: Jian-Hong Pan
>> Subject: [PATCH v4] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
>>
>> There is a mass of jobs between spin lock and unlock in the hardware
>> IRQ which will occupy much time originally. To make system work more
>> efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
>> reduce the time in hardware IRQ.
>>
>> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
>
> Now it works fine with MSI interrupt enabled.
>
> But this patch is conflicting with MSI interrupt patch.
> Is there a better way we can make Kalle apply them more smoothly?
> I can rebase them and submit both if you're OK.
Yeah, submitting all the MSI patches in the same patchset is the easiest
approach. That way they apply cleanly to wireless-drivers-next.
--
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* [PATCH net-next] r8152: modify rtl8152_set_speed function
From: Hayes Wang @ 2019-09-02 11:52 UTC (permalink / raw)
To: netdev; +Cc: nic_swsd, linux-kernel, Hayes Wang
First, for AUTONEG_DISABLE, we only need to modify MII_BMCR.
Second, add advertising parameter for rtl8152_set_speed(). Add
RTL_ADVERTISED_xxx for advertising parameter of rtl8152_set_speed().
Then, the advertising settings from ethtool could be saved.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
drivers/net/usb/r8152.c | 196 +++++++++++++++++++++++++++-------------
1 file changed, 132 insertions(+), 64 deletions(-)
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index c6fa0c17c13d..5d49d8dd93e1 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -757,6 +757,7 @@ struct r8152 {
u32 msg_enable;
u32 tx_qlen;
u32 coalesce;
+ u32 advertising;
u32 rx_buf_sz;
u32 rx_copybreak;
u32 rx_pending;
@@ -790,6 +791,13 @@ enum tx_csum_stat {
TX_CSUM_NONE
};
+#define RTL_ADVERTISED_10_HALF BIT(0)
+#define RTL_ADVERTISED_10_FULL BIT(1)
+#define RTL_ADVERTISED_100_HALF BIT(2)
+#define RTL_ADVERTISED_100_FULL BIT(3)
+#define RTL_ADVERTISED_1000_HALF BIT(4)
+#define RTL_ADVERTISED_1000_FULL BIT(5)
+
/* Maximum number of multicast addresses to filter (vs. Rx-all-multicast).
* The RTL chips use a 64 element hash table based on the Ethernet CRC.
*/
@@ -3801,90 +3809,117 @@ static void rtl8153b_disable(struct r8152 *tp)
r8153b_aldps_en(tp, true);
}
-static int rtl8152_set_speed(struct r8152 *tp, u8 autoneg, u16 speed, u8 duplex)
+static int rtl8152_set_speed(struct r8152 *tp, u8 autoneg, u32 speed, u8 duplex,
+ u32 advertising)
{
- u16 bmcr, anar, gbcr;
enum spd_duplex speed_duplex;
+ u16 bmcr;
int ret = 0;
- anar = r8152_mdio_read(tp, MII_ADVERTISE);
- anar &= ~(ADVERTISE_10HALF | ADVERTISE_10FULL |
- ADVERTISE_100HALF | ADVERTISE_100FULL);
- if (tp->mii.supports_gmii) {
- gbcr = r8152_mdio_read(tp, MII_CTRL1000);
- gbcr &= ~(ADVERTISE_1000FULL | ADVERTISE_1000HALF);
- } else {
- gbcr = 0;
- }
-
if (autoneg == AUTONEG_DISABLE) {
- if (speed == SPEED_10) {
- bmcr = 0;
- anar |= ADVERTISE_10HALF | ADVERTISE_10FULL;
- speed_duplex = FORCE_10M_HALF;
- } else if (speed == SPEED_100) {
- bmcr = BMCR_SPEED100;
- anar |= ADVERTISE_100HALF | ADVERTISE_100FULL;
- speed_duplex = FORCE_100M_HALF;
- } else if (speed == SPEED_1000 && tp->mii.supports_gmii) {
- bmcr = BMCR_SPEED1000;
- gbcr |= ADVERTISE_1000FULL | ADVERTISE_1000HALF;
- speed_duplex = NWAY_1000M_FULL;
- } else {
- ret = -EINVAL;
- goto out;
- }
+ if (duplex != DUPLEX_HALF && duplex != DUPLEX_FULL)
+ return -EINVAL;
- if (duplex == DUPLEX_FULL) {
- bmcr |= BMCR_FULLDPLX;
- if (speed != SPEED_1000)
- speed_duplex++;
- }
- } else {
- if (speed == SPEED_10) {
+ switch (speed) {
+ case SPEED_10:
+ bmcr = BMCR_SPEED10;
if (duplex == DUPLEX_FULL) {
- anar |= ADVERTISE_10HALF | ADVERTISE_10FULL;
- speed_duplex = NWAY_10M_FULL;
+ bmcr |= BMCR_FULLDPLX;
+ speed_duplex = FORCE_10M_FULL;
} else {
- anar |= ADVERTISE_10HALF;
- speed_duplex = NWAY_10M_HALF;
+ speed_duplex = FORCE_10M_HALF;
}
- } else if (speed == SPEED_100) {
+ break;
+ case SPEED_100:
+ bmcr = BMCR_SPEED100;
if (duplex == DUPLEX_FULL) {
- anar |= ADVERTISE_10HALF | ADVERTISE_10FULL;
- anar |= ADVERTISE_100HALF | ADVERTISE_100FULL;
- speed_duplex = NWAY_100M_FULL;
+ bmcr |= BMCR_FULLDPLX;
+ speed_duplex = FORCE_100M_FULL;
} else {
- anar |= ADVERTISE_10HALF;
- anar |= ADVERTISE_100HALF;
- speed_duplex = NWAY_100M_HALF;
+ speed_duplex = FORCE_100M_HALF;
}
- } else if (speed == SPEED_1000 && tp->mii.supports_gmii) {
- if (duplex == DUPLEX_FULL) {
- anar |= ADVERTISE_10HALF | ADVERTISE_10FULL;
- anar |= ADVERTISE_100HALF | ADVERTISE_100FULL;
- gbcr |= ADVERTISE_1000FULL | ADVERTISE_1000HALF;
- } else {
- anar |= ADVERTISE_10HALF;
- anar |= ADVERTISE_100HALF;
- gbcr |= ADVERTISE_1000HALF;
+ break;
+ case SPEED_1000:
+ if (tp->mii.supports_gmii) {
+ bmcr = BMCR_SPEED1000 | BMCR_FULLDPLX;
+ speed_duplex = NWAY_1000M_FULL;
+ break;
}
- speed_duplex = NWAY_1000M_FULL;
- } else {
+ /* fall through */
+ default:
ret = -EINVAL;
goto out;
}
+ if (duplex == DUPLEX_FULL)
+ tp->mii.full_duplex = 1;
+ else
+ tp->mii.full_duplex = 0;
+
+ tp->mii.force_media = 1;
+ } else {
+ u16 anar, tmp1;
+ u32 support;
+
+ support = RTL_ADVERTISED_10_HALF | RTL_ADVERTISED_10_FULL |
+ RTL_ADVERTISED_100_HALF | RTL_ADVERTISED_100_FULL;
+
+ if (tp->mii.supports_gmii)
+ support |= RTL_ADVERTISED_1000_FULL;
+
+ if (!(advertising & support))
+ return -EINVAL;
+
+ anar = r8152_mdio_read(tp, MII_ADVERTISE);
+ tmp1 = anar & ~(ADVERTISE_10HALF | ADVERTISE_10FULL |
+ ADVERTISE_100HALF | ADVERTISE_100FULL);
+ if (advertising & RTL_ADVERTISED_10_HALF) {
+ tmp1 |= ADVERTISE_10HALF;
+ speed_duplex = NWAY_10M_HALF;
+ }
+ if (advertising & RTL_ADVERTISED_10_FULL) {
+ tmp1 |= ADVERTISE_10FULL;
+ speed_duplex = NWAY_10M_FULL;
+ }
+
+ if (advertising & RTL_ADVERTISED_100_HALF) {
+ tmp1 |= ADVERTISE_100HALF;
+ speed_duplex = NWAY_100M_HALF;
+ }
+ if (advertising & RTL_ADVERTISED_100_FULL) {
+ tmp1 |= ADVERTISE_100FULL;
+ speed_duplex = NWAY_100M_FULL;
+ }
+
+ if (anar != tmp1) {
+ r8152_mdio_write(tp, MII_ADVERTISE, tmp1);
+ tp->mii.advertising = tmp1;
+ }
+
+ if (tp->mii.supports_gmii) {
+ u16 gbcr;
+
+ gbcr = r8152_mdio_read(tp, MII_CTRL1000);
+ tmp1 = gbcr & ~(ADVERTISE_1000FULL |
+ ADVERTISE_1000HALF);
+
+ if (advertising & RTL_ADVERTISED_1000_FULL) {
+ tmp1 |= ADVERTISE_1000FULL;
+ speed_duplex = NWAY_1000M_FULL;
+ }
+
+ if (gbcr != tmp1)
+ r8152_mdio_write(tp, MII_CTRL1000, tmp1);
+ }
+
bmcr = BMCR_ANENABLE | BMCR_ANRESTART;
+
+ tp->mii.force_media = 0;
}
if (test_and_clear_bit(PHY_RESET, &tp->flags))
bmcr |= BMCR_RESET;
- if (tp->mii.supports_gmii)
- r8152_mdio_write(tp, MII_CTRL1000, gbcr);
-
- r8152_mdio_write(tp, MII_ADVERTISE, anar);
r8152_mdio_write(tp, MII_BMCR, bmcr);
switch (tp->version) {
@@ -4122,7 +4157,8 @@ static void rtl_hw_phy_work_func_t(struct work_struct *work)
tp->rtl_ops.hw_phy_cfg(tp);
- rtl8152_set_speed(tp, tp->autoneg, tp->speed, tp->duplex);
+ rtl8152_set_speed(tp, tp->autoneg, tp->speed, tp->duplex,
+ tp->advertising);
mutex_unlock(&tp->control);
@@ -4841,20 +4877,46 @@ static int rtl8152_set_link_ksettings(struct net_device *dev,
const struct ethtool_link_ksettings *cmd)
{
struct r8152 *tp = netdev_priv(dev);
+ u32 advertising = 0;
int ret;
ret = usb_autopm_get_interface(tp->intf);
if (ret < 0)
goto out;
+ if (test_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
+ cmd->link_modes.advertising))
+ advertising |= RTL_ADVERTISED_10_HALF;
+
+ if (test_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT,
+ cmd->link_modes.advertising))
+ advertising |= RTL_ADVERTISED_10_FULL;
+
+ if (test_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT,
+ cmd->link_modes.advertising))
+ advertising |= RTL_ADVERTISED_100_HALF;
+
+ if (test_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT,
+ cmd->link_modes.advertising))
+ advertising |= RTL_ADVERTISED_100_FULL;
+
+ if (test_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
+ cmd->link_modes.advertising))
+ advertising |= RTL_ADVERTISED_1000_HALF;
+
+ if (test_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
+ cmd->link_modes.advertising))
+ advertising |= RTL_ADVERTISED_1000_FULL;
+
mutex_lock(&tp->control);
ret = rtl8152_set_speed(tp, cmd->base.autoneg, cmd->base.speed,
- cmd->base.duplex);
+ cmd->base.duplex, advertising);
if (!ret) {
tp->autoneg = cmd->base.autoneg;
tp->speed = cmd->base.speed;
tp->duplex = cmd->base.duplex;
+ tp->advertising = advertising;
}
mutex_unlock(&tp->control);
@@ -5569,7 +5631,13 @@ static int rtl8152_probe(struct usb_interface *intf,
tp->mii.phy_id = R8152_PHY_ID;
tp->autoneg = AUTONEG_ENABLE;
- tp->speed = tp->mii.supports_gmii ? SPEED_1000 : SPEED_100;
+ tp->speed = SPEED_100;
+ tp->advertising = RTL_ADVERTISED_10_HALF | RTL_ADVERTISED_10_FULL |
+ RTL_ADVERTISED_100_HALF | RTL_ADVERTISED_100_FULL;
+ if (tp->mii.supports_gmii) {
+ tp->speed = SPEED_1000;
+ tp->advertising |= RTL_ADVERTISED_1000_FULL;
+ }
tp->duplex = DUPLEX_FULL;
tp->rx_copybreak = RTL8152_RXFG_HEADSZ;
--
2.21.0
^ permalink raw reply related
* RE: [net-next 3/3] ravb: TROCR register is only present on R-Car Gen3
From: Yoshihiro Shimoda @ 2019-09-02 10:51 UTC (permalink / raw)
To: Simon Horman, David Miller, Sergei Shtylyov
Cc: Magnus Damm, netdev@vger.kernel.org,
linux-renesas-soc@vger.kernel.org
In-Reply-To: <20190902080603.5636-4-horms+renesas@verge.net.au>
Hi Simon-san,
> From: Simon Horman, Sent: Monday, September 2, 2019 5:06 PM
>
> Only use the TROCR register on R-Car Gen3.
> It is not present on other SoCs.
>
> Offsets used for the undocumented registers are also considered reserved
> and should not be written to.
>
> After some internal investigation with Renesas it remains unclear why this
> driver accesses these fields on R-Car Gen2 but regardless of what the
> historical reasons are the current code is considered incorrect.
>
> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Thank you for the patch!
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Best regards,
Yoshihiro Shimoda
^ permalink raw reply
* RE: [net-next 2/3] ravb: Remove undocumented processing
From: Yoshihiro Shimoda @ 2019-09-02 10:49 UTC (permalink / raw)
To: Simon Horman, David Miller, Sergei Shtylyov
Cc: Magnus Damm, netdev@vger.kernel.org,
linux-renesas-soc@vger.kernel.org, Kazuya Mizuguchi
In-Reply-To: <20190902080603.5636-3-horms+renesas@verge.net.au>
Hi Simon-san,
> From: Simon Horman, Sent: Monday, September 2, 2019 5:06 PM
>
> From: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
>
> This patch removes the use of the undocumented registers
> CDCR, LCCR, CERCR, CEECR and the undocumented BOC bit of the CCC register.
>
> Current documentation for EtherAVB (ravb) describes the offset of
> what the driver uses as the BOC bit as reserved and that only a value of
> 0 should be written. Furthermore, the offsets used for the undocumented
> registers are also considered reserved nd should not be written to.
>
> After some internal investigation with Renesas it remains unclear
> why this driver accesses these fields but regardless of what the historical
> reasons are the current code is considered incorrect.
>
> Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Thank you for the patch!
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Best regards,
Yoshihiro Shimoda
^ permalink raw reply
* RE: [net-next 1/3] ravb: correct typo in FBP field of SFO register
From: Yoshihiro Shimoda @ 2019-09-02 10:47 UTC (permalink / raw)
To: Simon Horman, David Miller, Sergei Shtylyov
Cc: Magnus Damm, netdev@vger.kernel.org,
linux-renesas-soc@vger.kernel.org, Kazuya Mizuguchi
In-Reply-To: <20190902080603.5636-2-horms+renesas@verge.net.au>
Hi Simon-san,
Thank you for the patch!
> From: Simon Horman, Sent: Monday, September 2, 2019 5:06 PM
>
> From: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
>
> The field name is FBP rather than FPB.
>
> This field is unused and could equally be removed from the driver entirely.
> But there seems no harm in leaving as documentation of the presence of the
> field.
>
> Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
> ---
> v0 - Kazuya Mizuguchi
>
> v1 - Simon Horman
> * Extracted from larger patch
> * Wrote changelog
>
> diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
> index ac9195add811..bdb051f04b0c 100644
> --- a/drivers/net/ethernet/renesas/ravb.h
> +++ b/drivers/net/ethernet/renesas/ravb.h
> @@ -317,7 +312,7 @@ enum UFCD_BIT {
>
> /* SFO */
> enum SFO_BIT {
> - SFO_FPB = 0x0000003F,
> + SFO_FBP = 0x0000003F,
> };
>
> /* RTC */
> ---
> drivers/net/ethernet/renesas/ravb.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
This patch has two same diff. After removed either one of them,
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Best regards,
Yoshihiro Shimoda
> index ac9195add811..2596a95a4300 100644
> --- a/drivers/net/ethernet/renesas/ravb.h
> +++ b/drivers/net/ethernet/renesas/ravb.h
> @@ -317,7 +317,7 @@ enum UFCD_BIT {
>
> /* SFO */
> enum SFO_BIT {
> - SFO_FPB = 0x0000003F,
> + SFO_FBP = 0x0000003F,
> };
>
> /* RTC */
> --
> 2.11.0
^ permalink raw reply
* Re: [PATCH net-next 05/13] net: stmmac: selftests: Add selftest for L3/L4 Filters
From: kbuild test robot @ 2019-09-02 10:35 UTC (permalink / raw)
To: Jose Abreu
Cc: kbuild-all, netdev, Joao Pinto, Jose Abreu, Giuseppe Cavallaro,
Alexandre Torgue, David S. Miller, Maxime Coquelin, linux-stm32,
linux-arm-kernel, linux-kernel
In-Reply-To: <f2b536e9370a448a40fc411712e717d4c605a3f5.1567410970.git.joabreu@synopsys.com>
[-- Attachment #1: Type: text/plain, Size: 4087 bytes --]
Hi Jose,
I love your patch! Perhaps something to improve:
[auto build test WARNING on net-next/master]
url: https://github.com/0day-ci/linux/commits/Jose-Abreu/net-stmmac-Improvements-for-next/20190902-160927
config: mips-allmodconfig (attached as .config)
compiler: mips-linux-gcc (GCC) 7.4.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.4.0 make.cross ARCH=mips
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c: In function '__stmmac_test_l3filt':
>> drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c:1249:1: warning: the frame size of 1280 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
^
drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c: In function '__stmmac_test_l4filt':
drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c:1362:1: warning: the frame size of 1280 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
^
vim +1249 drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c
1170
1171 #ifdef CONFIG_NET_CLS_ACT
1172 static int __stmmac_test_l3filt(struct stmmac_priv *priv, u32 dst, u32 src,
1173 u32 dst_mask, u32 src_mask)
1174 {
1175 struct flow_dissector_key_ipv4_addrs key, mask;
1176 unsigned long dummy_cookie = 0xdeadbeef;
1177 struct flow_dissector dissector = { };
1178 struct stmmac_packet_attrs attr = { };
1179 struct flow_cls_offload cls = { };
1180 struct flow_rule *rule;
1181 int ret;
1182
1183 if (!tc_can_offload(priv->dev))
1184 return -EOPNOTSUPP;
1185 if (!priv->dma_cap.l3l4fnum)
1186 return -EOPNOTSUPP;
1187 if (priv->rss.enable) {
1188 struct stmmac_rss rss = { .enable = false, };
1189
1190 stmmac_rss_configure(priv, priv->hw, &rss,
1191 priv->plat->rx_queues_to_use);
1192 }
1193
1194 dissector.used_keys |= (1 << FLOW_DISSECTOR_KEY_IPV4_ADDRS);
1195 dissector.offset[FLOW_DISSECTOR_KEY_IPV4_ADDRS] = 0;
1196
1197 cls.common.chain_index = 0;
1198 cls.command = FLOW_CLS_REPLACE;
1199 cls.cookie = dummy_cookie;
1200
1201 rule = kzalloc(struct_size(rule, action.entries, 1), GFP_KERNEL);
1202 if (!rule) {
1203 ret = -ENOMEM;
1204 goto cleanup_rss;
1205 }
1206
1207 rule->match.dissector = &dissector;
1208 rule->match.key = (void *)&key;
1209 rule->match.mask = (void *)&mask;
1210
1211 key.src = htonl(src);
1212 key.dst = htonl(dst);
1213 mask.src = src_mask;
1214 mask.dst = dst_mask;
1215
1216 cls.rule = rule;
1217
1218 rule->action.entries[0].id = FLOW_ACTION_DROP;
1219 rule->action.num_entries = 1;
1220
1221 attr.dst = priv->dev->dev_addr;
1222 attr.ip_dst = dst;
1223 attr.ip_src = src;
1224
1225 /* Shall receive packet */
1226 ret = __stmmac_test_loopback(priv, &attr);
1227 if (ret)
1228 goto cleanup_rule;
1229
1230 ret = stmmac_tc_setup_cls(priv, priv, &cls);
1231 if (ret)
1232 goto cleanup_rule;
1233
1234 /* Shall NOT receive packet */
1235 ret = __stmmac_test_loopback(priv, &attr);
1236 ret = ret ? 0 : -EINVAL;
1237
1238 cls.command = FLOW_CLS_DESTROY;
1239 stmmac_tc_setup_cls(priv, priv, &cls);
1240 cleanup_rule:
1241 kfree(rule);
1242 cleanup_rss:
1243 if (priv->rss.enable) {
1244 stmmac_rss_configure(priv, priv->hw, &priv->rss,
1245 priv->plat->rx_queues_to_use);
1246 }
1247
1248 return ret;
> 1249 }
1250 #else
1251 static int __stmmac_test_l3filt(struct stmmac_priv *priv, u32 dst, u32 src,
1252 u32 dst_mask, u32 src_mask)
1253 {
1254 return -EOPNOTSUPP;
1255 }
1256 #endif
1257
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 61542 bytes --]
^ permalink raw reply
* [PATCH net] rxrpc: Fix misplaced traceline
From: David Howells @ 2019-09-02 10:34 UTC (permalink / raw)
To: netdev; +Cc: Hillf Danton, dhowells, linux-afs, linux-kernel
There's a misplaced traceline in rxrpc_input_packet() which is looking at a
packet that just got released rather than the replacement packet.
Fix this by moving the traceline after the assignment that moves the new
packet pointer to the actual packet pointer.
Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Reported-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index d122c53c8697..157be1ff8697 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1262,8 +1262,8 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
if (nskb != skb) {
rxrpc_eaten_skb(skb, rxrpc_skb_received);
- rxrpc_new_skb(skb, rxrpc_skb_unshared);
skb = nskb;
+ rxrpc_new_skb(skb, rxrpc_skb_unshared);
sp = rxrpc_skb(skb);
}
}
^ permalink raw reply related
* [PATCH net-next v2 1/3] dpaa2-eth: Minor refactoring in ethtool stats
From: Ioana Radulescu @ 2019-09-02 10:23 UTC (permalink / raw)
To: netdev, davem; +Cc: ioana.ciornei
In-Reply-To: <1567419799-28179-1-git-send-email-ruxandra.radulescu@nxp.com>
As we prepare to read more pages from the DPNI stat counters,
reorganize the code a bit to make it easier to extend.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
---
v2: no changes
drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
index 93076fe..1c5b54b 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
@@ -188,6 +188,11 @@ static void dpaa2_eth_get_ethtool_stats(struct net_device *net_dev,
struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
struct dpaa2_eth_drv_stats *extras;
struct dpaa2_eth_ch_stats *ch_stats;
+ int dpni_stats_page_size[DPNI_STATISTICS_CNT] = {
+ sizeof(dpni_stats.page_0),
+ sizeof(dpni_stats.page_1),
+ sizeof(dpni_stats.page_2),
+ };
memset(data, 0,
sizeof(u64) * (DPAA2_ETH_NUM_STATS + DPAA2_ETH_NUM_EXTRA_STATS));
@@ -198,17 +203,8 @@ static void dpaa2_eth_get_ethtool_stats(struct net_device *net_dev,
j, &dpni_stats);
if (err != 0)
netdev_warn(net_dev, "dpni_get_stats(%d) failed\n", j);
- switch (j) {
- case 0:
- num_cnt = sizeof(dpni_stats.page_0) / sizeof(u64);
- break;
- case 1:
- num_cnt = sizeof(dpni_stats.page_1) / sizeof(u64);
- break;
- case 2:
- num_cnt = sizeof(dpni_stats.page_2) / sizeof(u64);
- break;
- }
+
+ num_cnt = dpni_stats_page_size[j] / sizeof(u64);
for (k = 0; k < num_cnt; k++)
*(data + i++) = dpni_stats.raw.counter[k];
}
--
2.7.4
^ permalink raw reply related
* [PATCH net-next v2 3/3] dpaa2-eth: Poll Tx pending frames counter on if down
From: Ioana Radulescu @ 2019-09-02 10:23 UTC (permalink / raw)
To: netdev, davem; +Cc: ioana.ciornei
In-Reply-To: <1567419799-28179-1-git-send-email-ruxandra.radulescu@nxp.com>
Starting with firmware version MC10.18.0, a new counter for in flight
Tx frames is offered. Use it when bringing down the interface to
determine when all pending Tx frames have been processed by hardware
instead of sleeping a fixed amount of time.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
---
v2: no changes
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 31 +++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index 5402867..162d7d8 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -1348,7 +1348,7 @@ static u32 ingress_fq_count(struct dpaa2_eth_priv *priv)
return total;
}
-static void wait_for_fq_empty(struct dpaa2_eth_priv *priv)
+static void wait_for_ingress_fq_empty(struct dpaa2_eth_priv *priv)
{
int retries = 10;
u32 pending;
@@ -1360,6 +1360,31 @@ static void wait_for_fq_empty(struct dpaa2_eth_priv *priv)
} while (pending && --retries);
}
+#define DPNI_TX_PENDING_VER_MAJOR 7
+#define DPNI_TX_PENDING_VER_MINOR 13
+static void wait_for_egress_fq_empty(struct dpaa2_eth_priv *priv)
+{
+ union dpni_statistics stats;
+ int retries = 10;
+ int err;
+
+ if (dpaa2_eth_cmp_dpni_ver(priv, DPNI_TX_PENDING_VER_MAJOR,
+ DPNI_TX_PENDING_VER_MINOR) < 0)
+ goto out;
+
+ do {
+ err = dpni_get_statistics(priv->mc_io, 0, priv->mc_token, 6,
+ &stats);
+ if (err)
+ goto out;
+ if (stats.page_6.tx_pending_frames == 0)
+ return;
+ } while (--retries);
+
+out:
+ msleep(500);
+}
+
static int dpaa2_eth_stop(struct net_device *net_dev)
{
struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
@@ -1379,7 +1404,7 @@ static int dpaa2_eth_stop(struct net_device *net_dev)
* on WRIOP. After it finishes, wait until all remaining frames on Rx
* and Tx conf queues are consumed on NAPI poll.
*/
- msleep(500);
+ wait_for_egress_fq_empty(priv);
do {
dpni_disable(priv->mc_io, 0, priv->mc_token);
@@ -1395,7 +1420,7 @@ static int dpaa2_eth_stop(struct net_device *net_dev)
*/
}
- wait_for_fq_empty(priv);
+ wait_for_ingress_fq_empty(priv);
disable_ch_napi(priv);
/* Empty the buffer pool */
--
2.7.4
^ permalink raw reply related
* [PATCH net-next v2 2/3] dpaa2-eth: Add new DPNI statistics counters
From: Ioana Radulescu @ 2019-09-02 10:23 UTC (permalink / raw)
To: netdev, davem; +Cc: ioana.ciornei
In-Reply-To: <1567419799-28179-1-git-send-email-ruxandra.radulescu@nxp.com>
Recent firmware versions expose more DPNI counters.
Export relevant ones via ethtool -S.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
---
v2: treat separately error case for unsupported statistics pages
.../net/ethernet/freescale/dpaa2/dpaa2-ethtool.c | 19 ++++++++--
drivers/net/ethernet/freescale/dpaa2/dpni.c | 2 +-
drivers/net/ethernet/freescale/dpaa2/dpni.h | 40 ++++++++++++++++++++++
3 files changed, 58 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
index 1c5b54b..0aa1c34 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
@@ -28,6 +28,11 @@ static char dpaa2_ethtool_stats[][ETH_GSTRING_LEN] = {
"[hw] rx nobuffer discards",
"[hw] tx discarded frames",
"[hw] tx confirmed frames",
+ "[hw] tx dequeued bytes",
+ "[hw] tx dequeued frames",
+ "[hw] tx rejected bytes",
+ "[hw] tx rejected frames",
+ "[hw] tx pending frames",
};
#define DPAA2_ETH_NUM_STATS ARRAY_SIZE(dpaa2_ethtool_stats)
@@ -192,16 +197,26 @@ static void dpaa2_eth_get_ethtool_stats(struct net_device *net_dev,
sizeof(dpni_stats.page_0),
sizeof(dpni_stats.page_1),
sizeof(dpni_stats.page_2),
+ sizeof(dpni_stats.page_3),
+ sizeof(dpni_stats.page_4),
+ sizeof(dpni_stats.page_5),
+ sizeof(dpni_stats.page_6),
};
memset(data, 0,
sizeof(u64) * (DPAA2_ETH_NUM_STATS + DPAA2_ETH_NUM_EXTRA_STATS));
/* Print standard counters, from DPNI statistics */
- for (j = 0; j <= 2; j++) {
+ for (j = 0; j <= 6; j++) {
+ /* We're not interested in pages 4 & 5 for now */
+ if (j == 4 || j == 5)
+ continue;
err = dpni_get_statistics(priv->mc_io, 0, priv->mc_token,
j, &dpni_stats);
- if (err != 0)
+ if (err == -EINVAL)
+ /* Older firmware versions don't support all pages */
+ memset(&dpni_stats, 0, sizeof(dpni_stats));
+ else
netdev_warn(net_dev, "dpni_get_stats(%d) failed\n", j);
num_cnt = dpni_stats_page_size[j] / sizeof(u64);
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpni.c b/drivers/net/ethernet/freescale/dpaa2/dpni.c
index 05e3089..dd54e69 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpni.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpni.c
@@ -1470,7 +1470,7 @@ int dpni_get_queue(struct fsl_mc_io *mc_io,
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPNI object
* @page: Selects the statistics page to retrieve, see
- * DPNI_GET_STATISTICS output. Pages are numbered 0 to 2.
+ * DPNI_GET_STATISTICS output. Pages are numbered 0 to 6.
* @stat: Structure containing the statistics
*
* Return: '0' on Success; Error code otherwise.
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpni.h b/drivers/net/ethernet/freescale/dpaa2/dpni.h
index 3e8fc6c..fd583911 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpni.h
+++ b/drivers/net/ethernet/freescale/dpaa2/dpni.h
@@ -416,6 +416,26 @@ int dpni_get_tx_data_offset(struct fsl_mc_io *mc_io,
* lack of buffers
* @page_2.egress_discarded_frames: Egress discarded frame count
* @page_2.egress_confirmed_frames: Egress confirmed frame count
+ * @page3: Page_3 statistics structure
+ * @page_3.egress_dequeue_bytes: Cumulative count of the number of bytes
+ * dequeued from egress FQs
+ * @page_3.egress_dequeue_frames: Cumulative count of the number of frames
+ * dequeued from egress FQs
+ * @page_3.egress_reject_bytes: Cumulative count of the number of bytes in
+ * egress frames whose enqueue was rejected
+ * @page_3.egress_reject_frames: Cumulative count of the number of egress
+ * frames whose enqueue was rejected
+ * @page_4: Page_4 statistics structure: congestion points
+ * @page_4.cgr_reject_frames: number of rejected frames due to congestion point
+ * @page_4.cgr_reject_bytes: number of rejected bytes due to congestion point
+ * @page_5: Page_5 statistics structure: policer
+ * @page_5.policer_cnt_red: NUmber of red colored frames
+ * @page_5.policer_cnt_yellow: number of yellow colored frames
+ * @page_5.policer_cnt_green: number of green colored frames
+ * @page_5.policer_cnt_re_red: number of recolored red frames
+ * @page_5.policer_cnt_re_yellow: number of recolored yellow frames
+ * @page_6: Page_6 statistics structure
+ * @page_6.tx_pending_frames: total number of frames pending in egress FQs
* @raw: raw statistics structure, used to index counters
*/
union dpni_statistics {
@@ -443,6 +463,26 @@ union dpni_statistics {
u64 egress_confirmed_frames;
} page_2;
struct {
+ u64 egress_dequeue_bytes;
+ u64 egress_dequeue_frames;
+ u64 egress_reject_bytes;
+ u64 egress_reject_frames;
+ } page_3;
+ struct {
+ u64 cgr_reject_frames;
+ u64 cgr_reject_bytes;
+ } page_4;
+ struct {
+ u64 policer_cnt_red;
+ u64 policer_cnt_yellow;
+ u64 policer_cnt_green;
+ u64 policer_cnt_re_red;
+ u64 policer_cnt_re_yellow;
+ } page_5;
+ struct {
+ u64 tx_pending_frames;
+ } page_6;
+ struct {
u64 counter[DPNI_STATISTICS_CNT];
} raw;
};
--
2.7.4
^ permalink raw reply related
* [PATCH net-next v2 0/3] dpaa2-eth: Add new statistics counters
From: Ioana Radulescu @ 2019-09-02 10:23 UTC (permalink / raw)
To: netdev, davem; +Cc: ioana.ciornei
Recent firmware versions offer access to more DPNI statistics
counters. Add the relevant ones to ethtool interface stats.
Also we can now make use of a new counter for in flight egress frames
to avoid sleeping an arbitrary amount of time in the ndo_stop routine.
v2: in patch 2/3, treat separately the error case for unsupported
statistics pages
Ioana Radulescu (3):
dpaa2-eth: Minor refactoring in ethtool stats
dpaa2-eth: Add new DPNI statistics counters
dpaa2-eth: Poll Tx pending frames counter on if down
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 31 +++++++++++++++--
.../net/ethernet/freescale/dpaa2/dpaa2-ethtool.c | 37 +++++++++++++-------
drivers/net/ethernet/freescale/dpaa2/dpni.c | 2 +-
drivers/net/ethernet/freescale/dpaa2/dpni.h | 40 ++++++++++++++++++++++
4 files changed, 93 insertions(+), 17 deletions(-)
--
2.7.4
^ permalink raw reply
* [PATCH net-next 2/2] mvpp2: percpu buffers
From: Matteo Croce @ 2019-09-02 10:21 UTC (permalink / raw)
To: netdev
Cc: linux-kernel, Maxime Chevallier, Marcin Wojtas, Antoine Tenart,
Stefan Chulski, Nadav Haklai, Lorenzo Bianconi, David S. Miller
In-Reply-To: <20190902102137.841-1-mcroce@redhat.com>
Every mvpp2 unit can use up to 8 buffers mapped by the BM (the HW buffer
manager). The HW will place the frames in the buffer pool depending on the
frame size: short (< 128 bytes), long (< 1664) or jumbo (up to 9856).
As any unit can have up to 4 ports, the driver allocates only 2 pools,
one for small and one long frames, and share them between ports.
When the first port MTU is set higher than 1664 bytes, a third pool is
allocated for jumbo frames.
This shared allocation makes impossible to use percpu allocators,
and creates contention between HW queues.
If possible, i.e. if the number of possible CPU are less than 8 and jumbo
frames are not used, switch to a new scheme: allocate 8 per-cpu pools for
short and long frames and bind every pool to an RXQ.
When the first port MTU is set higher than 1664 bytes, the allocation
scheme is reverted to the old behaviour (3 shared pools), and when all
ports MTU are lowered, the per-cpu buffers are allocated again.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 4 +
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 241 ++++++++++++++++--
2 files changed, 222 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index 4d9564ba68f6..c89dd7169e3c 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -683,6 +683,7 @@ enum mvpp2_prs_l3_cast {
#define MVPP2_BM_SHORT_BUF_NUM 2048
#define MVPP2_BM_POOL_SIZE_MAX (16*1024 - MVPP2_BM_POOL_PTR_ALIGN/4)
#define MVPP2_BM_POOL_PTR_ALIGN 128
+#define MVPP2_BM_MAX_POOLS 8
/* BM cookie (32 bits) definition */
#define MVPP2_BM_COOKIE_POOL_OFFS 8
@@ -787,6 +788,9 @@ struct mvpp2 {
/* Aggregated TXQs */
struct mvpp2_tx_queue *aggr_txqs;
+ /* Are we using page_pool with per-cpu pools? */
+ int percpu_pools;
+
/* BM pools */
struct mvpp2_bm_pool *bm_pools;
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 871f14cc7284..637d9269d4d3 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -292,6 +292,26 @@ static void mvpp2_txq_inc_put(struct mvpp2_port *port,
txq_pcpu->txq_put_index = 0;
}
+/* Get number of maximum RXQ */
+static int mvpp2_get_nrxqs(struct mvpp2 *priv)
+{
+ unsigned int nrxqs;
+
+ if (priv->hw_version == MVPP22 && queue_mode == MVPP2_QDIST_SINGLE_MODE)
+ return 1;
+
+ /* According to the PPv2.2 datasheet and our experiments on
+ * PPv2.1, RX queues have an allocation granularity of 4 (when
+ * more than a single one on PPv2.2).
+ * Round up to nearest multiple of 4.
+ */
+ nrxqs = (num_possible_cpus() + 3) & ~0x3;
+ if (nrxqs > MVPP2_PORT_MAX_RXQ)
+ nrxqs = MVPP2_PORT_MAX_RXQ;
+
+ return nrxqs;
+}
+
/* Get number of physical egress port */
static inline int mvpp2_egress_port(struct mvpp2_port *port)
{
@@ -496,12 +516,15 @@ static int mvpp2_bm_pool_destroy(struct device *dev, struct mvpp2 *priv,
static int mvpp2_bm_pools_init(struct device *dev, struct mvpp2 *priv)
{
- int i, err, size;
+ int i, err, size, poolnum = MVPP2_BM_POOLS_NUM;
struct mvpp2_bm_pool *bm_pool;
+ if (priv->percpu_pools)
+ poolnum = mvpp2_get_nrxqs(priv) * 2;
+
/* Create all pools with maximum size */
size = MVPP2_BM_POOL_SIZE_MAX;
- for (i = 0; i < MVPP2_BM_POOLS_NUM; i++) {
+ for (i = 0; i < poolnum; i++) {
bm_pool = &priv->bm_pools[i];
bm_pool->id = i;
err = mvpp2_bm_pool_create(dev, priv, bm_pool, size);
@@ -520,9 +543,15 @@ static int mvpp2_bm_pools_init(struct device *dev, struct mvpp2 *priv)
static int mvpp2_bm_init(struct device *dev, struct mvpp2 *priv)
{
- int i, err;
+ int i, err, poolnum = MVPP2_BM_POOLS_NUM;
- for (i = 0; i < MVPP2_BM_POOLS_NUM; i++) {
+ if (priv->percpu_pools)
+ poolnum = mvpp2_get_nrxqs(priv) * 2;
+
+ dev_info(dev, "using %d %s buffers\n", poolnum,
+ priv->percpu_pools ? "per-cpu" : "shared");
+
+ for (i = 0; i < poolnum; i++) {
/* Mask BM all interrupts */
mvpp2_write(priv, MVPP2_BM_INTR_MASK_REG(i), 0);
/* Clear BM cause register */
@@ -530,7 +559,7 @@ static int mvpp2_bm_init(struct device *dev, struct mvpp2 *priv)
}
/* Allocate and initialize BM pools */
- priv->bm_pools = devm_kcalloc(dev, MVPP2_BM_POOLS_NUM,
+ priv->bm_pools = devm_kcalloc(dev, poolnum,
sizeof(*priv->bm_pools), GFP_KERNEL);
if (!priv->bm_pools)
return -ENOMEM;
@@ -676,6 +705,13 @@ static int mvpp2_bm_bufs_add(struct mvpp2_port *port,
phys_addr_t phys_addr;
void *buf;
+ if (port->priv->percpu_pools &&
+ bm_pool->pkt_size > MVPP2_BM_LONG_PKT_SIZE) {
+ netdev_err(port->dev,
+ "attempted to use jumbo frames with per-cpu pools");
+ return 0;
+ }
+
buf_size = MVPP2_RX_BUF_SIZE(bm_pool->pkt_size);
total_size = MVPP2_RX_TOTAL_SIZE(buf_size);
@@ -719,7 +755,64 @@ mvpp2_bm_pool_use(struct mvpp2_port *port, unsigned pool, int pkt_size)
struct mvpp2_bm_pool *new_pool = &port->priv->bm_pools[pool];
int num;
- if (pool >= MVPP2_BM_POOLS_NUM) {
+ if ((port->priv->percpu_pools && pool > mvpp2_get_nrxqs(port->priv) * 2) ||
+ (!port->priv->percpu_pools && pool >= MVPP2_BM_POOLS_NUM)) {
+ netdev_err(port->dev, "Invalid pool %d\n", pool);
+ return NULL;
+ }
+
+ /* Allocate buffers in case BM pool is used as long pool, but packet
+ * size doesn't match MTU or BM pool hasn't being used yet
+ */
+ if (new_pool->pkt_size == 0) {
+ int pkts_num;
+
+ /* Set default buffer number or free all the buffers in case
+ * the pool is not empty
+ */
+ pkts_num = new_pool->buf_num;
+ if (pkts_num == 0) {
+ if (port->priv->percpu_pools) {
+ if (pool < port->nrxqs)
+ pkts_num = mvpp2_pools[MVPP2_BM_SHORT].buf_num;
+ else
+ pkts_num = mvpp2_pools[MVPP2_BM_LONG].buf_num;
+ } else {
+ pkts_num = mvpp2_pools[pool].buf_num;
+ }
+ } else {
+ mvpp2_bm_bufs_free(port->dev->dev.parent,
+ port->priv, new_pool, pkts_num);
+ }
+
+ new_pool->pkt_size = pkt_size;
+ new_pool->frag_size =
+ SKB_DATA_ALIGN(MVPP2_RX_BUF_SIZE(pkt_size)) +
+ MVPP2_SKB_SHINFO_SIZE;
+
+ /* Allocate buffers for this pool */
+ num = mvpp2_bm_bufs_add(port, new_pool, pkts_num);
+ if (num != pkts_num) {
+ WARN(1, "pool %d: %d of %d allocated\n",
+ new_pool->id, num, pkts_num);
+ return NULL;
+ }
+ }
+
+ mvpp2_bm_pool_bufsize_set(port->priv, new_pool,
+ MVPP2_RX_BUF_SIZE(new_pool->pkt_size));
+
+ return new_pool;
+}
+
+static struct mvpp2_bm_pool *
+mvpp2_bm_pool_use_percpu(struct mvpp2_port *port, int type,
+ unsigned int pool, int pkt_size)
+{
+ struct mvpp2_bm_pool *new_pool = &port->priv->bm_pools[pool];
+ int num;
+
+ if (pool > port->nrxqs * 2) {
netdev_err(port->dev, "Invalid pool %d\n", pool);
return NULL;
}
@@ -735,7 +828,7 @@ mvpp2_bm_pool_use(struct mvpp2_port *port, unsigned pool, int pkt_size)
*/
pkts_num = new_pool->buf_num;
if (pkts_num == 0)
- pkts_num = mvpp2_pools[pool].buf_num;
+ pkts_num = mvpp2_pools[type].buf_num;
else
mvpp2_bm_bufs_free(port->dev->dev.parent,
port->priv, new_pool, pkts_num);
@@ -760,11 +853,11 @@ mvpp2_bm_pool_use(struct mvpp2_port *port, unsigned pool, int pkt_size)
return new_pool;
}
-/* Initialize pools for swf */
-static int mvpp2_swf_bm_pool_init(struct mvpp2_port *port)
+/* Initialize pools for swf, shared buffers variant */
+static int mvpp2_swf_bm_pool_init_shared(struct mvpp2_port *port)
{
- int rxq;
enum mvpp2_bm_pool_log_num long_log_pool, short_log_pool;
+ int rxq;
/* If port pkt_size is higher than 1518B:
* HW Long pool - SW Jumbo pool, HW Short pool - SW Long pool
@@ -808,6 +901,47 @@ static int mvpp2_swf_bm_pool_init(struct mvpp2_port *port)
return 0;
}
+/* Initialize pools for swf, percpu buffers variant */
+static int mvpp2_swf_bm_pool_init_percpu(struct mvpp2_port *port)
+{
+ struct mvpp2_bm_pool *p;
+ int i;
+
+ for (i = 0; i < port->nrxqs; i++) {
+ p = mvpp2_bm_pool_use_percpu(port, MVPP2_BM_SHORT, i,
+ mvpp2_pools[MVPP2_BM_SHORT].pkt_size);
+ if (!p)
+ return -ENOMEM;
+
+ port->priv->bm_pools[i].port_map |= BIT(port->id);
+ mvpp2_rxq_short_pool_set(port, i, port->priv->bm_pools[i].id);
+ }
+
+ for (i = 0; i < port->nrxqs; i++) {
+ p = mvpp2_bm_pool_use_percpu(port, MVPP2_BM_LONG, i + port->nrxqs,
+ mvpp2_pools[MVPP2_BM_LONG].pkt_size);
+ if (!p)
+ return -ENOMEM;
+
+ port->priv->bm_pools[i + port->nrxqs].port_map |= BIT(port->id);
+ mvpp2_rxq_long_pool_set(port, i,
+ port->priv->bm_pools[i + port->nrxqs].id);
+ }
+
+ port->pool_long = NULL;
+ port->pool_short = NULL;
+
+ return 0;
+}
+
+static int mvpp2_swf_bm_pool_init(struct mvpp2_port *port)
+{
+ if (port->priv->percpu_pools)
+ return mvpp2_swf_bm_pool_init_percpu(port);
+ else
+ return mvpp2_swf_bm_pool_init_shared(port);
+}
+
static void mvpp2_set_hw_csum(struct mvpp2_port *port,
enum mvpp2_bm_pool_log_num new_long_pool)
{
@@ -834,6 +968,9 @@ static int mvpp2_bm_update_mtu(struct net_device *dev, int mtu)
enum mvpp2_bm_pool_log_num new_long_pool;
int pkt_size = MVPP2_RX_PKT_SIZE(mtu);
+ if (port->priv->percpu_pools)
+ goto out_set;
+
/* If port MTU is higher than 1518B:
* HW Long pool - SW Jumbo pool, HW Short pool - SW Long pool
* else: HW Long pool - SW Long pool, HW Short pool - SW Short pool
@@ -863,6 +1000,7 @@ static int mvpp2_bm_update_mtu(struct net_device *dev, int mtu)
mvpp2_set_hw_csum(port, new_long_pool);
}
+out_set:
dev->mtu = mtu;
dev->wanted_features = dev->features;
@@ -3706,10 +3844,48 @@ static int mvpp2_set_mac_address(struct net_device *dev, void *p)
return err;
}
+/* Shut down all the ports, reconfigure the pools as percpu or shared,
+ * then bring up again all ports.
+ */
+static int mvpp2_bm_switch_buffers(struct mvpp2 *priv, bool percpu)
+{
+ int numbufs = MVPP2_BM_POOLS_NUM, i;
+ struct mvpp2_port *port = NULL;
+ bool status[MVPP2_MAX_PORTS];
+
+ for (i = 0; i < priv->port_count; i++) {
+ port = priv->port_list[i];
+ status[i] = netif_running(port->dev);
+ if (status[i])
+ mvpp2_stop(port->dev);
+ }
+
+ /* nrxqs is the same for all ports */
+ if (priv->percpu_pools)
+ numbufs = port->nrxqs * 2;
+
+ for (i = 0; i < numbufs; i++)
+ mvpp2_bm_pool_destroy(port->dev->dev.parent, priv, &priv->bm_pools[i]);
+
+ devm_kfree(port->dev->dev.parent, priv->bm_pools);
+ priv->percpu_pools = percpu;
+ mvpp2_bm_init(port->dev->dev.parent, priv);
+
+ for (i = 0; i < priv->port_count; i++) {
+ port = priv->port_list[i];
+ mvpp2_swf_bm_pool_init(port);
+ if (status[i])
+ mvpp2_open(port->dev);
+ }
+
+ return 0;
+}
+
static int mvpp2_change_mtu(struct net_device *dev, int mtu)
{
struct mvpp2_port *port = netdev_priv(dev);
bool running = netif_running(dev);
+ struct mvpp2 *priv = port->priv;
int err;
if (!IS_ALIGNED(MVPP2_RX_PKT_SIZE(mtu), 8)) {
@@ -3718,6 +3894,31 @@ static int mvpp2_change_mtu(struct net_device *dev, int mtu)
mtu = ALIGN(MVPP2_RX_PKT_SIZE(mtu), 8);
}
+ if (MVPP2_RX_PKT_SIZE(mtu) > MVPP2_BM_LONG_PKT_SIZE) {
+ if (priv->percpu_pools) {
+ netdev_warn(dev, "mtu %d too high, switching to shared buffers", mtu);
+ mvpp2_bm_switch_buffers(priv, false);
+ }
+ } else {
+ bool jumbo = false;
+ int i;
+
+ for (i = 0; i < priv->port_count; i++)
+ if (priv->port_list[i] != port &&
+ MVPP2_RX_PKT_SIZE(priv->port_list[i]->dev->mtu) >
+ MVPP2_BM_LONG_PKT_SIZE) {
+ jumbo = true;
+ break;
+ }
+
+ /* No port is using jumbo frames */
+ if (!jumbo) {
+ dev_info(port->dev->dev.parent,
+ "all ports have a low MTU, switching to per-cpu buffers");
+ mvpp2_bm_switch_buffers(priv, true);
+ }
+ }
+
if (running)
mvpp2_stop_dev(port);
@@ -5025,18 +5226,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
}
ntxqs = MVPP2_MAX_TXQ;
- if (priv->hw_version == MVPP22 && queue_mode == MVPP2_QDIST_SINGLE_MODE) {
- nrxqs = 1;
- } else {
- /* According to the PPv2.2 datasheet and our experiments on
- * PPv2.1, RX queues have an allocation granularity of 4 (when
- * more than a single one on PPv2.2).
- * Round up to nearest multiple of 4.
- */
- nrxqs = (num_possible_cpus() + 3) & ~0x3;
- if (nrxqs > MVPP2_PORT_MAX_RXQ)
- nrxqs = MVPP2_PORT_MAX_RXQ;
- }
+ nrxqs = mvpp2_get_nrxqs(priv);
dev = alloc_etherdev_mqs(sizeof(*port), ntxqs, nrxqs);
if (!dev)
@@ -5202,7 +5392,8 @@ static int mvpp2_port_probe(struct platform_device *pdev,
dev->features |= NETIF_F_NTUPLE;
}
- mvpp2_set_hw_csum(port, port->pool_long->id);
+ if (!port->priv->percpu_pools)
+ mvpp2_set_hw_csum(port, port->pool_long->id);
dev->vlan_features |= features;
dev->gso_max_segs = MVPP2_MAX_TSO_SEGS;
@@ -5582,6 +5773,10 @@ static int mvpp2_probe(struct platform_device *pdev)
priv->sysctrl_base = NULL;
}
+ if (priv->hw_version == MVPP22 &&
+ mvpp2_get_nrxqs(priv) * 2 <= MVPP2_BM_MAX_POOLS)
+ priv->percpu_pools = 1;
+
mvpp2_setup_bm_pool();
--
2.21.0
^ permalink raw reply related
* [PATCH net-next 1/2] mvpp2: refactor BM pool functions
From: Matteo Croce @ 2019-09-02 10:21 UTC (permalink / raw)
To: netdev
Cc: linux-kernel, Maxime Chevallier, Marcin Wojtas, Antoine Tenart,
Stefan Chulski, Nadav Haklai, Lorenzo Bianconi, David S. Miller
In-Reply-To: <20190902102137.841-1-mcroce@redhat.com>
Refactor mvpp2_bm_pool_create(), mvpp2_bm_pool_destroy() and
mvpp2_bm_pools_init() so that they accept a struct device instead
of a struct platform_device, as they just need platform_device->dev.
Removing such dependency makes the BM code more reusable in context
where we don't have a pointer to the platform_device.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 35 +++++++++----------
1 file changed, 16 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index ccdd47f3b8fb..871f14cc7284 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -323,8 +323,7 @@ static void mvpp2_frag_free(const struct mvpp2_bm_pool *pool, void *data)
/* Buffer Manager configuration routines */
/* Create pool */
-static int mvpp2_bm_pool_create(struct platform_device *pdev,
- struct mvpp2 *priv,
+static int mvpp2_bm_pool_create(struct device *dev, struct mvpp2 *priv,
struct mvpp2_bm_pool *bm_pool, int size)
{
u32 val;
@@ -343,7 +342,7 @@ static int mvpp2_bm_pool_create(struct platform_device *pdev,
else
bm_pool->size_bytes = 2 * sizeof(u64) * size;
- bm_pool->virt_addr = dma_alloc_coherent(&pdev->dev, bm_pool->size_bytes,
+ bm_pool->virt_addr = dma_alloc_coherent(dev, bm_pool->size_bytes,
&bm_pool->dma_addr,
GFP_KERNEL);
if (!bm_pool->virt_addr)
@@ -351,9 +350,9 @@ static int mvpp2_bm_pool_create(struct platform_device *pdev,
if (!IS_ALIGNED((unsigned long)bm_pool->virt_addr,
MVPP2_BM_POOL_PTR_ALIGN)) {
- dma_free_coherent(&pdev->dev, bm_pool->size_bytes,
+ dma_free_coherent(dev, bm_pool->size_bytes,
bm_pool->virt_addr, bm_pool->dma_addr);
- dev_err(&pdev->dev, "BM pool %d is not %d bytes aligned\n",
+ dev_err(dev, "BM pool %d is not %d bytes aligned\n",
bm_pool->id, MVPP2_BM_POOL_PTR_ALIGN);
return -ENOMEM;
}
@@ -468,15 +467,14 @@ static int mvpp2_check_hw_buf_num(struct mvpp2 *priv, struct mvpp2_bm_pool *bm_p
}
/* Cleanup pool */
-static int mvpp2_bm_pool_destroy(struct platform_device *pdev,
- struct mvpp2 *priv,
+static int mvpp2_bm_pool_destroy(struct device *dev, struct mvpp2 *priv,
struct mvpp2_bm_pool *bm_pool)
{
int buf_num;
u32 val;
buf_num = mvpp2_check_hw_buf_num(priv, bm_pool);
- mvpp2_bm_bufs_free(&pdev->dev, priv, bm_pool, buf_num);
+ mvpp2_bm_bufs_free(dev, priv, bm_pool, buf_num);
/* Check buffer counters after free */
buf_num = mvpp2_check_hw_buf_num(priv, bm_pool);
@@ -490,14 +488,13 @@ static int mvpp2_bm_pool_destroy(struct platform_device *pdev,
val |= MVPP2_BM_STOP_MASK;
mvpp2_write(priv, MVPP2_BM_POOL_CTRL_REG(bm_pool->id), val);
- dma_free_coherent(&pdev->dev, bm_pool->size_bytes,
+ dma_free_coherent(dev, bm_pool->size_bytes,
bm_pool->virt_addr,
bm_pool->dma_addr);
return 0;
}
-static int mvpp2_bm_pools_init(struct platform_device *pdev,
- struct mvpp2 *priv)
+static int mvpp2_bm_pools_init(struct device *dev, struct mvpp2 *priv)
{
int i, err, size;
struct mvpp2_bm_pool *bm_pool;
@@ -507,7 +504,7 @@ static int mvpp2_bm_pools_init(struct platform_device *pdev,
for (i = 0; i < MVPP2_BM_POOLS_NUM; i++) {
bm_pool = &priv->bm_pools[i];
bm_pool->id = i;
- err = mvpp2_bm_pool_create(pdev, priv, bm_pool, size);
+ err = mvpp2_bm_pool_create(dev, priv, bm_pool, size);
if (err)
goto err_unroll_pools;
mvpp2_bm_pool_bufsize_set(priv, bm_pool, 0);
@@ -515,13 +512,13 @@ static int mvpp2_bm_pools_init(struct platform_device *pdev,
return 0;
err_unroll_pools:
- dev_err(&pdev->dev, "failed to create BM pool %d, size %d\n", i, size);
+ dev_err(dev, "failed to create BM pool %d, size %d\n", i, size);
for (i = i - 1; i >= 0; i--)
- mvpp2_bm_pool_destroy(pdev, priv, &priv->bm_pools[i]);
+ mvpp2_bm_pool_destroy(dev, priv, &priv->bm_pools[i]);
return err;
}
-static int mvpp2_bm_init(struct platform_device *pdev, struct mvpp2 *priv)
+static int mvpp2_bm_init(struct device *dev, struct mvpp2 *priv)
{
int i, err;
@@ -533,12 +530,12 @@ static int mvpp2_bm_init(struct platform_device *pdev, struct mvpp2 *priv)
}
/* Allocate and initialize BM pools */
- priv->bm_pools = devm_kcalloc(&pdev->dev, MVPP2_BM_POOLS_NUM,
+ priv->bm_pools = devm_kcalloc(dev, MVPP2_BM_POOLS_NUM,
sizeof(*priv->bm_pools), GFP_KERNEL);
if (!priv->bm_pools)
return -ENOMEM;
- err = mvpp2_bm_pools_init(pdev, priv);
+ err = mvpp2_bm_pools_init(dev, priv);
if (err < 0)
return err;
return 0;
@@ -5497,7 +5494,7 @@ static int mvpp2_init(struct platform_device *pdev, struct mvpp2 *priv)
mvpp2_write(priv, MVPP2_TX_SNOOP_REG, 0x1);
/* Buffer Manager initialization */
- err = mvpp2_bm_init(pdev, priv);
+ err = mvpp2_bm_init(&pdev->dev, priv);
if (err < 0)
return err;
@@ -5766,7 +5763,7 @@ static int mvpp2_remove(struct platform_device *pdev)
for (i = 0; i < MVPP2_BM_POOLS_NUM; i++) {
struct mvpp2_bm_pool *bm_pool = &priv->bm_pools[i];
- mvpp2_bm_pool_destroy(pdev, priv, bm_pool);
+ mvpp2_bm_pool_destroy(&pdev->dev, priv, bm_pool);
}
for (i = 0; i < MVPP2_MAX_THREADS; i++) {
--
2.21.0
^ permalink raw reply related
* [PATCH net-next 0/2] mvpp2: per-cpu buffers
From: Matteo Croce @ 2019-09-02 10:21 UTC (permalink / raw)
To: netdev
Cc: linux-kernel, Maxime Chevallier, Marcin Wojtas, Antoine Tenart,
Stefan Chulski, Nadav Haklai, Lorenzo Bianconi, David S. Miller
This patchset workarounds an PP2 HW limitation which prevents to use
per-cpu rx buffers.
The first patch is just a refactor to prepare for the second one.
The second one allocates percpu buffers if the following conditions are met:
- CPU number is less or equal 4
- no port is using jumbo frames
If the following conditions are not met at load time, of jumbo frame is enabled
later on, the shared allocation is reverted.
Matteo Croce (2):
mvpp2: refactor BM pool functions
mvpp2: percpu buffers
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 4 +
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 272 +++++++++++++++---
2 files changed, 235 insertions(+), 41 deletions(-)
--
2.21.0
^ permalink raw reply
* Re: [PATCH net 0/5] net: aquantia: fixes on vlan filters and other conditions
From: Igor Russkikh @ 2019-09-02 10:07 UTC (permalink / raw)
To: David Miller, jakub.kicinski@netronome.com; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190831.133618.60802477215444924.davem@davemloft.net>
>>
>> LGTM, Fixes tag should had been first there on patch 4.
>
> Series applied with fixes tag ordering fixed in patch 4.
Thanks Jakub, David,
> You should also perhaps check the return value from
> napi_complete_done() as an optimization for net-next?
Right, thanks, will put that with next net-next patchset.
Regards,
Igor
^ permalink raw reply
* Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
From: Stefano Garzarella @ 2019-09-02 9:57 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
virtualization, Jason Wang, kvm
In-Reply-To: <20190901061707-mutt-send-email-mst@kernel.org>
On Sun, Sep 01, 2019 at 06:17:58AM -0400, Michael S. Tsirkin wrote:
> On Sun, Sep 01, 2019 at 04:26:19AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> > > On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > > > >
> > > > > (...)
> > > > >
> > > > > > >
> > > > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > > > of 4K, there might be issues.
> > > > > >
> > > > > > Shouldn't be if they are following the spec. If not let's fix
> > > > > > the broken parts.
> > > > > >
> > > > > > >
> > > > > > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Stefano
> > > > > >
> > > > > > Why would a remote care about buffer sizes?
> > > > > >
> > > > > > Let's first see what the issues are. If they exist
> > > > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > > > >
> > > > >
> > > > > The vhost_transport '.stream_enqueue' callback
> > > > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > > > passing the user message. This function allocates a new packet, copying
> > > > > the user message, but (before this series) it limits the packet size to
> > > > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > > > >
> > > > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > > struct virtio_vsock_pkt_info *info)
> > > > > {
> > > > > ...
> > > > > /* we can send less than pkt_len bytes */
> > > > > if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > > > pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > > >
> > > > > /* virtio_transport_get_credit might return less than pkt_len credit */
> > > > > pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > >
> > > > > /* Do not send zero length OP_RW pkt */
> > > > > if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > > return pkt_len;
> > > > > ...
> > > > > }
> > > > >
> > > > > then it queues the packet for the TX worker calling .send_pkt()
> > > > > [vhost_transport_send_pkt() in the vhost_transport case]
> > > > >
> > > > > The main function executed by the TX worker is
> > > > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > > > and it tries to copy the packet (up to 4K) on it. If the buffer
> > > > > allocated from the guest will be smaller then 4K, I think here it will
> > > > > be discarded with an error:
> > > > >
> > >
> > > I'm adding more lines to explain better.
> > >
> > > > > static void
> > > > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > > > struct vhost_virtqueue *vq)
> > > > > {
> > > ...
> > >
> > > head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> > > &out, &in, NULL, NULL);
> > >
> > > ...
> > >
> > > len = iov_length(&vq->iov[out], in);
> > > iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> > >
> > > nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> > > if (nbytes != sizeof(pkt->hdr)) {
> > > virtio_transport_free_pkt(pkt);
> > > vq_err(vq, "Faulted on copying pkt hdr\n");
> > > break;
> > > }
> > >
> > > > > ...
> > > > > nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> > > >
> > > > isn't pck len the actual length though?
> > > >
> > >
> > > It is the length of the packet that we are copying in the guest RX
> > > buffers pointed by the iov_iter. The guest allocates an iovec with 2
> > > buffers, one for the header and one for the payload (4KB).
> >
> > BTW at the moment that forces another kmalloc within virtio core. Maybe
> > vsock needs a flag to skip allocation in this case. Worth benchmarking.
> > See virtqueue_use_indirect which just does total_sg > 1.
Okay, I'll take a look at virtqueue_use_indirect and I'll do some
benchmarking.
> >
> > >
> > > > > if (nbytes != pkt->len) {
> > > > > virtio_transport_free_pkt(pkt);
> > > > > vq_err(vq, "Faulted on copying pkt buf\n");
> > > > > break;
> > > > > }
> > > > > ...
> > > > > }
> > > > >
> > > > >
> > > > > This series changes this behavior since now we will split the packet in
> > > > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > > > virtqueue.
> > > > >
> > > > > We didn't change the buffer size in this series, so we still backward
> > > > > compatible, but if we will use buffers smaller than 4K, we should
> > > > > encounter the error described above.
> >
> > So that's an implementation bug then? It made an assumption
> > of a 4K sized buffer? Or even PAGE_SIZE sized buffer?
Yes, I think it made an assumption and it used this macro as a limit:
include/linux/virtio_vsock.h:13:
#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4)
>
> Assuming we miss nothing and buffers < 4K are broken,
> I think we need to add this to the spec, possibly with
> a feature bit to relax the requirement that all buffers
> are at least 4k in size.
>
Okay, should I send a proposal to virtio-dev@lists.oasis-open.org?
Thanks,
Stefano
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox