Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf-next v4 1/3] selftests/bpf: test_progs: switch to open_memstream
From: Andrii Nakryiko @ 2019-08-06 17:55 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, Networking, bpf, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
In-Reply-To: <20190806175411.GC23939@mini-arch>

On Tue, Aug 6, 2019 at 10:54 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 08/06, Andrii Nakryiko wrote:
> > On Tue, Aug 6, 2019 at 10:40 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 08/06, Andrii Nakryiko wrote:
> > > > On Tue, Aug 6, 2019 at 10:19 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > > >
> > > > > Use open_memstream to override stdout during test execution.
> > > > > The copy of the original stdout is held in env.stdout and used
> > > > > to print subtest info and dump failed log.
> > > > >
> > > > > test_{v,}printf are now simple wrappers around stdout and will be
> > > > > removed in the next patch.
> > > > >
> > > > > v4:
> > > > > * one field per line for stdout/stderr (Andrii Nakryiko)
> > > > >
> > > > > v3:
> > > > > * don't do strlen over log_buf, log_cnt has it already (Andrii Nakryiko)
> > > > >
> > > > > v2:
> > > > > * add ifdef __GLIBC__ around open_memstream (maybe pointless since
> > > > >   we already depend on glibc for argp_parse)
> > > > > * hijack stderr as well (Andrii Nakryiko)
> > > > > * don't hijack for every test, do it once (Andrii Nakryiko)
> > > > > * log_cap -> log_size (Andrii Nakryiko)
> > > > > * do fseeko in a proper place (Andrii Nakryiko)
> > > > > * check open_memstream returned value (Andrii Nakryiko)
> > > > >
> > > > > Cc: Andrii Nakryiko <andriin@fb.com>
> > > > > Acked-by: Andrii Nakryiko <andriin@fb.com>
> > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > ---
> > > > >  tools/testing/selftests/bpf/test_progs.c | 115 ++++++++++++-----------
> > > > >  tools/testing/selftests/bpf/test_progs.h |   3 +-
> > > > >  2 files changed, 62 insertions(+), 56 deletions(-)
> > > > >
> >
> > [...]
> >
> > > > >  void test__printf(const char *fmt, ...)
> > > > > @@ -477,6 +438,48 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
> > > > >         return 0;
> > > > >  }
> > > > >
> > > > > +static void stdio_hijack(void)
> > > > > +{
> > > > > +#ifdef __GLIBC__
> > > > > +       if (env.verbose || (env.test && env.test->force_log)) {
> > > >
> > > > I just also realized that you don't need `(env.test &&
> > > > env.test->force_log)` test. We hijack stdout/stderr before env.test is
> > > > even set, so this does nothing anyways. Plus, force_log can be set in
> > > > the middle of test/sub-test, yet we hijack stdout just once (or even
> > > > if per-test), so it's still going to be "racy". Let's buffer output
> > > > (unless it's env.verbose, which is important to not buffer because
> > > > some tests will have huge output, when failing, so this allows to
> > > > bypass using tons of memory for those, when debugging) and dump at the
> > > > end.
> > > Makes sense, will drop this test and resubmit along with a fix for '-v'
> > > that Alexei discovered. Thanks!
> >
> > Oh, is it because env.stdout and env.stderr is not set in verbose
> > mode? I'd always set env.stdout/env.stderr at the beginning, next to
> > env.jit_enabled, to never care about "logging modes".
> Yeah, I moved it to the beginning of stdio_hijack() to mirror the 'undo'
> in stdio_restore(). Let me know if you feel strongly about it and want
> it to be near env.jit_enabled instead.

I'm fine with it, no big deal, I wrote email before I saw your v5.

>
> > > > > +               /* nothing to do, output to stdout by default */
> > > > > +               return;
> > > > > +       }
> > > > > +
> >
> > [...]

^ permalink raw reply

* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: Andrew Lunn @ 2019-08-06 18:03 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, jakub.kicinski, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet
In-Reply-To: <c615dce5-9307-7640-2877-4e5c01e565c0@gmail.com>

On Tue, Aug 06, 2019 at 11:38:32AM -0600, David Ahern wrote:
> On 8/6/19 10:40 AM, Jiri Pirko wrote:
> > Hi all.
> > 
> > I just discussed this with DavidA and I would like to bring this to
> > broader audience. David wants to limit kernel resources in network
> > namespaces, for example fibs, fib rules, etc.
> > 
> > He claims that devlink api is rich enough to program this limitations
> > as it already does for mlxsw hw resources for example. If we have this
> > api for hardware, why don't to reuse it for the kernel and it's
> > resources too?
> 
> The analogy is that a kernel is 'programmed' just like hardware, it has
> resources just like hardware (e.g., memory) and those resources are
> limited as well. So the resources consumed by fib entries, rules,
> nexthops, etc should be controllable.

I expect one question that will come up is why not control
groups. That is often used by the rest of the kernel for resource
control.

But cgroups are mostly about limiting resources for a collection of
processes. I don't think that is true for networking resources. The
resources we are talking about are orthogonal to processes. Or are
there any resources which should be linked to processes? eBPF
resources?

> > So the proposal is to have some new device, say "kernelnet", that would
> > implicitly create per-namespace devlink instance.

Maybe kernelns, to make it clear we are talking about namespace
resources.

Going back to cgroups concept. They are generally hierarchical. Do we
need any sort of hierarchy here? Are there some resources we want to
set a global limit on, and then a per namespace limit on top of that?
We would then need two names, and kernelnet sounds more like the
global level?

       Andrew

^ permalink raw reply

* Re: [patch net-next v2 1/3] net: devlink: allow to change namespaces
From: Jiri Pirko @ 2019-08-06 18:07 UTC (permalink / raw)
  To: David Ahern; +Cc: Jakub Kicinski, netdev, davem, sthemmin, mlxsw
In-Reply-To: <fc6a7342-246c-2fe1-a7d1-c7be5bd0a3a3@gmail.com>

Tue, Aug 06, 2019 at 07:55:30PM CEST, dsahern@gmail.com wrote:
>On 8/6/19 11:53 AM, Jiri Pirko wrote:
>> Let's figure out the devlink-controlling-kernel-resources thread first.
>> What you describe here is exactly that.
>
>as I mentioned on the phone, any outcome of that thread will be in 5.4
>at best. Meanwhile this breakage in 5.2 and 5.3 needs to be fixed.

Why? netdevsim is a dummy device, the purpose of existence is to test
kernel api. No real harm breaking it.

^ permalink raw reply

* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: Jakub Kicinski @ 2019-08-06 18:27 UTC (permalink / raw)
  To: Jiri Pirko, dsahern
  Cc: netdev, davem, mlxsw, andrew, f.fainelli, vivien.didelot,
	mkubecek, stephen, daniel, brouer, eric.dumazet
In-Reply-To: <20190806164036.GA2332@nanopsycho.orion>

On Tue, 6 Aug 2019 18:40:36 +0200, Jiri Pirko wrote:
> Hi all.
> 
> I just discussed this with DavidA and I would like to bring this to
> broader audience. David wants to limit kernel resources in network
> namespaces, for example fibs, fib rules, etc.
> 
> He claims that devlink api is rich enough to program this limitations
> as it already does for mlxsw hw resources for example. 

TBH I don't see how you changed anything to do with FIB notifications,
so the fact that the accounting is off now is a bit confusing. I don't
understand how devlink, FIB and namespaces mix :(

> If we have this api for hardware, why don't to reuse it for the
> kernel and it's resources too?

IMHO the netdevsim use of this API is a slight abuse, to prove the
device can fail the FIB changes, nothing more..

> So the proposal is to have some new device, say "kernelnet", that
> would implicitly create per-namespace devlink instance. This devlink
> instance would be used to setup resource limits. Like:
> 
> devlink resource set kernelnet path /IPv4/fib size 96
> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
> 
> To me it sounds a bit odd for kernel namespace to act as a device, but
> thinking about it more, it makes sense. Probably better than to define
> a new api. User would use the same tool to work with kernel and hw.
> 
> Also we can implement other devlink functionality, like dpipe.
> User would then have visibility of network pipeline, tables,
> utilization, etc. It is related to the resources too.
> 
> What do you think?

I'm no expert here but seems counter intuitive that device tables would
be aware of namespaces in the first place. Are we not reinventing
cgroup controllers based on a device API? IMHO from a perspective of
someone unfamiliar with routing offload this seems backwards :)

^ permalink raw reply

* [PATCH net-next] mlx5: use correct counter
From: Jonathan Lemon @ 2019-08-06 18:28 UTC (permalink / raw)
  To: saeedm; +Cc: kernel-team, netdev

mlx5e_grp_q_update_stats seems to be using the wrong counter
for if_down_packets.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 6eee3c7d4b06..1d16e03a987d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -363,7 +363,7 @@ static void mlx5e_grp_q_update_stats(struct mlx5e_priv *priv)
 	    !mlx5_core_query_q_counter(priv->mdev, priv->drop_rq_q_counter, 0,
 				       out, sizeof(out)))
 		qcnt->rx_if_down_packets = MLX5_GET(query_q_counter_out, out,
-						    out_of_buffer);
+						    rx_if_down_packets);
 }
 
 #define VNIC_ENV_OFF(c) MLX5_BYTE_OFF(query_vnic_env_out, c)
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net] net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
From: David Miller @ 2019-08-06 18:29 UTC (permalink / raw)
  To: wens; +Cc: mripard, wens, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <20190806073539.32519-1-wens@kernel.org>

From: Chen-Yu Tsai <wens@kernel.org>
Date: Tue,  6 Aug 2019 15:35:39 +0800

> From: Chen-Yu Tsai <wens@csie.org>
> 
> The sun4i-emac uses the "phy" property to find the PHY it's supposed to
> use. This property was deprecated in favor of "phy-handle" in commit
> 8c5b09447625 ("dt-bindings: net: sun4i-emac: Convert the binding to a
> schemas").
> 
> Add support for this new property name, and fall back to the old one in
> case the device tree hasn't been updated.
> 
> Signed-off-by: Chen-Yu Tsai <wens@csie.org>

Applied.

^ permalink raw reply

* Re: [PATCH] net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
From: David Miller @ 2019-08-06 18:34 UTC (permalink / raw)
  To: christophe.jaillet; +Cc: vishal, netdev, linux-kernel, kernel-janitors
In-Reply-To: <20190806085512.11729-1-christophe.jaillet@wanadoo.fr>

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date: Tue,  6 Aug 2019 10:55:12 +0200

> A call to 'kfree_skb()' is missing in the error handling path of
> 'init_one()'.
> This is already present in 'remove_one()' but is missing here.
> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Looks good, applied, thanks.

^ permalink raw reply

* Re: [PATCH] net/mlx4_core: Use refcount_t for refcount
From: Jason Gunthorpe @ 2019-08-06 18:36 UTC (permalink / raw)
  To: Chuhong Yuan
  Cc: Saeed Mahameed, linux-kernel@vger.kernel.org, davem@davemloft.net,
	Tariq Toukan, linux-rdma@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <CANhBUQ1wZPinWicu2c_VZjpTtP_9+AxB=7zn+ymPyYVo_rsxZQ@mail.gmail.com>

On Sat, Aug 03, 2019 at 10:42:31AM +0800, Chuhong Yuan wrote:
> Saeed Mahameed <saeedm@mellanox.com> 于2019年8月3日周六 上午2:38写道：
> >
> > On Sat, 2019-08-03 at 00:10 +0800, Chuhong Yuan wrote:
> > > Chuhong Yuan <hslester96@gmail.com> 于2019年8月2日周五 下午8:10写道：
> > > > refcount_t is better for reference counters since its
> > > > implementation can prevent overflows.
> > > > So convert atomic_t ref counters to refcount_t.
> > > >
> > > > Also convert refcount from 0-based to 1-based.
> > > >
> > >
> > > It seems that directly converting refcount from 0-based
> > > to 1-based is infeasible.
> > > I am sorry for this mistake.
> >
> > Just curious, why not keep it 0 based and use refcout_t ?
> >
> > refcount API should have the same semantics as atomic_t API .. no ?
> 
> refcount API will warn when increase a 0 refcount.
> It regards this as a use-after-free.

If this causes failures then the code is not doing atomic as a
refcount properly anyhow.. 

There are some cases where the atomic refcount is just a imprecise
debugging aide.

Jason

^ permalink raw reply

* Re: [PATCH 1/1] ixgbe: sync the first fragment unconditionally
From: David Miller @ 2019-08-06 18:36 UTC (permalink / raw)
  To: firo.yang; +Cc: netdev, jeffrey.t.kirsher, intel-wired-lan, linux-kernel
In-Reply-To: <20190806092919.13211-1-firo.yang@suse.com>

From: Firo Yang <firo.yang@suse.com>
Date: Tue, 6 Aug 2019 09:29:51 +0000

> In Xen environment, if Xen-swiotlb is enabled, ixgbe driver
> could possibly allocate a page, DMA memory buffer, for the first
> fragment which is not suitable for Xen-swiotlb to do DMA operations.
> Xen-swiotlb will internally allocate another page for doing DMA
> operations. It requires syncing between those two pages. Otherwise,
> we may get an incomplete skb. To fix this problem, sync the first
> fragment no matter the first fargment is makred as "page_released"
> or not.
> 
> Signed-off-by: Firo Yang <firo.yang@suse.com>

I don't understand, an unmap operation implies a sync operation.

^ permalink raw reply

* Re: [PATCH v2] net/mlx5e: Use refcount_t for refcount
From: Jason Gunthorpe @ 2019-08-06 18:38 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Saeed Mahameed, hslester96@gmail.com, davem@davemloft.net,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190806065958.GQ4832@mtr-leonro.mtl.com>

On Tue, Aug 06, 2019 at 09:59:58AM +0300, Leon Romanovsky wrote:
> On Mon, Aug 05, 2019 at 08:06:36PM +0000, Saeed Mahameed wrote:
> > On Mon, 2019-08-05 at 14:55 +0800, Chuhong Yuan wrote:
> > > On Mon, Aug 5, 2019 at 2:13 PM Leon Romanovsky <leon@kernel.org>
> > > wrote:
> > > > On Sun, Aug 04, 2019 at 10:44:47PM +0800, Chuhong Yuan wrote:
> > > > > On Sun, Aug 4, 2019 at 8:59 PM Leon Romanovsky <leon@kernel.org>
> > > > > wrote:
> > > > > > On Sat, Aug 03, 2019 at 12:48:28AM +0800, Chuhong Yuan wrote:
> > > > > > > refcount_t is better for reference counters since its
> > > > > > > implementation can prevent overflows.
> > > > > > > So convert atomic_t ref counters to refcount_t.
> > > > > >
> > > > > > I'm not thrilled to see those automatic conversion patches,
> > > > > > especially
> > > > > > for flows which can't overflow. There is nothing wrong in using
> > > > > > atomic_t
> > > > > > type of variable, do you have in mind flow which will cause to
> > > > > > overflow?
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > I have to say that these patches are not done automatically...
> > > > > Only the detection of problems is done by a script.
> > > > > All conversions are done manually.
> > > >
> > > > Even worse, you need to audit usage of atomic_t and replace there
> > > > it can overflow.
> > > >
> > > > > I am not sure whether the flow can cause an overflow.
> > > >
> > > > It can't.
> > > >
> > > > > But I think it is hard to ensure that a data path is impossible
> > > > > to have problems in any cases including being attacked.
> > > >
> > > > It is not data path, and I doubt that such conversion will be
> > > > allowed
> > > > in data paths without proving that no performance regression is
> > > > introduced.
> > > > > So I think it is better to do this minor revision to prevent
> > > > > potential risk, just like we have done in mlx5/core/cq.c.
> > > >
> > > > mlx5/core/cq.c is a different beast, refcount there means actual
> > > > users
> > > > of CQ which are limited in SW, so in theory, they have potential
> > > > to be overflown.
> > > >
> > > > It is not the case here, there your are adding new port.
> > > > There is nothing wrong with atomic_t.
> > > >
> > >
> > > Thanks for your explanation!
> > > I will pay attention to this point in similar cases.
> > > But it seems that the semantic of refcount is not always as clear as
> > > here...
> > >
> >
> > Semantically speaking, there is nothing wrong with moving to refcount_t
> > in the case of vxlan ports.. it also seems more accurate and will
> > provide the type protection, even if it is not necessary. Please let me
> > know what is the verdict here, i can apply this patch to net-next-mlx5.
> 
> There is no verdict here, it is up to you., if you like code churn, go
> for it.

IMHO CONFIG_REFCOUNT_FULL is a valuable enough reason to not open code
with an atomic even if overflow is not possible. 

Races resulting in incrs on 0 refcounts is a common enough mistake.

Jason

^ permalink raw reply

* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: Jiri Pirko @ 2019-08-06 18:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: dsahern, netdev, davem, mlxsw, andrew, f.fainelli, vivien.didelot,
	mkubecek, stephen, daniel, brouer, eric.dumazet
In-Reply-To: <20190806112717.3b070d07@cakuba.netronome.com>

Tue, Aug 06, 2019 at 08:27:17PM CEST, jakub.kicinski@netronome.com wrote:
>On Tue, 6 Aug 2019 18:40:36 +0200, Jiri Pirko wrote:
>> Hi all.
>> 
>> I just discussed this with DavidA and I would like to bring this to
>> broader audience. David wants to limit kernel resources in network
>> namespaces, for example fibs, fib rules, etc.
>> 
>> He claims that devlink api is rich enough to program this limitations
>> as it already does for mlxsw hw resources for example. 
>
>TBH I don't see how you changed anything to do with FIB notifications,
>so the fact that the accounting is off now is a bit confusing. I don't
>understand how devlink, FIB and namespaces mix :(
>
>> If we have this api for hardware, why don't to reuse it for the
>> kernel and it's resources too?
>
>IMHO the netdevsim use of this API is a slight abuse, to prove the
>device can fail the FIB changes, nothing more..

It's slightly bigger abuse :) But in this thread, we are not discussing
netdevsim, but separate "dev".


>
>> So the proposal is to have some new device, say "kernelnet", that
>> would implicitly create per-namespace devlink instance. This devlink
>> instance would be used to setup resource limits. Like:
>> 
>> devlink resource set kernelnet path /IPv4/fib size 96
>> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
>> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
>> 
>> To me it sounds a bit odd for kernel namespace to act as a device, but
>> thinking about it more, it makes sense. Probably better than to define
>> a new api. User would use the same tool to work with kernel and hw.
>> 
>> Also we can implement other devlink functionality, like dpipe.
>> User would then have visibility of network pipeline, tables,
>> utilization, etc. It is related to the resources too.
>> 
>> What do you think?
>
>I'm no expert here but seems counter intuitive that device tables would
>be aware of namespaces in the first place. Are we not reinventing
>cgroup controllers based on a device API? IMHO from a perspective of
>someone unfamiliar with routing offload this seems backwards :)

Can we use cgroup for fib and other limitations instead?


^ permalink raw reply

* Re: [PATCH] net: sched: sch_taprio: fix memleak in error path for sched list parse
From: David Miller @ 2019-08-06 18:41 UTC (permalink / raw)
  To: ivan.khoronzhuk
  Cc: vinicius.gomes, jhs, xiyou.wangcong, jiri, netdev, linux-kernel
In-Reply-To: <20190806100425.4356-1-ivan.khoronzhuk@linaro.org>

From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Tue,  6 Aug 2019 13:04:25 +0300

> Based on net/master

I wonder about that because:

> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -1451,7 +1451,8 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
>  	spin_unlock_bh(qdisc_lock(sch));
>  
>  free_sched:
> -	kfree(new_admin);
> +	if (new_admin)
> +		call_rcu(&new_admin->rcu, taprio_free_sched_cb);
>  
>  	return err;

In my tree the context around line 1451 is:

	nla_nest_end(skb, sched_nest);

done:
	rcu_read_unlock();

	return nla_nest_end(skb, nest);


which is part of function taprio_dump().

Please respin this properly against current 'net' sources.

^ permalink raw reply

* Re: [PATCH 0/2 net,v4] flow_offload hardware priority fixes
From: Jakub Kicinski @ 2019-08-06 18:41 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, marcelo.leitner, jiri, wenxu,
	saeedm, paulb, gerlitz.or
In-Reply-To: <20190806160310.6663-1-pablo@netfilter.org>

On Tue,  6 Aug 2019 18:03:08 +0200, Pablo Neira Ayuso wrote:
> Hi,
> 
> This patchset contains two updates for the flow_offload users:
> 
> 1) Pass the major tc priority to drivers so they do not have to
>    lshift it. This is a preparation patch for the fix coming in
>    patch #2.
> 
> 2) Set the hardware priority from the netfilter basechain priority,
>    some drivers break when using the existing hardware priority
>    number that is set to zero.

Seems reasonable, thanks.

^ permalink raw reply

* Re: [PATCH] net/ipv4: reset mac head before call ip_tunnel_rcv()
From: David Miller @ 2019-08-06 18:43 UTC (permalink / raw)
  To: ptpt52; +Cc: kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <20190806104731.30603-1-ptpt52@gmail.com>

From: Chen Minqiang <ptpt52@gmail.com>
Date: Tue,  6 Aug 2019 18:47:31 +0800

> Signed-off-by: Chen Minqiang <ptpt52@gmail.com>

No commit message means I'm not even going to look at this patch and
try to understand it.

You must always completely explain, in detail, what change you are
making, how you are making it, and above all why you are making this
change.

Is there some bug you are fixing?  What is that bug and what does that
bug cause to happen?  Are there potential negative side effects to
your fix?  What are they and what was the cost/benefit analysis for
that?

Where was the bug introduced?  You must provide a proper Fixes: tag
which shows this.

Thank you.

^ permalink raw reply

* Re: [PATCH v3 net-next] be2net: disable bh with spin_lock in be_process_mcc
From: David Miller @ 2019-08-06 18:47 UTC (permalink / raw)
  To: kda; +Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, netdev
In-Reply-To: <20190806105111.27058-1-dkirjanov@suse.com>

From: Denis Kirjanov <kda@linux-powerpc.org>
Date: Tue,  6 Aug 2019 12:51:11 +0200

> be_process_mcc() is invoked in 3 different places and
> always with BHs disabled except the be_poll function
> but since it's invoked from softirq with BHs
> disabled it won't hurt.
> 
> v1->v2: added explanation to the patch
> v2->v3: add a missing call from be_cmds.c
> 
> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>

Applied.

^ permalink raw reply

* Re: pull-request: wireless-drivers 2019-08-06
From: David Miller @ 2019-08-06 18:49 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <87h86ufs89.fsf@kamboji.qca.qualcomm.com>

From: Kalle Valo <kvalo@codeaurora.org>
Date: Tue, 06 Aug 2019 14:28:22 +0300

> here's a pull request to net tree for v5.3, more information below.
> Please let me know if there are any problems.

Pulled, thanks Kalle.

^ permalink raw reply

* Re: [PATCH v2] net/mlx5e: Use refcount_t for refcount
From: Saeed Mahameed @ 2019-08-06 18:54 UTC (permalink / raw)
  To: jgg@ziepe.ca, leon@kernel.org
  Cc: linux-kernel@vger.kernel.org, davem@davemloft.net,
	hslester96@gmail.com, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20190806183821.GR11627@ziepe.ca>

On Tue, 2019-08-06 at 15:38 -0300, Jason Gunthorpe wrote:
> On Tue, Aug 06, 2019 at 09:59:58AM +0300, Leon Romanovsky wrote:
> > On Mon, Aug 05, 2019 at 08:06:36PM +0000, Saeed Mahameed wrote:
> > > On Mon, 2019-08-05 at 14:55 +0800, Chuhong Yuan wrote:
> > > > On Mon, Aug 5, 2019 at 2:13 PM Leon Romanovsky <leon@kernel.org
> > > > >
> > > > wrote:
> > > > > On Sun, Aug 04, 2019 at 10:44:47PM +0800, Chuhong Yuan wrote:
> > > > > > On Sun, Aug 4, 2019 at 8:59 PM Leon Romanovsky <
> > > > > > leon@kernel.org>
> > > > > > wrote:
> > > > > > > On Sat, Aug 03, 2019 at 12:48:28AM +0800, Chuhong Yuan
> > > > > > > wrote:
> > > > > > > > refcount_t is better for reference counters since its
> > > > > > > > implementation can prevent overflows.
> > > > > > > > So convert atomic_t ref counters to refcount_t.
> > > > > > > 
> > > > > > > I'm not thrilled to see those automatic conversion
> > > > > > > patches,
> > > > > > > especially
> > > > > > > for flows which can't overflow. There is nothing wrong in
> > > > > > > using
> > > > > > > atomic_t
> > > > > > > type of variable, do you have in mind flow which will
> > > > > > > cause to
> > > > > > > overflow?
> > > > > > > 
> > > > > > > Thanks
> > > > > > 
> > > > > > I have to say that these patches are not done
> > > > > > automatically...
> > > > > > Only the detection of problems is done by a script.
> > > > > > All conversions are done manually.
> > > > > 
> > > > > Even worse, you need to audit usage of atomic_t and replace
> > > > > there
> > > > > it can overflow.
> > > > > 
> > > > > > I am not sure whether the flow can cause an overflow.
> > > > > 
> > > > > It can't.
> > > > > 
> > > > > > But I think it is hard to ensure that a data path is
> > > > > > impossible
> > > > > > to have problems in any cases including being attacked.
> > > > > 
> > > > > It is not data path, and I doubt that such conversion will be
> > > > > allowed
> > > > > in data paths without proving that no performance regression
> > > > > is
> > > > > introduced.
> > > > > > So I think it is better to do this minor revision to
> > > > > > prevent
> > > > > > potential risk, just like we have done in mlx5/core/cq.c.
> > > > > 
> > > > > mlx5/core/cq.c is a different beast, refcount there means
> > > > > actual
> > > > > users
> > > > > of CQ which are limited in SW, so in theory, they have
> > > > > potential
> > > > > to be overflown.
> > > > > 
> > > > > It is not the case here, there your are adding new port.
> > > > > There is nothing wrong with atomic_t.
> > > > > 
> > > > 
> > > > Thanks for your explanation!
> > > > I will pay attention to this point in similar cases.
> > > > But it seems that the semantic of refcount is not always as
> > > > clear as
> > > > here...
> > > > 
> > > 
> > > Semantically speaking, there is nothing wrong with moving to
> > > refcount_t
> > > in the case of vxlan ports.. it also seems more accurate and will
> > > provide the type protection, even if it is not necessary. Please
> > > let me
> > > know what is the verdict here, i can apply this patch to net-
> > > next-mlx5.
> > 
> > There is no verdict here, it is up to you., if you like code churn,
> > go
> > for it.
> 
> IMHO CONFIG_REFCOUNT_FULL is a valuable enough reason to not open
> code
> with an atomic even if overflow is not possible. 
> 
> Races resulting in incrs on 0 refcounts is a common enough mistake.
> 
> Jason

Indeed, thanks Jason, I will take this patch to net-next-mlx5.


^ permalink raw reply

* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: Jakub Kicinski @ 2019-08-06 18:54 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: dsahern, netdev, davem, mlxsw, andrew, f.fainelli, vivien.didelot,
	mkubecek, stephen, daniel, brouer, eric.dumazet
In-Reply-To: <20190806183841.GD2332@nanopsycho.orion>

On Tue, 6 Aug 2019 20:38:41 +0200, Jiri Pirko wrote:
> >> So the proposal is to have some new device, say "kernelnet", that
> >> would implicitly create per-namespace devlink instance. This devlink
> >> instance would be used to setup resource limits. Like:
> >> 
> >> devlink resource set kernelnet path /IPv4/fib size 96
> >> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
> >> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
> >> 
> >> To me it sounds a bit odd for kernel namespace to act as a device, but
> >> thinking about it more, it makes sense. Probably better than to define
> >> a new api. User would use the same tool to work with kernel and hw.
> >> 
> >> Also we can implement other devlink functionality, like dpipe.
> >> User would then have visibility of network pipeline, tables,
> >> utilization, etc. It is related to the resources too.
> >> 
> >> What do you think?  
> >
> >I'm no expert here but seems counter intuitive that device tables would
> >be aware of namespaces in the first place. Are we not reinventing
> >cgroup controllers based on a device API? IMHO from a perspective of
> >someone unfamiliar with routing offload this seems backwards :)  
> 
> Can we use cgroup for fib and other limitations instead?

Not sure the question is to me, I don't feel particularly qualified,
I've never worked with VDCs or wrote a switch driver.. But I'd see
cgroups as a natural fit, and if I read Andrew's reply right so does
he.. There's certainly a feeling of reinventing the wheel here.

We usually model things in software and then compile that abstraction
into device terms. Devlink allows for low level access to the device,
it allows us to, in a sense, see the result of that compilation. But
that's more of a debugging/low level knob than first class citizen :(

^ permalink raw reply

* Re: [PATCH 1/3] net: dsa: ksz: Remove dead code and fix warnings
From: David Miller @ 2019-08-06 18:59 UTC (permalink / raw)
  To: marex; +Cc: netdev, andrew, f.fainelli, Tristram.Ha, vivien.didelot,
	woojung.huh
In-Reply-To: <20190806130609.29686-1-marex@denx.de>

From: Marek Vasut <marex@denx.de>
Date: Tue,  6 Aug 2019 15:06:07 +0200

> Remove ksz_port_cleanup(), which is unused. Add missing include
> "ksz_common.h", which fixes the following warning when built with
> make ... W=1
> 
> drivers/net/dsa/microchip/ksz_common.c:23:6: warning: no previous prototype for ‘...’ [-Wmissing-prototypes]
> 
> Note that the order of the headers cannot be swapped, as that would
> trigger missing forward declaration errors, which would indicate the
> way forward is to merge the two headers into one.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>

Applied.

^ permalink raw reply

* Re: [PATCH 2/3] net: dsa: ksz: Merge ksz_priv.h into ksz_common.h
From: David Miller @ 2019-08-06 18:59 UTC (permalink / raw)
  To: marex; +Cc: netdev, andrew, f.fainelli, Tristram.Ha, vivien.didelot,
	woojung.huh
In-Reply-To: <20190806130609.29686-2-marex@denx.de>

From: Marek Vasut <marex@denx.de>
Date: Tue,  6 Aug 2019 15:06:08 +0200

> Merge the two headers into one, no functional change.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>

Applied.

^ permalink raw reply

* Re: [PATCH 3/3] net: dsa: ksz: Drop NET_DSA_TAG_KSZ9477
From: David Miller @ 2019-08-06 18:59 UTC (permalink / raw)
  To: marex; +Cc: netdev, andrew, f.fainelli, Tristram.Ha, vivien.didelot,
	woojung.huh
In-Reply-To: <20190806130609.29686-3-marex@denx.de>

From: Marek Vasut <marex@denx.de>
Date: Tue,  6 Aug 2019 15:06:09 +0200

> This Kconfig option is unused, drop it.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] mlx5: use correct counter
From: Saeed Mahameed @ 2019-08-06 19:03 UTC (permalink / raw)
  To: jonathan.lemon@gmail.com; +Cc: kernel-team@fb.com, netdev@vger.kernel.org
In-Reply-To: <20190806182819.788750-1-jonathan.lemon@gmail.com>

On Tue, 2019-08-06 at 11:28 -0700, Jonathan Lemon wrote:
> mlx5e_grp_q_update_stats seems to be using the wrong counter
> for if_down_packets.
> 
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> index 6eee3c7d4b06..1d16e03a987d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> @@ -363,7 +363,7 @@ static void mlx5e_grp_q_update_stats(struct
> mlx5e_priv *priv)
>  	    !mlx5_core_query_q_counter(priv->mdev, priv-
> >drop_rq_q_counter, 0,
>  				       out, sizeof(out)))
>  		qcnt->rx_if_down_packets =
> MLX5_GET(query_q_counter_out, out,
> -						    out_of_buffer);
> +						    rx_if_down_packets)
> ;

Hi Jonathan, 

This patch in not applicable (won't compile and there is no issue with
current code).

Although it is confusing but the code is correct as is.

1) your patch won't compile since there is no rx_if_down_packets field
in query_q_counter_out hw definition struct: please check
include/linux/mlx5/mlx5_ifc.h
mlx5_ifc_query_q_counter_out_bits
2) the code works as is since when interface is down and port is up,
technically from hw perspective there is "no buffer available" so the
out_of_buffer counter of the drop_rq_q_counter will count packets
dropped due to interface down..

Thanks,
Saeed.

^ permalink raw reply

* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: Andrew Lunn @ 2019-08-06 19:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, dsahern, netdev, davem, mlxsw, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet
In-Reply-To: <20190806115449.5b3a9d97@cakuba.netronome.com>

On Tue, Aug 06, 2019 at 11:54:49AM -0700, Jakub Kicinski wrote:
> On Tue, 6 Aug 2019 20:38:41 +0200, Jiri Pirko wrote:
> > >> So the proposal is to have some new device, say "kernelnet", that
> > >> would implicitly create per-namespace devlink instance. This devlink
> > >> instance would be used to setup resource limits. Like:
> > >> 
> > >> devlink resource set kernelnet path /IPv4/fib size 96
> > >> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
> > >> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules size 8
> > >> 
> > >> To me it sounds a bit odd for kernel namespace to act as a device, but
> > >> thinking about it more, it makes sense. Probably better than to define
> > >> a new api. User would use the same tool to work with kernel and hw.
> > >> 
> > >> Also we can implement other devlink functionality, like dpipe.
> > >> User would then have visibility of network pipeline, tables,
> > >> utilization, etc. It is related to the resources too.
> > >> 
> > >> What do you think?  
> > >
> > >I'm no expert here but seems counter intuitive that device tables would
> > >be aware of namespaces in the first place. Are we not reinventing
> > >cgroup controllers based on a device API? IMHO from a perspective of
> > >someone unfamiliar with routing offload this seems backwards :)  
> > 
> > Can we use cgroup for fib and other limitations instead?
> 
> Not sure the question is to me, I don't feel particularly qualified,
> I've never worked with VDCs or wrote a switch driver.. But I'd see
> cgroups as a natural fit, and if I read Andrew's reply right so does
> he.. 

Hi Jakub

I think there needs to be a clearly reasoned argument why cgroups is
the wrong answer to this problem. I myself don't know enough to give
that answer, but i can pose the question.

     Andrew


^ permalink raw reply

* Re: [PATCH v2 2/2] dt-bindings: net: meson-dwmac: convert to yaml
From: Martin Blumenstingl @ 2019-08-06 19:11 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: robh+dt, devicetree, netdev, linux-amlogic, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20190806125041.16105-3-narmstrong@baylibre.com>

On Tue, Aug 6, 2019 at 2:50 PM Neil Armstrong <narmstrong@baylibre.com> wrote:
>
> Now that we have the DT validation in place, let's convert the device tree
> bindings for the Synopsys DWMAC Glue for Amlogic SoCs over to a YAML schemas.
>
> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
thank you for taking care of this conversion!
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>

[...]
> +        amlogic,tx-delay-ns:
> +          $ref: /schemas/types.yaml#definitions/uint32
> +          description:
> +            The internal RGMII TX clock delay (provided by this driver) in
> +            nanoseconds. Allowed values are 0ns, 2ns, 4ns, 6ns.
once I have more time I will try to see whether we can define an enum
with these values, then invalid values will yield a warning/error when
building the .dtb (which seems to be a good idea)
this comment shouldn't prevent this patch from being applied as the
initial conversion will already make life a lot easier


Martin

^ permalink raw reply

* Re: [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: Jakub Kicinski @ 2019-08-06 19:12 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: netdev@vger.kernel.org, David S. Miller, Haiyang Zhang,
	Stephen Hemminger, sashal@kernel.org, KY Srinivasan,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <PU1P153MB0169AECABF6094A3E7BEE381BFD50@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Tue, 6 Aug 2019 05:17:44 +0000, Dexuan Cui wrote:
> This fixes a warning of "suspicious rcu_dereference_check() usage"
> when nload runs.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Minor change in behaviour would perhaps be worth acknowledging in the
commit message (since you check ndev for NULL later now), and a Fixes
tag would be good.

But the looks pretty straightforward and correct!

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox