Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-12 23:32 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712213559.GA175138@google.com>

On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > The rcu/sync code was doing its own check whether we are in a reader
> > section. With RCU consolidating flavors and the generic helper added in
> > this series, this is no longer need. We can just use the generic helper
> > and it results in a nice cleanup.
> > 
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Hi Oleg,
> Slightly unrelated to the patch,
> I tried hard to understand this comment below in percpu_down_read() but no dice.
> 
> I do understand how rcu sync and percpu rwsem works, however the comment
> below didn't make much sense to me. For one, there's no readers_fast anymore
> so I did not follow what readers_fast means. Could the comment be updated to
> reflect latest changes?
> Also could you help understand how is a writer not able to change
> sem->state and count the per-cpu read counters at the same time as the
> comment tries to say?
> 
> 	/*
> 	 * We are in an RCU-sched read-side critical section, so the writer
> 	 * cannot both change sem->state from readers_fast and start checking
> 	 * counters while we are here. So if we see !sem->state, we know that
> 	 * the writer won't be checking until we're past the preempt_enable()
> 	 * and that once the synchronize_rcu() is done, the writer will see
> 	 * anything we did within this RCU-sched read-size critical section.
> 	 */
> 
> Also,
> I guess we could get rid of all of the gp_ops struct stuff now that since all
> the callbacks are the same now. I will post that as a follow-up patch to this
> series.

Hello, Joel,

Oleg has a set of patches updating this code that just hit mainline
this week.  These patches get rid of the code that previously handled
RCU's multiple flavors.  Or are you looking at current mainline and
me just missing your point?

							Thanx, Paul

> thanks!
> 
>  - Joel
> 
> 
> > ---
> > Please note: Only build and boot tested this particular patch so far.
> > 
> >  include/linux/rcu_sync.h |  5 ++---
> >  kernel/rcu/sync.c        | 22 ----------------------
> >  2 files changed, 2 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > index 6fc53a1345b3..c954f1efc919 100644
> > --- a/include/linux/rcu_sync.h
> > +++ b/include/linux/rcu_sync.h
> > @@ -39,9 +39,8 @@ extern void rcu_sync_lockdep_assert(struct rcu_sync *);
> >   */
> >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> >  {
> > -#ifdef CONFIG_PROVE_RCU
> > -	rcu_sync_lockdep_assert(rsp);
> > -#endif
> > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > +			 "suspicious rcu_sync_is_idle() usage");
> >  	return !rsp->gp_state; /* GP_IDLE */
> >  }
> >  
> > diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> > index a8304d90573f..535e02601f56 100644
> > --- a/kernel/rcu/sync.c
> > +++ b/kernel/rcu/sync.c
> > @@ -10,37 +10,25 @@
> >  #include <linux/rcu_sync.h>
> >  #include <linux/sched.h>
> >  
> > -#ifdef CONFIG_PROVE_RCU
> > -#define __INIT_HELD(func)	.held = func,
> > -#else
> > -#define __INIT_HELD(func)
> > -#endif
> > -
> >  static const struct {
> >  	void (*sync)(void);
> >  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> >  	void (*wait)(void);
> > -#ifdef CONFIG_PROVE_RCU
> > -	int  (*held)(void);
> > -#endif
> >  } gp_ops[] = {
> >  	[RCU_SYNC] = {
> >  		.sync = synchronize_rcu,
> >  		.call = call_rcu,
> >  		.wait = rcu_barrier,
> > -		__INIT_HELD(rcu_read_lock_held)
> >  	},
> >  	[RCU_SCHED_SYNC] = {
> >  		.sync = synchronize_rcu,
> >  		.call = call_rcu,
> >  		.wait = rcu_barrier,
> > -		__INIT_HELD(rcu_read_lock_sched_held)
> >  	},
> >  	[RCU_BH_SYNC] = {
> >  		.sync = synchronize_rcu,
> >  		.call = call_rcu,
> >  		.wait = rcu_barrier,
> > -		__INIT_HELD(rcu_read_lock_bh_held)
> >  	},
> >  };
> >  
> > @@ -49,16 +37,6 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
> >  
> >  #define	rss_lock	gp_wait.lock
> >  
> > -#ifdef CONFIG_PROVE_RCU
> > -void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
> > -{
> > -	RCU_LOCKDEP_WARN(!gp_ops[rsp->gp_type].held(),
> > -			 "suspicious rcu_sync_is_idle() usage");
> > -}
> > -
> > -EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);
> > -#endif
> > -
> >  /**
> >   * rcu_sync_init() - Initialize an rcu_sync structure
> >   * @rsp: Pointer to rcu_sync structure to be initialized
> > -- 
> > 2.22.0.510.g264f2c817a-goog
> > 
> 

^ permalink raw reply

* Re: [PATCH v1 1/6] rcu: Add support for consolidated-RCU reader checking
From: Paul E. McKenney @ 2019-07-12 23:27 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Peter Zijlstra, linux-kernel, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Josh Triplett, keescook, kernel-hardening,
	Lai Jiangshan, Len Brown, linux-acpi, linux-pci, linux-pm,
	Mathieu Desnoyers, neilb, netdev, oleg, Pavel Machek,
	Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712194040.GA150253@google.com>

On Fri, Jul 12, 2019 at 03:40:40PM -0400, Joel Fernandes wrote:
> On Fri, Jul 12, 2019 at 10:46:30AM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 12, 2019 at 01:06:31PM -0400, Joel Fernandes wrote:
> > > On Fri, Jul 12, 2019 at 09:45:31AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jul 12, 2019 at 11:10:51AM -0400, Joel Fernandes wrote:
> > > > > On Fri, Jul 12, 2019 at 01:11:25PM +0200, Peter Zijlstra wrote:
> > > > > > On Thu, Jul 11, 2019 at 07:43:56PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > +int rcu_read_lock_any_held(void)
> > > > > > > +{
> > > > > > > +	int lockdep_opinion = 0;
> > > > > > > +
> > > > > > > +	if (!debug_lockdep_rcu_enabled())
> > > > > > > +		return 1;
> > > > > > > +	if (!rcu_is_watching())
> > > > > > > +		return 0;
> > > > > > > +	if (!rcu_lockdep_current_cpu_online())
> > > > > > > +		return 0;
> > > > > > > +
> > > > > > > +	/* Preemptible RCU flavor */
> > > > > > > +	if (lock_is_held(&rcu_lock_map))
> > > > > > 
> > > > > > you forgot debug_locks here.
> > > > > 
> > > > > Actually, it turns out debug_locks checking is not even needed. If
> > > > > debug_locks == 0, then debug_lockdep_rcu_enabled() returns 0 and we would not
> > > > > get to this point.
> > > > > 
> > > > > > > +		return 1;
> > > > > > > +
> > > > > > > +	/* BH flavor */
> > > > > > > +	if (in_softirq() || irqs_disabled())
> > > > > > 
> > > > > > I'm not sure I'd put irqs_disabled() under BH, also this entire
> > > > > > condition is superfluous, see below.
> > > > > > 
> > > > > > > +		return 1;
> > > > > > > +
> > > > > > > +	/* Sched flavor */
> > > > > > > +	if (debug_locks)
> > > > > > > +		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
> > > > > > > +	return lockdep_opinion || !preemptible();
> > > > > > 
> > > > > > that !preemptible() turns into:
> > > > > > 
> > > > > >   !(preempt_count()==0 && !irqs_disabled())
> > > > > > 
> > > > > > which is:
> > > > > > 
> > > > > >   preempt_count() != 0 || irqs_disabled()
> > > > > > 
> > > > > > and already includes irqs_disabled() and in_softirq().
> > > > > > 
> > > > > > > +}
> > > > > > 
> > > > > > So maybe something lke:
> > > > > > 
> > > > > > 	if (debug_locks && (lock_is_held(&rcu_lock_map) ||
> > > > > > 			    lock_is_held(&rcu_sched_lock_map)))
> > > > > > 		return true;
> > > > > 
> > > > > Agreed, I will do it this way (without the debug_locks) like:
> > > > > 
> > > > > ---8<-----------------------
> > > > > 
> > > > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > > > index ba861d1716d3..339aebc330db 100644
> > > > > --- a/kernel/rcu/update.c
> > > > > +++ b/kernel/rcu/update.c
> > > > > @@ -296,27 +296,15 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
> > > > >  
> > > > >  int rcu_read_lock_any_held(void)
> > > > >  {
> > > > > -	int lockdep_opinion = 0;
> > > > > -
> > > > >  	if (!debug_lockdep_rcu_enabled())
> > > > >  		return 1;
> > > > >  	if (!rcu_is_watching())
> > > > >  		return 0;
> > > > >  	if (!rcu_lockdep_current_cpu_online())
> > > > >  		return 0;
> > > > > -
> > > > > -	/* Preemptible RCU flavor */
> > > > > -	if (lock_is_held(&rcu_lock_map))
> > > > > -		return 1;
> > > > > -
> > > > > -	/* BH flavor */
> > > > > -	if (in_softirq() || irqs_disabled())
> > > > > -		return 1;
> > > > > -
> > > > > -	/* Sched flavor */
> > > > > -	if (debug_locks)
> > > > > -		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
> > > > > -	return lockdep_opinion || !preemptible();
> > > > > +	if (lock_is_held(&rcu_lock_map) || lock_is_held(&rcu_sched_lock_map))
> > > > 
> > > > OK, I will bite...  Why not also lock_is_held(&rcu_bh_lock_map)?
> > > 
> > > Hmm, I was borrowing the strategy from rcu_read_lock_bh_held() which does not
> > > check for a lock held in this map.
> > > 
> > > Honestly, even  lock_is_held(&rcu_sched_lock_map) seems unnecessary per-se
> > > since !preemptible() will catch that? rcu_read_lock_sched() disables
> > > preemption already, so lockdep's opinion of the matter seems redundant there.
> > 
> > Good point!  At least as long as the lockdep splats list RCU-bh among
> > the locks held, which they did last I checked.
> > 
> > Of course, you could make the same argument for getting rid of
> > rcu_sched_lock_map.  Does it make sense to have the one without
> > the other?
> 
> It probably makes it inconsistent in the least. I will add the check for
> the rcu_bh_lock_map in a separate patch, if that's Ok with you - since I also
> want to update the rcu_read_lock_bh_held() logic in the same patch.
> 
> That rcu_read_lock_bh_held() could also just return !preemptible as Peter
> suggested for the bh case.

Although that seems reasonable, please check the call sites.

> > > Sorry I already sent out patches again before seeing your comment but I can
> > > rework and resend them based on any other suggestions.
> > 
> > Not a problem!
> 
> Thanks. Depending on whether there is any other feedback, I will work on the
> bh_ stuff as a separate patch on top of this series, or work it into the next
> series revision if I'm reposting. Hopefully that sounds Ok to you.

Agreed -- let's separate concerns.  And promote bisectability.

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
From: Gregory Rose @ 2019-07-12 23:11 UTC (permalink / raw)
  To: David Miller, ap420073; +Cc: pshelar, netdev, dev
In-Reply-To: <20190712.151846.1093841226730573129.davem@davemloft.net>



On 7/12/2019 3:18 PM, David Miller wrote:
> From: Taehee Yoo <ap420073@gmail.com>
> Date: Sat,  6 Jul 2019 01:08:09 +0900
>
>> When a vport is deleted, the maximum headroom size would be changed.
>> If the vport which has the largest headroom is deleted,
>> the new max_headroom would be set.
>> But, if the new headroom size is equal to the old headroom size,
>> updating routine is unnecessary.
>>
>> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> I don't think Taehee should be punished because it took several days
> to get someone to look at and review and/or test this patch and
> meanwhile the net-next tree closed down.
>
> I ask for maintainer review as both a courtesy and a way to lessen
> my workload.  But if that means patches rot for days in patchwork
> I'm just going to apply them after my own review.
>
> So I'm applying this now.
>
My apologies Dave.  I did test and review the patch, perhaps you didn't 
see it.  In any case, you're right, Taehee was owed a more timely review 
and I missed it.

Thanks for applying the patch.

- Greg

^ permalink raw reply

* Re: [PATCH] be2net: fix adapter->big_page_size miscaculation
From: David Miller @ 2019-07-12 22:46 UTC (permalink / raw)
  To: cai
  Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, somnath.kotur,
	arnd, dhowells, hpa, netdev, linux-arch, linux-kernel
In-Reply-To: <1562959401-19815-1-git-send-email-cai@lca.pw>

From: Qian Cai <cai@lca.pw>
Date: Fri, 12 Jul 2019 15:23:21 -0400

> The commit d66acc39c7ce ("bitops: Optimise get_order()") introduced a
> problem for the be2net driver as "rx_frag_size" could be a module
> parameter that can be changed while loading the module.

Why is this a problem?

> That commit checks __builtin_constant_p() first in get_order() which
> cause "adapter->big_page_size" to be assigned a value based on the
> the default "rx_frag_size" value at the compilation time. It also
> generate a compilation warning,

rx_frag_size is not a constant, therefore the __builtin_constant_p()
test should not pass.

This explanation doesn't seem valid.

^ permalink raw reply

* Re: [PATCH net] net: neigh: fix multiple neigh timer scheduling
From: David Miller @ 2019-07-12 22:42 UTC (permalink / raw)
  To: lorenzo.bianconi; +Cc: netdev, dsahern, marek
In-Reply-To: <20190712.154047.1787144778692165503.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Fri, 12 Jul 2019 15:40:47 -0700 (PDT)

> From: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> Date: Fri, 12 Jul 2019 19:22:51 +0200
> 
>> Neigh timer can be scheduled multiple times from userspace adding
>> multiple neigh entries and forcing the neigh timer scheduling passing
>> NTF_USE in the netlink requests.
>> This will result in a refcount leak and in the following dump stack:
>  ...
>> Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
>> receiving a netlink request with NTF_USE flag set
>> 
>> Reported-by: Marek Majkowski <marek@cloudflare.com>
>> Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
>> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> 
> Applied and queued up for -stable, thanks.

Actually, reverted, you didn't test the build thoroughly as Infiniband
fails:

drivers/infiniband/core/addr.c: In function ‘dst_fetch_ha’:
drivers/infiniband/core/addr.c:337:3: error: too few arguments to function ‘neigh_event_send’
   neigh_event_send(n, NULL);
   ^~~~~~~~~~~~~~~~

^ permalink raw reply

* Re: [PATCH net] net: neigh: fix multiple neigh timer scheduling
From: David Miller @ 2019-07-12 22:40 UTC (permalink / raw)
  To: lorenzo.bianconi; +Cc: netdev, dsahern, marek
In-Reply-To: <7b254317bcb84a33cdbe8eed96e510324d6eb97c.1562951883.git.lorenzo.bianconi@redhat.com>

From: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Date: Fri, 12 Jul 2019 19:22:51 +0200

> Neigh timer can be scheduled multiple times from userspace adding
> multiple neigh entries and forcing the neigh timer scheduling passing
> NTF_USE in the netlink requests.
> This will result in a refcount leak and in the following dump stack:
 ...
> Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
> receiving a netlink request with NTF_USE flag set
> 
> Reported-by: Marek Majkowski <marek@cloudflare.com>
> Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] net: dsa: qca8k: replace legacy gpio include
From: David Miller @ 2019-07-12 22:38 UTC (permalink / raw)
  To: chunkeey; +Cc: netdev, f.fainelli, vivien.didelot, andrew, lkp
In-Reply-To: <20190712153336.5018-1-chunkeey@gmail.com>

From: Christian Lamparter <chunkeey@gmail.com>
Date: Fri, 12 Jul 2019 17:33:36 +0200

> This patch replaces the legacy bulk gpio.h include
> with the proper gpio/consumer.h variant. This was
> caught by the kbuild test robot that was running
> into an error because of this.
> 
> For more information why linux/gpio.h is bad can be found in:
> commit 56a46b6144e7 ("gpio: Clarify that <linux/gpio.h> is legacy")
> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Link: https://www.spinics.net/lists/netdev/msg584447.html
> Fixes: a653f2f538f9 ("net: dsa: qca8k: introduce reset via gpio feature")
> Signed-off-by: Christian Lamparter <chunkeey@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: hisilicon: Use devm_platform_ioremap_resource
From: David Miller @ 2019-07-12 22:37 UTC (permalink / raw)
  To: xiaojiangfeng
  Cc: yisen.zhuang, salil.mehta, netdev, linux-kernel, sergei.shtylyov,
	leeyou.li, nixiaoming
In-Reply-To: <1562937384-121710-1-git-send-email-xiaojiangfeng@huawei.com>

From: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Date: Fri, 12 Jul 2019 21:16:24 +0800

> Use devm_platform_ioremap_resource instead of
> devm_ioremap_resource. Make the code simpler.
> 
> Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH] [net-next] cxgb4: reduce kernel stack usage in cudbg_collect_mem_region()
From: David Miller @ 2019-07-12 22:36 UTC (permalink / raw)
  To: arnd
  Cc: vishal, rahul.lakkireddy, ganeshgr, alexios.zavras, arjun,
	surendra, netdev, linux-kernel, clang-built-linux
In-Reply-To: <20190712090700.317887-1-arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Fri, 12 Jul 2019 11:06:33 +0200

> The cudbg_collect_mem_region() and cudbg_read_fw_mem() both use several
> hundred kilobytes of kernel stack space. One gets inlined into the other,
> which causes the stack usage to be combined beyond the warning limit
> when building with clang:
> 
> drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c:1057:12: error: stack frame size of 1244 bytes in function 'cudbg_collect_mem_region' [-Werror,-Wframe-larger-than=]
> 
> Restructuring cudbg_collect_mem_region() lets clang do the same
> optimization that gcc does and reuse the stack slots as it can
> see that the large variables are never used together.
> 
> A better fix might be to avoid using cudbg_meminfo on the stack
> altogether, but that requires a larger rewrite.
> 
> Fixes: a1c69520f785 ("cxgb4: collect MC memory dump")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Applied.

^ permalink raw reply

* Re: [PATCH v2] tipc: ensure head->lock is initialised
From: David Miller @ 2019-07-12 22:34 UTC (permalink / raw)
  To: chris.packham
  Cc: jon.maloy, eric.dumazet, ying.xue, linux-kernel, netdev,
	tipc-discussion
In-Reply-To: <20190711224115.21499-1-chris.packham@alliedtelesis.co.nz>

From: Chris Packham <chris.packham@alliedtelesis.co.nz>
Date: Fri, 12 Jul 2019 10:41:15 +1200

> tipc_named_node_up() creates a skb list. It passes the list to
> tipc_node_xmit() which has some code paths that can call
> skb_queue_purge() which relies on the list->lock being initialised.
> 
> The spin_lock is only needed if the messages end up on the receive path
> but when the list is created in tipc_named_node_up() we don't
> necessarily know if it is going to end up there.
> 
> Once all the skb list users are updated in tipc it will then be possible
> to update them to use the unlocked variants of the skb list functions
> and initialise the lock when we know the message will follow the receive
> path.
> 
> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 1/1] tc-tests: updated skbedit tests
From: David Miller @ 2019-07-12 22:33 UTC (permalink / raw)
  To: mrv; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <1562862540-16509-1-git-send-email-mrv@mojatatu.com>

From: Roman Mashak <mrv@mojatatu.com>
Date: Thu, 11 Jul 2019 12:29:00 -0400

> - Added mask upper bound test case
> - Added mask validation test case
> - Added mask replacement case
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>

New tests I'll allow still now, applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/2] Fix bugs in NFP flower match offload
From: David Miller @ 2019-07-12 22:33 UTC (permalink / raw)
  To: john.hurley; +Cc: netdev, simon.horman, jakub.kicinski, oss-drivers
In-Reply-To: <1562783430-7031-1-git-send-email-john.hurley@netronome.com>

From: John Hurley <john.hurley@netronome.com>
Date: Wed, 10 Jul 2019 19:30:28 +0100

> This patchset contains bug fixes for corner cases in the match fields of
> flower offloads. The patches ensure that flows that should not be
> supported are not (incorrectly) offloaded. These include rules that match
> on layer 3 and/or 4 data without specified ethernet or ip protocol fields.

Series applied.

^ permalink raw reply

* Re: [PATCH net-next] net/mlx5e: Provide cb_list pointer when setting up tc block on rep
From: David Miller @ 2019-07-12 22:29 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, pablo, saeedm
In-Reply-To: <20190710182554.2988-1-vladbu@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>
Date: Wed, 10 Jul 2019 21:25:54 +0300

> Recent refactoring of tc block offloads infrastructure introduced new
> flow_block_cb_setup_simple() method intended to be used as unified way for
> all drivers to register offload callbacks. However, commit that actually
> extended all users (drivers) with block cb list and provided it to
> flow_block infra missed mlx5 en_rep. This leads to following NULL-pointer
> dereference when creating Qdisc:
 ...
> Extend en_rep with new static mlx5e_rep_block_cb_list list and pass it to
> flow_block_cb_setup_simple() function instead of hardcoded NULL pointer.
> 
> Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: phy: make exported variables non-static
From: David Miller @ 2019-07-12 22:26 UTC (permalink / raw)
  To: efremov; +Cc: andrew, f.fainelli, hkallweit1, netdev, linux-kernel
In-Reply-To: <20190710180324.8131-1-efremov@linux.com>

From: Denis Efremov <efremov@linux.com>
Date: Wed, 10 Jul 2019 21:03:24 +0300

> The variables phy_basic_ports_array, phy_fibre_port_array and
> phy_all_ports_features_array are declared static and marked
> EXPORT_SYMBOL_GPL(), which is at best an odd combination.
> Because the variables were decided to be a part of API, this commit
> removes the static attributes and adds the declarations to the header.
> 
> Fixes: 3c1bcc8614db ("net: ethernet: Convert phydev advertize and supported from u32 to link mode")
> Signed-off-by: Denis Efremov <efremov@linux.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: sched: Fix NULL-pointer dereference in tc_indr_block_ing_cmd()
From: David Miller @ 2019-07-12 22:25 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, pablo, saeedm
In-Reply-To: <20190710171229.26900-1-vladbu@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>
Date: Wed, 10 Jul 2019 20:12:29 +0300

> After recent refactoring of block offlads infrastructure, indr_dev->block
> pointer is dereferenced before it is verified to be non-NULL. Example stack
> trace where this behavior leads to NULL-pointer dereference error when
> creating vxlan dev on system with mlx5 NIC with offloads enabled:
 ...
> Introduce new function tcf_block_non_null_shared() that verifies block
> pointer before dereferencing it to obtain index. Use the function in
> tc_indr_block_ing_cmd() to prevent NULL pointer dereference.
> 
> Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Applied.

^ permalink raw reply

* [PATCH net-next 2/2] tc-testing: updated skbedit action tests with batch create/delete
From: Roman Mashak @ 2019-07-12 22:22 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1562970120-29517-1-git-send-email-mrv@mojatatu.com>

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 .../tc-testing/tc-tests/actions/skbedit.json       | 47 ++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json b/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
index 45e7e89928a5..797477c1208f 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
@@ -553,5 +553,52 @@
         "teardown": [
             "$TC actions flush action skbedit"
         ]
+    },
+    {
+        "id": "630c",
+        "name": "Add batch of 32 skbedit actions with all parameters and cookie",
+        "category": [
+            "actions",
+            "skbedit"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action skbedit",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd ptype host inheritdsfield index \\$i cookie aabbccddeeff112233445566778800a1 \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action skbedit",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "32",
+        "teardown": [
+            "$TC actions flush action skbedit"
+        ]
+    },
+    {
+        "id": "706d",
+        "name": "Delete batch of 32 skbedit actions with all parameters",
+        "category": [
+            "actions",
+            "skbedit"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action skbedit",
+                0,
+                1,
+                255
+            ],
+            "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd ptype host inheritdsfield index \\$i \\\"; args=\\\"\\$args\\$cmd\\\"; done && $TC actions add \\$args\""
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action skbedit index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action skbedit",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "0",
+        "teardown": []
     }
 ]
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next 1/2] net sched: update skbedit action for batched events operations
From: Roman Mashak @ 2019-07-12 22:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak

Add get_fill_size() routine used to calculate the action size
when building a batch of events.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 net/sched/act_skbedit.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 215a06705cef..dc3c653ec45e 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -306,6 +306,17 @@ static int tcf_skbedit_search(struct net *net, struct tc_action **a, u32 index)
 	return tcf_idr_search(tn, a, index);
 }
 
+static size_t tcf_skbedit_get_fill_size(const struct tc_action *act)
+{
+	return nla_total_size(sizeof(struct tc_skbedit))
+		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_PRIORITY */
+		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_QUEUE_MAPPING */
+		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MARK */
+		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_PTYPE */
+		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MASK */
+		+ nla_total_size_64bit(sizeof(u64)); /* TCA_SKBEDIT_FLAGS */
+}
+
 static struct tc_action_ops act_skbedit_ops = {
 	.kind		=	"skbedit",
 	.id		=	TCA_ID_SKBEDIT,
@@ -315,6 +326,7 @@ static struct tc_action_ops act_skbedit_ops = {
 	.init		=	tcf_skbedit_init,
 	.cleanup	=	tcf_skbedit_cleanup,
 	.walk		=	tcf_skbedit_walker,
+	.get_fill_size	=	tcf_skbedit_get_fill_size,
 	.lookup		=	tcf_skbedit_search,
 	.size		=	sizeof(struct tcf_skbedit),
 };
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH] [net-next] davinci_cpdma: don't cast dma_addr_t to pointer
From: David Miller @ 2019-07-12 22:19 UTC (permalink / raw)
  To: arnd
  Cc: ivan.khoronzhuk, grygorii.strashko, andrew, ilias.apalodimas,
	linux-omap, netdev, linux-kernel
In-Reply-To: <20190710080106.24237-1-arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 10 Jul 2019 10:00:33 +0200

> dma_addr_t may be 64-bit wide on 32-bit architectures, so it is not
> valid to cast between it and a pointer:
> 
> drivers/net/ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_submit_si':
> drivers/net/ethernet/ti/davinci_cpdma.c:1047:12: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
> drivers/net/ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_idle_submit_mapped':
> drivers/net/ethernet/ti/davinci_cpdma.c:1114:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
> drivers/net/ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_submit_mapped':
> drivers/net/ethernet/ti/davinci_cpdma.c:1164:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
> 
> Solve this by using two separate members in 'struct submit_info'.
> Since this avoids the use of the 'flag' member, the structure does
> not even grow in typical configurations.
> 
> Fixes: 6670acacd59e ("net: ethernet: ti: davinci_cpdma: add dma mapped submit")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
From: David Miller @ 2019-07-12 22:18 UTC (permalink / raw)
  To: ap420073; +Cc: pshelar, netdev, dev
In-Reply-To: <20190705160809.5202-1-ap420073@gmail.com>

From: Taehee Yoo <ap420073@gmail.com>
Date: Sat,  6 Jul 2019 01:08:09 +0900

> When a vport is deleted, the maximum headroom size would be changed.
> If the vport which has the largest headroom is deleted,
> the new max_headroom would be set.
> But, if the new headroom size is equal to the old headroom size,
> updating routine is unnecessary.
> 
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>

I don't think Taehee should be punished because it took several days
to get someone to look at and review and/or test this patch and
meanwhile the net-next tree closed down.

I ask for maintainer review as both a courtesy and a way to lessen
my workload.  But if that means patches rot for days in patchwork
I'm just going to apply them after my own review.

So I'm applying this now.


^ permalink raw reply

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Joel Fernandes @ 2019-07-12 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
	c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
	Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Josh Triplett, keescook, kernel-hardening, kernel-team,
	Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
	linux-pm, Mathieu Desnoyers, neilb, netdev, Paul E. McKenney,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712170024.111093-4-joel@joelfernandes.org>

On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> The rcu/sync code was doing its own check whether we are in a reader
> section. With RCU consolidating flavors and the generic helper added in
> this series, this is no longer need. We can just use the generic helper
> and it results in a nice cleanup.
> 
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Hi Oleg,
Slightly unrelated to the patch,
I tried hard to understand this comment below in percpu_down_read() but no dice.

I do understand how rcu sync and percpu rwsem works, however the comment
below didn't make much sense to me. For one, there's no readers_fast anymore
so I did not follow what readers_fast means. Could the comment be updated to
reflect latest changes?
Also could you help understand how is a writer not able to change
sem->state and count the per-cpu read counters at the same time as the
comment tries to say?

	/*
	 * We are in an RCU-sched read-side critical section, so the writer
	 * cannot both change sem->state from readers_fast and start checking
	 * counters while we are here. So if we see !sem->state, we know that
	 * the writer won't be checking until we're past the preempt_enable()
	 * and that once the synchronize_rcu() is done, the writer will see
	 * anything we did within this RCU-sched read-size critical section.
	 */

Also,
I guess we could get rid of all of the gp_ops struct stuff now that since all
the callbacks are the same now. I will post that as a follow-up patch to this
series.

thanks!

 - Joel


> ---
> Please note: Only build and boot tested this particular patch so far.
> 
>  include/linux/rcu_sync.h |  5 ++---
>  kernel/rcu/sync.c        | 22 ----------------------
>  2 files changed, 2 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> index 6fc53a1345b3..c954f1efc919 100644
> --- a/include/linux/rcu_sync.h
> +++ b/include/linux/rcu_sync.h
> @@ -39,9 +39,8 @@ extern void rcu_sync_lockdep_assert(struct rcu_sync *);
>   */
>  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
>  {
> -#ifdef CONFIG_PROVE_RCU
> -	rcu_sync_lockdep_assert(rsp);
> -#endif
> +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> +			 "suspicious rcu_sync_is_idle() usage");
>  	return !rsp->gp_state; /* GP_IDLE */
>  }
>  
> diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> index a8304d90573f..535e02601f56 100644
> --- a/kernel/rcu/sync.c
> +++ b/kernel/rcu/sync.c
> @@ -10,37 +10,25 @@
>  #include <linux/rcu_sync.h>
>  #include <linux/sched.h>
>  
> -#ifdef CONFIG_PROVE_RCU
> -#define __INIT_HELD(func)	.held = func,
> -#else
> -#define __INIT_HELD(func)
> -#endif
> -
>  static const struct {
>  	void (*sync)(void);
>  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
>  	void (*wait)(void);
> -#ifdef CONFIG_PROVE_RCU
> -	int  (*held)(void);
> -#endif
>  } gp_ops[] = {
>  	[RCU_SYNC] = {
>  		.sync = synchronize_rcu,
>  		.call = call_rcu,
>  		.wait = rcu_barrier,
> -		__INIT_HELD(rcu_read_lock_held)
>  	},
>  	[RCU_SCHED_SYNC] = {
>  		.sync = synchronize_rcu,
>  		.call = call_rcu,
>  		.wait = rcu_barrier,
> -		__INIT_HELD(rcu_read_lock_sched_held)
>  	},
>  	[RCU_BH_SYNC] = {
>  		.sync = synchronize_rcu,
>  		.call = call_rcu,
>  		.wait = rcu_barrier,
> -		__INIT_HELD(rcu_read_lock_bh_held)
>  	},
>  };
>  
> @@ -49,16 +37,6 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
>  
>  #define	rss_lock	gp_wait.lock
>  
> -#ifdef CONFIG_PROVE_RCU
> -void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
> -{
> -	RCU_LOCKDEP_WARN(!gp_ops[rsp->gp_type].held(),
> -			 "suspicious rcu_sync_is_idle() usage");
> -}
> -
> -EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);
> -#endif
> -
>  /**
>   * rcu_sync_init() - Initialize an rcu_sync structure
>   * @rsp: Pointer to rcu_sync structure to be initialized
> -- 
> 2.22.0.510.g264f2c817a-goog
> 

^ permalink raw reply

* [Patch net] fib: relax source validation check for loopback packets
From: Cong Wang @ 2019-07-12 20:17 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Julian Anastasov, David Ahern

In a rare case where we redirect local packets from veth to lo,
these packets fail to pass the source validation when rp_filter
is turned on, as the tracing shows:

  <...>-311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0
  <...>-311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130

So, the fib table lookup returns eth0 as the nexthop even though
the packets are local and should be routed to loopback nonetheless,
but they can't pass the dev match check in fib_info_nh_uses_dev().

It should be safe to relax this check for this special case, as
normally packets coming out of loopback device still have skb_dst
so they won't even hit this slow path.

Cc: Julian Anastasov <ja@ssi.bg>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/ipv4/fib_frontend.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 317339cd7f03..8662a44a28f9 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -388,6 +388,12 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	fib_combine_itag(itag, &res);
 
 	dev_match = fib_info_nh_uses_dev(res.fi, dev);
+	/* This is rare, loopback packets retain skb_dst so normally they
+	 * would not even hit this slow path.
+	 */
+	dev_match = dev_match || (res.type == RTN_LOCAL &&
+				  dev == net->loopback_dev &&
+				  IN_DEV_ACCEPT_LOCAL(idev));
 	if (dev_match) {
 		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
 		return ret;
-- 
2.21.0


^ permalink raw reply related

* [Patch net] net_sched: unset TCQ_F_CAN_BYPASS when adding filters
From: Cong Wang @ 2019-07-12 20:17 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Eric Dumazet

For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
notably fq_codel, it makes no sense to let packets bypass the TC
filters we setup in any scenario, otherwise our packets steering
policy could not be enforced.

This can be easily reproduced with the following script:

 ip li add dev dummy0 type dummy
 ifconfig dummy0 up
 tc qd add dev dummy0 root fq_codel
 tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
 tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
 ping -I dummy0 192.168.112.1

Without this patch, packets are sent directly to dummy0 without
hitting any of the filters. With this patch, packets are redirected
to loopback as expected.

This fix is not perfect, it only unsets the flag but does not set it back
because we have to save the information somewhere in the qdisc if we
really want that.

Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_api.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 638c1bc1ea1b..5c800b0c810b 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2152,6 +2152,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false, rtnl_held);
 		tfilter_put(tp, fh);
+		q->flags &= ~TCQ_F_CAN_BYPASS;
 	}
 
 errout:
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH bpf] selftests/bpf: put test_stub.o into $(OUTPUT)
From: Andrii Nakryiko @ 2019-07-12 20:15 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, Networking, gor, heiko.carstens
In-Reply-To: <20190712135950.91600-1-iii@linux.ibm.com>

On Fri, Jul 12, 2019 at 7:00 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> Add a rule to put test_stub.o in $(OUTPUT) and change the references to
> it accordingly. This prevents test_stub.o from being created in the
> source directory.
>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---

Makes sense.

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  tools/testing/selftests/bpf/Makefile | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 277d8605e340..66b6f7fb683c 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -83,13 +83,16 @@ all: $(TEST_CUSTOM_PROGS)
>  $(OUTPUT)/urandom_read: $(OUTPUT)/%: %.c
>         $(CC) -o $@ $< -Wl,--build-id
>
> +$(OUTPUT)/test_stub.o: test_stub.c
> +       $(CC) $(TEST_PROGS_CFLAGS) $(CFLAGS) -c -o $@ $<
> +
>  $(OUTPUT)/test_maps: map_tests/*.c
>
>  BPFOBJ := $(OUTPUT)/libbpf.a
>
> -$(TEST_GEN_PROGS): test_stub.o $(BPFOBJ)
> +$(TEST_GEN_PROGS): $(OUTPUT)/test_stub.o $(BPFOBJ)
>
> -$(TEST_GEN_PROGS_EXTENDED): test_stub.o $(OUTPUT)/libbpf.a
> +$(TEST_GEN_PROGS_EXTENDED): $(OUTPUT)/test_stub.o $(OUTPUT)/libbpf.a
>
>  $(OUTPUT)/test_dev_cgroup: cgroup_helpers.c
>  $(OUTPUT)/test_skb_cgroup_id_user: cgroup_helpers.c
> --
> 2.21.0
>

^ permalink raw reply

* Re: [PATCH][bpf-next] bpf: verifier: avoid fall-through warnings
From: Gustavo A. R. Silva @ 2019-07-12 19:44 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, Lawrence Brakmo
  Cc: netdev, bpf, linux-kernel, Kees Cook
In-Reply-To: <d3f4a4d3-e763-f146-8383-d5ef48d9d382@iogearbox.net>



On 7/12/19 8:41 AM, Daniel Borkmann wrote:

>>
>> This patch is part of the ongoing efforts to enable
>> -Wimplicit-fallthrough.
>>
>> Acked-by: Andrii Nakryiko <andriin@fb.com>
>> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> 
> Looks good, applied to bpf, thanks.
> 

Awesome. :)

Thanks, Daniel.
--
Gustavo

^ permalink raw reply

* Re: [PATCH bpf] selftests/bpf: fix test_send_signal_nmi on s390
From: Andrii Nakryiko @ 2019-07-12 19:59 UTC (permalink / raw)
  To: Y Song; +Cc: Ilya Leoshkevich, bpf, Networking, gor, heiko.carstens
In-Reply-To: <CAH3MdRWEfrQt6P4eMYgGRE9OgLkjQLqoZnCwFbrxwqKPyrrHpQ@mail.gmail.com>

On Fri, Jul 12, 2019 at 12:55 PM Y Song <ys114321@gmail.com> wrote:
>
> On Fri, Jul 12, 2019 at 11:24 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Jul 12, 2019 at 10:46 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> > >
> > > Many s390 setups (most notably, KVM guests) do not have access to
> > > hardware performance events.
> > >
> > > Therefore, use the software event instead.
> > >
> > > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > > Acked-by: Vasily Gorbik <gor@linux.ibm.com>
> > > ---
> > >  tools/testing/selftests/bpf/prog_tests/send_signal.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> > > index 67cea1686305..4a45ea0b8448 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> > > @@ -176,10 +176,19 @@ static int test_send_signal_tracepoint(void)
> > >  static int test_send_signal_nmi(void)
> > >  {
> > >         struct perf_event_attr attr = {
> > > +#if defined(__s390__)
> > > +               /* Many s390 setups (most notably, KVM guests) do not have
> > > +                * access to hardware performance events.
> > > +                */
> > > +               .sample_period = 1,
> > > +               .type = PERF_TYPE_SOFTWARE,
> > > +               .config = PERF_COUNT_SW_CPU_CLOCK,
> > > +#else
> >
> > Is there any harm in switching all archs to software event? I'd rather
> > avoid all those special arch cases, which will be really hard to test
> > for people without direct access to them.
>
> I still like to do hardware cpu_cycles in order to test nmi.
> In a physical box.
> $ perf list
> List of pre-defined events (to be used in -e):
>
>   branch-instructions OR branches                    [Hardware event]
>   branch-misses                                      [Hardware event]
>   bus-cycles                                         [Hardware event]
>   cache-misses                                       [Hardware event]
>   cache-references                                   [Hardware event]
>   cpu-cycles OR cycles                               [Hardware event]
>   instructions                                       [Hardware event]
>   ref-cycles                                         [Hardware event]
>
>   alignment-faults                                   [Software event]
>   bpf-output                                         [Software event]
>   context-switches OR cs                             [Software event]
>   cpu-clock                                          [Software event]
>   cpu-migrations OR migrations                       [Software event]
>   dummy                                              [Software event]
>   emulation-faults                                   [Software event]
>   major-faults                                       [Software event]
>   minor-faults                                       [Software event]
>   page-faults OR faults                              [Software event]
>   task-clock                                         [Software event]
>
>   L1-dcache-load-misses                              [Hardware cache event]
> ...
>
> In a VM
> $ perf list
> List of pre-defined events (to be used in -e):
>
>   alignment-faults                                   [Software event]
>   bpf-output                                         [Software event]
>   context-switches OR cs                             [Software event]
>   cpu-clock                                          [Software event]
>   cpu-migrations OR migrations                       [Software event]
>   dummy                                              [Software event]
>   emulation-faults                                   [Software event]
>   major-faults                                       [Software event]
>   minor-faults                                       [Software event]
>   page-faults OR faults                              [Software event]
>   task-clock                                         [Software event]
>
>   msr/smi/                                           [Kernel PMU
> event]
>   msr/tsc/                                           [Kernel PMU event]
> .....
>
> Is it possible that we detect at runtime whether the hardware
> cpu_cycles available or not?
> If available, let us do hardware one. Otherwise, skip or do the
> software one? The software one does not really do nmi so it will take
> the same code path in kernel as tracepoint.

Yeah, that's what I was worried about.

Ilya, could you please take a look how hard would it be to do this HW
vs SW perf event support?

>
> >
> > >                 .sample_freq = 50,
> > >                 .freq = 1,
> > >                 .type = PERF_TYPE_HARDWARE,
> > >                 .config = PERF_COUNT_HW_CPU_CYCLES,
> > > +#endif
> > >         };
> > >
> > >         return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT, "perf_event");
> > > --
> > > 2.21.0
> > >

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox