Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
From: David Miller @ 2019-07-14  2:38 UTC (permalink / raw)
  To: idosch
  Cc: nhorman, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch
In-Reply-To: <20190711123909.GA10978@splinter>

From: Ido Schimmel <idosch@idosch.org>
Date: Thu, 11 Jul 2019 15:39:09 +0300

> Before I start working on v2, I would like to get your feedback on the
> high level plan. Also adding Neil who is the maintainer of drop_monitor
> (and counterpart DropWatch tool [1]).
> 
> IIUC, the problem you point out is that users need to use different
> tools to monitor packet drops based on where these drops occur
> (SW/HW/XDP).
> 
> Therefore, my plan is to extend the existing drop_monitor netlink
> channel to also cover HW drops. I will add a new message type and a new
> multicast group for HW drops and encode in the message what is currently
> encoded in the devlink events.

Consolidating the reporting to merge with the drop monitor facilities
is definitely a step in the right direction.

^ permalink raw reply

* Re: [net-next] bonding: add documentation for peer_notif_delay
From: David Miller @ 2019-07-14  2:29 UTC (permalink / raw)
  To: vincent; +Cc: j.vosburgh, vfalico, andy, netdev
In-Reply-To: <20190713143527.27562-1-vincent@bernat.ch>

From: Vincent Bernat <vincent@bernat.ch>
Date: Sat, 13 Jul 2019 16:35:27 +0200

> Ability to tweak the interval between peer notifications has been
> added in 07a4ddec3ce9 ("bonding: add an option to specify a delay
> between peer notifications") but the documentation was not updated.
> 
> Signed-off-by: Vincent Bernat <vincent@bernat.ch>

Applied.

^ permalink raw reply

* Re: [PATCH net] r8169: fix issue with confused RX unit after PHY power-down on RTL8411b
From: David Miller @ 2019-07-14  2:28 UTC (permalink / raw)
  To: hkallweit1; +Cc: nic_swsd, netdev, ionut.radu
In-Reply-To: <dfc533d0-a90e-37fe-2338-483abc9c1177@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 13 Jul 2019 13:45:47 +0200

> On RTL8411b the RX unit gets confused if the PHY is powered-down.
> This was reported in [0] and confirmed by Realtek. Realtek provided
> a sequence to fix the RX unit after PHY wakeup.
> 
> The issue itself seems to have been there longer, the Fixes tag
> refers to where the fix applies properly.
> 
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075
> 
> Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support")
> Tested-by: Ionut Radu <ionut.radu@gmail.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH 2/2] net-next: ag71xx: Rearrange ag711xx struct to remove holes
From: David Miller @ 2019-07-14  2:24 UTC (permalink / raw)
  To: rosenp; +Cc: netdev
In-Reply-To: <20190713020921.18202-2-rosenp@gmail.com>

From: Rosen Penev <rosenp@gmail.com>
Date: Fri, 12 Jul 2019 19:09:21 -0700

> Removed ____cacheline_aligned attribute to ring structs. This actually
> causes holes in the ag71xx struc as well as lower performance.
> 
> Rearranged struct members to fall within respective cachelines. The RX
> ring struct now does not share a cacheline with the TX ring. The NAPI
> atruct now takes up its own cachelines and does not share.
> 
> According to pahole -C ag71xx -c 32
 ...

net-next is closed, therefore it is not appropriate to submit this change
at the current time.

^ permalink raw reply

* Re: [PATCH 1/2] net-next: ag71xx: Add missing header
From: David Miller @ 2019-07-14  2:24 UTC (permalink / raw)
  To: rosenp; +Cc: netdev
In-Reply-To: <20190713020921.18202-1-rosenp@gmail.com>

From: Rosen Penev <rosenp@gmail.com>
Date: Fri, 12 Jul 2019 19:09:20 -0700

> ag71xx uses devm_ioremap_nocache. This fixes usage of an implicit function
> 
> Signed-off-by: Rosen Penev <rosenp@gmail.com>

This is a bug fix and should target 'net'.

^ permalink raw reply

* Re: [PATCH net-next 1/2] net sched: update skbedit action for batched events operations
From: David Miller @ 2019-07-14  2:23 UTC (permalink / raw)
  To: mrv; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <1562970120-29517-1-git-send-email-mrv@mojatatu.com>

From: Roman Mashak <mrv@mojatatu.com>
Date: Fri, 12 Jul 2019 18:21:59 -0400

> Add get_fill_size() routine used to calculate the action size
> when building a batch of events.
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>

Please add an appropriate Fixes: tag, and also when you respin this provide the
required "[PATCH 0/N] " header posting explaining what the series is doing at
a high level, how it is doing it, and why it is doing it that way.

Thank you.

^ permalink raw reply

* Re: [Patch net] net_sched: unset TCQ_F_CAN_BYPASS when adding filters
From: Cong Wang @ 2019-07-14  2:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <8733195c-ac70-4a2a-db2f-b9bdfd05a703@gmail.com>

On Sat, Jul 13, 2019 at 5:54 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 7/12/19 10:17 PM, Cong Wang wrote:
> > For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
> > notably fq_codel, it makes no sense to let packets bypass the TC
> > filters we setup in any scenario, otherwise our packets steering
> > policy could not be enforced.
> >
> > This can be easily reproduced with the following script:
> >
> >  ip li add dev dummy0 type dummy
> >  ifconfig dummy0 up
> >  tc qd add dev dummy0 root fq_codel
> >  tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
> >  tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
> >  ping -I dummy0 192.168.112.1
> >
> > Without this patch, packets are sent directly to dummy0 without
> > hitting any of the filters. With this patch, packets are redirected
> > to loopback as expected.
> >
> > This fix is not perfect, it only unsets the flag but does not set it back
> > because we have to save the information somewhere in the qdisc if we
> > really want that.
> >
> > Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
> > Cc: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> > ---
> >  net/sched/cls_api.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> > index 638c1bc1ea1b..5c800b0c810b 100644
> > --- a/net/sched/cls_api.c
> > +++ b/net/sched/cls_api.c
> > @@ -2152,6 +2152,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
> >               tfilter_notify(net, skb, n, tp, block, q, parent, fh,
> >                              RTM_NEWTFILTER, false, rtnl_held);
> >               tfilter_put(tp, fh);
> > +             q->flags &= ~TCQ_F_CAN_BYPASS;
> >       }
> >
> >  errout:
> >
>
> Strange, because sfq and fq_codel are roughly the same for TCQ_F_CAN_BYPASS handling.
>
> Why is fq_codel_bind() not effective ?

Because I don't have class id set in the filter.

>
> If not effective, sfq had the same issue, so the Fixes: tag needs to be refined,
> maybe to commit 23624935e0c4 net_sched: TCQ_F_CAN_BYPASS generalization
>

Yeah, I think it is probably a better commit here.

Thanks.

^ permalink raw reply

* Re: [Patch net] fib: relax source validation check for loopback packets
From: David Ahern @ 2019-07-14  0:29 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: Julian Anastasov
In-Reply-To: <8355af23-100f-a3bb-0759-fca8b0aa583b@gmail.com>

On 7/13/19 4:42 PM, David Ahern wrote:
> On 7/12/19 2:17 PM, Cong Wang wrote:
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index 317339cd7f03..8662a44a28f9 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -388,6 +388,12 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>>  	fib_combine_itag(itag, &res);
>>  
>>  	dev_match = fib_info_nh_uses_dev(res.fi, dev);
>> +	/* This is rare, loopback packets retain skb_dst so normally they
>> +	 * would not even hit this slow path.
>> +	 */
>> +	dev_match = dev_match || (res.type == RTN_LOCAL &&
>> +				  dev == net->loopback_dev &&
> 
> The dev should not be needed. res.type == RTN_LOCAL should be enough, no?
> 

nevermind, I see why you have the dev check.

^ permalink raw reply

* Re: [GIT] Networking
From: pr-tracker-bot @ 2019-07-13 23:15 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20190712.231704.904072376124323665.davem@davemloft.net>

The pull request you sent on Fri, 12 Jul 2019 23:17:04 -0700 (PDT):

> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git refs/heads/master

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/d12109291ccbef7e879cc0d0733f31685cd80854

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply

* Re: [Patch net] fib: relax source validation check for loopback packets
From: David Ahern @ 2019-07-13 22:42 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: Julian Anastasov
In-Reply-To: <20190712201749.28421-2-xiyou.wangcong@gmail.com>

On 7/12/19 2:17 PM, Cong Wang wrote:
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 317339cd7f03..8662a44a28f9 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -388,6 +388,12 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	fib_combine_itag(itag, &res);
>  
>  	dev_match = fib_info_nh_uses_dev(res.fi, dev);
> +	/* This is rare, loopback packets retain skb_dst so normally they
> +	 * would not even hit this slow path.
> +	 */
> +	dev_match = dev_match || (res.type == RTN_LOCAL &&
> +				  dev == net->loopback_dev &&

The dev should not be needed. res.type == RTN_LOCAL should be enough, no?

> +				  IN_DEV_ACCEPT_LOCAL(idev));

Why is this check needed? Can you give an example use that is fixed -
and add one to selftests/net/fib_tests.sh?


>  	if (dev_match) {
>  		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
>  		return ret;
> 


^ permalink raw reply

* Re: [PATCH] Fix dumping vlan rules
From: Florian Westphal @ 2019-07-13 21:43 UTC (permalink / raw)
  To: michael-dev; +Cc: netdev, netfilter-devel
In-Reply-To: <20190713210306.30815-1-michael-dev@fami-braun.de>

michael-dev@fami-braun.de <michael-dev@fami-braun.de> wrote:
> From: "M. Braun" <michael-dev@fami-braun.de>
> 
> Given the following bridge rules:
> 1. ip protocol icmp accept
> 2. ether type vlan vlan type ip ip protocol icmp accept
> 
> The are currently both dumped by "nft list ruleset" as
> 1. ip protocol icmp accept
> 2. ip protocol icmp accept

Yes, thats a bug, the dependency removal is incorrect.

> +++ b/src/payload.c
> @@ -506,6 +506,18 @@ static bool payload_may_dependency_kill(struct payload_dep_ctx *ctx,
>  		     dep->left->payload.desc == &proto_ip6) &&
>  		    expr->payload.base == PROTO_BASE_TRANSPORT_HDR)
>  			return false;
> +		/* Do not kill
> +		 *  ether type vlan and vlan type ip and ip protocol icmp
> +		 * into
> +		 *  ip protocol icmp
> +		 * as this lacks ether type vlan.
> +		 * More generally speaking, do not kill protocol type
> +		 * for stacked protocols if we only have protcol type matches.
> +		 */
> +		if (dep->left->etype == EXPR_PAYLOAD && dep->op == OP_EQ &&
> +		    expr->flags & EXPR_F_PROTOCOL &&
> +		    expr->payload.base == dep->left->payload.base)
> +			return false;

Can you please add a test case for this problem to
tests/py/bridge/vlan.t so we catch this when messing with dependency
handling in the future?

Also, please submit v2 directly to netfilter-devel@.

Thanks!

^ permalink raw reply

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-13 21:28 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713161316.GA39321@google.com>

On Sat, Jul 13, 2019 at 12:13:16PM -0400, Joel Fernandes wrote:
> On Sat, Jul 13, 2019 at 08:50:10AM -0700, Paul E. McKenney wrote:
> > On Sat, Jul 13, 2019 at 11:36:06AM -0400, Joel Fernandes wrote:
> > > On Sat, Jul 13, 2019 at 07:41:08AM -0700, Paul E. McKenney wrote:
> > > > On Sat, Jul 13, 2019 at 09:30:49AM -0400, Joel Fernandes wrote:
> > > > > On Sat, Jul 13, 2019 at 01:21:14AM -0700, Paul E. McKenney wrote:
> > > > > > On Fri, Jul 12, 2019 at 11:10:08PM -0400, Joel Fernandes wrote:
> > > > > > > On Fri, Jul 12, 2019 at 11:01:50PM -0400, Joel Fernandes wrote:
> > > > > > > > On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> > > > > > > > > On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > > > > > > > > > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > > > > > > and it results in a nice cleanup.
> > > > > > > > > > > 
> > > > > > > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > > > 
> > > > > > > > > > Hi Oleg,
> > > > > > > > > > Slightly unrelated to the patch,
> > > > > > > > > > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > > > > > > > > > 
> > > > > > > > > > I do understand how rcu sync and percpu rwsem works, however the comment
> > > > > > > > > > below didn't make much sense to me. For one, there's no readers_fast anymore
> > > > > > > > > > so I did not follow what readers_fast means. Could the comment be updated to
> > > > > > > > > > reflect latest changes?
> > > > > > > > > > Also could you help understand how is a writer not able to change
> > > > > > > > > > sem->state and count the per-cpu read counters at the same time as the
> > > > > > > > > > comment tries to say?
> > > > > > > > > > 
> > > > > > > > > > 	/*
> > > > > > > > > > 	 * We are in an RCU-sched read-side critical section, so the writer
> > > > > > > > > > 	 * cannot both change sem->state from readers_fast and start checking
> > > > > > > > > > 	 * counters while we are here. So if we see !sem->state, we know that
> > > > > > > > > > 	 * the writer won't be checking until we're past the preempt_enable()
> > > > > > > > > > 	 * and that once the synchronize_rcu() is done, the writer will see
> > > > > > > > > > 	 * anything we did within this RCU-sched read-size critical section.
> > > > > > > > > > 	 */
> > > > > > > > > > 
> > > > > > > > > > Also,
> > > > > > > > > > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > > > > > > > > > the callbacks are the same now. I will post that as a follow-up patch to this
> > > > > > > > > > series.
> > > > > > > > > 
> > > > > > > > > Hello, Joel,
> > > > > > > > > 
> > > > > > > > > Oleg has a set of patches updating this code that just hit mainline
> > > > > > > > > this week.  These patches get rid of the code that previously handled
> > > > > > > > > RCU's multiple flavors.  Or are you looking at current mainline and
> > > > > > > > > me just missing your point?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Hi Paul,
> > > > > > > > You are right on point. I have a bad habit of not rebasing my trees. In this
> > > > > > > > case the feature branch of mine in concern was based on v5.1. Needless to
> > > > > > > > say, I need to rebase my tree.
> > > > > > > > 
> > > > > > > > Yes, this sync clean up patch does conflict when I rebase, but other patches
> > > > > > > > rebase just fine.
> > > > > > > > 
> > > > > > > > The 2 options I see are:
> > > > > > > > 1. Let us drop this patch for now and I resend it later.
> > > > > > > > 2. I resend all patches based on Linus's master branch.
> > > > > > > 
> > > > > > > Below is the updated patch based on Linus master branch:
> > > > > > > 
> > > > > > > ---8<-----------------------
> > > > > > > 
> > > > > > > >From 5f40c9a07fcf3d6dafc2189599d0ba9443097d0f Mon Sep 17 00:00:00 2001
> > > > > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > > > > > Date: Fri, 12 Jul 2019 12:13:27 -0400
> > > > > > > Subject: [PATCH v2.1 3/9] rcu/sync: Remove custom check for reader-section
> > > > > > > 
> > > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > > and it results in a nice cleanup.
> > > > > > > 
> > > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > ---
> > > > > > >  include/linux/rcu_sync.h | 4 +---
> > > > > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > > > > > > index 9b83865d24f9..0027d4c8087c 100644
> > > > > > > --- a/include/linux/rcu_sync.h
> > > > > > > +++ b/include/linux/rcu_sync.h
> > > > > > > @@ -31,9 +31,7 @@ struct rcu_sync {
> > > > > > >   */
> > > > > > >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> > > > > > >  {
> > > > > > > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > > > > > > -			 !rcu_read_lock_bh_held() &&
> > > > > > > -			 !rcu_read_lock_sched_held(),
> > > > > > > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > > > > > 
> > > > > > I believe that replacing rcu_read_lock_sched_held() with preemptible()
> > > > > > in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
> > > > > > If you have not already done so, could you please give it a try?
> > > > > 
> > > > > Hi Paul,
> > > > > I don't think it will cause splats for !CONFIG_PREEMPT.
> > > > > 
> > > > > Currently, rcu_read_lock_any_held() introduced in this patch returns true if
> > > > > !preemptible(). This means that:
> > > > > 
> > > > > The following expression above:
> > > > > RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),...)
> > > > > 
> > > > > Becomes:
> > > > > RCU_LOCKDEP_WARN(preemptible(), ...)
> > > > > 
> > > > > For, CONFIG_PREEMPT=n kernels, this means:
> > > > > RCU_LOCKDEP_WARN(0, ...)
> > > > > 
> > > > > Which would mean no splats. Or, did I miss the point?
> > > > 
> > > > I suggest trying it out on a CONFIG_PREEMPT=n kernel.
> > > 
> > > Sure, will do, sorry did not try it out yet because was busy with weekend
> > > chores but will do soon, thanks!
> > 
> > I am not faulting you for taking the weekend off, actually.  ;-)
> 
> ;-) 
> 
> I tried doing RCU_LOCKDEP_WARN(preemptible(), ...) in this code path and I
> don't get any splats. I also disassembled the code and it seems to me
> RCU_LOCKDEP_WARN() becomes a NOOP which also the above reasoning confirms.

OK, very good.  Could you do the same thing for the RCU_LOCKDEP_WARN()
in synchronize_rcu()?  Why or why not?

(No need to work this on your Sunday.)

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH] Fix dumping vlan rules
From: Laura Garcia @ 2019-07-13 21:15 UTC (permalink / raw)
  To: michael-dev; +Cc: netdev, Netfilter Development Mailing list
In-Reply-To: <20190713210306.30815-1-michael-dev@fami-braun.de>

CC'ing netfilter.

On Sat, Jul 13, 2019 at 11:10 PM <michael-dev@fami-braun.de> wrote:
>
> From: "M. Braun" <michael-dev@fami-braun.de>
>
> Given the following bridge rules:
> 1. ip protocol icmp accept
> 2. ether type vlan vlan type ip ip protocol icmp accept
>
> The are currently both dumped by "nft list ruleset" as
> 1. ip protocol icmp accept
> 2. ip protocol icmp accept
>
> Though, the netlink code actually is different
>
> bridge filter FORWARD 4
>   [ payload load 2b @ link header + 12 => reg 1 ]
>   [ cmp eq reg 1 0x00000008 ]
>   [ payload load 1b @ network header + 9 => reg 1 ]
>   [ cmp eq reg 1 0x00000001 ]
>   [ immediate reg 0 accept ]
>
> bridge filter FORWARD 5 4
>   [ payload load 2b @ link header + 12 => reg 1 ]
>   [ cmp eq reg 1 0x00000081 ]
>   [ payload load 2b @ link header + 16 => reg 1 ]
>   [ cmp eq reg 1 0x00000008 ]
>   [ payload load 1b @ network header + 9 => reg 1 ]
>   [ cmp eq reg 1 0x00000001 ]
>   [ immediate reg 0 accept ]
>
> Fix this by avoiding the removal of all vlan statements
> in the given example.
>
> Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
> ---
>  src/payload.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/src/payload.c b/src/payload.c
> index 3bf1ecc..905422a 100644
> --- a/src/payload.c
> +++ b/src/payload.c
> @@ -506,6 +506,18 @@ static bool payload_may_dependency_kill(struct payload_dep_ctx *ctx,
>                      dep->left->payload.desc == &proto_ip6) &&
>                     expr->payload.base == PROTO_BASE_TRANSPORT_HDR)
>                         return false;
> +               /* Do not kill
> +                *  ether type vlan and vlan type ip and ip protocol icmp
> +                * into
> +                *  ip protocol icmp
> +                * as this lacks ether type vlan.
> +                * More generally speaking, do not kill protocol type
> +                * for stacked protocols if we only have protcol type matches.
> +                */
> +               if (dep->left->etype == EXPR_PAYLOAD && dep->op == OP_EQ &&
> +                   expr->flags & EXPR_F_PROTOCOL &&
> +                   expr->payload.base == dep->left->payload.base)
> +                       return false;
>                 break;
>         }
>
> --
> 2.20.1
>

^ permalink raw reply

* [PATCH] Fix dumping vlan rules
From: michael-dev @ 2019-07-13 21:03 UTC (permalink / raw)
  To: netdev; +Cc: michael-dev

From: "M. Braun" <michael-dev@fami-braun.de>

Given the following bridge rules:
1. ip protocol icmp accept
2. ether type vlan vlan type ip ip protocol icmp accept

The are currently both dumped by "nft list ruleset" as
1. ip protocol icmp accept
2. ip protocol icmp accept

Though, the netlink code actually is different

bridge filter FORWARD 4
  [ payload load 2b @ link header + 12 => reg 1 ]
  [ cmp eq reg 1 0x00000008 ]
  [ payload load 1b @ network header + 9 => reg 1 ]
  [ cmp eq reg 1 0x00000001 ]
  [ immediate reg 0 accept ]

bridge filter FORWARD 5 4
  [ payload load 2b @ link header + 12 => reg 1 ]
  [ cmp eq reg 1 0x00000081 ]
  [ payload load 2b @ link header + 16 => reg 1 ]
  [ cmp eq reg 1 0x00000008 ]
  [ payload load 1b @ network header + 9 => reg 1 ]
  [ cmp eq reg 1 0x00000001 ]
  [ immediate reg 0 accept ]

Fix this by avoiding the removal of all vlan statements
in the given example.

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
---
 src/payload.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/payload.c b/src/payload.c
index 3bf1ecc..905422a 100644
--- a/src/payload.c
+++ b/src/payload.c
@@ -506,6 +506,18 @@ static bool payload_may_dependency_kill(struct payload_dep_ctx *ctx,
 		     dep->left->payload.desc == &proto_ip6) &&
 		    expr->payload.base == PROTO_BASE_TRANSPORT_HDR)
 			return false;
+		/* Do not kill
+		 *  ether type vlan and vlan type ip and ip protocol icmp
+		 * into
+		 *  ip protocol icmp
+		 * as this lacks ether type vlan.
+		 * More generally speaking, do not kill protocol type
+		 * for stacked protocols if we only have protcol type matches.
+		 */
+		if (dep->left->etype == EXPR_PAYLOAD && dep->op == OP_EQ &&
+		    expr->flags & EXPR_F_PROTOCOL &&
+		    expr->payload.base == dep->left->payload.base)
+			return false;
 		break;
 	}
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Joel Fernandes @ 2019-07-13 16:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713155010.GF26519@linux.ibm.com>

On Sat, Jul 13, 2019 at 08:50:10AM -0700, Paul E. McKenney wrote:
> On Sat, Jul 13, 2019 at 11:36:06AM -0400, Joel Fernandes wrote:
> > On Sat, Jul 13, 2019 at 07:41:08AM -0700, Paul E. McKenney wrote:
> > > On Sat, Jul 13, 2019 at 09:30:49AM -0400, Joel Fernandes wrote:
> > > > On Sat, Jul 13, 2019 at 01:21:14AM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Jul 12, 2019 at 11:10:08PM -0400, Joel Fernandes wrote:
> > > > > > On Fri, Jul 12, 2019 at 11:01:50PM -0400, Joel Fernandes wrote:
> > > > > > > On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> > > > > > > > On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > > > > > > > > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > > > > > and it results in a nice cleanup.
> > > > > > > > > > 
> > > > > > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > > 
> > > > > > > > > Hi Oleg,
> > > > > > > > > Slightly unrelated to the patch,
> > > > > > > > > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > > > > > > > > 
> > > > > > > > > I do understand how rcu sync and percpu rwsem works, however the comment
> > > > > > > > > below didn't make much sense to me. For one, there's no readers_fast anymore
> > > > > > > > > so I did not follow what readers_fast means. Could the comment be updated to
> > > > > > > > > reflect latest changes?
> > > > > > > > > Also could you help understand how is a writer not able to change
> > > > > > > > > sem->state and count the per-cpu read counters at the same time as the
> > > > > > > > > comment tries to say?
> > > > > > > > > 
> > > > > > > > > 	/*
> > > > > > > > > 	 * We are in an RCU-sched read-side critical section, so the writer
> > > > > > > > > 	 * cannot both change sem->state from readers_fast and start checking
> > > > > > > > > 	 * counters while we are here. So if we see !sem->state, we know that
> > > > > > > > > 	 * the writer won't be checking until we're past the preempt_enable()
> > > > > > > > > 	 * and that once the synchronize_rcu() is done, the writer will see
> > > > > > > > > 	 * anything we did within this RCU-sched read-size critical section.
> > > > > > > > > 	 */
> > > > > > > > > 
> > > > > > > > > Also,
> > > > > > > > > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > > > > > > > > the callbacks are the same now. I will post that as a follow-up patch to this
> > > > > > > > > series.
> > > > > > > > 
> > > > > > > > Hello, Joel,
> > > > > > > > 
> > > > > > > > Oleg has a set of patches updating this code that just hit mainline
> > > > > > > > this week.  These patches get rid of the code that previously handled
> > > > > > > > RCU's multiple flavors.  Or are you looking at current mainline and
> > > > > > > > me just missing your point?
> > > > > > > > 
> > > > > > > 
> > > > > > > Hi Paul,
> > > > > > > You are right on point. I have a bad habit of not rebasing my trees. In this
> > > > > > > case the feature branch of mine in concern was based on v5.1. Needless to
> > > > > > > say, I need to rebase my tree.
> > > > > > > 
> > > > > > > Yes, this sync clean up patch does conflict when I rebase, but other patches
> > > > > > > rebase just fine.
> > > > > > > 
> > > > > > > The 2 options I see are:
> > > > > > > 1. Let us drop this patch for now and I resend it later.
> > > > > > > 2. I resend all patches based on Linus's master branch.
> > > > > > 
> > > > > > Below is the updated patch based on Linus master branch:
> > > > > > 
> > > > > > ---8<-----------------------
> > > > > > 
> > > > > > >From 5f40c9a07fcf3d6dafc2189599d0ba9443097d0f Mon Sep 17 00:00:00 2001
> > > > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > > > > Date: Fri, 12 Jul 2019 12:13:27 -0400
> > > > > > Subject: [PATCH v2.1 3/9] rcu/sync: Remove custom check for reader-section
> > > > > > 
> > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > and it results in a nice cleanup.
> > > > > > 
> > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > ---
> > > > > >  include/linux/rcu_sync.h | 4 +---
> > > > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > > > > > index 9b83865d24f9..0027d4c8087c 100644
> > > > > > --- a/include/linux/rcu_sync.h
> > > > > > +++ b/include/linux/rcu_sync.h
> > > > > > @@ -31,9 +31,7 @@ struct rcu_sync {
> > > > > >   */
> > > > > >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> > > > > >  {
> > > > > > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > > > > > -			 !rcu_read_lock_bh_held() &&
> > > > > > -			 !rcu_read_lock_sched_held(),
> > > > > > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > > > > 
> > > > > I believe that replacing rcu_read_lock_sched_held() with preemptible()
> > > > > in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
> > > > > If you have not already done so, could you please give it a try?
> > > > 
> > > > Hi Paul,
> > > > I don't think it will cause splats for !CONFIG_PREEMPT.
> > > > 
> > > > Currently, rcu_read_lock_any_held() introduced in this patch returns true if
> > > > !preemptible(). This means that:
> > > > 
> > > > The following expression above:
> > > > RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),...)
> > > > 
> > > > Becomes:
> > > > RCU_LOCKDEP_WARN(preemptible(), ...)
> > > > 
> > > > For, CONFIG_PREEMPT=n kernels, this means:
> > > > RCU_LOCKDEP_WARN(0, ...)
> > > > 
> > > > Which would mean no splats. Or, did I miss the point?
> > > 
> > > I suggest trying it out on a CONFIG_PREEMPT=n kernel.
> > 
> > Sure, will do, sorry did not try it out yet because was busy with weekend
> > chores but will do soon, thanks!
> 
> I am not faulting you for taking the weekend off, actually.  ;-)

;-) 

I tried doing RCU_LOCKDEP_WARN(preemptible(), ...) in this code path and I
don't get any splats. I also disassembled the code and it seems to me
RCU_LOCKDEP_WARN() becomes a NOOP which also the above reasoning confirms.

thanks,

 - Joel


^ permalink raw reply

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-13 15:50 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713153606.GD133650@google.com>

On Sat, Jul 13, 2019 at 11:36:06AM -0400, Joel Fernandes wrote:
> On Sat, Jul 13, 2019 at 07:41:08AM -0700, Paul E. McKenney wrote:
> > On Sat, Jul 13, 2019 at 09:30:49AM -0400, Joel Fernandes wrote:
> > > On Sat, Jul 13, 2019 at 01:21:14AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jul 12, 2019 at 11:10:08PM -0400, Joel Fernandes wrote:
> > > > > On Fri, Jul 12, 2019 at 11:01:50PM -0400, Joel Fernandes wrote:
> > > > > > On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> > > > > > > On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > > > > > > > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > > > > and it results in a nice cleanup.
> > > > > > > > > 
> > > > > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > 
> > > > > > > > Hi Oleg,
> > > > > > > > Slightly unrelated to the patch,
> > > > > > > > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > > > > > > > 
> > > > > > > > I do understand how rcu sync and percpu rwsem works, however the comment
> > > > > > > > below didn't make much sense to me. For one, there's no readers_fast anymore
> > > > > > > > so I did not follow what readers_fast means. Could the comment be updated to
> > > > > > > > reflect latest changes?
> > > > > > > > Also could you help understand how is a writer not able to change
> > > > > > > > sem->state and count the per-cpu read counters at the same time as the
> > > > > > > > comment tries to say?
> > > > > > > > 
> > > > > > > > 	/*
> > > > > > > > 	 * We are in an RCU-sched read-side critical section, so the writer
> > > > > > > > 	 * cannot both change sem->state from readers_fast and start checking
> > > > > > > > 	 * counters while we are here. So if we see !sem->state, we know that
> > > > > > > > 	 * the writer won't be checking until we're past the preempt_enable()
> > > > > > > > 	 * and that once the synchronize_rcu() is done, the writer will see
> > > > > > > > 	 * anything we did within this RCU-sched read-size critical section.
> > > > > > > > 	 */
> > > > > > > > 
> > > > > > > > Also,
> > > > > > > > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > > > > > > > the callbacks are the same now. I will post that as a follow-up patch to this
> > > > > > > > series.
> > > > > > > 
> > > > > > > Hello, Joel,
> > > > > > > 
> > > > > > > Oleg has a set of patches updating this code that just hit mainline
> > > > > > > this week.  These patches get rid of the code that previously handled
> > > > > > > RCU's multiple flavors.  Or are you looking at current mainline and
> > > > > > > me just missing your point?
> > > > > > > 
> > > > > > 
> > > > > > Hi Paul,
> > > > > > You are right on point. I have a bad habit of not rebasing my trees. In this
> > > > > > case the feature branch of mine in concern was based on v5.1. Needless to
> > > > > > say, I need to rebase my tree.
> > > > > > 
> > > > > > Yes, this sync clean up patch does conflict when I rebase, but other patches
> > > > > > rebase just fine.
> > > > > > 
> > > > > > The 2 options I see are:
> > > > > > 1. Let us drop this patch for now and I resend it later.
> > > > > > 2. I resend all patches based on Linus's master branch.
> > > > > 
> > > > > Below is the updated patch based on Linus master branch:
> > > > > 
> > > > > ---8<-----------------------
> > > > > 
> > > > > >From 5f40c9a07fcf3d6dafc2189599d0ba9443097d0f Mon Sep 17 00:00:00 2001
> > > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > > > Date: Fri, 12 Jul 2019 12:13:27 -0400
> > > > > Subject: [PATCH v2.1 3/9] rcu/sync: Remove custom check for reader-section
> > > > > 
> > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > this series, this is no longer need. We can just use the generic helper
> > > > > and it results in a nice cleanup.
> > > > > 
> > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > ---
> > > > >  include/linux/rcu_sync.h | 4 +---
> > > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > > > > index 9b83865d24f9..0027d4c8087c 100644
> > > > > --- a/include/linux/rcu_sync.h
> > > > > +++ b/include/linux/rcu_sync.h
> > > > > @@ -31,9 +31,7 @@ struct rcu_sync {
> > > > >   */
> > > > >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> > > > >  {
> > > > > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > > > > -			 !rcu_read_lock_bh_held() &&
> > > > > -			 !rcu_read_lock_sched_held(),
> > > > > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > > > 
> > > > I believe that replacing rcu_read_lock_sched_held() with preemptible()
> > > > in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
> > > > If you have not already done so, could you please give it a try?
> > > 
> > > Hi Paul,
> > > I don't think it will cause splats for !CONFIG_PREEMPT.
> > > 
> > > Currently, rcu_read_lock_any_held() introduced in this patch returns true if
> > > !preemptible(). This means that:
> > > 
> > > The following expression above:
> > > RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),...)
> > > 
> > > Becomes:
> > > RCU_LOCKDEP_WARN(preemptible(), ...)
> > > 
> > > For, CONFIG_PREEMPT=n kernels, this means:
> > > RCU_LOCKDEP_WARN(0, ...)
> > > 
> > > Which would mean no splats. Or, did I miss the point?
> > 
> > I suggest trying it out on a CONFIG_PREEMPT=n kernel.
> 
> Sure, will do, sorry did not try it out yet because was busy with weekend
> chores but will do soon, thanks!

I am not faulting you for taking the weekend off, actually.  ;-)

							Thanx, Paul


^ permalink raw reply

* [net-next 0/2] ipvs: speedup ipvs netns dismantle
From: Haishuang Yan @ 2019-07-13 15:19 UTC (permalink / raw)
  To: David S. Miller, Pablo Neira Ayuso, Simon Horman
  Cc: netdev, lvs-devel, linux-kernel, netfilter-devel, Haishuang Yan

Implement exit_batch() method to dismantle more ipvs netns
per round.

Haishuang Yan (2):
  ipvs: batch __ip_vs_cleanup
  ipvs: batch __ip_vs_dev_cleanup

 include/net/ip_vs.h             |  2 +-
 net/netfilter/ipvs/ip_vs_core.c | 49 +++++++++++++++++++++++++----------------
 net/netfilter/ipvs/ip_vs_ctl.c  | 13 ++++++++---
 3 files changed, 41 insertions(+), 23 deletions(-)

-- 
1.8.3.1




^ permalink raw reply

* [net-next 2/2] ipvs: batch __ip_vs_dev_cleanup
From: Haishuang Yan @ 2019-07-13 15:19 UTC (permalink / raw)
  To: David S. Miller, Pablo Neira Ayuso, Simon Horman
  Cc: netdev, lvs-devel, linux-kernel, netfilter-devel, Haishuang Yan
In-Reply-To: <1563031186-2101-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

It's better to batch __ip_vs_cleanup to speedup ipvs
devices dismantle.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/netfilter/ipvs/ip_vs_core.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index b4d79b7..58af24a 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2434,14 +2434,20 @@ static int __net_init __ip_vs_dev_init(struct net *net)
 	return ret;
 }
 
-static void __net_exit __ip_vs_dev_cleanup(struct net *net)
+static void __net_exit __ip_vs_dev_cleanup_batch(struct list_head *net_list)
 {
-	struct netns_ipvs *ipvs = net_ipvs(net);
+	struct netns_ipvs *ipvs;
+	struct net *net;
+	LIST_HEAD(list);
+
 	EnterFunction(2);
-	nf_unregister_net_hooks(net, ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
-	ipvs->enable = 0;	/* Disable packet reception */
-	smp_wmb();
-	ip_vs_sync_net_cleanup(ipvs);
+	list_for_each_entry(net, net_list, exit_list) {
+		ipvs = net_ipvs(net);
+		nf_unregister_net_hooks(net, ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
+		ipvs->enable = 0;	/* Disable packet reception */
+		smp_wmb();
+		ip_vs_sync_net_cleanup(ipvs);
+	}
 	LeaveFunction(2);
 }
 
@@ -2454,7 +2460,7 @@ static void __net_exit __ip_vs_dev_cleanup(struct net *net)
 
 static struct pernet_operations ipvs_core_dev_ops = {
 	.init = __ip_vs_dev_init,
-	.exit = __ip_vs_dev_cleanup,
+	.exit_batch = __ip_vs_dev_cleanup_batch,
 };
 
 /*
-- 
1.8.3.1




^ permalink raw reply related

* [net-next 1/2] ipvs: batch __ip_vs_cleanup
From: Haishuang Yan @ 2019-07-13 15:19 UTC (permalink / raw)
  To: David S. Miller, Pablo Neira Ayuso, Simon Horman
  Cc: netdev, lvs-devel, linux-kernel, netfilter-devel, Haishuang Yan
In-Reply-To: <1563031186-2101-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

It's better to batch __ip_vs_cleanup to speedup ipvs
connections dismantle.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 include/net/ip_vs.h             |  2 +-
 net/netfilter/ipvs/ip_vs_core.c | 29 +++++++++++++++++------------
 net/netfilter/ipvs/ip_vs_ctl.c  | 13 ++++++++++---
 3 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 3759167..93e7a25 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1324,7 +1324,7 @@ static inline void ip_vs_control_del(struct ip_vs_conn *cp)
 void ip_vs_control_net_cleanup(struct netns_ipvs *ipvs);
 void ip_vs_estimator_net_cleanup(struct netns_ipvs *ipvs);
 void ip_vs_sync_net_cleanup(struct netns_ipvs *ipvs);
-void ip_vs_service_net_cleanup(struct netns_ipvs *ipvs);
+void ip_vs_service_nets_cleanup(struct list_head *net_list);
 
 /* IPVS application functions
  * (from ip_vs_app.c)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 46f06f9..b4d79b7 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -2402,18 +2402,23 @@ static int __net_init __ip_vs_init(struct net *net)
 	return -ENOMEM;
 }
 
-static void __net_exit __ip_vs_cleanup(struct net *net)
+static void __net_exit __ip_vs_cleanup_batch(struct list_head *net_list)
 {
-	struct netns_ipvs *ipvs = net_ipvs(net);
-
-	ip_vs_service_net_cleanup(ipvs);	/* ip_vs_flush() with locks */
-	ip_vs_conn_net_cleanup(ipvs);
-	ip_vs_app_net_cleanup(ipvs);
-	ip_vs_protocol_net_cleanup(ipvs);
-	ip_vs_control_net_cleanup(ipvs);
-	ip_vs_estimator_net_cleanup(ipvs);
-	IP_VS_DBG(2, "ipvs netns %d released\n", ipvs->gen);
-	net->ipvs = NULL;
+	struct netns_ipvs *ipvs;
+	struct net *net;
+	LIST_HEAD(list);
+
+	ip_vs_service_nets_cleanup(net_list);	/* ip_vs_flush() with locks */
+	list_for_each_entry(net, net_list, exit_list) {
+		ipvs = net_ipvs(net);
+		ip_vs_conn_net_cleanup(ipvs);
+		ip_vs_app_net_cleanup(ipvs);
+		ip_vs_protocol_net_cleanup(ipvs);
+		ip_vs_control_net_cleanup(ipvs);
+		ip_vs_estimator_net_cleanup(ipvs);
+		IP_VS_DBG(2, "ipvs netns %d released\n", ipvs->gen);
+		net->ipvs = NULL;
+	}
 }
 
 static int __net_init __ip_vs_dev_init(struct net *net)
@@ -2442,7 +2447,7 @@ static void __net_exit __ip_vs_dev_cleanup(struct net *net)
 
 static struct pernet_operations ipvs_core_ops = {
 	.init = __ip_vs_init,
-	.exit = __ip_vs_cleanup,
+	.exit_batch = __ip_vs_cleanup_batch,
 	.id   = &ip_vs_net_id,
 	.size = sizeof(struct netns_ipvs),
 };
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 07e0967..c8e652b 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1607,14 +1607,21 @@ static int ip_vs_flush(struct netns_ipvs *ipvs, bool cleanup)
 
 /*
  *	Delete service by {netns} in the service table.
- *	Called by __ip_vs_cleanup()
+ *	Called by __ip_vs_batch_cleanup()
  */
-void ip_vs_service_net_cleanup(struct netns_ipvs *ipvs)
+void ip_vs_service_nets_cleanup(struct list_head *net_list)
 {
+	struct netns_ipvs *ipvs;
+	struct net *net;
+	LIST_HEAD(list);
+
 	EnterFunction(2);
 	/* Check for "full" addressed entries */
 	mutex_lock(&__ip_vs_mutex);
-	ip_vs_flush(ipvs, true);
+	list_for_each_entry(net, net_list, exit_list) {
+		ipvs = net_ipvs(net);
+		ip_vs_flush(ipvs, true);
+	}
 	mutex_unlock(&__ip_vs_mutex);
 	LeaveFunction(2);
 }
-- 
1.8.3.1




^ permalink raw reply related

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Joel Fernandes @ 2019-07-13 15:36 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713144108.GD26519@linux.ibm.com>

On Sat, Jul 13, 2019 at 07:41:08AM -0700, Paul E. McKenney wrote:
> On Sat, Jul 13, 2019 at 09:30:49AM -0400, Joel Fernandes wrote:
> > On Sat, Jul 13, 2019 at 01:21:14AM -0700, Paul E. McKenney wrote:
> > > On Fri, Jul 12, 2019 at 11:10:08PM -0400, Joel Fernandes wrote:
> > > > On Fri, Jul 12, 2019 at 11:01:50PM -0400, Joel Fernandes wrote:
> > > > > On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> > > > > > On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > > > > > > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > > > and it results in a nice cleanup.
> > > > > > > > 
> > > > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > 
> > > > > > > Hi Oleg,
> > > > > > > Slightly unrelated to the patch,
> > > > > > > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > > > > > > 
> > > > > > > I do understand how rcu sync and percpu rwsem works, however the comment
> > > > > > > below didn't make much sense to me. For one, there's no readers_fast anymore
> > > > > > > so I did not follow what readers_fast means. Could the comment be updated to
> > > > > > > reflect latest changes?
> > > > > > > Also could you help understand how is a writer not able to change
> > > > > > > sem->state and count the per-cpu read counters at the same time as the
> > > > > > > comment tries to say?
> > > > > > > 
> > > > > > > 	/*
> > > > > > > 	 * We are in an RCU-sched read-side critical section, so the writer
> > > > > > > 	 * cannot both change sem->state from readers_fast and start checking
> > > > > > > 	 * counters while we are here. So if we see !sem->state, we know that
> > > > > > > 	 * the writer won't be checking until we're past the preempt_enable()
> > > > > > > 	 * and that once the synchronize_rcu() is done, the writer will see
> > > > > > > 	 * anything we did within this RCU-sched read-size critical section.
> > > > > > > 	 */
> > > > > > > 
> > > > > > > Also,
> > > > > > > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > > > > > > the callbacks are the same now. I will post that as a follow-up patch to this
> > > > > > > series.
> > > > > > 
> > > > > > Hello, Joel,
> > > > > > 
> > > > > > Oleg has a set of patches updating this code that just hit mainline
> > > > > > this week.  These patches get rid of the code that previously handled
> > > > > > RCU's multiple flavors.  Or are you looking at current mainline and
> > > > > > me just missing your point?
> > > > > > 
> > > > > 
> > > > > Hi Paul,
> > > > > You are right on point. I have a bad habit of not rebasing my trees. In this
> > > > > case the feature branch of mine in concern was based on v5.1. Needless to
> > > > > say, I need to rebase my tree.
> > > > > 
> > > > > Yes, this sync clean up patch does conflict when I rebase, but other patches
> > > > > rebase just fine.
> > > > > 
> > > > > The 2 options I see are:
> > > > > 1. Let us drop this patch for now and I resend it later.
> > > > > 2. I resend all patches based on Linus's master branch.
> > > > 
> > > > Below is the updated patch based on Linus master branch:
> > > > 
> > > > ---8<-----------------------
> > > > 
> > > > >From 5f40c9a07fcf3d6dafc2189599d0ba9443097d0f Mon Sep 17 00:00:00 2001
> > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > > Date: Fri, 12 Jul 2019 12:13:27 -0400
> > > > Subject: [PATCH v2.1 3/9] rcu/sync: Remove custom check for reader-section
> > > > 
> > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > section. With RCU consolidating flavors and the generic helper added in
> > > > this series, this is no longer need. We can just use the generic helper
> > > > and it results in a nice cleanup.
> > > > 
> > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  include/linux/rcu_sync.h | 4 +---
> > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > > 
> > > > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > > > index 9b83865d24f9..0027d4c8087c 100644
> > > > --- a/include/linux/rcu_sync.h
> > > > +++ b/include/linux/rcu_sync.h
> > > > @@ -31,9 +31,7 @@ struct rcu_sync {
> > > >   */
> > > >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> > > >  {
> > > > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > > > -			 !rcu_read_lock_bh_held() &&
> > > > -			 !rcu_read_lock_sched_held(),
> > > > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > > 
> > > I believe that replacing rcu_read_lock_sched_held() with preemptible()
> > > in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
> > > If you have not already done so, could you please give it a try?
> > 
> > Hi Paul,
> > I don't think it will cause splats for !CONFIG_PREEMPT.
> > 
> > Currently, rcu_read_lock_any_held() introduced in this patch returns true if
> > !preemptible(). This means that:
> > 
> > The following expression above:
> > RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),...)
> > 
> > Becomes:
> > RCU_LOCKDEP_WARN(preemptible(), ...)
> > 
> > For, CONFIG_PREEMPT=n kernels, this means:
> > RCU_LOCKDEP_WARN(0, ...)
> > 
> > Which would mean no splats. Or, did I miss the point?
> 
> I suggest trying it out on a CONFIG_PREEMPT=n kernel.

Sure, will do, sorry did not try it out yet because was busy with weekend
chores but will do soon, thanks!


^ permalink raw reply

* Re: [PATCH] [net-next] cxgb4: reduce kernel stack usage in cudbg_collect_mem_region()
From: Arnd Bergmann @ 2019-07-13 15:35 UTC (permalink / raw)
  To: Joe Perches
  Cc: David Miller, vishal, rahul.lakkireddy, ganeshgr, Alexios Zavras,
	arjun, surendra, Networking, Linux Kernel Mailing List,
	clang-built-linux
In-Reply-To: <82ccdd83d2a18912bb8cf75585e751c0bd39a215.camel@perches.com>

On Sat, Jul 13, 2019 at 2:14 AM Joe Perches <joe@perches.com> wrote:
>
> On Fri, 2019-07-12 at 15:36 -0700, David Miller wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> > Date: Fri, 12 Jul 2019 11:06:33 +0200
> >
> > > The cudbg_collect_mem_region() and cudbg_read_fw_mem() both use several
> > > hundred kilobytes of kernel stack space.
>
> Several hundred 'kilo' bytes?
> I hope not.

Right, my mistake.

       Arnd

^ permalink raw reply

* Re: [PATCH net] net: neigh: fix multiple neigh timer scheduling
From: Lorenzo Bianconi @ 2019-07-13 15:13 UTC (permalink / raw)
  To: David Miller; +Cc: Network Development, David Ahern, Marek Majkowski
In-Reply-To: <20190712.154225.26530675805696474.davem@davemloft.net>

>
> From: David Miller <davem@davemloft.net>
> Date: Fri, 12 Jul 2019 15:40:47 -0700 (PDT)
>
> > From: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> > Date: Fri, 12 Jul 2019 19:22:51 +0200
> >
> >> Neigh timer can be scheduled multiple times from userspace adding
> >> multiple neigh entries and forcing the neigh timer scheduling passing
> >> NTF_USE in the netlink requests.
> >> This will result in a refcount leak and in the following dump stack:
> >  ...
> >> Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
> >> receiving a netlink request with NTF_USE flag set
> >>
> >> Reported-by: Marek Majkowski <marek@cloudflare.com>
> >> Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
> >> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> >
> > Applied and queued up for -stable, thanks.
>
> Actually, reverted, you didn't test the build thoroughly as Infiniband
> fails:
>
> drivers/infiniband/core/addr.c: In function ‘dst_fetch_ha’:
> drivers/infiniband/core/addr.c:337:3: error: too few arguments to function ‘neigh_event_send’
>    neigh_event_send(n, NULL);
>    ^~~~~~~~~~~~~~~~

Hi Dave,

sorry for the issue. I will post a v2 fixing it

Regards,
Lorenzo

^ permalink raw reply

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-13 14:41 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713133049.GA133650@google.com>

On Sat, Jul 13, 2019 at 09:30:49AM -0400, Joel Fernandes wrote:
> On Sat, Jul 13, 2019 at 01:21:14AM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 12, 2019 at 11:10:08PM -0400, Joel Fernandes wrote:
> > > On Fri, Jul 12, 2019 at 11:01:50PM -0400, Joel Fernandes wrote:
> > > > On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > > > > > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > > and it results in a nice cleanup.
> > > > > > > 
> > > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > 
> > > > > > Hi Oleg,
> > > > > > Slightly unrelated to the patch,
> > > > > > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > > > > > 
> > > > > > I do understand how rcu sync and percpu rwsem works, however the comment
> > > > > > below didn't make much sense to me. For one, there's no readers_fast anymore
> > > > > > so I did not follow what readers_fast means. Could the comment be updated to
> > > > > > reflect latest changes?
> > > > > > Also could you help understand how is a writer not able to change
> > > > > > sem->state and count the per-cpu read counters at the same time as the
> > > > > > comment tries to say?
> > > > > > 
> > > > > > 	/*
> > > > > > 	 * We are in an RCU-sched read-side critical section, so the writer
> > > > > > 	 * cannot both change sem->state from readers_fast and start checking
> > > > > > 	 * counters while we are here. So if we see !sem->state, we know that
> > > > > > 	 * the writer won't be checking until we're past the preempt_enable()
> > > > > > 	 * and that once the synchronize_rcu() is done, the writer will see
> > > > > > 	 * anything we did within this RCU-sched read-size critical section.
> > > > > > 	 */
> > > > > > 
> > > > > > Also,
> > > > > > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > > > > > the callbacks are the same now. I will post that as a follow-up patch to this
> > > > > > series.
> > > > > 
> > > > > Hello, Joel,
> > > > > 
> > > > > Oleg has a set of patches updating this code that just hit mainline
> > > > > this week.  These patches get rid of the code that previously handled
> > > > > RCU's multiple flavors.  Or are you looking at current mainline and
> > > > > me just missing your point?
> > > > > 
> > > > 
> > > > Hi Paul,
> > > > You are right on point. I have a bad habit of not rebasing my trees. In this
> > > > case the feature branch of mine in concern was based on v5.1. Needless to
> > > > say, I need to rebase my tree.
> > > > 
> > > > Yes, this sync clean up patch does conflict when I rebase, but other patches
> > > > rebase just fine.
> > > > 
> > > > The 2 options I see are:
> > > > 1. Let us drop this patch for now and I resend it later.
> > > > 2. I resend all patches based on Linus's master branch.
> > > 
> > > Below is the updated patch based on Linus master branch:
> > > 
> > > ---8<-----------------------
> > > 
> > > >From 5f40c9a07fcf3d6dafc2189599d0ba9443097d0f Mon Sep 17 00:00:00 2001
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > Date: Fri, 12 Jul 2019 12:13:27 -0400
> > > Subject: [PATCH v2.1 3/9] rcu/sync: Remove custom check for reader-section
> > > 
> > > The rcu/sync code was doing its own check whether we are in a reader
> > > section. With RCU consolidating flavors and the generic helper added in
> > > this series, this is no longer need. We can just use the generic helper
> > > and it results in a nice cleanup.
> > > 
> > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  include/linux/rcu_sync.h | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > 
> > > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > > index 9b83865d24f9..0027d4c8087c 100644
> > > --- a/include/linux/rcu_sync.h
> > > +++ b/include/linux/rcu_sync.h
> > > @@ -31,9 +31,7 @@ struct rcu_sync {
> > >   */
> > >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> > >  {
> > > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > > -			 !rcu_read_lock_bh_held() &&
> > > -			 !rcu_read_lock_sched_held(),
> > > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > 
> > I believe that replacing rcu_read_lock_sched_held() with preemptible()
> > in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
> > If you have not already done so, could you please give it a try?
> 
> Hi Paul,
> I don't think it will cause splats for !CONFIG_PREEMPT.
> 
> Currently, rcu_read_lock_any_held() introduced in this patch returns true if
> !preemptible(). This means that:
> 
> The following expression above:
> RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),...)
> 
> Becomes:
> RCU_LOCKDEP_WARN(preemptible(), ...)
> 
> For, CONFIG_PREEMPT=n kernels, this means:
> RCU_LOCKDEP_WARN(0, ...)
> 
> Which would mean no splats. Or, did I miss the point?

I suggest trying it out on a CONFIG_PREEMPT=n kernel.

							Thanx, Paul

^ permalink raw reply

* [net-next] bonding: add documentation for peer_notif_delay
From: Vincent Bernat @ 2019-07-13 14:35 UTC (permalink / raw)
  To: David Miller, j.vosburgh, vfalico, andy, netdev; +Cc: Vincent Bernat

Ability to tweak the interval between peer notifications has been
added in 07a4ddec3ce9 ("bonding: add an option to specify a delay
between peer notifications") but the documentation was not updated.

Signed-off-by: Vincent Bernat <vincent@bernat.ch>
---
 Documentation/networking/bonding.txt | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index d3e5dd26db12..e3abfbd32f71 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -706,9 +706,9 @@ num_unsol_na
 	unsolicited IPv6 Neighbor Advertisements) to be issued after a
 	failover event.  As soon as the link is up on the new slave
 	(possibly immediately) a peer notification is sent on the
-	bonding device and each VLAN sub-device.  This is repeated at
-	each link monitor interval (arp_interval or miimon, whichever
-	is active) if the number is greater than 1.
+	bonding device and each VLAN sub-device. This is repeated at
+	the rate specified by peer_notif_delay if the number is
+	greater than 1.
 
 	The valid range is 0 - 255; the default value is 1.  These options
 	affect only the active-backup mode.  These options were added for
@@ -727,6 +727,16 @@ packets_per_slave
 	The valid range is 0 - 65535; the default value is 1. This option
 	has effect only in balance-rr mode.
 
+peer_notif_delay
+
+        Specify the delay, in milliseconds, between each peer
+        notification (gratuitous ARP and unsolicited IPv6 Neighbor
+        Advertisement) when they are issued after a failover event.
+        This delay should be a multiple of the link monitor interval
+        (arp_interval or miimon, whichever is active). The default
+        value is 0 which means to match the value of the link monitor
+        interval.
+
 primary
 
 	A string (eth0, eth2, etc) specifying which slave is the
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Joel Fernandes @ 2019-07-13 13:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713082114.GA26519@linux.ibm.com>

On Sat, Jul 13, 2019 at 01:21:14AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 12, 2019 at 11:10:08PM -0400, Joel Fernandes wrote:
> > On Fri, Jul 12, 2019 at 11:01:50PM -0400, Joel Fernandes wrote:
> > > On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > > > > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > and it results in a nice cleanup.
> > > > > > 
> > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > 
> > > > > Hi Oleg,
> > > > > Slightly unrelated to the patch,
> > > > > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > > > > 
> > > > > I do understand how rcu sync and percpu rwsem works, however the comment
> > > > > below didn't make much sense to me. For one, there's no readers_fast anymore
> > > > > so I did not follow what readers_fast means. Could the comment be updated to
> > > > > reflect latest changes?
> > > > > Also could you help understand how is a writer not able to change
> > > > > sem->state and count the per-cpu read counters at the same time as the
> > > > > comment tries to say?
> > > > > 
> > > > > 	/*
> > > > > 	 * We are in an RCU-sched read-side critical section, so the writer
> > > > > 	 * cannot both change sem->state from readers_fast and start checking
> > > > > 	 * counters while we are here. So if we see !sem->state, we know that
> > > > > 	 * the writer won't be checking until we're past the preempt_enable()
> > > > > 	 * and that once the synchronize_rcu() is done, the writer will see
> > > > > 	 * anything we did within this RCU-sched read-size critical section.
> > > > > 	 */
> > > > > 
> > > > > Also,
> > > > > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > > > > the callbacks are the same now. I will post that as a follow-up patch to this
> > > > > series.
> > > > 
> > > > Hello, Joel,
> > > > 
> > > > Oleg has a set of patches updating this code that just hit mainline
> > > > this week.  These patches get rid of the code that previously handled
> > > > RCU's multiple flavors.  Or are you looking at current mainline and
> > > > me just missing your point?
> > > > 
> > > 
> > > Hi Paul,
> > > You are right on point. I have a bad habit of not rebasing my trees. In this
> > > case the feature branch of mine in concern was based on v5.1. Needless to
> > > say, I need to rebase my tree.
> > > 
> > > Yes, this sync clean up patch does conflict when I rebase, but other patches
> > > rebase just fine.
> > > 
> > > The 2 options I see are:
> > > 1. Let us drop this patch for now and I resend it later.
> > > 2. I resend all patches based on Linus's master branch.
> > 
> > Below is the updated patch based on Linus master branch:
> > 
> > ---8<-----------------------
> > 
> > >From 5f40c9a07fcf3d6dafc2189599d0ba9443097d0f Mon Sep 17 00:00:00 2001
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > Date: Fri, 12 Jul 2019 12:13:27 -0400
> > Subject: [PATCH v2.1 3/9] rcu/sync: Remove custom check for reader-section
> > 
> > The rcu/sync code was doing its own check whether we are in a reader
> > section. With RCU consolidating flavors and the generic helper added in
> > this series, this is no longer need. We can just use the generic helper
> > and it results in a nice cleanup.
> > 
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  include/linux/rcu_sync.h | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > index 9b83865d24f9..0027d4c8087c 100644
> > --- a/include/linux/rcu_sync.h
> > +++ b/include/linux/rcu_sync.h
> > @@ -31,9 +31,7 @@ struct rcu_sync {
> >   */
> >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> >  {
> > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > -			 !rcu_read_lock_bh_held() &&
> > -			 !rcu_read_lock_sched_held(),
> > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> 
> I believe that replacing rcu_read_lock_sched_held() with preemptible()
> in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
> If you have not already done so, could you please give it a try?

Hi Paul,
I don't think it will cause splats for !CONFIG_PREEMPT.

Currently, rcu_read_lock_any_held() introduced in this patch returns true if
!preemptible(). This means that:

The following expression above:
RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),...)

Becomes:
RCU_LOCKDEP_WARN(preemptible(), ...)

For, CONFIG_PREEMPT=n kernels, this means:
RCU_LOCKDEP_WARN(0, ...)

Which would mean no splats. Or, did I miss the point?

thanks,

 - Joel


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox