Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Joel Fernandes @ 2019-07-13 13:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713082114.GA26519@linux.ibm.com>

On Sat, Jul 13, 2019 at 01:21:14AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 12, 2019 at 11:10:08PM -0400, Joel Fernandes wrote:
> > On Fri, Jul 12, 2019 at 11:01:50PM -0400, Joel Fernandes wrote:
> > > On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > > > > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > > this series, this is no longer need. We can just use the generic helper
> > > > > > and it results in a nice cleanup.
> > > > > > 
> > > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > 
> > > > > Hi Oleg,
> > > > > Slightly unrelated to the patch,
> > > > > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > > > > 
> > > > > I do understand how rcu sync and percpu rwsem works, however the comment
> > > > > below didn't make much sense to me. For one, there's no readers_fast anymore
> > > > > so I did not follow what readers_fast means. Could the comment be updated to
> > > > > reflect latest changes?
> > > > > Also could you help understand how is a writer not able to change
> > > > > sem->state and count the per-cpu read counters at the same time as the
> > > > > comment tries to say?
> > > > > 
> > > > > 	/*
> > > > > 	 * We are in an RCU-sched read-side critical section, so the writer
> > > > > 	 * cannot both change sem->state from readers_fast and start checking
> > > > > 	 * counters while we are here. So if we see !sem->state, we know that
> > > > > 	 * the writer won't be checking until we're past the preempt_enable()
> > > > > 	 * and that once the synchronize_rcu() is done, the writer will see
> > > > > 	 * anything we did within this RCU-sched read-size critical section.
> > > > > 	 */
> > > > > 
> > > > > Also,
> > > > > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > > > > the callbacks are the same now. I will post that as a follow-up patch to this
> > > > > series.
> > > > 
> > > > Hello, Joel,
> > > > 
> > > > Oleg has a set of patches updating this code that just hit mainline
> > > > this week.  These patches get rid of the code that previously handled
> > > > RCU's multiple flavors.  Or are you looking at current mainline and
> > > > me just missing your point?
> > > > 
> > > 
> > > Hi Paul,
> > > You are right on point. I have a bad habit of not rebasing my trees. In this
> > > case the feature branch of mine in concern was based on v5.1. Needless to
> > > say, I need to rebase my tree.
> > > 
> > > Yes, this sync clean up patch does conflict when I rebase, but other patches
> > > rebase just fine.
> > > 
> > > The 2 options I see are:
> > > 1. Let us drop this patch for now and I resend it later.
> > > 2. I resend all patches based on Linus's master branch.
> > 
> > Below is the updated patch based on Linus master branch:
> > 
> > ---8<-----------------------
> > 
> > >From 5f40c9a07fcf3d6dafc2189599d0ba9443097d0f Mon Sep 17 00:00:00 2001
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > Date: Fri, 12 Jul 2019 12:13:27 -0400
> > Subject: [PATCH v2.1 3/9] rcu/sync: Remove custom check for reader-section
> > 
> > The rcu/sync code was doing its own check whether we are in a reader
> > section. With RCU consolidating flavors and the generic helper added in
> > this series, this is no longer need. We can just use the generic helper
> > and it results in a nice cleanup.
> > 
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  include/linux/rcu_sync.h | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > index 9b83865d24f9..0027d4c8087c 100644
> > --- a/include/linux/rcu_sync.h
> > +++ b/include/linux/rcu_sync.h
> > @@ -31,9 +31,7 @@ struct rcu_sync {
> >   */
> >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> >  {
> > -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > -			 !rcu_read_lock_bh_held() &&
> > -			 !rcu_read_lock_sched_held(),
> > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> 
> I believe that replacing rcu_read_lock_sched_held() with preemptible()
> in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
> If you have not already done so, could you please give it a try?

Hi Paul,
I don't think it will cause splats for !CONFIG_PREEMPT.

Currently, rcu_read_lock_any_held() introduced in this patch returns true if
!preemptible(). This means that:

The following expression above:
RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),...)

Becomes:
RCU_LOCKDEP_WARN(preemptible(), ...)

For, CONFIG_PREEMPT=n kernels, this means:
RCU_LOCKDEP_WARN(0, ...)

Which would mean no splats. Or, did I miss the point?

thanks,

 - Joel


^ permalink raw reply

* Re: [Patch net] net_sched: unset TCQ_F_CAN_BYPASS when adding filters
From: Eric Dumazet @ 2019-07-13 12:54 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: Eric Dumazet
In-Reply-To: <20190712201749.28421-1-xiyou.wangcong@gmail.com>



On 7/12/19 10:17 PM, Cong Wang wrote:
> For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
> notably fq_codel, it makes no sense to let packets bypass the TC
> filters we setup in any scenario, otherwise our packets steering
> policy could not be enforced.
> 
> This can be easily reproduced with the following script:
> 
>  ip li add dev dummy0 type dummy
>  ifconfig dummy0 up
>  tc qd add dev dummy0 root fq_codel
>  tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
>  tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
>  ping -I dummy0 192.168.112.1
> 
> Without this patch, packets are sent directly to dummy0 without
> hitting any of the filters. With this patch, packets are redirected
> to loopback as expected.
> 
> This fix is not perfect, it only unsets the flag but does not set it back
> because we have to save the information somewhere in the qdisc if we
> really want that.
> 
> Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  net/sched/cls_api.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 638c1bc1ea1b..5c800b0c810b 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -2152,6 +2152,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
>  		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
>  			       RTM_NEWTFILTER, false, rtnl_held);
>  		tfilter_put(tp, fh);
> +		q->flags &= ~TCQ_F_CAN_BYPASS;
>  	}
>  
>  errout:
> 

Strange, because sfq and fq_codel are roughly the same for TCQ_F_CAN_BYPASS handling.

Why is fq_codel_bind() not effective ?

If not effective, sfq had the same issue, so the Fixes: tag needs to be refined,
maybe to commit 23624935e0c4 net_sched: TCQ_F_CAN_BYPASS generalization


^ permalink raw reply

* Re: [RFC PATCH net-next 3/6] net: dsa: Pass tc-taprio offload to drivers
From: Vladimir Oltean @ 2019-07-13 12:48 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Florian Fainelli, Vivien Didelot, Andrew Lunn, David S. Miller,
	Vinicius Costa Gomes, vedang.patel, Richard Cochran, weifeng.voon,
	jiri, m-karicheri2, Jose.Abreu, netdev
In-Reply-To: <20190708112307.GA7480@apalos>

Hi Ilias,

On Mon, 8 Jul 2019 at 14:23, Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>
> Hi Vladimir,
>
> > tc-taprio is a qdisc based on the enhancements for scheduled traffic
> > specified in IEEE 802.1Qbv (later merged in 802.1Q).  This qdisc has
> > a software implementation and an optional offload through which
> > compatible Ethernet ports may configure their egress 802.1Qbv
> > schedulers.
> >
> > Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> > ---
> >  include/net/dsa.h |  3 +++
> >  net/dsa/slave.c   | 14 ++++++++++++++
> >  2 files changed, 17 insertions(+)
> >
> > diff --git a/include/net/dsa.h b/include/net/dsa.h
> > index 1e8650fa8acc..e7ee6ac8ce6b 100644
> > --- a/include/net/dsa.h
> > +++ b/include/net/dsa.h
> > @@ -152,6 +152,7 @@ struct dsa_mall_tc_entry {
> >       };
> >  };
> >
> > +struct tc_taprio_qopt_offload;
> >
> >  struct dsa_port {
> >       /* A CPU port is physically connected to a master device.
> > @@ -516,6 +517,8 @@ struct dsa_switch_ops {
> >                                  bool ingress);
> >       void    (*port_mirror_del)(struct dsa_switch *ds, int port,
> >                                  struct dsa_mall_mirror_tc_entry *mirror);
> > +     int     (*port_setup_taprio)(struct dsa_switch *ds, int port,
> > +                                  struct tc_taprio_qopt_offload *qopt);
>
> Is there any way to make this more generic? 802.1Qbv are not the only hardware
> schedulers. CBS and ETF are examples that first come to mind. Maybe having
> something more generic than tc_taprio_qopt_offload as an option could host
> future schedulers?
>

Good point. I'll see what I can do to make DSA more qdisc-agnostic
when I gather enough feedback to mandate a v2.

> >
> >       /*
> >        * Cross-chip operations
> > diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> > index 99673f6b07f6..2bae33788708 100644
> > --- a/net/dsa/slave.c
> > +++ b/net/dsa/slave.c
> > @@ -965,12 +965,26 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
> >       }
> >  }
> >
> > +static int dsa_slave_setup_tc_taprio(struct net_device *dev,
> > +                                  struct tc_taprio_qopt_offload *f)
> > +{
> > +     struct dsa_port *dp = dsa_slave_to_port(dev);
> > +     struct dsa_switch *ds = dp->ds;
> > +
> > +     if (!ds->ops->port_setup_taprio)
> > +             return -EOPNOTSUPP;
> > +
> > +     return ds->ops->port_setup_taprio(ds, dp->index, f);
> > +}
> > +
> >  static int dsa_slave_setup_tc(struct net_device *dev, enum tc_setup_type type,
> >                             void *type_data)
> >  {
> >       switch (type) {
> >       case TC_SETUP_BLOCK:
> >               return dsa_slave_setup_tc_block(dev, type_data);
> > +     case TC_SETUP_QDISC_TAPRIO:
> > +             return dsa_slave_setup_tc_taprio(dev, type_data);
> >       default:
> >               return -EOPNOTSUPP;
> >       }
> > --
> > 2.17.1
> >
> Thanks
> /Ilias

Thanks,
-Vladimir

^ permalink raw reply

* [PATCH net] r8169: fix issue with confused RX unit after PHY power-down on RTL8411b
From: Heiner Kallweit @ 2019-07-13 11:55 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller
  Cc: netdev@vger.kernel.org, Ionut Radu

On RTL8411b the RX unit gets confused if the PHY is powered-down.
This was reported in [0] and confirmed by Realtek. Realtek provided
a sequence to fix the RX unit after PHY wakeup.

The issue itself seems to have been there longer, the Fixes tag
refers to where the fix applies properly.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075

Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support")
Tested-by: Ionut Radu <ionut.radu@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
This patch is for versions up to 5.2. On versions before 5.2 there
may be little fuzz when applying because rtl_ephy_init used to have
one parameter more.
---
 drivers/net/ethernet/realtek/r8169.c | 137 +++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index d06a61f00..96637fcbe 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5157,6 +5157,143 @@ static void rtl_hw_start_8411_2(struct rtl8169_private *tp)
 	/* disable aspm and clock request before access ephy */
 	rtl_hw_aspm_clkreq_enable(tp, false);
 	rtl_ephy_init(tp, e_info_8411_2);
+
+	/* The following Realtek-provided magic fixes an issue with the RX unit
+	 * getting confused after the PHY having been powered-down.
+	 */
+	r8168_mac_ocp_write(tp, 0xFC28, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC2A, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC2C, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC2E, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC30, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC32, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC34, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC36, 0x0000);
+	mdelay(3);
+	r8168_mac_ocp_write(tp, 0xFC26, 0x0000);
+
+	r8168_mac_ocp_write(tp, 0xF800, 0xE008);
+	r8168_mac_ocp_write(tp, 0xF802, 0xE00A);
+	r8168_mac_ocp_write(tp, 0xF804, 0xE00C);
+	r8168_mac_ocp_write(tp, 0xF806, 0xE00E);
+	r8168_mac_ocp_write(tp, 0xF808, 0xE027);
+	r8168_mac_ocp_write(tp, 0xF80A, 0xE04F);
+	r8168_mac_ocp_write(tp, 0xF80C, 0xE05E);
+	r8168_mac_ocp_write(tp, 0xF80E, 0xE065);
+	r8168_mac_ocp_write(tp, 0xF810, 0xC602);
+	r8168_mac_ocp_write(tp, 0xF812, 0xBE00);
+	r8168_mac_ocp_write(tp, 0xF814, 0x0000);
+	r8168_mac_ocp_write(tp, 0xF816, 0xC502);
+	r8168_mac_ocp_write(tp, 0xF818, 0xBD00);
+	r8168_mac_ocp_write(tp, 0xF81A, 0x074C);
+	r8168_mac_ocp_write(tp, 0xF81C, 0xC302);
+	r8168_mac_ocp_write(tp, 0xF81E, 0xBB00);
+	r8168_mac_ocp_write(tp, 0xF820, 0x080A);
+	r8168_mac_ocp_write(tp, 0xF822, 0x6420);
+	r8168_mac_ocp_write(tp, 0xF824, 0x48C2);
+	r8168_mac_ocp_write(tp, 0xF826, 0x8C20);
+	r8168_mac_ocp_write(tp, 0xF828, 0xC516);
+	r8168_mac_ocp_write(tp, 0xF82A, 0x64A4);
+	r8168_mac_ocp_write(tp, 0xF82C, 0x49C0);
+	r8168_mac_ocp_write(tp, 0xF82E, 0xF009);
+	r8168_mac_ocp_write(tp, 0xF830, 0x74A2);
+	r8168_mac_ocp_write(tp, 0xF832, 0x8CA5);
+	r8168_mac_ocp_write(tp, 0xF834, 0x74A0);
+	r8168_mac_ocp_write(tp, 0xF836, 0xC50E);
+	r8168_mac_ocp_write(tp, 0xF838, 0x9CA2);
+	r8168_mac_ocp_write(tp, 0xF83A, 0x1C11);
+	r8168_mac_ocp_write(tp, 0xF83C, 0x9CA0);
+	r8168_mac_ocp_write(tp, 0xF83E, 0xE006);
+	r8168_mac_ocp_write(tp, 0xF840, 0x74F8);
+	r8168_mac_ocp_write(tp, 0xF842, 0x48C4);
+	r8168_mac_ocp_write(tp, 0xF844, 0x8CF8);
+	r8168_mac_ocp_write(tp, 0xF846, 0xC404);
+	r8168_mac_ocp_write(tp, 0xF848, 0xBC00);
+	r8168_mac_ocp_write(tp, 0xF84A, 0xC403);
+	r8168_mac_ocp_write(tp, 0xF84C, 0xBC00);
+	r8168_mac_ocp_write(tp, 0xF84E, 0x0BF2);
+	r8168_mac_ocp_write(tp, 0xF850, 0x0C0A);
+	r8168_mac_ocp_write(tp, 0xF852, 0xE434);
+	r8168_mac_ocp_write(tp, 0xF854, 0xD3C0);
+	r8168_mac_ocp_write(tp, 0xF856, 0x49D9);
+	r8168_mac_ocp_write(tp, 0xF858, 0xF01F);
+	r8168_mac_ocp_write(tp, 0xF85A, 0xC526);
+	r8168_mac_ocp_write(tp, 0xF85C, 0x64A5);
+	r8168_mac_ocp_write(tp, 0xF85E, 0x1400);
+	r8168_mac_ocp_write(tp, 0xF860, 0xF007);
+	r8168_mac_ocp_write(tp, 0xF862, 0x0C01);
+	r8168_mac_ocp_write(tp, 0xF864, 0x8CA5);
+	r8168_mac_ocp_write(tp, 0xF866, 0x1C15);
+	r8168_mac_ocp_write(tp, 0xF868, 0xC51B);
+	r8168_mac_ocp_write(tp, 0xF86A, 0x9CA0);
+	r8168_mac_ocp_write(tp, 0xF86C, 0xE013);
+	r8168_mac_ocp_write(tp, 0xF86E, 0xC519);
+	r8168_mac_ocp_write(tp, 0xF870, 0x74A0);
+	r8168_mac_ocp_write(tp, 0xF872, 0x48C4);
+	r8168_mac_ocp_write(tp, 0xF874, 0x8CA0);
+	r8168_mac_ocp_write(tp, 0xF876, 0xC516);
+	r8168_mac_ocp_write(tp, 0xF878, 0x74A4);
+	r8168_mac_ocp_write(tp, 0xF87A, 0x48C8);
+	r8168_mac_ocp_write(tp, 0xF87C, 0x48CA);
+	r8168_mac_ocp_write(tp, 0xF87E, 0x9CA4);
+	r8168_mac_ocp_write(tp, 0xF880, 0xC512);
+	r8168_mac_ocp_write(tp, 0xF882, 0x1B00);
+	r8168_mac_ocp_write(tp, 0xF884, 0x9BA0);
+	r8168_mac_ocp_write(tp, 0xF886, 0x1B1C);
+	r8168_mac_ocp_write(tp, 0xF888, 0x483F);
+	r8168_mac_ocp_write(tp, 0xF88A, 0x9BA2);
+	r8168_mac_ocp_write(tp, 0xF88C, 0x1B04);
+	r8168_mac_ocp_write(tp, 0xF88E, 0xC508);
+	r8168_mac_ocp_write(tp, 0xF890, 0x9BA0);
+	r8168_mac_ocp_write(tp, 0xF892, 0xC505);
+	r8168_mac_ocp_write(tp, 0xF894, 0xBD00);
+	r8168_mac_ocp_write(tp, 0xF896, 0xC502);
+	r8168_mac_ocp_write(tp, 0xF898, 0xBD00);
+	r8168_mac_ocp_write(tp, 0xF89A, 0x0300);
+	r8168_mac_ocp_write(tp, 0xF89C, 0x051E);
+	r8168_mac_ocp_write(tp, 0xF89E, 0xE434);
+	r8168_mac_ocp_write(tp, 0xF8A0, 0xE018);
+	r8168_mac_ocp_write(tp, 0xF8A2, 0xE092);
+	r8168_mac_ocp_write(tp, 0xF8A4, 0xDE20);
+	r8168_mac_ocp_write(tp, 0xF8A6, 0xD3C0);
+	r8168_mac_ocp_write(tp, 0xF8A8, 0xC50F);
+	r8168_mac_ocp_write(tp, 0xF8AA, 0x76A4);
+	r8168_mac_ocp_write(tp, 0xF8AC, 0x49E3);
+	r8168_mac_ocp_write(tp, 0xF8AE, 0xF007);
+	r8168_mac_ocp_write(tp, 0xF8B0, 0x49C0);
+	r8168_mac_ocp_write(tp, 0xF8B2, 0xF103);
+	r8168_mac_ocp_write(tp, 0xF8B4, 0xC607);
+	r8168_mac_ocp_write(tp, 0xF8B6, 0xBE00);
+	r8168_mac_ocp_write(tp, 0xF8B8, 0xC606);
+	r8168_mac_ocp_write(tp, 0xF8BA, 0xBE00);
+	r8168_mac_ocp_write(tp, 0xF8BC, 0xC602);
+	r8168_mac_ocp_write(tp, 0xF8BE, 0xBE00);
+	r8168_mac_ocp_write(tp, 0xF8C0, 0x0C4C);
+	r8168_mac_ocp_write(tp, 0xF8C2, 0x0C28);
+	r8168_mac_ocp_write(tp, 0xF8C4, 0x0C2C);
+	r8168_mac_ocp_write(tp, 0xF8C6, 0xDC00);
+	r8168_mac_ocp_write(tp, 0xF8C8, 0xC707);
+	r8168_mac_ocp_write(tp, 0xF8CA, 0x1D00);
+	r8168_mac_ocp_write(tp, 0xF8CC, 0x8DE2);
+	r8168_mac_ocp_write(tp, 0xF8CE, 0x48C1);
+	r8168_mac_ocp_write(tp, 0xF8D0, 0xC502);
+	r8168_mac_ocp_write(tp, 0xF8D2, 0xBD00);
+	r8168_mac_ocp_write(tp, 0xF8D4, 0x00AA);
+	r8168_mac_ocp_write(tp, 0xF8D6, 0xE0C0);
+	r8168_mac_ocp_write(tp, 0xF8D8, 0xC502);
+	r8168_mac_ocp_write(tp, 0xF8DA, 0xBD00);
+	r8168_mac_ocp_write(tp, 0xF8DC, 0x0132);
+
+	r8168_mac_ocp_write(tp, 0xFC26, 0x8000);
+
+	r8168_mac_ocp_write(tp, 0xFC2A, 0x0743);
+	r8168_mac_ocp_write(tp, 0xFC2C, 0x0801);
+	r8168_mac_ocp_write(tp, 0xFC2E, 0x0BE9);
+	r8168_mac_ocp_write(tp, 0xFC30, 0x02FD);
+	r8168_mac_ocp_write(tp, 0xFC32, 0x0C25);
+	r8168_mac_ocp_write(tp, 0xFC34, 0x00A9);
+	r8168_mac_ocp_write(tp, 0xFC36, 0x012D);
+
 	rtl_hw_aspm_clkreq_enable(tp, true);
 }
 
-- 
2.22.0


^ permalink raw reply related

* [PATCH net] r8169: fix issue with confused RX unit after PHY power-down on RTL8411b
From: Heiner Kallweit @ 2019-07-13 11:45 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller
  Cc: netdev@vger.kernel.org, Ionut Radu

On RTL8411b the RX unit gets confused if the PHY is powered-down.
This was reported in [0] and confirmed by Realtek. Realtek provided
a sequence to fix the RX unit after PHY wakeup.

The issue itself seems to have been there longer, the Fixes tag
refers to where the fix applies properly.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075

Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support")
Tested-by: Ionut Radu <ionut.radu@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
This patch doesn't apply on versions up to 5.2 due to the renaming
of r8169.c to r8169_main.c. I will provide a separate patch for these
versions.
---
 drivers/net/ethernet/realtek/r8169_main.c | 137 ++++++++++++++++++++++
 1 file changed, 137 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index efef5453b..0637c6752 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4667,6 +4667,143 @@ static void rtl_hw_start_8411_2(struct rtl8169_private *tp)
 	/* disable aspm and clock request before access ephy */
 	rtl_hw_aspm_clkreq_enable(tp, false);
 	rtl_ephy_init(tp, e_info_8411_2);
+
+	/* The following Realtek-provided magic fixes an issue with the RX unit
+	 * getting confused after the PHY having been powered-down.
+	 */
+	r8168_mac_ocp_write(tp, 0xFC28, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC2A, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC2C, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC2E, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC30, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC32, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC34, 0x0000);
+	r8168_mac_ocp_write(tp, 0xFC36, 0x0000);
+	mdelay(3);
+	r8168_mac_ocp_write(tp, 0xFC26, 0x0000);
+
+	r8168_mac_ocp_write(tp, 0xF800, 0xE008);
+	r8168_mac_ocp_write(tp, 0xF802, 0xE00A);
+	r8168_mac_ocp_write(tp, 0xF804, 0xE00C);
+	r8168_mac_ocp_write(tp, 0xF806, 0xE00E);
+	r8168_mac_ocp_write(tp, 0xF808, 0xE027);
+	r8168_mac_ocp_write(tp, 0xF80A, 0xE04F);
+	r8168_mac_ocp_write(tp, 0xF80C, 0xE05E);
+	r8168_mac_ocp_write(tp, 0xF80E, 0xE065);
+	r8168_mac_ocp_write(tp, 0xF810, 0xC602);
+	r8168_mac_ocp_write(tp, 0xF812, 0xBE00);
+	r8168_mac_ocp_write(tp, 0xF814, 0x0000);
+	r8168_mac_ocp_write(tp, 0xF816, 0xC502);
+	r8168_mac_ocp_write(tp, 0xF818, 0xBD00);
+	r8168_mac_ocp_write(tp, 0xF81A, 0x074C);
+	r8168_mac_ocp_write(tp, 0xF81C, 0xC302);
+	r8168_mac_ocp_write(tp, 0xF81E, 0xBB00);
+	r8168_mac_ocp_write(tp, 0xF820, 0x080A);
+	r8168_mac_ocp_write(tp, 0xF822, 0x6420);
+	r8168_mac_ocp_write(tp, 0xF824, 0x48C2);
+	r8168_mac_ocp_write(tp, 0xF826, 0x8C20);
+	r8168_mac_ocp_write(tp, 0xF828, 0xC516);
+	r8168_mac_ocp_write(tp, 0xF82A, 0x64A4);
+	r8168_mac_ocp_write(tp, 0xF82C, 0x49C0);
+	r8168_mac_ocp_write(tp, 0xF82E, 0xF009);
+	r8168_mac_ocp_write(tp, 0xF830, 0x74A2);
+	r8168_mac_ocp_write(tp, 0xF832, 0x8CA5);
+	r8168_mac_ocp_write(tp, 0xF834, 0x74A0);
+	r8168_mac_ocp_write(tp, 0xF836, 0xC50E);
+	r8168_mac_ocp_write(tp, 0xF838, 0x9CA2);
+	r8168_mac_ocp_write(tp, 0xF83A, 0x1C11);
+	r8168_mac_ocp_write(tp, 0xF83C, 0x9CA0);
+	r8168_mac_ocp_write(tp, 0xF83E, 0xE006);
+	r8168_mac_ocp_write(tp, 0xF840, 0x74F8);
+	r8168_mac_ocp_write(tp, 0xF842, 0x48C4);
+	r8168_mac_ocp_write(tp, 0xF844, 0x8CF8);
+	r8168_mac_ocp_write(tp, 0xF846, 0xC404);
+	r8168_mac_ocp_write(tp, 0xF848, 0xBC00);
+	r8168_mac_ocp_write(tp, 0xF84A, 0xC403);
+	r8168_mac_ocp_write(tp, 0xF84C, 0xBC00);
+	r8168_mac_ocp_write(tp, 0xF84E, 0x0BF2);
+	r8168_mac_ocp_write(tp, 0xF850, 0x0C0A);
+	r8168_mac_ocp_write(tp, 0xF852, 0xE434);
+	r8168_mac_ocp_write(tp, 0xF854, 0xD3C0);
+	r8168_mac_ocp_write(tp, 0xF856, 0x49D9);
+	r8168_mac_ocp_write(tp, 0xF858, 0xF01F);
+	r8168_mac_ocp_write(tp, 0xF85A, 0xC526);
+	r8168_mac_ocp_write(tp, 0xF85C, 0x64A5);
+	r8168_mac_ocp_write(tp, 0xF85E, 0x1400);
+	r8168_mac_ocp_write(tp, 0xF860, 0xF007);
+	r8168_mac_ocp_write(tp, 0xF862, 0x0C01);
+	r8168_mac_ocp_write(tp, 0xF864, 0x8CA5);
+	r8168_mac_ocp_write(tp, 0xF866, 0x1C15);
+	r8168_mac_ocp_write(tp, 0xF868, 0xC51B);
+	r8168_mac_ocp_write(tp, 0xF86A, 0x9CA0);
+	r8168_mac_ocp_write(tp, 0xF86C, 0xE013);
+	r8168_mac_ocp_write(tp, 0xF86E, 0xC519);
+	r8168_mac_ocp_write(tp, 0xF870, 0x74A0);
+	r8168_mac_ocp_write(tp, 0xF872, 0x48C4);
+	r8168_mac_ocp_write(tp, 0xF874, 0x8CA0);
+	r8168_mac_ocp_write(tp, 0xF876, 0xC516);
+	r8168_mac_ocp_write(tp, 0xF878, 0x74A4);
+	r8168_mac_ocp_write(tp, 0xF87A, 0x48C8);
+	r8168_mac_ocp_write(tp, 0xF87C, 0x48CA);
+	r8168_mac_ocp_write(tp, 0xF87E, 0x9CA4);
+	r8168_mac_ocp_write(tp, 0xF880, 0xC512);
+	r8168_mac_ocp_write(tp, 0xF882, 0x1B00);
+	r8168_mac_ocp_write(tp, 0xF884, 0x9BA0);
+	r8168_mac_ocp_write(tp, 0xF886, 0x1B1C);
+	r8168_mac_ocp_write(tp, 0xF888, 0x483F);
+	r8168_mac_ocp_write(tp, 0xF88A, 0x9BA2);
+	r8168_mac_ocp_write(tp, 0xF88C, 0x1B04);
+	r8168_mac_ocp_write(tp, 0xF88E, 0xC508);
+	r8168_mac_ocp_write(tp, 0xF890, 0x9BA0);
+	r8168_mac_ocp_write(tp, 0xF892, 0xC505);
+	r8168_mac_ocp_write(tp, 0xF894, 0xBD00);
+	r8168_mac_ocp_write(tp, 0xF896, 0xC502);
+	r8168_mac_ocp_write(tp, 0xF898, 0xBD00);
+	r8168_mac_ocp_write(tp, 0xF89A, 0x0300);
+	r8168_mac_ocp_write(tp, 0xF89C, 0x051E);
+	r8168_mac_ocp_write(tp, 0xF89E, 0xE434);
+	r8168_mac_ocp_write(tp, 0xF8A0, 0xE018);
+	r8168_mac_ocp_write(tp, 0xF8A2, 0xE092);
+	r8168_mac_ocp_write(tp, 0xF8A4, 0xDE20);
+	r8168_mac_ocp_write(tp, 0xF8A6, 0xD3C0);
+	r8168_mac_ocp_write(tp, 0xF8A8, 0xC50F);
+	r8168_mac_ocp_write(tp, 0xF8AA, 0x76A4);
+	r8168_mac_ocp_write(tp, 0xF8AC, 0x49E3);
+	r8168_mac_ocp_write(tp, 0xF8AE, 0xF007);
+	r8168_mac_ocp_write(tp, 0xF8B0, 0x49C0);
+	r8168_mac_ocp_write(tp, 0xF8B2, 0xF103);
+	r8168_mac_ocp_write(tp, 0xF8B4, 0xC607);
+	r8168_mac_ocp_write(tp, 0xF8B6, 0xBE00);
+	r8168_mac_ocp_write(tp, 0xF8B8, 0xC606);
+	r8168_mac_ocp_write(tp, 0xF8BA, 0xBE00);
+	r8168_mac_ocp_write(tp, 0xF8BC, 0xC602);
+	r8168_mac_ocp_write(tp, 0xF8BE, 0xBE00);
+	r8168_mac_ocp_write(tp, 0xF8C0, 0x0C4C);
+	r8168_mac_ocp_write(tp, 0xF8C2, 0x0C28);
+	r8168_mac_ocp_write(tp, 0xF8C4, 0x0C2C);
+	r8168_mac_ocp_write(tp, 0xF8C6, 0xDC00);
+	r8168_mac_ocp_write(tp, 0xF8C8, 0xC707);
+	r8168_mac_ocp_write(tp, 0xF8CA, 0x1D00);
+	r8168_mac_ocp_write(tp, 0xF8CC, 0x8DE2);
+	r8168_mac_ocp_write(tp, 0xF8CE, 0x48C1);
+	r8168_mac_ocp_write(tp, 0xF8D0, 0xC502);
+	r8168_mac_ocp_write(tp, 0xF8D2, 0xBD00);
+	r8168_mac_ocp_write(tp, 0xF8D4, 0x00AA);
+	r8168_mac_ocp_write(tp, 0xF8D6, 0xE0C0);
+	r8168_mac_ocp_write(tp, 0xF8D8, 0xC502);
+	r8168_mac_ocp_write(tp, 0xF8DA, 0xBD00);
+	r8168_mac_ocp_write(tp, 0xF8DC, 0x0132);
+
+	r8168_mac_ocp_write(tp, 0xFC26, 0x8000);
+
+	r8168_mac_ocp_write(tp, 0xFC2A, 0x0743);
+	r8168_mac_ocp_write(tp, 0xFC2C, 0x0801);
+	r8168_mac_ocp_write(tp, 0xFC2E, 0x0BE9);
+	r8168_mac_ocp_write(tp, 0xFC30, 0x02FD);
+	r8168_mac_ocp_write(tp, 0xFC32, 0x0C25);
+	r8168_mac_ocp_write(tp, 0xFC34, 0x00A9);
+	r8168_mac_ocp_write(tp, 0xFC36, 0x012D);
+
 	rtl_hw_aspm_clkreq_enable(tp, true);
 }
 
-- 
2.22.0


^ permalink raw reply related

* [PATCH iproute2] tc: util: constrain percentage in 0-100 interval
From: Andrea Claudi @ 2019-07-13  9:44 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern

parse_percent() currently allows to specify negative percentages
or value above 100%. However this does not seems to make sense,
as the function is used for probabilities or bandiwidth rates.

Moreover, using negative values leads to erroneous results
(using Bernoulli loss model as example):

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
$ tc qdisc show dev test
qdisc netem 800c: root refcnt 2 limit 10 loss gemodel p 90% r 10% 1-h 100% 1-k 0%

Using values above 100% we have instead:

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel 140% limit 10
$ tc qdisc show dev test
qdisc netem 800f: root refcnt 2 limit 10 loss gemodel p 40% r 60% 1-h 100% 1-k 0%

This commit changes parse_percent() with a check to ensure
percentage values stay between 1.0 and 0.0.
parse_percent_rate() function, which already employs a similar
check, is adjusted accordingly.

With this check in place, we have:

$ ip link add test type dummy
$ ip link set test up
$ tc qdisc add dev test root netem loss gemodel -10% limit 10
Illegal "loss gemodel p"

Fixes: 927e3cfb52b58 ("tc: B.W limits can now be specified in %.")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
---
 tc/tc_util.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/tc/tc_util.c b/tc/tc_util.c
index 53d15e08e9734..b90d256c33a4a 100644
--- a/tc/tc_util.c
+++ b/tc/tc_util.c
@@ -198,7 +198,7 @@ int parse_percent(double *val, const char *str)
 	char *p;
 
 	*val = strtod(str, &p) / 100.;
-	if (*val == HUGE_VALF || *val == HUGE_VALL)
+	if (*val > 1.0 || *val < 0.0)
 		return 1;
 	if (*p && strcmp(p, "%"))
 		return -1;
@@ -226,16 +226,16 @@ static int parse_percent_rate(char *rate, size_t len,
 	if (ret != 1)
 		goto malf;
 
-	if (parse_percent(&perc, str_perc))
+	ret = parse_percent(&perc, str_perc);
+	if (ret == 1) {
+		fprintf(stderr, "Invalid rate specified; should be between [0,100]%% but is %s\n", str);
+		goto err;
+	} else if (ret == -1) {
 		goto malf;
+	}
 
 	free(str_perc);
 
-	if (perc > 1.0 || perc < 0.0) {
-		fprintf(stderr, "Invalid rate specified; should be between [0,100]%% but is %s\n", str);
-		return -1;
-	}
-
 	rate_bit = perc * dev_mbit * 1000 * 1000;
 
 	ret = snprintf(rate, len, "%lf", rate_bit);
@@ -247,8 +247,9 @@ static int parse_percent_rate(char *rate, size_t len,
 	return 0;
 
 malf:
-	free(str_perc);
 	fprintf(stderr, "Specified rate value could not be read or is malformed\n");
+err:
+	free(str_perc);
 	return -1;
 }
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-13  8:21 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713031008.GA248225@google.com>

On Fri, Jul 12, 2019 at 11:10:08PM -0400, Joel Fernandes wrote:
> On Fri, Jul 12, 2019 at 11:01:50PM -0400, Joel Fernandes wrote:
> > On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > > > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > > section. With RCU consolidating flavors and the generic helper added in
> > > > > this series, this is no longer need. We can just use the generic helper
> > > > > and it results in a nice cleanup.
> > > > > 
> > > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > 
> > > > Hi Oleg,
> > > > Slightly unrelated to the patch,
> > > > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > > > 
> > > > I do understand how rcu sync and percpu rwsem works, however the comment
> > > > below didn't make much sense to me. For one, there's no readers_fast anymore
> > > > so I did not follow what readers_fast means. Could the comment be updated to
> > > > reflect latest changes?
> > > > Also could you help understand how is a writer not able to change
> > > > sem->state and count the per-cpu read counters at the same time as the
> > > > comment tries to say?
> > > > 
> > > > 	/*
> > > > 	 * We are in an RCU-sched read-side critical section, so the writer
> > > > 	 * cannot both change sem->state from readers_fast and start checking
> > > > 	 * counters while we are here. So if we see !sem->state, we know that
> > > > 	 * the writer won't be checking until we're past the preempt_enable()
> > > > 	 * and that once the synchronize_rcu() is done, the writer will see
> > > > 	 * anything we did within this RCU-sched read-size critical section.
> > > > 	 */
> > > > 
> > > > Also,
> > > > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > > > the callbacks are the same now. I will post that as a follow-up patch to this
> > > > series.
> > > 
> > > Hello, Joel,
> > > 
> > > Oleg has a set of patches updating this code that just hit mainline
> > > this week.  These patches get rid of the code that previously handled
> > > RCU's multiple flavors.  Or are you looking at current mainline and
> > > me just missing your point?
> > > 
> > 
> > Hi Paul,
> > You are right on point. I have a bad habit of not rebasing my trees. In this
> > case the feature branch of mine in concern was based on v5.1. Needless to
> > say, I need to rebase my tree.
> > 
> > Yes, this sync clean up patch does conflict when I rebase, but other patches
> > rebase just fine.
> > 
> > The 2 options I see are:
> > 1. Let us drop this patch for now and I resend it later.
> > 2. I resend all patches based on Linus's master branch.
> 
> Below is the updated patch based on Linus master branch:
> 
> ---8<-----------------------
> 
> >From 5f40c9a07fcf3d6dafc2189599d0ba9443097d0f Mon Sep 17 00:00:00 2001
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> Date: Fri, 12 Jul 2019 12:13:27 -0400
> Subject: [PATCH v2.1 3/9] rcu/sync: Remove custom check for reader-section
> 
> The rcu/sync code was doing its own check whether we are in a reader
> section. With RCU consolidating flavors and the generic helper added in
> this series, this is no longer need. We can just use the generic helper
> and it results in a nice cleanup.
> 
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  include/linux/rcu_sync.h | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> index 9b83865d24f9..0027d4c8087c 100644
> --- a/include/linux/rcu_sync.h
> +++ b/include/linux/rcu_sync.h
> @@ -31,9 +31,7 @@ struct rcu_sync {
>   */
>  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
>  {
> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> -			 !rcu_read_lock_bh_held() &&
> -			 !rcu_read_lock_sched_held(),
> +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),

I believe that replacing rcu_read_lock_sched_held() with preemptible()
in a CONFIG_PREEMPT=n kernel will give you false-positive splats here.
If you have not already done so, could you please give it a try?

							Thanx, Paul

>  			 "suspicious rcu_sync_is_idle() usage");
>  	return !READ_ONCE(rsp->gp_state); /* GP_IDLE */
>  }
> -- 
> 2.22.0.510.g264f2c817a-goog
> 

^ permalink raw reply

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
From: Toke Høiland-Jørgensen @ 2019-07-13  8:07 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ido Schimmel, David Miller, netdev, jiri, mlxsw, dsahern, roopa,
	nikolay, andy, pablo, jakub.kicinski, pieter.jansenvanvuuren,
	andrew, f.fainelli, vivien.didelot, idosch
In-Reply-To: <20190713004011.GA24036@localhost.localdomain>

>Neil Horman <nhorman@tuxdriver.com> writes:

> On Fri, Jul 12, 2019 at 02:33:29PM +0200, Toke Høiland-Jørgensen wrote:
>> Neil Horman <nhorman@tuxdriver.com> writes:
>> 
>> > On Fri, Jul 12, 2019 at 11:27:55AM +0200, Toke Høiland-Jørgensen wrote:
>> >> Neil Horman <nhorman@tuxdriver.com> writes:
>> >> 
>> >> > On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
>> >> >> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
>> >> >> > From: Ido Schimmel <idosch@idosch.org>
>> >> >> > Date: Sun,  7 Jul 2019 10:58:17 +0300
>> >> >> > 
>> >> >> > > Users have several ways to debug the kernel and understand why a packet
>> >> >> > > was dropped. For example, using "drop monitor" and "perf". Both
>> >> >> > > utilities trace kfree_skb(), which is the function called when a packet
>> >> >> > > is freed as part of a failure. The information provided by these tools
>> >> >> > > is invaluable when trying to understand the cause of a packet loss.
>> >> >> > > 
>> >> >> > > In recent years, large portions of the kernel data path were offloaded
>> >> >> > > to capable devices. Today, it is possible to perform L2 and L3
>> >> >> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
>> >> >> > > Different TC classifiers and actions are also offloaded to capable
>> >> >> > > devices, at both ingress and egress.
>> >> >> > > 
>> >> >> > > However, when the data path is offloaded it is not possible to achieve
>> >> >> > > the same level of introspection as tools such "perf" and "drop monitor"
>> >> >> > > become irrelevant.
>> >> >> > > 
>> >> >> > > This patchset aims to solve this by allowing users to monitor packets
>> >> >> > > that the underlying device decided to drop along with relevant metadata
>> >> >> > > such as the drop reason and ingress port.
>> >> >> > 
>> >> >> > We are now going to have 5 or so ways to capture packets passing through
>> >> >> > the system, this is nonsense.
>> >> >> > 
>> >> >> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
>> >> >> > devlink thing.
>> >> >> > 
>> >> >> > This is insanity, too many ways to do the same thing and therefore the
>> >> >> > worst possible user experience.
>> >> >> > 
>> >> >> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
>> >> >> > XDP perf events, and these taps there too.
>> >> >> > 
>> >> >> > I mean really, think about it from the average user's perspective.  To
>> >> >> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
>> >> >> > listen on devlink but configure a special tap thing beforehand and then
>> >> >> > if someone is using XDP I gotta setup another perf event buffer capture
>> >> >> > thing too.
>> >> >> 
>> >> >> Dave,
>> >> >> 
>> >> >> Before I start working on v2, I would like to get your feedback on the
>> >> >> high level plan. Also adding Neil who is the maintainer of drop_monitor
>> >> >> (and counterpart DropWatch tool [1]).
>> >> >> 
>> >> >> IIUC, the problem you point out is that users need to use different
>> >> >> tools to monitor packet drops based on where these drops occur
>> >> >> (SW/HW/XDP).
>> >> >> 
>> >> >> Therefore, my plan is to extend the existing drop_monitor netlink
>> >> >> channel to also cover HW drops. I will add a new message type and a new
>> >> >> multicast group for HW drops and encode in the message what is currently
>> >> >> encoded in the devlink events.
>> >> >> 
>> >> > A few things here:
>> >> > IIRC we don't announce individual hardware drops, drivers record them in
>> >> > internal structures, and they are retrieved on demand via ethtool calls, so you
>> >> > will either need to include some polling (probably not a very performant idea),
>> >> > or some sort of flagging mechanism to indicate that on the next message sent to
>> >> > user space you should go retrieve hw stats from a given interface.  I certainly
>> >> > wouldn't mind seeing this happen, but its more work than just adding a new
>> >> > netlink message.
>> >> >
>> >> > Also, regarding XDP drops, we wont see them if the xdp program is offloaded to
>> >> > hardware (you'll need your hw drop gathering mechanism for that), but for xdp
>> >> > programs run on the cpu, dropwatch should alrady catch those.  I.e. if the xdp
>> >> > program returns a DROP result for a packet being processed, the OS will call
>> >> > kfree_skb on its behalf, and dropwatch wil call that.
>> >> 
>> >> There is no skb by the time an XDP program runs, so this is not true. As
>> >> I mentioned upthread, there's a tracepoint that will get called if an
>> >> error occurs (or the program returns XDP_ABORTED), but in most cases,
>> >> XDP_DROP just means that the packet silently disappears...
>> >> 
>> > As I noted, thats only true for xdp programs that are offloaded to hardware, I
>> > was only speaking for XDP programs that run on the cpu.  For the former case, we
>> > obviously need some other mechanism to detect drops, but for cpu executed xdp
>> > programs, the OS is responsible for freeing skbs associated with programs the
>> > return XDP_DROP.
>> 
>> Ah, I think maybe you're thinking of generic XDP (also referred to as
>> skb mode)? That is a separate mode; an XDP program loaded in "native
> Yes, was I not clear about that?

No, not really. "Generic XDP" is not the same as "XDP"; the generic mode
is more of a debug mode (as far as I'm concerned at least). So in the
common case, it is absolutely not the case that the kernel will end up
calling kfree_skb after an XDP_DROP; so I got somewhat thrown off by
your insistence that it would... :)

-Toke

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2019-07-13  6:17 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Fix excessive stack usage in cxgb4, from Arnd Bergmann.

2) Missing skb queue lock init in tipc, from Chris Packham.

3) Fix some regressions in ipv6 flow label handling, from Eric Dumazet.

4) Elide flow dissection of local packets in FIB rules, from Petar
   Penkov.

5) Fix TLS support build failure in mlx5, from Tariq Toukab.

Please pull, thanks a lot.

The following changes since commit a131c2bf165684315f606fdd88cf80be22ba32f3:

  Merge tag 'acpi-5.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (2019-07-11 11:17:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 25a09ce79639a8775244808c17282c491cff89cf:

  ppp: mppe: Revert "ppp: mppe: Add softdep to arc4" (2019-07-12 22:58:49 -0700)

----------------------------------------------------------------
Arnd Bergmann (2):
      davinci_cpdma: don't cast dma_addr_t to pointer
      cxgb4: reduce kernel stack usage in cudbg_collect_mem_region()

Aya Levin (3):
      net/mlx5e: Fix return value from timeout recover function
      net/mlx5e: Fix error flow in tx reporter diagnose
      net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn

Chris Packham (1):
      tipc: ensure head->lock is initialised

Christian Lamparter (1):
      net: dsa: qca8k: replace legacy gpio include

Cong Wang (1):
      hsr: switch ->dellink() to ->ndo_uninit()

David S. Miller (4):
      Merge branch 'mlx5-build-fixes'
      Merge tag 'mlx5-fixes-2019-07-11' of git://git.kernel.org/.../saeed/linux
      Merge branch 'net/rds-fixes' of git://git.kernel.org/.../ssantosh/linux
      Merge branch 'nfp-flower-bugs'

Denis Efremov (1):
      net: phy: make exported variables non-static

Eli Britstein (1):
      net/mlx5e: Fix port tunnel GRE entropy control

Eric Biggers (1):
      ppp: mppe: Revert "ppp: mppe: Add softdep to arc4"

Eric Dumazet (3):
      ipv6: tcp: fix flowlabels reflection for RST packets
      ipv6: fix potential crash in ip6_datagram_dst_update()
      ipv6: fix static key imbalance in fl_create()

Gerd Rausch (3):
      Revert "RDS: IB: split the mr registration and invalidation path"
      rds: Accept peer connection reject messages due to incompatible version
      rds: Return proper "tos" value to user-space

Jiangfeng Xiao (1):
      net: hisilicon: Use devm_platform_ioremap_resource

Joe Perches (2):
      net: ethernet: mediatek: Fix misuses of GENMASK macro
      net: stmmac: Fix misuses of GENMASK macro

John Hurley (2):
      nfp: flower: fix ethernet check on match fields
      nfp: flower: ensure ip protocol is specified for L4 matches

Maor Gottlieb (1):
      net/mlx5: E-Switch, Fix default encap mode

Nathan Chancellor (1):
      net/mlx5e: Convert single case statement switch statements into if statements

Petar Penkov (1):
      net: fib_rules: do not flow dissect local packets

Roman Mashak (1):
      tc-tests: updated skbedit tests

Saeed Mahameed (3):
      net/mlx5e: Rx, Fix checksum calculation for new hardware
      net/mlx5e: Fix unused variable warning when CONFIG_MLX5_ESWITCH is off
      net/mlx5: E-Switch, Reduce ingress acl modify metadata stack usage

Santosh Shilimkar (2):
      rds: fix reordering with composite message notification
      rds: avoid version downgrade to legitimate newer peer connections

Taehee Yoo (1):
      net: openvswitch: do not update max_headroom if new headroom is equal to old headroom

Tariq Toukan (1):
      net/mlx5e: Fix compilation error in TLS code

Vlad Buslov (2):
      net: sched: Fix NULL-pointer dereference in tc_indr_block_ing_cmd()
      net/mlx5e: Provide cb_list pointer when setting up tc block on rep

yangxingwu (1):
      ipv6: Use ipv6_authlen for len

 drivers/net/dsa/qca8k.c                                          |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c                   |  19 ++++++---
 drivers/net/ethernet/hisilicon/hip04_eth.c                       |   7 +---
 drivers/net/ethernet/hisilicon/hisi_femac.c                      |   7 +---
 drivers/net/ethernet/hisilicon/hix5hd2_gmac.c                    |   7 +---
 drivers/net/ethernet/hisilicon/hns_mdio.c                        |   4 +-
 drivers/net/ethernet/mediatek/mtk_eth_soc.h                      |   2 +-
 drivers/net/ethernet/mediatek/mtk_sgmii.c                        |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h              |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h                     |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c         |  10 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c       |  34 +++++-----------
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c                |   8 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c                 |   5 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c                  |   7 +++-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c                |   5 ---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c       |   9 ++++-
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c            |   9 ++++-
 drivers/net/ethernet/mellanox/mlx5/core/lib/port_tun.c           |  23 ++---------
 drivers/net/ethernet/netronome/nfp/flower/offload.c              |  28 +++++--------
 drivers/net/ethernet/stmicro/stmmac/descs.h                      |   2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c                |   4 +-
 drivers/net/ethernet/ti/davinci_cpdma.c                          |  26 ++++++------
 drivers/net/phy/phy_device.c                                     |   6 +--
 drivers/net/ppp/ppp_mppe.c                                       |   1 -
 include/linux/mlx5/mlx5_ifc.h                                    |   3 +-
 include/linux/phy.h                                              |   3 ++
 include/net/fib_rules.h                                          |   4 +-
 include/net/pkt_cls.h                                            |  10 +++++
 net/hsr/hsr_device.c                                             |  18 ++++-----
 net/hsr/hsr_device.h                                             |   1 -
 net/hsr/hsr_netlink.c                                            |   7 ----
 net/ipv6/ah6.c                                                   |   4 +-
 net/ipv6/datagram.c                                              |   2 +-
 net/ipv6/exthdrs_core.c                                          |   2 +-
 net/ipv6/ip6_flowlabel.c                                         |   9 +++--
 net/ipv6/ip6_tunnel.c                                            |   2 +-
 net/ipv6/netfilter/ip6t_ah.c                                     |   2 +-
 net/ipv6/netfilter/ip6t_ipv6header.c                             |   2 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c                          |   2 +-
 net/ipv6/netfilter/nf_log_ipv6.c                                 |   2 +-
 net/ipv6/tcp_ipv6.c                                              |   7 +++-
 net/openvswitch/datapath.c                                       |  39 +++++++++++++-----
 net/rds/connection.c                                             |   1 +
 net/rds/ib.h                                                     |   4 +-
 net/rds/ib_cm.c                                                  |   9 +----
 net/rds/ib_frmr.c                                                |  11 +++--
 net/rds/ib_send.c                                                |  29 ++++++-------
 net/rds/rdma.c                                                   |  10 -----
 net/rds/rdma_transport.c                                         |  11 +++--
 net/rds/rds.h                                                    |   1 -
 net/rds/send.c                                                   |   4 +-
 net/sched/cls_api.c                                              |   2 +-
 net/tipc/name_distr.c                                            |   2 +-
 tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 55 files changed, 328 insertions(+), 222 deletions(-)

^ permalink raw reply

* Re: [PATCH net] ppp: mppe: Revert "ppp: mppe: Add softdep to arc4"
From: David Miller @ 2019-07-13  6:07 UTC (permalink / raw)
  To: ebiggers; +Cc: netdev, linux-ppp, paulus, linux-crypto, tiwai, ard.biesheuvel
In-Reply-To: <20190712233931.17350-1-ebiggers@kernel.org>

From: Eric Biggers <ebiggers@kernel.org>
Date: Fri, 12 Jul 2019 16:39:31 -0700

> From: Eric Biggers <ebiggers@google.com>
> 
> Commit 0e5a610b5ca5 ("ppp: mppe: switch to RC4 library interface"),
> which was merged through the crypto tree for v5.3, changed ppp_mppe.c to
> use the new arc4_crypt() library function rather than access RC4 through
> the dynamic crypto_skcipher API.
> 
> Meanwhile commit aad1dcc4f011 ("ppp: mppe: Add softdep to arc4") was
> merged through the net tree and added a module soft-dependency on "arc4".
> 
> The latter commit no longer makes sense because the code now uses the
> "libarc4" module rather than "arc4", and also due to the direct use of
> arc4_crypt(), no module soft-dependency is required.
> 
> So revert the latter commit.
> 
> Cc: Takashi Iwai <tiwai@suse.de>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Shannon Nelson @ 2019-07-13  5:32 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20190709023050.GC5835@lunn.ch>

On 7/8/19 7:30 PM, Andrew Lunn wrote:
>> +static int ionic_nway_reset(struct net_device *netdev)
>> +{
>> +	struct lif *lif = netdev_priv(netdev);
>> +	int err = 0;
>> +
>> +	if (netif_running(netdev))
>> +		err = ionic_reset_queues(lif);
> What does ionic_reset_queues() do? It sounds nothing like restarting
> auto negotiation?
>
>       Andrew
Basically, it's a rip-it-all-down-and-start-over way of restarting the 
connection, and is also useful for fixing queues that are misbehaving.  
It's a little old-fashioned, taken from the ixgbe example, but is 
effective when there isn't an actual "restart auto-negotiation" command 
in the firmware.

I'll try to make it a little more evident.

sln



^ permalink raw reply

* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Shannon Nelson @ 2019-07-13  5:16 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20190709021426.GA5835@lunn.ch>

On 7/8/19 7:14 PM, Andrew Lunn wrote:
>> +static int ionic_set_pauseparam(struct net_device *netdev,
>> +				struct ethtool_pauseparam *pause)
>> +{
>> +	struct lif *lif = netdev_priv(netdev);
>> +	struct ionic *ionic = lif->ionic;
>> +	struct ionic_dev *idev = &lif->ionic->idev;
>> +
>> +	u32 requested_pause;
>> +	u32 cur_autoneg;
>> +	int err;
>> +
>> +	cur_autoneg = idev->port_info->config.an_enable ? AUTONEG_ENABLE :
>> +								AUTONEG_DISABLE;
>> +	if (pause->autoneg != cur_autoneg) {
>> +		netdev_info(netdev, "Please use 'ethtool -s ...' to change autoneg\n");
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	/* change both at the same time */
>> +	requested_pause = PORT_PAUSE_TYPE_LINK;
>> +	if (pause->rx_pause)
>> +		requested_pause |= IONIC_PAUSE_F_RX;
>> +	if (pause->tx_pause)
>> +		requested_pause |= IONIC_PAUSE_F_TX;
>> +
>> +	if (requested_pause == idev->port_info->config.pause_type)
>> +		return 0;
>> +
>> +	idev->port_info->config.pause_type = requested_pause;
>> +
>> +	mutex_lock(&ionic->dev_cmd_lock);
>> +	ionic_dev_cmd_port_pause(idev, requested_pause);
>> +	err = ionic_dev_cmd_wait(ionic, devcmd_timeout);
>> +	mutex_unlock(&ionic->dev_cmd_lock);
>> +	if (err)
>> +		return err;
> Hi Shannon
>
> I've no idea what the firmware black box is doing, but this looks
> wrong.
>
> pause->autoneg is about if the results of auto-neg should be used or
> not. If false, just configure the MAC with the pause settings and you
> are done. If the interface is being forced, so autoneg in general is
> disabled, just configure the MAC and you are done.
>
> If pause->autoneg is true and the interface is using auto-neg as a
> whole, you pass the pause values to the PHY for it to advertise and
> trigger an auto-neg. Once autoneg has completed, and the resolved
> settings are available, the MAC is configured with the resolved
> values.
>
> Looking at this code, i don't see any difference between configuring
> the MAC or configuring the PHY. I would expect pause->autoneg to be
> part of requested_pause somehow, so the firmware knows what is should
> do.
>
> 	Andrew

In this device there's actually very little the driver can do to 
directly configure the mac or phy besides passing through to the 
firmware what the user has requested - that happens here for the pause 
values, and in ionic_set_link_ksettings() for autoneg.  The firmware is 
managing the port based on these requests with the help of internally 
configured rules defined in a customer setting.

sln


^ permalink raw reply

* Re: [PATCH v3 bpf 1/3] bpf: fix BTF verifier size resolution logic
From: Martin Lau @ 2019-07-13  4:54 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, Alexei Starovoitov,
	daniel@iogearbox.net, Yonghong Song, andrii.nakryiko@gmail.com,
	Kernel Team
In-Reply-To: <20190712172557.4039121-2-andriin@fb.com>

On Fri, Jul 12, 2019 at 10:25:55AM -0700, Andrii Nakryiko wrote:
> BTF verifier has a size resolution bug which in some circumstances leads to
> invalid size resolution for, e.g., TYPEDEF modifier.  This happens if we have
> [1] PTR -> [2] TYPEDEF -> [3] ARRAY, in which case due to being in pointer
> context ARRAY size won't be resolved (because for pointer it doesn't matter, so
> it's a sink in pointer context), but it will be permanently remembered as zero
> for TYPEDEF and TYPEDEF will be marked as RESOLVED. Eventually ARRAY size will
> be resolved correctly, but TYPEDEF resolved_size won't be updated anymore.
> This, subsequently, will lead to erroneous map creation failure, if that
> TYPEDEF is specified as either key or value, as key_size/value_size won't
> correspond to resolved size of TYPEDEF (kernel will believe it's zero).
Thanks for the fix.

Acked-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Joel Fernandes @ 2019-07-13  3:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713030150.GA246587@google.com>

On Fri, Jul 12, 2019 at 11:01:50PM -0400, Joel Fernandes wrote:
> On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > > The rcu/sync code was doing its own check whether we are in a reader
> > > > section. With RCU consolidating flavors and the generic helper added in
> > > > this series, this is no longer need. We can just use the generic helper
> > > > and it results in a nice cleanup.
> > > > 
> > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > 
> > > Hi Oleg,
> > > Slightly unrelated to the patch,
> > > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > > 
> > > I do understand how rcu sync and percpu rwsem works, however the comment
> > > below didn't make much sense to me. For one, there's no readers_fast anymore
> > > so I did not follow what readers_fast means. Could the comment be updated to
> > > reflect latest changes?
> > > Also could you help understand how is a writer not able to change
> > > sem->state and count the per-cpu read counters at the same time as the
> > > comment tries to say?
> > > 
> > > 	/*
> > > 	 * We are in an RCU-sched read-side critical section, so the writer
> > > 	 * cannot both change sem->state from readers_fast and start checking
> > > 	 * counters while we are here. So if we see !sem->state, we know that
> > > 	 * the writer won't be checking until we're past the preempt_enable()
> > > 	 * and that once the synchronize_rcu() is done, the writer will see
> > > 	 * anything we did within this RCU-sched read-size critical section.
> > > 	 */
> > > 
> > > Also,
> > > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > > the callbacks are the same now. I will post that as a follow-up patch to this
> > > series.
> > 
> > Hello, Joel,
> > 
> > Oleg has a set of patches updating this code that just hit mainline
> > this week.  These patches get rid of the code that previously handled
> > RCU's multiple flavors.  Or are you looking at current mainline and
> > me just missing your point?
> > 
> 
> Hi Paul,
> You are right on point. I have a bad habit of not rebasing my trees. In this
> case the feature branch of mine in concern was based on v5.1. Needless to
> say, I need to rebase my tree.
> 
> Yes, this sync clean up patch does conflict when I rebase, but other patches
> rebase just fine.
> 
> The 2 options I see are:
> 1. Let us drop this patch for now and I resend it later.
> 2. I resend all patches based on Linus's master branch.

Below is the updated patch based on Linus master branch:

---8<-----------------------

From 5f40c9a07fcf3d6dafc2189599d0ba9443097d0f Mon Sep 17 00:00:00 2001
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Fri, 12 Jul 2019 12:13:27 -0400
Subject: [PATCH v2.1 3/9] rcu/sync: Remove custom check for reader-section

The rcu/sync code was doing its own check whether we are in a reader
section. With RCU consolidating flavors and the generic helper added in
this series, this is no longer need. We can just use the generic helper
and it results in a nice cleanup.

Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcu_sync.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
index 9b83865d24f9..0027d4c8087c 100644
--- a/include/linux/rcu_sync.h
+++ b/include/linux/rcu_sync.h
@@ -31,9 +31,7 @@ struct rcu_sync {
  */
 static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
 {
-	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
-			 !rcu_read_lock_bh_held() &&
-			 !rcu_read_lock_sched_held(),
+	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
 			 "suspicious rcu_sync_is_idle() usage");
 	return !READ_ONCE(rsp->gp_state); /* GP_IDLE */
 }
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Joel Fernandes @ 2019-07-13  3:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712233206.GZ26519@linux.ibm.com>

On Fri, Jul 12, 2019 at 04:32:06PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> > On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > > The rcu/sync code was doing its own check whether we are in a reader
> > > section. With RCU consolidating flavors and the generic helper added in
> > > this series, this is no longer need. We can just use the generic helper
> > > and it results in a nice cleanup.
> > > 
> > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > Hi Oleg,
> > Slightly unrelated to the patch,
> > I tried hard to understand this comment below in percpu_down_read() but no dice.
> > 
> > I do understand how rcu sync and percpu rwsem works, however the comment
> > below didn't make much sense to me. For one, there's no readers_fast anymore
> > so I did not follow what readers_fast means. Could the comment be updated to
> > reflect latest changes?
> > Also could you help understand how is a writer not able to change
> > sem->state and count the per-cpu read counters at the same time as the
> > comment tries to say?
> > 
> > 	/*
> > 	 * We are in an RCU-sched read-side critical section, so the writer
> > 	 * cannot both change sem->state from readers_fast and start checking
> > 	 * counters while we are here. So if we see !sem->state, we know that
> > 	 * the writer won't be checking until we're past the preempt_enable()
> > 	 * and that once the synchronize_rcu() is done, the writer will see
> > 	 * anything we did within this RCU-sched read-size critical section.
> > 	 */
> > 
> > Also,
> > I guess we could get rid of all of the gp_ops struct stuff now that since all
> > the callbacks are the same now. I will post that as a follow-up patch to this
> > series.
> 
> Hello, Joel,
> 
> Oleg has a set of patches updating this code that just hit mainline
> this week.  These patches get rid of the code that previously handled
> RCU's multiple flavors.  Or are you looking at current mainline and
> me just missing your point?
> 

Hi Paul,
You are right on point. I have a bad habit of not rebasing my trees. In this
case the feature branch of mine in concern was based on v5.1. Needless to
say, I need to rebase my tree.

Yes, this sync clean up patch does conflict when I rebase, but other patches
rebase just fine.

The 2 options I see are:
1. Let us drop this patch for now and I resend it later.
2. I resend all patches based on Linus's master branch.

thanks,

- Joel


^ permalink raw reply

* [PATCH 2/2] net-next: ag71xx: Rearrange ag711xx struct to remove holes
From: Rosen Penev @ 2019-07-13  2:09 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20190713020921.18202-1-rosenp@gmail.com>

Removed ____cacheline_aligned attribute to ring structs. This actually
causes holes in the ag71xx struc as well as lower performance.

Rearranged struct members to fall within respective cachelines. The RX
ring struct now does not share a cacheline with the TX ring. The NAPI
atruct now takes up its own cachelines and does not share.

According to pahole -C ag71xx -c 32

Before:

struct ag71xx {
	/* size: 384, cachelines: 12, members: 22 */
	/* sum members: 375, holes: 2, sum holes: 9 */

After:

struct ag71xx {
	/* size: 376, cachelines: 12, members: 22 */
	/* last cacheline: 24 bytes */

Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
 drivers/net/ethernet/atheros/ag71xx.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
index 8f450a03a885..f19711984d34 100644
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -295,16 +295,15 @@ struct ag71xx {
 	/* Critical data related to the per-packet data path are clustered
 	 * early in this structure to help improve the D-cache footprint.
 	 */
-	struct ag71xx_ring rx_ring ____cacheline_aligned;
-	struct ag71xx_ring tx_ring ____cacheline_aligned;
-
+	struct ag71xx_ring rx_ring;
 	u16 rx_buf_size;
-	u8 rx_buf_offset;
+	u16 rx_buf_offset;
+	u32 msg_enable;
+	struct ag71xx_ring tx_ring;
 
 	struct net_device *ndev;
 	struct platform_device *pdev;
 	struct napi_struct napi;
-	u32 msg_enable;
 	const struct ag71xx_dcfg *dcfg;
 
 	/* From this point onwards we're not looking at per-packet fields. */
@@ -313,20 +312,17 @@ struct ag71xx {
 	struct ag71xx_desc *stop_desc;
 	dma_addr_t stop_desc_dma;
 
-	int phy_if_mode;
-
-	struct delayed_work restart_work;
-	struct timer_list oom_timer;
-
-	struct reset_control *mac_reset;
-
 	u32 fifodata[3];
 	int mac_idx;
+	int phy_if_mode;
 
-	struct reset_control *mdio_reset;
 	struct mii_bus *mii_bus;
 	struct clk *clk_mdio;
 	struct clk *clk_eth;
+	struct reset_control *mdio_reset;
+	struct delayed_work restart_work;
+	struct timer_list oom_timer;
+	struct reset_control *mac_reset;
 };
 
 static int ag71xx_desc_empty(struct ag71xx_desc *desc)
-- 
2.17.1


^ permalink raw reply related

* [PATCH 1/2] net-next: ag71xx: Add missing header
From: Rosen Penev @ 2019-07-13  2:09 UTC (permalink / raw)
  To: netdev

ag71xx uses devm_ioremap_nocache. This fixes usage of an implicit function

Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
 drivers/net/ethernet/atheros/ag71xx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
index 72a57c6cd254..8f450a03a885 100644
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -35,6 +35,7 @@
 #include <linux/regmap.h>
 #include <linux/reset.h>
 #include <linux/clk.h>
+#include <linux/io.h>
 
 /* For our NAPI weight bigger does *NOT* mean better - it means more
  * D-cache misses and lots more wasted cycles than we'll ever
-- 
2.17.1


^ permalink raw reply related

* LPC 2019 Networking Track CFP (reminder)
From: David Miller @ 2019-07-13  0:52 UTC (permalink / raw)
  To: netdev, daniel; +Cc: linux-wireless, netfilter-devel, bpf, linux-kernel, lwn

This is a call for proposals for the 3 day networking track at the
Linux Plumbers Conference in Lisbon, which will be happening on
September 9th-11th, 2019.

We are seeking talks of 40 minutes in length (including Q & A),
optionally accompanied by papers of 2 to 10 pages in length.  The
papers, while not required, are very strongly encouraged by the
committee.  The submitters intention to provide a paper will be taken
into consideration as a criteria when deciding which proposals to
accept.

Any kind of advanced networking-related topic will be considered.

Please submit your proposals on the LPC website at:

	https://www.linuxplumbersconf.org/event/4/abstracts/#submit-abstract

And be sure to select "Networking Summit Track" in the Track pulldown
menu.

Proposals must be submitted by August 2nd, and submitters will be
notified of acceptance by August 9th.

Final slides and papers (as PDF) are due on September 2nd.

Looking forward to seeing you all in Lisbon in September!

^ permalink raw reply

* Re: [PATCH] be2net: fix adapter->big_page_size miscaculation
From: David Miller @ 2019-07-13  0:50 UTC (permalink / raw)
  To: cai
  Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, somnath.kotur,
	arnd, dhowells, hpa, netdev, linux-arch, linux-kernel
In-Reply-To: <EFD25845-097A-46B1-9C1A-02458883E4DA@lca.pw>

From: Qian Cai <cai@lca.pw>
Date: Fri, 12 Jul 2019 20:27:09 -0400

> Actually, GCC would consider it a const with -O2 optimized level because it found that it was never modified and it does not understand it is a module parameter. Considering the following code.
> 
> # cat const.c 
> #include <stdio.h>
> 
> static int a = 1;
> 
> int main(void)
> {
> 	if (__builtin_constant_p(a))
> 		printf("a is a const.\n");
> 
> 	return 0;
> }
> 
> # gcc -O2 const.c -o const

That's not a complete test case, and with a proper test case that
shows the externalization of the address of &a done by the module
parameter macros, gcc should not make this optimization or we should
define the module parameter macros in a way that makes this properly
clear to the compiler.

It makes no sense to hack around this locally in drivers and other
modules.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
From: Neil Horman @ 2019-07-13  0:40 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Ido Schimmel, David Miller, netdev, jiri, mlxsw, dsahern, roopa,
	nikolay, andy, pablo, jakub.kicinski, pieter.jansenvanvuuren,
	andrew, f.fainelli, vivien.didelot, idosch
In-Reply-To: <871ryvv3dy.fsf@toke.dk>

On Fri, Jul 12, 2019 at 02:33:29PM +0200, Toke Høiland-Jørgensen wrote:
> Neil Horman <nhorman@tuxdriver.com> writes:
> 
> > On Fri, Jul 12, 2019 at 11:27:55AM +0200, Toke Høiland-Jørgensen wrote:
> >> Neil Horman <nhorman@tuxdriver.com> writes:
> >> 
> >> > On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
> >> >> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> >> >> > From: Ido Schimmel <idosch@idosch.org>
> >> >> > Date: Sun,  7 Jul 2019 10:58:17 +0300
> >> >> > 
> >> >> > > Users have several ways to debug the kernel and understand why a packet
> >> >> > > was dropped. For example, using "drop monitor" and "perf". Both
> >> >> > > utilities trace kfree_skb(), which is the function called when a packet
> >> >> > > is freed as part of a failure. The information provided by these tools
> >> >> > > is invaluable when trying to understand the cause of a packet loss.
> >> >> > > 
> >> >> > > In recent years, large portions of the kernel data path were offloaded
> >> >> > > to capable devices. Today, it is possible to perform L2 and L3
> >> >> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> >> >> > > Different TC classifiers and actions are also offloaded to capable
> >> >> > > devices, at both ingress and egress.
> >> >> > > 
> >> >> > > However, when the data path is offloaded it is not possible to achieve
> >> >> > > the same level of introspection as tools such "perf" and "drop monitor"
> >> >> > > become irrelevant.
> >> >> > > 
> >> >> > > This patchset aims to solve this by allowing users to monitor packets
> >> >> > > that the underlying device decided to drop along with relevant metadata
> >> >> > > such as the drop reason and ingress port.
> >> >> > 
> >> >> > We are now going to have 5 or so ways to capture packets passing through
> >> >> > the system, this is nonsense.
> >> >> > 
> >> >> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> >> >> > devlink thing.
> >> >> > 
> >> >> > This is insanity, too many ways to do the same thing and therefore the
> >> >> > worst possible user experience.
> >> >> > 
> >> >> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> >> >> > XDP perf events, and these taps there too.
> >> >> > 
> >> >> > I mean really, think about it from the average user's perspective.  To
> >> >> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> >> >> > listen on devlink but configure a special tap thing beforehand and then
> >> >> > if someone is using XDP I gotta setup another perf event buffer capture
> >> >> > thing too.
> >> >> 
> >> >> Dave,
> >> >> 
> >> >> Before I start working on v2, I would like to get your feedback on the
> >> >> high level plan. Also adding Neil who is the maintainer of drop_monitor
> >> >> (and counterpart DropWatch tool [1]).
> >> >> 
> >> >> IIUC, the problem you point out is that users need to use different
> >> >> tools to monitor packet drops based on where these drops occur
> >> >> (SW/HW/XDP).
> >> >> 
> >> >> Therefore, my plan is to extend the existing drop_monitor netlink
> >> >> channel to also cover HW drops. I will add a new message type and a new
> >> >> multicast group for HW drops and encode in the message what is currently
> >> >> encoded in the devlink events.
> >> >> 
> >> > A few things here:
> >> > IIRC we don't announce individual hardware drops, drivers record them in
> >> > internal structures, and they are retrieved on demand via ethtool calls, so you
> >> > will either need to include some polling (probably not a very performant idea),
> >> > or some sort of flagging mechanism to indicate that on the next message sent to
> >> > user space you should go retrieve hw stats from a given interface.  I certainly
> >> > wouldn't mind seeing this happen, but its more work than just adding a new
> >> > netlink message.
> >> >
> >> > Also, regarding XDP drops, we wont see them if the xdp program is offloaded to
> >> > hardware (you'll need your hw drop gathering mechanism for that), but for xdp
> >> > programs run on the cpu, dropwatch should alrady catch those.  I.e. if the xdp
> >> > program returns a DROP result for a packet being processed, the OS will call
> >> > kfree_skb on its behalf, and dropwatch wil call that.
> >> 
> >> There is no skb by the time an XDP program runs, so this is not true. As
> >> I mentioned upthread, there's a tracepoint that will get called if an
> >> error occurs (or the program returns XDP_ABORTED), but in most cases,
> >> XDP_DROP just means that the packet silently disappears...
> >> 
> > As I noted, thats only true for xdp programs that are offloaded to hardware, I
> > was only speaking for XDP programs that run on the cpu.  For the former case, we
> > obviously need some other mechanism to detect drops, but for cpu executed xdp
> > programs, the OS is responsible for freeing skbs associated with programs the
> > return XDP_DROP.
> 
> Ah, I think maybe you're thinking of generic XDP (also referred to as
> skb mode)? That is a separate mode; an XDP program loaded in "native
Yes, was I not clear about that?
Neil

> mode" (or "driver mode") runs on the CPU, but before the skb is created;
> this is the common case for XDP, and there is no skb and thus no drop
> notification in this mode.
> 
> There is *also* an offload mode for XDP programs, but that is only
> supported by netronome cards thus far, so not as commonly used...
> 
> -Toke
> 

^ permalink raw reply

* Re: [GIT PULL] 9p updates for 5.3
From: pr-tracker-bot @ 2019-07-13  0:40 UTC (permalink / raw)
  To: Dominique Martinet; +Cc: Linus Torvalds, v9fs-developer, linux-kernel, netdev
In-Reply-To: <20190712080446.GA19400@nautica>

The pull request you sent on Fri, 12 Jul 2019 10:04:46 +0200:

> git://github.com/martinetd/linux tags/9p-for-5.3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/23bbbf5c1fb3ddf104c2ddbda4cc24ebe53a3453

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply

* Re: [PATCH] be2net: fix adapter->big_page_size miscaculation
From: Qian Cai @ 2019-07-13  0:27 UTC (permalink / raw)
  To: David Miller
  Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, somnath.kotur,
	arnd, dhowells, hpa, netdev, linux-arch, linux-kernel
In-Reply-To: <20190712.154606.493382088615011132.davem@davemloft.net>



> On Jul 12, 2019, at 6:46 PM, David Miller <davem@davemloft.net> wrote:
> 
> From: Qian Cai <cai@lca.pw>
> Date: Fri, 12 Jul 2019 15:23:21 -0400
> 
>> The commit d66acc39c7ce ("bitops: Optimise get_order()") introduced a
>> problem for the be2net driver as "rx_frag_size" could be a module
>> parameter that can be changed while loading the module.
> 
> Why is this a problem?

Well, for example, if rx_frag_size was set to 8096 when loading the module, the kernel has already used the default value 2048 during compilation time.

> 
>> That commit checks __builtin_constant_p() first in get_order() which
>> cause "adapter->big_page_size" to be assigned a value based on the
>> the default "rx_frag_size" value at the compilation time. It also
>> generate a compilation warning,
> 
> rx_frag_size is not a constant, therefore the __builtin_constant_p()
> test should not pass.
> 
> This explanation doesn't seem valid.

Actually, GCC would consider it a const with -O2 optimized level because it found that it was never modified and it does not understand it is a module parameter. Considering the following code.

# cat const.c 
#include <stdio.h>

static int a = 1;

int main(void)
{
	if (__builtin_constant_p(a))
		printf("a is a const.\n");

	return 0;
}

# gcc -O2 const.c -o const

# ./const 
a is a const.

^ permalink raw reply

* Re: [PATCH] [net-next] cxgb4: reduce kernel stack usage in cudbg_collect_mem_region()
From: Joe Perches @ 2019-07-13  0:14 UTC (permalink / raw)
  To: David Miller, arnd
  Cc: vishal, rahul.lakkireddy, ganeshgr, alexios.zavras, arjun,
	surendra, netdev, linux-kernel, clang-built-linux
In-Reply-To: <20190712.153632.1007215196498198399.davem@davemloft.net>

On Fri, 2019-07-12 at 15:36 -0700, David Miller wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> Date: Fri, 12 Jul 2019 11:06:33 +0200
> 
> > The cudbg_collect_mem_region() and cudbg_read_fw_mem() both use several
> > hundred kilobytes of kernel stack space.

Several hundred 'kilo' bytes?
I hope not.


^ permalink raw reply

* [PATCH net] ppp: mppe: Revert "ppp: mppe: Add softdep to arc4"
From: Eric Biggers @ 2019-07-12 23:39 UTC (permalink / raw)
  To: netdev, linux-ppp, David S . Miller, Paul Mackerras
  Cc: linux-crypto, Takashi Iwai, Ard Biesheuvel

From: Eric Biggers <ebiggers@google.com>

Commit 0e5a610b5ca5 ("ppp: mppe: switch to RC4 library interface"),
which was merged through the crypto tree for v5.3, changed ppp_mppe.c to
use the new arc4_crypt() library function rather than access RC4 through
the dynamic crypto_skcipher API.

Meanwhile commit aad1dcc4f011 ("ppp: mppe: Add softdep to arc4") was
merged through the net tree and added a module soft-dependency on "arc4".

The latter commit no longer makes sense because the code now uses the
"libarc4" module rather than "arc4", and also due to the direct use of
arc4_crypt(), no module soft-dependency is required.

So revert the latter commit.

Cc: Takashi Iwai <tiwai@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 drivers/net/ppp/ppp_mppe.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ppp/ppp_mppe.c b/drivers/net/ppp/ppp_mppe.c
index bd3c80b0bc77d..de3b57d09d0cb 100644
--- a/drivers/net/ppp/ppp_mppe.c
+++ b/drivers/net/ppp/ppp_mppe.c
@@ -64,7 +64,6 @@ MODULE_AUTHOR("Frank Cusack <fcusack@fcusack.com>");
 MODULE_DESCRIPTION("Point-to-Point Protocol Microsoft Point-to-Point Encryption support");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS("ppp-compress-" __stringify(CI_MPPE));
-MODULE_SOFTDEP("pre: arc4");
 MODULE_VERSION("1.0.2");

 #define SHA1_PAD_SIZE 40
-- 
2.22.0

^ permalink raw reply related

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-12 23:32 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712213559.GA175138@google.com>

On Fri, Jul 12, 2019 at 05:35:59PM -0400, Joel Fernandes wrote:
> On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > The rcu/sync code was doing its own check whether we are in a reader
> > section. With RCU consolidating flavors and the generic helper added in
> > this series, this is no longer need. We can just use the generic helper
> > and it results in a nice cleanup.
> > 
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Hi Oleg,
> Slightly unrelated to the patch,
> I tried hard to understand this comment below in percpu_down_read() but no dice.
> 
> I do understand how rcu sync and percpu rwsem works, however the comment
> below didn't make much sense to me. For one, there's no readers_fast anymore
> so I did not follow what readers_fast means. Could the comment be updated to
> reflect latest changes?
> Also could you help understand how is a writer not able to change
> sem->state and count the per-cpu read counters at the same time as the
> comment tries to say?
> 
> 	/*
> 	 * We are in an RCU-sched read-side critical section, so the writer
> 	 * cannot both change sem->state from readers_fast and start checking
> 	 * counters while we are here. So if we see !sem->state, we know that
> 	 * the writer won't be checking until we're past the preempt_enable()
> 	 * and that once the synchronize_rcu() is done, the writer will see
> 	 * anything we did within this RCU-sched read-size critical section.
> 	 */
> 
> Also,
> I guess we could get rid of all of the gp_ops struct stuff now that since all
> the callbacks are the same now. I will post that as a follow-up patch to this
> series.

Hello, Joel,

Oleg has a set of patches updating this code that just hit mainline
this week.  These patches get rid of the code that previously handled
RCU's multiple flavors.  Or are you looking at current mainline and
me just missing your point?

							Thanx, Paul

> thanks!
> 
>  - Joel
> 
> 
> > ---
> > Please note: Only build and boot tested this particular patch so far.
> > 
> >  include/linux/rcu_sync.h |  5 ++---
> >  kernel/rcu/sync.c        | 22 ----------------------
> >  2 files changed, 2 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > index 6fc53a1345b3..c954f1efc919 100644
> > --- a/include/linux/rcu_sync.h
> > +++ b/include/linux/rcu_sync.h
> > @@ -39,9 +39,8 @@ extern void rcu_sync_lockdep_assert(struct rcu_sync *);
> >   */
> >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> >  {
> > -#ifdef CONFIG_PROVE_RCU
> > -	rcu_sync_lockdep_assert(rsp);
> > -#endif
> > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > +			 "suspicious rcu_sync_is_idle() usage");
> >  	return !rsp->gp_state; /* GP_IDLE */
> >  }
> >  
> > diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> > index a8304d90573f..535e02601f56 100644
> > --- a/kernel/rcu/sync.c
> > +++ b/kernel/rcu/sync.c
> > @@ -10,37 +10,25 @@
> >  #include <linux/rcu_sync.h>
> >  #include <linux/sched.h>
> >  
> > -#ifdef CONFIG_PROVE_RCU
> > -#define __INIT_HELD(func)	.held = func,
> > -#else
> > -#define __INIT_HELD(func)
> > -#endif
> > -
> >  static const struct {
> >  	void (*sync)(void);
> >  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> >  	void (*wait)(void);
> > -#ifdef CONFIG_PROVE_RCU
> > -	int  (*held)(void);
> > -#endif
> >  } gp_ops[] = {
> >  	[RCU_SYNC] = {
> >  		.sync = synchronize_rcu,
> >  		.call = call_rcu,
> >  		.wait = rcu_barrier,
> > -		__INIT_HELD(rcu_read_lock_held)
> >  	},
> >  	[RCU_SCHED_SYNC] = {
> >  		.sync = synchronize_rcu,
> >  		.call = call_rcu,
> >  		.wait = rcu_barrier,
> > -		__INIT_HELD(rcu_read_lock_sched_held)
> >  	},
> >  	[RCU_BH_SYNC] = {
> >  		.sync = synchronize_rcu,
> >  		.call = call_rcu,
> >  		.wait = rcu_barrier,
> > -		__INIT_HELD(rcu_read_lock_bh_held)
> >  	},
> >  };
> >  
> > @@ -49,16 +37,6 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
> >  
> >  #define	rss_lock	gp_wait.lock
> >  
> > -#ifdef CONFIG_PROVE_RCU
> > -void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
> > -{
> > -	RCU_LOCKDEP_WARN(!gp_ops[rsp->gp_type].held(),
> > -			 "suspicious rcu_sync_is_idle() usage");
> > -}
> > -
> > -EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);
> > -#endif
> > -
> >  /**
> >   * rcu_sync_init() - Initialize an rcu_sync structure
> >   * @rsp: Pointer to rcu_sync structure to be initialized
> > -- 
> > 2.22.0.510.g264f2c817a-goog
> > 
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox