Netdev List


* pull request (net-next): ipsec-next 2017-04-28
From: Steffen Klassert @ 2017-04-28  8:42 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

Just one patch to fix a misplaced spin_unlock_bh in an error path.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit e2989ee9746b3f2e78d1a39bbc402d884e8b8bf1:

  bpf, doc: update list of architectures that do eBPF JIT (2017-04-23 15:56:48 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to e892d2d40445a14a19530a2be8c489b87bcd7c19:

  esp: Fix misplaced spin_unlock_bh. (2017-04-24 07:56:31 +0200)

----------------------------------------------------------------
Steffen Klassert (1):
      esp: Fix misplaced spin_unlock_bh.

 net/ipv4/esp4.c | 6 +-----
 net/ipv6/esp6.c | 6 +-----
 2 files changed, 2 insertions(+), 10 deletions(-)


* [PATCH] esp: Fix misplaced spin_unlock_bh.
From: Steffen Klassert @ 2017-04-28  8:42 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1493368958-29609-1-git-send-email-steffen.klassert@secunet.com>

A recent commit moved esp_alloc_tmp() out of a lock
protected region, but forgot to remove the unlock from
the error path. This patch removes the forgotten unlock.
While at it, remove some unneeded error assignments too.

Fixes: fca11ebde3f0 ("esp4: Reorganize esp_output")
Fixes: 383d0350f2cc ("esp6: Reorganize esp_output")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/esp4.c | 6 +-----
 net/ipv6/esp6.c | 6 +-----
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 7e501ad..7f2caf7 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -356,11 +356,8 @@ int esp_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info *
 	ivlen = crypto_aead_ivsize(aead);
 
 	tmp = esp_alloc_tmp(aead, esp->nfrags + 2, extralen);
-	if (!tmp) {
-		spin_unlock_bh(&x->lock);
-		err = -ENOMEM;
+	if (!tmp)
 		goto error;
-	}
 
 	extra = esp_tmp_extra(tmp);
 	iv = esp_tmp_iv(aead, tmp, extralen);
@@ -389,7 +386,6 @@ int esp_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info *
 		spin_lock_bh(&x->lock);
 		if (unlikely(!skb_page_frag_refill(allocsize, pfrag, GFP_ATOMIC))) {
 			spin_unlock_bh(&x->lock);
-			err = -ENOMEM;
 			goto error;
 		}
 
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 8b55abf..1fe99ba 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -330,11 +330,8 @@ int esp6_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info
 	ivlen = crypto_aead_ivsize(aead);
 
 	tmp = esp_alloc_tmp(aead, esp->nfrags + 2, seqhilen);
-	if (!tmp) {
-		spin_unlock_bh(&x->lock);
-		err = -ENOMEM;
+	if (!tmp)
 		goto error;
-	}
 
 	seqhi = esp_tmp_seqhi(tmp);
 	iv = esp_tmp_iv(aead, tmp, seqhilen);
@@ -362,7 +359,6 @@ int esp6_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info
 		spin_lock_bh(&x->lock);
 		if (unlikely(!skb_page_frag_refill(allocsize, pfrag, GFP_ATOMIC))) {
 			spin_unlock_bh(&x->lock);
-			err = -ENOMEM;
 			goto error;
 		}
 
-- 
2.7.4
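The locking rule this fix restores can be illustrated with a small userspace model. It is a sketch, not kernel code: `lock_depth`, `model_lock()`, `model_unlock()` and `model_output_tail()` are hypothetical stand-ins for `x->lock`, `spin_lock_bh()`/`spin_unlock_bh()` and `esp_output_tail()`, and the model assumes, as the removal of the `err = -ENOMEM` lines suggests, that `err` already holds `-ENOMEM` on these paths.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: lock_depth stands in for x->lock. It must be 0 on
 * every return; -1 would mean "unlocked without locking", which is what
 * the removed spin_unlock_bh() did on the esp_alloc_tmp() failure path. */
static int lock_depth;

static void model_lock(void)   { lock_depth++; }   /* ~ spin_lock_bh()   */
static void model_unlock(void) { lock_depth--; }   /* ~ spin_unlock_bh() */

/* Shape of esp_output_tail() after the fix; fail_tmp and fail_refill
 * simulate esp_alloc_tmp() and skb_page_frag_refill() failing. */
static int model_output_tail(bool fail_tmp, bool fail_refill)
{
	int err = -ENOMEM;        /* assumed preset, so error paths need not set it */
	void *tmp = fail_tmp ? NULL : malloc(64);

	if (!tmp)                 /* allocated outside the lock: nothing to unlock */
		goto out;

	model_lock();
	if (fail_refill) {
		model_unlock();   /* pairs with the lock taken just above */
		goto out_free;
	}
	/* ... page fragment used while locked ... */
	model_unlock();
	err = 0;

out_free:
	free(tmp);                /* the model frees tmp on every path */
out:
	return err;
}
```

Every path returns with `lock_depth` back at 0, which is the invariant the misplaced `spin_unlock_bh()` broke.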


* Re: [GIT 0/1] IPVS Fixes for v4.11
From: Simon Horman @ 2017-04-28 10:03 UTC
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov
In-Reply-To: <20170428095816.6588-1-horms@verge.net.au>

Sorry, I messed this up.
I will repost.

On Fri, Apr 28, 2017 at 11:58:15AM +0200, Simon Horman wrote:
> Hi Pablo,
> 
> please consider this fix to IPVS for v4.11.
> Or if it is too late for v4.11 please consider it for v4.12.
> I would also like it considered for stable.
> 
> * Explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
>   to avoid oops caused by IPVS accessing IPv6 routing code in such
>   circumstances.
> 
> The following changes since commit 1debdc8f9ebd07daf140e417b3841596911e0066:
> 
>   sh_eth: unmap DMA buffers when freeing rings (2017-04-18 22:04:32 -0400)
> 
> are available in the git repository at:
> 
>   http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs-fixes-for-v4.11
> 
> for you to fetch changes up to 8f8688b0d483ff06236808ab5fc8bc83c5eaa8d9:
> 
>   ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled (2017-04-24 11:53:55 +0200)
> 
> ----------------------------------------------------------------
> Paolo Abeni (1):
>       ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
> 
>  net/netfilter/ipvs/ip_vs_ctl.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> -- 
> 2.12.2.816.g2cccc81164
> 


* [PATCH 1/1] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
From: Simon Horman @ 2017-04-28  9:58 UTC
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Paolo Abeni, Simon Horman
In-Reply-To: <20170428095816.6588-1-horms@verge.net.au>

From: Paolo Abeni <pabeni@redhat.com>

When creating a new ipvs service, ipv6 addresses are always accepted
if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
is not explicitly checked.

This allows user space to configure ipvs services even if the
system is booted with ipv6.disable=1. In specific configurations, ipvs
can try to call ipv6 routing code at setup time, causing the kernel to
oops due to fib6_rules_ops being NULL.

This change addresses the issue by adding a check for the ipv6
module being enabled while validating ipv6 service operations, and
by adding the same validation for dest operations.

According to git history, this issue has apparently been present since
the introduction of ipv6 support, and the oops can be triggered
since commit 09571c7ae30865ad ("IPVS: Add function to determine
if IPv6 address is local").

Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 5aeb0dde6ccc..4d753beaac32 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3078,6 +3078,17 @@ static int ip_vs_genl_dump_services(struct sk_buff *skb,
 	return skb->len;
 }
 
+static bool ip_vs_is_af_valid(int af)
+{
+	if (af == AF_INET)
+		return true;
+#ifdef CONFIG_IP_VS_IPV6
+	if (af == AF_INET6 && ipv6_mod_enabled())
+		return true;
+#endif
+	return false;
+}
+
 static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs,
 				    struct ip_vs_service_user_kern *usvc,
 				    struct nlattr *nla, int full_entry,
@@ -3104,11 +3115,7 @@ static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs,
 	memset(usvc, 0, sizeof(*usvc));
 
 	usvc->af = nla_get_u16(nla_af);
-#ifdef CONFIG_IP_VS_IPV6
-	if (usvc->af != AF_INET && usvc->af != AF_INET6)
-#else
-	if (usvc->af != AF_INET)
-#endif
+	if (!ip_vs_is_af_valid(usvc->af))
 		return -EAFNOSUPPORT;
 
 	if (nla_fwmark) {
@@ -3610,6 +3617,11 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
 		if (udest.af == 0)
 			udest.af = svc->af;
 
+		if (!ip_vs_is_af_valid(udest.af)) {
+			ret = -EAFNOSUPPORT;
+			goto out;
+		}
+
 		if (udest.af != svc->af && cmd != IPVS_CMD_DEL_DEST) {
 			/* The synchronization protocol is incompatible
 			 * with mixed family services
-- 
2.12.2.816.g2cccc81164
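The effect of `ip_vs_is_af_valid()` can be checked outside the kernel with a small model. This is a sketch under stated assumptions: `model_ipv6_disabled` and `model_ipv6_mod_enabled()` are hypothetical stand-ins for booting with `ipv6.disable=1` and for the kernel's `ipv6_mod_enabled()`, and the model assumes `CONFIG_IP_VS_IPV6=y`.

```c
#include <stdbool.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Hypothetical flag modelling a boot with ipv6.disable=1. */
static bool model_ipv6_disabled;

/* Hypothetical stand-in for the kernel's ipv6_mod_enabled(). */
static bool model_ipv6_mod_enabled(void)
{
	return !model_ipv6_disabled;
}

/* Same decision logic as ip_vs_is_af_valid() with CONFIG_IP_VS_IPV6=y. */
static bool model_is_af_valid(int af)
{
	if (af == AF_INET)
		return true;
	if (af == AF_INET6 && model_ipv6_mod_enabled())
		return true;
	return false;   /* unknown family, or IPv6 with the module disabled */
}
```

Because the same helper now guards both `ip_vs_genl_parse_service()` and the dest path in `ip_vs_genl_set_cmd()` (after `udest.af` defaults to `svc->af`), an AF_INET6 dest is rejected with `-EAFNOSUPPORT` before IPv6 routing code can be reached.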


* [GIT 0/1] IPVS Fixes for v4.11
From: Simon Horman @ 2017-04-28  9:58 UTC
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman

Hi Pablo,

please consider this fix to IPVS for v4.11.
Or if it is too late for v4.11 please consider it for v4.12.
I would also like it considered for stable.

* Explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
  to avoid oops caused by IPVS accessing IPv6 routing code in such
  circumstances.

The following changes since commit 1debdc8f9ebd07daf140e417b3841596911e0066:

  sh_eth: unmap DMA buffers when freeing rings (2017-04-18 22:04:32 -0400)

are available in the git repository at:

  http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs-fixes-for-v4.11

for you to fetch changes up to 8f8688b0d483ff06236808ab5fc8bc83c5eaa8d9:

  ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled (2017-04-24 11:53:55 +0200)

----------------------------------------------------------------
Paolo Abeni (1):
      ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled

 net/netfilter/ipvs/ip_vs_ctl.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

-- 
2.12.2.816.g2cccc81164



* [PATCH 2/2] xfrm: fix GRO for !CONFIG_NETFILTER
From: Steffen Klassert @ 2017-04-28  9:14 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1493370873-30836-1-git-send-email-steffen.klassert@secunet.com>

From: Sabrina Dubroca <sd@queasysnail.net>

When xfrm_input() is called from GRO, async == 0, and we end up
skipping the processing in xfrm4_transport_finish(). The GRO path
always skips the NF_HOOK, so we don't need the special case for
!NETFILTER during GRO processing.

Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 46bdb4f..e23570b 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -395,7 +395,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 		if (xo)
 			xfrm_gro = xo->flags & XFRM_GRO;
 
-		err = x->inner_mode->afinfo->transport_finish(skb, async);
+		err = x->inner_mode->afinfo->transport_finish(skb, xfrm_gro || async);
 		if (xfrm_gro) {
 			skb_dst_drop(skb);
 			gro_cells_receive(&gro_cells, skb);
-- 
2.7.4
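The control flow the fix relies on can be modeled in a few lines. This sketch assumes the `!CONFIG_NETFILTER` shape of `xfrm4_transport_finish()`, where `async == 0` returns early (with `-iph->protocol`) before the header fixups, and where GRO packets return before `NF_HOOK`; `enum finish_path`, `model_transport_finish()`, `model_before()` and `model_after()` are hypothetical names for illustration only.

```c
#include <stdbool.h>

/* Which path the modelled transport_finish() takes. */
enum finish_path {
	PATH_EARLY_RETURN,   /* !NETFILTER shortcut: header fixups skipped */
	PATH_GRO_DONE,       /* fixups done, handed back for GRO */
	PATH_NF_HOOK,        /* fixups done, continues via NF_HOOK */
};

/* Hypothetical model of xfrm4_transport_finish() built without NETFILTER. */
static enum finish_path model_transport_finish(bool gro, int async)
{
	if (!async)
		return PATH_EARLY_RETURN;
	/* ... tot_len / checksum fixups would happen here ... */
	if (gro)
		return PATH_GRO_DONE;
	return PATH_NF_HOOK;
}

/* What xfrm_input() passes as "async" before and after the patch. */
static enum finish_path model_before(bool gro, int async)
{
	return model_transport_finish(gro, async);
}

static enum finish_path model_after(bool gro, int async)
{
	return model_transport_finish(gro, gro || async);
}
```

Only the GRO case changes: `model_before(true, 0)` skips the fixups, while `model_after(true, 0)` performs them; non-GRO packets behave exactly as before.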


* pull request (net): ipsec 2017-04-28
From: Steffen Klassert @ 2017-04-28  9:14 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

1) Do garbage collecting after a policy flush to remove old
   bundles immediately. From Xin Long.

2) Fix GRO if netfilter is not defined.
   From Sabrina Dubroca.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit fd2c83b35752f0a8236b976978ad4658df14a59f:

  net/packet: check length in getsockopt() called with PACKET_HDRLEN (2017-04-25 14:05:52 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master

for you to fetch changes up to cfcf99f987ba321a3d122580716beb9b08d52eb8:

  xfrm: fix GRO for !CONFIG_NETFILTER (2017-04-27 12:20:19 +0200)

----------------------------------------------------------------
Sabrina Dubroca (1):
      xfrm: fix GRO for !CONFIG_NETFILTER

Xin Long (1):
      xfrm: do the garbage collection after flushing policy

 net/xfrm/xfrm_input.c  | 2 +-
 net/xfrm/xfrm_policy.c | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)


* [PATCH 1/2] xfrm: do the garbage collection after flushing policy
From: Steffen Klassert @ 2017-04-28  9:14 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1493370873-30836-1-git-send-email-steffen.klassert@secunet.com>

From: Xin Long <lucien.xin@gmail.com>

Currently, xfrm garbage collection can be triggered by 'ip xfrm policy del'.
There is no reason not to do it after flushing policies as well, especially
considering that deferred garbage collection is only triggered
when gc_thresh is reached.

It is not good that the policy is gone while the xdst is still held.
Worse, xdst->route/orig_dst is also held and cannot be released
even if the orig_dst has already expired.

This patch performs the garbage collection if any policy was
removed in xfrm_policy_flush().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 236cbbc..dfc77b9 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1006,6 +1006,10 @@ int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
 		err = -ESRCH;
 out:
 	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
+
+	if (cnt)
+		xfrm_garbage_collect(net);
+
 	return err;
 }
 EXPORT_SYMBOL(xfrm_policy_flush);
-- 
2.7.4
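The pattern in `xfrm_policy_flush()` (remove under the lock, count, drop the lock, then collect only if something was removed) can be sketched in userspace. `struct model_net`, `model_policy_flush()` and `model_garbage_collect()` are hypothetical; booleans stand in for `spin_lock_bh()`/`spin_unlock_bh()`, and the model records whether collection ever ran with the lock held.

```c
#include <errno.h>
#include <stdbool.h>

struct model_net {
	bool lock_held;        /* stands in for net->xfrm.xfrm_policy_lock */
	bool gc_under_lock;    /* set if GC ever ran while the lock was held */
	int npolicies;
	int stale_bundles;     /* stands in for cached xdst bundles */
	int gc_runs;
};

static void model_garbage_collect(struct model_net *net)
{
	net->gc_under_lock |= net->lock_held;
	net->stale_bundles = 0;    /* bundles (and their routes) released now */
	net->gc_runs++;
}

static int model_policy_flush(struct model_net *net)
{
	int cnt = 0;
	int err = 0;

	net->lock_held = true;                 /* spin_lock_bh() */
	while (net->npolicies > 0) {           /* unlink every policy */
		net->npolicies--;
		cnt++;
	}
	if (!cnt)
		err = -ESRCH;                  /* nothing to flush */
	net->lock_held = false;                /* spin_unlock_bh() */

	if (cnt)                               /* the hunk added by the patch */
		model_garbage_collect(net);
	return err;
}
```

Running the collection after the unlock keeps the potentially expensive work out of the bottom-half-disabled critical section, and the `cnt` check avoids a pointless pass when nothing was flushed.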


* (unknown), 
From: administrator @ 2017-04-28  9:09 UTC


Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; your mailbox is currently at 10.9 GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox to working order, send the following information below:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify your messages, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...635829wjxnxl....74990.RU.2017
Mail technical support ©2017

Thank you
System administrator


* RE: [PATCH net-next 1/4] ixgbe: sparc: rename the ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
From: Gabriele Paoloni @ 2017-04-28  9:12 UTC
  To: Casey Leedom, Bjorn Helgaas, Alexander Duyck
  Cc: Dingtianhong, Mark Rutland, Amir Ancel, linux-pci@vger.kernel.org,
	Catalin Marinas, Will Deacon, Linuxarm, David Laight,
	jeffrey.t.kirsher@intel.com, netdev@vger.kernel.org, Robin Murphy,
	davem@davemloft.net, linux-arm-kernel@lists.infradead.org
In-Reply-To: <MWHPR12MB1600CB0756EA24211C93E053C8100@MWHPR12MB1600.namprd12.prod.outlook.com>

Hi Casey

Many thanks for the detailed explanation

> -----Original Message-----
> From: Casey Leedom [mailto:leedom@chelsio.com]
> Sent: 27 April 2017 21:35
> To: Bjorn Helgaas; Alexander Duyck
> Cc: Dingtianhong; Mark Rutland; Amir Ancel; Gabriele Paoloni;
> linux-pci@vger.kernel.org; Catalin Marinas; Will Deacon; Linuxarm; David
> Laight; jeffrey.t.kirsher@intel.com; netdev@vger.kernel.org; Robin
> Murphy; davem@davemloft.net; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH net-next 1/4] ixgbe: sparc: rename the
> ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
> 
> | From: Bjorn Helgaas <helgaas@kernel.org>
> | Sent: Thursday, April 27, 2017 10:19 AM
> |
> | Are you hinting that the PCI core or arch code could actually *enable*
> | Relaxed Ordering without the driver doing anything?  Is it safe to do that?
> | Is there such a thing as a device that is capable of using RO, but where the
> | driver must be aware of it being enabled, so it programs the device
> | appropriately?
> 
>   I forgot to reply to this portion of Bjorn's email.
> 
>   The PCI Configuration Space PCI Capability Device Control[Enable Relaxed
> Ordering] bit governs enabling the _ability_ for the PCIe Device to send
> TLPs with the Relaxed Ordering Attribute set.  It does not _cause_ RO to be
> set on TLPs.  Doing that would almost certainly cause Data Corruption Bugs
> since you only want a subset of TLPs to have RO set.
> 
>   For instance, we typically use RO for Ingress Packet Data delivery but
> non-RO for messages notifying the Host that an Ingress Packet has been
> delivered.  This ensures that the "Ingress Packet Delivered" non-RO TLP is
> processed _after_ any preceding RO TLPs delivering the actual Ingress Packet
> Data.
> 
>   In the above scenario, if one were to turn off Enable Relaxed Ordering via
> the PCIe Capability, then the on-chip PCIe engine would simply never send a
> TLP with the Relaxed Ordering Attribute set, regardless of any other chip
> programming.
> 
>   And finally, just to be absolutely clear, using Relaxed Ordering isn't an
> "Architecture Thing".  It's a PCIe Fabric End Point Thing.  Many End Points
> simply ignore the Relaxed Ordering Attribute (except to reflect it back in
> Response TLPs).  In this sense, Relaxed Ordering simply provides
> potentially useful optimization information to the PCIe End Point.

I think your view matches what I found out about the current usage of the
"Enable Relaxed Ordering" bit in Linux mainline: looking at where and why
the other drivers set/clear the "Enable Relaxed Ordering" bit, they do not
look for any global symbol, nor do they look at the host architecture.

So with respect to this specific ixgbe driver, I guess the main questions are
why RO was disabled by default by Intel for this EP (commit 3d5c520727ce
mentions issues with "some chipsets"), and then why it is safe to re-enable
it on SPARC?

Thanks
Gab

> 
> Casey
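Casey's point, that the Device Control bit only gates what the endpoint may do while the device itself chooses RO per TLP, can be expressed as a small predicate. The value 0x0010 is bit 4 of Device Control (Linux's `PCI_EXP_DEVCTL_RELAX_EN`); `enum tlp_kind` and `tlp_sets_ro()` are hypothetical and model the endpoint behaviour described above, not any particular driver. In a kernel driver the register itself would typically be read with `pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &ctl)`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device Control "Enable Relaxed Ordering" bit (PCI_EXP_DEVCTL_RELAX_EN). */
#define DEVCTL_RELAX_EN 0x0010

/* Hypothetical TLP classes, following the ingress example above. */
enum tlp_kind { TLP_PACKET_DATA, TLP_DELIVERY_NOTIFY };

/* Would the endpoint set the RO attribute on this TLP?  The device only
 * wants RO for bulk packet data; the Device Control bit can veto it but
 * never forces it on. */
static bool tlp_sets_ro(uint16_t devctl, enum tlp_kind kind)
{
	bool device_wants_ro = (kind == TLP_PACKET_DATA);

	return device_wants_ro && (devctl & DEVCTL_RELAX_EN);
}
```

With the bit clear, no TLP carries RO regardless of kind, which matches the "simply never send" behaviour Casey describes.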


* Re: [PATCH net] esp: skip GRO for fragmented packets
From: Sabrina Dubroca @ 2017-04-28  9:04 UTC
  To: Steffen Klassert; +Cc: netdev, Herbert Xu
In-Reply-To: <20170427104334.GE2649@secunet.com>

2017-04-27, 12:43:35 +0200, Steffen Klassert wrote:
> On Thu, Apr 27, 2017 at 12:31:14PM +0200, Sabrina Dubroca wrote:
> > Currently, ESP4 GRO doesn't work for fragmented packets, so let's send
> > these through the normal path.
> > 
> > Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> > ---
> > Steffen, if you prefer to drop this patch and fix this properly,
> > that's okay for me. I can't look much deeper into this right now and
> > it's broken on current net/master.
> 
> I did a fix for this last week, but forgot to submit it.
> We can fix this in inet_gro_receive(), as no GRO handler
> can really handle fragmented packets.
> 
> I plan to fix it with this patch:

Yeah, that looks okay to me, thanks.
Let's make sure it ends up in 4.11 (or an early 4.11.x).

-- 
Sabrina
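Steffen's patch is not reproduced in this message. As a sketch of the kind of test such a fix needs, a GRO receive path can refuse to aggregate an IPv4 packet whenever it is a fragment; the check below mirrors the semantics of the kernel's `ip_is_fragment()`, with hypothetical `MODEL_` names standing in for `IP_MF` and `IP_OFFSET`.

```c
#include <arpa/inet.h>   /* htons */
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IP_MF     0x2000   /* "more fragments" flag */
#define MODEL_IP_OFFSET 0x1FFF   /* fragment offset, in 8-byte units */

/* A packet is a fragment if MF is set (first or middle fragment) or the
 * offset is non-zero (middle or last fragment). frag_off_be is the raw
 * header field, in network byte order. */
static bool model_ip_is_fragment(uint16_t frag_off_be)
{
	return (frag_off_be & htons(MODEL_IP_MF | MODEL_IP_OFFSET)) != 0;
}
```

The DF bit (0x4000) is deliberately outside the mask, so ordinary unfragmented traffic with DF set still goes through GRO.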


* Re: [PATCH v1 net-next 5/6] net: allow simultaneous SW and HW transmit timestamping
From: Miroslav Lichvar @ 2017-04-28  8:54 UTC
  To: Willem de Bruijn
  Cc: Network Development, Richard Cochran, Willem de Bruijn,
	Soheil Hassas Yeganeh, Keller, Jacob E, Denny Page, Jiri Benc
In-Reply-To: <CAF=yD-+GSK491AWQx8=6yd3=-HHwxdWq677ubwdjbV5AXzRbog@mail.gmail.com>

On Wed, Apr 26, 2017 at 08:00:02PM -0400, Willem de Bruijn wrote:
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 81ef53f..42bff22 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -3300,8 +3300,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
> >
> >  static inline void sw_tx_timestamp(struct sk_buff *skb)
> >  {
> > -       if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP &&
> > -           !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
> > +       if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
> >                 skb_tstamp_tx(skb, NULL);
> >  }

> > +++ b/net/core/skbuff.c
> > @@ -3874,6 +3874,10 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> >         if (!sk)
> >                 return;
> >
> > +       if (!hwtstamps && !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) &&
> > +           skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)
> > +               return;
> > +
> 
> This check should only happen for software transmit timestamps, so simpler to
> revise the check in sw_tx_timestamp above to
> 
>   if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP &&
> -        !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
> +      (!(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS) ||
> +       (skb->sk && skb->sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW)))

I'm not sure if this can work. sk_buff.h would need to include sock.h
in order to get the definition of struct sock. Any suggestions?

-- 
Miroslav Lichvar
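Independent of where the check ends up living (and of the sock.h dependency raised above), the intended condition can be written as a standalone predicate. The flag values below are hypothetical placeholders for `SKBTX_SW_TSTAMP`, `SKBTX_IN_PROGRESS` and `SOF_TIMESTAMPING_OPT_TX_SWHW`; the point of the sketch is the grouping of the `||`, which must sit inside the `&&`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this model only. */
#define M_SKBTX_SW_TSTAMP   (1u << 0)
#define M_SKBTX_IN_PROGRESS (1u << 1)
#define M_OPT_TX_SWHW       (1u << 2)

/* Generate a software TX timestamp only if one was requested AND either
 * no hardware timestamp is pending or the socket asked for both.
 * Written as "A && B || C", the expression would parse as
 * "(A && B) || C" and timestamp packets that never requested it. */
static bool want_sw_tx_timestamp(uint8_t tx_flags, bool has_sk,
				 uint32_t sk_tsflags)
{
	return (tx_flags & M_SKBTX_SW_TSTAMP) &&
	       (!(tx_flags & M_SKBTX_IN_PROGRESS) ||
		(has_sk && (sk_tsflags & M_OPT_TX_SWHW)));
}
```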


* Re: [PATCH net-next 1/4] ixgbe: sparc: rename the ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
From: Lucas Stach @ 2017-04-28  8:51 UTC
  To: Bjorn Helgaas
  Cc: Alexander Duyck, Ding Tianhong, Mark Rutland, Amir Ancel,
	Gabriele Paoloni, linux-pci@vger.kernel.org, Catalin Marinas,
	Will Deacon, LinuxArm, David Laight, jeffrey.t.kirsher@intel.com,
	netdev@vger.kernel.org, Robin Murphy, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, Casey Leedom
In-Reply-To: <20170427171938.GA10705@bhelgaas-glaptop.roam.corp.google.com>

On Thursday, 27.04.2017, at 12:19 -0500, Bjorn Helgaas wrote:
> [+cc Casey]
> 
> On Wed, Apr 26, 2017 at 09:18:33AM -0700, Alexander Duyck wrote:
> > On Wed, Apr 26, 2017 at 2:26 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> > > Hi Amir:
> > >
> > > I am really glad to hear that mlx5 will support RO mode this year. If so, do you agree that it could be enabled dynamically with ethtool -s xxx?
> > > We tried this several months ago, but at that time only one driver would use it, so the maintainer was against it. If mlx5 supports RO,
> > > we could try to restart this solution. What do you think? :)
> > >
> > > Thanks
> > > Ding
> > 
> > Hi Ding,
> > 
> > Enabling relaxed ordering really doesn't have any place in ethtool. It
> > is a PCIe attribute that you are essentially wanting to enable.
> > 
> > It might be worthwhile to take a look at updating the PCIe code path
> > to handle this. Really what we should probably do is guarantee that
> > the architectures that need relaxed ordering are setting it in the
> > PCIe Device Control register and that the ones that don't are clearing
> > the bit. It's possible that this is already occurring, but I don't
> > know what state the handling of those bits is in within the kernel. Once
> > we can guarantee that, we could use it to have the drivers determine their
> > behavior in regards to relaxed ordering. For example, in the case of
> > igb/ixgbe we could probably change the behavior so that it will by
> > default try to use relaxed ordering, but if it is not enabled in the PCIe
> > Device Control register the hardware should not request to use it. It
> > would simplify things in the drivers and allow each architecture
> > to control things as needed in its PCIe code.
> 
> I thought Relaxed Ordering was an optimization.  Are there cases where
> it is actually required for correct behavior?

Yes, at least the Tegra 2 TRM claims that RO needs to be enabled on the
device side for correct operation with the following language:

"Tegra 2 requires relaxed ordering for responses to downstream requests
(responses can pass writes). It is possible in some circumstances for
PCIe transfers from an external bus masters (i.e. upstream transfers) to
become blocked by a downstream read or non-posted write. The responses
to these downstream requests are blocked by upstream posted writes only
when PCIe strict ordering is imposed. It is therefore necessary to never
impose strict ordering that would block a response to a downstream
NPW/read request and always set the relaxed ordering bit to 1. Only
devices that are capable of relaxed ordering may be used with Tegra 2
devices."

Regards,
Lucas


* Re: Network cooling device and how to control NIC speed on thermal condition
From: Waldemar Rymarkiewicz @ 2017-04-28  8:04 UTC
  To: Alan Cox, Andrew Lunn, Florian Fainelli; +Cc: netdev, linux-kernel
In-Reply-To: <20170425144501.0cfe27a5@lxorguk.ukuu.org.uk>

On 25 April 2017 at 15:45, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote:
>> I am looking on Linux thermal framework and on how to cool down the
>> system effectively when it hits thermal condition. Already existing
>> cooling methods cpu_cooling and clock_cooling are good. However, I
>> wanted to go further and dynamically control also a switch ports'
>> speed based on thermal condition. Lowering speed means less power,
>> less power means lower temp.
>>
>> Is there any in-kernel interface to configure switch port/NIC from other driver?
>
> No but you can always hook that kind of functionality to the thermal
> daemon. However I'd be careful with your assumptions. Lower speed also
> means more time active.
>
> https://github.com/01org/thermal_daemon

This is indeed one option, and I will consider it as well. However, I
would prefer a generic solution in the kernel (configurable, of course),
as every network device can generate more heat at a higher link speed.

> For example if you run a big encoding job on an atom instead of an Intel
> i7, the atom will often not only take way longer but actually use more
> total power than the i7 did.
>
> Thus it would often be far more efficient to time synchronize your
> systems, batch up data on the collecting end, have the processing node
> wake up on an alarm, collect data from the other node and then actually
> go back into suspend.

Yes, that's true under normal thermal conditions. However, if the
platform reaches the maximum temperature trip point, we don't really care
about performance and time efficiency; we just try to avoid the critical
trip and a system shutdown by cooling the system, e.g. by lowering the CPU
frequency, limiting the USB PHY speed, or lowering the network link speed.

I did a quick test to show you what I am about.

I sample the SoC temperature every few seconds. Meanwhile, I use
ethtool -s ethX speed <speed> to change the link speed and see how it
affects the SoC temperature. My 4 PHYs and the switch are integrated into
the SoC, and I always change the link speed for all PHYs; there is no
traffic on the links during this test. Starting at 1 Gb/s, then scaling
down to 100 Mb/s and then to 10 Mb/s, I see a significant drop of ~10 °C
while the links are set to 10 Mb/s.

So, throttling link speed can really help to dissipate heat
significantly when the platform is under threat.

Renegotiating the link speed has a cost, I agree, and it also affects the
user experience, but I believe such a thermal condition will not occur
often.

/Waldek
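A minimal sketch of the state mapping such a network cooling device might use, assuming the thermal framework's convention that cooling state 0 means no throttling and higher states mean more cooling. The speed steps follow the measurement above; `model_speed_steps` and `model_speed_for_state()` are hypothetical, and actually applying a speed (the equivalent of `ethtool -s ethX speed N`) is left out.

```c
/* Hypothetical speed ladder in Mb/s: state 0 = full speed, each higher
 * cooling state steps the link down one notch. */
static const int model_speed_steps[] = { 1000, 100, 10 };

#define MODEL_MAX_STATE \
	(sizeof(model_speed_steps) / sizeof(model_speed_steps[0]) - 1)

/* Returns the link speed for a cooling state; states beyond the ladder
 * are clamped to the slowest step. */
static int model_speed_for_state(unsigned long state)
{
	if (state > MODEL_MAX_STATE)
		state = MODEL_MAX_STATE;
	return model_speed_steps[state];
}
```

In a real cooling device, this table would back the `get_max_state`/`set_cur_state` callbacks of `struct thermal_cooling_device_ops`; each step down costs a link renegotiation, as noted above.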


* Re: ipsec doesn't route TCP with 4.11 kernel
From: Steffen Klassert @ 2017-04-28  7:13 UTC
  To: Don Bowman
  Cc: Cong Wang, linux-kernel@vger.kernel.org, Herbert Xu,
	Linux Kernel Network Developers
In-Reply-To: <CADJev7_=YEHmijGweqZvdATMQVuzwywEbBKweYvPurJfTEQRjQ@mail.gmail.com>

On Thu, Apr 27, 2017 at 06:13:38PM -0400, Don Bowman wrote:
> On 27 April 2017 at 04:42, Steffen Klassert <steffen.klassert@secunet.com> wrote:
> > On Wed, Apr 26, 2017 at 10:01:34PM -0700, Cong Wang wrote:
> >> (Cc'ing netdev and IPSec maintainers)
> >>
> >> On Tue, Apr 25, 2017 at 6:08 PM, Don Bowman <db@donbowman.ca> wrote:
> 
> for 'esp' question, i have ' esp = aes256-sha256-modp1536!' is that what
> you mean?
> its nat-aware tunnel [from my desktop pc to my office]
> 
> root@office:~# ip -s x s
> src 172.16.0.8 dst 64.7.137.180
>         proto esp spi 0x0d588366(223904614) reqid 1(0x00000001) mode tunnel
>         replay-window 0 seq 0x00000000 flag af-unspec (0x00100000)
>         auth-trunc hmac(sha256) 0x046cafdf19c5d78d1c29165d96a0b9fce1c500029d77be0fe956dce1bf80a86a (256 bits) 128
>         enc cbc(aes) 0x79ff2fbc2178eb468de6ff16612f0603b514a1d1d5f375c67222294463ec7c62 (256 bits)
>         encap type espinudp sport 4500 dport 4500 addr 0.0.0.0

Ok, this is espinudp. This information was important.

> 
> I'm not sure what you mean the receiving interface, you mean the outer, the
> native interface?
> listening on eno1, link-type EN10MB (Ethernet), capture size 262144 bytes
> 18:11:32.061501 IP 172.16.0.8.3416 > 64.7.137.180.33638: truncated-udplength 0
> 18:11:32.788091 IP 64.7.137.180.4500 > 172.16.0.8.4500: NONESP-encap: isakmp: child_sa  inf2
> 18:11:32.788354 IP 172.16.0.8.4500 > 64.7.137.180.4500: NONESP-encap: isakmp: child_sa  inf2[IR]
> 18:11:33.066830 IP 172.16.0.8.3416 > 64.7.137.180.33638: truncated-udplength 0
> 18:11:35.082839 IP 172.16.0.8.3416 > 64.7.137.180.33638: truncated-udplength 0

This is not a GRO issue as I thought; the TX side is already broken.

Could you please try the patch below?

Subject: [PATCH] esp4: Fix udpencap for local TCP packets.

Locally generated TCP packets are usually cloned, so we
do skb_cow_data() on these packets. After that we need to
reload the pointer to the esp header. On udpencap this
header has an offset to skb_transport_header, so take this
offset into account.

Fixes: commit cac2661c53f ("esp4: Avoid skb_cow_data whenever possible")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/esp4.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index b1e2444..ab71fbb 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -223,6 +223,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	int extralen;
 	int tailen;
 	__be64 seqno;
+	int esp_offset = 0;
 	__u8 proto = *skb_mac_header(skb);
 
 	/* skb is pure payload to encrypt */
@@ -288,6 +289,8 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 			break;
 		}
 
+		esp_offset = (unsigned char *)esph - (unsigned char *)uh;
+
 		*skb_mac_header(skb) = IPPROTO_UDP;
 	}
 
@@ -397,7 +400,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 		goto error;
 	nfrags = err;
 	tail = skb_tail_pointer(trailer);
-	esph = ip_esp_hdr(skb);
+	esph = (struct ip_esp_hdr *)(skb_transport_header(skb) + esp_offset);
 
 skip_cow:
 	esp_output_fill_trailer(tail, tfclen, plen, proto);
-- 
2.7.4
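Why the ESP header pointer must be reloaded through an offset can be shown with a userspace model. It assumes, as in `esp_output()`, that with UDP encapsulation the ESP header sits 8 bytes (`UDP_ENCAP_ESPINUDP`) or 16 bytes (`UDP_ENCAP_ESPINUDP_NON_IKE`) past the transport header, and that `skb_cow_data()` may move the data to a new buffer. `struct model_buf`, `model_cow()` and `model_reload_spi()` are hypothetical. Reloading with `ip_esp_hdr(skb)`, which is the transport header itself (offset 0), would point at the UDP header instead.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a heap buffer plus the offset of the
 * transport (UDP) header within it. */
struct model_buf {
	unsigned char *data;
	size_t len;
	size_t transport_off;
};

static unsigned char *model_transport_header(const struct model_buf *b)
{
	return b->data + b->transport_off;
}

/* Simulates a copy-on-write that moves the packet to a new allocation,
 * which invalidates every pointer into the old buffer. */
static int model_cow(struct model_buf *b)
{
	unsigned char *copy = malloc(b->len);

	if (!copy)
		return -1;
	memcpy(copy, b->data, b->len);
	free(b->data);
	b->data = copy;
	return 0;
}

/* Writes an SPI at uh + esp_offset, reallocates, and reads it back through
 * the reloaded pointer. Returns the SPI read, or 0 on allocation failure. */
static uint32_t model_reload_spi(size_t esp_offset, uint32_t spi)
{
	struct model_buf b = { .len = 64, .transport_off = 20 };
	unsigned char *esph;
	uint32_t out = 0;

	b.data = calloc(1, b.len);
	if (!b.data)
		return 0;
	esph = model_transport_header(&b) + esp_offset;  /* after UDP (+ markers) */
	memcpy(esph, &spi, sizeof(spi));                 /* SPI is the first ESP field */

	if (model_cow(&b) == 0) {
		/* the old esph now dangles; re-derive it as the patch does */
		esph = model_transport_header(&b) + esp_offset;
		memcpy(&out, esph, sizeof(out));
	}
	free(b.data);
	return out;
}
```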


* Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jiri Pirko @ 2017-04-28  7:02 UTC
  To: Jamal Hadi Salim
  Cc: davem, xiyou.wangcong, eric.dumazet, netdev, Simon Horman,
	Benjamin LaHaise
In-Reply-To: <c1680135-8c9f-14d0-c65e-7a5bb7d8d661@mojatatu.com>

Fri, Apr 28, 2017 at 03:22:53AM CEST, jhs@mojatatu.com wrote:
>On 17-04-27 02:30 AM, Jiri Pirko wrote:
>> Wed, Apr 26, 2017 at 10:07:08PM CEST, jhs@mojatatu.com wrote:
>> > On 17-04-26 09:56 AM, Jiri Pirko wrote:
>> > > Wed, Apr 26, 2017 at 03:14:38PM CEST, jhs@mojatatu.com wrote:
>
>> > I think to have flags at that level is useful but it
>> > is a different hierarchy level. I am not sure the
>> > "actions dump large messages" is a fit for that level.
>> 
>> Jamal, the idea is to have exactly what you want to have. Only does not
>> use NLA_U32 attr for that but a special attr NLA_FLAGS which would have
>> well defined semantics and set of helpers to work with and enforce it.
>> 
>> Then, this could be easily reused in other subsystem that uses netlink
>> 
>
>Maybe I am misunderstanding:
>Recall, this is what it looks like with this patchset:
><nlh><subsystem-header>[TCA_ROOT_XXXX]
>
>TCA_ROOT_XXX is very subsystem specific. classifiers, qdiscs and many
>subsystems defined their own semantics for that TLV level. This specific
>"dump max" is very very specific to actions. They were crippled by the
>fact you could only send 32 at a time - this allows more to be sent.
>
>I thought initially you meant:
><nlh>[NLA_XXX]<subsystem-header>[TCA_ROOT_XXXX]
>
>I think at the NLA_XXX you could fit netlink wide TLVs - but if i said
>"do a large dump" it is of no use to any other subsystem.

Okay, I'm sorry, I had a couple of beers yesterday, so that might be
why your message leaves me totally confused :O

All I am suggesting is to replace the NLA_U32 flags you want, which do not
have any semantics, with NLA_FLAGS flags, which would eventually carry
the exact same u32, but with predefined semantics, helpers, everything.
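NLA_FLAGS is only a proposal in this thread, so the following is a hypothetical sketch of what "predefined semantics" could mean in practice: the attribute declares which bits it understands, and one shared helper rejects everything else instead of each user hand-checking a bare u32. `struct model_flags_policy`, `model_flags_validate()` and `MODEL_ACT_FLAG_LARGE_DUMP` are invented names.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-attribute policy: the set of bits the receiver knows. */
struct model_flags_policy {
	uint32_t valid_mask;
};

/* Shared validation helper: unknown bits are refused rather than silently
 * ignored, so new flags can be added later without ambiguity. */
static int model_flags_validate(uint32_t value,
				const struct model_flags_policy *pol)
{
	if (value & ~pol->valid_mask)
		return -EINVAL;
	return 0;
}

/* e.g. "dump more than TCA_ACT_MAX_PRIO actions per batch" */
#define MODEL_ACT_FLAG_LARGE_DUMP (1u << 0)
```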


* Re: [patch net-next 00/10] net: sched: introduce multichain support for filters
From: Jiri Pirko @ 2017-04-28  6:53 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	David Ahern, Eric Dumazet, Stephen Hemminger, Daniel Borkmann,
	Alexander Duyck, mlxsw, Simon Horman
In-Reply-To: <CAM_iQpXUs1301-OR1_bsO-0bU8sSBVo2BWRHy4qNhzdHxvJWaQ@mail.gmail.com>

Thu, Apr 27, 2017 at 07:46:03PM CEST, xiyou.wangcong@gmail.com wrote:
>On Thu, Apr 27, 2017 at 4:12 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Simple example:
>> $ tc qdisc add dev eth0 ingress
>> $ tc filter add dev eth0 parent ffff: protocol ip pref 33 flower dst_mac 52:54:00:3d:c7:6d action goto chain 11
>> $ tc filter add dev eth0 parent ffff: protocol ip pref 22 chain 11 flower dst_ip 192.168.40.1 action drop
>> $ tc filter show dev eth0 root
>
>Interesting.
>
>I don't look into the code yet. If I understand the concepts correctly,
>so with your patchset we can mark either filter with a chain No. to
>choose which chain it belongs to _logically_ even though
>_physically_ it is still in the old-fashion chain (prio, proto)?

You have to see the code :)

There are physically multiple chains.


>
>If so, you have to ensure proto is same since the protocol of
>the packet does not change dynamically? And the original
>priority becomes pointless with chains since we can just to
>any other chain in any order?
>
>By default, without any chain No., you use 0 for all the chains,
>so the old-fashion chain still works.

Yes.
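Because chains physically exist and are addressed by index, with chain 0 as the default, a block needs a lookup-or-create operation with reference counting. The gact patch in this series relies on this through `tcf_chain_get()` and `tcf_chain_put()`. The following is a simplified single-threaded user-space model. `struct chain`, `struct block`, `chain_get()` and `chain_put()` are illustrative names. The kernel versions in `cls_api.c` may differ in locking, RCU and how a chain is torn down.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative model of a filter chain inside a block. */
struct chain {
	uint32_t index;
	unsigned int refcnt;
	struct chain *next;
};

struct block {
	struct chain *chains; /* singly linked list of live chains */
};

/* Look up chain @index in @b, creating it on first use. Returns NULL
 * only on allocation failure (cf. the -ENOMEM path in tcf_gact_init()). */
static struct chain *chain_get(struct block *b, uint32_t index)
{
	struct chain *c;

	for (c = b->chains; c; c = c->next) {
		if (c->index == index) {
			c->refcnt++;
			return c;
		}
	}
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->index = index;
	c->refcnt = 1;
	c->next = b->chains;
	b->chains = c;
	return c;
}

/* Drop a reference; unlink and free the chain when the last one goes. */
static void chain_put(struct block *b, struct chain *c)
{
	struct chain **pp;

	if (--c->refcnt)
		return;
	for (pp = &b->chains; *pp; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			break;
		}
	}
	free(c);
}
```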

>
>Thanks.


* Re: [patch net-next 10/10] net: sched: extend gact to allow jumping to another filter chain
From: Jiri Pirko @ 2017-04-28  6:52 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, xiyou.wangcong, dsa, edumazet, stephen, daniel,
	alexander.h.duyck, mlxsw, simon.horman
In-Reply-To: <bedf1246-4900-ebba-cf78-936989029db9@mojatatu.com>

Fri, Apr 28, 2017 at 03:41:08AM CEST, jhs@mojatatu.com wrote:
>
>Jiri,
>
>Good stuff!
>Thanks for the effort.
>
>I didnt review the details - will do. I wanted to raise one issue.
>This should work for all actions, not just gact (refer to the
>recent commit i made on the action jumping).
>
>Example policy for policer:
>
>#if packets destined for mac address 52:54:00:3d:c7:6d
>#exceed 90kbps with burst of 90K then jump to chain 11
>#for further classification, otherwise set their skb mark to 11
># and proceed.
>
>tc filter add dev eth0 parent ffff: protocol ip pref 33 \
>flower dst_mac 52:54:00:3d:c7:6d \
>action police rate 1kbit burst 90k conform-exceed pipe/goto chain 11 \
>action skbedit mark 11
>
>But i should also be able to do this for any other action, etc.
>
>For this to work, you have to be able to encode the action in the
>opcode. Something like (for 2^16 chains):
>
>#define TC_ACT_GOTO_CHAIN	0x20000000
>#define TCA_ACT_MAX_CHAIN_MASK 0xFFFF
>
>So 0x20000001 is encoding of chain 1 etc.
>
>I will post the iproute2 code i used for jumping of actions.

You can have multiple actions in the list, with gact goto as the last
one. Why do this ugliness?
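Jamal's alternative, quoted above, folds the chain index into the action opcode itself, so that any action could jump. A minimal sketch of that encoding follows, using the two macro values from his mail. Those values differ from the posted patch, which defines `TC_ACT_GOTO_CHAIN` as 8 and carries the index separately in `TCA_GACT_CHAIN`. The helper names and the range check are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as proposed in the quoted mail, not those of the posted patch. */
#define TC_ACT_GOTO_CHAIN	0x20000000
#define TCA_ACT_MAX_CHAIN_MASK	0xFFFF

/* Hypothetical helper: pack a chain index into an action opcode.
 * Fails if the index does not fit in the 16-bit field. */
static bool goto_chain_encode(uint32_t chain, int *opcode)
{
	if (chain > TCA_ACT_MAX_CHAIN_MASK)
		return false;
	*opcode = TC_ACT_GOTO_CHAIN | (int)chain;
	return true;
}

/* True if the high bits of @opcode say "goto chain". */
static bool is_goto_chain(int opcode)
{
	return (opcode & ~TCA_ACT_MAX_CHAIN_MASK) == TC_ACT_GOTO_CHAIN;
}

/* Extract the chain index from the low 16 bits. */
static uint32_t goto_chain_index(int opcode)
{
	return (uint32_t)opcode & TCA_ACT_MAX_CHAIN_MASK;
}
```

The trade-off under discussion is visible here. The encoding limits the index to 2^16 chains and overloads the opcode space. The gact approach keeps opcodes small but makes "goto" a separate action at the end of the list.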


>
>cheers,
>jamal
>
>On 17-04-27 07:12 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Introduce new type of gact action called "goto_chain". This allows
>> user to specify a chain to be processed. This action type is
>> then processed as a return value in tcf_classify loop in similar
>> way as "reclassify" is, only it does not reset to the first filter
>> in chain but rather reset to the first filter of the desired chain.
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/sch_generic.h           |  9 +++++--
>>  include/net/tc_act/tc_gact.h        |  2 ++
>>  include/uapi/linux/pkt_cls.h        |  1 +
>>  include/uapi/linux/tc_act/tc_gact.h |  1 +
>>  net/sched/act_gact.c                | 48 ++++++++++++++++++++++++++++++++++++-
>>  net/sched/cls_api.c                 |  8 +++++--
>>  6 files changed, 64 insertions(+), 5 deletions(-)
>> 
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 569b565..3688501 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -193,8 +193,13 @@ struct Qdisc_ops {
>> 
>> 
>>  struct tcf_result {
>> -	unsigned long	class;
>> -	u32		classid;
>> +	union {
>> +		struct {
>> +			unsigned long	class;
>> +			u32		classid;
>> +		};
>> +		const struct tcf_proto *goto_tp;
>> +	};
>>  };
>> 
>>  struct tcf_proto_ops {
>> diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
>> index b6f1739..58bee54 100644
>> --- a/include/net/tc_act/tc_gact.h
>> +++ b/include/net/tc_act/tc_gact.h
>> @@ -12,6 +12,8 @@ struct tcf_gact {
>>  	int			tcfg_paction;
>>  	atomic_t		packets;
>>  #endif
>> +	struct tcf_chain	*goto_chain;
>> +	struct rcu_head		rcu;
>>  };
>>  #define to_gact(a) ((struct tcf_gact *)a)
>> 
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index f1129e3..e03ba27 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -37,6 +37,7 @@ enum {
>>  #define TC_ACT_QUEUED		5
>>  #define TC_ACT_REPEAT		6
>>  #define TC_ACT_REDIRECT		7
>> +#define TC_ACT_GOTO_CHAIN	8
>>  #define TC_ACT_JUMP		0x10000000
>> 
>>  /* Action type identifiers*/
>> diff --git a/include/uapi/linux/tc_act/tc_gact.h b/include/uapi/linux/tc_act/tc_gact.h
>> index 70b536a..388733d 100644
>> --- a/include/uapi/linux/tc_act/tc_gact.h
>> +++ b/include/uapi/linux/tc_act/tc_gact.h
>> @@ -26,6 +26,7 @@ enum {
>>  	TCA_GACT_PARMS,
>>  	TCA_GACT_PROB,
>>  	TCA_GACT_PAD,
>> +	TCA_GACT_CHAIN,
>>  	__TCA_GACT_MAX
>>  };
>>  #define TCA_GACT_MAX (__TCA_GACT_MAX - 1)
>> diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
>> index c527c11..d63aebd 100644
>> --- a/net/sched/act_gact.c
>> +++ b/net/sched/act_gact.c
>> @@ -20,6 +20,7 @@
>>  #include <linux/init.h>
>>  #include <net/netlink.h>
>>  #include <net/pkt_sched.h>
>> +#include <net/pkt_cls.h>
>>  #include <linux/tc_act/tc_gact.h>
>>  #include <net/tc_act/tc_gact.h>
>> 
>> @@ -54,6 +55,7 @@ static g_rand gact_rand[MAX_RAND] = { NULL, gact_net_rand, gact_determ };
>>  static const struct nla_policy gact_policy[TCA_GACT_MAX + 1] = {
>>  	[TCA_GACT_PARMS]	= { .len = sizeof(struct tc_gact) },
>>  	[TCA_GACT_PROB]		= { .len = sizeof(struct tc_gact_p) },
>> +	[TCA_GACT_CHAIN]	= { .type = NLA_U32 },
>>  };
>> 
>>  static int tcf_gact_init(struct net *net, struct tcf_proto *tp,
>> @@ -92,6 +94,9 @@ static int tcf_gact_init(struct net *net, struct tcf_proto *tp,
>>  	}
>>  #endif
>> 
>> +	if (parm->action == TC_ACT_GOTO_CHAIN && !tb[TCA_GACT_CHAIN])
>> +		return -EINVAL;
>> +
>>  	if (!tcf_hash_check(tn, parm->index, a, bind)) {
>>  		ret = tcf_hash_create(tn, parm->index, est, a,
>>  				      &act_gact_ops, bind, true);
>> @@ -121,11 +126,43 @@ static int tcf_gact_init(struct net *net, struct tcf_proto *tp,
>>  		gact->tcfg_ptype   = p_parm->ptype;
>>  	}
>>  #endif
>> +
>> +	if (gact->tcf_action == TC_ACT_GOTO_CHAIN) {
>> +		u32 chain_index = nla_get_u32(tb[TCA_GACT_CHAIN]);
>> +
>> +		if (!tp) {
>> +			if (ret == ACT_P_CREATED)
>> +				tcf_hash_release(*a, bind);
>> +			return -EINVAL;
>> +		}
>> +		gact->goto_chain = tcf_chain_get(tp->chain->block, chain_index);
>> +		if (!gact->goto_chain) {
>> +			if (ret == ACT_P_CREATED)
>> +				tcf_hash_release(*a, bind);
>> +			return -ENOMEM;
>> +		}
>> +	}
>> +
>>  	if (ret == ACT_P_CREATED)
>>  		tcf_hash_insert(tn, *a);
>>  	return ret;
>>  }
>> 
>> +static void tcf_gact_cleanup_rcu(struct rcu_head *rcu)
>> +{
>> +	struct tcf_gact *gact = container_of(rcu, struct tcf_gact, rcu);
>> +
>> +	if (gact->tcf_action == TC_ACT_GOTO_CHAIN)
>> +		tcf_chain_put(gact->goto_chain);
>> +}
>> +
>> +static void tcf_gact_cleanup(struct tc_action *a, int bind)
>> +{
>> +	struct tcf_gact *gact = to_gact(a);
>> +
>> +	call_rcu(&gact->rcu, tcf_gact_cleanup_rcu);
>> +}
>> +
>>  static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
>>  		    struct tcf_result *res)
>>  {
>> @@ -141,8 +178,13 @@ static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
>>  	}
>>  #endif
>>  	bstats_cpu_update(this_cpu_ptr(gact->common.cpu_bstats), skb);
>> -	if (action == TC_ACT_SHOT)
>> +	if (action == TC_ACT_SHOT) {
>>  		qstats_drop_inc(this_cpu_ptr(gact->common.cpu_qstats));
>> +	} else if (action == TC_ACT_GOTO_CHAIN) {
>> +		struct tcf_chain *chain = gact->goto_chain;
>> +
>> +		res->goto_tp = rcu_dereference_bh(chain->filter_chain);
>> +	}
>> 
>>  	tcf_lastuse_update(&gact->tcf_tm);
>> 
>> @@ -194,6 +236,9 @@ static int tcf_gact_dump(struct sk_buff *skb, struct tc_action *a,
>>  	tcf_tm_dump(&t, &gact->tcf_tm);
>>  	if (nla_put_64bit(skb, TCA_GACT_TM, sizeof(t), &t, TCA_GACT_PAD))
>>  		goto nla_put_failure;
>> +	if (gact->tcf_action == TC_ACT_GOTO_CHAIN &&
>> +	    nla_put_u32(skb, TCA_GACT_CHAIN, gact->goto_chain->index))
>> +		goto nla_put_failure;
>>  	return skb->len;
>> 
>>  nla_put_failure:
>> @@ -225,6 +270,7 @@ static struct tc_action_ops act_gact_ops = {
>>  	.stats_update	=	tcf_gact_stats_update,
>>  	.dump		=	tcf_gact_dump,
>>  	.init		=	tcf_gact_init,
>> +	.cleanup	=	tcf_gact_cleanup,
>>  	.walk		=	tcf_gact_walker,
>>  	.lookup		=	tcf_gact_search,
>>  	.size		=	sizeof(struct tcf_gact),
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index dbc1348..a2d6bc7 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -304,10 +304,14 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>>  			continue;
>> 
>>  		err = tp->classify(skb, tp, res);
>> -		if (unlikely(err == TC_ACT_RECLASSIFY && !compat_mode))
>> +		if (err == TC_ACT_RECLASSIFY && !compat_mode) {
>>  			goto reset;
>> -		if (err >= 0)
>> +		} else if (err == TC_ACT_GOTO_CHAIN) {
>> +			old_tp = res->goto_tp;
>> +			goto reset;
>> +		} else if (err >= 0) {
>>  			return err;
>> +		}
>>  	}
>> 
>>  	return TC_ACT_UNSPEC; /* signal: continue lookup */
>> 
>
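The control flow of the `tcf_classify()` hunk can be modelled in user space as follows. A "goto chain" verdict does not return. Instead, it restarts the walk at the head of the target chain, which the action left in the result (like `res->goto_tp`). In this sketch, filters are plain structs with a callback standing in for `tp->classify()`. The verdict values mirror the uapi numbers in the patch. `MAX_RESTARTS` is a hypothetical guard against chains that jump in a loop, and `TC_ACT_RECLASSIFY` handling is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Verdicts used by this sketch; values mirror the uapi numbers in the patch. */
#define ACT_UNSPEC	(-1)
#define ACT_SHOT	2
#define ACT_GOTO_CHAIN	8

#define MAX_RESTARTS	4 /* hypothetical loop guard */

struct filter;

struct result {
	const struct filter *goto_tp; /* first filter of the chain to jump to */
};

struct filter {
	const struct filter *next; /* next filter in the same chain */
	int (*classify)(const struct filter *f, int pkt, struct result *res);
	const struct filter *target; /* jump target for "goto" filters */
	int match;                   /* packet value this filter matches */
};

static int classify(const struct filter *tp, int pkt, struct result *res)
{
	int restarts = 0;

reset:
	for (; tp; tp = tp->next) {
		int err = tp->classify(tp, pkt, res);

		if (err == ACT_GOTO_CHAIN) {
			if (++restarts > MAX_RESTARTS)
				return ACT_SHOT;
			tp = res->goto_tp; /* restart at the target chain's head */
			goto reset;
		}
		if (err >= 0)
			return err;
	}
	return ACT_UNSPEC; /* nothing matched: continue lookup */
}

/* Example filter: on match, jump to f->target ("action goto chain N"). */
static int match_then_goto(const struct filter *f, int pkt, struct result *res)
{
	if (pkt != f->match)
		return ACT_UNSPEC;
	res->goto_tp = f->target;
	return ACT_GOTO_CHAIN;
}

/* Example filter: on match, drop ("action drop"). */
static int match_then_drop(const struct filter *f, int pkt, struct result *res)
{
	(void)res;
	return pkt == f->match ? ACT_SHOT : ACT_UNSPEC;
}
```

This mirrors the cover-letter example: a filter in chain 0 jumps to chain 11, and a filter there drops the packet.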


* Re: [PATCH v2 07/21] crypto: shash, caam: Make use of the new sg_map helper function
From: Herbert Xu @ 2017-04-28  6:30 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b, James E.J. Bottomley,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sumit Semwal,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w, Jens Axboe,
	Martin K. Petersen, netdev-u79uwXL29TY76Z2rM5mHXA, Matthew Wilcox,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman,
	David S. Miller
In-Reply-To: <94123cbf-3287-f05e-7267-0bcf08ab0a8b-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>

On Thu, Apr 27, 2017 at 09:45:57AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 26/04/17 09:56 PM, Herbert Xu wrote:
> > On Tue, Apr 25, 2017 at 12:20:54PM -0600, Logan Gunthorpe wrote:
> >> Very straightforward conversion to the new function in the caam driver
> >> and shash library.
> >>
> >> Signed-off-by: Logan Gunthorpe <logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
> >> Cc: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
> >> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> >> ---
> >>  crypto/shash.c                | 9 ++++++---
> >>  drivers/crypto/caam/caamalg.c | 8 +++-----
> >>  2 files changed, 9 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/crypto/shash.c b/crypto/shash.c
> >> index 5e31c8d..5914881 100644
> >> --- a/crypto/shash.c
> >> +++ b/crypto/shash.c
> >> @@ -283,10 +283,13 @@ int shash_ahash_digest(struct ahash_request *req, struct shash_desc *desc)
> >>  	if (nbytes < min(sg->length, ((unsigned int)(PAGE_SIZE)) - offset)) {
> >>  		void *data;
> >>  
> >> -		data = kmap_atomic(sg_page(sg));
> >> -		err = crypto_shash_digest(desc, data + offset, nbytes,
> >> +		data = sg_map(sg, 0, SG_KMAP_ATOMIC);
> >> +		if (IS_ERR(data))
> >> +			return PTR_ERR(data);
> >> +
> >> +		err = crypto_shash_digest(desc, data, nbytes,
> >>  					  req->result);
> >> -		kunmap_atomic(data);
> >> +		sg_unmap(sg, data, 0, SG_KMAP_ATOMIC);
> >>  		crypto_yield(desc->flags);
> >>  	} else
> >>  		err = crypto_shash_init(desc) ?:
> > 
> > Nack.  This is an optimisation for the special case of a single
> > SG list entry.  In fact in the common case the kmap_atomic should
> > disappear altogether in the no-highmem case.  So replacing it
> > with sg_map is not acceptable.
> 
> What you seem to have missed is that sg_map is just a thin wrapper
> around kmap_atomic. Perhaps with a future check for a mappable page.
> This change should have zero impact on performance.

You are right.  Indeed, the existing code looks buggy, as it
doesn't take sg->offset into account when doing the kmap.  Could
you send me some patches that fix these problems first, so that
they can be easily backported?
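One reading of the bug: mapping `sg_page(sg)` is only correct when `sg->offset` lies within the first page. In addition, when `offset` exceeds `PAGE_SIZE`, the unsigned subtraction `PAGE_SIZE - offset` in the quoted fast-path condition wraps around to a huge value. Assuming that is the problem meant, a common remedy is to advance to the page containing the offset and reduce the offset modulo the page size before mapping. The following is a user-space sketch of that arithmetic only. The 4 KiB page size and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE  4096u /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SHIFT 12

struct sg_pos {
	size_t page_idx;      /* pages to advance past sg_page(sg) */
	unsigned int in_page; /* offset within that page */
};

/* Split an sg->offset that may exceed one page into page index and
 * in-page offset. */
static struct sg_pos sg_normalize(unsigned int offset)
{
	struct sg_pos p = {
		.page_idx = offset >> SKETCH_PAGE_SHIFT,
		.in_page  = offset & (SKETCH_PAGE_SIZE - 1),
	};
	return p;
}

/* Fast-path test using the same strict comparison as the quoted code,
 * but on the normalized offset, so "room" can never wrap around. */
static bool fits_in_one_page(unsigned int offset, unsigned int length,
			     unsigned int nbytes)
{
	unsigned int in_page = sg_normalize(offset).in_page;
	unsigned int room = SKETCH_PAGE_SIZE - in_page;

	return nbytes < (length < room ? length : room);
}
```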

Thanks,
-- 
Email: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* [PATCH net-next v5 1/2] net: hns: support deferred probe when can not obtain irq
From: Yankejian @ 2017-04-28  6:49 UTC (permalink / raw)
  To: davem, salil.mehta, yisen.zhuang, matthias.bgg, yankejian,
	lipeng321, zhouhuiru, huangdaode
  Cc: netdev, linuxarm
In-Reply-To: <1493362187-51671-1-git-send-email-yankejian@huawei.com>

From: lipeng <lipeng321@huawei.com>

In the hip06 and hip07 SoCs, the interrupt lines from the
DSAF controllers are connected to the mbigen hardware module.
The mbigen module is probed with module_init and, as such,
is not guaranteed to probe before the HNS driver, so we need
to support deferred probe.

Signed-off-by: lipeng <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
---
change log:
V4 -> V5:
1. Rebase on net-next;

V3 -> V4:
1. Delete redundant commit message;
2. Add Reviewed-by: Matthias Brugger <mbrugger@suse.com>;

V2 -> V3:
1. Check the return value of platform_get_irq in hns_rcb_get_cfg;
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 4 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 8 +++++++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index eba406b..93e71e2 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev)
 
 		hns_ppe_get_cfg(dsaf_dev->ppe_common[i]);
 
-		hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+		ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+		if (ret)
+			goto get_cfg_fail;
 	}
 
 	for (i = 0; i < HNS_PPE_COM_NUM; i++)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index c20a0f4..e2e2853 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -492,7 +492,7 @@ static int hns_rcb_get_base_irq_idx(struct rcb_common_cb *rcb_common)
  *hns_rcb_get_cfg - get rcb config
  *@rcb_common: rcb common device
  */
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
 {
 	struct ring_pair_cb *ring_pair_cb;
 	u32 i;
@@ -517,10 +517,16 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
 		ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] =
 		is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) :
 			  platform_get_irq(pdev, base_irq_idx + i * 3);
+		if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == -EPROBE_DEFER) ||
+		    (ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER))
+			return -EPROBE_DEFER;
+
 		ring_pair_cb->q.phy_base =
 			RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i);
 		hns_rcb_ring_pair_get_cfg(ring_pair_cb);
 	}
+
+	return 0;
 }
 
 /**
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
index a664ee8..6028164 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
@@ -121,7 +121,7 @@ struct rcb_common_cb {
 void hns_rcb_common_free_cfg(struct dsaf_device *dsaf_dev, u32 comm_index);
 int hns_rcb_common_init_hw(struct rcb_common_cb *rcb_common);
 void hns_rcb_start(struct hnae_queue *q, u32 val);
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
 void hns_rcb_get_queue_mode(enum dsaf_mode dsaf_mode,
 			    u16 *max_vfn, u16 *max_q_per_vf);
 
-- 
1.9.1

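The pattern added to `hns_rcb_get_cfg()` can be modelled in user space. The function fetches the TX and RX IRQs of every ring pair and, if either lookup reports that the interrupt controller (mbigen) has not probed yet, propagates `-EPROBE_DEFER` so the driver core retries the whole probe later. In this sketch, `get_irq_fn` is a hypothetical stand-in for `platform_get_irq()`, and the `2*i` / `2*i+1` index layout follows the version-1 branch only. As in the patch, only `-EPROBE_DEFER` is treated specially here.

```c
#include <assert.h>

/* Kernel-internal errno value, not exported to user space; defined here
 * only for the sketch. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for platform_get_irq(): returns an IRQ number
 * or a negative errno. */
typedef int (*get_irq_fn)(int index);

struct ring_irqs {
	int tx;
	int rx;
};

/* Fetch TX/RX IRQs for @nrings ring pairs and bail out with
 * -EPROBE_DEFER if either one is not available yet. */
static int get_ring_irqs(get_irq_fn get_irq, struct ring_irqs *rings,
			 int nrings)
{
	for (int i = 0; i < nrings; i++) {
		rings[i].tx = get_irq(2 * i);
		rings[i].rx = get_irq(2 * i + 1);
		if (rings[i].tx == -EPROBE_DEFER ||
		    rings[i].rx == -EPROBE_DEFER)
			return -EPROBE_DEFER;
	}
	return 0;
}

/* Example providers: interrupt controller ready vs. not probed yet. */
static int irq_ready(int index)
{
	return 32 + index;
}

static int irq_not_ready(int index)
{
	(void)index;
	return -EPROBE_DEFER;
}
```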

* [PATCH net-next v5 2/2] net: hns: support deferred probe when no mdio
From: Yankejian @ 2017-04-28  6:49 UTC (permalink / raw)
  To: davem, salil.mehta, yisen.zhuang, matthias.bgg, yankejian,
	lipeng321, zhouhuiru, huangdaode
  Cc: netdev, linuxarm
In-Reply-To: <1493362187-51671-1-git-send-email-yankejian@huawei.com>

From: lipeng <lipeng321@huawei.com>

In the hip06 and hip07 SoCs, the PHYs are connected to the MDIO bus.
The MDIO module is probed with module_init and, as such, is not
guaranteed to probe before the HNS driver, so we need to support
deferred probe.

We check for probe deferral in the MAC init, so DSAF is not
initialized when the MDIO bus is not yet available; all resources are
freed, and the probe is deferred so that it can be retried later.

Signed-off-by: lipeng <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
---
change log:
V4 -> V5:
1. Rebase on net-next;

V1 -> V2:
1. Return appropriate errno in hns_mac_register_phy;
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 39 +++++++++++++++++++----
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index 0c1f56e..8b5cdf4 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -696,6 +696,8 @@ static int hns_mac_init_ex(struct hns_mac_cb *mac_cb)
 	rc = phy_device_register(phy);
 	if (rc) {
 		phy_device_free(phy);
+		dev_err(&mdio->dev, "registered phy fail at address %i\n",
+			addr);
 		return -ENODEV;
 	}
 
@@ -706,7 +708,7 @@ static int hns_mac_init_ex(struct hns_mac_cb *mac_cb)
 	return 0;
 }
 
-static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
+static int hns_mac_register_phy(struct hns_mac_cb *mac_cb)
 {
 	struct acpi_reference_args args;
 	struct platform_device *pdev;
@@ -716,24 +718,39 @@ static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
 
 	/* Loop over the child nodes and register a phy_device for each one */
 	if (!to_acpi_device_node(mac_cb->fw_port))
-		return;
+		return -ENODEV;
 
 	rc = acpi_node_get_property_reference(
 			mac_cb->fw_port, "mdio-node", 0, &args);
 	if (rc)
-		return;
+		return rc;
 
 	addr = hns_mac_phy_parse_addr(mac_cb->dev, mac_cb->fw_port);
 	if (addr < 0)
-		return;
+		return addr;
 
 	/* dev address in adev */
 	pdev = hns_dsaf_find_platform_device(acpi_fwnode_handle(args.adev));
+	if (!pdev) {
+		dev_err(mac_cb->dev, "mac%d mdio pdev is NULL\n",
+			mac_cb->mac_id);
+		return -EINVAL;
+	}
+
 	mii_bus = platform_get_drvdata(pdev);
+	if (!mii_bus) {
+		dev_err(mac_cb->dev,
+			"mac%d mdio is NULL, dsaf will probe again later\n",
+			mac_cb->mac_id);
+		return -EPROBE_DEFER;
+	}
+
 	rc = hns_mac_register_phydev(mii_bus, mac_cb, addr);
 	if (!rc)
 		dev_dbg(mac_cb->dev, "mac%d register phy addr:%d\n",
 			mac_cb->mac_id, addr);
+
+	return rc;
 }
 
 #define MAC_MEDIA_TYPE_MAX_LEN		16
@@ -754,7 +771,7 @@ static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
  *@np:device node
  * return: 0 --success, negative --fail
  */
-static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
+static int hns_mac_get_info(struct hns_mac_cb *mac_cb)
 {
 	struct device_node *np;
 	struct regmap *syscon;
@@ -864,7 +881,15 @@ static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
 			}
 		}
 	} else if (is_acpi_node(mac_cb->fw_port)) {
-		hns_mac_register_phy(mac_cb);
+		ret = hns_mac_register_phy(mac_cb);
+		/*
+		 * The MAC works with or without a PHY. If the port is not
+		 * connected to a PHY, the return value is ignored. Only when
+		 * there is a PHY but its MDIO bus cannot be found yet is the
+		 * return value handled.
+		 */
+		if (ret == -EPROBE_DEFER)
+			return ret;
 	} else {
 		dev_err(mac_cb->dev, "mac%d cannot find phy node\n",
 			mac_cb->mac_id);
@@ -1026,6 +1051,7 @@ int hns_mac_init(struct dsaf_device *dsaf_dev)
 			dsaf_dev->mac_cb[port_id] = mac_cb;
 		}
 	}
+
 	/* init mac_cb for all port */
 	for (port_id = 0; port_id < max_port_num; port_id++) {
 		mac_cb = dsaf_dev->mac_cb[port_id];
@@ -1035,6 +1061,7 @@ int hns_mac_init(struct dsaf_device *dsaf_dev)
 		ret = hns_mac_get_cfg(dsaf_dev, mac_cb);
 		if (ret)
 			return ret;
+
 		ret = hns_mac_init_ex(mac_cb);
 		if (ret)
 			return ret;
-- 
1.9.1

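The error policy introduced in `hns_mac_register_phy()` and `hns_mac_get_info()` can be condensed into a small user-space model. A port with no PHY (or no usable firmware description) still works, so those errors are ignored. When a PHY exists but its MDIO bus driver has not bound yet, the caller returns `-EPROBE_DEFER`. `struct phy_lookup` and both functions are hypothetical simplifications that replace the ACPI and platform-device lookups with booleans.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno value, not in user-space <errno.h>. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical summary of the lookups done in hns_mac_register_phy(). */
struct phy_lookup {
	bool has_fw_node;   /* to_acpi_device_node(fw_port) != NULL */
	bool has_mdio_pdev; /* MDIO platform device was found */
	bool mdio_probed;   /* platform_get_drvdata(pdev) != NULL */
};

/* Same error codes as the patched hns_mac_register_phy(). */
static int register_phy(const struct phy_lookup *l)
{
	if (!l->has_fw_node)
		return -ENODEV;
	if (!l->has_mdio_pdev)
		return -EINVAL;
	if (!l->mdio_probed)
		return -EPROBE_DEFER; /* MDIO driver not bound yet: retry later */
	return 0;
}

/* Caller policy from hns_mac_get_info(): only deferral is propagated. */
static int mac_get_info(const struct phy_lookup *l)
{
	int ret = register_phy(l);

	if (ret == -EPROBE_DEFER)
		return ret;
	return 0;
}
```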

* [PATCH net-next v5 0/2] net: hns: bug fix for HNS driver
From: Yankejian @ 2017-04-28  6:49 UTC (permalink / raw)
  To: davem, salil.mehta, yisen.zhuang, matthias.bgg, yankejian,
	lipeng321, zhouhuiru, huangdaode
  Cc: netdev, linuxarm

From: lipeng <lipeng321@huawei.com>

This patchset adds support for deferring the DSAF probe when the mdio
and mbigen modules have not been loaded yet.

For more details, please refer to individual patch.

change log:
V4 -> V5:
1. Rebase on net-next;
2. Delete patch "net: hns: fixed bug that skb used after kfree"
   from this patchset;

V3 -> V4:
1. Delete redundant commit message;
2. Add Reviewed-by: Matthias Brugger <mbrugger@suse.com>;

V2 -> V3:
1. Check the return value of platform_get_irq in hns_rcb_get_cfg;

V1 -> V2:
1. Return appropriate errno in hns_mac_register_phy;

lipeng (2):
  net: hns: support deferred probe when can not obtain irq
  net: hns: support deferred probe when no mdio

 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 39 +++++++++++++++++++----
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |  4 ++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c |  8 ++++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h |  2 +-
 4 files changed, 44 insertions(+), 9 deletions(-)

-- 
1.9.1


* Re: Strange samples/bpf loading error for maps on net-next?
From: Jesper Dangaard Brouer @ 2017-04-28  6:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, netdev@vger.kernel.org, eric,
	brouer
In-Reply-To: <20170428054950.teh53ssg6ei4xkvx@ast-mbp>

On Thu, 27 Apr 2017 22:49:51 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Apr 27, 2017 at 01:15:42PM +0200, Jesper Dangaard Brouer wrote:
> > 
> > To provoke this bug, remember that you MUST call:
> > 
> >  make headers_install
> > 
> > In the kernels root directory, else you will be compiling samples/bpf/
> > against the older headers previously installed.
> > 
> > The error looks like:
> > 
> >  $ sudo ./sockex1
> >  bpf_load_program() err=22
> >  fd 0 is not pointing to valid bpf_map
> >  sockex1: [...]/samples/bpf/sockex1_user.c:26: main: Assertion `setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, prog_fd, sizeof(prog_fd[0])) == 0' failed.
> >  Aborted
> > 
> > I've found that the bug were introduced in
> >  commit: fb30d4b71214 ("bpf: Add tests for map-in-map")  
> 
> Great debugging!
> Indeed that change made samples/bpf/bpf_load.c to be incompatible with .o
> generated earlier. We should really get rid of that loader and
> switch to tools/lib/bpf/. I believe Eric Leblond already made it
> resilient to 'struct bpf_map_def' changes.

Yes, exactly, the problem is in samples/bpf/bpf_load.c.  It assumes
the contents of the ELF file's maps section will always come in chunks of
sizeof(struct bpf_map_def), and just uses that directly as a pointer to
an array of type struct bpf_map_def, which of course silently blows up
when struct bpf_map_def changes.  That cost me many hours to discover
yesterday.

I started implementing more correct parsing of the ELF maps section; it
is doable, but as you say, maybe we should just get rid of this loader?
I will at least fix up bpf_load.c, and perhaps just abort the program
if I detect a difference between the ELF size and the struct size,
and send this as a patch later today...
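A sketch of the more defensive parsing described here: derive the per-map record size from the ELF data (section size divided by the number of map symbols) instead of assuming `sizeof(struct bpf_map_def)`. Shorter, older records are zero-extended. Longer, newer records are accepted only if the unknown tail bytes are zero. Anything else is rejected. `struct map_def` is an illustrative mirror of the samples' definition, and the accept/reject policy is an assumption, not what bpf_load.c actually does. The sketch also assumes the caller has already counted the map symbols and that the section contains nothing but map records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative mirror of the samples' map definition (assumed layout). */
struct map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
	unsigned int inner_map_idx;
};

/* Parse @nr_maps records from a maps section of @sec_size bytes into
 * @out. Returns 0, or -1 if the layout cannot be trusted. */
static int parse_maps(const unsigned char *sec, size_t sec_size,
		      size_t nr_maps, struct map_def *out)
{
	size_t rec_sz, i, j;

	if (nr_maps == 0 || sec_size % nr_maps)
		return -1;
	rec_sz = sec_size / nr_maps; /* size the .o was compiled with */

	for (i = 0; i < nr_maps; i++) {
		const unsigned char *rec = sec + i * rec_sz;

		memset(&out[i], 0, sizeof(out[i]));
		if (rec_sz <= sizeof(out[i])) {
			/* older, shorter record: missing fields stay zero */
			memcpy(&out[i], rec, rec_sz);
			continue;
		}
		/* newer, longer record: only OK if the unknown tail is zero */
		for (j = sizeof(out[i]); j < rec_sz; j++)
			if (rec[j])
				return -1;
		memcpy(&out[i], rec, sizeof(out[i]));
	}
	return 0;
}
```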

I've also looked at the loader Daniel implemented[1] in iproute2, and
it is much cleaner.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/tree/lib/bpf.c
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [REGRESSION next-20170426] Commit 09515ef5ddad ("of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices") causes oops in mvneta
From: Ralph Sennhauser @ 2017-04-28  6:19 UTC (permalink / raw)
  To: Sricharan R
  Cc: Rafael J. Wysocki, Joerg Roedel, Bjorn Helgaas, linux-acpi,
	linux-kernel, linux-pci, Thomas Petazzoni, netdev
In-Reply-To: <524b7fc8-1eca-a7d8-7bc7-6743be17c208@codeaurora.org>

On Fri, 28 Apr 2017 11:13:33 +0530
Sricharan R <sricharan@codeaurora.org> wrote:

> Hi Ralph,
> 
> On 4/27/2017 8:10 PM, Ralph Sennhauser wrote:
> > On Thu, 27 Apr 2017 19:05:09 +0530
> > Sricharan R <sricharan@codeaurora.org> wrote:
> >   
> >> Hi,
> >>
> >> On 4/26/2017 9:45 PM, Ralph Sennhauser wrote:  
> >>> Hi Sricharan R,
> >>>
> >>> Commit 09515ef5ddad ("of/acpi: Configure dma operations at probe
> >>> time for platform/amba/pci bus devices") causes a kernel panic as
> >>> in the log below on an armada-385. Reverting the commit fixes the
> >>> issue.
> >>>
> >>> Regards
> >>> Ralph    
> >>
> >> Somehow not getting a obvious clue on whats going wrong with the
> >> logs below. From the log and looking in to dts, the drivers seems
> >> to the one for "marvell,armada-370-neta".  
> > 
> > Correct.
> >   
> >> Issue looks the data from the dma
> >> has gone bad and subsequently referring the wrong data has resulted
> >> in the crash. Looks like the dma_masks is the one going wrong.
> >> Can i get some logs from mvneta_probe, about dev->dma_mask,
> >> dev->coherent_dma_mask and dev->dma_ops with and without the patch
> >> to see whats the difference ?  
> > 
> > Not sure I understood what exactly you are after. Might be faster to
> > just send me a patch with all debug print statements you like to
> > see. 
> 
> Attached the patch with debug prints.
> 
> Regards,
>  Sricharan
> 

Hi Sricharan

With commit 09515ef5ddad

[    1.288962] mvneta f1070000.ethernet: dev->dma_mask 0xffffffff
[    1.294827] mvneta f1070000.ethernet: dev->coherent_dma_mask 0xffffffff
[    1.301472] mvneta f1070000.ethernet: dev->dma_ops 0x40b00c0601460

[    1.322047] mvneta f1034000.ethernet: dev->dma_mask 0xffffffff
[    1.327904] mvneta f1034000.ethernet: dev->coherent_dma_mask 0xffffffff
[    1.334549] mvneta f1034000.ethernet: dev->dma_ops 0x40b00c0601460


With the patch reverted (the working build)

[    1.289001] mvneta f1070000.ethernet: dev->dma_mask 0xffffffff
[    1.294866] mvneta f1070000.ethernet: dev->coherent_dma_mask 0xffffffff
[    1.301511] mvneta f1070000.ethernet: dev->dma_ops 0x40b00c06014a8

[    1.317005] mvneta f1034000.ethernet: dev->dma_mask 0xffffffff
[    1.322867] mvneta f1034000.ethernet: dev->coherent_dma_mask 0xffffffff
[    1.329508] mvneta f1034000.ethernet: dev->dma_ops 0x40b00c06014a8


Regards
Ralph


* [PATCH net-next] rhashtable: Do not lower max_elems when max_size is zero
From: Herbert Xu @ 2017-04-28  6:10 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, davem, fw, tgraf
In-Reply-To: <20170427223024.32657-1-f.fainelli@gmail.com>

On Thu, Apr 27, 2017 at 03:30:24PM -0700, Florian Fainelli wrote:
> After commit 6d684e54690c ("rhashtable: Cap total number of
> entries to 2^31"), we would be hitting a panic() in net/core/rtnetlink.c
> during initialization. The call stack would look like this:

Thanks for the patch.  I think we could just fold it into the
previous if clause, like this:

---8<---
Commit 6d684e54690c ("rhashtable: Cap total number of entries
to 2^31") breaks rhashtable users that do not set max_size.  This
is because, when max_size is zero, max_elems is also incorrectly set
to zero instead of 2^31.

This patch fixes it by only lowering max_elems when max_size is not
zero.

Fixes: 6d684e54690c ("rhashtable: Cap total number of entries to 2^31")
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 751630b..3895486 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -958,13 +958,14 @@ int rhashtable_init(struct rhashtable *ht,
 	if (params->min_size)
 		ht->p.min_size = roundup_pow_of_two(params->min_size);
 
-	if (params->max_size)
-		ht->p.max_size = rounddown_pow_of_two(params->max_size);
-
 	/* Cap total entries at 2^31 to avoid nelems overflow. */
 	ht->max_elems = 1u << 31;
-	if (ht->p.max_size < ht->max_elems / 2)
-		ht->max_elems = ht->p.max_size * 2;
+
+	if (params->max_size) {
+		ht->p.max_size = rounddown_pow_of_two(params->max_size);
+		if (ht->p.max_size < ht->max_elems / 2)
+			ht->max_elems = ht->p.max_size * 2;
+	}
 
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

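The effect of the fix can be checked in user space by comparing the `max_elems` computation before and after the change. `rounddown_pow2()` is a local stand-in for the kernel's `rounddown_pow_of_two()`, and the two functions below only reproduce the arithmetic from the hunk, not the rest of `rhashtable_init()`.

```c
#include <assert.h>

/* Local stand-in for rounddown_pow_of_two(); @x must be non-zero. */
static unsigned int rounddown_pow2(unsigned int x)
{
	unsigned int p = 1;

	while (p <= x / 2)
		p <<= 1;
	return p;
}

/* Before the fix: max_size == 0 also caps max_elems at 0. */
static unsigned int max_elems_old(unsigned int max_size_param)
{
	unsigned int max_size = 0, max_elems;

	if (max_size_param)
		max_size = rounddown_pow2(max_size_param);
	max_elems = 1u << 31;
	if (max_size < max_elems / 2)
		max_elems = max_size * 2;
	return max_elems;
}

/* After the fix: only a non-zero max_size lowers the 2^31 cap. */
static unsigned int max_elems_new(unsigned int max_size_param)
{
	unsigned int max_elems = 1u << 31;

	if (max_size_param) {
		unsigned int max_size = rounddown_pow2(max_size_param);

		if (max_size < max_elems / 2)
			max_elems = max_size * 2;
	}
	return max_elems;
}
```

With `max_size` unset, the old code yields a limit of 0, so no insertion can ever succeed. The new code keeps the 2^31 cap, and both versions agree whenever `max_size` is set.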
