Netdev List
 help / color / mirror / Atom feed
* [Patch net v2] net_sched: let qdisc_put() accept NULL pointer
From: Cong Wang @ 2019-09-12 17:22 UTC (permalink / raw)
  To: netdev
  Cc: Cong Wang, syzbot+d5870a903591faaca4ae, Linus Torvalds,
	Jamal Hadi Salim, Jiri Pirko

When tcf_block_get() fails in sfb_init(), q->qdisc is still a NULL
pointer which leads to a crash in sfb_destroy(). Similar for
sch_dsmark.

Instead of fixing each separately, Linus suggested to just accept
NULL pointer in qdisc_put(), which would make callers easier.

(For sch_dsmark, the bug probably exists long before commit
6529eaba33f0.)

Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Reported-by: syzbot+d5870a903591faaca4ae@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/sch_generic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ac28f6a5d70e..17bd8f539bc7 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -985,6 +985,9 @@ static void qdisc_destroy(struct Qdisc *qdisc)
 
 void qdisc_put(struct Qdisc *qdisc)
 {
+	if (!qdisc)
+		return;
+
 	if (qdisc->flags & TCQ_F_BUILTIN ||
 	    !refcount_dec_and_test(&qdisc->refcnt))
 		return;
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH v4 2/2] PTP: add support for one-shot output
From: Richard Cochran @ 2019-09-12 16:56 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: Christopher S Hall, netdev, linux-kernel, David Miller
In-Reply-To: <20190911061622.774006-2-felipe.balbi@linux.intel.com>

On Wed, Sep 11, 2019 at 09:16:22AM +0300, Felipe Balbi wrote:
> Some controllers allow for a one-shot output pulse, in contrast to
> periodic output. Now that we have extensible versions of our IOCTLs, we
> can finally make use of the 'flags' field to pass a bit telling driver
> that if we want one-shot pulse output.
> 
> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
> ---
> 
> Changes since v3:
> 	- Remove bogus bitwise negation
> 
> Changes since v2:
> 	- Add _PEROUT_ to bit macro
> 
> Changes since v1:
> 	- remove comment from .flags field

Reviewed-by: Richard Cochran <richardcochran@gmail.com>

@davem, these two are good to go!

Thanks,
Richard

^ permalink raw reply

* [PATCH] net/mlx5: Remove unneeded variable in mlx5_unload_one
From: zhong jiang @ 2019-09-12 16:59 UTC (permalink / raw)
  To: davem, saeedm; +Cc: netdev, linux-kernel

mlx5_unload_one do not need local variable to store different value,
Hence just remove it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 9648c22..c39bb37 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1228,8 +1228,6 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, bool boot)
 
 static int mlx5_unload_one(struct mlx5_core_dev *dev, bool cleanup)
 {
-	int err = 0;
-
 	if (cleanup) {
 		mlx5_unregister_device(dev);
 		mlx5_drain_health_wq(dev);
@@ -1257,7 +1255,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, bool cleanup)
 	mlx5_function_teardown(dev, cleanup);
 out:
 	mutex_unlock(&dev->intf_state_mutex);
-	return err;
+	return 0;
 }
 
 static int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
-- 
1.7.12.4


^ permalink raw reply related

* Re: [PATCH v4 2/2] PTP: add support for one-shot output
From: David Miller @ 2019-09-12 17:01 UTC (permalink / raw)
  To: richardcochran; +Cc: felipe.balbi, christopher.s.hall, netdev, linux-kernel
In-Reply-To: <20190912165609.GA1439@localhost>

From: Richard Cochran <richardcochran@gmail.com>
Date: Thu, 12 Sep 2019 09:56:09 -0700

> On Wed, Sep 11, 2019 at 09:16:22AM +0300, Felipe Balbi wrote:
>> Some controllers allow for a one-shot output pulse, in contrast to
>> periodic output. Now that we have extensible versions of our IOCTLs, we
>> can finally make use of the 'flags' field to pass a bit telling driver
>> that if we want one-shot pulse output.
>> 
>> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
>> ---
>> 
>> Changes since v3:
>> 	- Remove bogus bitwise negation
>> 
>> Changes since v2:
>> 	- Add _PEROUT_ to bit macro
>> 
>> Changes since v1:
>> 	- remove comment from .flags field
> 
> Reviewed-by: Richard Cochran <richardcochran@gmail.com>
> 
> @davem, these two are good to go!

Ok, thanks for reviewing.

^ permalink raw reply

* Re: [Patch net] sch_sfb: fix a crash in sfb_destroy()
From: Cong Wang @ 2019-09-12 16:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, Linux Kernel Network Developers, syzbot,
	Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <CAHk-=whO37+O-mohvMODnD57ppCsK3Bcv8oHzSBvmwJbsT54cA@mail.gmail.com>

On Thu, Sep 12, 2019 at 3:31 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Sep 12, 2019 at 2:10 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Wed, Sep 11, 2019 at 2:36 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > > It seems a similar fix would be needed in net/sched/sch_dsmark.c ?
> > >
> >
> > Yeah, or just add a NULL check in dsmark_destroy().
>
> Well, this was why one of my suggestions was to just make
> "qdisc_put()" be happy with a NULL pointer (or even an ERR_PTR()).
>
> That would have fixed not just sfb, but also dsmark with a single patch.

Sure, I don't have any preference here, just want to find a minimum
fix for -stable.

I will send v2.

Thanks.

^ permalink raw reply

* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Robert Beckett @ 2019-09-12 16:46 UTC (permalink / raw)
  To: Florian Fainelli, Ido Schimmel
  Cc: netdev@vger.kernel.org, Andrew Lunn, Vivien Didelot,
	David S. Miller, Jiri Pirko, bob.beckett
In-Reply-To: <68676250-17df-b0bb-521a-64877f198647@gmail.com>

On Thu, 2019-09-12 at 09:25 -0700, Florian Fainelli wrote:
> On 9/12/19 2:03 AM, Ido Schimmel wrote:
> > On Wed, Sep 11, 2019 at 12:49:03PM +0100, Robert Beckett wrote:
> > > On Wed, 2019-09-11 at 11:21 +0000, Ido Schimmel wrote:
> > > > On Tue, Sep 10, 2019 at 09:49:46AM -0700, Florian Fainelli
> > > > wrote:
> > > > > +Ido, Jiri,
> > > > > 
> > > > > On 9/10/19 8:41 AM, Robert Beckett wrote:
> > > > > > This patch-set adds support for some features of the
> > > > > > Marvell
> > > > > > switch
> > > > > > chips that can be used to handle packet storms.
> > > > > > 
> > > > > > The rationale for this was a setup that requires the
> > > > > > ability to
> > > > > > receive
> > > > > > traffic from one port, while a packet storm is occuring on
> > > > > > another port
> > > > > > (via an external switch with a deliberate loop). This is
> > > > > > needed
> > > > > > to
> > > > > > ensure vital data delivery from a specific port, while
> > > > > > mitigating
> > > > > > any
> > > > > > loops or DoS that a user may introduce on another port
> > > > > > (can't
> > > > > > guarantee
> > > > > > sensible users).
> > > > > 
> > > > > The use case is reasonable, but the implementation is not
> > > > > really.
> > > > > You
> > > > > are using Device Tree which is meant to describe hardware as
> > > > > a
> > > > > policy
> > > > > holder for setting up queue priorities and likewise for queue
> > > > > scheduling.
> > > > > 
> > > > > The tool that should be used for that purpose is tc and
> > > > > possibly an
> > > > > appropriately offloaded queue scheduler in order to map the
> > > > > desired
> > > > > scheduling class to what the hardware supports.
> > > > > 
> > > > > Jiri, Ido, how do you guys support this with mlxsw?
> > > > 
> > > > Hi Florian,
> > > > 
> > > > Are you referring to policing traffic towards the CPU using a
> > > > policer
> > > > on
> > > > the egress of the CPU port? At least that's what I understand
> > > > from
> > > > the
> > > > description of patch 6 below.
> > > > 
> > > > If so, mlxsw sets policers for different traffic types during
> > > > its
> > > > initialization sequence. These policers are not exposed to the
> > > > user
> > > > nor
> > > > configurable. While the default settings are good for most
> > > > users, we
> > > > do
> > > > want to allow users to change these and expose current
> > > > settings.
> > > > 
> > > > I agree that tc seems like the right choice, but the question
> > > > is
> > > > where
> > > > are we going to install the filters?
> > > > 
> > > 
> > > Before I go too far down the rabbit hole of tc traffic shaping,
> > > maybe
> > > it would be good to explain in more detail the problem I am
> > > trying to
> > > solve.
> > > 
> > > We have a setup as follows:
> > > 
> > > Marvell 88E6240 switch chip, accepting traffic from 4 ports. Port
> > > 1
> > > (P1) is critical priority, no dropped packets allowed, all others
> > > can
> > > be best effort.
> > > 
> > > CPU port of swtich chip is connected via phy to phy of intel i210
> > > (igb
> > > driver).
> > > 
> > > i210 is connected via pcie switch to imx6.
> > > 
> > > When too many small packets attempt to be delivered to CPU port
> > > (e.g.
> > > during broadcast flood) we saw dropped packets.
> > > 
> > > The packets were being received by i210 in to rx descriptor
> > > buffer
> > > fine, but the CPU could not keep up with the load. We saw
> > > rx_fifo_errors increasing rapidly and ksoftirqd at ~100% CPU.
> > > 
> > > 
> > > With this in mind, I am wondering whether any amount of tc
> > > traffic
> > > shaping would help? Would tc shaping require that the packet
> > > reception
> > > manages to keep up before it can enact its policies? Does the
> > > infrastructure have accelerator offload hooks to be able to apply
> > > it
> > > via HW? I dont see how it would be able to inspect the packets to
> > > apply
> > > filtering if they were dropped due to rx descriptor exhaustion.
> > > (please
> > > bear with me with the basic questions, I am not familiar with
> > > this part
> > > of the stack).
> > > 
> > > Assuming that tc is still the way to go, after a brief look in to
> > > the
> > > man pages and the documentation at largc.org, it seems like it
> > > would
> > > need to use the ingress qdisc, with some sort of system to
> > > segregate
> > > and priortise based on ingress port. Is this possible?
> > 
> > Hi Robert,
> > 
> > As I see it, you have two problems here:
> > 
> > 1. Classification: Based on ingress port in your case
> > 
> > 2. Scheduling: How to schedule between the different transmission
> > queues
> > 
> > Where the port from which the packets should egress is the CPU
> > port,
> > before they cross the PCI towards the imx6.
> > 
> > Both of these issues can be solved by tc. The main problem is that
> > today
> > we do not have a netdev to represent the CPU port and therefore
> > can't
> > use existing infra like tc. I believe we need to create one.
> > Besides
> > scheduling, we can also use it to permit/deny certain traffic from
> > reaching the CPU and perform policing.
> 
> We do not necessarily have to create a CPU netdev, we can overlay
> netdev
> operations onto the DSA master interface (fec in that case), and
> whenever you configure the DSA master interface, we also call back
> into
> the switch side for the CPU port. This is not necessarily the
> cleanest
> way to do things, but that is how we support ethtool operations (and
> some netdev operations incidentally), and it works

After reading up on tc, I am not sure how this would work given the
semantics of the tool currently.

My initial thought was to model the switch's 4 output queues using an
mqprio qdisc for the CPU port, and then use either iptables's classify
module on the input ports to set which queue it egresses from on the
CPU port, or use vlan tagging with id 0 and priority set. (with the
many detail of how to implement them still left to discover).

However, it looks like the mqprio qdisc could only be used for egress,
so without a netdev representing the CPU port, I dont know how it could
be used.

Another thing I thought of using was just to use iptable's TOS module
to set the minimal delay bit and rely on default behaviours, but Ive
yet to find anything in the Marvell manual that indicates it could set
that bit on all frames entering a port.

Another option might be to use vlans with their priority bits being
used to steer to output queues, but I really dont want to introduce
more virtual interfaces in to the setup, and I cant see how to
configure an enforce default vlan tag with id 0 and priority bits set
via linux userland tools.


It does look like tc would be quite nice for configuring the egress
rate limiting assuming we a netdev to target with the rate controls of
the qdisc.


So far, this seems like I am trying to shoe horn this stuff in to tc.
It seems like tc is meant to configure how the ip stack  configures
flow within the stack, whereas in a switch chip, the packets go nowhere
near the CPUs kernel ip stack. I cant help thinking that it would be
good have a specific utility for configuring switches that operates on
the port level for manage flow within the chip, or maybe simple sysfs
attributes to set the ports priority.


^ permalink raw reply

* Re: [PATCH] ixgbe: Fix secpath usage for IPsec TX offload.
From: Jonathan Tooker @ 2019-09-12 16:33 UTC (permalink / raw)
  To: Steffen Klassert, Jeff Kirsher, intel-wired-lan
  Cc: Michael Marley, Shannon Nelson, netdev
In-Reply-To: <20190912110144.GS2879@gauss3.secunet.de>

On 9/12/2019 6:01 AM, Steffen Klassert wrote:
> The ixgbe driver currently does IPsec TX offloading
> based on an existing secpath. However, the secpath
> can also come from the RX side, in this case it is
> misinterpreted for TX offload and the packets are
> dropped with a "bad sa_idx" error. Fix this by using
> the xfrm_offload() function to test for TX offload.
>
Does this patch also need to be ported to the ixgbevf driver? I can 
replicate the bad sa_idx error using a VM that's using a VF & the 
ixgebvf  driver.


^ permalink raw reply

* [PATCH 0/3] wireless: Remove unneeded variable
From: zhong jiang @ 2019-09-12 16:41 UTC (permalink / raw)
  To: kvalo; +Cc: davem, zhongjiang, linux-wireless, netdev, linux-kernel

With the help of Coccinelle, I find some place to use redundant
variable to store the return value, It is unnecessary. Just remove
it and make the funtion to be void.

zhong jiang (3):
  brcmsmac: Remove unneeded variable and make function to be void
  wlegacy: Remove unneeded variable and make function to be void
  libertas: Remove unneeded variable and make function to be void

 drivers/net/wireless/broadcom/brcm80211/brcmsmac/main.c | 5 +----
 drivers/net/wireless/intel/iwlegacy/4965-mac.c          | 8 ++------
 drivers/net/wireless/marvell/libertas/cmd.h             | 2 +-
 drivers/net/wireless/marvell/libertas/cmdresp.c         | 5 +----
 4 files changed, 5 insertions(+), 15 deletions(-)

-- 
1.7.12.4


^ permalink raw reply

* [PATCH 2/3] wlegacy: Remove unneeded variable and make function to be void
From: zhong jiang @ 2019-09-12 16:41 UTC (permalink / raw)
  To: kvalo; +Cc: davem, zhongjiang, linux-wireless, netdev, linux-kernel
In-Reply-To: <1568306492-42998-1-git-send-email-zhongjiang@huawei.com>

il4965_set_tkip_dynamic_key_info  do not need return value to
cope with different ases. And change functon return type to void.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/wireless/intel/iwlegacy/4965-mac.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlegacy/4965-mac.c b/drivers/net/wireless/intel/iwlegacy/4965-mac.c
index ffb705b..a7bbfe2 100644
--- a/drivers/net/wireless/intel/iwlegacy/4965-mac.c
+++ b/drivers/net/wireless/intel/iwlegacy/4965-mac.c
@@ -3326,12 +3326,11 @@ struct il_mod_params il4965_mod_params = {
 	return il_send_add_sta(il, &sta_cmd, CMD_SYNC);
 }
 
-static int
+static void
 il4965_set_tkip_dynamic_key_info(struct il_priv *il,
 				 struct ieee80211_key_conf *keyconf, u8 sta_id)
 {
 	unsigned long flags;
-	int ret = 0;
 	__le16 key_flags = 0;
 
 	key_flags |= (STA_KEY_FLG_TKIP | STA_KEY_FLG_MAP_KEY_MSK);
@@ -3367,8 +3366,6 @@ struct il_mod_params il4965_mod_params = {
 	memcpy(il->stations[sta_id].sta.key.key, keyconf->key, 16);
 
 	spin_unlock_irqrestore(&il->sta_lock, flags);
-
-	return ret;
 }
 
 void
@@ -3483,8 +3480,7 @@ struct il_mod_params il4965_mod_params = {
 		    il4965_set_ccmp_dynamic_key_info(il, keyconf, sta_id);
 		break;
 	case WLAN_CIPHER_SUITE_TKIP:
-		ret =
-		    il4965_set_tkip_dynamic_key_info(il, keyconf, sta_id);
+		il4965_set_tkip_dynamic_key_info(il, keyconf, sta_id);
 		break;
 	case WLAN_CIPHER_SUITE_WEP40:
 	case WLAN_CIPHER_SUITE_WEP104:
-- 
1.7.12.4


^ permalink raw reply related

* [PATCH 3/3] libertas: Remove unneeded variable and make function to be void
From: zhong jiang @ 2019-09-12 16:41 UTC (permalink / raw)
  To: kvalo; +Cc: davem, zhongjiang, linux-wireless, netdev, linux-kernel
In-Reply-To: <1568306492-42998-1-git-send-email-zhongjiang@huawei.com>

lbs_process_event  do not need return value to cope with different
cases. And change functon return type to void.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/wireless/marvell/libertas/cmd.h     | 2 +-
 drivers/net/wireless/marvell/libertas/cmdresp.c | 5 +----
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/marvell/libertas/cmd.h b/drivers/net/wireless/marvell/libertas/cmd.h
index 8087856..3c19307 100644
--- a/drivers/net/wireless/marvell/libertas/cmd.h
+++ b/drivers/net/wireless/marvell/libertas/cmd.h
@@ -76,7 +76,7 @@ void lbs_mac_event_disconnected(struct lbs_private *priv,
 
 /* Events */
 
-int lbs_process_event(struct lbs_private *priv, u32 event);
+void lbs_process_event(struct lbs_private *priv, u32 event);
 
 
 /* Actual commands */
diff --git a/drivers/net/wireless/marvell/libertas/cmdresp.c b/drivers/net/wireless/marvell/libertas/cmdresp.c
index b73d083..cb515c5 100644
--- a/drivers/net/wireless/marvell/libertas/cmdresp.c
+++ b/drivers/net/wireless/marvell/libertas/cmdresp.c
@@ -220,9 +220,8 @@ int lbs_process_command_response(struct lbs_private *priv, u8 *data, u32 len)
 	return ret;
 }
 
-int lbs_process_event(struct lbs_private *priv, u32 event)
+void lbs_process_event(struct lbs_private *priv, u32 event)
 {
-	int ret = 0;
 	struct cmd_header cmd;
 
 	switch (event) {
@@ -351,6 +350,4 @@ int lbs_process_event(struct lbs_private *priv, u32 event)
 		netdev_alert(priv->dev, "EVENT: unknown event id %d\n", event);
 		break;
 	}
-
-	return ret;
 }
-- 
1.7.12.4


^ permalink raw reply related

* [PATCH 1/3] brcmsmac: Remove unneeded variable and make function to be void
From: zhong jiang @ 2019-09-12 16:41 UTC (permalink / raw)
  To: kvalo; +Cc: davem, zhongjiang, linux-wireless, netdev, linux-kernel
In-Reply-To: <1568306492-42998-1-git-send-email-zhongjiang@huawei.com>

brcms_c_set_mac  do not need return value to cope with different
cases. And change functon return type to void.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/main.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/main.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/main.c
index 080e829..ddc4d47 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/main.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/main.c
@@ -3776,17 +3776,14 @@ static void brcms_c_set_ps_ctrl(struct brcms_c_info *wlc)
  * Write this BSS config's MAC address to core.
  * Updates RXE match engine.
  */
-static int brcms_c_set_mac(struct brcms_bss_cfg *bsscfg)
+static void brcms_c_set_mac(struct brcms_bss_cfg *bsscfg)
 {
-	int err = 0;
 	struct brcms_c_info *wlc = bsscfg->wlc;
 
 	/* enter the MAC addr into the RXE match registers */
 	brcms_c_set_addrmatch(wlc, RCM_MAC_OFFSET, wlc->pub->cur_etheraddr);
 
 	brcms_c_ampdu_macaddr_upd(wlc);
-
-	return err;
 }
 
 /* Write the BSS config's BSSID address to core (set_bssid in d11procs.tcl).
-- 
1.7.12.4


^ permalink raw reply related

* Re: linux-next: manual merge of the net-next tree with the net tree
From: Jeff Kirsher @ 2019-09-12 16:29 UTC (permalink / raw)
  To: Stephen Rothwell, David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List, Ilya Maximets
In-Reply-To: <20190913022535.65ac3420@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 935 bytes --]

On Fri, 2019-09-13 at 02:25 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net-next tree got a conflict in:
> 
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> 
> between commit:
> 
>   5c129241e2de ("ixgbe: add support for AF_XDP need_wakeup feature")
> 
> from the net tree and commit:
> 
>   bf280c0387eb ("ixgbe: fix double clean of Tx descriptors with xdp")
> 
> from the net-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your
> tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any
> particularly
> complex conflicts.
> 

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

On how you fixed up the conflict.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH bpf-next 2/3] ixgbe: fix xdp handle calculations
From: Bowers, AndrewX @ 2019-09-12 16:28 UTC (permalink / raw)
  To: netdev@vger.kernel.org, bpf@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org
In-Reply-To: <20190911172435.21042-2-ciara.loftus@intel.com>

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Ciara Loftus
> Sent: Wednesday, September 11, 2019 10:25 AM
> To: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jonathan.lemon@gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; bpf@vger.kernel.org;
> intel-wired-lan@lists.osuosl.org; Loftus, Ciara <ciara.loftus@intel.com>;
> Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH bpf-next 2/3] ixgbe: fix xdp handle
> calculations
> 
> Commit 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations") reintroduced the
> addition of the umem headroom to the xdp handle in the ixgbe_zca_free,
> ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc functions. However,
> the headroom is already added to the handle in the function
> ixgbe_run_xdp_zc. This commit removes the latter addition and fixes the
> case where the headroom is non-zero.
> 
> Fixes: 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations")
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH bpf-next 1/3] i40e: fix xdp handle calculations
From: Bowers, AndrewX @ 2019-09-12 16:27 UTC (permalink / raw)
  To: netdev@vger.kernel.org, bpf@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org
In-Reply-To: <20190911172435.21042-1-ciara.loftus@intel.com>

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Ciara Loftus
> Sent: Wednesday, September 11, 2019 10:25 AM
> To: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jonathan.lemon@gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; bpf@vger.kernel.org;
> intel-wired-lan@lists.osuosl.org; Loftus, Ciara <ciara.loftus@intel.com>;
> Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH bpf-next 1/3] i40e: fix xdp handle
> calculations
> 
> Commit 4c5d9a7fa149 ("i40e: fix xdp handle calculations") reintroduced the
> addition of the umem headroom to the xdp handle in the i40e_zca_free,
> i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc functions. However, the
> headroom is already added to the handle in the function i40_run_xdp_zc.
> This commit removes the latter addition and fixes the case where the
> headroom is non-zero.
> 
> Fixes: 4c5d9a7fa149 ("i40e: fix xdp handle calculations")
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2019-09-12 16:25 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List, Ilya Maximets,
	Jeff Kirsher

[-- Attachment #1: Type: text/plain, Size: 1737 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c

between commit:

  5c129241e2de ("ixgbe: add support for AF_XDP need_wakeup feature")

from the net tree and commit:

  bf280c0387eb ("ixgbe: fix double clean of Tx descriptors with xdp")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index a3b6d8c89127,ad802a8909e0..000000000000
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@@ -682,10 -697,19 +691,17 @@@ bool ixgbe_clean_xdp_tx_irq(struct ixgb
  	if (xsk_frames)
  		xsk_umem_complete_tx(umem, xsk_frames);
  
+ 	if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem)) {
+ 		if (tx_ring->next_to_clean == tx_ring->next_to_use)
+ 			xsk_set_tx_need_wakeup(tx_ring->xsk_umem);
+ 		else
+ 			xsk_clear_tx_need_wakeup(tx_ring->xsk_umem);
+ 	}
+ 
 -	xmit_done = ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
 -
 -	return budget > 0 && xmit_done;
 +	return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
  }
  
- int ixgbe_xsk_async_xmit(struct net_device *dev, u32 qid)
+ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
  {
  	struct ixgbe_adapter *adapter = netdev_priv(dev);
  	struct ixgbe_ring *ring;

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Florian Fainelli @ 2019-09-12 16:25 UTC (permalink / raw)
  To: Ido Schimmel, Robert Beckett
  Cc: netdev@vger.kernel.org, Andrew Lunn, Vivien Didelot,
	David S. Miller, Jiri Pirko
In-Reply-To: <20190912090339.GA16311@splinter>

On 9/12/19 2:03 AM, Ido Schimmel wrote:
> On Wed, Sep 11, 2019 at 12:49:03PM +0100, Robert Beckett wrote:
>> On Wed, 2019-09-11 at 11:21 +0000, Ido Schimmel wrote:
>>> On Tue, Sep 10, 2019 at 09:49:46AM -0700, Florian Fainelli wrote:
>>>> +Ido, Jiri,
>>>>
>>>> On 9/10/19 8:41 AM, Robert Beckett wrote:
>>>>> This patch-set adds support for some features of the Marvell
>>>>> switch
>>>>> chips that can be used to handle packet storms.
>>>>>
>>>>> The rationale for this was a setup that requires the ability to
>>>>> receive
>>>>> traffic from one port, while a packet storm is occuring on
>>>>> another port
>>>>> (via an external switch with a deliberate loop). This is needed
>>>>> to
>>>>> ensure vital data delivery from a specific port, while mitigating
>>>>> any
>>>>> loops or DoS that a user may introduce on another port (can't
>>>>> guarantee
>>>>> sensible users).
>>>>
>>>> The use case is reasonable, but the implementation is not really.
>>>> You
>>>> are using Device Tree which is meant to describe hardware as a
>>>> policy
>>>> holder for setting up queue priorities and likewise for queue
>>>> scheduling.
>>>>
>>>> The tool that should be used for that purpose is tc and possibly an
>>>> appropriately offloaded queue scheduler in order to map the desired
>>>> scheduling class to what the hardware supports.
>>>>
>>>> Jiri, Ido, how do you guys support this with mlxsw?
>>>
>>> Hi Florian,
>>>
>>> Are you referring to policing traffic towards the CPU using a policer
>>> on
>>> the egress of the CPU port? At least that's what I understand from
>>> the
>>> description of patch 6 below.
>>>
>>> If so, mlxsw sets policers for different traffic types during its
>>> initialization sequence. These policers are not exposed to the user
>>> nor
>>> configurable. While the default settings are good for most users, we
>>> do
>>> want to allow users to change these and expose current settings.
>>>
>>> I agree that tc seems like the right choice, but the question is
>>> where
>>> are we going to install the filters?
>>>
>>
>> Before I go too far down the rabbit hole of tc traffic shaping, maybe
>> it would be good to explain in more detail the problem I am trying to
>> solve.
>>
>> We have a setup as follows:
>>
>> Marvell 88E6240 switch chip, accepting traffic from 4 ports. Port 1
>> (P1) is critical priority, no dropped packets allowed, all others can
>> be best effort.
>>
>> CPU port of swtich chip is connected via phy to phy of intel i210 (igb
>> driver).
>>
>> i210 is connected via pcie switch to imx6.
>>
>> When too many small packets attempt to be delivered to CPU port (e.g.
>> during broadcast flood) we saw dropped packets.
>>
>> The packets were being received by i210 in to rx descriptor buffer
>> fine, but the CPU could not keep up with the load. We saw
>> rx_fifo_errors increasing rapidly and ksoftirqd at ~100% CPU.
>>
>>
>> With this in mind, I am wondering whether any amount of tc traffic
>> shaping would help? Would tc shaping require that the packet reception
>> manages to keep up before it can enact its policies? Does the
>> infrastructure have accelerator offload hooks to be able to apply it
>> via HW? I dont see how it would be able to inspect the packets to apply
>> filtering if they were dropped due to rx descriptor exhaustion. (please
>> bear with me with the basic questions, I am not familiar with this part
>> of the stack).
>>
>> Assuming that tc is still the way to go, after a brief look in to the
>> man pages and the documentation at largc.org, it seems like it would
>> need to use the ingress qdisc, with some sort of system to segregate
>> and priortise based on ingress port. Is this possible?
> 
> Hi Robert,
> 
> As I see it, you have two problems here:
> 
> 1. Classification: Based on ingress port in your case
> 
> 2. Scheduling: How to schedule between the different transmission queues
> 
> Where the port from which the packets should egress is the CPU port,
> before they cross the PCI towards the imx6.
> 
> Both of these issues can be solved by tc. The main problem is that today
> we do not have a netdev to represent the CPU port and therefore can't
> use existing infra like tc. I believe we need to create one. Besides
> scheduling, we can also use it to permit/deny certain traffic from
> reaching the CPU and perform policing.

We do not necessarily have to create a CPU netdev, we can overlay netdev
operations onto the DSA master interface (fec in that case), and
whenever you configure the DSA master interface, we also call back into
the switch side for the CPU port. This is not necessarily the cleanest
way to do things, but that is how we support ethtool operations (and
some netdev operations incidentally), and it works
-- 
Florian

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2019-09-12 16:19 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List,
	Mario Limonciello, Marcel Holtmann, Alex Lu

[-- Attachment #1: Type: text/plain, Size: 1976 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/bluetooth/btusb.c

between commit:

  1ffdb51f28e8 ("Revert "Bluetooth: btusb: driver to enable the usb-wakeup feature"")

from the net tree and commit:

  9e45524a0111 ("Bluetooth: btusb: Fix suspend issue for Realtek devices")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/bluetooth/btusb.c
index ba4149054304,ed455de598ea..000000000000
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@@ -1173,7 -1201,18 +1204,14 @@@ static int btusb_open(struct hci_dev *h
  	}
  
  	data->intf->needs_remote_wakeup = 1;
 -	/* device specific wakeup source enabled and required for USB
 -	 * remote wakeup while host is suspended
 -	 */
 -	device_wakeup_enable(&data->udev->dev);
  
+ 	/* Disable device remote wakeup when host is suspended
+ 	 * For Realtek chips, global suspend without
+ 	 * SET_FEATURE (DEVICE_REMOTE_WAKEUP) can save more power in device.
+ 	 */
+ 	if (test_bit(BTUSB_WAKEUP_DISABLE, &data->flags))
+ 		device_wakeup_disable(&data->udev->dev);
+ 
  	if (test_and_set_bit(BTUSB_INTR_RUNNING, &data->flags))
  		goto done;
  
@@@ -1237,6 -1276,12 +1275,11 @@@ static int btusb_close(struct hci_dev *
  		goto failed;
  
  	data->intf->needs_remote_wakeup = 0;
+ 
+ 	/* Enable remote wake up for auto-suspend */
+ 	if (test_bit(BTUSB_WAKEUP_DISABLE, &data->flags))
+ 		data->intf->needs_remote_wakeup = 1;
+ 
 -	device_wakeup_disable(&data->udev->dev);
  	usb_autopm_put_interface(data->intf);
  
  failed:

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [v2 2/3] samples: pktgen: add helper functions for IP(v4/v6) CIDR parsing
From: Jesper Dangaard Brouer @ 2019-09-12 15:59 UTC (permalink / raw)
  To: Daniel T. Lee; +Cc: David S . Miller, netdev, brouer
In-Reply-To: <20190911184807.21770-2-danieltimlee@gmail.com>

On Thu, 12 Sep 2019 03:48:06 +0900
"Daniel T. Lee" <danieltimlee@gmail.com> wrote:

> This commit adds CIDR parsing and IP validate helper function to parse
> single IP or range of IP with CIDR. (e.g. 198.18.0.0/15)

One question: You do know that this expansion of the CIDR will also
include the CIDR network broadcast IP and "network-address", is that
intentional?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH] ixgbe: Fix secpath usage for IPsec TX offload.
From: Jeff Kirsher @ 2019-09-12 15:57 UTC (permalink / raw)
  To: David Miller, steffen.klassert; +Cc: intel-wired-lan, michael, snelson, netdev
In-Reply-To: <20190912.134359.345289288863944180.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 782 bytes --]

On Thu, 2019-09-12 at 13:43 +0200, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Thu, 12 Sep 2019 13:01:44 +0200
> 
> > The ixgbe driver currently does IPsec TX offloading
> > based on an existing secpath. However, the secpath
> > can also come from the RX side, in this case it is
> > misinterpreted for TX offload and the packets are
> > dropped with a "bad sa_idx" error. Fix this by using
> > the xfrm_offload() function to test for TX offload.
> > 
> > Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
> > Reported-by: Michael Marley <michael@michaelmarley.com>
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> 
> I'll apply this directly and queue it up for -stable, thanks.

Thanks Dave!

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH bpf-next] libbpf: add xsk_umem__adjust_offset
From: Kevin Laatz @ 2019-09-12  7:28 UTC (permalink / raw)
  To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jonathan.lemon
  Cc: bruce.richardson, ciara.loftus, bpf, Kevin Laatz

Currently, xsk_umem_adjust_offset exists as a kernel internal function.
This patch adds xsk_umem__adjust_offset to libbpf so that it can be used
from userspace. This will take the responsibility of properly storing the
offset away from the application, making it less error prone.

Since xsk_umem__adjust_offset is called on a per-packet basis, we need to
inline the function to avoid any performance regressions.  In order to
inline xsk_umem__adjust_offset, we need to add it to xsk.h. Unfortunately
this means that we can't dereference the xsk_umem_config struct directly
since it is defined only in xsk.c. We therefore add an extra API to return
the flags field to the user from the structure, and have the inline
function use this flags field directly.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 tools/lib/bpf/libbpf.map |  1 +
 tools/lib/bpf/xsk.c      |  5 +++++
 tools/lib/bpf/xsk.h      | 14 ++++++++++++++
 3 files changed, 20 insertions(+)

diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index d04c7cb623ed..760350c9b81c 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -189,4 +189,5 @@ LIBBPF_0.0.4 {
 LIBBPF_0.0.5 {
 	global:
 		bpf_btf_get_next_id;
+		xsk_umem__get_flags;
 } LIBBPF_0.0.4;
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 842c4fd55859..a4250a721ea6 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -84,6 +84,11 @@ int xsk_socket__fd(const struct xsk_socket *xsk)
 	return xsk ? xsk->fd : -EINVAL;
 }
 
+__u32 xsk_umem__get_flags(struct xsk_umem *umem)
+{
+	return umem->config.flags;
+}
+
 static bool xsk_page_aligned(void *buffer)
 {
 	unsigned long addr = (unsigned long)buffer;
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 584f6820a639..bf782facb274 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -183,8 +183,22 @@ static inline __u64 xsk_umem__add_offset_to_addr(__u64 addr)
 	return xsk_umem__extract_addr(addr) + xsk_umem__extract_offset(addr);
 }
 
+/* Handle the offset appropriately depending on aligned or unaligned mode.
+ * For unaligned mode, we store the offset in the upper 16-bits of the address.
+ * For aligned mode, we simply add the offset to the address.
+ */
+static inline __u64 xsk_umem__adjust_offset(__u32 umem_flags, __u64 addr,
+					    __u64 offset)
+{
+	if (umem_flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+		return addr + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+	else
+		return addr + offset;
+}
+
 LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
 LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
+LIBBPF_API __u32 xsk_umem__get_flags(struct xsk_umem *umem);
 
 #define XSK_RING_CONS__DEFAULT_NUM_DESCS      2048
 #define XSK_RING_PROD__DEFAULT_NUM_DESCS      2048
-- 
2.17.1


^ permalink raw reply related

* Re: memory leak in ppp_write
From: syzbot @ 2019-09-12 15:17 UTC (permalink / raw)
  To: ast, bpf, daniel, davem, jeliantsurux, kafai, linux-kernel,
	linux-ppp, netdev, paulus, songliubraving, syzkaller-bugs, yhs
In-Reply-To: <CAKK_rchVQCYmjPSxk9MszV9BtF8y04-j2dpjV0Jg3c+nrRNEWQ@mail.gmail.com>

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:

Reported-and-tested-by:  
syzbot+d9c8bf24e56416d7ce2c@syzkaller.appspotmail.com

Tested on:

commit:         6525771f Merge tag 'arc-5.3-rc7' of git://git.kernel.org/p..
git tree:       https://github.com/google/kasan.git
kernel config:  https://syzkaller.appspot.com/x/.config?x=e6131eafb9408877
dashboard link: https://syzkaller.appspot.com/bug?extid=d9c8bf24e56416d7ce2c
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
patch:          https://syzkaller.appspot.com/x/patch.diff?x=12610ef1600000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply

* [bisected] UDP / xfrm: NAT-T packets with bad UDP checksum get dropped
From: Thomas Jarosch @ 2019-09-12 15:13 UTC (permalink / raw)
  To: netdev; +Cc: Tom Herbert, Steffen Klassert

Hello together,

after updating many machines already from 3.14 to 4.19.67,
one site showed a non-working IPSec VPN connection with 4.19.x.

This IPSec connection is using UDP NAT traversal on port 4500.
The tunnel gets established fine, but no data flows.
Output of "ip xfrm state" looked sane.

The TRACE iptables firewall target showed the UDP packets
on port 4500 get accepted as expected. Still they didn't appear
in "ip xfrm monitor" and vanish somewhere in the kernel,
no error counter in /proc/net/xfrm_state increased at all.

After a few hours of bisecting with a test VM,
this commit was identified to cause the packet drop:

*******************
commit 0a80966b1043c3e2dc684140f155a3fded308660
Author: Tom Herbert <therbert@google.com>
Date:   Wed May 7 16:52:39 2014 -0700

    net: Verify UDP checksum before handoff to encap

    Moving validation of UDP checksum to be done in UDP not encap layer.

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f2d05d7be743..54ea0a3a48f1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1495,6 +1495,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
                if (skb->len > sizeof(struct udphdr) && encap_rcv != NULL) {
                        int ret;
 
+                       /* Verify checksum before giving to encap */
+                       if (udp_lib_checksum_complete(skb))
+                               goto csum_error;
+
                        ret = encap_rcv(sk, skb);
                        if (ret <= 0) {
                                UDP_INC_STATS_BH(sock_net(sk),
..
*******************

This commit is part of kernel 3.16. Reverting the commit
brings back the VPN connection using kernel 4.19.67.

I didn't spot the checksum error initially since wireshark and tcpdump
need to be explicitly told to verify checksums and there's a warning
about checksum offloading messing with it.

Further tracing showed the UDP packets leave the source
site with a zero UDP checksum and arrive, after passing
an unknown home router and a few other routers,
with a bogus UDP checksum on the destination side.

I've disabled rx and tx checksum offloading on the target test VM
and also on a router in between, but the packet dumps just
contain a fixed value as UDP checksum (f.e. 0x91c4).

RFC 3948 specifies how ESP packets should be encapsulated
using UDP for NAT traveral:
https://tools.ietf.org/html/rfc3948

*******************
2.1. UDP-Encapsulated ESP Header Format

the IPv4 UDP Checksum SHOULD be transmitted as a zero value, and
receivers MUST NOT depend on the UDP checksum being a zero value.


3.5.  Tunnel Mode ESP Decapsulation

1.  The UDP header is removed from the packet.
*******************

I'm wondering how the RFC should be interpreted here
regarding the UDP checksumming?

As far as I can tell it doesn't mention the UDP checksum
should be verified before decapsulation, the ESP packets
will provide proper data authentication anyway.

Cheers,
Thomas

^ permalink raw reply related

* Re: [PATCH v2 net 3/3] sctp: destroy bucket if failed to bind addr
From: Marcelo Ricardo Leitner @ 2019-09-12 14:52 UTC (permalink / raw)
  To: Mao Wenan
  Cc: vyasevich, nhorman, davem, linux-sctp, netdev, linux-kernel,
	kernel-janitors, Hulk Robot
In-Reply-To: <20190912040219.67517-4-maowenan@huawei.com>

On Thu, Sep 12, 2019 at 12:02:19PM +0800, Mao Wenan wrote:
> There is one memory leak bug report:
> BUG: memory leak
> unreferenced object 0xffff8881dc4c5ec0 (size 40):
>   comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s)
>   hex dump (first 32 bytes):
>     02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00  ................
>     f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00  .c=.............
>   backtrace:
>     [<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp]
>     [<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp]
>     [<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp]
>     [<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6]
>     [<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647
>     [<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline]
>     [<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline]
>     [<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656
>     [<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
>     [<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> This is because in sctp_do_bind, if sctp_get_port_local is to
> create hash bucket successfully, and sctp_add_bind_addr failed
> to bind address, e.g return -ENOMEM, so memory leak found, it
> needs to destroy allocated bucket.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

^ permalink raw reply

* Re: [PATCH v2 net 2/3] sctp: remove redundant assignment when call sctp_get_port_local
From: Marcelo Ricardo Leitner @ 2019-09-12 14:52 UTC (permalink / raw)
  To: Mao Wenan
  Cc: vyasevich, nhorman, davem, linux-sctp, netdev, linux-kernel,
	kernel-janitors
In-Reply-To: <20190912040219.67517-3-maowenan@huawei.com>

On Thu, Sep 12, 2019 at 12:02:18PM +0800, Mao Wenan wrote:
> There are more parentheses in if clause when call sctp_get_port_local
> in sctp_do_bind, and redundant assignment to 'ret'. This patch is to
> do cleanup.
> 
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

^ permalink raw reply

* Re: [PATCH v2 net 1/3] sctp: change return type of sctp_get_port_local
From: Marcelo Ricardo Leitner @ 2019-09-12 14:51 UTC (permalink / raw)
  To: Mao Wenan
  Cc: vyasevich, nhorman, davem, linux-sctp, netdev, linux-kernel,
	kernel-janitors
In-Reply-To: <20190912040219.67517-2-maowenan@huawei.com>

On Thu, Sep 12, 2019 at 12:02:17PM +0800, Mao Wenan wrote:
> Currently sctp_get_port_local() returns a long
> which is either 0,1 or a pointer casted to long.
> It's neither of the callers use the return value since
> commit 62208f12451f ("net: sctp: simplify sctp_get_port").
> Now two callers are sctp_get_port and sctp_do_bind,
> they actually assumend a casted to an int was the same as
> a pointer casted to a long, and they don't save the return
> value just check whether it is zero or non-zero, so
> it would better change return type from long to int for
> sctp_get_port_local.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")

This Fixes tag is not needed. It's just a cleanup.
Maybe Dave can remove it for us.

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Thanks.

> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
>  net/sctp/socket.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 9d1f83b10c0a..5e1934c48709 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -309,7 +309,7 @@ static int sctp_bind(struct sock *sk, struct sockaddr *addr, int addr_len)
>  	return retval;
>  }
>  
> -static long sctp_get_port_local(struct sock *, union sctp_addr *);
> +static int sctp_get_port_local(struct sock *, union sctp_addr *);
>  
>  /* Verify this is a valid sockaddr. */
>  static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
> @@ -7998,7 +7998,7 @@ static void sctp_unhash(struct sock *sk)
>  static struct sctp_bind_bucket *sctp_bucket_create(
>  	struct sctp_bind_hashbucket *head, struct net *, unsigned short snum);
>  
> -static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
> +static int sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
>  {
>  	struct sctp_sock *sp = sctp_sk(sk);
>  	bool reuse = (sk->sk_reuse || sp->reuse);
> @@ -8108,7 +8108,7 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
>  
>  			if (sctp_bind_addr_conflict(&ep2->base.bind_addr,
>  						    addr, sp2, sp)) {
> -				ret = (long)sk2;
> +				ret = 1;
>  				goto fail_unlock;
>  			}
>  		}
> @@ -8180,7 +8180,7 @@ static int sctp_get_port(struct sock *sk, unsigned short snum)
>  	addr.v4.sin_port = htons(snum);
>  
>  	/* Note: sk->sk_num gets filled in if ephemeral port request. */
> -	return !!sctp_get_port_local(sk, &addr);
> +	return sctp_get_port_local(sk, &addr);
>  }
>  
>  /*
> -- 
> 2.20.1
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox