Netdev List
* [PATCH v3 net-next 2/4] selftests: rtnetlink: use dummydev as a test device
From: Shannon Nelson @ 2018-06-26 17:07 UTC
  To: davem, netdev, jakub.kicinski; +Cc: anders.roxell, linux-kselftest
In-Reply-To: <1530032875-30482-1-git-send-email-shannon.nelson@oracle.com>

We really shouldn't mess with local system settings, so let's
use the already created dummy device instead for ipsec testing.
Oh, and let's put the temp file into a proper directory.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 tools/testing/selftests/net/rtnetlink.sh | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 261a981..15948cf 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -523,21 +523,19 @@ kci_test_macsec()
 kci_test_ipsec()
 {
 	ret=0
-
-	# find an ip address on this machine and make up a destination
-	srcip=`ip -o addr | awk '/inet / { print $4; }' | grep -v "^127" | head -1 | cut -f1 -d/`
-	net=`echo $srcip | cut -f1-3 -d.`
-	base=`echo $srcip | cut -f4 -d.`
-	dstip="$net."`expr $base + 1`
-
 	algo="aead rfc4106(gcm(aes)) 0x3132333435363738393031323334353664636261 128"
+	srcip=192.168.123.1
+	dstip=192.168.123.2
+	spi=7
+
+	ip addr add $srcip dev $devdummy
 
 	# flush to be sure there's nothing configured
 	ip x s flush ; ip x p flush
 	check_err $?
 
 	# start the monitor in the background
-	tmpfile=`mktemp ipsectestXXX`
+	tmpfile=`mktemp /var/run/ipsectestXXX`
 	mpid=`(ip x m > $tmpfile & echo $!) 2>/dev/null`
 	sleep 0.2
 
@@ -601,6 +599,7 @@ kci_test_ipsec()
 	check_err $?
 	ip x p flush
 	check_err $?
+	ip addr del $srcip/32 dev $devdummy
 
 	if [ $ret -ne 0 ]; then
 		echo "FAIL: ipsec"
-- 
2.7.4


* [PATCH v3 net-next 0/4] Updates for ipsec selftests
From: Shannon Nelson @ 2018-06-26 17:07 UTC
  To: davem, netdev, jakub.kicinski; +Cc: anders.roxell, linux-kselftest

Fix up the existing ipsec selftest and add tests for
the ipsec offload driver API.

v2: addressed formatting nits in netdevsim from Jakub Kicinski
v3: a couple more nits from Jakub

Shannon Nelson (4):
  selftests: rtnetlink: clear the return code at start of ipsec test
  selftests: rtnetlink: use dummydev as a test device
  netdevsim: add ipsec offload testing
  selftests: rtnetlink: add ipsec offload API test

 drivers/net/netdevsim/Makefile           |   4 +
 drivers/net/netdevsim/ipsec.c            | 345 +++++++++++++++++++++++++++++++
 drivers/net/netdevsim/netdev.c           |   7 +
 drivers/net/netdevsim/netdevsim.h        |  37 ++++
 tools/testing/selftests/net/rtnetlink.sh | 132 +++++++++++-
 5 files changed, 518 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/netdevsim/ipsec.c

-- 
2.7.4


* Re: [patch net-next RFC 03/12] mlxsw: core: Add core environment module for port temperature reading
From: Guenter Roeck @ 2018-06-26 17:00 UTC
  To: Andrew Lunn; +Cc: Vadim Pasternak, linux-pm, netdev, rui.zhang, edubezval, jiri
In-Reply-To: <20180626142238.GB5064@lunn.ch>

On Tue, Jun 26, 2018 at 04:22:38PM +0200, Andrew Lunn wrote:
> On Tue, Jun 26, 2018 at 12:10:28PM +0000, Vadim Pasternak wrote:
> 
> Adding the linux-pm@vger.kernel.org list.
> 
> > Add new core_env module to allow port temperature reading. This
> > information has the most critical impact on the system's thermal
> > monitoring and is to be used by the core_hwmon and core_thermal modules.
> > 
> > The new internal API reads the temperature from all modules that are
> > equipped with a thermal sensor and exposes the temperature according to
> > the worst measurement. All individual temperature values are normalized
> > to a pre-defined range.
> 
> This patchset has been sent to the netdev list before. I raised a few
> questions about this, which is why it is now being posted to a bigger
> group for review.
> 
> The hardware has up to 64 temperature sensors. These sensors are
> hot-pluggable, since they are inside SFP modules, which are
> hot-pluggable. Different SFP modules can have different operating
> temperature ranges. They contain an EEPROM which lists upper and lower
> warning and fail temperatures, and report alarms when these thresholds
> are reached.
> 
> This code takes the 64 sensors readings and calculates a single value
> it passes to one thermal zone. That thermal zone then controls one fan
> to keep this single value in range.
> 
> I queried whether this is the correct way to do this. Would it not be
> better to have up to 64 thermal zones? Leave the thermal core to
> iterate over all the zones in order to determine how the fan should be
> driven?
> 
I very much think so. This problem must exist elsewhere; essentially
it is the bundling of multiple temperature sensors into a single thermal
zone. I am not sure if this should be 64 thermal zones or one thermal
zone with up to 64 sensors and some algorithm to select the relevant
temperature; that would be up to the thermal subsystem maintainers
to decide. In either case, the sensors should be handled and reported
as individual sensors, with appropriate limits, not as a single sensor.
Yes, I understand that means we'll have hundreds of hwmon devices,
but that should not be a problem (and if it is, we'll have to fix
the problem, not the code exposing it).

I understand that the thermal subsystem does not currently support
handling this problem. There may also be some missing pieces between
the hwmon and thermal subsystems, such as reporting limits or alarms
when a hwmon driver registers with the thermal subsystem.

Maybe it is time to add this support as part of this patch series?

> This is possibly the first board with so many sensors. However, i
> doubt it is totally unique. Other big Ethernet switches with lots of
> SFP modules may be added later. Also, 10G copper PHYs often have
> temperature sensors, so this is not limited to just boards with
> optical ports. So having a generic solution would be good.

Agreed.

Thanks,
Guenter

> 
> What do the Linux PM experts say about this?
> 
> Thanks
> 	Andrew
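
For illustration, the trade-off discussed above can be sketched in Python.
This is only a sketch under stated assumptions: the ModuleSensor type, the
sample readings, and the normalization (each reading divided by that
module's own fail threshold) are hypothetical and are not taken from the
mlxsw patch. It shows that a single worst-case value hides which module is
out of range, while per-sensor handling keeps each module's own limits.

```python
from dataclasses import dataclass

@dataclass
class ModuleSensor:
    # Hypothetical per-module record; thresholds would come from the module EEPROM.
    name: str
    temp_mc: int   # current reading, millidegrees Celsius
    warn_mc: int   # warning threshold
    fail_mc: int   # fail threshold

def worst_normalized(sensors):
    """Single-value approach: normalize each reading against its own
    fail threshold (an assumed scheme) and keep only the worst one."""
    return max(s.temp_mc / s.fail_mc for s in sensors)

def sensors_over_warning(sensors):
    """Per-sensor approach: report every module that crossed its own limit."""
    return [s.name for s in sensors if s.temp_mc >= s.warn_mc]

sensors = [
    ModuleSensor("sfp1", temp_mc=60000, warn_mc=70000, fail_mc=80000),
    ModuleSensor("sfp2", temp_mc=66000, warn_mc=65000, fail_mc=75000),
]

worst = worst_normalized(sensors)
over = sensors_over_warning(sensors)
print(round(worst, 2), over)
```

The aggregated number is enough to drive one fan, but only the per-sensor
view can tell user space which module raised the alarm.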


* Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
From: Okash Khawaja @ 2018-06-26 16:48 UTC
  To: Martin KaFai Lau
  Cc: Jakub Kicinski, Daniel Borkmann, Alexei Starovoitov,
	Yonghong Song, Quentin Monnet, David S. Miller, netdev,
	kernel-team, linux-kernel
In-Reply-To: <20180623002639.h4qxy7aakypi6t7b@kafai-mbp.dhcp.thefacebook.com>

On Fri, Jun 22, 2018 at 05:26:39PM -0700, Martin KaFai Lau wrote:
> On Fri, Jun 22, 2018 at 04:32:00PM -0700, Jakub Kicinski wrote:
> > On Fri, 22 Jun 2018 15:54:08 -0700, Martin KaFai Lau wrote:
> > > > > > > > > > > >         "value": ["0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
> > > > > > > > > > > >         ],
> > > > > > > > > > > > 	"value_struct":{
> > > > > > > > > > > > 		"src_ip":2,      
> > > > > > > If for the same map the user changes the "src_ip" to an array of int[4]
> > > > > > > later (e.g. to support ipv6), it will become "src_ip": [1, 2, 3, 4].
> > > > > > > Is it breaking backward compat?
> > > > > > > i.e.
> > > > > > > struct five_tuples {
> > > > > > > -	int src_ip;
> > > > > > > +	int src_ip[4];
> > > > > > > /* ... */
> > > > > > > };    
> > > > > > 
> > > > > > Well, it is breaking backward compat, but it's the program doing it,
> > > > > > not bpftool :)  BTF changes so does the output.    
> > > > > As we see, the key/value's btf-output is inherently not backward compat.
> > > > > Hence, "-j" and "-p" will stay as is.  The whole existing json will
> > > > > be backward compat instead of only partly backward compat.  
> > > > 
> > > > No.  There is a difference between user of a facility changing their
> > > > input and kernel/libraries providing different output in response to
> > > > that, and the libraries suddenly changing the output on their own.
> > > > 
> > > > Your example is like saying if user started using IPv6 addresses
> > > > instead of IPv4 the netlink attributes in dumps will be different so
> > > > kernel didn't keep backwards compat.  While what you're doing is more
> > > > equivalent to dropping support for old ioctl interfaces because there
> > > > is a better mechanism now.  
> > > Sorry, I don't follow this.  I don't see netlink suffer json issue like
> > > the one on "key" and "value".
> > > 
> > > All I can grasp is, the json should normally be backward compat but now
> > > we are saying anything added by btf-output is an exception because
> > > the script parsing it will treat it differently than "key" and "value"
> > 
> > Backward compatibility means that if I run *the same* program against
> > different kernels/libraries it continues to work.  If someone decides
> > to upgrade their program to work with IPv6 (which was your example)
> > obviously there is no way system as a whole will look 1:1 the same.
> > 
> > > > BTF in JSON is very useful, and will help people who write simple
> > > > orchestration/scripts based on bpftool *a* *lot*.  I really appreciate  
> > > Can you share what the script will do?  I want to understand why
> > > it cannot directly use the BTF format and the map data.
> > 
> > Think about a python script which wants to read a counter in a map.
> > Right now it would have to get the BTF, find out which bytes are the
> > counter, then convert the bytes into a larger int.  With JSON BTF it
> > just does entry["formatted"]["value"]["counter"].
> > 
> > Real life example from my test code (conversion of 3 element counter
> > array):
> > 
> > def str2int(strtab):
> >     inttab = []
> >     for i in strtab:
> >         inttab.append(int(i, 16))
> >     ba = bytearray(inttab)
> >     if len(strtab) == 4:
> >         fmt = "I"
> >     elif len(strtab) == 8:
> >         fmt = "Q"
> >     else:
> >         raise Exception("String array of len %d can't be unpacked to an int" %
> >                         (len(strtab)))
> >     return struct.unpack(fmt, ba)[0]
> > 
> > def convert(elems, idx):
> >     val = []
> >     for i in range(3):
> >         part = elems[idx]["value"][i * length:(i + 1) * length]
> >         val.append(str2int(part))
> >     return val
> > 
> > With BTF it would be:
> > 
> > 	elems[idx]["formatted"]["value"]
> > 
> > Which is fairly awesome.
> Thanks for the example.  Agree that with BTF, things are easier in general.
> 
> btw, what is more awesome is,
> #> bpftool map find id 100 key 1
> {
> 	"counter_x": 1,
> 	"counter_y": 10
> }
> 
> > 
> > > > this addition to bpftool and will start using it myself as soon as it
> > > > lands.  I'm not sure why the reluctance to slightly change the output
> > > > format?  
> > > The initial change argument is because the json has to be backward compat.
> > > 
> > > Then we show that btf-output is inherently not backward compat, so
> > > printing it in json does not make sense at all.
> > > 
> > > However, now it is saying part of it does not have to be backward compat.
> > 
> > Compatibility of "formatted" member is defined as -> fields broken out
> > according to BTF.  So it is backward compatible.  The definition of
> > "value" member is -> an unfortunately formatted array of
> > ugly hex strings :(
> > 
> > > I am fine putting it under "formatted" for "-j" or "-p" if that is the
> > > case, other than the double output is still confusing.  Let's wait for
> > > Okash's input.
> > >
> > > At the same time, the same output will be used as the default plaintext
> > > output when BTF is available.  Then the plaintext BTF output
> > > will not be limited by the json restrictions when we want
> > > to improve human readability later.  Apparently, the
> > > improvements on plaintext will not be always applicable
> > > to json output.
> > 

hi,

so i guess the following is what we want:

1. a "formatted" object nested inside the output of the -p and -j
   switches for bpf map dump. this will be JSON and backward compatible.
2. an output for humans, which is like the current output. this will
   not be JSON and won't have to be backward compatible. this output will
   be shown when neither -j nor -p is supplied and btf info is
   available.

i can update the patches to v2, which will cover 2 above plus all other
comments on the patchset. later we can follow up with a patch for 1.

thanks for valuable feedback :)

okash
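
For illustration, here is a minimal Python sketch of what a script would
see under point 1. The entry layout (a "formatted" object next to the raw
"key" and "value" hex arrays) follows the proposal in this thread, but the
sample data, the "counter" field name, the little-endian assumption, and
the helper hex_bytes_to_int() are all made up for the example and are not
actual bpftool output.

```python
import json

# Hypothetical map dump entry: raw hex arrays plus a BTF-derived "formatted" object.
SAMPLE = """
[{
    "key": ["0x01", "0x00", "0x00", "0x00"],
    "value": ["0x0a", "0x00", "0x00", "0x00", "0x00", "0x00", "0x00", "0x00"],
    "formatted": {"key": 1, "value": {"counter": 10}}
}]
"""

def hex_bytes_to_int(hex_strings, byteorder="little"):
    """Decode a list of "0x.." byte strings into one integer.
    The byte order is an assumption the script has to get right."""
    return int.from_bytes(bytes(int(h, 16) for h in hex_strings), byteorder)

entry = json.loads(SAMPLE)[0]

# Without BTF: the script must know the value layout and endianness.
raw_counter = hex_bytes_to_int(entry["value"])

# With the "formatted" member: the field is addressed by name.
btf_counter = entry["formatted"]["value"]["counter"]

print(raw_counter, btf_counter)
```

Existing scripts that parse "key" and "value" keep working unchanged, since
"formatted" is an additional member rather than a replacement.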


* RE: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface with FAN fault attribute
From: Vadim Pasternak @ 2018-06-26 16:47 UTC
  To: Guenter Roeck
  Cc: Andrew Lunn, davem@davemloft.net, netdev@vger.kernel.org,
	rui.zhang@intel.com, edubezval@gmail.com, jiri@resnulli.us, mlxsw,
	Michael Shych
In-Reply-To: <20180626163232.GA32079@roeck-us.net>



> -----Original Message-----
> From: Guenter Roeck [mailto:linux@roeck-us.net]
> Sent: Tuesday, June 26, 2018 7:33 PM
> To: Vadim Pasternak <vadimp@mellanox.com>
> Cc: Andrew Lunn <andrew@lunn.ch>; davem@davemloft.net;
> netdev@vger.kernel.org; rui.zhang@intel.com; edubezval@gmail.com;
> jiri@resnulli.us; mlxsw <mlxsw@mellanox.com>; Michael Shych
> <michaelsh@mellanox.com>
> Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface
> with FAN fault attribute
> 
> On Tue, Jun 26, 2018 at 02:47:01PM +0000, Vadim Pasternak wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Lunn [mailto:andrew@lunn.ch]
> > > Sent: Tuesday, June 26, 2018 5:29 PM
> > > To: Vadim Pasternak <vadimp@mellanox.com>
> > > Cc: davem@davemloft.net; netdev@vger.kernel.org; linux@roeck-us.net;
> > > rui.zhang@intel.com; edubezval@gmail.com; jiri@resnulli.us; mlxsw
> > > <mlxsw@mellanox.com>; Michael Shych <michaelsh@mellanox.com>
> > > Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon
> > > interface with FAN fault attribute
> > >
> > > > +static ssize_t mlxsw_hwmon_fan_fault_show(struct device *dev,
> > > > +					  struct device_attribute *attr,
> > > > +					  char *buf)
> > > > +{
> > > > +	struct mlxsw_hwmon_attr *mlwsw_hwmon_attr =
> > > > > +			container_of(attr, struct mlxsw_hwmon_attr, dev_attr);
> > > > > +	struct mlxsw_hwmon *mlxsw_hwmon = mlwsw_hwmon_attr->hwmon;
> > > > > +	char mfsm_pl[MLXSW_REG_MFSM_LEN];
> > > > > +	u16 tach;
> > > > > +	int err;
> > > > > +
> > > > > +	mlxsw_reg_mfsm_pack(mfsm_pl, mlwsw_hwmon_attr->type_index);
> > > > > +	err = mlxsw_reg_query(mlxsw_hwmon->core, MLXSW_REG(mfsm), mfsm_pl);
> > > > > +	if (err) {
> > > > > +		dev_err(mlxsw_hwmon->bus_info->dev, "Failed to query fan\n");
> > > > > +		return err;
> > > > > +	}
> > > > > +	tach = mlxsw_reg_mfsm_rpm_get(mfsm_pl);
> > > > > +
> > > > > +	return sprintf(buf, "%u\n", (tach < mlxsw_hwmon->tach_min) ? 1 : 0);
> > > > > +}
> > >
> > > Documentation/hwmon/sysfs-interface says:
> > >
> > > Alarms are direct indications read from the chips. The drivers do
> > > NOT make comparisons of readings to thresholds. This allows
> > > violations between readings to be caught and alarmed. The exact
> > > definition of an alarm (for example, whether a threshold must be met
> > > or must be exceeded to cause an alarm) is chip-dependent.
> > >
> > > Now, this is a fault, not an alarm. But does the same apply?
> >
> Yes, it does. There are no "soft" alarms / faults.
> 
> > Hi Andrew,
> >
> > Hardware provides minimum value for tachometer.
> > Tachometer is considered as faulty in case it's below this value.
> 
> This is for user space to decide, not for the kernel.

Hi Guenter,

Do you suggest exposing fan{x}_min instead of fan{x}_fault
and letting the user compare fan{x}_input against fan{x}_min for the
fault decision?

> 
> > In case any tachometer is faulty, PWM according to the system
> > requirements should be set to 100% until the fault
> 
> system requirements. Again, this is for user space to decide.


Yes, the user should decide in this case, and I wanted to provide
fan{x}_fault to the user for this purpose. But the user could of course
decide based on the input and min attributes.

> 
> > is not recovered (f.e. by physical replacing of bad unit).
> > This is the motivation to expose fan{x}_fault in the way it's exposed.
> >
> > Thanks,
> > Vadim.
> >
> > >
> > >      Andrew
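
For illustration, the user-space decision discussed above could look
roughly like the following Python sketch. fan{x}_input and fan{x}_min are
standard hwmon sysfs attribute names; everything else (the hwmon directory
passed in by the caller, the helper names, and the "100% PWM while any fan
is below its minimum" policy taken from the system requirement mentioned
in this thread) is an assumption for the example, not an existing tool.

```python
from pathlib import Path

def read_fan(hwmon_dir, index):
    """Return (input_rpm, min_rpm) for fan<index> from a hwmon directory.
    The directory is supplied by the caller; no real path is assumed."""
    base = Path(hwmon_dir)
    fan_input = int((base / f"fan{index}_input").read_text().strip())
    fan_min = int((base / f"fan{index}_min").read_text().strip())
    return fan_input, fan_min

def fan_is_faulty(fan_input, fan_min):
    """Policy decided in user space: a fan below its minimum is faulty."""
    return fan_input < fan_min

def choose_pwm_percent(fans, normal_percent):
    """fans is a list of (input_rpm, min_rpm) tuples.
    Force full speed while any fan is considered faulty."""
    if any(fan_is_faulty(fan_input, fan_min) for fan_input, fan_min in fans):
        return 100
    return normal_percent
```

With this split the kernel only reports the readings and the limit, and the
comparison and the reaction both stay in user space.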


* [PATCH net-next] l2tp: define helper for parsing struct sockaddr_pppol2tp*
From: Guillaume Nault @ 2018-06-26 16:41 UTC
  To: netdev; +Cc: James Chapman

'sockaddr_len' is checked against various values when entering
pppol2tp_connect(), to verify its validity. It is used again later, to
find out which sockaddr structure was passed from user space. This
patch combines these two operations into one new function in order to
simplify pppol2tp_connect().

A new structure, l2tp_connect_info, is used to pass sockaddr data back
to pppol2tp_connect(), to avoid passing too many parameters to
l2tp_sockaddr_get_info(). Also, the first parameter is void* in order
to avoid casting between all sockaddr_* structures manually.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_ppp.c | 173 ++++++++++++++++++++++++++------------------
 1 file changed, 103 insertions(+), 70 deletions(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index eea5d7844473..d3a9355ac8ac 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -588,40 +588,113 @@ static void pppol2tp_session_init(struct l2tp_session *session)
 	}
 }
 
+struct l2tp_connect_info {
+	u8 version;
+	int fd;
+	u32 tunnel_id;
+	u32 peer_tunnel_id;
+	u32 session_id;
+	u32 peer_session_id;
+};
+
+static int pppol2tp_sockaddr_get_info(const void *sa, int sa_len,
+				      struct l2tp_connect_info *info)
+{
+	switch (sa_len) {
+	case sizeof(struct sockaddr_pppol2tp):
+	{
+		const struct sockaddr_pppol2tp *sa_v2in4 = sa;
+
+		if (sa_v2in4->sa_protocol != PX_PROTO_OL2TP)
+			return -EINVAL;
+
+		info->version = 2;
+		info->fd = sa_v2in4->pppol2tp.fd;
+		info->tunnel_id = sa_v2in4->pppol2tp.s_tunnel;
+		info->peer_tunnel_id = sa_v2in4->pppol2tp.d_tunnel;
+		info->session_id = sa_v2in4->pppol2tp.s_session;
+		info->peer_session_id = sa_v2in4->pppol2tp.d_session;
+
+		break;
+	}
+	case sizeof(struct sockaddr_pppol2tpv3):
+	{
+		const struct sockaddr_pppol2tpv3 *sa_v3in4 = sa;
+
+		if (sa_v3in4->sa_protocol != PX_PROTO_OL2TP)
+			return -EINVAL;
+
+		info->version = 3;
+		info->fd = sa_v3in4->pppol2tp.fd;
+		info->tunnel_id = sa_v3in4->pppol2tp.s_tunnel;
+		info->peer_tunnel_id = sa_v3in4->pppol2tp.d_tunnel;
+		info->session_id = sa_v3in4->pppol2tp.s_session;
+		info->peer_session_id = sa_v3in4->pppol2tp.d_session;
+
+		break;
+	}
+	case sizeof(struct sockaddr_pppol2tpin6):
+	{
+		const struct sockaddr_pppol2tpin6 *sa_v2in6 = sa;
+
+		if (sa_v2in6->sa_protocol != PX_PROTO_OL2TP)
+			return -EINVAL;
+
+		info->version = 2;
+		info->fd = sa_v2in6->pppol2tp.fd;
+		info->tunnel_id = sa_v2in6->pppol2tp.s_tunnel;
+		info->peer_tunnel_id = sa_v2in6->pppol2tp.d_tunnel;
+		info->session_id = sa_v2in6->pppol2tp.s_session;
+		info->peer_session_id = sa_v2in6->pppol2tp.d_session;
+
+		break;
+	}
+	case sizeof(struct sockaddr_pppol2tpv3in6):
+	{
+		const struct sockaddr_pppol2tpv3in6 *sa_v3in6 = sa;
+
+		if (sa_v3in6->sa_protocol != PX_PROTO_OL2TP)
+			return -EINVAL;
+
+		info->version = 3;
+		info->fd = sa_v3in6->pppol2tp.fd;
+		info->tunnel_id = sa_v3in6->pppol2tp.s_tunnel;
+		info->peer_tunnel_id = sa_v3in6->pppol2tp.d_tunnel;
+		info->session_id = sa_v3in6->pppol2tp.s_session;
+		info->peer_session_id = sa_v3in6->pppol2tp.d_session;
+
+		break;
+	}
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /* connect() handler. Attach a PPPoX socket to a tunnel UDP socket
  */
 static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr,
 			    int sockaddr_len, int flags)
 {
 	struct sock *sk = sock->sk;
-	struct sockaddr_pppol2tp *sp = (struct sockaddr_pppol2tp *) uservaddr;
 	struct pppox_sock *po = pppox_sk(sk);
 	struct l2tp_session *session = NULL;
+	struct l2tp_connect_info info;
 	struct l2tp_tunnel *tunnel;
 	struct pppol2tp_session *ps;
 	struct l2tp_session_cfg cfg = { 0, };
-	int error = 0;
-	u32 tunnel_id, peer_tunnel_id;
-	u32 session_id, peer_session_id;
 	bool drop_refcnt = false;
 	bool drop_tunnel = false;
 	bool new_session = false;
 	bool new_tunnel = false;
-	int ver = 2;
-	int fd;
-
-	lock_sock(sk);
-
-	error = -EINVAL;
+	int error;
 
-	if (sockaddr_len != sizeof(struct sockaddr_pppol2tp) &&
-	    sockaddr_len != sizeof(struct sockaddr_pppol2tpv3) &&
-	    sockaddr_len != sizeof(struct sockaddr_pppol2tpin6) &&
-	    sockaddr_len != sizeof(struct sockaddr_pppol2tpv3in6))
-		goto end;
+	error = pppol2tp_sockaddr_get_info(uservaddr, sockaddr_len, &info);
+	if (error < 0)
+		return error;
 
-	if (sp->sa_protocol != PX_PROTO_OL2TP)
-		goto end;
+	lock_sock(sk);
 
 	/* Check for already bound sockets */
 	error = -EBUSY;
@@ -633,56 +706,12 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr,
 	if (sk->sk_user_data)
 		goto end; /* socket is already attached */
 
-	/* Get params from socket address. Handle L2TPv2 and L2TPv3.
-	 * This is nasty because there are different sockaddr_pppol2tp
-	 * structs for L2TPv2, L2TPv3, over IPv4 and IPv6. We use
-	 * the sockaddr size to determine which structure the caller
-	 * is using.
-	 */
-	peer_tunnel_id = 0;
-	if (sockaddr_len == sizeof(struct sockaddr_pppol2tp)) {
-		fd = sp->pppol2tp.fd;
-		tunnel_id = sp->pppol2tp.s_tunnel;
-		peer_tunnel_id = sp->pppol2tp.d_tunnel;
-		session_id = sp->pppol2tp.s_session;
-		peer_session_id = sp->pppol2tp.d_session;
-	} else if (sockaddr_len == sizeof(struct sockaddr_pppol2tpv3)) {
-		struct sockaddr_pppol2tpv3 *sp3 =
-			(struct sockaddr_pppol2tpv3 *) sp;
-		ver = 3;
-		fd = sp3->pppol2tp.fd;
-		tunnel_id = sp3->pppol2tp.s_tunnel;
-		peer_tunnel_id = sp3->pppol2tp.d_tunnel;
-		session_id = sp3->pppol2tp.s_session;
-		peer_session_id = sp3->pppol2tp.d_session;
-	} else if (sockaddr_len == sizeof(struct sockaddr_pppol2tpin6)) {
-		struct sockaddr_pppol2tpin6 *sp6 =
-			(struct sockaddr_pppol2tpin6 *) sp;
-		fd = sp6->pppol2tp.fd;
-		tunnel_id = sp6->pppol2tp.s_tunnel;
-		peer_tunnel_id = sp6->pppol2tp.d_tunnel;
-		session_id = sp6->pppol2tp.s_session;
-		peer_session_id = sp6->pppol2tp.d_session;
-	} else if (sockaddr_len == sizeof(struct sockaddr_pppol2tpv3in6)) {
-		struct sockaddr_pppol2tpv3in6 *sp6 =
-			(struct sockaddr_pppol2tpv3in6 *) sp;
-		ver = 3;
-		fd = sp6->pppol2tp.fd;
-		tunnel_id = sp6->pppol2tp.s_tunnel;
-		peer_tunnel_id = sp6->pppol2tp.d_tunnel;
-		session_id = sp6->pppol2tp.s_session;
-		peer_session_id = sp6->pppol2tp.d_session;
-	} else {
-		error = -EINVAL;
-		goto end; /* bad socket address */
-	}
-
 	/* Don't bind if tunnel_id is 0 */
 	error = -EINVAL;
-	if (tunnel_id == 0)
+	if (!info.tunnel_id)
 		goto end;
 
-	tunnel = l2tp_tunnel_get(sock_net(sk), tunnel_id);
+	tunnel = l2tp_tunnel_get(sock_net(sk), info.tunnel_id);
 	if (tunnel)
 		drop_tunnel = true;
 
@@ -690,7 +719,7 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr,
 	 * peer_session_id is 0. Otherwise look up tunnel using supplied
 	 * tunnel id.
 	 */
-	if ((session_id == 0) && (peer_session_id == 0)) {
+	if (!info.session_id && !info.peer_session_id) {
 		if (tunnel == NULL) {
 			struct l2tp_tunnel_cfg tcfg = {
 				.encap = L2TP_ENCAPTYPE_UDP,
@@ -700,12 +729,16 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr,
 			/* Prevent l2tp_tunnel_register() from trying to set up
 			 * a kernel socket.
 			 */
-			if (fd < 0) {
+			if (info.fd < 0) {
 				error = -EBADF;
 				goto end;
 			}
 
-			error = l2tp_tunnel_create(sock_net(sk), fd, ver, tunnel_id, peer_tunnel_id, &tcfg, &tunnel);
+			error = l2tp_tunnel_create(sock_net(sk), info.fd,
+						   info.version,
+						   info.tunnel_id,
+						   info.peer_tunnel_id, &tcfg,
+						   &tunnel);
 			if (error < 0)
 				goto end;
 
@@ -734,9 +767,9 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr,
 		tunnel->recv_payload_hook = pppol2tp_recv_payload_hook;
 
 	if (tunnel->peer_tunnel_id == 0)
-		tunnel->peer_tunnel_id = peer_tunnel_id;
+		tunnel->peer_tunnel_id = info.peer_tunnel_id;
 
-	session = l2tp_session_get(sock_net(sk), tunnel, session_id);
+	session = l2tp_session_get(sock_net(sk), tunnel, info.session_id);
 	if (session) {
 		drop_refcnt = true;
 
@@ -765,8 +798,8 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr,
 		cfg.pw_type = L2TP_PWTYPE_PPP;
 
 		session = l2tp_session_create(sizeof(struct pppol2tp_session),
-					      tunnel, session_id,
-					      peer_session_id, &cfg);
+					      tunnel, info.session_id,
+					      info.peer_session_id, &cfg);
 		if (IS_ERR(session)) {
 			error = PTR_ERR(session);
 			goto end;
-- 
2.18.0


* Re: [PATCH 2/3] i40e: split XDP_TX tail and XDP_REDIRECT map flushing
From: Björn Töpel @ 2018-06-26 16:37 UTC
  To: Jesper Dangaard Brouer
  Cc: Netdev, John Fastabend, jasowang, Daniel Borkmann,
	Björn Töpel, Alexei Starovoitov, intel-wired-lan
In-Reply-To: <153002759360.15389.18278323388927454181.stgit@firesoul>

On Tue, 26 Jun 2018 at 18:08, Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> The driver was combining the XDP_TX tail flush and XDP_REDIRECT
> map flushing (xdp_do_flush_map).  This is suboptimal, these two
> flush operations should be kept separate.
>
> It looks like the mistake was copy-pasted from ixgbe.
>
> Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT")
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c |   24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 8ffb7454e67c..c1c027743159 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2200,9 +2200,10 @@ static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
>         return true;
>  }
>
> -#define I40E_XDP_PASS 0
> -#define I40E_XDP_CONSUMED 1
> -#define I40E_XDP_TX 2
> +#define I40E_XDP_PASS          0
> +#define I40E_XDP_CONSUMED      BIT(0)
> +#define I40E_XDP_TX            BIT(1)
> +#define I40E_XDP_REDIR         BIT(2)
>
>  static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf,
>                               struct i40e_ring *xdp_ring);
> @@ -2249,7 +2250,7 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
>                 break;
>         case XDP_REDIRECT:
>                 err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
> -               result = !err ? I40E_XDP_TX : I40E_XDP_CONSUMED;
> +               result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
>                 break;
>         default:
>                 bpf_warn_invalid_xdp_action(act);
> @@ -2312,7 +2313,8 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>         unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>         struct sk_buff *skb = rx_ring->skb;
>         u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
> -       bool failure = false, xdp_xmit = false;
> +       unsigned int xdp_xmit = 0;
> +       bool failure = false;
>         struct xdp_buff xdp;
>
>         xdp.rxq = &rx_ring->xdp_rxq;
> @@ -2373,8 +2375,10 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>                 }
>
>                 if (IS_ERR(skb)) {
> -                       if (PTR_ERR(skb) == -I40E_XDP_TX) {
> -                               xdp_xmit = true;
> +                       unsigned int xdp_res = -PTR_ERR(skb);
> +
> +                       if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR)) {
> +                               xdp_xmit |= xdp_res;
>                                 i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
>                         } else {
>                                 rx_buffer->pagecnt_bias++;
> @@ -2428,12 +2432,14 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>                 total_rx_packets++;
>         }
>
> -       if (xdp_xmit) {
> +       if (xdp_xmit & I40E_XDP_REDIR)
> +               xdp_do_flush_map();
> +
> +       if (xdp_xmit & I40E_XDP_TX) {
>                 struct i40e_ring *xdp_ring =
>                         rx_ring->vsi->xdp_rings[rx_ring->queue_index];
>
>                 i40e_xdp_ring_update_tail(xdp_ring);
> -               xdp_do_flush_map();
>         }
>
>         rx_ring->skb = skb;
>

Nice! Added intel-wired-lan to Cc.

Acked-by: Björn Töpel <bjorn.topel@intel.com>
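
For illustration, the flag handling in the quoted patch can be modelled in
a few lines of Python. The constants mirror the I40E_XDP_* values from the
diff; the function pending_flushes() and its return format are
hypothetical and only show how OR-ing per-packet results lets the end of
the poll loop decide each flush independently.

```python
# Values mirror the I40E_XDP_* defines in the quoted diff.
XDP_PASS = 0
XDP_CONSUMED = 1 << 0
XDP_TX = 1 << 1
XDP_REDIR = 1 << 2

def pending_flushes(per_packet_results):
    """Accumulate per-packet XDP results and report which flushes are due.
    Hypothetical helper; the driver does this inline in its RX loop."""
    xdp_xmit = 0
    for res in per_packet_results:
        if res & (XDP_TX | XDP_REDIR):
            xdp_xmit |= res
    return {
        "flush_redirect_map": bool(xdp_xmit & XDP_REDIR),
        "update_tx_tail": bool(xdp_xmit & XDP_TX),
    }

print(pending_flushes([XDP_PASS, XDP_REDIR, XDP_CONSUMED]))
```

With the earlier boolean, a poll cycle that only redirected packets still
bumped the XDP TX tail, and one that only transmitted still flushed the
redirect map; separate bits avoid both.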


* Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface with FAN fault attribute
From: Guenter Roeck @ 2018-06-26 16:32 UTC
  To: Vadim Pasternak
  Cc: Andrew Lunn, davem@davemloft.net, netdev@vger.kernel.org,
	rui.zhang@intel.com, edubezval@gmail.com, jiri@resnulli.us, mlxsw,
	Michael Shych
In-Reply-To: <HE1PR0502MB3753695D03AD5D8A8DA4CD84A2490@HE1PR0502MB3753.eurprd05.prod.outlook.com>

On Tue, Jun 26, 2018 at 02:47:01PM +0000, Vadim Pasternak wrote:
> 
> 
> > -----Original Message-----
> > From: Andrew Lunn [mailto:andrew@lunn.ch]
> > Sent: Tuesday, June 26, 2018 5:29 PM
> > To: Vadim Pasternak <vadimp@mellanox.com>
> > Cc: davem@davemloft.net; netdev@vger.kernel.org; linux@roeck-us.net;
> > rui.zhang@intel.com; edubezval@gmail.com; jiri@resnulli.us; mlxsw
> > <mlxsw@mellanox.com>; Michael Shych <michaelsh@mellanox.com>
> > Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface
> > with FAN fault attribute
> > 
> > > +static ssize_t mlxsw_hwmon_fan_fault_show(struct device *dev,
> > > +					  struct device_attribute *attr,
> > > +					  char *buf)
> > > +{
> > > +	struct mlxsw_hwmon_attr *mlwsw_hwmon_attr =
> > > +			container_of(attr, struct mlxsw_hwmon_attr, dev_attr);
> > > +	struct mlxsw_hwmon *mlxsw_hwmon = mlwsw_hwmon_attr->hwmon;
> > > +	char mfsm_pl[MLXSW_REG_MFSM_LEN];
> > > +	u16 tach;
> > > +	int err;
> > > +
> > > +	mlxsw_reg_mfsm_pack(mfsm_pl, mlwsw_hwmon_attr->type_index);
> > > +	err = mlxsw_reg_query(mlxsw_hwmon->core, MLXSW_REG(mfsm), mfsm_pl);
> > > +	if (err) {
> > > +		dev_err(mlxsw_hwmon->bus_info->dev, "Failed to query fan\n");
> > > +		return err;
> > > +	}
> > > +	tach = mlxsw_reg_mfsm_rpm_get(mfsm_pl);
> > > +
> > > +	return sprintf(buf, "%u\n", (tach < mlxsw_hwmon->tach_min) ? 1 : 0);
> > > +}
> > 
> > Documentation/hwmon/sysfs-interface says:
> > 
> > Alarms are direct indications read from the chips. The drivers do NOT make
> > comparisons of readings to thresholds. This allows violations between readings
> > to be caught and alarmed. The exact definition of an alarm (for example,
> > whether a threshold must be met or must be exceeded to cause an alarm) is
> > chip-dependent.
> > 
> > Now, this is a fault, not an alarm. But does the same apply?
> 
Yes, it does. There are no "soft" alarms / faults.

> Hi Andrew,
> 
> Hardware provides minimum value for tachometer.
> Tachometer is considered as faulty in case it's below this
> value.

This is for user space to decide, not for the kernel.

> In case any tachometer is faulty, PWM according to the
> system requirements should be set to 100% until the fault

system requirements. Again, this is for user space to decide.

> is not recovered (f.e. by physical replacing of bad unit).
> This is the motivation to expose fan{x}_fault in the way
> it's exposed.
> 
> Thanks,
> Vadim.
> 
> > 
> >      Andrew


* Re: [PATCH V4 0/8] net: ethernet: stmmac: add support for stm32mp1
From: Alexandre Torgue @ 2018-06-26 16:31 UTC
  To: Christophe Roullier, mark.rutland, mcoquelin.stm32,
	peppe.cavallaro
  Cc: devicetree, linux-arm-kernel, netdev, andrew
In-Reply-To: <1527090479-5263-1-git-send-email-christophe.roullier@st.com>

Hi Christophe,

On 05/23/2018 05:47 PM, Christophe Roullier wrote:
> Patches to have Ethernet support on stm32mp1
> Changelog:
> Remark from Rob Herring
> Move Documentation/devicetree/bindings/arm/stm32.txt in
> Documentation/devicetree/bindings/arm/stm32/stm32.txt and create
> Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt
> 
> Replace also in arch/arm/boot/dts/stm32mp157c.dtsi, syscfg: system-config@50020000
> with syscfg: syscon@50020000
> 
> Christophe Roullier (8):
>    net: ethernet: stmmac: add adaptation for stm32mp157c.
>    dt-bindings: stm32-dwmac: add support of MPU families
>    ARM: dts: stm32: add ethernet pins to stm32mp157c
>    ARM: dts: stm32: Add syscfg on stm32mp1
>    ARM: dts: stm32: Add ethernet dwmac on stm32mp1
>    net: stmmac: add dwmac-4.20a compatible
>    ARM: dts: stm32: add support of ethernet on stm32mp157c-ev1
>    dt-bindings: stm32: add compatible for syscon
> 
>   Documentation/devicetree/bindings/arm/stm32.txt    |  10 -
>   .../devicetree/bindings/arm/stm32/stm32-syscon.txt |  14 ++
>   .../devicetree/bindings/arm/stm32/stm32.txt        |  10 +
>   .../devicetree/bindings/net/stm32-dwmac.txt        |  18 +-
>   arch/arm/boot/dts/stm32mp157-pinctrl.dtsi          |  46 ++++
>   arch/arm/boot/dts/stm32mp157c-ev1.dts              |  20 ++
>   arch/arm/boot/dts/stm32mp157c.dtsi                 |  35 +++
>   drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c  | 270 +++++++++++++++++++--
>   .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   3 +-
>   9 files changed, 398 insertions(+), 28 deletions(-)
>   delete mode 100644 Documentation/devicetree/bindings/arm/stm32.txt
>   create mode 100644 Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt
>   create mode 100644 Documentation/devicetree/bindings/arm/stm32/stm32.txt
> 

As discussed, I squashed "ARM: dts: stm32: add ethernet pins to
stm32mp157c" and "ARM: dts: stm32: add support of ethernet on
stm32mp157c-ev1" and fixed the interrupt binding issue.

So DT patches applied on stm32-next.

regards
Alex

^ permalink raw reply

* Re: [rds-devel] [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-26 16:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, rds-devel
In-Reply-To: <20180626145323.GH20575@oracle.com>

On (06/26/18 10:53), Sowmini Varadhan wrote:
> Date: Tue, 26 Jun 2018 10:53:23 -0400
> From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> To: David Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org, rds-devel@oss.oracle.com
> Subject: Re: [rds-devel] [PATCH net-next] rds: clean up loopback
> 
> and just to add, the fix itself is logically correct, so belongs in
> net-next. What I dont have (and therefore did not target net) is
> official confirmation that the syzbot failures are root-caused to the
> absence of this patch (since there is no reproducer for many of these,
> and no crash dumps available from syzbot).  
> 

With help from Dmitry, I just got the confirmation from syzbot that
"syzbot has tested the proposed patch and the reproducer did not trigger 
crash:"

thus, we can mark this

Reported-and-tested-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com

and yes, it can target net.

Thanks
--Sowmini

^ permalink raw reply

* Re: [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization
From: Song Liu @ 2018-06-26 16:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking,
	Jiong Wang
In-Reply-To: <CAJpBn1zFrBPupsug7NFrx2PV5ew+2mc3D4Zn7vWnujzyT6wEbA@mail.gmail.com>

On Tue, Jun 26, 2018 at 12:08 AM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Mon, Jun 25, 2018 at 10:50 PM, Song Liu <liu.song.a23@gmail.com> wrote:
>> On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
>> <jakub.kicinski@netronome.com> wrote:
>>> From: Jiong Wang <jiong.wang@netronome.com>
>>>
>>> Map read has been supported on NFP, this patch enables optimization for
>>> memcpy from map to packet.
>>>
>>> This patch also fixed one latent bug which will cause copying from
>>> unexpected address once memcpy for map pointer enabled.
>>>
>>> Reported-by: Mary Pham <mary.pham@netronome.com>
>>> Reported-by: David Beckett <david.beckett@netronome.com>
>>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>> ---
>>>  drivers/net/ethernet/netronome/nfp/bpf/jit.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
>>> index 8a92088df0d7..33111739b210 100644
>>> --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
>>> +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
>>> @@ -670,7 +670,7 @@ static int nfp_cpp_memcpy(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>>>         xfer_num = round_up(len, 4) / 4;
>>>
>>>         if (src_40bit_addr)
>>> -               addr40_offset(nfp_prog, meta->insn.src_reg, off, &src_base,
>>> +               addr40_offset(nfp_prog, meta->insn.src_reg * 2, off, &src_base,
>>>                               &off);
>>
>> Did this break other cases before this patch?
>>
>> I am sorry if this is a dumb question. I don't think I fully
>> understand addr40_offset().
>
> Only map memory uses 40 bit addressing right now, so the if was pretty
> much dead code before the patch.
>
> The memcpy optimization was left out of the initial map support due to
> insufficient test coverage, I should have probably left more of the 40
> bit addressing code out back then.

Thanks for the explanation!

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* [PATCH net] KEYS: DNS: fix parsing multiple options
From: David Howells @ 2018-06-26 16:20 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

From: Eric Biggers <ebiggers@google.com>

My recent fix for dns_resolver_preparse() printing very long strings was
incomplete, as shown by syzbot which still managed to hit the
WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key:

    precision 50001 too large
    WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0

The bug this time isn't just a printing bug, but also a logical error
when multiple options ("#"-separated strings) are given in the key
payload.  Specifically, when separating an option string into name and
value, if there is no value then the name is incorrectly considered to
end at the end of the key payload, rather than the end of the current
option.  This bypasses validation of the option length, and also means
that specifying multiple options is broken -- which presumably has gone
unnoticed as there is currently only one valid option anyway.

Fix it by correctly calculating the length of the option name.
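
The difference between the two fallbacks can be sketched in user space as
follows. This is a simplified illustration, not the kernel function:
opt_name_len() and its use_payload_end switch are hypothetical names, and
the explicit ternary stands in for the GNU "?:" shorthand used in
dns_key.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the length of the option name for the
 * option occupying [opt, next_opt).
 *
 * payload_end marks the end of the whole key payload. use_payload_end
 * selects the old fallback (end of payload) instead of the fixed one
 * (end of the current option) so that both can be compared.
 */
size_t opt_name_len(const char *opt, const char *next_opt,
		    const char *payload_end, int use_payload_end)
{
	size_t opt_len;
	const char *eq;

	assert(opt <= next_opt && next_opt <= payload_end);
	opt_len = (size_t)(next_opt - opt);

	/* Look for '=' only inside the current option. */
	eq = memchr(opt, '=', opt_len);
	if (!eq)
		eq = use_payload_end ? payload_end : next_opt;

	return (size_t)(eq - opt);
}
```

For an option "A" with no value, followed by '#' and padding, the fixed
fallback yields a name length of 1, while the old fallback yields the
distance to the end of the payload, which is how an oversized length can
reach the printing code.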

Reproducer:

    perl -e 'print "#A#", "\x00" x 50000' | keyctl padd dns_resolver desc @s

Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/dns_resolver/dns_key.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dns_resolver/dns_key.c b/net/dns_resolver/dns_key.c
index 40c851693f77..d448823d4d2e 100644
--- a/net/dns_resolver/dns_key.c
+++ b/net/dns_resolver/dns_key.c
@@ -97,7 +97,7 @@ dns_resolver_preparse(struct key_preparsed_payload *prep)
 				return -EINVAL;
 			}
 
-			eq = memchr(opt, '=', opt_len) ?: end;
+			eq = memchr(opt, '=', opt_len) ?: next_opt;
 			opt_nlen = eq - opt;
 			eq++;
 			opt_vlen = next_opt - eq; /* will be -1 if no value */

^ permalink raw reply related

* Re: [PATCH net-next] sh_eth: fix *enum* {A|M}PR_BIT
From: Geert Uytterhoeven @ 2018-06-26 16:18 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, David S. Miller, Linux-Renesas
In-Reply-To: <e67e6256-4ae9-4527-d482-cf3bb50921cf@cogentembedded.com>

On Tue, Jun 26, 2018 at 5:43 PM Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> The *enum* {A|M}PR_BIT were declared in the commit 86a74ff21a7a ("net:
> sh_eth: add support for  Renesas SuperH Ethernet") adding SH771x support,
> however the SH771x manual  doesn't have the APR/MPR registers described
> and the code writing to them for SH7710 was later removed by the commit
> 380af9e390ec ("net: sh_eth: CPU dependency code collect to "struct
> sh_eth_cpu_data""). All the newer SoC manuals have these registers
> documented as having a 16-bit TIME parameter of the PAUSE frame, not
> 1-bit -- update the *enum* accordingly, fixing up the APR/MPR writes...
>
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH net] netfilter: nf_log: don't hold nf_log_mutex during user access
From: Pablo Neira Ayuso @ 2018-06-26 16:05 UTC (permalink / raw)
  To: Jann Horn
  Cc: Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	netfilter-devel, coreteam, netdev, linux-kernel, security
In-Reply-To: <20180625152200.200145-1-jannh@google.com>

On Mon, Jun 25, 2018 at 05:22:00PM +0200, Jann Horn wrote:
> The old code would indefinitely block other users of nf_log_mutex if
> a userspace access in proc_dostring() blocked e.g. due to a userfaultfd
> region. Fix it by moving proc_dostring() out of the locked region.
> 
> This is a followup to commit 266d07cb1c9a ("netfilter: nf_log: fix
> sleeping function called from invalid context"), which changed this code
> from using rcu_read_lock() to taking nf_log_mutex.

Applied.

^ permalink raw reply

* Re: [PATCH net] netfilter: nf_log: fix uninit read in nf_log_proc_dostring
From: Pablo Neira Ayuso @ 2018-06-26 16:05 UTC (permalink / raw)
  To: Jann Horn
  Cc: Jozsef Kadlecsik, Florian Westphal, netfilter-devel, coreteam,
	David S. Miller, netdev, linux-kernel
In-Reply-To: <20180620163345.212776-1-jannh@google.com>

On Wed, Jun 20, 2018 at 06:33:45PM +0200, Jann Horn wrote:
> When proc_dostring() is called with a non-zero offset in strict mode, it
> doesn't just write to the ->data buffer, it also reads. Make sure it
> doesn't read uninitialized data.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] liquidio: fix kernel panic when NIC firmware is older than 1.7.2
From: Shannon Nelson @ 2018-06-26 16:03 UTC (permalink / raw)
  To: Felix Manlunas, davem
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	ricardo.farrington
In-Reply-To: <20180626115807.GA7089@felix-thinkpad.cavium.com>

On 6/26/2018 4:58 AM, Felix Manlunas wrote:
> From: Rick Farrington <ricardo.farrington@cavium.com>
> 
> Pre-1.7.2 NIC firmware does not support (and does not respond to) the "get
> speed" command which is sent by the 1.7.2 driver during modprobe.  Due to a
> bug in older firmware (with respect to unknown commands), this unsupported
> command causes a cascade of errors that ends in a kernel panic.
> 
> Fix it by making the sending of the "get speed" command conditional on the
> firmware version.
> 
> Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
> Acked-by: Derek Chickles <derek.chickles@cavium.com>
> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
> ---
> Note: To avoid checkpatch.pl "WARNING: line over 80 characters", the comma
>        that separates the arguments in the call to strcmp() was placed one
>        line below the usual spot.
> 
>   drivers/net/ethernet/cavium/liquidio/lio_main.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
> index 7cb4e75..f83f884 100644
> --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
> @@ -3671,7 +3671,16 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
>   			OCTEON_CN2350_25GB_SUBSYS_ID ||
>   		    octeon_dev->subsystem_id ==
>   			OCTEON_CN2360_25GB_SUBSYS_ID) {
> -			liquidio_get_speed(lio);
> +			/* speed control unsupported in f/w older than 1.7.2 */
> +			if (strcmp(octeon_dev->fw_info.liquidio_firmware_version
> +			   , "1.7.2") < 0) {

Will the liquidio_firmware_version ever end up something like 1.7.10? 
If so, this strcmp() may not do what you want.
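
To make the concern concrete, here is a small user-space sketch. It
assumes the firmware version is a plain "major.minor.patch" string, which
is an assumption about the format rather than something the patch states.
version_cmp() and the two fw_older_than_172_*() wrappers are hypothetical
helpers, not driver functions.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: compare two "major.minor.patch" strings
 * component by component as numbers. Returns -1, 0 or 1.
 * Missing or unparsable components are treated as 0.
 */
int version_cmp(const char *a, const char *b)
{
	int i;

	for (i = 0; i < 3; i++) {
		char *ea, *eb;
		unsigned long na = strtoul(a, &ea, 10);
		unsigned long nb = strtoul(b, &eb, 10);

		if (na != nb)
			return na < nb ? -1 : 1;

		/* Step over the '.' separator, if there is one. */
		a = (*ea == '.') ? ea + 1 : ea;
		b = (*eb == '.') ? eb + 1 : eb;
	}
	return 0;
}

/* The check as written in the patch: lexicographic comparison. */
int fw_older_than_172_strcmp(const char *fw)
{
	return strcmp(fw, "1.7.2") < 0;
}

/* Numeric alternative built on version_cmp(). */
int fw_older_than_172_numeric(const char *fw)
{
	return version_cmp(fw, "1.7.2") < 0;
}
```

With "1.7.10", the strcmp() variant reports the firmware as older than
1.7.2 because '1' sorts before '2', while the numeric variant does not.
Both variants agree for single-digit components such as "1.7.0".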

sln

> +				dev_info(&octeon_dev->pci_dev->dev,
> +					 "speed setting not supported by f/w.");
> +				octeon_dev->speed_setting = 25;
> +				octeon_dev->no_speed_setting = 1;
> +			} else {
> +				liquidio_get_speed(lio);
> +			}
>   
>   			if (octeon_dev->speed_setting == 0) {
>   				octeon_dev->speed_setting = 25;
> 

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-26 16:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, Siwei Liu, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180626183706-mutt-send-email-mst@kernel.org>

On Tue, 26 Jun 2018 18:38:51 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Jun 26, 2018 at 05:17:32PM +0200, Cornelia Huck wrote:
> > On Tue, 26 Jun 2018 04:50:25 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:  
> > > > > > > > Might not neccessarily be something wrong, but it's very limited to
> > > > > > > > prohibit the MAC of VF from changing when enslaved by failover.    
> > > > > > > You mean guest changing MAC? I'm not sure why we prohibit that.    
> > > > > > I think Sridhar and Jiri might be better person to answer it. My
> > > > > > impression was that sync'ing the MAC address change between all 3
> > > > > > devices is challenging, as the failover driver uses MAC address to
> > > > > > match net_device internally.    
> > > > 
> > > > Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement
> > > > of the MAC between the PF and VF.  Allowing the guest to change the MAC will require
> > > > synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers
> > > > don't allow changing guest MAC unless it is a trusted VF.    
> > > 
> > > OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> > > For example I can see host just
> > > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> > > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
> > >   
> > 
> > So, what I get from this is that QEMU needs to be able to control all
> > of standby, uuid, and mac to accommodate the different setups
> > (respectively have libvirt/management software set it up). Is the host
> > able to find out respectively define whether a VF is trusted?  
> 
> You do it with ip link I think but QEMU doesn't normally do this,
> it relies on libvirt to poke at host kernel and supply the info.
> 

Ok, that makes me conclude that we definitely need to involve the
libvirt folks before we proceed further with defining QEMU interfaces.

^ permalink raw reply

* Re: [RFC] net: Add new LoRaWAN subsystem
From: Jian-Hong Pan @ 2018-06-26 16:02 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Marcel Holtmann, David S. Miller, Alexander Aring, Stefan Schmidt,
	linux-wpan - ML, netdev, linux-kernel
In-Reply-To: <5e6b2d6a-f413-7547-6c03-c41dff21453c@suse.de>

Hi Andreas,

2018-06-24 23:49 GMT+08:00 Andreas Färber <afaerber@suse.de>:
> Hi Jian-Hong Pan,
>
> Am 13.05.2018 um 04:42 schrieb Jian-Hong Pan:
>> Hi Jiri and Marcel,
>>
>> 2018-05-11 23:39 GMT+08:00 Marcel Holtmann <marcel@holtmann.org>:
>>> Hi Jian-Hong,
>>>
>>>> A Low-Power Wide-Area Network (LPWAN) is a type of wireless
>>>> telecommunication wide area network designed to allow long range
>>>> communications at a low bit rate among things (connected objects), such
>>>> as sensors operated on a battery.  It can be used widely in IoT area.
>>>> LoRaWAN, which is one kind of implementation of LPWAN, is a medium
>>>> access control (MAC) layer protocol for managing communication between
>>>> LPWAN gateways and end-node devices, maintained by the LoRa Alliance.
>>>> LoRaWAN™ Specification could be downloaded at:
>>>> https://lora-alliance.org/lorawan-for-developers
>>>>
>>>> However, LoRaWAN is not implemented in Linux kernel right now, so I am
>>>> trying to develop it.  Here is my repository:
>>>> https://github.com/starnight/LoRa/tree/lorawan-ndo/LoRaWAN
>>>>
>>>> Because it is a kind of network, the ideal usage in an user space
>>>> program should be like "socket(PF_LORAWAN, SOCK_DGRAM, 0)" and with
>>>> other socket APIs.  Therefore, the definitions like AF_LORAWAN,
>>>> PF_LORAWAN ..., must be listed in the header files of glibc.
>>>> For the driver in kernel space, the definitions also must be listed in
>>>> the corresponding Linux socket header files.
>>>> Especially, both are for the testing programs.
>>>>
>>>> Back to the mentioned "LoRaWAN is not implemented in Linux kernel now".
>>>> Could or should we add the definitions into corresponding kernel header
>>>> files now, if LoRaWAN will be accepted as a subsystem in Linux?
>>>
>>> when you submit your LoRaWAN subsystem to netdev for review, include a patch that adds these new address family definitions. Just pick the next one available. There will be no pre-allocation of numbers until your work has been accepted upstream. Meaning, that the number might change if other address families get merged before yours. So you have to keep updating. glibc will eventually follow the number assigned by the kernel.
>>
>> Thanks for your guidance.  I will follow the steps.
>
> I have been working on a similar thing on and off since proposing it at
> FOSDEM 2017:

Wow!  Great!  I get new friends :)

> At https://github.com/afaerber/lora-modules you will find my proof of
> concept of PF_LORA with SOCK_DGRAM and stub drivers for various modules.
> My idea was to layer LoRaWAN on top of LoRa later.

We have the same idea here.

> The way I have developed this was to simply reuse numbers unused in our
> distro kernel and built my modules against the distro kernel, to avoid
> frequent reboots and full kernel builds.

I use the current AF_MAX number as AF_LORAWAN, and the new AF_MAX becomes
the old AF_MAX + 1.
And so on ...

> Not having looked at your code yet, do you think our implementations are
> fairly independent at this point, or do you see conflicts apart from
> number allocation? Like, I am currently using lora0 as name - are you
> planning to use lorawan0 or rather something more generic like lpwan0?

The interface name I created is loraX, X will be 0, 1, 2 ...

4: lora0: <NOARP,UP,LOWER_UP> mtu 20 qdisc noqueue state UNKNOWN group
default qlen 1000
    link/[830] 01:02:03:04 brd ff:ff:ff:ff

> We might place your code in net/lora/lorawan/ and mine in net/lora/?

My implementation is:
LoRaWAN class module: net/lorawan/
LoRa device drivers: drivers/net/lorawan/sx127X ...

> More problematic would be the actual device drivers, where some devices
> would support both modes - some with soft MAC, others with full MAC. Do
> you have any ideas how to handle that in a sane way?

Let me guess!  You have the LoRa "chips" and "modules", am I correct?

If I am right, here is my opinion:
Most of the LoRa chips go with the SPI interface.  These are okay.

However, the LoRa modules go with their own protocols (like AT
commands) over the serial port.
The modules are also a combination of an MCU and a LoRa chip. Users
can flash the firmware of the MCU on their own directly.  I prefer having user
space applications deal with these modules, until there is a formal spec for
these kinds of modules.

Regards,
Jian-Hong Pan

> Please keep me CC'ed on any follow-ups.
>
> Regards,
> Andreas
>
> --
> SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply

* [PATCH v5 net-next] net:sched: add action inheritdsfield to skbedit
From: Fu, Qiaobin @ 2018-06-26 15:58 UTC (permalink / raw)
  To: davem@davemloft.net
  Cc: Marcelo Ricardo Leitner, Davide Caratti, Michel Machado,
	netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com
In-Reply-To: <B84B92F9-B872-4430-B7E2-FBF23E543632@bu.edu>

The new action inheritdsfield copies the field DS of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.
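
As a rough user-space illustration of the value that ends up in
skb->priority, the sketch below extracts the DS field from raw IPv4 and
IPv6 header bytes and shifts out the two ECN bits, mirroring the ">> 2" in
the patch. The function names are hypothetical; the kernel code uses
ipv4_get_dsfield() and ipv6_get_dsfield() on the parsed headers instead.

```c
#include <stdint.h>

/* IPv4: the DS field is the second header byte (the former TOS byte). */
uint8_t ipv4_dsfield(const uint8_t *hdr)
{
	return hdr[1];
}

/*
 * IPv6: 4 bits of version are followed by the 8-bit traffic class,
 * so the DS field straddles the first two header bytes.
 */
uint8_t ipv6_dsfield(const uint8_t *hdr)
{
	return (uint8_t)(((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4));
}

/* Drop the two low-order ECN bits; the remaining 6 bits are the DSCP. */
uint8_t dscp_from_dsfield(uint8_t dsfield)
{
	return dsfield >> 2;
}
```

For example, a DS field of 0xb8 (Expedited Forwarding) gives a DSCP, and
hence a priority, of 46.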

v5:
*Update the drop counter for TC_ACT_SHOT

v4:
*Do not allow setting flags other than the expected ones.

*Allow dumping the pure flags.

v3:
*Use optional flags, so that it won't break old versions of tc.

*Allow users to set both SKBEDIT_F_PRIORITY and SKBEDIT_F_INHERITDSFIELD flags.

v2:
*Fix the style issue

*Move the code from skbmod to skbedit

Original idea by Jamal Hadi Salim <jhs@mojatatu.com>

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
---

Note that the motivation for this patch is found in the following discussion:
https://www.spinics.net/lists/netdev/msg501061.html
---
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index fbcfe27a4e6c..6de6071ebed6 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -30,6 +30,7 @@
#define SKBEDIT_F_MARK			0x4
#define SKBEDIT_F_PTYPE			0x8
#define SKBEDIT_F_MASK			0x10
+#define SKBEDIT_F_INHERITDSFIELD	0x20

struct tc_skbedit {
	tc_gen;
@@ -45,6 +46,7 @@ enum {
	TCA_SKBEDIT_PAD,
	TCA_SKBEDIT_PTYPE,
	TCA_SKBEDIT_MASK,
+	TCA_SKBEDIT_FLAGS,
	__TCA_SKBEDIT_MAX
};
#define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 6138d1d71900..dfaf5d8028dd 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -23,6 +23,9 @@
#include <linux/rtnetlink.h>
#include <net/netlink.h>
#include <net/pkt_sched.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/dsfield.h>

#include <linux/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_skbedit.h>
@@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,

	if (d->flags & SKBEDIT_F_PRIORITY)
		skb->priority = d->priority;
+	if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
+		int wlen = skb_network_offset(skb);
+
+		switch (tc_skb_protocol(skb)) {
+		case htons(ETH_P_IP):
+			wlen += sizeof(struct iphdr);
+			if (!pskb_may_pull(skb, wlen))
+				goto err;
+			skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+			break;
+
+		case htons(ETH_P_IPV6):
+			wlen += sizeof(struct ipv6hdr);
+			if (!pskb_may_pull(skb, wlen))
+				goto err;
+			skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+			break;
+		}
+	}
	if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
	    skb->dev->real_num_tx_queues > d->queue_mapping)
		skb_set_queue_mapping(skb, d->queue_mapping);
@@ -53,6 +75,11 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,

	spin_unlock(&d->tcf_lock);
	return d->tcf_action;
+
+err:
+	d->tcf_qstats.drops++;
+	spin_unlock(&d->tcf_lock);
+	return TC_ACT_SHOT;
}

static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
@@ -62,6 +89,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
	[TCA_SKBEDIT_MARK]		= { .len = sizeof(u32) },
	[TCA_SKBEDIT_PTYPE]		= { .len = sizeof(u16) },
	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
+	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
};

static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -114,6 +142,13 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
		mask = nla_data(tb[TCA_SKBEDIT_MASK]);
	}

+	if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
+		u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
+
+		if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
+			flags |= SKBEDIT_F_INHERITDSFIELD;
+	}
+
	parm = nla_data(tb[TCA_SKBEDIT_PARMS]);

	exists = tcf_idr_check(tn, parm->index, a, bind);
@@ -178,6 +213,7 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
		.action  = d->tcf_action,
	};
	struct tcf_t t;
+	u64 pure_flags = 0;

	if (nla_put(skb, TCA_SKBEDIT_PARMS, sizeof(opt), &opt))
		goto nla_put_failure;
@@ -196,6 +232,11 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
	if ((d->flags & SKBEDIT_F_MASK) &&
	    nla_put_u32(skb, TCA_SKBEDIT_MASK, d->mask))
		goto nla_put_failure;
+	if (d->flags & SKBEDIT_F_INHERITDSFIELD)
+		pure_flags |= SKBEDIT_F_INHERITDSFIELD;
+	if (pure_flags != 0 &&
+	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
+		goto nla_put_failure;

	tcf_tm_dump(&t, &d->tcf_tm);
	if (nla_put_64bit(skb, TCA_SKBEDIT_TM, sizeof(t), &t, TCA_SKBEDIT_PAD))

^ permalink raw reply related

* Re: [PATCH v3,net-next] vlan: implement vlan id and protocol changes
From: Ido Schimmel @ 2018-06-26 15:57 UTC (permalink / raw)
  To: Chas Williams; +Cc: dsa, David S. Miller, netdev, Roopa Prabhu
In-Reply-To: <CAG2-Gkm0u3Od64nAMpUzq+=M+cj3VS0J1VQ8L5BChbo7vig+kA@mail.gmail.com>

On Tue, Jun 26, 2018 at 09:31:55AM -0400, Chas Williams wrote:
> On Mon, Jun 25, 2018 at 4:45 PM David Ahern <dsa@cumulusnetworks.com> wrote:
> 
> > On 6/25/18 4:30 AM, Chas Williams wrote:
> > > vlan_changelink silently ignores attempts to change the vlan id
> > > or protocol id of an existing vlan interface.  Implement by adding
> > > the new vlan id and protocol to the interface's vlan group and then
> > > removing the old vlan id and protocol from the vlan group.
> > >
> > > Signed-off-by: Chas Williams <3chas3@gmail.com>
> > > ---
> > >  include/linux/netdevice.h |  1 +
> > >  net/8021q/vlan.c          |  4 ++--
> > >  net/8021q/vlan.h          |  2 ++
> > >  net/8021q/vlan_netlink.c  | 38 ++++++++++++++++++++++++++++++++++++++
> > >  net/core/dev.c            |  1 +
> > >  5 files changed, 44 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 3ec9850c7936..a95ae238addf 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -2409,6 +2409,7 @@ enum netdev_cmd {
> > >       NETDEV_CVLAN_FILTER_DROP_INFO,
> > >       NETDEV_SVLAN_FILTER_PUSH_INFO,
> > >       NETDEV_SVLAN_FILTER_DROP_INFO,
> > > +     NETDEV_CHANGEVLAN,
> > >  };
> > >  const char *netdev_cmd_to_name(enum netdev_cmd cmd);
> > >
> >
> > you add the new notifier, but do not add any hooks to catch and process it.
> >
> 
> I can remove it.  I thought it would be prudent to add it now.
> This could also really be NETDEV_CHANGE.  I wasn't sure
> which would be more acceptable.
> 
> 
> > Personally, I think it is a bit sketchy to change the vlan id on an
> > existing device and I suspect it will cause latent errors.
> >
> 
> It's not any different than changing any other layer 2 property on a device.
> If you change the MTU or the MAC address, you are potentially going to
> cause latent errors.

It is different in switch ASICs, at least. The MTU and MAC don't have
any state associated with them. The VLAN does.

For example, when you assign an IP address to a VLAN device configured
on top of an mlxsw port (e.g., swp1.10), then you are basically creating
a router interface (RIF) that is able to route packets. This RIF is
bound to the port and the VLAN {1, 10} which cannot be changed during
the lifetime of the RIF (at least w/o impacting traffic). The MAC and
the MTU can be easily changed and are changed following
NETDEV_CHANGEADDR and NETDEV_CHANGEMTU events.

Similar problems exist in bridged VLAN devices.

> 
> 
> >
> > What's your use case for trying to implement the change versus causing
> > it to generate an unsupported error?
> >
> 
> It's far more convenient to be able to change the VLAN ID and proto
> instead of having to delete the link and put it back.  That's a lot of
> churn (netlink messages, kernel calls) for something relatively simple.
> 
> 
> >
> > If this patch does get accepted, I believe the mlxsw switchdev driver
> > will be impacted.
> >
> 
> How so?  It was relying on the fact that VLAN changes were ignored?

It is relying on existing kernel behavior, which doesn't allow changing
the VLAN.

tl;dr - I'm still not convinced this is actually needed, but if you're
going to allow such behavior, then please also include a notification
that enables existing in-kernel users to refuse the operation.

Thanks

^ permalink raw reply

* [PATCH net-next] tcp: remove one indentation level in tcp_create_openreq_child
From: Eric Dumazet @ 2018-06-26 15:45 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_minisocks.c | 223 ++++++++++++++++++++-------------------
 1 file changed, 113 insertions(+), 110 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 1dda1341a223937580b4efdbedb21ae50b221ff7..dac5893a52b4520d86ed2fcadbfb561a559fcd3d 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -449,119 +449,122 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 				      struct sk_buff *skb)
 {
 	struct sock *newsk = inet_csk_clone_lock(sk, req, GFP_ATOMIC);
-
-	if (newsk) {
-		const struct inet_request_sock *ireq = inet_rsk(req);
-		struct tcp_request_sock *treq = tcp_rsk(req);
-		struct inet_connection_sock *newicsk = inet_csk(newsk);
-		struct tcp_sock *newtp = tcp_sk(newsk);
-		struct tcp_sock *oldtp = tcp_sk(sk);
-
-		smc_check_reset_syn_req(oldtp, req, newtp);
-
-		/* Now setup tcp_sock */
-		newtp->pred_flags = 0;
-
-		newtp->rcv_wup = newtp->copied_seq =
-		newtp->rcv_nxt = treq->rcv_isn + 1;
-		newtp->segs_in = 1;
-
-		newtp->snd_sml = newtp->snd_una =
-		newtp->snd_nxt = newtp->snd_up = treq->snt_isn + 1;
-
-		INIT_LIST_HEAD(&newtp->tsq_node);
-		INIT_LIST_HEAD(&newtp->tsorted_sent_queue);
-
-		tcp_init_wl(newtp, treq->rcv_isn);
-
-		newtp->srtt_us = 0;
-		newtp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
-		minmax_reset(&newtp->rtt_min, tcp_jiffies32, ~0U);
-		newicsk->icsk_rto = TCP_TIMEOUT_INIT;
-		newicsk->icsk_ack.lrcvtime = tcp_jiffies32;
-
-		newtp->packets_out = 0;
-		newtp->retrans_out = 0;
-		newtp->sacked_out = 0;
-		newtp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
-		newtp->tlp_high_seq = 0;
-		newtp->lsndtime = tcp_jiffies32;
-		newsk->sk_txhash = treq->txhash;
-		newtp->last_oow_ack_time = 0;
-		newtp->total_retrans = req->num_retrans;
-
-		/* So many TCP implementations out there (incorrectly) count the
-		 * initial SYN frame in their delayed-ACK and congestion control
-		 * algorithms that we must have the following bandaid to talk
-		 * efficiently to them.  -DaveM
-		 */
-		newtp->snd_cwnd = TCP_INIT_CWND;
-		newtp->snd_cwnd_cnt = 0;
-
-		/* There's a bubble in the pipe until at least the first ACK. */
-		newtp->app_limited = ~0U;
-
-		tcp_init_xmit_timers(newsk);
-		newtp->write_seq = newtp->pushed_seq = treq->snt_isn + 1;
-
-		newtp->rx_opt.saw_tstamp = 0;
-
-		newtp->rx_opt.dsack = 0;
-		newtp->rx_opt.num_sacks = 0;
-
-		newtp->urg_data = 0;
-
-		if (sock_flag(newsk, SOCK_KEEPOPEN))
-			inet_csk_reset_keepalive_timer(newsk,
-						       keepalive_time_when(newtp));
-
-		newtp->rx_opt.tstamp_ok = ireq->tstamp_ok;
-		newtp->rx_opt.sack_ok = ireq->sack_ok;
-		newtp->window_clamp = req->rsk_window_clamp;
-		newtp->rcv_ssthresh = req->rsk_rcv_wnd;
-		newtp->rcv_wnd = req->rsk_rcv_wnd;
-		newtp->rx_opt.wscale_ok = ireq->wscale_ok;
-		if (newtp->rx_opt.wscale_ok) {
-			newtp->rx_opt.snd_wscale = ireq->snd_wscale;
-			newtp->rx_opt.rcv_wscale = ireq->rcv_wscale;
-		} else {
-			newtp->rx_opt.snd_wscale = newtp->rx_opt.rcv_wscale = 0;
-			newtp->window_clamp = min(newtp->window_clamp, 65535U);
-		}
-		newtp->snd_wnd = (ntohs(tcp_hdr(skb)->window) <<
-				  newtp->rx_opt.snd_wscale);
-		newtp->max_window = newtp->snd_wnd;
-
-		if (newtp->rx_opt.tstamp_ok) {
-			newtp->rx_opt.ts_recent = req->ts_recent;
-			newtp->rx_opt.ts_recent_stamp = get_seconds();
-			newtp->tcp_header_len = sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED;
-		} else {
-			newtp->rx_opt.ts_recent_stamp = 0;
-			newtp->tcp_header_len = sizeof(struct tcphdr);
-		}
-		newtp->tsoffset = treq->ts_off;
+	const struct inet_request_sock *ireq = inet_rsk(req);
+	struct tcp_request_sock *treq = tcp_rsk(req);
+	struct inet_connection_sock *newicsk;
+	struct tcp_sock *oldtp, *newtp;
+
+	if (!newsk)
+		return NULL;
+
+	newicsk = inet_csk(newsk);
+	newtp = tcp_sk(newsk);
+	oldtp = tcp_sk(sk);
+
+	smc_check_reset_syn_req(oldtp, req, newtp);
+
+	/* Now setup tcp_sock */
+	newtp->pred_flags = 0;
+
+	newtp->rcv_wup = newtp->copied_seq =
+	newtp->rcv_nxt = treq->rcv_isn + 1;
+	newtp->segs_in = 1;
+
+	newtp->snd_sml = newtp->snd_una =
+	newtp->snd_nxt = newtp->snd_up = treq->snt_isn + 1;
+
+	INIT_LIST_HEAD(&newtp->tsq_node);
+	INIT_LIST_HEAD(&newtp->tsorted_sent_queue);
+
+	tcp_init_wl(newtp, treq->rcv_isn);
+
+	newtp->srtt_us = 0;
+	newtp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
+	minmax_reset(&newtp->rtt_min, tcp_jiffies32, ~0U);
+	newicsk->icsk_rto = TCP_TIMEOUT_INIT;
+	newicsk->icsk_ack.lrcvtime = tcp_jiffies32;
+
+	newtp->packets_out = 0;
+	newtp->retrans_out = 0;
+	newtp->sacked_out = 0;
+	newtp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
+	newtp->tlp_high_seq = 0;
+	newtp->lsndtime = tcp_jiffies32;
+	newsk->sk_txhash = treq->txhash;
+	newtp->last_oow_ack_time = 0;
+	newtp->total_retrans = req->num_retrans;
+
+	/* So many TCP implementations out there (incorrectly) count the
+	 * initial SYN frame in their delayed-ACK and congestion control
+	 * algorithms that we must have the following bandaid to talk
+	 * efficiently to them.  -DaveM
+	 */
+	newtp->snd_cwnd = TCP_INIT_CWND;
+	newtp->snd_cwnd_cnt = 0;
+
+	/* There's a bubble in the pipe until at least the first ACK. */
+	newtp->app_limited = ~0U;
+
+	tcp_init_xmit_timers(newsk);
+	newtp->write_seq = newtp->pushed_seq = treq->snt_isn + 1;
+
+	newtp->rx_opt.saw_tstamp = 0;
+
+	newtp->rx_opt.dsack = 0;
+	newtp->rx_opt.num_sacks = 0;
+
+	newtp->urg_data = 0;
+
+	if (sock_flag(newsk, SOCK_KEEPOPEN))
+		inet_csk_reset_keepalive_timer(newsk,
+					       keepalive_time_when(newtp));
+
+	newtp->rx_opt.tstamp_ok = ireq->tstamp_ok;
+	newtp->rx_opt.sack_ok = ireq->sack_ok;
+	newtp->window_clamp = req->rsk_window_clamp;
+	newtp->rcv_ssthresh = req->rsk_rcv_wnd;
+	newtp->rcv_wnd = req->rsk_rcv_wnd;
+	newtp->rx_opt.wscale_ok = ireq->wscale_ok;
+	if (newtp->rx_opt.wscale_ok) {
+		newtp->rx_opt.snd_wscale = ireq->snd_wscale;
+		newtp->rx_opt.rcv_wscale = ireq->rcv_wscale;
+	} else {
+		newtp->rx_opt.snd_wscale = newtp->rx_opt.rcv_wscale = 0;
+		newtp->window_clamp = min(newtp->window_clamp, 65535U);
+	}
+	newtp->snd_wnd = ntohs(tcp_hdr(skb)->window) << newtp->rx_opt.snd_wscale;
+	newtp->max_window = newtp->snd_wnd;
+
+	if (newtp->rx_opt.tstamp_ok) {
+		newtp->rx_opt.ts_recent = req->ts_recent;
+		newtp->rx_opt.ts_recent_stamp = get_seconds();
+		newtp->tcp_header_len = sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED;
+	} else {
+		newtp->rx_opt.ts_recent_stamp = 0;
+		newtp->tcp_header_len = sizeof(struct tcphdr);
+	}
+	newtp->tsoffset = treq->ts_off;
 #ifdef CONFIG_TCP_MD5SIG
-		newtp->md5sig_info = NULL;	/*XXX*/
-		if (newtp->af_specific->md5_lookup(sk, newsk))
-			newtp->tcp_header_len += TCPOLEN_MD5SIG_ALIGNED;
+	newtp->md5sig_info = NULL;	/*XXX*/
+	if (newtp->af_specific->md5_lookup(sk, newsk))
+		newtp->tcp_header_len += TCPOLEN_MD5SIG_ALIGNED;
 #endif
-		if (skb->len >= TCP_MSS_DEFAULT + newtp->tcp_header_len)
-			newicsk->icsk_ack.last_seg_size = skb->len - newtp->tcp_header_len;
-		newtp->rx_opt.mss_clamp = req->mss;
-		tcp_ecn_openreq_child(newtp, req);
-		newtp->fastopen_req = NULL;
-		newtp->fastopen_rsk = NULL;
-		newtp->syn_data_acked = 0;
-		newtp->rack.mstamp = 0;
-		newtp->rack.advanced = 0;
-		newtp->rack.reo_wnd_steps = 1;
-		newtp->rack.last_delivered = 0;
-		newtp->rack.reo_wnd_persist = 0;
-		newtp->rack.dsack_seen = 0;
+	if (skb->len >= TCP_MSS_DEFAULT + newtp->tcp_header_len)
+		newicsk->icsk_ack.last_seg_size = skb->len - newtp->tcp_header_len;
+	newtp->rx_opt.mss_clamp = req->mss;
+	tcp_ecn_openreq_child(newtp, req);
+	newtp->fastopen_req = NULL;
+	newtp->fastopen_rsk = NULL;
+	newtp->syn_data_acked = 0;
+	newtp->rack.mstamp = 0;
+	newtp->rack.advanced = 0;
+	newtp->rack.reo_wnd_steps = 1;
+	newtp->rack.last_delivered = 0;
+	newtp->rack.reo_wnd_persist = 0;
+	newtp->rack.dsack_seen = 0;
+
+	__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
 
-		__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
-	}
 	return newsk;
 }
 EXPORT_SYMBOL(tcp_create_openreq_child);
-- 
2.18.0.rc2.346.g013aa6912e-goog

* [PATCH net-next] sh_eth: fix *enum* {A|M}PR_BIT
From: Sergei Shtylyov @ 2018-06-26 15:42 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: linux-renesas-soc
In-Reply-To: <2809eba8-4c9a-1d5f-a47d-8125777e365b@cogentembedded.com>

The *enum* {A|M}PR_BIT were declared in commit 86a74ff21a7a ("net:
sh_eth: add support for Renesas SuperH Ethernet"), which added SH771x
support. However, the SH771x manual doesn't describe the APR/MPR
registers, and the code writing to them for SH7710 was later removed by
commit 380af9e390ec ("net: sh_eth: CPU dependency code collect to "struct
sh_eth_cpu_data""). All the newer SoC manuals document these registers
as holding a 16-bit TIME parameter of the PAUSE frame, not a 1-bit
value. Update the *enum* accordingly, fixing up the APR/MPR writes.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
This patch is against DaveM's 'net-next.git' repo.
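
Not part of the patch, for illustration only: a minimal userspace sketch
of how a 16-bit field mask such as the updated APR_AP/MPR_MP could be
used. The helper pause_time_to_reg() is hypothetical and does not exist
in the driver; the patch itself keeps writing the literal 1.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values as updated by this patch: TIME is a 16-bit field. */
enum { APR_AP = 0x0000ffff, MPR_MP = 0x0000ffff };

/* Hypothetical helper (not in sh_eth): build an APR/MPR register value
 * from a PAUSE time, keeping only the bits covered by the field mask.
 */
static uint32_t pause_time_to_reg(uint32_t time, uint32_t mask)
{
	/* The mask is assumed to be a contiguous run of low bits. */
	assert((mask & (mask + 1)) == 0);

	return time & mask;
}
```

With the old 1-bit value used as a mask, any TIME above 1 would have
been truncated to its lowest bit; with 0x0000ffff the full 16-bit
parameter survives.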

 drivers/net/ethernet/renesas/sh_eth.c |    4 ++--
 drivers/net/ethernet/renesas/sh_eth.h |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -1521,9 +1521,9 @@ static int sh_eth_dev_init(struct net_de
 
 	/* mask reset */
 	if (mdp->cd->apr)
-		sh_eth_write(ndev, APR_AP, APR);
+		sh_eth_write(ndev, 1, APR);
 	if (mdp->cd->mpr)
-		sh_eth_write(ndev, MPR_MP, MPR);
+		sh_eth_write(ndev, 1, MPR);
 	if (mdp->cd->tpauser)
 		sh_eth_write(ndev, TPAUSER_UNLIMITED, TPAUSER);
 
Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
+++ net-next/drivers/net/ethernet/renesas/sh_eth.h
@@ -383,12 +383,12 @@ enum ECSIPR_STATUS_MASK_BIT {
 
 /* APR */
 enum APR_BIT {
-	APR_AP = 0x00000001,
+	APR_AP = 0x0000ffff,
 };
 
 /* MPR */
 enum MPR_BIT {
-	MPR_MP = 0x00000001,
+	MPR_MP = 0x0000ffff,
 };
 
 /* TRSCER */

* [PATCH 3/3] virtio_net: split XDP_TX kick and XDP_REDIRECT map flushing
From: Jesper Dangaard Brouer @ 2018-06-26 15:39 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: John Fastabend, Jason Wang, Daniel Borkmann, Björn Töpel,
	Alexei Starovoitov
In-Reply-To: <153002741940.15389.10466368482771753300.stgit@firesoul>

The driver was combining the XDP_TX virtqueue_kick and XDP_REDIRECT
map flushing (xdp_do_flush_map).  This is suboptimal: the two flush
operations should be kept separate, so that each runs only when its
own XDP action has occurred.

The suboptimal behavior was introduced in commit 9267c430c6b6
("virtio-net: add missing virtqueue kick when flushing packets").

Fixes: 9267c430c6b6 ("virtio-net: add missing virtqueue kick when flushing packets")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
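Not part of the patch, for illustration only: a minimal userspace sketch
of the flag-accumulation pattern used below. BIT() is a stand-in for the
kernel macro, and enum verdict, struct flush_counts, record_verdict()
and poll_once() are hypothetical names that only model the control flow;
the counters stand in for the real virtqueue_kick() and
xdp_do_flush_map() calls.

```c
#include <assert.h>

#define BIT(n)			(1U << (n))	/* stand-in for kernel BIT() */
#define VIRTIO_XDP_TX		BIT(0)
#define VIRTIO_XDP_REDIR	BIT(1)

/* Hypothetical per-packet outcome, modelling the XDP action taken. */
enum verdict { V_PASS, V_TX, V_REDIRECT };

/* Hypothetical counters standing in for the two flush operations. */
struct flush_counts {
	unsigned int kicks;		/* models virtqueue_kick() */
	unsigned int map_flushes;	/* models xdp_do_flush_map() */
};

/* Record which kind of XDP transmit happened for one packet. */
static void record_verdict(enum verdict v, unsigned int *xdp_xmit)
{
	switch (v) {
	case V_TX:
		*xdp_xmit |= VIRTIO_XDP_TX;
		break;
	case V_REDIRECT:
		*xdp_xmit |= VIRTIO_XDP_REDIR;
		break;
	default:
		break;
	}
}

/* Model one poll cycle: accumulate flags, then flush each kind once. */
static struct flush_counts poll_once(const enum verdict *v, unsigned int n)
{
	struct flush_counts fc = { 0, 0 };
	unsigned int xdp_xmit = 0;
	unsigned int i;

	assert(v || n == 0);

	for (i = 0; i < n; i++)
		record_verdict(v[i], &xdp_xmit);

	if (xdp_xmit & VIRTIO_XDP_REDIR)
		fc.map_flushes++;
	if (xdp_xmit & VIRTIO_XDP_TX)
		fc.kicks++;

	return fc;
}
```

In this model a poll cycle that saw only XDP_TX packets performs one
kick and no map flush, and a cycle that saw only XDP_REDIRECT packets
performs one map flush and no kick.
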
 drivers/net/virtio_net.c |   30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1619ee3070b6..ae47ecf80c2d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -53,6 +53,10 @@ module_param(napi_tx, bool, 0644);
 /* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
 #define VIRTIO_XDP_HEADROOM 256
 
+/* Separating two types of XDP xmit */
+#define VIRTIO_XDP_TX		BIT(0)
+#define VIRTIO_XDP_REDIR	BIT(1)
+
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -582,7 +586,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 				     struct receive_queue *rq,
 				     void *buf, void *ctx,
 				     unsigned int len,
-				     bool *xdp_xmit)
+				     unsigned int *xdp_xmit)
 {
 	struct sk_buff *skb;
 	struct bpf_prog *xdp_prog;
@@ -654,14 +658,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 				goto err_xdp;
 			}
-			*xdp_xmit = true;
+			*xdp_xmit |= VIRTIO_XDP_TX;
 			rcu_read_unlock();
 			goto xdp_xmit;
 		case XDP_REDIRECT:
 			err = xdp_do_redirect(dev, &xdp, xdp_prog);
 			if (err)
 				goto err_xdp;
-			*xdp_xmit = true;
+			*xdp_xmit |= VIRTIO_XDP_REDIR;
 			rcu_read_unlock();
 			goto xdp_xmit;
 		default:
@@ -723,7 +727,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 void *buf,
 					 void *ctx,
 					 unsigned int len,
-					 bool *xdp_xmit)
+					 unsigned int *xdp_xmit)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
 	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
@@ -818,7 +822,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 					put_page(xdp_page);
 				goto err_xdp;
 			}
-			*xdp_xmit = true;
+			*xdp_xmit |= VIRTIO_XDP_TX;
 			if (unlikely(xdp_page != page))
 				put_page(page);
 			rcu_read_unlock();
@@ -830,7 +834,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 					put_page(xdp_page);
 				goto err_xdp;
 			}
-			*xdp_xmit = true;
+			*xdp_xmit |= VIRTIO_XDP_REDIR;
 			if (unlikely(xdp_page != page))
 				put_page(page);
 			rcu_read_unlock();
@@ -939,7 +943,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 }
 
 static int receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
-		       void *buf, unsigned int len, void **ctx, bool *xdp_xmit)
+		       void *buf, unsigned int len, void **ctx,
+		       unsigned int *xdp_xmit)
 {
 	struct net_device *dev = vi->dev;
 	struct sk_buff *skb;
@@ -1232,7 +1237,8 @@ static void refill_work(struct work_struct *work)
 	}
 }
 
-static int virtnet_receive(struct receive_queue *rq, int budget, bool *xdp_xmit)
+static int virtnet_receive(struct receive_queue *rq, int budget,
+			   unsigned int *xdp_xmit)
 {
 	struct virtnet_info *vi = rq->vq->vdev->priv;
 	unsigned int len, received = 0, bytes = 0;
@@ -1321,7 +1327,7 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	struct virtnet_info *vi = rq->vq->vdev->priv;
 	struct send_queue *sq;
 	unsigned int received, qp;
-	bool xdp_xmit = false;
+	unsigned int xdp_xmit = 0;
 
 	virtnet_poll_cleantx(rq);
 
@@ -1331,12 +1337,14 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	if (received < budget)
 		virtqueue_napi_complete(napi, rq->vq, received);
 
-	if (xdp_xmit) {
+	if (xdp_xmit & VIRTIO_XDP_REDIR)
+		xdp_do_flush_map();
+
+	if (xdp_xmit & VIRTIO_XDP_TX) {
 		qp = vi->curr_queue_pairs - vi->xdp_queue_pairs +
 		     smp_processor_id();
 		sq = &vi->sq[qp];
 		virtqueue_kick(sq->vq);
-		xdp_do_flush_map();
 	}
 
 	return received;

* [PATCH 2/3] i40e: split XDP_TX tail and XDP_REDIRECT map flushing
From: Jesper Dangaard Brouer @ 2018-06-26 15:39 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: John Fastabend, Jason Wang, Daniel Borkmann, Björn Töpel,
	Alexei Starovoitov
In-Reply-To: <153002741940.15389.10466368482771753300.stgit@firesoul>

The driver was combining the XDP_TX tail flush and XDP_REDIRECT
map flushing (xdp_do_flush_map).  This is suboptimal: the two flush
operations should be kept separate, so that each runs only when its
own XDP action has occurred.

It looks like the mistake was copy-pasted from ixgbe.

Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
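Not part of the patch, for illustration only: the hunks below decode
the XDP result with -PTR_ERR(skb), which implies the result bits travel
as a negated value inside an error pointer. A minimal userspace sketch
of that round trip follows. err_ptr(), ptr_err() and is_err() are
simplified stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(), and
encode_xdp_result()/decode_xdp_result() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)			(1U << (n))	/* stand-in for kernel BIT() */
#define I40E_XDP_PASS		0
#define I40E_XDP_CONSUMED	BIT(0)
#define I40E_XDP_TX		BIT(1)
#define I40E_XDP_REDIR		BIT(2)

#define MAX_ERRNO		4095

/* Simplified userspace stand-ins for ERR_PTR(), PTR_ERR(), IS_ERR(). */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical: pack a non-zero XDP result mask into an error pointer. */
static void *encode_xdp_result(unsigned int result)
{
	assert(result != 0 && result <= MAX_ERRNO);

	return err_ptr(-(long)result);
}

/* Hypothetical: recover the mask; a real pointer means "pass". */
static unsigned int decode_xdp_result(const void *skb)
{
	return is_err(skb) ? (unsigned int)-ptr_err(skb) : I40E_XDP_PASS;
}
```

Because the decoded value is a bit mask, the caller can test it against
(I40E_XDP_TX | I40E_XDP_REDIR) and OR it into xdp_xmit, instead of
comparing against a single value.
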
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 8ffb7454e67c..c1c027743159 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2200,9 +2200,10 @@ static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
 	return true;
 }
 
-#define I40E_XDP_PASS 0
-#define I40E_XDP_CONSUMED 1
-#define I40E_XDP_TX 2
+#define I40E_XDP_PASS		0
+#define I40E_XDP_CONSUMED	BIT(0)
+#define I40E_XDP_TX		BIT(1)
+#define I40E_XDP_REDIR		BIT(2)
 
 static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf,
 			      struct i40e_ring *xdp_ring);
@@ -2249,7 +2250,7 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 		break;
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
-		result = !err ? I40E_XDP_TX : I40E_XDP_CONSUMED;
+		result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2312,7 +2313,8 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	struct sk_buff *skb = rx_ring->skb;
 	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
-	bool failure = false, xdp_xmit = false;
+	unsigned int xdp_xmit = 0;
+	bool failure = false;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
@@ -2373,8 +2375,10 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		}
 
 		if (IS_ERR(skb)) {
-			if (PTR_ERR(skb) == -I40E_XDP_TX) {
-				xdp_xmit = true;
+			unsigned int xdp_res = -PTR_ERR(skb);
+
+			if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR)) {
+				xdp_xmit |= xdp_res;
 				i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
 			} else {
 				rx_buffer->pagecnt_bias++;
@@ -2428,12 +2432,14 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		total_rx_packets++;
 	}
 
-	if (xdp_xmit) {
+	if (xdp_xmit & I40E_XDP_REDIR)
+		xdp_do_flush_map();
+
+	if (xdp_xmit & I40E_XDP_TX) {
 		struct i40e_ring *xdp_ring =
 			rx_ring->vsi->xdp_rings[rx_ring->queue_index];
 
 		i40e_xdp_ring_update_tail(xdp_ring);
-		xdp_do_flush_map();
 	}
 
 	rx_ring->skb = skb;

* [PATCH 1/3] ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
From: Jesper Dangaard Brouer @ 2018-06-26 15:39 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: John Fastabend, Jason Wang, Daniel Borkmann, Björn Töpel,
	Alexei Starovoitov
In-Reply-To: <153002741940.15389.10466368482771753300.stgit@firesoul>

The driver was combining the XDP_TX tail flush and XDP_REDIRECT
map flushing (xdp_do_flush_map).  This is suboptimal: the two flush
operations should be kept separate, so that each runs only when its
own XDP action has occurred.

Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
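Not part of the patch, for illustration only: a minimal userspace sketch
of why the result codes become distinct bits. Per-packet results are
ORed into a single mask across the RX loop, which only works if no code
is a combination of the others. BIT() is a stand-in for the kernel
macro and accumulate_xmit() is a hypothetical helper modelling the loop.

```c
#include <assert.h>

#define BIT(n)			(1U << (n))	/* stand-in for kernel BIT() */
#define IXGBE_XDP_PASS		0
#define IXGBE_XDP_CONSUMED	BIT(0)
#define IXGBE_XDP_TX		BIT(1)
#define IXGBE_XDP_REDIR		BIT(2)

/* Hypothetical: fold per-packet XDP results into one xmit mask, keeping
 * only results that need a flush afterwards (TX or REDIR).
 */
static unsigned int accumulate_xmit(const unsigned int *results,
				    unsigned int n)
{
	unsigned int xdp_xmit = 0;
	unsigned int i;

	assert(results || n == 0);

	for (i = 0; i < n; i++) {
		unsigned int xdp_res = results[i];

		if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR))
			xdp_xmit |= xdp_res;
	}

	return xdp_xmit;
}
```

With the previous sequential numbering (0, 1, 2), a hypothetical fourth
code of 3 would have been indistinguishable from CONSUMED | TX once
ORed; with one bit per code the mask stays unambiguous.
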
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4929f7265598..5f8a969638b2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2186,9 +2186,10 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 	return skb;
 }
 
-#define IXGBE_XDP_PASS 0
-#define IXGBE_XDP_CONSUMED 1
-#define IXGBE_XDP_TX 2
+#define IXGBE_XDP_PASS		0
+#define IXGBE_XDP_CONSUMED	BIT(0)
+#define IXGBE_XDP_TX		BIT(1)
+#define IXGBE_XDP_REDIR		BIT(2)
 
 static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
 			       struct xdp_frame *xdpf);
@@ -2225,7 +2226,7 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog);
 		if (!err)
-			result = IXGBE_XDP_TX;
+			result = IXGBE_XDP_REDIR;
 		else
 			result = IXGBE_XDP_CONSUMED;
 		break;
@@ -2285,7 +2286,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	unsigned int mss = 0;
 #endif /* IXGBE_FCOE */
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
-	bool xdp_xmit = false;
+	unsigned int xdp_xmit = 0;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
@@ -2328,8 +2329,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		}
 
 		if (IS_ERR(skb)) {
-			if (PTR_ERR(skb) == -IXGBE_XDP_TX) {
-				xdp_xmit = true;
+			unsigned int xdp_res = -PTR_ERR(skb);
+
+			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
+				xdp_xmit |= xdp_res;
 				ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
 			} else {
 				rx_buffer->pagecnt_bias++;
@@ -2401,7 +2404,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		total_rx_packets++;
 	}
 
-	if (xdp_xmit) {
+	if (xdp_xmit & IXGBE_XDP_REDIR)
+		xdp_do_flush_map();
+
+	if (xdp_xmit & IXGBE_XDP_TX) {
 		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
 
 		/* Force memory writes to complete before letting h/w
@@ -2409,8 +2415,6 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		 */
 		wmb();
 		writel(ring->next_to_use, ring->tail);
-
-		xdp_do_flush_map();
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
