Netdev List
 help / color / mirror / Atom feed
* [PATCH v2 net-next 2/4] selftests: rtnetlink: use dummydev as a test device
From: Shannon Nelson @ 2018-06-25 23:41 UTC (permalink / raw)
  To: davem, netdev, jakub.kicinski; +Cc: anders.roxell, linux-kselftest
In-Reply-To: <1529970096-23199-1-git-send-email-shannon.nelson@oracle.com>

We really shouldn't mess with local system settings, so let's
use the already created dummy device instead for ipsec testing.
Oh, and let's put the temp file into a proper directory.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 tools/testing/selftests/net/rtnetlink.sh | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 261a981..15948cf 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -523,21 +523,19 @@ kci_test_macsec()
 kci_test_ipsec()
 {
 	ret=0
-
-	# find an ip address on this machine and make up a destination
-	srcip=`ip -o addr | awk '/inet / { print $4; }' | grep -v "^127" | head -1 | cut -f1 -d/`
-	net=`echo $srcip | cut -f1-3 -d.`
-	base=`echo $srcip | cut -f4 -d.`
-	dstip="$net."`expr $base + 1`
-
 	algo="aead rfc4106(gcm(aes)) 0x3132333435363738393031323334353664636261 128"
+	srcip=192.168.123.1
+	dstip=192.168.123.2
+	spi=7
+
+	ip addr add $srcip dev $devdummy
 
 	# flush to be sure there's nothing configured
 	ip x s flush ; ip x p flush
 	check_err $?
 
 	# start the monitor in the background
-	tmpfile=`mktemp ipsectestXXX`
+	tmpfile=`mktemp /var/run/ipsectestXXX`
 	mpid=`(ip x m > $tmpfile & echo $!) 2>/dev/null`
 	sleep 0.2
 
@@ -601,6 +599,7 @@ kci_test_ipsec()
 	check_err $?
 	ip x p flush
 	check_err $?
+	ip addr del $srcip/32 dev $devdummy
 
 	if [ $ret -ne 0 ]; then
 		echo "FAIL: ipsec"
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 net-next 0/4] Updates for ipsec selftests
From: Shannon Nelson @ 2018-06-25 23:41 UTC (permalink / raw)
  To: davem, netdev, jakub.kicinski; +Cc: anders.roxell, linux-kselftest

Fix up the existing ipsec selftest and add tests for
the ipsec offload driver API.

v2: addressed formatting nits in netdevsim from Jakub Kicinski

Shannon Nelson (4):
  selftests: rtnetlink: clear the return code at start of ipsec test
  selftests: rtnetlink: use dummydev as a test device
  netdevsim: add ipsec offload testing
  selftests: rtnetlink: add ipsec offload API test

 drivers/net/netdevsim/Makefile           |   4 +
 drivers/net/netdevsim/ipsec.c            | 345 +++++++++++++++++++++++++++++++
 drivers/net/netdevsim/netdev.c           |   7 +
 drivers/net/netdevsim/netdevsim.h        |  37 ++++
 tools/testing/selftests/net/rtnetlink.sh | 132 +++++++++++-
 5 files changed, 518 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/netdevsim/ipsec.c

-- 
2.7.4

^ permalink raw reply

* [PATCH v2] fib_rules: match rules based on suppress_* properties too
From: Jason A. Donenfeld @ 2018-06-25 23:39 UTC (permalink / raw)
  To: roopa, netdev; +Cc: Jason A. Donenfeld
In-Reply-To: <CAJieiUiqUCRBGTqkCWQTFJuZa7AvcoAzPAbJc3LyQT6WjBan+A@mail.gmail.com>

Two rules with different values of suppress_prefix or suppress_ifgroup
are not the same. This fixes an -EEXIST when running:

   $ ip -4 rule add table main suppress_prefixlength 0

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
---
This adds the new condition you mentioned. I'm not sure what you make of
DaveM's remark about this not being in the original code, but here is
nonetheless the requested change.

 net/core/fib_rules.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 126ffc5bc630..bc8425d81022 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -416,6 +416,14 @@ static struct fib_rule *rule_find(struct fib_rules_ops *ops,
 		if (rule->mark && r->mark != rule->mark)
 			continue;
 
+		if (rule->suppress_ifgroup != -1 &&
+		    r->suppress_ifgroup != rule->suppress_ifgroup)
+			continue;
+
+		if (rule->suppress_prefixlen != -1 &&
+		    r->suppress_prefixlen != rule->suppress_prefixlen)
+			continue;
+
 		if (rule->mark_mask && r->mark_mask != rule->mark_mask)
 			continue;
 
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH net 2/2] nfp: reject binding to shared blocks
From: kbuild test robot @ 2018-06-25 23:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: kbuild-all, davem, oss-drivers, netdev, John Hurley,
	Jakub Kicinski
In-Reply-To: <20180625202246.28871-3-jakub.kicinski@netronome.com>

[-- Attachment #1: Type: text/plain, Size: 1883 bytes --]

Hi John,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Jakub-Kicinski/nfp-MPLS-and-shared-blocks-TC-offload-fixes/20180626-042358
config: i386-randconfig-x001-201825 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/netronome/nfp/flower/offload.c: In function 'nfp_flower_setup_tc_block':
>> drivers/net/ethernet/netronome/nfp/flower/offload.c:634:6: error: implicit declaration of function 'tcf_block_shared'; did you mean 'tcf_block_dev'? [-Werror=implicit-function-declaration]
     if (tcf_block_shared(f->block))
         ^~~~~~~~~~~~~~~~
         tcf_block_dev
   cc1: some warnings being treated as errors

vim +634 drivers/net/ethernet/netronome/nfp/flower/offload.c

   625	
   626	static int nfp_flower_setup_tc_block(struct net_device *netdev,
   627					     struct tc_block_offload *f)
   628	{
   629		struct nfp_repr *repr = netdev_priv(netdev);
   630	
   631		if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
   632			return -EOPNOTSUPP;
   633	
 > 634		if (tcf_block_shared(f->block))
   635			return -EOPNOTSUPP;
   636	
   637		switch (f->command) {
   638		case TC_BLOCK_BIND:
   639			return tcf_block_cb_register(f->block,
   640						     nfp_flower_setup_tc_block_cb,
   641						     repr, repr);
   642		case TC_BLOCK_UNBIND:
   643			tcf_block_cb_unregister(f->block,
   644						nfp_flower_setup_tc_block_cb,
   645						repr);
   646			return 0;
   647		default:
   648			return -EOPNOTSUPP;
   649		}
   650	}
   651	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33423 bytes --]

^ permalink raw reply

* [PATCH 3/4] net: lan78xx: Add support for VLAN tag stripping.
From: Dave Stevenson @ 2018-06-25 14:07 UTC (permalink / raw)
  To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson
In-Reply-To: <cover.1529935234.git.dave.stevenson@raspberrypi.org>

The chip supports stripping the VLAN tag and reporting it
in metadata.
Complete the support for this.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
---
 drivers/net/usb/lan78xx.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index afe7fa3..f72a8f5 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -64,6 +64,7 @@
 #define DEFAULT_RX_CSUM_ENABLE		(true)
 #define DEFAULT_TSO_CSUM_ENABLE		(true)
 #define DEFAULT_VLAN_FILTER_ENABLE	(true)
+#define DEFAULT_VLAN_RX_OFFLOAD		(true)
 #define TX_OVERHEAD			(8)
 #define RXW_PADDING			2
 
@@ -2363,6 +2364,11 @@ static int lan78xx_set_features(struct net_device *netdev,
 		pdata->rfe_ctl &= ~(RFE_CTL_ICMP_COE_ | RFE_CTL_IGMP_COE_);
 	}
 
+	if (features & NETIF_F_HW_VLAN_CTAG_RX)
+		pdata->rfe_ctl |= RFE_CTL_VLAN_STRIP_;
+	else
+		pdata->rfe_ctl &= ~RFE_CTL_VLAN_STRIP_;
+
 	if (features & NETIF_F_HW_VLAN_CTAG_FILTER)
 		pdata->rfe_ctl |= RFE_CTL_VLAN_FILTER_;
 	else
@@ -2976,6 +2982,9 @@ static int lan78xx_bind(struct lan78xx_net *dev, struct usb_interface *intf)
 	if (DEFAULT_TSO_CSUM_ENABLE)
 		dev->net->features |= NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_SG;
 
+	if (DEFAULT_VLAN_RX_OFFLOAD)
+		dev->net->features |= NETIF_F_HW_VLAN_CTAG_RX;
+
 	if (DEFAULT_VLAN_FILTER_ENABLE)
 		dev->net->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
@@ -3052,6 +3061,16 @@ static void lan78xx_rx_csum_offload(struct lan78xx_net *dev,
 	}
 }
 
+static void lan78xx_rx_vlan_offload(struct lan78xx_net *dev,
+				    struct sk_buff *skb,
+				    u32 rx_cmd_a, u32 rx_cmd_b)
+{
+	if ((dev->net->features & NETIF_F_HW_VLAN_CTAG_RX) &&
+	    (rx_cmd_a & RX_CMD_A_FVTG_))
+		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
+				       (rx_cmd_b & 0xffff));
+}
+
 static void lan78xx_skb_return(struct lan78xx_net *dev, struct sk_buff *skb)
 {
 	int		status;
@@ -3116,6 +3135,8 @@ static int lan78xx_rx(struct lan78xx_net *dev, struct sk_buff *skb)
 			if (skb->len == size) {
 				lan78xx_rx_csum_offload(dev, skb,
 							rx_cmd_a, rx_cmd_b);
+				lan78xx_rx_vlan_offload(dev, skb,
+							rx_cmd_a, rx_cmd_b);
 
 				skb_trim(skb, skb->len - 4); /* remove fcs */
 				skb->truesize = size + sizeof(struct sk_buff);
@@ -3134,6 +3155,7 @@ static int lan78xx_rx(struct lan78xx_net *dev, struct sk_buff *skb)
 			skb_set_tail_pointer(skb2, size);
 
 			lan78xx_rx_csum_offload(dev, skb2, rx_cmd_a, rx_cmd_b);
+			lan78xx_rx_vlan_offload(dev, skb2, rx_cmd_a, rx_cmd_b);
 
 			skb_trim(skb2, skb2->len - 4); /* remove fcs */
 			skb2->truesize = size + sizeof(struct sk_buff);
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v2 net-next] net/sched: add skbprio scheduler
From: Nishanth Devarajan @ 2018-06-25 23:27 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jamal Hadi Salim, Jiri Pirko, David Miller, netdev, doucette,
	michel
In-Reply-To: <CAM_iQpV9uiSA=T2ZN=gaJSuzB+dci9S9FiUx=dAQYomPORfjLg@mail.gmail.com>

On Sat, Jun 23, 2018 at 02:43:16PM -0700, Cong Wang wrote:
> On Sat, Jun 23, 2018 at 1:47 PM, Nishanth Devarajan <ndev2021@gmail.com> wrote:
> > diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> > index 37b5096..6fd07e8 100644
> > --- a/include/uapi/linux/pkt_sched.h
> > +++ b/include/uapi/linux/pkt_sched.h
> ...
> > +#define SKBPRIO_MAX_PRIORITY 64
> > +
> > +struct tc_skbprio_qopt {
> > +       __u32   limit;          /* Queue length in packets. */
> > +};
> 
> 
> Since this is just an integer, you can just make it NLA_U32 instead
> of a struct?
> 
>

Making it NLA_U32, wouldn't that be incurring a nla_policy struct in the
code? I also feel uneasy that we'd be straying convention of having a tc qopt
struct to pass in essential parameters from userspace.

> > +static int skbprio_change(struct Qdisc *sch, struct nlattr *opt,
> > +                       struct netlink_ext_ack *extack)
> > +{
> > +       struct skbprio_sched_data *q = qdisc_priv(sch);
> > +       struct tc_skbprio_qopt *ctl = nla_data(opt);
> > +       const unsigned int min_limit = 1;
> > +
> > +       if (ctl->limit == (typeof(ctl->limit))-1)
> > +               q->max_limit = max(qdisc_dev(sch)->tx_queue_len, min_limit);
> > +       else if (ctl->limit < min_limit ||
> > +               ctl->limit > qdisc_dev(sch)->tx_queue_len)
> > +               return -EINVAL;
> > +       else
> > +               q->max_limit = ctl->limit;
> > +
> > +       return 0;
> > +}
> 
> Isn't q->max_limit same with sch->limit?
>

q->max_limit was intended to represent the maximum limit that Skbprio could
accomodate i.e the tx queue len of the device attached to the qdisc, to check
the limit parameter passed from userspace. I'll correct this in v3.
 
> Also, please avoid dev->tx_queue_len here, it may change
> independently of your qdisc change, unless you want to implement
> ops->change_tx_queue_len().

OK, will make this change.

^ permalink raw reply

* [PATCH net-next] selftests: forwarding: mirror_gre_vlan_bridge_1q: Unset rp_filter
From: Petr Machata @ 2018-06-25 23:20 UTC (permalink / raw)
  To: netdev, linux-kselftest; +Cc: davem, shuah

The IP addresses of tunnel endpoint at H3 are set at the VLAN device
$h3.555. Therefore when test_gretap_untagged_egress() sets vlan 555 to
egress untagged at $swp3, $h3's rp_filter rejects these packets. The
test then spuriously fails.

Therefore turn off net.ipv4.conf.{all, $h3}.rp_filter.

Fixes: 9c7c8a82442c ("selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 .../selftests/net/forwarding/mirror_gre_vlan_bridge_1q.sh        | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_vlan_bridge_1q.sh b/tools/testing/selftests/net/forwarding/mirror_gre_vlan_bridge_1q.sh
index 5dbc7a08f4bd..1ac5038ae256 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_vlan_bridge_1q.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_vlan_bridge_1q.sh
@@ -39,6 +39,12 @@ setup_prepare()
 	swp3=${NETIFS[p5]}
 	h3=${NETIFS[p6]}
 
+	# gt4's remote address is at $h3.555, not $h3. Thus the packets arriving
+	# directly to $h3 for test_gretap_untagged_egress() are rejected by
+	# rp_filter and the test spuriously fails.
+	sysctl_set net.ipv4.conf.all.rp_filter 0
+	sysctl_set net.ipv4.conf.$h3.rp_filter 0
+
 	vrf_prepare
 	mirror_gre_topo_create
 
@@ -65,6 +71,9 @@ cleanup()
 
 	mirror_gre_topo_destroy
 	vrf_cleanup
+
+	sysctl_restore net.ipv4.conf.$h3.rp_filter
+	sysctl_restore net.ipv4.conf.all.rp_filter
 }
 
 test_vlan_match()
-- 
2.4.11

^ permalink raw reply related

* Re: [PATCH net-next 3/4] netdevsim: add ipsec offload testing
From: Jakub Kicinski @ 2018-06-25 23:09 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: davem, netdev, anders.roxell, linux-kselftest
In-Reply-To: <c9658422-13a3-b926-cc18-c759a1a0d343@oracle.com>

On Mon, 25 Jun 2018 15:37:10 -0700, Shannon Nelson wrote:
> On 6/22/2018 9:07 PM, Jakub Kicinski wrote:
> > On Fri, 22 Jun 2018 17:31:37 -0700, Shannon Nelson wrote:  
> >> Implement the IPsec/XFRM offload API for testing.
> >>
> >> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>  
> >> +#define NSIM_IPSEC_AUTH_BITS	128
> >> +
> >> +/**
> >> + * nsim_ipsec_dbg_read - read for ipsec data
> >> + * @filp: the opened file
> >> + * @buffer: where to write the data for the user to read
> >> + * @count: the size of the user's buffer
> >> + * @ppos: file position offset
> >> + **/
> >> +static ssize_t nsim_dbg_netdev_ops_read(struct file *filp,  
> > 
> > Doesn't match the kdoc.  Please run
> > 
> > ./scripts/kernel-doc -none $file
> > 
> > if you want kdoc.  Although IMHO you may as well drop the kdoc, your
> > code is quite self explanatory and local.  
> 
> By adding -v to that I got a couple of warnings that I didn't include 
> the Return information - is that what you were commenting on?  The rest 
> seems acceptable to the script I'm using from the net-next tree.

Hm, strange.  Two things: first kdoc requires () after function name,
second the function is called nsim_dbg_netdev_ops_read() while the doc
refers to nsim_ipsec_dbg_read().  Perhaps the combination of the two
makes the script miss the problem. 

> >> +					char __user *buffer,
> >> +					size_t count, loff_t *ppos)
> >> +{
> >> +	struct netdevsim *ns = filp->private_data;
> >> +	struct nsim_ipsec *ipsec = &ns->ipsec;
> >> +	size_t bufsize;
> >> +	char *buf, *p;
> >> +	int len;
> >> +	int i;
> >> +
> >> +	/* don't allow partial reads */
> >> +	if (*ppos != 0)
> >> +		return 0;
> >> +
> >> +	/* the buffer needed is
> >> +	 * (num SAs * 3 lines each * ~60 bytes per line) + one more line
> >> +	 */
> >> +	bufsize = (ipsec->count * 4 * 60) + 60;
> >> +	buf = kzalloc(bufsize, GFP_KERNEL);
> >> +	if (!buf)
> >> +		return -ENOMEM;
> >> +
> >> +	p = buf;
> >> +	p += snprintf(p, bufsize - (p - buf),
> >> +		      "SA count=%u tx=%u\n",
> >> +		      ipsec->count, ipsec->tx);
> >> +
> >> +	for (i = 0; i < NSIM_IPSEC_MAX_SA_COUNT; i++) {
> >> +		struct nsim_sa *sap = &ipsec->sa[i];
> >> +
> >> +		if (!sap->used)
> >> +			continue;
> >> +
> >> +		p += snprintf(p, bufsize - (p - buf),
> >> +			      "sa[%i] %cx ipaddr=0x%08x %08x %08x %08x\n",
> >> +			      i, (sap->rx ? 'r' : 't'), sap->ipaddr[0],
> >> +			      sap->ipaddr[1], sap->ipaddr[2], sap->ipaddr[3]);
> >> +		p += snprintf(p, bufsize - (p - buf),
> >> +			      "sa[%i]    spi=0x%08x proto=0x%x salt=0x%08x crypt=%d\n",
> >> +			      i, be32_to_cpu(sap->xs->id.spi),
> >> +			      sap->xs->id.proto, sap->salt, sap->crypt);
> >> +		p += snprintf(p, bufsize - (p - buf),
> >> +			      "sa[%i]    key=0x%08x %08x %08x %08x\n",
> >> +			      i, sap->key[0], sap->key[1],
> >> +			      sap->key[2], sap->key[3]);
> >> +	}
> >> +
> >> +	len = simple_read_from_buffer(buffer, count, ppos, buf, p - buf);  
> > 
> > Why not seq_file for this?  
> 
> Why bother with more interface code?  This is useful enough to support 
> the API testing needed.

No objection on this, seq_file is less error prone, but I don't mind.
FWIW you can drop the *ppos == 0 requirement, simple_read_from_buffer()
will handle other cases just fine.

> >> +	kfree(buf);
> >> +	return len;
> >> +}
> >> +
> >> +static const struct file_operations ipsec_dbg_fops = {
> >> +	.owner = THIS_MODULE,
> >> +	.open = simple_open,
> >> +	.read = nsim_dbg_netdev_ops_read,
> >> +};
> >> +
> >> +/**
> >> + * nsim_ipsec_find_empty_idx - find the first unused security parameter index
> >> + * @ipsec: pointer to ipsec struct
> >> + **/
> >> +static int nsim_ipsec_find_empty_idx(struct nsim_ipsec *ipsec)
> >> +{
> >> +	u32 i;
> >> +
> >> +	if (ipsec->count == NSIM_IPSEC_MAX_SA_COUNT)
> >> +		return -ENOSPC;
> >> +
> >> +	/* search sa table */
> >> +	for (i = 0; i < NSIM_IPSEC_MAX_SA_COUNT; i++) {
> >> +		if (!ipsec->sa[i].used)
> >> +			return i;
> >> +	}
> >> +
> >> +	return -ENOSPC;  
> > 
> > FWIW I personally find bitmaps and find_first_zero_bit() etc. nice and
> > concise for a small ID allocator, but no objection to open coding.  
> 
> Sure, we could add a parallel bitmap data structure to track usage of 
> our array elements, and probably would for a much larger array so as to 
> lessen the impact of a serial search.  But, since this is a short array 
> for simple testing purposes, the search time is minimal so I think the 
> simple code is fine.

Ack, no objection.

> >   
> >> +	} else if (key_len == 128) {
> >> +		*mysalt = 0;
> >> +	} else {
> >> +		netdev_err(dev, "IPsec hw offload only supports 128 bit keys with optional 32 bit salt\n");
> >> +		return -EINVAL;
> >> +	}
> >> +	memcpy(mykey, key_data, 16);
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * nsim_ipsec_add_sa - program device with a security association
> >> + * @xs: pointer to transformer state struct
> >> + **/
> >> +static int nsim_ipsec_add_sa(struct xfrm_state *xs)
> >> +{
> >> +	struct net_device *dev = xs->xso.dev;
> >> +	struct netdevsim *ns = netdev_priv(dev);
> >> +	struct nsim_ipsec *ipsec = &ns->ipsec;  
> > 
> > xmas tree again (initialize out of line if you have to)  
> 
> This one is pretty much the way I've done in the past with no complaints 
> and seems common enough in other net drivers, specifically when dealing 
> with netdev and netdevpriv elements.  Only the first line is out of 
> place, with the next lines dependent on it.

I know, but I'd really prefer you just followed the rule here.

> >> diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
> >> index 3a8581a..1708dee 100644
> >> --- a/drivers/net/netdevsim/netdevsim.h
> >> +++ b/drivers/net/netdevsim/netdevsim.h
> >> @@ -29,6 +29,29 @@ struct bpf_prog;
> >>   struct dentry;
> >>   struct nsim_vf_config;
> >>   
> >> +#if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
> >> +#define NSIM_IPSEC_MAX_SA_COUNT		33  
> > 
> > 33 caught my eye - out of curiosity is it 2^5 + 1 to catch some type of
> > bug or failure mode?  
> 
> For test rigs, I often use something like this to help flush out any 
> interesting power-of-two or alignment assumptions in the code.  I don't 
> expect anything here, but it doesn't hurt.

Cool, seems like a good idea anyway!

> >> +#define NSIM_IPSEC_VALID		BIT(31)
> >> +
> >> +struct nsim_sa {
> >> +	struct xfrm_state *xs;
> >> +	__be32 ipaddr[4];
> >> +	u32 key[4];
> >> +	u32 salt;
> >> +	bool used;
> >> +	bool crypt;
> >> +	bool rx;
> >> +};
> >> +
> >> +struct nsim_ipsec {
> >> +	struct nsim_sa sa[NSIM_IPSEC_MAX_SA_COUNT];
> >> +	struct dentry *pfile;
> >> +	u32 count;
> >> +	u32 tx;
> >> +	u32 ok;
> >> +};
> >> +#endif  
> > 
> > No need to wrap struct definitions in #if/#endif.  
> 
> I suppose this is a philosophical point... Since CONFIG_XFRM_OFFLOAD is 
> not yet a common config setting, I'd like to keep it here to not break 
> other folks' builds or dirty them up with unused struct definitions when 
> they aren't playing with IPsec offload anyway.

The fewer ifdefs in the code the better.  Both from pure LoC stand
point but also because someone may actually change things around
without having CONFIG_XFRM_OFFLOAD enabled and break *your*
configuration.  E.g. you are depending on indirect include of
net/xfrm.h, what if someone was to change headers around and broke that
implicit dependency?  The fewer ifdefs the better.

^ permalink raw reply

* Re: [PATCH 07/14] net: davinci_emac: use nvmem to retrieve the mac address
From: Grygorii Strashko @ 2018-06-25 23:09 UTC (permalink / raw)
  To: Bartosz Golaszewski, Sekhar Nori, Kevin Hilman, Russell King,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman
  Cc: netdev, linux-omap, linux-kernel, linux-arm-kernel,
	Bartosz Golaszewski
In-Reply-To: <20180625155025.12567-8-brgl@bgdev.pl>



On 06/25/2018 10:50 AM, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> 
> All users which store the MAC address in EEPROM now register relevant
> nvmem cells. Switch to retrieving the MAC address over the nvmem
> framework.
> 
> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> ---
>   drivers/net/ethernet/ti/davinci_emac.c | 29 +++++++++++++++++---------
>   1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
> index a1a6445b5a7e..22c2322e46be 100644
> --- a/drivers/net/ethernet/ti/davinci_emac.c
> +++ b/drivers/net/ethernet/ti/davinci_emac.c
> @@ -67,7 +67,7 @@
>   #include <linux/of_irq.h>
>   #include <linux/of_net.h>
>   #include <linux/mfd/syscon.h>
> -
> +#include <linux/nvmem-consumer.h>
>   #include <asm/irq.h>
>   #include <asm/page.h>
>   
> @@ -1696,7 +1696,6 @@ davinci_emac_of_get_pdata(struct platform_device *pdev, struct emac_priv *priv)
>   	const struct of_device_id *match;
>   	const struct emac_platform_data *auxdata;
>   	struct emac_platform_data *pdata = NULL;
> -	const u8 *mac_addr;
>   
>   	if (!IS_ENABLED(CONFIG_OF) || !pdev->dev.of_node)
>   		return dev_get_platdata(&pdev->dev);
> @@ -1708,12 +1707,6 @@ davinci_emac_of_get_pdata(struct platform_device *pdev, struct emac_priv *priv)
>   	np = pdev->dev.of_node;
>   	pdata->version = EMAC_VERSION_2;
>   
> -	if (!is_valid_ether_addr(pdata->mac_addr)) {
> -		mac_addr = of_get_mac_address(np);
> -		if (mac_addr)
> -			ether_addr_copy(pdata->mac_addr, mac_addr);
> -	}
> -

Not sure it is correct. of_get_mac_address() is called when board data doesn't
provide MAC address with expectation that MAC address is defined in DT (usually by u-boot).
Standard DT properties for MAC address "local-mac-address" and "mac-address".

So, you can't just drop it and replace with nvmem.

Two options are here:
1) try to read MAC-address from nvmem then try DT (as it's now)
2) try to read DT then try nvmem [then random MAC | fail]


>   	of_property_read_u32(np, "ti,davinci-ctrl-reg-offset",
>   			     &pdata->ctrl_reg_offset);
>   
> @@ -1783,7 +1776,9 @@ static int davinci_emac_probe(struct platform_device *pdev)
>   	struct cpdma_params dma_params;
>   	struct clk *emac_clk;
>   	unsigned long emac_bus_frequency;
> -
> +	struct nvmem_cell *cell;
> +	void *mac_addr;
> +	size_t mac_addr_len;
>   
>   	/* obtain emac clock from kernel */
>   	emac_clk = devm_clk_get(&pdev->dev, NULL);
> @@ -1815,8 +1810,22 @@ static int davinci_emac_probe(struct platform_device *pdev)
>   		goto err_free_netdev;
>   	}
>   
> +	cell = nvmem_cell_get(&pdev->dev, "mac-address");
> +	if (!IS_ERR(cell)) {
> +		mac_addr = nvmem_cell_read(cell, &mac_addr_len);
> +		if (!IS_ERR(mac_addr)) {
> +			if (is_valid_ether_addr(mac_addr)) {
> +				dev_info(&pdev->dev,
> +					 "Read MAC addr from EEPROM: %pM\n",
> +					 mac_addr);
> +				memcpy(priv->mac_addr, mac_addr, ETH_ALEN);
> +			}
> +			kfree(mac_addr);
> +		}
> +		nvmem_cell_put(cell);
> +	}
> +
>   	/* MAC addr and PHY mask , RMII enable info from platform_data */
> -	memcpy(priv->mac_addr, pdata->mac_addr, ETH_ALEN);
>   	priv->phy_id = pdata->phy_id;
>   	priv->rmii_en = pdata->rmii_en;
>   	priv->version = pdata->version;
> 

-- 
regards,
-grygorii

^ permalink raw reply

* [net-next PATCH v4 7/7] Documentation: Add explanation for XPS using Rx-queue(s) map
From: Amritha Nambiar @ 2018-06-25 18:04 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom
In-Reply-To: <152994950582.9733.3330634251364177102.stgit@anamhost.jf.intel.com>

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 Documentation/ABI/testing/sysfs-class-net-queues |   11 ++++
 Documentation/networking/scaling.txt             |   57 ++++++++++++++++++----
 2 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-net-queues b/Documentation/ABI/testing/sysfs-class-net-queues
index 0c0df91..978b763 100644
--- a/Documentation/ABI/testing/sysfs-class-net-queues
+++ b/Documentation/ABI/testing/sysfs-class-net-queues
@@ -42,6 +42,17 @@ Description:
 		network device transmit queue. Possible vaules depend on the
 		number of available CPU(s) in the system.
 
+What:		/sys/class/<iface>/queues/tx-<queue>/xps_rxqs
+Date:		June 2018
+KernelVersion:	4.18.0
+Contact:	netdev@vger.kernel.org
+Description:
+		Mask of the receive queue(s) currently enabled to participate
+		into the Transmit Packet Steering packet processing flow for this
+		network device transmit queue. Possible values depend on the
+		number of available receive queue(s) in the network device.
+		Default is disabled.
+
 What:		/sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/hold_time
 Date:		November 2011
 KernelVersion:	3.3
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index f55639d..8336116 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -366,8 +366,13 @@ XPS: Transmit Packet Steering
 
 Transmit Packet Steering is a mechanism for intelligently selecting
 which transmit queue to use when transmitting a packet on a multi-queue
-device. To accomplish this, a mapping from CPU to hardware queue(s) is
-recorded. The goal of this mapping is usually to assign queues
+device. This can be accomplished by recording two kinds of maps, either
+a mapping of CPU to hardware queue(s) or a mapping of receive queue(s)
+to hardware transmit queue(s).
+
+1. XPS using CPUs map
+
+The goal of this mapping is usually to assign queues
 exclusively to a subset of CPUs, where the transmit completions for
 these queues are processed on a CPU within this set. This choice
 provides two benefits. First, contention on the device queue lock is
@@ -377,12 +382,35 @@ transmit queue). Secondly, cache miss rate on transmit completion is
 reduced, in particular for data cache lines that hold the sk_buff
 structures.
 
-XPS is configured per transmit queue by setting a bitmap of CPUs that
-may use that queue to transmit. The reverse mapping, from CPUs to
-transmit queues, is computed and maintained for each network device.
-When transmitting the first packet in a flow, the function
-get_xps_queue() is called to select a queue. This function uses the ID
-of the running CPU as a key into the CPU-to-queue lookup table. If the
+2. XPS using receive queues map
+
+This mapping is used to pick transmit queue based on the receive
+queue(s) map configuration set by the administrator. A set of receive
+queues can be mapped to a set of transmit queues (many:many), although
+the common use case is a 1:1 mapping. This will enable sending packets
+on the same queue associations for transmit and receive. This is useful for
+busy polling multi-threaded workloads where there are challenges in
+associating a given CPU to a given application thread. The application
+threads are not pinned to CPUs and each thread handles packets
+received on a single queue. The receive queue number is cached in the
+socket for the connection. In this model, sending the packets on the same
+transmit queue corresponding to the associated receive queue has benefits
+in keeping the CPU overhead low. Transmit completion work is locked into
+the same queue-association that a given application is polling on. This
+avoids the overhead of triggering an interrupt on another CPU. When the
+application cleans up the packets during the busy poll, transmit completion
+may be processed along with it in the same thread context and so result in
+reduced latency.
+
+XPS is configured per transmit queue by setting a bitmap of
+CPUs/receive-queues that may use that queue to transmit. The reverse
+mapping, from CPUs to transmit queues or from receive-queues to transmit
+queues, is computed and maintained for each network device. When
+transmitting the first packet in a flow, the function get_xps_queue() is
+called to select a queue. This function uses the ID of the receive queue
+for the socket connection for a match in the receive queue-to-transmit queue
+lookup table. Alternatively, this function can also use the ID of the
+running CPU as a key into the CPU-to-queue lookup table. If the
 ID matches a single queue, that is used for transmission. If multiple
 queues match, one is selected by using the flow hash to compute an index
 into the set.
@@ -404,11 +432,15 @@ acknowledged.
 
 XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
 default for SMP). The functionality remains disabled until explicitly
-configured. To enable XPS, the bitmap of CPUs that may use a transmit
-queue is configured using the sysfs file entry:
+configured. To enable XPS, the bitmap of CPUs/receive-queues that may
+use a transmit queue is configured using the sysfs file entry:
 
+For selection based on CPUs map:
 /sys/class/net/<dev>/queues/tx-<n>/xps_cpus
 
+For selection based on receive-queues map:
+/sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
+
 == Suggested Configuration
 
 For a network device with a single transmission queue, XPS configuration
@@ -421,6 +453,11 @@ best CPUs to share a given queue are probably those that share the cache
 with the CPU that processes transmit completions for that queue
 (transmit interrupts).
 
+For transmit queue selection based on receive queue(s), XPS has to be
+explicitly configured mapping receive-queue(s) to transmit queue(s). If the
+user configuration for receive-queue map does not apply, then the transmit
+queue is selected based on the CPUs map.
+
 Per TX Queue rate limitation:
 =============================
 

^ permalink raw reply related

* [net-next PATCH v4 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue
From: Amritha Nambiar @ 2018-06-25 18:04 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom
In-Reply-To: <152994950582.9733.3330634251364177102.stgit@anamhost.jf.intel.com>

Extend transmit queue sysfs attribute to configure Rx queue(s) map
per Tx queue. By default no receive queues are configured for the
Tx queue.

- /sys/class/net/eth0/queues/tx-*/xps_rxqs

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 net/core/net-sysfs.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b39987c..5d2ed02 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1283,6 +1283,86 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
 
 static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
 	= __ATTR_RW(xps_cpus);
+
+static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
+{
+	struct net_device *dev = queue->dev;
+	struct xps_dev_maps *dev_maps;
+	unsigned long *mask, index;
+	int j, len, num_tc = 1, tc = 0;
+
+	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
+		       GFP_KERNEL);
+	if (!mask)
+		return -ENOMEM;
+
+	index = get_netdev_queue_index(queue);
+
+	if (dev->num_tc) {
+		num_tc = dev->num_tc;
+		tc = netdev_txq_to_tc(dev, index);
+		if (tc < 0)
+			return -EINVAL;
+	}
+
+	rcu_read_lock();
+	dev_maps = rcu_dereference(dev->xps_rxqs_map);
+	if (dev_maps) {
+		for (j = -1; j = attrmask_next(j, NULL, dev->num_rx_queues),
+		     j < dev->num_rx_queues;) {
+			int i, tci = j * num_tc + tc;
+			struct xps_map *map;
+
+			map = rcu_dereference(dev_maps->attr_map[tci]);
+			if (!map)
+				continue;
+
+			for (i = map->len; i--;) {
+				if (map->queues[i] == index) {
+					set_bit(j, mask);
+					break;
+				}
+			}
+		}
+	}
+
+	len = bitmap_print_to_pagebuf(false, buf, mask, dev->num_rx_queues);
+	rcu_read_unlock();
+	kfree(mask);
+
+	return len < PAGE_SIZE ? len : -EINVAL;
+}
+
+static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
+			      size_t len)
+{
+	struct net_device *dev = queue->dev;
+	unsigned long *mask, index;
+	int err;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
+		       GFP_KERNEL);
+	if (!mask)
+		return -ENOMEM;
+
+	index = get_netdev_queue_index(queue);
+
+	err = bitmap_parse(buf, len, mask, dev->num_rx_queues);
+	if (err) {
+		kfree(mask);
+		return err;
+	}
+
+	err = __netif_set_xps_queue(dev, mask, index, true);
+	kfree(mask);
+	return err ? : len;
+}
+
+static struct netdev_queue_attribute xps_rxqs_attribute __ro_after_init
+	= __ATTR_RW(xps_rxqs);
 #endif /* CONFIG_XPS */
 
 static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
@@ -1290,6 +1370,7 @@ static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
 	&queue_traffic_class.attr,
 #ifdef CONFIG_XPS
 	&xps_cpus_attribute.attr,
+	&xps_rxqs_attribute.attr,
 	&queue_tx_maxrate.attr,
 #endif
 	NULL

^ permalink raw reply related

* [net-next PATCH v4 5/7] net: Enable Tx queue selection based on Rx queues
From: Amritha Nambiar @ 2018-06-25 18:04 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom
In-Reply-To: <152994950582.9733.3330634251364177102.stgit@anamhost.jf.intel.com>

This patch adds support to pick Tx queue based on the Rx queue(s) map
configuration set by the admin through the sysfs attribute
for each Tx queue. If the user configuration for receive queue(s) map
does not apply, then the Tx queue selection falls back to CPU(s) map
based selection and finally to hashing.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/net/sock.h |    4 +++
 net/core/dev.c     |   62 ++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 49 insertions(+), 17 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 0ff4416..cb18139 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1710,6 +1710,10 @@ static inline void sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb)
 #endif
 }
 
+static inline int sk_rx_queue_get(const struct sock *sk)
+{
+	return sk ? sk->sk_rx_queue_mapping - 1 : -1;
+}
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 {
 	sk_tx_queue_clear(sk);
diff --git a/net/core/dev.c b/net/core/dev.c
index df2a78d..2450c5e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3454,35 +3454,63 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 }
 #endif /* CONFIG_NET_EGRESS */
 
-static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
+#ifdef CONFIG_XPS
+static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
+			       struct xps_dev_maps *dev_maps, unsigned int tci)
+{
+	struct xps_map *map;
+	int queue_index = -1;
+
+	if (dev->num_tc) {
+		tci *= dev->num_tc;
+		tci += netdev_get_prio_tc_map(dev, skb->priority);
+	}
+
+	map = rcu_dereference(dev_maps->attr_map[tci]);
+	if (map) {
+		if (map->len == 1)
+			queue_index = map->queues[0];
+		else
+			queue_index = map->queues[reciprocal_scale(
+						skb_get_hash(skb), map->len)];
+		if (unlikely(queue_index >= dev->real_num_tx_queues))
+			queue_index = -1;
+	}
+	return queue_index;
+}
+#endif
+
+static int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 {
 #ifdef CONFIG_XPS
 	struct xps_dev_maps *dev_maps;
-	struct xps_map *map;
+	struct sock *sk = skb->sk;
 	int queue_index = -1;
 
 	if (!static_key_false(&xps_needed))
 		return -1;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_cpus_map);
+	if (!static_key_false(&xps_rxqs_needed))
+		goto get_cpus_map;
+
+	dev_maps = rcu_dereference(dev->xps_rxqs_map);
 	if (dev_maps) {
-		unsigned int tci = skb->sender_cpu - 1;
+		int tci = sk_rx_queue_get(sk);
 
-		if (dev->num_tc) {
-			tci *= dev->num_tc;
-			tci += netdev_get_prio_tc_map(dev, skb->priority);
-		}
+		if (tci >= 0 && tci < dev->num_rx_queues)
+			queue_index = __get_xps_queue_idx(dev, skb, dev_maps,
+							  tci);
+	}
 
-		map = rcu_dereference(dev_maps->attr_map[tci]);
-		if (map) {
-			if (map->len == 1)
-				queue_index = map->queues[0];
-			else
-				queue_index = map->queues[reciprocal_scale(skb_get_hash(skb),
-									   map->len)];
-			if (unlikely(queue_index >= dev->real_num_tx_queues))
-				queue_index = -1;
+get_cpus_map:
+	if (queue_index < 0) {
+		dev_maps = rcu_dereference(dev->xps_cpus_map);
+		if (dev_maps) {
+			unsigned int tci = skb->sender_cpu - 1;
+
+			queue_index = __get_xps_queue_idx(dev, skb, dev_maps,
+							  tci);
 		}
 	}
 	rcu_read_unlock();

^ permalink raw reply related

* [net-next PATCH v4 4/7] net: Record receive queue number for a connection
From: Amritha Nambiar @ 2018-06-25 18:04 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom
In-Reply-To: <152994950582.9733.3330634251364177102.stgit@anamhost.jf.intel.com>

This patch adds a new field to sock_common 'skc_rx_queue_mapping'
which holds the receive queue number for the connection. The Rx queue
is marked in tcp_finish_connect() to allow a client app to do
SO_INCOMING_NAPI_ID after a connect() call to get the right queue
association for a socket. Rx queue is also marked in tcp_conn_request()
to allow syn-ack to go on the right tx-queue associated with
the queue on which syn is received.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 include/net/busy_poll.h |    1 +
 include/net/sock.h      |   14 ++++++++++++++
 net/core/sock.c         |    4 ++++
 net/ipv4/tcp_input.c    |    3 +++
 4 files changed, 22 insertions(+)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index c518743..9e36fda6 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -151,6 +151,7 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	sk->sk_napi_id = skb->napi_id;
 #endif
+	sk_rx_queue_set(sk, skb);
 }
 
 /* variant used for unconnected sockets */
diff --git a/include/net/sock.h b/include/net/sock.h
index 009fd30..0ff4416 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -139,6 +139,7 @@ typedef __u64 __bitwise __addrpair;
  *	@skc_node: main hash linkage for various protocol lookup tables
  *	@skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol
  *	@skc_tx_queue_mapping: tx queue number for this connection
+ *	@skc_rx_queue_mapping: rx queue number for this connection
  *	@skc_flags: place holder for sk_flags
  *		%SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
  *		%SO_OOBINLINE settings, %SO_TIMESTAMPING settings
@@ -215,6 +216,9 @@ struct sock_common {
 		struct hlist_nulls_node skc_nulls_node;
 	};
 	unsigned short		skc_tx_queue_mapping;
+#ifdef CONFIG_XPS
+	unsigned short		skc_rx_queue_mapping;
+#endif
 	union {
 		int		skc_incoming_cpu;
 		u32		skc_rcv_wnd;
@@ -326,6 +330,9 @@ struct sock {
 #define sk_nulls_node		__sk_common.skc_nulls_node
 #define sk_refcnt		__sk_common.skc_refcnt
 #define sk_tx_queue_mapping	__sk_common.skc_tx_queue_mapping
+#ifdef CONFIG_XPS
+#define sk_rx_queue_mapping	__sk_common.skc_rx_queue_mapping
+#endif
 
 #define sk_dontcopy_begin	__sk_common.skc_dontcopy_begin
 #define sk_dontcopy_end		__sk_common.skc_dontcopy_end
@@ -1696,6 +1703,13 @@ static inline int sk_tx_queue_get(const struct sock *sk)
 	return sk ? sk->sk_tx_queue_mapping - 1 : -1;
 }
 
+static inline void sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb)
+{
+#ifdef CONFIG_XPS
+	sk->sk_rx_queue_mapping = skb_get_rx_queue(skb) + 1;
+#endif
+}
+
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 {
 	sk_tx_queue_clear(sk);
diff --git a/net/core/sock.c b/net/core/sock.c
index bcc4182..5e4715b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2818,6 +2818,10 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_pacing_rate = ~0U;
 	sk->sk_pacing_shift = 10;
 	sk->sk_incoming_cpu = -1;
+
+#ifdef CONFIG_XPS
+	sk->sk_rx_queue_mapping = ~0;
+#endif
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
 	 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 76ca88f..c404c53 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -78,6 +78,7 @@
 #include <linux/errqueue.h>
 #include <trace/events/tcp.h>
 #include <linux/static_key.h>
+#include <net/busy_poll.h>
 
 int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 
@@ -5584,6 +5585,7 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
 	if (skb) {
 		icsk->icsk_af_ops->sk_rx_dst_set(sk, skb);
 		security_inet_conn_established(sk, skb);
+		sk_mark_napi_id(sk, skb);
 	}
 
 	tcp_init_transfer(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
@@ -6412,6 +6414,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	tcp_rsk(req)->snt_isn = isn;
 	tcp_rsk(req)->txhash = net_tx_rndhash();
 	tcp_openreq_init_rwin(req, sk, dst);
+	sk_rx_queue_set(req_to_sk(req), skb);
 	if (!want_cookie) {
 		tcp_reqsk_record_syn(sk, req, skb);
 		fastopen_sk = tcp_try_fastopen(sk, skb, req, &foc, dst);

^ permalink raw reply related

* [net-next PATCH v4 3/7] net: sock: Change tx_queue_mapping in sock_common to unsigned short
From: Amritha Nambiar @ 2018-06-25 18:04 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom
In-Reply-To: <152994950582.9733.3330634251364177102.stgit@anamhost.jf.intel.com>

Change 'skc_tx_queue_mapping' field in sock_common structure from
'int' to 'unsigned short' type with 0 indicating unset and
a positive queue value being set. This way it is consistent with
the queue_mapping field in the sk_buff. This will also accommodate
adding a new 'unsigned short' field in sock_common in the next
patch for rx_queue_mapping.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/net/sock.h |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index b3b7541..009fd30 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -214,7 +214,7 @@ struct sock_common {
 		struct hlist_node	skc_node;
 		struct hlist_nulls_node skc_nulls_node;
 	};
-	int			skc_tx_queue_mapping;
+	unsigned short		skc_tx_queue_mapping;
 	union {
 		int		skc_incoming_cpu;
 		u32		skc_rcv_wnd;
@@ -1681,17 +1681,19 @@ static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb,
 
 static inline void sk_tx_queue_set(struct sock *sk, int tx_queue)
 {
-	sk->sk_tx_queue_mapping = tx_queue;
+	/* sk_tx_queue_mapping accept only upto a 16-bit value */
+	WARN_ON((unsigned short)tx_queue > USHRT_MAX);
+	sk->sk_tx_queue_mapping = tx_queue + 1;
 }
 
 static inline void sk_tx_queue_clear(struct sock *sk)
 {
-	sk->sk_tx_queue_mapping = -1;
+	sk->sk_tx_queue_mapping = 0;
 }
 
 static inline int sk_tx_queue_get(const struct sock *sk)
 {
-	return sk ? sk->sk_tx_queue_mapping : -1;
+	return sk ? sk->sk_tx_queue_mapping - 1 : -1;
 }
 
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)

^ permalink raw reply related

* [net-next PATCH v4 2/7] net: Use static_key for XPS maps
From: Amritha Nambiar @ 2018-06-25 18:04 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom
In-Reply-To: <152994950582.9733.3330634251364177102.stgit@anamhost.jf.intel.com>

Use static_key for XPS maps to reduce the cost of extra map checks,
similar to how it is used for RPS and RFS. This includes static_key
'xps_needed' for XPS and another for 'xps_rxqs_needed' for XPS using
Rx queues map.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 net/core/dev.c |   26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 2552556..df2a78d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2081,6 +2081,10 @@ int netdev_txq_to_tc(struct net_device *dev, unsigned int txq)
 EXPORT_SYMBOL(netdev_txq_to_tc);
 
 #ifdef CONFIG_XPS
+struct static_key xps_needed __read_mostly;
+EXPORT_SYMBOL(xps_needed);
+struct static_key xps_rxqs_needed __read_mostly;
+EXPORT_SYMBOL(xps_rxqs_needed);
 static DEFINE_MUTEX(xps_map_mutex);
 #define xmap_dereference(P)		\
 	rcu_dereference_protected((P), lockdep_is_held(&xps_map_mutex))
@@ -2170,12 +2174,14 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 
 	mutex_lock(&xps_map_mutex);
 
-	dev_maps = xmap_dereference(dev->xps_rxqs_map);
-	if (dev_maps) {
-		nr_ids = dev->num_rx_queues;
-		clean_xps_maps(dev, possible_mask, dev_maps, nr_ids, offset,
-			       count, true);
-
+	if (static_key_false(&xps_rxqs_needed)) {
+		dev_maps = xmap_dereference(dev->xps_rxqs_map);
+		if (dev_maps) {
+			nr_ids = dev->num_rx_queues;
+			clean_xps_maps(dev, possible_mask, dev_maps, nr_ids,
+				       offset, count, true);
+		}
+		static_key_slow_dec(&xps_rxqs_needed);
 	}
 
 	dev_maps = xmap_dereference(dev->xps_cpus_map);
@@ -2189,6 +2195,7 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 		       false);
 
 out_no_maps:
+	static_key_slow_dec(&xps_needed);
 	mutex_unlock(&xps_map_mutex);
 }
 
@@ -2297,6 +2304,10 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	if (!new_dev_maps)
 		goto out_no_new_maps;
 
+	static_key_slow_inc(&xps_needed);
+	if (is_rxqs_map)
+		static_key_slow_inc(&xps_rxqs_needed);
+
 	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
 	     j < nr_ids;) {
 		/* copy maps belonging to foreign traffic classes */
@@ -3450,6 +3461,9 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 	struct xps_map *map;
 	int queue_index = -1;
 
+	if (!static_key_false(&xps_needed))
+		return -1;
+
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {

^ permalink raw reply related

* [net-next PATCH v4 1/7] net: Refactor XPS for CPUs and Rx queues
From: Amritha Nambiar @ 2018-06-25 18:04 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom
In-Reply-To: <152994950582.9733.3330634251364177102.stgit@anamhost.jf.intel.com>

Refactor XPS code to support Tx queue selection based on
CPU(s) map or Rx queue(s) map.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/linux/cpumask.h   |   11 ++
 include/linux/netdevice.h |  100 +++++++++++++++++++++
 net/core/dev.c            |  211 ++++++++++++++++++++++++++++++---------------
 net/core/net-sysfs.c      |    4 -
 4 files changed, 246 insertions(+), 80 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index bf53d89..57f20a0 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -115,12 +115,17 @@ extern struct cpumask __cpu_active_mask;
 #define cpu_active(cpu)		((cpu) == 0)
 #endif
 
-/* verify cpu argument to cpumask_* operators */
-static inline unsigned int cpumask_check(unsigned int cpu)
+static inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits)
 {
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
-	WARN_ON_ONCE(cpu >= nr_cpumask_bits);
+	WARN_ON_ONCE(cpu >= bits);
 #endif /* CONFIG_DEBUG_PER_CPU_MAPS */
+}
+
+/* verify cpu argument to cpumask_* operators */
+static inline unsigned int cpumask_check(unsigned int cpu)
+{
+	cpu_max_bits_warn(cpu, nr_cpumask_bits);
 	return cpu;
 }
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3ec9850..c534f03 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -730,10 +730,15 @@ struct xps_map {
  */
 struct xps_dev_maps {
 	struct rcu_head rcu;
-	struct xps_map __rcu *cpu_map[0];
+	struct xps_map __rcu *attr_map[0]; /* Either CPUs map or RXQs map */
 };
-#define XPS_DEV_MAPS_SIZE(_tcs) (sizeof(struct xps_dev_maps) +		\
+
+#define XPS_CPU_DEV_MAPS_SIZE(_tcs) (sizeof(struct xps_dev_maps) +	\
 	(nr_cpu_ids * (_tcs) * sizeof(struct xps_map *)))
+
+#define XPS_RXQ_DEV_MAPS_SIZE(_tcs, _rxqs) (sizeof(struct xps_dev_maps) +\
+	(_rxqs * (_tcs) * sizeof(struct xps_map *)))
+
 #endif /* CONFIG_XPS */
 
 #define TC_MAX_QUEUE	16
@@ -1909,7 +1914,8 @@ struct net_device {
 	int			watchdog_timeo;
 
 #ifdef CONFIG_XPS
-	struct xps_dev_maps __rcu *xps_maps;
+	struct xps_dev_maps __rcu *xps_cpus_map;
+	struct xps_dev_maps __rcu *xps_rxqs_map;
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc __rcu	*miniq_egress;
@@ -3258,6 +3264,94 @@ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
 #ifdef CONFIG_XPS
 int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			u16 index);
+int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
+			  u16 index, bool is_rxqs_map);
+
+/**
+ *	attr_test_mask - Test a CPU or Rx queue set in a cpumask/rx queues mask
+ *	@j: CPU/Rx queue index
+ *	@mask: bitmask of all cpus/rx queues
+ *	@nr_bits: number of bits in the bitmask
+ *
+ * Test if a CPU or Rx queue index is set in a mask of all CPU/Rx queues.
+ */
+static inline bool attr_test_mask(unsigned long j, const unsigned long *mask,
+				  unsigned int nr_bits)
+{
+	cpu_max_bits_warn(j, nr_bits);
+	return test_bit(j, mask);
+}
+
+/**
+ *	attr_test_online - Test for online CPU/Rx queue
+ *	@j: CPU/Rx queue index
+ *	@online_mask: bitmask for CPUs/Rx queues that are online
+ *	@nr_bits: number of bits in the bitmask
+ *
+ * Returns true if a CPU/Rx queue is online.
+ */
+static inline bool attr_test_online(unsigned long j,
+				    const unsigned long *online_mask,
+				    unsigned int nr_bits)
+{
+	cpu_max_bits_warn(j, nr_bits);
+
+	if (online_mask)
+		return test_bit(j, online_mask);
+
+	if (j >= 0 && j < nr_bits)
+		return true;
+
+	return false;
+}
+
+/**
+ *	attrmask_next - get the next CPU/Rx queue in a cpumask/Rx queues mask
+ *	@n: CPU/Rx queue index
+ *	@srcp: the cpumask/Rx queue mask pointer
+ *	@nr_bits: number of bits in the bitmask
+ *
+ * Returns >= nr_bits if no further CPUs/Rx queues set.
+ */
+static inline unsigned int attrmask_next(int n, const unsigned long *srcp,
+					 unsigned int nr_bits)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpu_max_bits_warn(n, nr_bits);
+
+	if (srcp)
+		return find_next_bit(srcp, nr_bits, n + 1);
+
+	return n + 1;
+}
+
+/**
+ *	attrmask_next_and - get the next CPU/Rx queue in *src1p & *src2p
+ *	@n: CPU/Rx queue index
+ *	@src1p: the first CPUs/Rx queues mask pointer
+ *	@src2p: the second CPUs/Rx queues mask pointer
+ *	@nr_bits: number of bits in the bitmask
+ *
+ * Returns >= nr_bits if no further CPUs/Rx queues set in both.
+ */
+static inline int attrmask_next_and(int n, const unsigned long *src1p,
+				    const unsigned long *src2p,
+				    unsigned int nr_bits)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpu_max_bits_warn(n, nr_bits);
+
+	if (src1p && src2p)
+		return find_next_and_bit(src1p, src2p, nr_bits, n + 1);
+	else if (src1p)
+		return find_next_bit(src1p, nr_bits, n + 1);
+	else if (src2p)
+		return find_next_bit(src2p, nr_bits, n + 1);
+
+	return n + 1;
+}
 #else
 static inline int netif_set_xps_queue(struct net_device *dev,
 				      const struct cpumask *mask,
diff --git a/net/core/dev.c b/net/core/dev.c
index a5aa1c7..2552556 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2092,7 +2092,7 @@ static bool remove_xps_queue(struct xps_dev_maps *dev_maps,
 	int pos;
 
 	if (dev_maps)
-		map = xmap_dereference(dev_maps->cpu_map[tci]);
+		map = xmap_dereference(dev_maps->attr_map[tci]);
 	if (!map)
 		return false;
 
@@ -2105,7 +2105,7 @@ static bool remove_xps_queue(struct xps_dev_maps *dev_maps,
 			break;
 		}
 
-		RCU_INIT_POINTER(dev_maps->cpu_map[tci], NULL);
+		RCU_INIT_POINTER(dev_maps->attr_map[tci], NULL);
 		kfree_rcu(map, rcu);
 		return false;
 	}
@@ -2135,31 +2135,58 @@ static bool remove_xps_queue_cpu(struct net_device *dev,
 	return active;
 }
 
+static void clean_xps_maps(struct net_device *dev, const unsigned long *mask,
+			   struct xps_dev_maps *dev_maps, unsigned int nr_ids,
+			   u16 offset, u16 count, bool is_rxqs_map)
+{
+	bool active = false;
+	int i, j;
+
+	for (j = -1; j = attrmask_next(j, mask, nr_ids),
+	     j < nr_ids;)
+		active |= remove_xps_queue_cpu(dev, dev_maps, j, offset,
+					       count);
+	if (!active) {
+		if (is_rxqs_map) {
+			RCU_INIT_POINTER(dev->xps_rxqs_map, NULL);
+		} else {
+			RCU_INIT_POINTER(dev->xps_cpus_map, NULL);
+
+			for (i = offset + (count - 1); count--; i--)
+				netdev_queue_numa_node_write(
+					netdev_get_tx_queue(dev, i),
+							NUMA_NO_NODE);
+		}
+		kfree_rcu(dev_maps, rcu);
+	}
+}
+
 static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 				   u16 count)
 {
+	const unsigned long *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps;
-	int cpu, i;
-	bool active = false;
+	unsigned int nr_ids;
 
 	mutex_lock(&xps_map_mutex);
-	dev_maps = xmap_dereference(dev->xps_maps);
 
-	if (!dev_maps)
-		goto out_no_maps;
-
-	for_each_possible_cpu(cpu)
-		active |= remove_xps_queue_cpu(dev, dev_maps, cpu,
-					       offset, count);
+	dev_maps = xmap_dereference(dev->xps_rxqs_map);
+	if (dev_maps) {
+		nr_ids = dev->num_rx_queues;
+		clean_xps_maps(dev, possible_mask, dev_maps, nr_ids, offset,
+			       count, true);
 
-	if (!active) {
-		RCU_INIT_POINTER(dev->xps_maps, NULL);
-		kfree_rcu(dev_maps, rcu);
 	}
 
-	for (i = offset + (count - 1); count--; i--)
-		netdev_queue_numa_node_write(netdev_get_tx_queue(dev, i),
-					     NUMA_NO_NODE);
+	dev_maps = xmap_dereference(dev->xps_cpus_map);
+	if (!dev_maps)
+		goto out_no_maps;
+
+	if (num_possible_cpus() > 1)
+		possible_mask = cpumask_bits(cpu_possible_mask);
+	nr_ids = nr_cpu_ids;
+	clean_xps_maps(dev, possible_mask, dev_maps, nr_ids, offset, count,
+		       false);
 
 out_no_maps:
 	mutex_unlock(&xps_map_mutex);
@@ -2170,8 +2197,8 @@ static void netif_reset_xps_queues_gt(struct net_device *dev, u16 index)
 	netif_reset_xps_queues(dev, index, dev->num_tx_queues - index);
 }
 
-static struct xps_map *expand_xps_map(struct xps_map *map,
-				      int cpu, u16 index)
+static struct xps_map *expand_xps_map(struct xps_map *map, int attr_index,
+				      u16 index, bool is_rxqs_map)
 {
 	struct xps_map *new_map;
 	int alloc_len = XPS_MIN_MAP_ALLOC;
@@ -2183,7 +2210,7 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 		return map;
 	}
 
-	/* Need to add queue to this CPU's existing map */
+	/* Need to add tx-queue to this CPU's/rx-queue's existing map */
 	if (map) {
 		if (pos < map->alloc_len)
 			return map;
@@ -2191,9 +2218,14 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 		alloc_len = map->alloc_len * 2;
 	}
 
-	/* Need to allocate new map to store queue on this CPU's map */
-	new_map = kzalloc_node(XPS_MAP_SIZE(alloc_len), GFP_KERNEL,
-			       cpu_to_node(cpu));
+	/* Need to allocate new map to store tx-queue on this CPU's/rx-queue's
+	 *  map
+	 */
+	if (is_rxqs_map)
+		new_map = kzalloc(XPS_MAP_SIZE(alloc_len), GFP_KERNEL);
+	else
+		new_map = kzalloc_node(XPS_MAP_SIZE(alloc_len), GFP_KERNEL,
+				       cpu_to_node(attr_index));
 	if (!new_map)
 		return NULL;
 
@@ -2205,14 +2237,16 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 	return new_map;
 }
 
-int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
-			u16 index)
+int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
+			  u16 index, bool is_rxqs_map)
 {
+	const unsigned long *online_mask = NULL, *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps, *new_dev_maps = NULL;
-	int i, cpu, tci, numa_node_id = -2;
+	int i, j, tci, numa_node_id = -2;
 	int maps_sz, num_tc = 1, tc = 0;
 	struct xps_map *map, *new_map;
 	bool active = false;
+	unsigned int nr_ids;
 
 	if (dev->num_tc) {
 		num_tc = dev->num_tc;
@@ -2221,16 +2255,27 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			return -EINVAL;
 	}
 
-	maps_sz = XPS_DEV_MAPS_SIZE(num_tc);
-	if (maps_sz < L1_CACHE_BYTES)
-		maps_sz = L1_CACHE_BYTES;
-
 	mutex_lock(&xps_map_mutex);
+	if (is_rxqs_map) {
+		maps_sz = XPS_RXQ_DEV_MAPS_SIZE(num_tc, dev->num_rx_queues);
+		dev_maps = xmap_dereference(dev->xps_rxqs_map);
+		nr_ids = dev->num_rx_queues;
+	} else {
+		maps_sz = XPS_CPU_DEV_MAPS_SIZE(num_tc);
+		if (num_possible_cpus() > 1) {
+			online_mask = cpumask_bits(cpu_online_mask);
+			possible_mask = cpumask_bits(cpu_possible_mask);
+		}
+		dev_maps = xmap_dereference(dev->xps_cpus_map);
+		nr_ids = nr_cpu_ids;
+	}
 
-	dev_maps = xmap_dereference(dev->xps_maps);
+	if (maps_sz < L1_CACHE_BYTES)
+		maps_sz = L1_CACHE_BYTES;
 
 	/* allocate memory for queue storage */
-	for_each_cpu_and(cpu, cpu_online_mask, mask) {
+	for (j = -1; j = attrmask_next_and(j, online_mask, mask, nr_ids),
+	     j < nr_ids;) {
 		if (!new_dev_maps)
 			new_dev_maps = kzalloc(maps_sz, GFP_KERNEL);
 		if (!new_dev_maps) {
@@ -2238,73 +2283,81 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			return -ENOMEM;
 		}
 
-		tci = cpu * num_tc + tc;
-		map = dev_maps ? xmap_dereference(dev_maps->cpu_map[tci]) :
+		tci = j * num_tc + tc;
+		map = dev_maps ? xmap_dereference(dev_maps->attr_map[tci]) :
 				 NULL;
 
-		map = expand_xps_map(map, cpu, index);
+		map = expand_xps_map(map, j, index, is_rxqs_map);
 		if (!map)
 			goto error;
 
-		RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+		RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 	}
 
 	if (!new_dev_maps)
 		goto out_no_new_maps;
 
-	for_each_possible_cpu(cpu) {
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
 		/* copy maps belonging to foreign traffic classes */
-		for (i = tc, tci = cpu * num_tc; dev_maps && i--; tci++) {
+		for (i = tc, tci = j * num_tc; dev_maps && i--; tci++) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 
 		/* We need to explicitly update tci as prevous loop
 		 * could break out early if dev_maps is NULL.
 		 */
-		tci = cpu * num_tc + tc;
+		tci = j * num_tc + tc;
 
-		if (cpumask_test_cpu(cpu, mask) && cpu_online(cpu)) {
-			/* add queue to CPU maps */
+		if (attr_test_mask(j, mask, nr_ids) &&
+		    attr_test_online(j, online_mask, nr_ids)) {
+			/* add tx-queue to CPU/rx-queue maps */
 			int pos = 0;
 
-			map = xmap_dereference(new_dev_maps->cpu_map[tci]);
+			map = xmap_dereference(new_dev_maps->attr_map[tci]);
 			while ((pos < map->len) && (map->queues[pos] != index))
 				pos++;
 
 			if (pos == map->len)
 				map->queues[map->len++] = index;
 #ifdef CONFIG_NUMA
-			if (numa_node_id == -2)
-				numa_node_id = cpu_to_node(cpu);
-			else if (numa_node_id != cpu_to_node(cpu))
-				numa_node_id = -1;
+			if (!is_rxqs_map) {
+				if (numa_node_id == -2)
+					numa_node_id = cpu_to_node(j);
+				else if (numa_node_id != cpu_to_node(j))
+					numa_node_id = -1;
+			}
 #endif
 		} else if (dev_maps) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 
 		/* copy maps belonging to foreign traffic classes */
 		for (i = num_tc - tc, tci++; dev_maps && --i; tci++) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 	}
 
-	rcu_assign_pointer(dev->xps_maps, new_dev_maps);
+	if (is_rxqs_map)
+		rcu_assign_pointer(dev->xps_rxqs_map, new_dev_maps);
+	else
+		rcu_assign_pointer(dev->xps_cpus_map, new_dev_maps);
 
 	/* Cleanup old maps */
 	if (!dev_maps)
 		goto out_no_old_maps;
 
-	for_each_possible_cpu(cpu) {
-		for (i = num_tc, tci = cpu * num_tc; i--; tci++) {
-			new_map = xmap_dereference(new_dev_maps->cpu_map[tci]);
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = num_tc, tci = j * num_tc; i--; tci++) {
+			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
 			if (map && map != new_map)
 				kfree_rcu(map, rcu);
 		}
@@ -2317,19 +2370,23 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	active = true;
 
 out_no_new_maps:
-	/* update Tx queue numa node */
-	netdev_queue_numa_node_write(netdev_get_tx_queue(dev, index),
-				     (numa_node_id >= 0) ? numa_node_id :
-				     NUMA_NO_NODE);
+	if (!is_rxqs_map) {
+		/* update Tx queue numa node */
+		netdev_queue_numa_node_write(netdev_get_tx_queue(dev, index),
+					     (numa_node_id >= 0) ?
+					     numa_node_id : NUMA_NO_NODE);
+	}
 
 	if (!dev_maps)
 		goto out_no_maps;
 
-	/* removes queue from unused CPUs */
-	for_each_possible_cpu(cpu) {
-		for (i = tc, tci = cpu * num_tc; i--; tci++)
+	/* removes tx-queue from unused CPUs/rx-queues */
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = tc, tci = j * num_tc; i--; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
-		if (!cpumask_test_cpu(cpu, mask) || !cpu_online(cpu))
+		if (!attr_test_mask(j, mask, nr_ids) ||
+		    !attr_test_online(j, online_mask, nr_ids))
 			active |= remove_xps_queue(dev_maps, tci, index);
 		for (i = num_tc - tc, tci++; --i; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
@@ -2337,7 +2394,10 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 
 	/* free map if not active */
 	if (!active) {
-		RCU_INIT_POINTER(dev->xps_maps, NULL);
+		if (is_rxqs_map)
+			RCU_INIT_POINTER(dev->xps_rxqs_map, NULL);
+		else
+			RCU_INIT_POINTER(dev->xps_cpus_map, NULL);
 		kfree_rcu(dev_maps, rcu);
 	}
 
@@ -2347,11 +2407,12 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	return 0;
 error:
 	/* remove any maps that we added */
-	for_each_possible_cpu(cpu) {
-		for (i = num_tc, tci = cpu * num_tc; i--; tci++) {
-			new_map = xmap_dereference(new_dev_maps->cpu_map[tci]);
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = num_tc, tci = j * num_tc; i--; tci++) {
+			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
 			map = dev_maps ?
-			      xmap_dereference(dev_maps->cpu_map[tci]) :
+			      xmap_dereference(dev_maps->attr_map[tci]) :
 			      NULL;
 			if (new_map && new_map != map)
 				kfree(new_map);
@@ -2363,6 +2424,12 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	kfree(new_dev_maps);
 	return -ENOMEM;
 }
+
+int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
+			u16 index)
+{
+	return __netif_set_xps_queue(dev, cpumask_bits(mask), index, false);
+}
 EXPORT_SYMBOL(netif_set_xps_queue);
 
 #endif
@@ -3384,7 +3451,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 	int queue_index = -1;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_maps);
+	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {
 		unsigned int tci = skb->sender_cpu - 1;
 
@@ -3393,7 +3460,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 			tci += netdev_get_prio_tc_map(dev, skb->priority);
 		}
 
-		map = rcu_dereference(dev_maps->cpu_map[tci]);
+		map = rcu_dereference(dev_maps->attr_map[tci]);
 		if (map) {
 			if (map->len == 1)
 				queue_index = map->queues[0];
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bb7e80f..b39987c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1227,13 +1227,13 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 		return -ENOMEM;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_maps);
+	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {
 		for_each_possible_cpu(cpu) {
 			int i, tci = cpu * num_tc + tc;
 			struct xps_map *map;
 
-			map = rcu_dereference(dev_maps->cpu_map[tci]);
+			map = rcu_dereference(dev_maps->attr_map[tci]);
 			if (!map)
 				continue;
 

^ permalink raw reply related

* [net-next PATCH v4 0/7] Symmetric queue selection using XPS for Rx queues
From: Amritha Nambiar @ 2018-06-25 18:04 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, alexander.duyck, edumazet, hannes, tom

This patch series implements support for Tx queue selection based on
Rx queue(s) map. This is done by configuring Rx queue(s) map per Tx-queue
using sysfs attribute. If the user configuration for Rx queues does
not apply, then the Tx queue selection falls back to XPS using CPUs and
finally to hashing.

XPS is refactored to support Tx queue selection based on either the
CPUs map or the Rx-queues map. The config option CONFIG_XPS needs to be
enabled. By default no receive queues are configured for the Tx queue.

- /sys/class/net/<dev>/queues/tx-*/xps_rxqs

A set of receive queues can be mapped to a set of transmit queues (many:many),
although the common use case is a 1:1 mapping. This will enable sending
packets on the same Tx-Rx queue association as this is useful for busy polling
multi-threaded workloads where it is not possible to pin the threads to
a CPU. This is a rework of Sridhar's patch for symmetric queueing via
socket option:
https://www.spinics.net/lists/netdev/msg453106.html

Testing Hints:
Kernel:  Linux 4.17.0-rc7+
Interface:
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x00015e0b

Configuration:
ethtool -L $iface combined 16
ethtool -C $iface rx-usecs 1000
sysctl net.core.busy_poll=1000
ATR disabled:
ethtool -K $iface ntuple on

Workload:
Modified memcached that changes the thread selection policy to be based
on the incoming rx-queue of a connection using SO_INCOMING_NAPI_ID socket
option. The default is round-robin.

Default: No rxqs_map configured
Symmetric queues: Enable rxqs_map for all queues 1:1 mapped to Tx queue

System:
Architecture:          x86_64
CPU(s):                72
Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

16 threads  400K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 4/51/2215               2/30/5163
(usec)


intr/sec                        26655                   18606

contextswitch/sec               5145                    4044

insn per cycle                  0.43                    0.72

cache-misses                    6.919                   4.310
(% of all cache refs)

L1-dcache-load-                 4.49                    3.29
-misses
(% of all L1-dcache hits)

LLC-load-misses                 13.26                   8.96
(% of all LL-cache hits)

-------------------------------------------------------------------------------

32 threads  400K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 10/112/5562             9/46/4637
(usec)


intr/sec                        30456                   27666

contextswitch/sec               7552                    5133

insn per cycle                  0.41                    0.49

cache-misses                    9.357                   2.769
(% of all cache refs)

L1-dcache-load-                 4.09                    3.98
-misses
(% of all L1-dcache hits)

LLC-load-misses                 12.96                   3.96
(% of all LL-cache hits)

-------------------------------------------------------------------------------

16 threads  800K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  5/151/4989             9/69/2611
(usec)


intr/sec                        35686                   22907

contextswitch/sec               25522                   12281

insn per cycle                  0.67                    0.74

cache-misses                    8.652                   6.38
(% of all cache refs)

L1-dcache-load-                 3.19                    2.86
-misses
(% of all L1-dcache hits)

LLC-load-misses                 16.53                   11.99
(% of all LL-cache hits)

-------------------------------------------------------------------------------
32 threads  800K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  6/163/6152             8/88/4209
(usec)


intr/sec                        47079                   26548

contextswitch/sec               42190                   39168

insn per cycle                  0.45                    0.54

cache-misses                    8.798                   4.668
(% of all cache refs)

L1-dcache-load-                 6.55                    6.29
-misses
(% of all L1-dcache hits)

LLC-load-misses                 13.91                   10.44
(% of all LL-cache hits)

-------------------------------------------------------------------------------

v4:
- Removed enum for map types and used boolean to identify rxqs_map vs cpus_map.
- Added comments for helper functions.
- Added another static_key for rxqs_map (xps_rxqs_needed).
- New patch to change tx_queue_mapping in sock_common to unsigned short.
- Separated marking receive queue number into a standalone patch.
- Changed wording in documentation (queue-pair to queue-association)

---

Amritha Nambiar (7):
      net: Refactor XPS for CPUs and Rx queues
      net: Use static_key for XPS maps
      net: sock: Change tx_queue_mapping in sock_common to unsigned short
      net: Record receive queue number for a connection
      net: Enable Tx queue selection based on Rx queues
      net-sysfs: Add interface for Rx queue(s) map per Tx queue
      Documentation: Add explanation for XPS using Rx-queue(s) map


 Documentation/ABI/testing/sysfs-class-net-queues |   11 +
 Documentation/networking/scaling.txt             |   57 ++++
 include/linux/cpumask.h                          |   11 +
 include/linux/netdevice.h                        |  100 ++++++++
 include/net/busy_poll.h                          |    1 
 include/net/sock.h                               |   28 ++
 net/core/dev.c                                   |  283 +++++++++++++++-------
 net/core/net-sysfs.c                             |   85 ++++++-
 net/core/sock.c                                  |    4 
 net/ipv4/tcp_input.c                             |    3 
 10 files changed, 474 insertions(+), 109 deletions(-)

^ permalink raw reply

* [PATCH v4 net-next] mdio-mux-gpio: Remove VLA usage
From: Kees Cook @ 2018-06-25 22:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, Joe Perches, Andrew Lunn, netdev

In the quest to remove all stack VLA usage from the kernel[1], this
allocates the values buffer during the callback instead of putting it
on the stack.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
v4: use struct_size() helper for allocation (Joe Perches)
v3: resend to netdev with Reviewed-by
v2: allocate array as part of structure (Andrew Lunn)
---
 drivers/net/phy/mdio-mux-gpio.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/mdio-mux-gpio.c b/drivers/net/phy/mdio-mux-gpio.c
index 082ffef0dec4..bc90764a8b8d 100644
--- a/drivers/net/phy/mdio-mux-gpio.c
+++ b/drivers/net/phy/mdio-mux-gpio.c
@@ -20,23 +20,23 @@
 struct mdio_mux_gpio_state {
 	struct gpio_descs *gpios;
 	void *mux_handle;
+	int values[];
 };
 
 static int mdio_mux_gpio_switch_fn(int current_child, int desired_child,
 				   void *data)
 {
 	struct mdio_mux_gpio_state *s = data;
-	int values[s->gpios->ndescs];
 	unsigned int n;
 
 	if (current_child == desired_child)
 		return 0;
 
 	for (n = 0; n < s->gpios->ndescs; n++)
-		values[n] = (desired_child >> n) & 1;
+		s->values[n] = (desired_child >> n) & 1;
 
 	gpiod_set_array_value_cansleep(s->gpios->ndescs, s->gpios->desc,
-				       values);
+				       s->values);
 
 	return 0;
 }
@@ -44,15 +44,21 @@ static int mdio_mux_gpio_switch_fn(int current_child, int desired_child,
 static int mdio_mux_gpio_probe(struct platform_device *pdev)
 {
 	struct mdio_mux_gpio_state *s;
+	struct gpio_descs *gpios;
 	int r;
 
-	s = devm_kzalloc(&pdev->dev, sizeof(*s), GFP_KERNEL);
-	if (!s)
+	gpios = gpiod_get_array(&pdev->dev, NULL, GPIOD_OUT_LOW);
+	if (IS_ERR(gpios))
+		return PTR_ERR(gpios);
+
+	s = devm_kzalloc(&pdev->dev, struct_size(s, values, gpios->ndescs),
+			 GFP_KERNEL);
+	if (!s) {
+		gpiod_put_array(gpios);
 		return -ENOMEM;
+	}
 
-	s->gpios = gpiod_get_array(&pdev->dev, NULL, GPIOD_OUT_LOW);
-	if (IS_ERR(s->gpios))
-		return PTR_ERR(s->gpios);
+	s->gpios = gpios;
 
 	r = mdio_mux_init(&pdev->dev, pdev->dev.of_node,
 			  mdio_mux_gpio_switch_fn, &s->mux_handle, s, NULL);
-- 
2.17.1


-- 
Kees Cook
Pixel Security

^ permalink raw reply related

* Re: [PATCH v3 net-next] mdio-mux-gpio: Remove VLA usage
From: Kees Cook @ 2018-06-25 22:39 UTC (permalink / raw)
  To: Joe Perches; +Cc: David S. Miller, LKML, Andrew Lunn, Network Development
In-Reply-To: <c892c7e7b271c9a269c78887777e691dce12f5c8.camel@perches.com>

On Mon, Jun 25, 2018 at 3:23 PM, Joe Perches <joe@perches.com> wrote:
> On Mon, 2018-06-25 at 15:09 -0700, Kees Cook wrote:
>> In the quest to remove all stack VLA usage from the kernel[1], this
>> allocates the values buffer during the callback instead of putting it
>> on the stack.
>
> []
>
>> diff --git a/drivers/net/phy/mdio-mux-gpio.c b/drivers/net/phy/mdio-mux-gpio.c
> []
>> @@ -44,15 +44,21 @@ static int mdio_mux_gpio_switch_fn(int current_child, int desired_child,
>>  static int mdio_mux_gpio_probe(struct platform_device *pdev)
>>  {
> []
>> +     s = devm_kzalloc(&pdev->dev, sizeof(*s->values) * gpios->ndescs +
>> +                                  sizeof(*s), GFP_KERNEL);
>
> Isn't this supposed to use your new struct_size()

Why yes. Yes it is. :) When treewide changes meet in the night! :)
I'll send a v4.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply

* Re: [PATCH net-next 3/4] netdevsim: add ipsec offload testing
From: Shannon Nelson @ 2018-06-25 22:37 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, anders.roxell, linux-kselftest
In-Reply-To: <20180622210708.43d50d15@cakuba.netronome.com>

On 6/22/2018 9:07 PM, Jakub Kicinski wrote:
> On Fri, 22 Jun 2018 17:31:37 -0700, Shannon Nelson wrote:
>> Implement the IPsec/XFRM offload API for testing.
>>
>> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
> 
> Thanks for the patch!  Just a number of stylistic nit picks.
> 
>> diff --git a/drivers/net/netdevsim/ipsec.c b/drivers/net/netdevsim/ipsec.c
>> new file mode 100644
>> index 0000000..ad64266
>> --- /dev/null
>> +++ b/drivers/net/netdevsim/ipsec.c
>> @@ -0,0 +1,345 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright(c) 2018 Oracle and/or its affiliates. All rights reserved. */
>> +
>> +#include <net/xfrm.h>
>> +#include <crypto/aead.h>
>> +#include <linux/debugfs.h>
>> +#include "netdevsim.h"
> 
> Other files in the driver sort headers alphabetically and put an empty
> line between global and local headers.

Sure.

> 
>> +#define NSIM_IPSEC_AUTH_BITS	128
>> +
>> +/**
>> + * nsim_ipsec_dbg_read - read for ipsec data
>> + * @filp: the opened file
>> + * @buffer: where to write the data for the user to read
>> + * @count: the size of the user's buffer
>> + * @ppos: file position offset
>> + **/
>> +static ssize_t nsim_dbg_netdev_ops_read(struct file *filp,
> 
> Doesn't match the kdoc.  Please run
> 
> ./scripts/kernel-doc -none $file
> 
> if you want kdoc.  Although IMHO you may as well drop the kdoc, your
> code is quite self explanatory and local.

By adding -v to that I got a couple of warnings that I didn't include 
the Return information - is that what you were commenting on?  The rest 
seems acceptable to the script I'm using from the net-next tree.

> 
>> +					char __user *buffer,
>> +					size_t count, loff_t *ppos)
>> +{
>> +	struct netdevsim *ns = filp->private_data;
>> +	struct nsim_ipsec *ipsec = &ns->ipsec;
>> +	size_t bufsize;
>> +	char *buf, *p;
>> +	int len;
>> +	int i;
>> +
>> +	/* don't allow partial reads */
>> +	if (*ppos != 0)
>> +		return 0;
>> +
>> +	/* the buffer needed is
>> +	 * (num SAs * 3 lines each * ~60 bytes per line) + one more line
>> +	 */
>> +	bufsize = (ipsec->count * 4 * 60) + 60;
>> +	buf = kzalloc(bufsize, GFP_KERNEL);
>> +	if (!buf)
>> +		return -ENOMEM;
>> +
>> +	p = buf;
>> +	p += snprintf(p, bufsize - (p - buf),
>> +		      "SA count=%u tx=%u\n",
>> +		      ipsec->count, ipsec->tx);
>> +
>> +	for (i = 0; i < NSIM_IPSEC_MAX_SA_COUNT; i++) {
>> +		struct nsim_sa *sap = &ipsec->sa[i];
>> +
>> +		if (!sap->used)
>> +			continue;
>> +
>> +		p += snprintf(p, bufsize - (p - buf),
>> +			      "sa[%i] %cx ipaddr=0x%08x %08x %08x %08x\n",
>> +			      i, (sap->rx ? 'r' : 't'), sap->ipaddr[0],
>> +			      sap->ipaddr[1], sap->ipaddr[2], sap->ipaddr[3]);
>> +		p += snprintf(p, bufsize - (p - buf),
>> +			      "sa[%i]    spi=0x%08x proto=0x%x salt=0x%08x crypt=%d\n",
>> +			      i, be32_to_cpu(sap->xs->id.spi),
>> +			      sap->xs->id.proto, sap->salt, sap->crypt);
>> +		p += snprintf(p, bufsize - (p - buf),
>> +			      "sa[%i]    key=0x%08x %08x %08x %08x\n",
>> +			      i, sap->key[0], sap->key[1],
>> +			      sap->key[2], sap->key[3]);
>> +	}
>> +
>> +	len = simple_read_from_buffer(buffer, count, ppos, buf, p - buf);
> 
> Why not seq_file for this?

Why bother with more interface code?  This is useful enough to support 
the API testing needed.

> 
>> +	kfree(buf);
>> +	return len;
>> +}
>> +
>> +static const struct file_operations ipsec_dbg_fops = {
>> +	.owner = THIS_MODULE,
>> +	.open = simple_open,
>> +	.read = nsim_dbg_netdev_ops_read,
>> +};
>> +
>> +/**
>> + * nsim_ipsec_find_empty_idx - find the first unused security parameter index
>> + * @ipsec: pointer to ipsec struct
>> + **/
>> +static int nsim_ipsec_find_empty_idx(struct nsim_ipsec *ipsec)
>> +{
>> +	u32 i;
>> +
>> +	if (ipsec->count == NSIM_IPSEC_MAX_SA_COUNT)
>> +		return -ENOSPC;
>> +
>> +	/* search sa table */
>> +	for (i = 0; i < NSIM_IPSEC_MAX_SA_COUNT; i++) {
>> +		if (!ipsec->sa[i].used)
>> +			return i;
>> +	}
>> +
>> +	return -ENOSPC;
> 
> FWIW I personally find bitmaps and find_first_zero_bit() etc. nice and
> concise for a small ID allocator, but no objection to open coding.

Sure, we could add a parallel bitmap data structure to track usage of 
our array elements, and probably would for a much larger array so as to 
lessen the impact of a serial search.  But, since this is a short array 
for simple testing purposes, the search time is minimal so I think the 
simple code is fine.

> 
>> +}
>> +
>> +/**
>> + * nsim_ipsec_parse_proto_keys - find the key and salt based on the protocol
>> + * @xs: pointer to xfrm_state struct
>> + * @mykey: pointer to key array to populate
>> + * @mysalt: pointer to salt value to populate
>> + *
>> + * This copies the protocol keys and salt to our own data tables.  The
>> + * 82599 family only supports the one algorithm.
> 
> 82599 is a fine chip, it's not netdevsim tho? ;)

Yeah, guess where I hacked the code from...  Thanks, I missed this 
reference.

> 
>> + **/
>> +static int nsim_ipsec_parse_proto_keys(struct xfrm_state *xs,
>> +				       u32 *mykey, u32 *mysalt)
>> +{
>> +	struct net_device *dev = xs->xso.dev;
>> +	unsigned char *key_data;
>> +	char *alg_name = NULL;
>> +	const char aes_gcm_name[] = "rfc4106(gcm(aes))";
>> +	int key_len;
> 
> reverse xmas tree please

Yep, missed it here.

> 
>> +
>> +	if (!xs->aead) {
>> +		netdev_err(dev, "Unsupported IPsec algorithm\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (xs->aead->alg_icv_len != NSIM_IPSEC_AUTH_BITS) {
>> +		netdev_err(dev, "IPsec offload requires %d bit authentication\n",
>> +			   NSIM_IPSEC_AUTH_BITS);
>> +		return -EINVAL;
>> +	}
>> +
>> +	key_data = &xs->aead->alg_key[0];
>> +	key_len = xs->aead->alg_key_len;
>> +	alg_name = xs->aead->alg_name;
>> +
>> +	if (strcmp(alg_name, aes_gcm_name)) {
>> +		netdev_err(dev, "Unsupported IPsec algorithm - please use %s\n",
>> +			   aes_gcm_name);
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* The key bytes come down in a bigendian array of bytes, so
>> +	 * we don't need to do any byteswapping.
> 
> Why the mention of bigendian?  82599 needs big endian? -.^

Yep, another useless reference left over from the hack-n-slash - I'll 
remove it.

> 
>> +	 * 160 accounts for 16 byte key and 4 byte salt
>> +	 */
>> +	if (key_len > 128) {
> 
> s/128/NSIM_IPSEC_AUTH_BITS/ ?

Sure.

> 
>> +		*mysalt = ((u32 *)key_data)[4];
> 
> Is alignment guaranteed?  There are the unaligned helpers if you need
> them..

Since the key_data must be at least 128 bits in this implementation, and 
the key itself is 128 bits, anything after is salt, so we can assume 
that the salt data is aligned.

> 
>> +	} else if (key_len == 128) {
>> +		*mysalt = 0;
>> +	} else {
>> +		netdev_err(dev, "IPsec hw offload only supports 128 bit keys with optional 32 bit salt\n");
>> +		return -EINVAL;
>> +	}
>> +	memcpy(mykey, key_data, 16);
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * nsim_ipsec_add_sa - program device with a security association
>> + * @xs: pointer to transformer state struct
>> + **/
>> +static int nsim_ipsec_add_sa(struct xfrm_state *xs)
>> +{
>> +	struct net_device *dev = xs->xso.dev;
>> +	struct netdevsim *ns = netdev_priv(dev);
>> +	struct nsim_ipsec *ipsec = &ns->ipsec;
> 
> xmas tree again (initialize out of line if you have to)

This one is pretty much the way I've done in the past with no complaints 
and seems common enough in other net drivers, specifically when dealing 
with netdev and netdevpriv elements.  Only the first line is out of 
place, with the next lines dependent on it.

> 
>> +	struct nsim_sa sa;
>> +	u16 sa_idx;
>> +	int ret;
>> +
>> +	if (xs->id.proto != IPPROTO_ESP && xs->id.proto != IPPROTO_AH) {
>> +		netdev_err(dev, "Unsupported protocol 0x%04x for ipsec offload\n",
>> +			   xs->id.proto);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (xs->calg) {
>> +		netdev_err(dev, "Compression offload not supported\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* find the first unused index */
>> +	ret = nsim_ipsec_find_empty_idx(ipsec);
>> +	if (ret < 0) {
>> +		netdev_err(dev, "No space for SA in Rx table!\n");
>> +		return ret;
>> +	}
>> +	sa_idx = (u16)ret;
>> +
>> +	memset(&sa, 0, sizeof(sa));
>> +	sa.used = true;
>> +	sa.xs = xs;
>> +
>> +	if (sa.xs->id.proto & IPPROTO_ESP)
>> +		sa.crypt = xs->ealg || xs->aead;
>> +
>> +	/* get the key and salt */
>> +	ret = nsim_ipsec_parse_proto_keys(xs, sa.key, &sa.salt);
>> +	if (ret) {
>> +		netdev_err(dev, "Failed to get key data for SA table\n");
>> +		return ret;
>> +	}
>> +
>> +	if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
>> +		sa.rx = true;
>> +
>> +		if (xs->props.family == AF_INET6)
>> +			memcpy(sa.ipaddr, &xs->id.daddr.a6, 16);
>> +		else
>> +			memcpy(&sa.ipaddr[3], &xs->id.daddr.a4, 4);
>> +	}
>> +
>> +	/* the preparations worked, so save the info */
>> +	memcpy(&ipsec->sa[sa_idx], &sa, sizeof(sa));
>> +
>> +	/* the XFRM stack doesn't like offload_handle == 0,
>> +	 * so add a bitflag in case our array index is 0
>> +	 */
>> +	xs->xso.offload_handle = sa_idx | NSIM_IPSEC_VALID;
>> +	ipsec->count++;
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * nsim_ipsec_del_sa - clear out this specific SA
>> + * @xs: pointer to transformer state struct
>> + **/
>> +static void nsim_ipsec_del_sa(struct xfrm_state *xs)
>> +{
>> +	struct net_device *dev = xs->xso.dev;
>> +	struct netdevsim *ns = netdev_priv(dev);
>> +	struct nsim_ipsec *ipsec = &ns->ipsec;
>> +	u16 sa_idx;
>> +
>> +	sa_idx = xs->xso.offload_handle & ~NSIM_IPSEC_VALID;
>> +	if (!ipsec->sa[sa_idx].used) {
>> +		netdev_err(dev, "Invalid SA for delete sa_idx=%d\n", sa_idx);
>> +		return;
>> +	}
>> +
>> +	memset(&ipsec->sa[sa_idx], 0, sizeof(struct nsim_sa));
>> +	ipsec->count--;
>> +}
>> +
>> +/**
>> + * nsim_ipsec_offload_ok - can this packet use the xfrm hw offload
>> + * @skb: current data packet
>> + * @xs: pointer to transformer state struct
>> + **/
>> +static bool nsim_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
>> +{
>> +	struct net_device *dev = xs->xso.dev;
>> +	struct netdevsim *ns = netdev_priv(dev);
>> +	struct nsim_ipsec *ipsec = &ns->ipsec;
>> +
>> +	ipsec->ok++;
>> +
>> +	return true;
>> +}
>> +
>> +static const struct xfrmdev_ops nsim_xfrmdev_ops = {
>> +	.xdo_dev_state_add = nsim_ipsec_add_sa,
>> +	.xdo_dev_state_delete = nsim_ipsec_del_sa,
>> +	.xdo_dev_offload_ok = nsim_ipsec_offload_ok,
> 
> Please align the initializers by adding tabs before '='.

Sure.

> 
>> +};
>> +
>> +/**
>> + * nsim_ipsec_tx - check Tx packet for ipsec offload
>> + * @ns: pointer to ns structure
>> + * @skb: current data packet
>> + **/
>> +int nsim_ipsec_tx(struct netdevsim *ns, struct sk_buff *skb)
>> +{
>> +	struct nsim_ipsec *ipsec = &ns->ipsec;
>> +	struct xfrm_state *xs;
>> +	struct nsim_sa *tsa;
>> +	u32 sa_idx;
>> +
>> +	/* do we even need to check this packet? */
>> +	if (!skb->sp)
>> +		return 1;
>> +
>> +	if (unlikely(!skb->sp->len)) {
>> +		netdev_err(ns->netdev, "%s: no xfrm state len = %d\n",
>> +			   __func__, skb->sp->len);
> 
> Hmm..  __func__ started appearing in errors?  Perhaps either always or
> never add it?
> 
> Also, I know this is not a real device, but please always use rate
> limited print functions on the data path.
> 
>> +		return 0;
>> +	}
>> +
>> +	xs = xfrm_input_state(skb);
>> +	if (unlikely(!xs)) {
>> +		netdev_err(ns->netdev, "%s: no xfrm_input_state() xs = %p\n",
>> +			   __func__, xs);
>> +		return 0;
>> +	}
>> +
>> +	sa_idx = xs->xso.offload_handle & ~NSIM_IPSEC_VALID;
>> +	if (unlikely(sa_idx > NSIM_IPSEC_MAX_SA_COUNT)) {
>> +		netdev_err(ns->netdev, "%s: bad sa_idx=%d max=%d\n",
>> +			   __func__, sa_idx, NSIM_IPSEC_MAX_SA_COUNT);
>> +		return 0;
>> +	}
>> +
>> +	tsa = &ipsec->sa[sa_idx];
>> +	if (unlikely(!tsa->used)) {
>> +		netdev_err(ns->netdev, "%s: unused sa_idx=%d\n",
>> +			   __func__, sa_idx);
>> +		return 0;
>> +	}
>> +
>> +	if (xs->id.proto != IPPROTO_ESP && xs->id.proto != IPPROTO_AH) {
>> +		netdev_err(ns->netdev, "%s: unexpected proto=%d\n",
>> +			   __func__, xs->id.proto);
>> +		return 0;
>> +	}
>> +
>> +	ipsec->tx++;
>> +
>> +	return 1;
>> +}
> 
> Looks like the function should return bool?

Sure.

> 
>> +
>> +/**
>> + * nsim_ipsec_init - initialize security registers for IPSec operation
>> + * @ns: board private structure
> 
> "board"?  Yes, the kdoc may be best removed ;)
> 
>> + **/
>> +void nsim_ipsec_init(struct netdevsim *ns)
>> +{
>> +	ns->netdev->xfrmdev_ops = &nsim_xfrmdev_ops;
>> +
>> +#define NSIM_ESP_FEATURES	(NETIF_F_HW_ESP | \
>> +				 NETIF_F_HW_ESP_TX_CSUM | \
>> +				 NETIF_F_GSO_ESP)
>> +
>> +	ns->netdev->features |= NSIM_ESP_FEATURES;
>> +	ns->netdev->hw_enc_features |= NSIM_ESP_FEATURES;
>> +
>> +	ns->ipsec.pfile = debugfs_create_file("ipsec", 0400, ns->ddir, ns,
>> +					      &ipsec_dbg_fops);
>> +}
>> +
>> +void nsim_ipsec_teardown(struct netdevsim *ns)
>> +{
>> +	struct nsim_ipsec *ipsec = &ns->ipsec;
>> +
>> +	if (ipsec->count)
>> +		netdev_err(ns->netdev, "%s: tearing down IPsec offload with %d SAs left\n",
>> +			   __func__, ipsec->count);
>> +	debugfs_remove_recursive(ipsec->pfile);
>> +}
>> diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
>> index ec68f38..6ce8604d 100644
>> --- a/drivers/net/netdevsim/netdev.c
>> +++ b/drivers/net/netdevsim/netdev.c
>> @@ -171,6 +171,8 @@ static int nsim_init(struct net_device *dev)
>>   	if (err)
>>   		goto err_unreg_dev;
>>   
>> +	nsim_ipsec_init(ns);
>> +
>>   	return 0;
>>   
>>   err_unreg_dev:
>> @@ -186,6 +188,7 @@ static void nsim_uninit(struct net_device *dev)
>>   {
>>   	struct netdevsim *ns = netdev_priv(dev);
>>   
>> +	nsim_ipsec_teardown(ns);
>>   	nsim_devlink_teardown(ns);
>>   	debugfs_remove_recursive(ns->ddir);
>>   	nsim_bpf_uninit(ns);
>> @@ -203,11 +206,15 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   {
>>   	struct netdevsim *ns = netdev_priv(dev);
>>   
>> +	if (!nsim_ipsec_tx(ns, skb))
>> +		goto out;
>> +
>>   	u64_stats_update_begin(&ns->syncp);
>>   	ns->tx_packets++;
>>   	ns->tx_bytes += skb->len;
>>   	u64_stats_update_end(&ns->syncp);
>>   
>> +out:
>>   	dev_kfree_skb(skb);
>>   
>>   	return NETDEV_TX_OK;
>> diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
>> index 3a8581a..1708dee 100644
>> --- a/drivers/net/netdevsim/netdevsim.h
>> +++ b/drivers/net/netdevsim/netdevsim.h
>> @@ -29,6 +29,29 @@ struct bpf_prog;
>>   struct dentry;
>>   struct nsim_vf_config;
>>   
>> +#if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
>> +#define NSIM_IPSEC_MAX_SA_COUNT		33
> 
> 33 caught my eye - out of curiosity is it 2^5 + 1 to catch some type of
> bug or failure mode?

For test rigs, I often use something like this to help flush out any 
interesting power-of-two or alignment assumptions in the code.  I don't 
expect anything here, but it doesn't hurt.

> 
>> +#define NSIM_IPSEC_VALID		BIT(31)
>> +
>> +struct nsim_sa {
>> +	struct xfrm_state *xs;
>> +	__be32 ipaddr[4];
>> +	u32 key[4];
>> +	u32 salt;
>> +	bool used;
>> +	bool crypt;
>> +	bool rx;
>> +};
>> +
>> +struct nsim_ipsec {
>> +	struct nsim_sa sa[NSIM_IPSEC_MAX_SA_COUNT];
>> +	struct dentry *pfile;
>> +	u32 count;
>> +	u32 tx;
>> +	u32 ok;
>> +};
>> +#endif
> 
> No need to wrap struct definitions in #if/#endif.

I suppose this is a philosophical point... Since CONFIG_XFRM_OFFLOAD is 
not yet a common config setting, I'd like to keep it here to not break 
other folks' builds or dirty them up with unused struct definitions when 
they aren't playing with IPsec offload anyway.

> 
>>   struct netdevsim {
>>   	struct net_device *netdev;
>>   
>> @@ -67,6 +90,9 @@ struct netdevsim {
>>   #if IS_ENABLED(CONFIG_NET_DEVLINK)
>>   	struct devlink *devlink;
>>   #endif
>> +#if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
>> +	struct nsim_ipsec ipsec;
>> +#endif
>>   };
>>   
>>   extern struct dentry *nsim_ddir;
>> @@ -147,6 +173,17 @@ static inline void nsim_devlink_exit(void)
>>   }
>>   #endif
>>   
>> +#if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
>> +void nsim_ipsec_init(struct netdevsim *ns);
>> +void nsim_ipsec_teardown(struct netdevsim *ns);
>> +int nsim_ipsec_tx(struct netdevsim *ns, struct sk_buff *skb);
>> +#else
>> +static inline void nsim_ipsec_init(struct netdevsim *ns) {};
>> +static inline void nsim_ipsec_teardown(struct netdevsim *ns) {};
>> +static inline int nsim_ipsec_tx(struct netdevsim *ns, struct sk_buff *skb)
>> +								{ return 1; };
> 
> Please use the same formatting for static inlines as the rest of the
> file.  The ';' are also unnecessary.

Sure

> 
> Other than those formatting nit picks looks good to me :)
> 

Cheers,
sln

^ permalink raw reply

* Re: [PATCH v3 net-next] mdio-mux-gpio: Remove VLA usage
From: Joe Perches @ 2018-06-25 22:23 UTC (permalink / raw)
  To: Kees Cook, David S. Miller; +Cc: linux-kernel, Andrew Lunn, netdev
In-Reply-To: <20180625220917.GA25022@beast>

On Mon, 2018-06-25 at 15:09 -0700, Kees Cook wrote:
> In the quest to remove all stack VLA usage from the kernel[1], this
> allocates the values buffer during the callback instead of putting it
> on the stack.

[]

> diff --git a/drivers/net/phy/mdio-mux-gpio.c b/drivers/net/phy/mdio-mux-gpio.c
[]
> @@ -44,15 +44,21 @@ static int mdio_mux_gpio_switch_fn(int current_child, int desired_child,
>  static int mdio_mux_gpio_probe(struct platform_device *pdev)
>  {
[]
> +	s = devm_kzalloc(&pdev->dev, sizeof(*s->values) * gpios->ndescs +
> +				     sizeof(*s), GFP_KERNEL);

Isn't this supposed to use your new struct_size()

^ permalink raw reply

* [PATCH v3 net-next] mdio-mux-gpio: Remove VLA usage
From: Kees Cook @ 2018-06-25 22:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, Andrew Lunn, netdev

In the quest to remove all stack VLA usage from the kernel[1], this
allocates the values buffer during the callback instead of putting it
on the stack.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
v2: allocate array as part of structure (Andrew Lunn)
v3: resend to netdev with Reviewed-by
---
 drivers/net/phy/mdio-mux-gpio.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/mdio-mux-gpio.c b/drivers/net/phy/mdio-mux-gpio.c
index 082ffef0dec4..6b55e0ddef63 100644
--- a/drivers/net/phy/mdio-mux-gpio.c
+++ b/drivers/net/phy/mdio-mux-gpio.c
@@ -20,23 +20,23 @@
 struct mdio_mux_gpio_state {
 	struct gpio_descs *gpios;
 	void *mux_handle;
+	int values[];
 };
 
 static int mdio_mux_gpio_switch_fn(int current_child, int desired_child,
 				   void *data)
 {
 	struct mdio_mux_gpio_state *s = data;
-	int values[s->gpios->ndescs];
 	unsigned int n;
 
 	if (current_child == desired_child)
 		return 0;
 
 	for (n = 0; n < s->gpios->ndescs; n++)
-		values[n] = (desired_child >> n) & 1;
+		s->values[n] = (desired_child >> n) & 1;
 
 	gpiod_set_array_value_cansleep(s->gpios->ndescs, s->gpios->desc,
-				       values);
+				       s->values);
 
 	return 0;
 }
@@ -44,15 +44,21 @@ static int mdio_mux_gpio_switch_fn(int current_child, int desired_child,
 static int mdio_mux_gpio_probe(struct platform_device *pdev)
 {
 	struct mdio_mux_gpio_state *s;
+	struct gpio_descs *gpios;
 	int r;
 
-	s = devm_kzalloc(&pdev->dev, sizeof(*s), GFP_KERNEL);
-	if (!s)
+	gpios = gpiod_get_array(&pdev->dev, NULL, GPIOD_OUT_LOW);
+	if (IS_ERR(gpios))
+		return PTR_ERR(gpios);
+
+	s = devm_kzalloc(&pdev->dev, sizeof(*s->values) * gpios->ndescs +
+				     sizeof(*s), GFP_KERNEL);
+	if (!s) {
+		gpiod_put_array(gpios);
 		return -ENOMEM;
+	}
 
-	s->gpios = gpiod_get_array(&pdev->dev, NULL, GPIOD_OUT_LOW);
-	if (IS_ERR(s->gpios))
-		return PTR_ERR(s->gpios);
+	s->gpios = gpios;
 
 	r = mdio_mux_init(&pdev->dev, pdev->dev.of_node,
 			  mdio_mux_gpio_switch_fn, &s->mux_handle, s, NULL);
-- 
2.17.1


-- 
Kees Cook
Pixel Security

^ permalink raw reply related

* Re: [PATCH net-next v2 7/7] net: sched: call reoffload op on block callback reg
From: Jiri Pirko @ 2018-06-25 21:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, xiyou.wangcong, jhs, gerlitz.or, netdev, oss-drivers,
	John Hurley
In-Reply-To: <20180625213010.13266-8-jakub.kicinski@netronome.com>

Mon, Jun 25, 2018 at 11:30:10PM CEST, jakub.kicinski@netronome.com wrote:
>From: John Hurley <john.hurley@netronome.com>
>
>Call the reoffload tcf_proto_op on all tcf_proto nodes in all chains of a
>block when a callback tries to register to a block that already has
>offloaded rules. If all existing rules cannot be offloaded then the
>registration is rejected. This replaces the previous policy of rejecting
>such callback registration outright.
>
>On unregistration of a callback, the rules are flushed for that given cb.
>The implementation of block sharing in the NFP driver, for example,
>duplicates shared rules to all devs bound to a block. This meant that
>rules could still exist in hw even after a device is unbound from a block
>(assuming the block still remains active).
>
>Signed-off-by: John Hurley <john.hurley@netronome.com>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next v2 4/7] net: sched: cls_matchall: implement offload tcf_proto_op
From: Jiri Pirko @ 2018-06-25 21:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, xiyou.wangcong, jhs, gerlitz.or, netdev, oss-drivers,
	John Hurley
In-Reply-To: <20180625213010.13266-5-jakub.kicinski@netronome.com>

Mon, Jun 25, 2018 at 11:30:07PM CEST, jakub.kicinski@netronome.com wrote:
>From: John Hurley <john.hurley@netronome.com>
>
>Add the reoffload tcf_proto_op in matchall to generate an offload message
>for each filter in the given tcf_proto. Call the specified callback with
>this new offload message. The function only returns an error if the
>callback rejects adding a 'hardware only' rule.
>
>Ensure matchall flags correctly report if the rule is in hw by keeping a
>reference counter for the number of instances of the rule offloaded. Only
>update the flag when this counter changes from or to 0.
>
>Signed-off-by: John Hurley <john.hurley@netronome.com>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next v2 3/7] net: sched: cls_flower: implement offload tcf_proto_op
From: Jiri Pirko @ 2018-06-25 21:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, xiyou.wangcong, jhs, gerlitz.or, netdev, oss-drivers,
	John Hurley
In-Reply-To: <20180625213010.13266-4-jakub.kicinski@netronome.com>

Mon, Jun 25, 2018 at 11:30:06PM CEST, jakub.kicinski@netronome.com wrote:
>From: John Hurley <john.hurley@netronome.com>
>
>Add the reoffload tcf_proto_op in flower to generate an offload message
>for each filter in the given tcf_proto. Call the specified callback with
>this new offload message. The function only returns an error if the
>callback rejects adding a 'hardware only' rule.
>
>A filter contains a flag to indicate if it is in hardware or not. To
>ensure the reoffload function properly maintains this flag, keep a
>reference counter for the number of instances of the filter that are in
>hardware. Only update the flag when this counter changes from or to 0. Add
>a generic helper function to implement this behaviour.
>
>Signed-off-by: John Hurley <john.hurley@netronome.com>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox