Netdev List

* Re: [PATCH net-next 0/2] tcp: reduce quickack pressure for ECN
From: David Miller @ 2018-05-22 19:43 UTC
  To: edumazet; +Cc: netdev, vanj, ncardwell, ycheng, soheil, eric.dumazet
In-Reply-To: <20180521220857.229273-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 21 May 2018 15:08:55 -0700

> Small patch series changing TCP behavior vs quickack and ECN
> 
> First patch is a refactoring, adding parameter to tcp_incr_quickack()
> and tcp_enter_quickack_mode() helpers.
> 
> Second patch implements the change, lowering number of ACK packets
> sent after an ECN event.

Series applied, thanks Eric.

* Re: [net-next 0/9][pull request] 40GbE Intel Wired LAN Driver Updates 2018-05-22
From: David Miller @ 2018-05-22 19:46 UTC
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180522174527.19680-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 22 May 2018 10:45:18 -0700

> This series contains updates to i40e only.
> 
> Jake provides all the changes in this series starting with making it
> consistent in how we approach the bit lock.  Fixed the reporting of the
> VEB statistics and the queue statistics to always return every queue
> even if it is not currently in use.  Use WARN_ONCE() so that the first
> time we end up with an incorrect size we will dump a stack trace and a
> message to help highlight the issue early in testing.  Folded the fixed
> string prefix into the stat string definition.  Instead of using a
> separate char *p pointer when copying strings, use the data pointer
> directly.  Added code comments for several of the statistic functions to
> better explain the number and ordering of statistics.

Pulled, thanks Jeff.

* Re: [PATCH bpf-next v3 07/10] bpf: fix multi-function JITed dump obtained via syscall
From: Jakub Kicinski @ 2018-05-22 19:47 UTC
  To: Sandipan Das; +Cc: ast, daniel, netdev, linuxppc-dev, mpe, naveen.n.rao
In-Reply-To: <6f245a366d5a2957e2256f4bd89ab56ade6508d5.1527008647.git.sandipan@linux.vnet.ibm.com>

On Tue, 22 May 2018 22:46:10 +0530, Sandipan Das wrote:
> Currently, for multi-function programs, we cannot get the JITed
> instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
> command. Because of this, userspace tools such as bpftool fail
> to identify a multi-function program as being JITed or not.
> 
> With the JIT enabled and the test program running, this can be
> verified as follows:
> 
>   # cat /proc/sys/net/core/bpf_jit_enable
>   1
> 
> Before applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>           loaded_at 2018-05-16T11:43:38+0530  uid 0
>           xlated 216B  not jited  memlock 65536B
>   ...
> 
>   # bpftool prog dump jited id 1
>   no instructions returned
> 
> After applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>           loaded_at 2018-05-16T12:13:01+0530  uid 0
>           xlated 216B  jited 308B  memlock 65536B
>   ...
> 
>   # bpftool prog dump jited id 1
>      0:   nop
>      4:   nop
>      8:   mflr    r0
>      c:   std     r0,16(r1)
>     10:   stdu    r1,-112(r1)
>     14:   std     r31,104(r1)
>     18:   addi    r31,r1,48
>     1c:   li      r3,10
>   ...
> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  kernel/bpf/syscall.c | 36 +++++++++++++++++++++++++++++++++---
>  1 file changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index f0ad4b5f0224..1c4cba91e523 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1970,13 +1970,43 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	 * for offload.
>  	 */
>  	ulen = info.jited_prog_len;
> -	info.jited_prog_len = prog->jited_len;
> +	if (prog->aux->func_cnt) {
> +		u32 i;
> +
> +		info.jited_prog_len = 0;
> +		for (i = 0; i < prog->aux->func_cnt; i++)
> +			info.jited_prog_len += prog->aux->func[i]->jited_len;
> +	} else {
> +		info.jited_prog_len = prog->jited_len;
> +	}
> +
>  	if (info.jited_prog_len && ulen) {
>  		if (bpf_dump_raw_ok()) {
>  			uinsns = u64_to_user_ptr(info.jited_prog_insns);
>  			ulen = min_t(u32, info.jited_prog_len, ulen);
> -			if (copy_to_user(uinsns, prog->bpf_func, ulen))
> -				return -EFAULT;
> +
> +			/* for multi-function programs, copy the JITed
> +			 * instructions for all the functions
> +			 */
> +			if (prog->aux->func_cnt) {
> +				u32 len, free, i;
> +				u8 *img;
> +
> +				free = ulen;
> +				for (i = 0; i < prog->aux->func_cnt; i++) {
> +					len = prog->aux->func[i]->jited_len;
> +					img = (u8 *) prog->aux->func[i]->bpf_func;
> +					if (len > free)
> +						break;

nit: interesting, the previous code used to fill up the space
completely, I would personally vote to keep that behaviour and do:

    len = min(len, free);
    copy();
    free -= len;
    if (!free)
        break;

otherwise the user space doesn't know when to stop disassembling
truncated output.  But that's really a corner case, so not sure we care.
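
A minimal userspace sketch of the copy loop suggested above, for
illustration only. It uses memcpy() in place of copy_to_user(), and
struct func_image and copy_images() are hypothetical names, not kernel
code. The last image that does not fit is truncated rather than skipped,
so the destination is filled as far as the data allows.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one JITed function image. */
struct func_image {
	const unsigned char *insns;
	size_t len;
};

/*
 * Copy the images back to back into dst. An image that does not fit
 * completely is truncated to the remaining space instead of being
 * skipped. Returns the number of bytes written.
 */
size_t copy_images(unsigned char *dst, size_t dst_len,
		   const struct func_image *funcs, size_t nfuncs)
{
	size_t space_left = dst_len;
	size_t i;

	for (i = 0; i < nfuncs && space_left; i++) {
		/* len = min(len, free) from the suggestion above */
		size_t len = funcs[i].len < space_left ?
			     funcs[i].len : space_left;

		memcpy(dst, funcs[i].insns, len);
		dst += len;
		space_left -= len;
	}

	return dst_len - space_left;
}
```

With this shape, the number of bytes written equals the buffer size
whenever the buffer is smaller than the total image length, which gives
the caller an unambiguous way to tell that the output was truncated.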

> +					if (copy_to_user(uinsns, img, len))
> +						return -EFAULT;
> +					uinsns += len;
> +					free -= len;
> +				}
> +			} else {
> +				if (copy_to_user(uinsns, prog->bpf_func, ulen))
> +					return -EFAULT;
> +			}
>  		} else {
>  			info.jited_prog_insns = 0;
>  		}

* Re: [PATCH net 1/1] qed: Fix mask for physical address in ILT entry
From: David Miller @ 2018-05-22 19:48 UTC
  To: Shahed.Shaikh; +Cc: netdev, Ariel.Elior, Dept-EngEverestLinuxL2
In-Reply-To: <DM2PR07MB154780CFD55C394E7D3212E89D940@DM2PR07MB1547.namprd07.prod.outlook.com>

From: "Shaikh, Shahed" <Shahed.Shaikh@cavium.com>
Date: Tue, 22 May 2018 19:42:46 +0000

> Can you please queue this fix for -stable?

I did, see:

http://patchwork.ozlabs.org/bundle/davem/stable/?series=&submitter=&state=*&q=&archive=

* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-05-22 19:54 UTC
  To: Jiri Pirko
  Cc: Sridhar Samudrala, stephen, davem, netdev, virtualization,
	virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	jasowang, loseweigh, aaron.f.brown, anjali.singhai
In-Reply-To: <20180522173844.GP2149@nanopsycho>

On Tue, May 22, 2018 at 07:38:44PM +0200, Jiri Pirko wrote:
> >> >> In private
> >> >> flag. I see no reason to break this pattern here.
> >> >
> >> >Other masters are setup from userspace, this one is set up automatically
> >> >by kernel. So the bar is higher, we need an interface that existing
> >> >userspace knows about.  We can't just say "oh if userspace set this up
> >> >it should know to skip lowerdevs".
> >> >
> >> >Otherwise multiple interfaces with same mac tend to confuse userspace.
> >> 
> >> No difference, really.
> >> Regardless of who does the setup, an independent userspace daemon should
> >> react accordingly.
> >
> >If the daemon does the setup itself, it's reasonable to require that it
> >learns about new flags each time we add a new driver.  If it doesn't,
> >then I think it's less reasonable.
> 
> No need. The "IFLA_MASTER" attr is always there to be looked at. That is
> enough.

Oh, so if it has a master, skip it? Sorry, I misunderstood what you were
saying earlier.

Thanks, this makes sense to me.
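
A sketch of that check from the userspace side, assuming the daemon
already holds the attribute area of an RTM_NEWLINK message.
link_has_master() is a hypothetical helper name; the RTA_* macros and
IFLA_MASTER come from the Linux UAPI headers.

```c
#include <linux/if_link.h>
#include <linux/rtnetlink.h>

/*
 * Walk a buffer of link attributes and report whether IFLA_MASTER is
 * present. A caller that manages interfaces itself would skip any link
 * for which this returns 1.
 */
int link_has_master(struct rtattr *rta, int len)
{
	/* RTA_NEXT() advances rta and decrements the local copy of len. */
	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == IFLA_MASTER)
			return 1;
	}

	return 0;
}
```

Nothing here is specific to failover: any enslaved device carries the
attribute, so the daemon does not need to learn about a new flag per
driver.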

-- 
MST

* Re: [PATCH bpf-next v3 10/10] tools: bpftool: add delimiters to multi-function JITed dumps
From: Jakub Kicinski @ 2018-05-22 19:55 UTC
  To: Sandipan Das
  Cc: ast, daniel, netdev, linuxppc-dev, mpe, naveen.n.rao,
	Quentin Monnet
In-Reply-To: <88b61b11ebca5b44bad0c34225b6f2383e5983a5.1527008647.git.sandipan@linux.vnet.ibm.com>

On Tue, 22 May 2018 22:46:13 +0530, Sandipan Das wrote:
> +		if (info.nr_jited_func_lens && info.jited_func_lens) {
> +			struct kernel_sym *sym = NULL;
> +			unsigned char *img = buf;
> +			__u64 *ksyms = NULL;
> +			__u32 *lens;
> +			__u32 i;
> +
> +			if (info.nr_jited_ksyms) {
> +				kernel_syms_load(&dd);
> +				ksyms = (__u64 *) info.jited_ksyms;
> +			}
> +
> +			lens = (__u32 *) info.jited_func_lens;
> +			for (i = 0; i < info.nr_jited_func_lens; i++) {
> +				if (ksyms) {
> +					sym = kernel_syms_search(&dd, ksyms[i]);
> +					if (sym)
> +						printf("%s:\n", sym->name);
> +					else
> +						printf("%016llx:\n", ksyms[i]);
> +				}
> +
> +				disasm_print_insn(img, lens[i], opcodes, name);
> +				img += lens[i];
> +				printf("\n");
> +			}
> +		} else {

The output doesn't seem to be JSON-compatible :(  We try to make sure
all bpftool commands can produce valid JSON when run with the -j (or -p)
switch.

Would it be possible to make each function a separate JSON object with
"name" and "insn" array?  Would that work?
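
To make the proposed shape concrete, here is a rough standalone sketch
that formats one function as an object with a "name" string and an
"insn" array. It deliberately does not use bpftool's own JSON writer;
struct func_dump, append() and func_dump_to_json() are hypothetical
names, and the sketch assumes the strings need no JSON escaping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-function record; not a bpftool structure. */
struct func_dump {
	const char *name;
	const char *const *insns;
	size_t nr_insns;
};

/* Append s at *off, keeping buf NUL-terminated. Returns -1 if full. */
static int append(char *buf, size_t size, size_t *off, const char *s)
{
	size_t len = strlen(s);

	if (len >= size - *off)		/* leave room for the NUL */
		return -1;
	memcpy(buf + *off, s, len + 1);
	*off += len;
	return 0;
}

/*
 * Write {"name":"...","insn":["...",...]} into buf.
 * Returns the string length, or -1 if buf is too small.
 */
int func_dump_to_json(char *buf, size_t size, const struct func_dump *f)
{
	size_t off = 0;
	size_t i;

	if (append(buf, size, &off, "{\"name\":\"") ||
	    append(buf, size, &off, f->name) ||
	    append(buf, size, &off, "\",\"insn\":["))
		return -1;

	for (i = 0; i < f->nr_insns; i++) {
		if ((i && append(buf, size, &off, ",")) ||
		    append(buf, size, &off, "\"") ||
		    append(buf, size, &off, f->insns[i]) ||
		    append(buf, size, &off, "\""))
			return -1;
	}

	if (append(buf, size, &off, "]}"))
		return -1;

	return (int)off;
}
```

A full dump would then be a JSON array with one such object per
function, which keeps the per-function delimiters without breaking
parsers.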

* Re: Regression: Approximate 34% performance hit in receive throughput over ixgbe seen due to build_skb patch
From: Alexander Duyck @ 2018-05-22 20:03 UTC
  To: William Kucharski
  Cc: LKML, Netdev, intel-wired-lan, Jeff Kirsher, Duyck, Alexander H
In-Reply-To: <4F646FBB-FE0B-4FEE-98E5-3CA2DF0598DE@oracle.com>

On Tue, May 22, 2018 at 12:29 PM, William Kucharski
<william.kucharski@oracle.com> wrote:
>
>
>> On May 22, 2018, at 12:23 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
>>
>> 3. There should be a private flag that can be updated via "ethtool
>> --set-priv-flags" called "legacy-rx" that you can enable that will
>> roll back to the original that did the copy-break type approach for
>> small packets and the headers of the frame.
>
> With legacy-rx enabled, most of the regression goes away, but it's still present
> as compared to the code without the patch; the regression then drops to about 6%:
>
> # ethtool --show-priv-flags eno1
> Private flags for eno1:
> legacy-rx: on
>
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
>  65536      64   60.00     35934709      0     306.64
>  65536           60.00     33791739            288.35
>
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
>  65536      64   60.00     39254351      0     334.97
>  65536           60.00     36761069            313.69
>
> Is this variance to be expected, or do you think modification of the
> interrupt delay would achieve better results?
>
>
>     William Kucharski
>

I would think that by modifying the interrupt delay you could probably
do much better, assuming I am right that the issue is us sitting on
packets for too long, so that we overrun the socket buffer and start
dropping packets or stalling the Tx.

Thanks.

- Alex

* Re: [Cake] [PATCH net-next v14 6/7] sch_cake: Add overhead compensation support to the rate shaper
From: Marcelo Ricardo Leitner @ 2018-05-22 20:22 UTC
  To: Toke Høiland-Jørgensen; +Cc: netdev, cake
In-Reply-To: <52F9132E-4FDC-495A-A020-BCD963B3E3CF@toke.dk>

On Tue, May 22, 2018 at 10:44:53AM +0200, Toke Høiland-Jørgensen wrote:
> 
> 
> On 22 May 2018 01:45:13 CEST, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> >On Mon, May 21, 2018 at 10:35:58PM +0200, Toke Høiland-Jørgensen wrote:
> >> +static u32 cake_overhead(struct cake_sched_data *q, const struct
> >sk_buff *skb)
> >> +{
> >> +	const struct skb_shared_info *shinfo = skb_shinfo(skb);
> >> +	unsigned int hdr_len, last_len = 0;
> >> +	u32 off = skb_network_offset(skb);
> >> +	u32 len = qdisc_pkt_len(skb);
> >> +	u16 segs = 1;
> >> +
> >> +	q->avg_netoff = cake_ewma(q->avg_netoff, off << 16, 8);
> >> +
> >> +	if (!shinfo->gso_size)
> >> +		return cake_calc_overhead(q, len, off);
> >> +
> >> +	/* borrowed from qdisc_pkt_len_init() */
> >> +	hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
> >> +
> >> +	/* + transport layer */
> >> +	if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 |
> >> +						SKB_GSO_TCPV6))) {
> >> +		const struct tcphdr *th;
> >> +		struct tcphdr _tcphdr;
> >> +
> >> +		th = skb_header_pointer(skb, skb_transport_offset(skb),
> >> +					sizeof(_tcphdr), &_tcphdr);
> >> +		if (likely(th))
> >> +			hdr_len += __tcp_hdrlen(th);
> >> +	} else {
> >
> >I didn't see some code limiting GSO packets to just TCP or UDP. Is it
> >safe to assume that this packet is an UDP one, and not SCTP or ESP,
> >for example?
> 
> As the comment says, I nicked this from the qdisc init code.
> So I assume it's safe? :)

As long as it doesn't go further than this, it is. As in, it is just
validating whether it can contain a UDP header and, if so, accounting
for its size, without actually reading the header.

Treating everything that is !TCP as UDP works as an approximation, and
a quite accurate one: the SCTP header is just 4 bytes bigger than the
UDP header, and the UDP header is the same size as the ESP header.
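
To make the accounting concrete, here is a small standalone sketch of
the per-segment arithmetic from the quoted patch. calc_overhead() is a
hypothetical stand-in for cake_calc_overhead() that only adds a fixed
per-packet overhead, and the segment count is always derived from the
lengths (the patch does this only for SKB_GSO_DODGY and otherwise uses
gso_segs). It assumes gso_size > 0 and skb_len > hdr_len.

```c
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical overhead model: a fixed number of bytes per packet. */
uint32_t calc_overhead(uint32_t len, uint32_t overhead)
{
	return len + overhead;
}

/*
 * Estimated on-wire length of a GSO super-packet once it is split:
 * (segs - 1) full-sized segments plus one shorter last segment, each
 * carrying its own copy of the headers.
 */
uint32_t gso_wire_len(uint32_t skb_len, uint32_t hdr_len,
		      uint32_t gso_size, uint32_t overhead)
{
	uint32_t segs = DIV_ROUND_UP(skb_len - hdr_len, gso_size);
	uint32_t full_len = gso_size + hdr_len;
	/* skb_len already contains one header, so last_len includes it. */
	uint32_t last_len = skb_len - gso_size * (segs - 1);

	return calc_overhead(full_len, overhead) * (segs - 1) +
	       calc_overhead(last_len, overhead);
}
```

For example, 3000 payload bytes behind 54 bytes of headers with a
gso_size of 1448 become two 1502-byte segments and one 158-byte segment.
Getting hdr_len wrong by 4 bytes, as in the SCTP case, shifts each
segment's estimate by only those 4 bytes.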

> 
> >> +		struct udphdr _udphdr;
> >> +
> >> +		if (skb_header_pointer(skb, skb_transport_offset(skb),
> >> +				       sizeof(_udphdr), &_udphdr))
> >> +			hdr_len += sizeof(struct udphdr);
> >> +	}
> >> +
> >> +	if (unlikely(shinfo->gso_type & SKB_GSO_DODGY))
> >> +		segs = DIV_ROUND_UP(skb->len - hdr_len,
> >> +				    shinfo->gso_size);
> >> +	else
> >> +		segs = shinfo->gso_segs;
> >> +
> >> +	len = shinfo->gso_size + hdr_len;
> >> +	last_len = skb->len - shinfo->gso_size * (segs - 1);
> >> +
> >> +	return (cake_calc_overhead(q, len, off) * (segs - 1) +
> >> +		cake_calc_overhead(q, last_len, off));
> >> +}
> >> +
> 

* [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jeff Kirsher @ 2018-05-22 20:38 UTC
  To: davem, dledford, jgg
  Cc: Sindhu Devale, netdev, linux-rdma, nhorman, sassmann, jogreene,
	Shiraz Saleem, Jeff Kirsher

From: Sindhu Devale <sindhu.devale@intel.com>

Currently i40iw depends on the i40e symbols
i40e_register_client and i40e_unregister_client, so
i40iw cannot be loaded unless i40e is loaded.

This patch allows the RDMA driver to build and load without
linking to the LAN driver and without the LAN driver being
loaded first. Once the LAN driver is loaded, the RDMA driver
is notified through the netdevice notifiers and registers
as a client of the LAN driver. Add function pointers to IDC
register/unregister in the private VSI structure. This
allows an RDMA driver to build without linking to i40e.
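
As an illustration only (a simplified userspace model, not the driver
code), the handshake described above can be sketched as follows. The
struct layout mirrors idc_srv_provider from the diff below; struct peer,
peer_probe(), peer_register() and demo_reg() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define IDC_SIGNATURE 0x494e54454c494443ULL	/* "INTELIDC" in ASCII */

struct client;			/* opaque stand-in for struct i40e_client */

/* Provider block the LAN side places at the start of its private data. */
struct srv_provider {
	uint64_t signature;
	uint8_t version;
	uint8_t rsvd[7];
	int (*reg)(struct client *c);
	int (*unreg)(struct client *c);
};

enum peer_state { PEER_INVALID, PEER_VALID, PEER_REG_FAILED };

/* What the RDMA side remembers about the provider it found. */
struct peer {
	enum peer_state state;
	int (*reg)(struct client *c);
	int (*unreg)(struct client *c);
};

/* Example LAN-side callback that always succeeds. */
int demo_reg(struct client *c)
{
	(void)c;
	return 0;
}

/* Pick up the callbacks only if signature and version match. */
int peer_probe(struct peer *p, const void *priv)
{
	const struct srv_provider *sp = priv;

	if (p->state == PEER_VALID)
		return 0;
	if (sp->signature != IDC_SIGNATURE || sp->version != 0)
		return 0;

	p->reg = sp->reg;
	p->unreg = sp->unreg;
	return 1;
}

/* Register through the stored pointer; no link-time symbol needed. */
void peer_register(struct peer *p, struct client *c)
{
	if (p->state == PEER_VALID)
		return;
	p->state = (p->reg && p->reg(c) == 0) ? PEER_VALID : PEER_REG_FAILED;
}
```

The point of the indirection is that the client resolves the register
and unregister entry points at run time, so either module can be loaded
first.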

Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/infiniband/hw/i40iw/i40iw.h           |  24 +++
 drivers/infiniband/hw/i40iw/i40iw_main.c      | 141 ++++++++++++++++--
 drivers/infiniband/hw/i40iw/i40iw_utils.c     |   5 +-
 drivers/net/ethernet/intel/i40e/i40e.h        |   1 +
 drivers/net/ethernet/intel/i40e/i40e_client.h |   9 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c   |   6 +
 6 files changed, 173 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw.h b/drivers/infiniband/hw/i40iw/i40iw.h
index d5d8c1be345a..c6398b73a8da 100644
--- a/drivers/infiniband/hw/i40iw/i40iw.h
+++ b/drivers/infiniband/hw/i40iw/i40iw.h
@@ -119,6 +119,30 @@
 #define I40IW_CQP_COMPL_SQ_WQE_FLUSHED    3
 #define I40IW_CQP_COMPL_RQ_SQ_WQE_FLUSHED 4
 
+enum I40IW_IDC_STATE {
+	I40IW_STATE_INVALID,
+	I40IW_STATE_VALID,
+	I40IW_STATE_REG_FAILED
+};
+
+struct i40iw_peer {
+	struct module *module;
+#define MAX_PEER_NAME_SIZE 8
+	char name[MAX_PEER_NAME_SIZE];
+	enum I40IW_IDC_STATE state;
+	atomic_t ref_count;
+	int (*idc_reg_peer_driver)(struct i40e_client *i40iw_client);
+	int (*idc_unreg_peer_driver)(struct i40e_client *i40iw_client);
+};
+
+struct i40iw_peer_drv {
+	struct i40e_client i40iw_client;
+	struct i40iw_peer peer;
+};
+
+bool i40iw_is_new_peer(struct net_device *netdev);
+void i40iw_reg_peer(void);
+
 struct i40iw_cqp_compl_info {
 	u32 op_ret_val;
 	u16 maj_err_code;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_main.c b/drivers/infiniband/hw/i40iw/i40iw_main.c
index 9cd0d3ef9057..f4c5be11c1d4 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_main.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_main.c
@@ -78,6 +78,7 @@ MODULE_AUTHOR("Intel Corporation, <e1000-rdma@lists.sourceforge.net>");
 MODULE_DESCRIPTION("Intel(R) Ethernet Connection X722 iWARP RDMA Driver");
 MODULE_LICENSE("Dual BSD/GPL");
 
+static struct i40iw_peer_drv peer_drv;
 static struct i40e_client i40iw_client;
 static char i40iw_client_name[I40E_CLIENT_STR_LENGTH] = "i40iw";
 
@@ -103,6 +104,30 @@ static struct notifier_block i40iw_netdevice_notifier = {
 	.notifier_call = i40iw_netdevice_event
 };
 
+/**
+ * i40iw_open_inc_ref - Increment ref count for a open
+ */
+static void i40iw_open_inc_ref(void)
+{
+	atomic_inc(&peer_drv.peer.ref_count);
+}
+
+/**
+ * i40iw_open_dec_ref - Decrement ref count for a open
+ */
+static void i40iw_open_dec_ref(void)
+{
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+	if (peer->state == I40IW_STATE_VALID &&
+	    atomic_dec_and_test(&peer->ref_count)) {
+		peer->state = I40IW_STATE_INVALID;
+		peer->idc_unreg_peer_driver(&peer_drv.i40iw_client);
+		module_put(peer->module);
+	}
+}
+
 /**
  * i40iw_find_i40e_handler - find a handler given a client info
  * @ldev: pointer to a client info
@@ -1710,6 +1735,7 @@ static int i40iw_open(struct i40e_info *ldev, struct i40e_client *client)
 		if(iwdev->param_wq == NULL)
 			break;
 		i40iw_pr_info("i40iw_open completed\n");
+		i40iw_open_inc_ref();
 		return 0;
 	} while (0);
 
@@ -1801,6 +1827,7 @@ static void i40iw_close(struct i40e_info *ldev, struct i40e_client *client, bool
 	i40iw_cm_teardown_connections(iwdev, NULL, NULL, true);
 	destroy_workqueue(iwdev->virtchnl_wq);
 	i40iw_deinit_device(iwdev);
+	i40iw_open_dec_ref();
 }
 
 /**
@@ -2024,6 +2051,104 @@ static const struct i40e_client_ops i40e_ops = {
 	.vf_capable = i40iw_vf_capable
 };
 
+/**
+ * i40iw_is_new_peer - check netdev of the peer driver
+ * @netdev: netdev of peer driver
+ */
+bool i40iw_is_new_peer(struct net_device *netdev)
+{
+	struct idc_srv_provider *sp;
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+	if (peer->state == I40IW_STATE_VALID)
+		return false;
+
+	if (netdev->dev.parent && netdev->dev.parent->driver &&
+	    !strncmp(netdev->dev.parent->driver->name, peer->name, sizeof(peer->name))) {
+		sp = (struct idc_srv_provider *)netdev_priv(netdev);
+		if (sp->signature != IDC_SIGNATURE || sp->version)
+			return false;
+
+		/* Found the driver */
+		peer->idc_reg_peer_driver = sp->idc_reg_peer_driver;
+		peer->idc_unreg_peer_driver = sp->idc_unreg_peer_driver;
+		peer->module = netdev->dev.parent->driver->owner;
+
+		return true;
+	}
+
+	return false;
+}
+
+/**
+ * i40iw_initialize_client - Setup client struct
+ */
+static void i40iw_initialize_client(void)
+{
+	struct i40e_client *i40iw_client = &peer_drv.i40iw_client;
+
+	i40iw_client->version.major = CLIENT_IW_INTERFACE_VERSION_MAJOR;
+	i40iw_client->version.minor = CLIENT_IW_INTERFACE_VERSION_MINOR;
+	i40iw_client->version.build = CLIENT_IW_INTERFACE_VERSION_BUILD;
+	i40iw_client->ops = &i40e_ops;
+	memcpy(i40iw_client->name, i40iw_client_name, I40E_CLIENT_STR_LENGTH);
+	i40iw_client->type = I40E_CLIENT_IWARP;
+	strncpy(peer_drv.peer.name, "i40e", sizeof(peer_drv.peer.name));
+}
+
+/**
+ * i40iw_reg_peer - Register with peer
+ */
+void i40iw_reg_peer(void)
+{
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+
+	if (peer->state == I40IW_STATE_VALID)
+		return;
+
+	if (peer->idc_reg_peer_driver &&
+	    !peer->idc_reg_peer_driver(&peer_drv.i40iw_client)) {
+		peer->state = I40IW_STATE_VALID;
+		try_module_get(peer->module);
+	} else {
+		peer->state = I40IW_STATE_REG_FAILED;
+	}
+}
+
+/**
+ * i40iw_find_idc_peer - Search netdevs for a peer driver
+ */
+static void i40iw_find_idc_peer(void)
+{
+	struct net_device *dev;
+
+	rcu_read_lock();
+	for_each_netdev_rcu(&init_net, dev) {
+		if (i40iw_is_new_peer(dev))
+			break;
+	}
+	rcu_read_unlock();
+	i40iw_reg_peer();
+}
+
+/**
+ * i40iw_unreg_peer - Unregister with peer
+ */
+static void i40iw_unreg_peer(void)
+{
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+	if (peer->state == I40IW_STATE_VALID) {
+		peer->state = I40IW_STATE_INVALID;
+		peer->idc_unreg_peer_driver(&peer_drv.i40iw_client);
+		module_put(peer->module);
+	}
+}
+
 /**
  * i40iw_init_module - driver initialization function
  *
@@ -2032,20 +2157,12 @@ static const struct i40e_client_ops i40e_ops = {
  */
 static int __init i40iw_init_module(void)
 {
-	int ret;
-
-	memset(&i40iw_client, 0, sizeof(i40iw_client));
-	i40iw_client.version.major = CLIENT_IW_INTERFACE_VERSION_MAJOR;
-	i40iw_client.version.minor = CLIENT_IW_INTERFACE_VERSION_MINOR;
-	i40iw_client.version.build = CLIENT_IW_INTERFACE_VERSION_BUILD;
-	i40iw_client.ops = &i40e_ops;
-	memcpy(i40iw_client.name, i40iw_client_name, I40E_CLIENT_STR_LENGTH);
-	i40iw_client.type = I40E_CLIENT_IWARP;
 	spin_lock_init(&i40iw_handler_lock);
-	ret = i40e_register_client(&i40iw_client);
+	i40iw_initialize_client();
+	i40iw_find_idc_peer();
 	i40iw_register_notifiers();
 
-	return ret;
+	return 0;
 }
 
 /**
@@ -2057,7 +2174,7 @@ static int __init i40iw_init_module(void)
 static void __exit i40iw_exit_module(void)
 {
 	i40iw_unregister_notifiers();
-	i40e_unregister_client(&i40iw_client);
+	i40iw_unreg_peer();
 }
 
 module_init(i40iw_init_module);
diff --git a/drivers/infiniband/hw/i40iw/i40iw_utils.c b/drivers/infiniband/hw/i40iw/i40iw_utils.c
index a9ea966877f2..264939942da0 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_utils.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_utils.c
@@ -314,8 +314,11 @@ int i40iw_netdevice_event(struct notifier_block *notifier,
 	event_netdev = netdev_notifier_info_to_dev(ptr);
 
 	hdl = i40iw_find_netdev(event_netdev);
-	if (!hdl)
+	if (!hdl) {
+		if (i40iw_is_new_peer(event_netdev))
+			i40iw_reg_peer();
 		return NOTIFY_DONE;
+	}
 
 	iwdev = &hdl->device;
 	if (iwdev->init_state < RDMA_DEV_REGISTERED || iwdev->closing)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 7a80652e2500..e3171b696848 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -789,6 +789,7 @@ struct i40e_vsi {
 } ____cacheline_internodealigned_in_smp;
 
 struct i40e_netdev_priv {
+	struct idc_srv_provider prov_callbacks;
 	struct i40e_vsi *vsi;
 };
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.h b/drivers/net/ethernet/intel/i40e/i40e_client.h
index 72994baf4941..95a47df9c104 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_client.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_client.h
@@ -44,6 +44,15 @@ struct i40e_client;
 #define I40E_QUEUE_TYPE_PE_AEQ  0x80
 #define I40E_QUEUE_INVALID_IDX	0xFFFF
 
+#define IDC_SIGNATURE 0x494e54454c494443ULL	/* INTELIDC */
+struct idc_srv_provider {
+	u64 signature;
+	u8 version;
+	u8 rsvd[7];
+	int (*idc_reg_peer_driver)(struct i40e_client *client);
+	int (*idc_unreg_peer_driver)(struct i40e_client *client);
+};
+
 struct i40e_qv_info {
 	u32 v_idx; /* msix_vector */
 	u16 ceq_idx;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b5daa5c9c7de..984001ae7680 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11913,6 +11913,12 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 	np = netdev_priv(netdev);
 	np->vsi = vsi;
 
+	np->prov_callbacks.signature = IDC_SIGNATURE;
+	np->prov_callbacks.version = 0;
+	memset(np->prov_callbacks.rsvd, 0, sizeof(np->prov_callbacks.rsvd));
+	np->prov_callbacks.idc_reg_peer_driver = i40e_register_client;
+	np->prov_callbacks.idc_unreg_peer_driver = i40e_unregister_client;
+
 	hw_enc_features = NETIF_F_SG			|
 			  NETIF_F_IP_CSUM		|
 			  NETIF_F_IPV6_CSUM		|
-- 
2.17.0

* Re: [PATCH 07/33] iwlwifi: mvm: use match_string() helper
From: Andy Shevchenko @ 2018-05-22 20:41 UTC
  To: Yisheng Xie, Luca Coelho
  Cc: Linux Kernel Mailing List, Kalle Valo, Intel Linux Wireless,
	Johannes Berg, Emmanuel Grumbach, open list:TI WILINK WIRELES...,
	netdev
In-Reply-To: <63a78572-b7d3-9cc7-9e22-5bd19cad3333@huawei.com>

On Tue, May 22, 2018 at 6:30 AM, Yisheng Xie <xieyisheng1@huawei.com> wrote:

>> But it's up tu Loca.

Shame on me. I meant Luca, of course!
Luca, sorry.

> OK, I will change it if Loca agree your opinion.


-- 
With Best Regards,
Andy Shevchenko

* Re: [PATCH net-next 0/7] net/ipv6: Fix route append and replace use cases
From: David Ahern @ 2018-05-22 20:44 UTC
  To: David Miller, dsahern; +Cc: netdev, Thomas.Winter, idosch, sharpd, roopa
In-Reply-To: <20180522.144617.1003575784676243779.davem@davemloft.net>

On 5/22/18 12:46 PM, David Miller wrote:
> 
> Ok, I'll apply this series.
> 
> But if this breaks things for anyone in a practical way, I am unfortunately
> going to have to revert no matter how silly the current behavior may be.
> 

Understood. I have to try the best option first. I'll look at
regressions if they happen.

* [PATCH net] net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
From: Roopa Prabhu @ 2018-05-22 20:44 UTC
  To: davem; +Cc: netdev, eric.dumazet

From: Roopa Prabhu <roopa@cumulusnetworks.com>

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 net/ipv4/fib_frontend.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4d622112..e66172a 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -649,6 +649,7 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
 	[RTA_ENCAP]		= { .type = NLA_NESTED },
 	[RTA_UID]		= { .type = NLA_U32 },
 	[RTA_MARK]		= { .type = NLA_U32 },
+	[RTA_TABLE]		= { .type = NLA_U32 },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
-- 
2.1.4

* [PATCH net-next] enic: set DMA mask to 47 bit
From: Govindarajulu Varadarajan @ 2018-05-22 13:37 UTC
  To: davem, netdev; +Cc: benve, Govindarajulu Varadarajan

In commit 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then
failover to DMA") DMA mask was changed from 40 bits to 64 bits.
Hardware actually supports only 47 bits.
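
For reference, a small standalone sketch of what a 47-bit mask means
for bus addresses. The macro has the same shape as the kernel's
DMA_BIT_MASK(); dma_addr_ok() is a hypothetical helper used only for
illustration.

```c
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Returns 1 if addr is reachable by a device limited to 'bits' bits. */
int dma_addr_ok(uint64_t addr, unsigned int bits)
{
	return (addr & ~DMA_BIT_MASK(bits)) == 0;
}
```

With a 64-bit mask the DMA layer is free to hand out addresses that
have bit 47 or above set, which such hardware cannot reach.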

Fixes: 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then failover to DMA")
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 81684acf52af..8a8b12b720ef 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -2747,11 +2747,11 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_set_master(pdev);
 
 	/* Query PCI controller on system for DMA addressing
-	 * limitation for the device.  Try 64-bit first, and
+	 * limitation for the device.  Try 47-bit first, and
 	 * fail to 32-bit.
 	 */
 
-	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(47));
 	if (err) {
 		err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
 		if (err) {
@@ -2765,10 +2765,10 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 			goto err_out_release_regions;
 		}
 	} else {
-		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(47));
 		if (err) {
 			dev_err(dev, "Unable to obtain %u-bit DMA "
-				"for consistent allocations, aborting\n", 64);
+				"for consistent allocations, aborting\n", 47);
 			goto err_out_release_regions;
 		}
 		using_dac = 1;
-- 
2.17.0

* Re: [PATCH net-next v2 1/7] net: dsa: qca8k: Add QCA8334 binding documentation
From: Michal @ 2018-05-22 20:50 UTC
  To: Rob Herring
  Cc: netdev, linux-kernel, devicetree, f.fainelli, vivien.didelot,
	andrew, mark.rutland, davem, michal.vokac
In-Reply-To: <20180522194039.GA15413@rob-hp-laptop>

On 22.5.2018 21:40, Rob Herring wrote:
> On Tue, May 22, 2018 at 01:16:26PM +0200, Michal Vokáč wrote:
>> Add support for the four-port variant of the Qualcomm QCA833x switch.
>>
>> The CPU port default link settings can be reconfigured using
>> a fixed-link sub-node.
>>
>> Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
>> ---
>> Changes in v2:
>>   - Add commit message and document fixed-link binding.
>>
>>   .../devicetree/bindings/net/dsa/qca8k.txt          | 23 +++++++++++++++++++++-
>>   1 file changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
>> index 9c67ee4..15b9057 100644
>> --- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt
>> +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
>> @@ -2,7 +2,10 @@
>>   
>>   Required properties:
>>   
>> -- compatible: should be "qca,qca8337"
>> +- compatible: should be one of:
>> +    "qca,qca8334"
>> +    "qca,qca8337"
>> +
>>   - #size-cells: must be 0
>>   - #address-cells: must be 1
>>   
>> @@ -14,6 +17,20 @@ port and PHY id, each subnode describing a port needs to have a valid phandle
>>   referencing the internal PHY connected to it. The CPU port of this switch is
>>   always port 0.
>>   
>> +A CPU port node has the following optional property:
> 
> s/property/node/
> 
> Otherwise,
> 
> Reviewed-by: Rob Herring <robh@kernel.org>

Good catch, I will correct this.
Thanks for the review Rob.

Michal

* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-22 20:54 UTC
  To: Jiri Pirko, Michael S. Tsirkin
  Cc: stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, aaron.f.brown, anjali.singhai
In-Reply-To: <20180522161246.GN2149@nanopsycho>



On 5/22/2018 9:12 AM, Jiri Pirko wrote:
> Fixing the subj, sorry about that.
>
> Tue, May 22, 2018 at 05:46:21PM CEST, mst@redhat.com wrote:
>> On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
>>> Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>>>> On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>>>>> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>>>>>> Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>>>>>>> Use the registration/notification framework supported by the generic
>>>>>>> failover infrastructure.
>>>>>>>
>>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>>> In previous patchset versions, the common code did
>>>>>> netdev_rx_handler_register() and netdev_upper_dev_link() etc
>>>>>> (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>>>>>>
>>>>>> This should be part of the common "failover" code.
>>>> Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>>>> netvsc and only commonize the notifier and the main event handler routine.
>>>> Another complication is that netvsc does part of registration in a delayed workqueue.
>>> :( This kind of degrades the whole effort of having a single solution
>>> in the "failover" module. I think that common parts, such as
>>> netdev_rx_handler_register() and others, certainly should be inside
>>> the common module. This is not a good time to minimize changes. Let's do
>>> the thing properly and fix the netvsc mess now.
>>>
>>>
>>>> It should be possible to move some of the code from net_failover.c to generic
>>>> failover.c in future if Stephen is ok with it.
>>>>
>>>>
>>>>> Also note that in the current patchset you use IFF_FAILOVER flag for
>>>>> master, yet for the slave you use IFF_SLAVE. That is wrong.
>>>>> IFF_FAILOVER_SLAVE should be used.
>>>> Not sure which code you are referring to.  I only set IFF_FAILOVER_SLAVE
>>>> in patch 3.
>>> The existing netvsc driver.
>> We really can't change netvsc's flags now, even if its interface is
>> messy, it's being used in the field. We can add a flag that makes netvsc
>> behave differently, and if this flag also allows enhanced functionality
>> userspace will gradually switch.
> Okay, although in this case, it really does not make much sense, so be
> it. Leave the netvsc set the ->priv flag to IFF_SLAVE as it is doing
> now. (This once-wrong-forever-wrong policy is frustrating me).
>
> But since this patchset introduces private flag IFF_FAILOVER and
> IFF_FAILOVER_SLAVE, and we set IFF_FAILOVER to the netvsc netdev
> instance, we should also set IFF_FAILOVER_SLAVE to the enslaved VF
> netdevice to get at least some consistency between virtio_net and
> netvsc.

OK. I can make this change to set/unset IFF_FAILOVER_SLAVE in the netvsc
register/unregister routines so that it is consistent with virtio_net.

Based on your discussion with mst, i think we can even remove IFF_SLAVE
setting on netvsc as it should not impact userspace.  If Stephen is OK
we can make this change too.
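
As a rough illustration of the set/unset pairing described above, here is a userspace sketch. The struct, the function names and the bit values are hypothetical stand-ins chosen for this example; they are not the kernel's definitions and not the code that will go into netvsc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value for illustration only; the real private flags
 * are defined by the kernel and are not reproduced here.
 */
#define SKETCH_IFF_FAILOVER_SLAVE	(1u << 1)

/* Minimal stand-in for the priv_flags field of a net device. */
struct sketch_netdev {
	uint32_t priv_flags;
};

/* Register path: mark the VF as a failover slave. */
static void sketch_vf_register(struct sketch_netdev *vf)
{
	vf->priv_flags |= SKETCH_IFF_FAILOVER_SLAVE;
}

/* Unregister path: clear only that bit, leaving other flags alone. */
static void sketch_vf_unregister(struct sketch_netdev *vf)
{
	vf->priv_flags &= ~SKETCH_IFF_FAILOVER_SLAVE;
}

static bool sketch_is_failover_slave(const struct sketch_netdev *dev)
{
	return (dev->priv_flags & SKETCH_IFF_FAILOVER_SLAVE) != 0;
}
```

The point of the pairing is that whatever the register routine sets, the unregister routine clears, so a VF that is released does not keep advertising itself as a slave.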

Do you see any other items that need to be resolved for this series to go in
this merge window?



>
>> Anything breaking userspace I fully expect Stephen to nack and
>> IMO with good reason.
>>
>> -- 
>> MST

^ permalink raw reply

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jason Gunthorpe @ 2018-05-22 20:56 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, dledford, Sindhu Devale, netdev, linux-rdma, nhorman,
	sassmann, jogreene, Shiraz Saleem
In-Reply-To: <20180522203831.20624-1-jeffrey.t.kirsher@intel.com>

On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> From: Sindhu Devale <sindhu.devale@intel.com>
> 
> Currently i40iw is dependent on i40e symbols
> i40e_register_client and i40e_unregister_client due to
> which i40iw cannot be loaded without i40e being loaded.
> 
> This patch allows RDMA driver to build and load without
> linking to LAN driver and without LAN driver being loaded
> first. Once the LAN driver is loaded, the RDMA driver
> is notified through the netdevice notifiers to register
> as client to the LAN driver. Add function pointers to IDC
> register/unregister in the private VSI structure. This
> allows a RDMA driver to build without linking to i40e.

Why would you want to do this? The rdma driver is non-functional
without the ethernet driver, so why on earth would we want to defeat
the module dependency mechanism?

Jason

^ permalink raw reply

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jeff Kirsher @ 2018-05-22 21:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: davem, dledford, Sindhu Devale, netdev, linux-rdma, nhorman,
	sassmann, jogreene, Shiraz Saleem
In-Reply-To: <20180522205612.GD7502@mellanox.com>

On Tue, 2018-05-22 at 14:56 -0600, Jason Gunthorpe wrote:
> On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> > From: Sindhu Devale <sindhu.devale@intel.com>
> > 
> > Currently i40iw is dependent on i40e symbols
> > i40e_register_client and i40e_unregister_client due to
> > which i40iw cannot be loaded without i40e being loaded.
> > 
> > This patch allows RDMA driver to build and load without
> > linking to LAN driver and without LAN driver being loaded
> > first. Once the LAN driver is loaded, the RDMA driver
> > is notified through the netdevice notifiers to register
> > as client to the LAN driver. Add function pointers to IDC
> > register/unregister in the private VSI structure. This
> > allows a RDMA driver to build without linking to i40e.
> 
> Why would you want to do this? The rdma driver is non-functional
> without the ethernet driver, so why on earth would we want to defeat
> the module dependency mechanism?

This change is driven by the OSVs like Red Hat, where customers were
updating the i40e driver, which in turn broke i40iw.

^ permalink raw reply

* [PATCH net-next v5 0/3] fib rule selftest
From: Roopa Prabhu @ 2018-05-22 21:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, nikolay, dsa, idosch, eric.dumazet

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This series adds a new test to test fib rules.
ip route get is used to test fib rule matches.
This series also extends ip route get to match on
sport and dport to test recent support of sport
and dport fib rule match.

v2 - address Ido's comment to make sport dport
ip route get to work correctly for input route
get. I don't support ip route get on ip-proto match yet.
ip route get creates a udp packet and i have left
it at that. We could extend ip route get to support
a few ip proto matches in followup patches.

v3 - Support ip_proto (only tcp and udp) match in getroute.
dropped printing of new match attrs in ip route get,
because ipv6 does not print it. And ipv6 currently shares
the dump api with ipv6 notify and it's better to not add them
to the notify api. dropped it to keep the api consistent between
ipv4 and ipv6 (though uid is already printed in the ipv4 case).
If we need it, both ipv4 and ipv6 can be enhanced to provide
a separate get api. Moved skb creation for ipv4 to a separate func.

v4 - drop separate skb for netlink and fix concerns around rcu and netlink
     reply (as pointed out by DaveM). I now try to reset the skb after the route
     lookup and before the netlink send (testing shows this is ok. More eyes and
     any feedback here will be helpful)

v5 - dropped RTA_TABLE ipv4_rtm_policy update from this series and posted
     it separately for net (feedback from Eric)
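
The changelog above mentions that ip route get builds a dummy packet for the lookup. A minimal userspace sketch of that idea follows, writing an IPv4 header plus a UDP header into a plain byte buffer. The function and macro names are hypothetical, and this is not the kernel helper from patch 1; it only shows which header fields carry the protocol and the ports.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IPV4_HDR_LEN	20u
#define SKETCH_UDP_HDR_LEN	8u
#define SKETCH_PROTO_UDP	17u

/* Store a 16-bit value in network byte order. */
static void sketch_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Fill buf with a minimal IPv4 header followed by a UDP header.
 * src and dst are 4-byte addresses already in network order; ports are
 * given in host order. Returns the number of bytes written, or 0 if
 * the buffer is too small.
 */
static size_t sketch_build_udp_probe(uint8_t *buf, size_t buflen,
				     const uint8_t src[4],
				     const uint8_t dst[4],
				     uint16_t sport, uint16_t dport)
{
	const size_t total = SKETCH_IPV4_HDR_LEN + SKETCH_UDP_HDR_LEN;
	uint8_t *udp;

	if (buflen < total)
		return 0;

	memset(buf, 0, total);
	buf[0] = 0x45;			/* version 4, header length 5 words */
	buf[9] = SKETCH_PROTO_UDP;	/* protocol field */
	memcpy(buf + 12, src, 4);	/* source address */
	memcpy(buf + 16, dst, 4);	/* destination address */

	udp = buf + SKETCH_IPV4_HDR_LEN;
	sketch_put_be16(udp + 0, sport);
	sketch_put_be16(udp + 2, dport);
	sketch_put_be16(udp + 4, SKETCH_UDP_HDR_LEN);	/* header only */
	return total;
}
```

Only the fields a rule match would look at are filled in; checksums and the remaining IP header fields are left zero in this sketch.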

Roopa Prabhu (3):
  ipv4: support sport, dport and ip_proto in RTM_GETROUTE
  ipv6: support sport, dport and ip_proto in RTM_GETROUTE
  selftests: net: initial fib rule tests

 include/uapi/linux/rtnetlink.h                |   2 +
 net/ipv4/route.c                              | 152 ++++++++++++-----
 net/ipv6/route.c                              |  25 +++
 tools/testing/selftests/net/Makefile          |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 224 ++++++++++++++++++++++++++
 5 files changed, 366 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/net/fib_rule_tests.sh

-- 
2.1.4

^ permalink raw reply

* [PATCH net-next v5 1/3] ipv4: support sport, dport and ip_proto in RTM_GETROUTE
From: Roopa Prabhu @ 2018-05-22 21:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, nikolay, dsa, idosch, eric.dumazet
In-Reply-To: <1527023009-13609-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This is a followup to fib rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.
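
For illustration, the "tcp, udp and icmp only" restriction can be modelled in userspace roughly as below. The function name is hypothetical, and unlike the kernel helper in this patch the sketch writes the output only on success; that difference is a choice made for this example, not something taken from the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <netinet/in.h>

/* Accept only the protocols the commit message lists and reject the
 * rest with -EOPNOTSUPP. *ip_proto is left untouched on failure.
 */
static int sketch_parse_ip_proto(uint8_t requested, uint8_t *ip_proto)
{
	switch (requested) {
	case IPPROTO_TCP:
	case IPPROTO_UDP:
	case IPPROTO_ICMP:
		*ip_proto = requested;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```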

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/net/ip.h               |   3 +
 include/uapi/linux/rtnetlink.h |   3 +
 net/ipv4/Makefile              |   2 +-
 net/ipv4/fib_frontend.c        |   3 +
 net/ipv4/netlink.c             |  23 +++++++
 net/ipv4/route.c               | 146 ++++++++++++++++++++++++++++++-----------
 6 files changed, 140 insertions(+), 40 deletions(-)
 create mode 100644 net/ipv4/netlink.c

diff --git a/include/net/ip.h b/include/net/ip.h
index bada1f1..0d2281b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -664,4 +664,7 @@ extern int sysctl_icmp_msgs_burst;
 int ip_misc_proc_init(void);
 #endif
 
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+				struct netlink_ext_ack *extack);
+
 #endif	/* _IP_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 9b15005..cabb210 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -327,6 +327,9 @@ enum rtattr_type_t {
 	RTA_PAD,
 	RTA_UID,
 	RTA_TTL_PROPAGATE,
+	RTA_IP_PROTO,
+	RTA_SPORT,
+	RTA_DPORT,
 	__RTA_MAX
 };
 
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index b379520..13f2ba9 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,7 @@ obj-y     := route.o inetpeer.o protocol.o \
 	     udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-	     metrics.o
+	     metrics.o netlink.o
 
 obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
 obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4d622112..897ae92 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -649,6 +649,9 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
 	[RTA_ENCAP]		= { .type = NLA_NESTED },
 	[RTA_UID]		= { .type = NLA_U32 },
 	[RTA_MARK]		= { .type = NLA_U32 },
+	[RTA_IP_PROTO]		= { .type = NLA_U8 },
+	[RTA_SPORT]		= { .type = NLA_U16 },
+	[RTA_DPORT]		= { .type = NLA_U16 },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
diff --git a/net/ipv4/netlink.c b/net/ipv4/netlink.c
new file mode 100644
index 0000000..f86bb4f
--- /dev/null
+++ b/net/ipv4/netlink.c
@@ -0,0 +1,23 @@
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
+#include <linux/types.h>
+#include <net/net_namespace.h>
+#include <net/netlink.h>
+#include <net/ip.h>
+
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+				struct netlink_ext_ack *extack)
+{
+	*ip_proto = nla_get_u8(attr);
+
+	switch (*ip_proto) {
+	case IPPROTO_TCP:
+	case IPPROTO_UDP:
+	case IPPROTO_ICMP:
+		return 0;
+	default:
+		NL_SET_ERR_MSG(extack, "Unsupported ip proto");
+		return -EOPNOTSUPP;
+	}
+}
+EXPORT_SYMBOL_GPL(rtm_getroute_parse_ip_proto);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2cfa1b5..0e401dc 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2574,11 +2574,10 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
 /* called with rcu_read_lock held */
-static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
-			struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-			u32 seq)
+static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
+			struct rtable *rt, u32 table_id, struct flowi4 *fl4,
+			struct sk_buff *skb, u32 portid, u32 seq)
 {
-	struct rtable *rt = skb_rtable(skb);
 	struct rtmsg *r;
 	struct nlmsghdr *nlh;
 	unsigned long expires = 0;
@@ -2674,7 +2673,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 			}
 		} else
 #endif
-			if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex))
+			if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif))
 				goto nla_put_failure;
 	}
 
@@ -2689,43 +2688,93 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 	return -EMSGSIZE;
 }
 
+static struct sk_buff *inet_rtm_getroute_build_skb(__be32 src, __be32 dst,
+						   u8 ip_proto, __be16 sport,
+						   __be16 dport)
+{
+	struct sk_buff *skb;
+	struct iphdr *iph;
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		return NULL;
+
+	/* Reserve room for dummy headers, this skb can pass
+	 * through good chunk of routing engine.
+	 */
+	skb_reset_mac_header(skb);
+	skb_reset_network_header(skb);
+	skb->protocol = htons(ETH_P_IP);
+	iph = skb_put(skb, sizeof(struct iphdr));
+	iph->protocol = ip_proto;
+	iph->saddr = src;
+	iph->daddr = dst;
+	iph->version = 0x4;
+	iph->frag_off = 0;
+	iph->ihl = 0x5;
+	skb_set_transport_header(skb, skb->len);
+
+	switch (iph->protocol) {
+	case IPPROTO_UDP: {
+		struct udphdr *udph;
+
+		udph = skb_put_zero(skb, sizeof(struct udphdr));
+		udph->source = sport;
+		udph->dest = dport;
+		udph->len = sizeof(struct udphdr);
+		udph->check = 0;
+		break;
+	}
+	case IPPROTO_TCP: {
+		struct tcphdr *tcph;
+
+		tcph = skb_put_zero(skb, sizeof(struct tcphdr));
+		tcph->source	= sport;
+		tcph->dest	= dport;
+		tcph->doff	= sizeof(struct tcphdr) / 4;
+		tcph->rst = 1;
+		tcph->check = ~tcp_v4_check(sizeof(struct tcphdr),
+					    src, dst, 0);
+		break;
+	}
+	case IPPROTO_ICMP: {
+		struct icmphdr *icmph;
+
+		icmph = skb_put_zero(skb, sizeof(struct icmphdr));
+		icmph->type = ICMP_ECHO;
+		icmph->code = 0;
+	}
+	}
+
+	return skb;
+}
+
 static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 			     struct netlink_ext_ack *extack)
 {
 	struct net *net = sock_net(in_skb->sk);
-	struct rtmsg *rtm;
 	struct nlattr *tb[RTA_MAX+1];
+	u32 table_id = RT_TABLE_MAIN;
+	__be16 sport = 0, dport = 0;
 	struct fib_result res = {};
+	u8 ip_proto = IPPROTO_UDP;
 	struct rtable *rt = NULL;
+	struct sk_buff *skb;
+	struct rtmsg *rtm;
 	struct flowi4 fl4;
 	__be32 dst = 0;
 	__be32 src = 0;
+	kuid_t uid;
 	u32 iif;
 	int err;
 	int mark;
-	struct sk_buff *skb;
-	u32 table_id = RT_TABLE_MAIN;
-	kuid_t uid;
 
 	err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv4_policy,
 			  extack);
 	if (err < 0)
-		goto errout;
+		return err;
 
 	rtm = nlmsg_data(nlh);
-
-	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
-	if (!skb) {
-		err = -ENOBUFS;
-		goto errout;
-	}
-
-	/* Reserve room for dummy headers, this skb can pass
-	   through good chunk of routing engine.
-	 */
-	skb_reset_mac_header(skb);
-	skb_reset_network_header(skb);
-
 	src = tb[RTA_SRC] ? nla_get_in_addr(tb[RTA_SRC]) : 0;
 	dst = tb[RTA_DST] ? nla_get_in_addr(tb[RTA_DST]) : 0;
 	iif = tb[RTA_IIF] ? nla_get_u32(tb[RTA_IIF]) : 0;
@@ -2735,14 +2784,22 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	else
 		uid = (iif ? INVALID_UID : current_uid());
 
-	/* Bugfix: need to give ip_route_input enough of an IP header to
-	 * not gag.
-	 */
-	ip_hdr(skb)->protocol = IPPROTO_UDP;
-	ip_hdr(skb)->saddr = src;
-	ip_hdr(skb)->daddr = dst;
+	if (tb[RTA_IP_PROTO]) {
+		err = rtm_getroute_parse_ip_proto(tb[RTA_IP_PROTO],
+						  &ip_proto, extack);
+		if (err)
+			return err;
+	}
+
+	if (tb[RTA_SPORT])
+		sport = nla_get_be16(tb[RTA_SPORT]);
 
-	skb_reserve(skb, MAX_HEADER + sizeof(struct iphdr));
+	if (tb[RTA_DPORT])
+		dport = nla_get_be16(tb[RTA_DPORT]);
+
+	skb = inet_rtm_getroute_build_skb(src, dst, ip_proto, sport, dport);
+	if (!skb)
+		return -ENOBUFS;
 
 	memset(&fl4, 0, sizeof(fl4));
 	fl4.daddr = dst;
@@ -2751,6 +2808,11 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	fl4.flowi4_oif = tb[RTA_OIF] ? nla_get_u32(tb[RTA_OIF]) : 0;
 	fl4.flowi4_mark = mark;
 	fl4.flowi4_uid = uid;
+	if (sport)
+		fl4.fl4_sport = sport;
+	if (dport)
+		fl4.fl4_dport = dport;
+	fl4.flowi4_proto = ip_proto;
 
 	rcu_read_lock();
 
@@ -2760,10 +2822,10 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		dev = dev_get_by_index_rcu(net, iif);
 		if (!dev) {
 			err = -ENODEV;
-			goto errout_free;
+			goto errout_rcu;
 		}
 
-		skb->protocol	= htons(ETH_P_IP);
+		fl4.flowi4_iif = iif; /* for rt_fill_info */
 		skb->dev	= dev;
 		skb->mark	= mark;
 		err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
@@ -2783,7 +2845,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	}
 
 	if (err)
-		goto errout_free;
+		goto errout_rcu;
 
 	if (rtm->rtm_flags & RTM_F_NOTIFY)
 		rt->rt_flags |= RTCF_NOTIFY;
@@ -2791,34 +2853,40 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE)
 		table_id = res.table ? res.table->tb_id : 0;
 
+	/* reset skb for netlink reply msg */
+	skb_trim(skb, 0);
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+	skb_reset_mac_header(skb);
+
 	if (rtm->rtm_flags & RTM_F_FIB_MATCH) {
 		if (!res.fi) {
 			err = fib_props[res.type].error;
 			if (!err)
 				err = -EHOSTUNREACH;
-			goto errout_free;
+			goto errout_rcu;
 		}
 		err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
 				    nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
 				    rt->rt_type, res.prefix, res.prefixlen,
 				    fl4.flowi4_tos, res.fi, 0);
 	} else {
-		err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
+		err = rt_fill_info(net, dst, src, rt, table_id, &fl4, skb,
 				   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq);
 	}
 	if (err < 0)
-		goto errout_free;
+		goto errout_rcu;
 
 	rcu_read_unlock();
 
 	err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
-errout:
-	return err;
 
 errout_free:
+	return err;
+errout_rcu:
 	rcu_read_unlock();
 	kfree_skb(skb);
-	goto errout;
+	goto errout_free;
 }
 
 void ip_rt_multicast_event(struct in_device *in_dev)
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next v5 2/3] ipv6: support sport, dport and ip_proto in RTM_GETROUTE
From: Roopa Prabhu @ 2018-05-22 21:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, nikolay, dsa, idosch, eric.dumazet
In-Reply-To: <1527023009-13609-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This is a followup to fib6 rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 net/ipv6/route.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index bcb8785..038d661 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -63,6 +63,7 @@
 #include <net/lwtunnel.h>
 #include <net/ip_tunnels.h>
 #include <net/l3mdev.h>
+#include <net/ip.h>
 #include <trace/events/fib6.h>
 
 #include <linux/uaccess.h>
@@ -4083,6 +4084,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
 	[RTA_UID]		= { .type = NLA_U32 },
 	[RTA_MARK]		= { .type = NLA_U32 },
 	[RTA_TABLE]		= { .type = NLA_U32 },
+	[RTA_IP_PROTO]		= { .type = NLA_U8 },
+	[RTA_SPORT]		= { .type = NLA_U16 },
+	[RTA_DPORT]		= { .type = NLA_U16 },
 };
 
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -4795,6 +4799,19 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	else
 		fl6.flowi6_uid = iif ? INVALID_UID : current_uid();
 
+	if (tb[RTA_SPORT])
+		fl6.fl6_sport = nla_get_be16(tb[RTA_SPORT]);
+
+	if (tb[RTA_DPORT])
+		fl6.fl6_dport = nla_get_be16(tb[RTA_DPORT]);
+
+	if (tb[RTA_IP_PROTO]) {
+		err = rtm_getroute_parse_ip_proto(tb[RTA_IP_PROTO],
+						  &fl6.flowi6_proto, extack);
+		if (err)
+			goto errout;
+	}
+
 	if (iif) {
 		struct net_device *dev;
 		int flags = 0;
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next v5 3/3] selftests: net: initial fib rule tests
From: Roopa Prabhu @ 2018-05-22 21:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, nikolay, dsa, idosch, eric.dumazet
In-Reply-To: <1527023009-13609-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This adds a first set of tests for fib rule match/action for
ipv4 and ipv6. Initial tests only cover action lookup table.
They can be extended to cover other actions in the future.
Uses ip route get to validate the rule lookup.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 tools/testing/selftests/net/Makefile          |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 248 ++++++++++++++++++++++++++
 2 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/net/fib_rule_tests.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index e60dddb..7cb0f49 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -6,7 +6,7 @@ CFLAGS += -I../../../../usr/include/
 
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh
-TEST_PROGS += udpgso_bench.sh
+TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh b/tools/testing/selftests/net/fib_rule_tests.sh
new file mode 100755
index 0000000..d4cfb6a
--- /dev/null
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -0,0 +1,248 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test is for checking IPv4 and IPv6 FIB rules API
+
+ret=0
+
+PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
+IP="ip -netns testns"
+
+RTABLE=100
+GW_IP4=192.51.100.2
+SRC_IP=192.51.100.3
+GW_IP6=2001:db8:1::2
+SRC_IP6=2001:db8:1::3
+
+DEV_ADDR=192.51.100.1
+DEV=dummy0
+
+log_test()
+{
+	local rc=$1
+	local expected=$2
+	local msg="$3"
+
+	if [ ${rc} -eq ${expected} ]; then
+		nsuccess=$((nsuccess+1))
+		printf "\n    TEST: %-50s  [ OK ]\n" "${msg}"
+	else
+		nfail=$((nfail+1))
+		printf "\n    TEST: %-50s  [FAIL]\n" "${msg}"
+		if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+			echo
+			echo "hit enter to continue, 'q' to quit"
+			read a
+			[ "$a" = "q" ] && exit 1
+		fi
+	fi
+}
+
+log_section()
+{
+	echo
+	echo "######################################################################"
+	echo "TEST SECTION: $*"
+	echo "######################################################################"
+}
+
+setup()
+{
+	set -e
+	ip netns add testns
+	$IP link set dev lo up
+
+	$IP link add dummy0 type dummy
+	$IP link set dev dummy0 up
+	$IP address add 198.51.100.1/24 dev dummy0
+	$IP -6 address add 2001:db8:1::1/64 dev dummy0
+
+	set +e
+}
+
+cleanup()
+{
+	$IP link del dev dummy0 &> /dev/null
+	ip netns del testns
+}
+
+fib_check_iproute_support()
+{
+	ip rule help 2>&1 | grep -q $1
+	if [ $? -ne 0 ]; then
+		echo "SKIP: iproute2 iprule too old, missing $1 match"
+		return 1
+	fi
+
+	ip route get help 2>&1 | grep -q $2
+	if [ $? -ne 0 ]; then
+		echo "SKIP: iproute2 get route too old, missing $2 match"
+		return 1
+	fi
+
+	return 0
+}
+
+fib_rule6_del()
+{
+	$IP -6 rule del $1
+	log_test $? 0 "rule6 del $1"
+}
+
+fib_rule6_del_by_pref()
+{
+	pref=$($IP -6 rule show | grep "$1 lookup $RTABLE" | cut -d ":" -f 1)
+	$IP -6 rule del pref $pref
+}
+
+fib_rule6_test_match_n_redirect()
+{
+	local match="$1"
+	local getmatch="$2"
+
+	$IP -6 rule add $match table $RTABLE
+	$IP -6 route get $GW_IP6 $getmatch | grep -q "table $RTABLE"
+	log_test $? 0 "rule6 check: $1"
+
+	fib_rule6_del_by_pref "$match"
+	log_test $? 0 "rule6 del by pref: $match"
+}
+
+fib_rule6_test()
+{
+	# setup the fib rule redirect route
+	$IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink
+
+	match="oif $DEV"
+	fib_rule6_test_match_n_redirect "$match" "$match" "oif redirect to table"
+
+	match="from $SRC_IP6 iif $DEV"
+	fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to table"
+
+	match="tos 0x10"
+	fib_rule6_test_match_n_redirect "$match" "$match" "tos redirect to table"
+
+	match="fwmark 0x64"
+	getmatch="mark 0x64"
+	fib_rule6_test_match_n_redirect "$match" "$getmatch" "fwmark redirect to table"
+
+	fib_check_iproute_support "uidrange" "uid"
+	if [ $? -eq 0 ]; then
+		match="uidrange 100-100"
+		getmatch="uid 100"
+		fib_rule6_test_match_n_redirect "$match" "$getmatch" "uid redirect to table"
+	fi
+
+	fib_check_iproute_support "sport" "sport"
+	if [ $? -eq 0 ]; then
+		match="sport 666 dport 777"
+		fib_rule6_test_match_n_redirect "$match" "$match" "sport and dport redirect to table"
+	fi
+
+	fib_check_iproute_support "ipproto" "ipproto"
+	if [ $? -eq 0 ]; then
+		match="ipproto tcp"
+		fib_rule6_test_match_n_redirect "$match" "$match" "ipproto match"
+	fi
+
+	fib_check_iproute_support "ipproto" "ipproto"
+	if [ $? -eq 0 ]; then
+		match="ipproto icmp"
+		fib_rule6_test_match_n_redirect "$match" "$match" "ipproto icmp match"
+	fi
+}
+
+fib_rule4_del()
+{
+	$IP rule del $1
+	log_test $? 0 "del $1"
+}
+
+fib_rule4_del_by_pref()
+{
+	pref=$($IP rule show | grep "$1 lookup $RTABLE" | cut -d ":" -f 1)
+	$IP rule del pref $pref
+}
+
+fib_rule4_test_match_n_redirect()
+{
+	local match="$1"
+	local getmatch="$2"
+
+	$IP rule add $match table $RTABLE
+	$IP route get $GW_IP4 $getmatch | grep -q "table $RTABLE"
+	log_test $? 0 "rule4 check: $1"
+
+	fib_rule4_del_by_pref "$match"
+	log_test $? 0 "rule4 del by pref: $match"
+}
+
+fib_rule4_test()
+{
+	# setup the fib rule redirect route
+	$IP route add table $RTABLE default via $GW_IP4 dev $DEV onlink
+
+	match="oif $DEV"
+	fib_rule4_test_match_n_redirect "$match" "$match" "oif redirect to table"
+
+	match="from $SRC_IP iif $DEV"
+	fib_rule4_test_match_n_redirect "$match" "$match" "iif redirect to table"
+
+	match="tos 0x10"
+	fib_rule4_test_match_n_redirect "$match" "$match" "tos redirect to table"
+
+	match="fwmark 0x64"
+	getmatch="mark 0x64"
+	fib_rule4_test_match_n_redirect "$match" "$getmatch" "fwmark redirect to table"
+
+	fib_check_iproute_support "uidrange" "uid"
+	if [ $? -eq 0 ]; then
+		match="uidrange 100-100"
+		getmatch="uid 100"
+		fib_rule4_test_match_n_redirect "$match" "$getmatch" "uid redirect to table"
+	fi
+
+	fib_check_iproute_support "sport" "sport"
+	if [ $? -eq 0 ]; then
+		match="sport 666 dport 777"
+		fib_rule4_test_match_n_redirect "$match" "$match" "sport and dport redirect to table"
+	fi
+
+	fib_check_iproute_support "ipproto" "ipproto"
+	if [ $? -eq 0 ]; then
+		match="ipproto tcp"
+		fib_rule4_test_match_n_redirect "$match" "$match" "ipproto tcp match"
+	fi
+
+	fib_check_iproute_support "ipproto" "ipproto"
+	if [ $? -eq 0 ]; then
+		match="ipproto icmp"
+		fib_rule4_test_match_n_redirect "$match" "$match" "ipproto icmp match"
+	fi
+}
+
+run_fibrule_tests()
+{
+	log_section "IPv4 fib rule"
+	fib_rule4_test
+	log_section "IPv6 fib rule"
+	fib_rule6_test
+}
+
+if [ "$(id -u)" -ne 0 ];then
+	echo "SKIP: Need root privileges"
+	exit 0
+fi
+
+if [ ! -x "$(command -v ip)" ]; then
+	echo "SKIP: Could not run test without ip tool"
+	exit 0
+fi
+
+# start clean
+cleanup &> /dev/null
+setup
+run_fibrule_tests
+cleanup
+
+exit $ret
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH v2] ath10k: transmit queued frames after waking queues
From: Niklas Cassel @ 2018-05-22 21:15 UTC (permalink / raw)
  To: Rajkumar Manoharan
  Cc: Kalle Valo, David S. Miller, ath10k, linux-wireless, netdev,
	linux-kernel, linux-wireless-owner, erik.stromdahl
In-Reply-To: <8195be7603a8cd659d25a9c3d898b891@codeaurora.org>

On Mon, May 21, 2018 at 04:11:38PM -0700, Rajkumar Manoharan wrote:
> On 2018-05-21 13:43, Niklas Cassel wrote:
> > The following problem was observed when running iperf:
> [...]
> > 
> > In order to avoid trying to flush the queue every time we free a frame,
> > only do this when there are 3 or less frames pending, and while we
> > actually have frames in the queue. This logic was copied from
> > mt76_txq_schedule (mt76), one of few other drivers that are actually
> > using wake_tx_queue.
> > 
> > Suggested-by: Toke Høiland-Jørgensen <toke@toke.dk>
> > Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
> > ---
> > Changes since V1: use READ_ONCE() to disallow the compiler
> > optimizing things in undesirable ways.
> > 
> >  drivers/net/wireless/ath/ath10k/txrx.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/net/wireless/ath/ath10k/txrx.c
> > b/drivers/net/wireless/ath/ath10k/txrx.c
> > index cda164f6e9f6..264cf0bd5c00 100644
> > --- a/drivers/net/wireless/ath/ath10k/txrx.c
> > +++ b/drivers/net/wireless/ath/ath10k/txrx.c
> > @@ -95,6 +95,9 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
> >  		wake_up(&htt->empty_tx_wq);
> >  	spin_unlock_bh(&htt->tx_lock);
> > 
> > +	if (READ_ONCE(htt->num_pending_tx) <= 3 && !list_empty(&ar->txqs))
> > +		ath10k_mac_tx_push_pending(ar);
> > +
> Niklas,

Hello Rajkumar

> 
> Sorry for the late response. ath10k_mac_tx_push_pending is already called
> at the end of NAPI handler. Isn't that enough to process pending frames?

This is true for e.g. ATH10K_BUS_PCI and ATH10K_BUS_SNOC,
but not for e.g. ATH10K_BUS_SDIO and ATH10K_BUS_USB.

While there is some SDIO code merged in Kalle's tree already,
this problem was found when merging
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/?h=ath10k-pending-sdio-usb
with Kalle's ath-next branch.

> 
> Earlier we observed performance issues in calling push_pending from each
> tx completion. IMHO this change may introduce the same problem again.

I prefer functional TX over performance issues,
but I agree that it is unfortunate that SDIO doesn't use
ath10k_htt_txrx_compl_task().
Erik, is there a reason for this?

Perhaps it would be possible to call ath10k_mac_tx_push_pending()
from the equivalent to ath10k_htt_txrx_compl_task(),
but from SDIO's point of view.

Another solution might be to change so that we only call
ath10k_mac_tx_push_pending() from ath10k_txrx_tx_unref()
if (htt->num_pending_tx == 0). That should decrease the number
of calls to ath10k_mac_tx_push_pending(), while still avoiding
a "TX deadlock" scenario for SDIO.
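
To make the difference between the two conditions concrete, here is a small userspace model. It is only a sketch under simplifying assumptions: the function names are hypothetical, each completion frees exactly one frame, and a push is assumed not to queue new frames while the counter drains.

```c
#include <stdbool.h>

/* Condition from the v2 patch: push when 3 or fewer frames are pending
 * and there is still something queued.
 */
static bool sketch_push_nearly_drained(unsigned int num_pending_tx,
				       bool txqs_nonempty)
{
	return num_pending_tx <= 3 && txqs_nonempty;
}

/* Alternative floated above: push only once nothing is pending. */
static bool sketch_push_fully_drained(unsigned int num_pending_tx)
{
	return num_pending_tx == 0;
}

/* Count how many pushes happen while start_pending frames complete
 * one by one.
 */
static unsigned int sketch_count_pushes(unsigned int start_pending,
					bool txqs_nonempty,
					bool drained_only)
{
	unsigned int pending = start_pending;
	unsigned int calls = 0;

	while (pending > 0) {
		pending--;	/* one tx completion frees one frame */
		if (drained_only ? sketch_push_fully_drained(pending)
				 : sketch_push_nearly_drained(pending,
							      txqs_nonempty))
			calls++;
	}
	return calls;
}
```

Under this model, draining ten pending frames triggers four pushes with the v2 condition and one push with the "== 0" condition, which is the reduction in calls the alternative is aiming for.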

Regards,
Niklas

^ permalink raw reply

* [PATCH 1/2] rfkill: Rename rfkill_any_led_trigger* functions
From: João Paulo Rechi Vita @ 2018-05-22 21:29 UTC (permalink / raw)
  To: Johannes Berg, David S. Miller
  Cc: linux, Michał Kępień, João Paulo Rechi Vita,
	linux-wireless, netdev, linux-kernel

Rename these functions to rfkill_global_led_trigger*, as they are going
to be extended to handle another global rfkill led trigger.

This commit does not change any functionality.

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
---
 net/rfkill/core.c | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 59d0eb960275..6d64d14f4b0a 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -178,9 +178,9 @@ static void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 }
 
 static struct led_trigger rfkill_any_led_trigger;
-static struct work_struct rfkill_any_work;
+static struct work_struct rfkill_global_led_trigger_work;
 
-static void rfkill_any_led_trigger_worker(struct work_struct *work)
+static void rfkill_global_led_trigger_worker(struct work_struct *work)
 {
 	enum led_brightness brightness = LED_OFF;
 	struct rfkill *rfkill;
@@ -197,28 +197,29 @@ static void rfkill_any_led_trigger_worker(struct work_struct *work)
 	led_trigger_event(&rfkill_any_led_trigger, brightness);
 }
 
-static void rfkill_any_led_trigger_event(void)
+static void rfkill_global_led_trigger_event(void)
 {
-	schedule_work(&rfkill_any_work);
+	schedule_work(&rfkill_global_led_trigger_work);
 }
 
-static void rfkill_any_led_trigger_activate(struct led_classdev *led_cdev)
+static void rfkill_global_led_trigger_activate(struct led_classdev *led_cdev)
 {
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 }
 
-static int rfkill_any_led_trigger_register(void)
+static int rfkill_global_led_trigger_register(void)
 {
-	INIT_WORK(&rfkill_any_work, rfkill_any_led_trigger_worker);
+	INIT_WORK(&rfkill_global_led_trigger_work,
+			rfkill_global_led_trigger_worker);
 	rfkill_any_led_trigger.name = "rfkill-any";
-	rfkill_any_led_trigger.activate = rfkill_any_led_trigger_activate;
+	rfkill_any_led_trigger.activate = rfkill_global_led_trigger_activate;
 	return led_trigger_register(&rfkill_any_led_trigger);
 }
 
-static void rfkill_any_led_trigger_unregister(void)
+static void rfkill_global_led_trigger_unregister(void)
 {
 	led_trigger_unregister(&rfkill_any_led_trigger);
-	cancel_work_sync(&rfkill_any_work);
+	cancel_work_sync(&rfkill_global_led_trigger_work);
 }
 #else
 static void rfkill_led_trigger_event(struct rfkill *rfkill)
@@ -234,16 +235,16 @@ static inline void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 {
 }
 
-static void rfkill_any_led_trigger_event(void)
+static void rfkill_global_led_trigger_event(void)
 {
 }
 
-static int rfkill_any_led_trigger_register(void)
+static int rfkill_global_led_trigger_register(void)
 {
 	return 0;
 }
 
-static void rfkill_any_led_trigger_unregister(void)
+static void rfkill_global_led_trigger_unregister(void)
 {
 }
 #endif /* CONFIG_RFKILL_LEDS */
@@ -354,7 +355,7 @@ static void rfkill_set_block(struct rfkill *rfkill, bool blocked)
 	spin_unlock_irqrestore(&rfkill->lock, flags);
 
 	rfkill_led_trigger_event(rfkill);
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 
 	if (prev != curr)
 		rfkill_event(rfkill);
@@ -535,7 +536,7 @@ bool rfkill_set_hw_state(struct rfkill *rfkill, bool blocked)
 	spin_unlock_irqrestore(&rfkill->lock, flags);
 
 	rfkill_led_trigger_event(rfkill);
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 
 	if (rfkill->registered && prev != blocked)
 		schedule_work(&rfkill->uevent_work);
@@ -579,7 +580,7 @@ bool rfkill_set_sw_state(struct rfkill *rfkill, bool blocked)
 		schedule_work(&rfkill->uevent_work);
 
 	rfkill_led_trigger_event(rfkill);
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 
 	return blocked;
 }
@@ -629,7 +630,7 @@ void rfkill_set_states(struct rfkill *rfkill, bool sw, bool hw)
 			schedule_work(&rfkill->uevent_work);
 
 		rfkill_led_trigger_event(rfkill);
-		rfkill_any_led_trigger_event();
+		rfkill_global_led_trigger_event();
 	}
 }
 EXPORT_SYMBOL(rfkill_set_states);
@@ -1046,7 +1047,7 @@ int __must_check rfkill_register(struct rfkill *rfkill)
 #endif
 	}
 
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 	rfkill_send_events(rfkill, RFKILL_OP_ADD);
 
 	mutex_unlock(&rfkill_global_mutex);
@@ -1079,7 +1080,7 @@ void rfkill_unregister(struct rfkill *rfkill)
 	mutex_lock(&rfkill_global_mutex);
 	rfkill_send_events(rfkill, RFKILL_OP_DEL);
 	list_del_init(&rfkill->node);
-	rfkill_any_led_trigger_event();
+	rfkill_global_led_trigger_event();
 	mutex_unlock(&rfkill_global_mutex);
 
 	rfkill_led_trigger_unregister(rfkill);
@@ -1332,7 +1333,7 @@ static int __init rfkill_init(void)
 	if (error)
 		goto error_misc;
 
-	error = rfkill_any_led_trigger_register();
+	error = rfkill_global_led_trigger_register();
 	if (error)
 		goto error_led_trigger;
 
@@ -1346,7 +1347,7 @@ static int __init rfkill_init(void)
 
 #ifdef CONFIG_RFKILL_INPUT
 error_input:
-	rfkill_any_led_trigger_unregister();
+	rfkill_global_led_trigger_unregister();
 #endif
 error_led_trigger:
 	misc_deregister(&rfkill_miscdev);
@@ -1362,7 +1363,7 @@ static void __exit rfkill_exit(void)
 #ifdef CONFIG_RFKILL_INPUT
 	rfkill_handler_exit();
 #endif
-	rfkill_any_led_trigger_unregister();
+	rfkill_global_led_trigger_unregister();
 	misc_deregister(&rfkill_miscdev);
 	class_unregister(&rfkill_class);
 }
-- 
2.17.0

^ permalink raw reply related

* [PATCH 2/2] rfkill: Create rfkill-none LED trigger
From: João Paulo Rechi Vita @ 2018-05-22 21:29 UTC (permalink / raw)
  To: Johannes Berg, David S. Miller
  Cc: linux, Michał Kępień, João Paulo Rechi Vita,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <20180522212932.5357-1-jprvita@endlessm.com>

Create a new rfkill-none LED trigger as a complement to rfkill-any, which
drives LEDs when any radio is enabled. The new trigger turns an LED on
whenever all radios are off, and turns it off otherwise.

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
---
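Note: the relation between the two triggers described above can be
sketched in plain userspace C. This is only an illustration, not the
kernel code: struct radio, struct global_leds and compute_global_leds()
are hypothetical names, and the loop over the radios is an assumption
about the part of the worker that this diff does not show. Only the
final inversion mirrors the hunk below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's LED_OFF / LED_FULL brightness values. */
enum led_brightness { LED_OFF = 0, LED_FULL = 255 };

/* Hypothetical reduced model of a radio: only its blocked state. */
struct radio {
	bool blocked;
};

/* Hypothetical container for the two global trigger outputs. */
struct global_leds {
	enum led_brightness any;	/* models rfkill-any */
	enum led_brightness none;	/* models rfkill-none */
};

static struct global_leds compute_global_leds(const struct radio *radios,
					      size_t count)
{
	enum led_brightness brightness = LED_OFF;
	struct global_leds leds;
	size_t i;

	/* Assumed scan: one unblocked radio is enough to light rfkill-any. */
	for (i = 0; i < count; i++) {
		if (!radios[i].blocked) {
			brightness = LED_FULL;
			break;
		}
	}

	leds.any = brightness;
	/* Same inversion as in the patch: rfkill-none is the complement. */
	leds.none = brightness == LED_OFF ? LED_FULL : LED_OFF;
	return leds;
}
```

With no radios registered at all, this model leaves rfkill-any off and
rfkill-none on, since no radio is enabled.
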
 net/rfkill/core.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 6d64d14f4b0a..07235520b00f 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -178,6 +178,7 @@ static void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 }
 
 static struct led_trigger rfkill_any_led_trigger;
+static struct led_trigger rfkill_none_led_trigger;
 static struct work_struct rfkill_global_led_trigger_work;
 
 static void rfkill_global_led_trigger_worker(struct work_struct *work)
@@ -195,6 +196,8 @@ static void rfkill_global_led_trigger_worker(struct work_struct *work)
 	mutex_unlock(&rfkill_global_mutex);
 
 	led_trigger_event(&rfkill_any_led_trigger, brightness);
+	led_trigger_event(&rfkill_none_led_trigger, brightness == LED_OFF ?
+							LED_FULL : LED_OFF);
 }
 
 static void rfkill_global_led_trigger_event(void)
@@ -202,22 +205,32 @@ static void rfkill_global_led_trigger_event(void)
 	schedule_work(&rfkill_global_led_trigger_work);
 }
 
-static void rfkill_global_led_trigger_activate(struct led_classdev *led_cdev)
-{
-	rfkill_global_led_trigger_event();
-}
-
 static int rfkill_global_led_trigger_register(void)
 {
+	int ret;
+
 	INIT_WORK(&rfkill_global_led_trigger_work,
 			rfkill_global_led_trigger_worker);
+
 	rfkill_any_led_trigger.name = "rfkill-any";
-	rfkill_any_led_trigger.activate = rfkill_global_led_trigger_activate;
-	return led_trigger_register(&rfkill_any_led_trigger);
+	ret = led_trigger_register(&rfkill_any_led_trigger);
+	if (ret)
+		return ret;
+
+	rfkill_none_led_trigger.name = "rfkill-none";
+	ret = led_trigger_register(&rfkill_none_led_trigger);
+	if (ret)
+		led_trigger_unregister(&rfkill_any_led_trigger);
+	else
+		/* Delay activation until all global triggers are registered */
+		rfkill_global_led_trigger_event();
+
+	return ret;
 }
 
 static void rfkill_global_led_trigger_unregister(void)
 {
+	led_trigger_unregister(&rfkill_none_led_trigger);
 	led_trigger_unregister(&rfkill_any_led_trigger);
 	cancel_work_sync(&rfkill_global_led_trigger_work);
 }
-- 
2.17.0

^ permalink raw reply related

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jason Gunthorpe @ 2018-05-22 21:33 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, dledford, Sindhu Devale, netdev, linux-rdma, nhorman,
	sassmann, jogreene, Shiraz Saleem
In-Reply-To: <079ceee3bc8cd0ea50dd7ddc12b27512ca5ac49e.camel@intel.com>

On Tue, May 22, 2018 at 02:04:06PM -0700, Jeff Kirsher wrote:
> On Tue, 2018-05-22 at 14:56 -0600, Jason Gunthorpe wrote:
> > On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> > > From: Sindhu Devale <sindhu.devale@intel.com>
> > > 
> > > Currently i40iw is dependent on i40e symbols
> > > i40e_register_client and i40e_unregister_client due to
> > > which i40iw cannot be loaded without i40e being loaded.
> > > 
> > > This patch allows RDMA driver to build and load without
> > > linking to LAN driver and without LAN driver being loaded
> > > first. Once the LAN driver is loaded, the RDMA driver
> > > is notified through the netdevice notifiers to register
> > > as client to the LAN driver. Add function pointers to IDC
> > > register/unregister in the private VSI structure. This
> > > allows a RDMA driver to build without linking to i40e.
> > 
> > Why would you want to do this? The rdma driver is non-functional
> > without the ethernet driver, so why on earth would we want to defeat
> > the module dependency mechanism?
> 
> This change is driven by OSVs like Red Hat, where customers were
> updating the i40e driver, which in turn broke i40iw.

So that isn't a reason to put something into the mainline kernel, and
I'm deeply skeptical this change is even sane.

It has been a while since I've looked at RH's kernel, but AFAIK,
breakage should only happen if the ABIs around
i40e_unregister_client/etc change..

So if an i40e module update breaks the ABI, then stuffing that same
broken ABI through a function pointer is *totally* wrong.

Looks like a NAK for the RDMA side.

Jason
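
One way to read the function-pointer objection above, as a userspace
sketch only: on kernels built with CONFIG_MODVERSIONS, a symbol that a
module links against is checked against a CRC of its prototype at load
time, whereas a pointer handed over at run time is not checked at all.
Everything named here (struct exported_sym, resolve_checked(),
resolve_unchecked(), lan_register_client()) is hypothetical and does not
come from i40e or i40iw.

```c
#include <stddef.h>

typedef int (*register_fn)(void *client);

/* Hypothetical model of an exported symbol and its prototype checksum. */
struct exported_sym {
	const char *name;
	unsigned int crc;	/* changes when the prototype/ABI changes */
	register_fn fn;
};

/* Hypothetical provider-side function; it only stands in for a callee. */
static int lan_register_client(void *client)
{
	(void)client;
	return 0;
}

/*
 * Models direct linkage: the CRC recorded when the user was built must
 * match the provider's, otherwise resolution is refused.
 */
static int resolve_checked(const struct exported_sym *sym,
			   unsigned int crc_at_build, register_fn *out)
{
	if (sym->crc != crc_at_build)
		return -1;	/* ABI mismatch caught before any call */
	*out = sym->fn;
	return 0;
}

/*
 * Models the function-pointer hand-off: nothing is compared, so a
 * changed ABI goes unnoticed until the call misbehaves.
 */
static register_fn resolve_unchecked(const struct exported_sym *sym)
{
	return sym->fn;
}
```

In this model the unchecked path still returns a callable pointer after
the provider's CRC has changed, which is the case the reply objects to.
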

^ permalink raw reply
