Netdev List

* Re: [PATCH iproute2 0/3] bridge: filtering by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 15:30 UTC (permalink / raw)
  To: netdev; +Cc: stephen, roopa
In-Reply-To: <1460380710-29583-1-git-send-email-nikolay@cumulusnetworks.com>

On 04/11/2016 03:18 PM, Nikolay Aleksandrov wrote:
> Hi,
> This set adds support for filtering by a vlan id when showing fdb/mdb/vlan
> entries. Currently the filtering is implemented entirely in user-space, but
> the plan is to add kernel support as well. The vlan show part is also needed
> for the future per-vlan statistics in order to be able to show them only for
> a specific vlan. I plan to update the bridge man page soon as it's missing
> other options too and it seemed inconsistent to add this given that there're
> potential paragraphs missing, thus I'll post a separate patch for that.
> 
> Thank you,
>  Nik
> 

Self-NAK, after discussing with colleagues, we think it'd be better not to print
the non-matching ports at all (right now they're printed with empty "vlan ids"
column). I'll post a v2 with updated patch 03.

Cheers,
 Nik
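
A minimal sketch of the behaviour proposed for v2, not the actual v2 patch:
the port name is printed only once a vlan on that port matches the filter, so
non-matching ports produce no output at all. The names `struct port` and
`show_port` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified model of a bridge port and its vlan list. */
struct port {
	const char *name;
	const unsigned short *vids;
	size_t n_vids;
};

/* Print the port and its matching vlans; return the number of vlans printed.
 * filter == 0 means "no filter", in which case the port is always shown.
 */
int show_port(FILE *f, const struct port *p, unsigned short filter)
{
	bool name_printed = false;
	int printed = 0;
	size_t i;

	if (!filter) {
		fprintf(f, "%s", p->name);
		name_printed = true;
	}
	for (i = 0; i < p->n_vids; i++) {
		if (filter && p->vids[i] != filter)
			continue;
		if (!name_printed) {
			/* the name is deferred until the first match */
			fprintf(f, "%s", p->name);
			name_printed = true;
		}
		fprintf(f, "\t %hu\n", p->vids[i]);
		printed++;
	}
	if (name_printed)
		fprintf(f, "\n");
	return printed;
}
```

With a filter set, a port that has no matching vlan returns 0 and writes
nothing, which is the difference from the output shown in patch 03.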

^ permalink raw reply

* [PATCH] mwifiex: fix possible NULL dereference
From: Sudip Mukherjee @ 2016-04-11 15:27 UTC (permalink / raw)
  To: Amitkumar Karwar, Nishant Sarmukadam, Kalle Valo
  Cc: linux-kernel, linux-wireless, netdev, Sudip Mukherjee

From: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>

The check for card comes just after card has been dereferenced, so if it
is NULL we have already dereferenced it before the check. Let's
dereference it only after checking card for NULL.

Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
---
 drivers/net/wireless/marvell/mwifiex/pcie.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index edf8b07..84562d0 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -2884,10 +2884,11 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
 {
 	struct pcie_service_card *card = adapter->card;
 	const struct mwifiex_pcie_card_reg *reg;
-	struct pci_dev *pdev = card->dev;
+	struct pci_dev *pdev;
 	int i;
 
 	if (card) {
+		pdev = card->dev;
 		if (card->msix_enable) {
 			for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++)
 				synchronize_irq(card->msix_entries[i].vector);
-- 
1.9.1
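
The ordering issue can be shown in isolation. This is a userspace sketch with
hypothetical stand-in structures (`adapter_model`, `card_model`,
`pci_dev_model`), not the driver code: the pointer is declared without an
initializer and only read once the NULL check has passed.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct pci_dev_model { int irq; };
struct card_model { struct pci_dev_model *dev; };
struct adapter_model { struct card_model *card; };

/* Returns the device's irq, or -1 if the adapter has no card.
 * card->dev is read only after card has been checked for NULL.
 */
int unregister_irq(const struct adapter_model *adapter)
{
	struct card_model *card = adapter->card;
	struct pci_dev_model *pdev;	/* no initializer: card may be NULL */

	if (!card)
		return -1;
	pdev = card->dev;
	return pdev->irq;
}
```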

^ permalink raw reply related

* Re: [PATCH 1/9] net: mediatek: update the IRQ part of the binding document
From: Rob Herring @ 2016-04-11 15:24 UTC (permalink / raw)
  To: John Crispin
  Cc: David S. Miller, Felix Fietkau, Matthias Brugger,
	Sean Wang (王志亘), netdev, linux-mediatek,
	linux-kernel, devicetree
In-Reply-To: <1460051876-53135-1-git-send-email-blogic@openwrt.org>

On Thu, Apr 07, 2016 at 07:57:48PM +0200, John Crispin wrote:
> The current binding document only describes a single interrupt. Update the
> document by adding the 2 other interrupts.
> 
> The driver currently only uses a single interrupt. The HW is however able
> to use IRQ grouping to split TX and RX onto separate GIC irqs.

I assume you aren't breaking existing DTs, and the driver will continue 
to work with a single irq specified?

> 
> Signed-off-by: John Crispin <blogic@openwrt.org>
> Cc: devicetree@vger.kernel.org
> ---
>  Documentation/devicetree/bindings/net/mediatek-net.txt |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt
> index 5ca7929..2f142be 100644
> --- a/Documentation/devicetree/bindings/net/mediatek-net.txt
> +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
> @@ -9,7 +9,7 @@ have dual GMAC each represented by a child node..
>  Required properties:
>  - compatible: Should be "mediatek,mt7623-eth"
>  - reg: Address and length of the register set for the device
> -- interrupts: Should contain the frame engines interrupt
> +- interrupts: Should contain the three frame engines interrupts

Need to define what each irq is and the order.

>  - clocks: the clock used by the core
>  - clock-names: the names of the clock listed in the clocks property. These are
>  	"ethif", "esw", "gp2", "gp1"
> @@ -42,7 +42,9 @@ eth: ethernet@1b100000 {
>  		 <&ethsys CLK_ETHSYS_GP2>,
>  		 <&ethsys CLK_ETHSYS_GP1>;
>  	clock-names = "ethif", "esw", "gp2", "gp1";
> -	interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW>;
> +	interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW
> +		      GIC_SPI 199 IRQ_TYPE_LEVEL_LOW
> +		      GIC_SPI 198 IRQ_TYPE_LEVEL_LOW>;
>  	power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>;
>  	resets = <&ethsys MT2701_ETHSYS_ETH_RST>;
>  	reset-names = "eth";
> -- 
> 1.7.10.4

^ permalink raw reply

* Re: [PATCH net v2] net: sched: do not requeue a NULL skb
From: Lars Persson @ 2016-04-11 15:17 UTC (permalink / raw)
  To: Eric Dumazet, Lars Persson; +Cc: netdev, jhs, linux-kernel, xiyou.wangcong
In-Reply-To: <1460384551.6473.551.camel@edumazet-glaptop3.roam.corp.google.com>



On 04/11/2016 04:22 PM, Eric Dumazet wrote:
> On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote:
>
>> I thought it would be prudent because the queue can be non-empty even for
>> the case of skb=NULL. So should it be there in this patch, another patch
>> or not at all ?
>
> Then maybe change return code ?
>
> It seems strange that a validate_xmit_skb_list() failure stops the
> __qdisc_run() loop but schedules another round.
>
>

It was suggested by Cong Wang to return 0 in order to stop the loop. Do 
you guys agree that the loop should be stopped for such failures ? Then 
I will put the schedule call inside the if as you proposed earlier.

- Lars

^ permalink raw reply

* [PATCH net-next] vxlan: fix incorrect type
From: Jiri Benc @ 2016-04-11 15:06 UTC (permalink / raw)
  To: netdev; +Cc: Dan Carpenter

The protocol is 16bit, not 32bit.

Fixes: e1e5314de08ba ("vxlan: implement GPE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 drivers/net/vxlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9f3634064c92..7f697a3f00a4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1181,7 +1181,7 @@ out:
 }
 
 static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
-				__be32 *protocol,
+				__be16 *protocol,
 				struct sk_buff *skb, u32 vxflags)
 {
 	struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)unparsed;
@@ -1284,7 +1284,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
 	struct vxlanhdr unparsed;
 	struct vxlan_metadata _md;
 	struct vxlan_metadata *md = &_md;
-	__be32 protocol = htons(ETH_P_TEB);
+	__be16 protocol = htons(ETH_P_TEB);
 	bool raw_proto = false;
 	void *oiph;
 
-- 
1.8.3.1
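
For illustration, a userspace sketch of the width involved. The `be16` typedef
is an assumed stand-in for the kernel's `__be16`, and `set_default_protocol`
is a hypothetical helper: htons() yields a 16-bit value, so the variable it is
stored in and the pointer it is passed through should both be 16 bits wide.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define ETH_P_TEB 0x6558	/* Trans Ether Bridging ethertype */

/* Userspace stand-in for the kernel's __be16: 16 bits, network byte order. */
typedef uint16_t be16;

/* Hypothetical helper: the protocol is written through a 16-bit pointer,
 * matching the width of the value htons() produces.
 */
void set_default_protocol(be16 *protocol)
{
	*protocol = htons(ETH_P_TEB);
}
```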

^ permalink raw reply related

* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Yang Yingliang @ 2016-04-11 14:42 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, dingtianhong
In-Reply-To: <1460135072.6473.441.camel@edumazet-glaptop3.roam.corp.google.com>



On 2016/4/9 1:04, Eric Dumazet wrote:
> On Fri, 2016-04-08 at 12:53 -0400, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Fri, 08 Apr 2016 07:44:25 -0700
>>
>>> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
>>>
>>>> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
>>>
>>> Try :
>>>
>>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>>
>>> And restart your flows.
>>
>> I'm honestly beginning to suspect a bug in their driver and how they
>> handle skb->truesize.
>>
>> Yang, until you show us the driver you are using and how it handles
>> receive packets, we are largely in the dark about a major component
>> of this issue and that is entirely unfair to us.
>
> Apparently their skb->truesize and skb->len combinations are correct.
>
> I suspect an issue with rcvbuf autotuning on bidirectional TCP traffic.
> We mostly focus on unidirectional flows, but they seem to use a mixed
> case.
>
> Also, the fact that sendmsg() locks the socket for the duration of the call
> is problematic : I suspect their issues would mostly disappear by using
> smaller chunk sizes (ie 64KB per sendmsg() instead of 256KB).
Fewer packets are dropped when using 64KB chunks.

>
> We also could add resched points in sendmsg() (processing backlog if it
> gets too hot), but I fear this would slow down the fast path.
>
>
>
>
>

^ permalink raw reply

* Re: [PATCH net v2] net: sched: do not requeue a NULL skb
From: Eric Dumazet @ 2016-04-11 14:22 UTC (permalink / raw)
  To: Lars Persson; +Cc: Lars Persson, netdev, jhs, linux-kernel, xiyou.wangcong
In-Reply-To: <570BA8C7.1000905@axis.com>

On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote:

> I thought it would be prudent because the queue can be non-empty even for
> the case of skb=NULL. So should it be there in this patch, another patch 
> or not at all ?

Then maybe change return code ?

It seems strange that a validate_xmit_skb_list() failure stops the
__qdisc_run() loop but schedules another round.

^ permalink raw reply

* Re: [PATCH net v2] net: sched: do not requeue a NULL skb
From: Lars Persson @ 2016-04-11 13:38 UTC (permalink / raw)
  To: Eric Dumazet, Lars Persson; +Cc: netdev, jhs, linux-kernel, xiyou.wangcong
In-Reply-To: <1460380981.6473.544.camel@edumazet-glaptop3.roam.corp.google.com>



On 04/11/2016 03:23 PM, Eric Dumazet wrote:
> On Mon, 2016-04-11 at 08:24 +0200, Lars Persson wrote:
>> A failure in validate_xmit_skb_list() triggered an unconditional call
>> to dev_requeue_skb with skb=NULL. This slowly grows the queue
>> discipline's qlen count until all traffic through the queue stops.
>>
>> By introducing a NULL check in dev_requeue_skb it was also necessary
>> to make the __netif_schedule call conditional to avoid scheduling an
>> empty queue.
>>
>> Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
>> Signed-off-by: Lars Persson <larper@axis.com>
>> ---
>>   net/sched/sch_generic.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index f18c350..4e6a79c 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -47,10 +47,13 @@ EXPORT_SYMBOL(default_qdisc_ops);
>>
>>   static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>>   {
>> -	q->gso_skb = skb;
>> -	q->qstats.requeues++;
>> -	q->q.qlen++;	/* it's still part of the queue */
>> -	__netif_schedule(q);
>> +	if (skb) {
>> +		q->gso_skb = skb;
>> +		q->qstats.requeues++;
>> +		q->q.qlen++;	/* it's still part of the queue */
>> +	}
>> +	if (qdisc_qlen(q))
>> +		__netif_schedule(q);
>>
>>   	return 0;
>>   }
>
>
> Please always CC patch author when fixing a bug.
>
> Why add the extra if (qdisc_qlen(q)) test?
>
> This seems unrelated to the bug fix, and probably should be part of a
> second patch targeting net-next tree.

I thought it would be prudent because the queue can be non-empty even for
the case of skb=NULL. So should it be there in this patch, another patch 
or not at all ?

>
> Also please add a likely() clause
>
> if (likely(skb)) {
>          q->gso_skb = skb;
>          q->qstats.requeues++;
>          q->q.qlen++;    /* it's still part of the queue */
>          __netif_schedule(q);
> }

Will fix.

> Thanks !
>
>
>
>
>

^ permalink raw reply

* Re: [PATCH net v2] net: sched: do not requeue a NULL skb
From: Eric Dumazet @ 2016-04-11 13:23 UTC (permalink / raw)
  To: Lars Persson; +Cc: netdev, jhs, linux-kernel, xiyou.wangcong, Lars Persson
In-Reply-To: <1460355869-13539-1-git-send-email-larper@axis.com>

On Mon, 2016-04-11 at 08:24 +0200, Lars Persson wrote:
> A failure in validate_xmit_skb_list() triggered an unconditional call
> to dev_requeue_skb with skb=NULL. This slowly grows the queue
> discipline's qlen count until all traffic through the queue stops.
> 
> By introducing a NULL check in dev_requeue_skb it was also necessary
> to make the __netif_schedule call conditional to avoid scheduling an
> empty queue.
> 
> Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
> Signed-off-by: Lars Persson <larper@axis.com>
> ---
>  net/sched/sch_generic.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index f18c350..4e6a79c 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -47,10 +47,13 @@ EXPORT_SYMBOL(default_qdisc_ops);
>  
>  static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>  {
> -	q->gso_skb = skb;
> -	q->qstats.requeues++;
> -	q->q.qlen++;	/* it's still part of the queue */
> -	__netif_schedule(q);
> +	if (skb) {
> +		q->gso_skb = skb;
> +		q->qstats.requeues++;
> +		q->q.qlen++;	/* it's still part of the queue */
> +	}
> +	if (qdisc_qlen(q))
> +		__netif_schedule(q);
>  
>  	return 0;
>  }


Please always CC patch author when fixing a bug.

Why add the extra if (qdisc_qlen(q)) test?

This seems unrelated to the bug fix, and probably should be part of a
second patch targeting net-next tree.

Also please add a likely() clause

if (likely(skb)) {
        q->gso_skb = skb;
        q->qstats.requeues++;
        q->q.qlen++;    /* it's still part of the queue */
        __netif_schedule(q);
}

Thanks !
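
The effect of the suggested form can be modelled in userspace. This is a
sketch only: `struct qdisc_model` and `requeue_model` are hypothetical, the
`scheduled` counter stands in for the __netif_schedule() call, and likely()
is defined here via the GCC/Clang builtin.

```c
#include <stddef.h>

#define likely(x) __builtin_expect(!!(x), 1)	/* GCC/Clang builtin */

/* Hypothetical userspace model of the Qdisc fields touched here. */
struct qdisc_model {
	void *gso_skb;
	unsigned int requeues;
	unsigned int qlen;
	unsigned int scheduled;	/* counts would-be __netif_schedule() calls */
};

/* Requeue only a real skb, so a failed validation (skb == NULL)
 * no longer inflates qlen or schedules the queue.
 */
int requeue_model(void *skb, struct qdisc_model *q)
{
	if (likely(skb)) {
		q->gso_skb = skb;
		q->requeues++;
		q->qlen++;	/* it's still part of the queue */
		q->scheduled++;
	}
	return 0;
}
```

A NULL skb leaves every counter untouched, which is what stops the slow qlen
growth described in the commit message.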

^ permalink raw reply

* [PATCH iproute2 3/3] bridge: vlan: add support to filter by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 13:18 UTC (permalink / raw)
  To: netdev; +Cc: stephen, roopa, Nikolay Aleksandrov
In-Reply-To: <1460380710-29583-1-git-send-email-nikolay@cumulusnetworks.com>

Add the optional keyword "vid" to bridge vlan show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to
match the add/del one - "vid". This filtering can also be used with the
"-compressvlans" option to see which range a vlan falls in (if any).
Also this will be used to show only specific per-vlan statistics later
when support is added to the kernel for it.

Examples:
$ bridge vlan show vid 450
port	vlan ids
eth1
eth2	 450

br0

$ bridge -c vlan show vid 450
port	vlan ids
eth1
eth2	 400-500

br0

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 bridge/vlan.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 51 insertions(+), 12 deletions(-)

diff --git a/bridge/vlan.c b/bridge/vlan.c
index ae588323d9b1..8e125c15f84c 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -13,13 +13,13 @@
 #include "br_common.h"
 #include "utils.h"
 
-static unsigned int filter_index;
+static unsigned int filter_index, filter_vlan;
 
 static void usage(void)
 {
 	fprintf(stderr, "Usage: bridge vlan { add | del } vid VLAN_ID dev DEV [ pvid] [ untagged ]\n");
 	fprintf(stderr, "                                                     [ self ] [ master ]\n");
-	fprintf(stderr, "       bridge vlan { show } [ dev DEV ]\n");
+	fprintf(stderr, "       bridge vlan { show } [ dev DEV ] [ vid VLAN_ID ]\n");
 	exit(-1);
 }
 
@@ -138,6 +138,38 @@ static int vlan_modify(int cmd, int argc, char **argv)
 	return 0;
 }
 
+static void print_vid_range(FILE *f, __u16 v_start, __u16 v_end, __u16 flags)
+{
+	fprintf(f, "\t %hu", v_start);
+	if (v_start != v_end)
+		fprintf(f, "-%hu", v_end);
+	if (flags & BRIDGE_VLAN_INFO_PVID)
+		fprintf(f, " PVID");
+	if (flags & BRIDGE_VLAN_INFO_UNTAGGED)
+		fprintf(f, " Egress Untagged");
+	fprintf(f, "\n");
+}
+
+/* In order to use this function for both filtering and non-filtering cases
+ * we need to make it a tristate:
+ * return -1 - if filtering we've gone over so don't continue
+ * return  0 - skip entry and continue (applies to range start or to entries
+ *             which are less than filter_vlan)
+ * return  1 - print the entry and continue
+ */
+static int filter_vlan_check(struct bridge_vlan_info *vinfo)
+{
+	/* if we're filtering we should stop on the first greater entry */
+	if (filter_vlan && vinfo->vid > filter_vlan &&
+	    !(vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END))
+		return -1;
+	if ((vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) ||
+	    vinfo->vid < filter_vlan)
+		return 0;
+
+	return 1;
+}
+
 static int print_vlan(const struct sockaddr_nl *who,
 		      struct nlmsghdr *n,
 		      void *arg)
@@ -174,26 +206,28 @@ static int print_vlan(const struct sockaddr_nl *who,
 	} else {
 		struct rtattr *i, *list = tb[IFLA_AF_SPEC];
 		int rem = RTA_PAYLOAD(list);
+		__u16 last_vid_start = 0;
 
 		fprintf(fp, "%s", ll_index_to_name(ifm->ifi_index));
 		for (i = RTA_DATA(list); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
 			struct bridge_vlan_info *vinfo;
+			int vcheck_ret;
 
 			if (i->rta_type != IFLA_BRIDGE_VLAN_INFO)
 				continue;
 
 			vinfo = RTA_DATA(i);
-			if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END)
-				fprintf(fp, "-%hu", vinfo->vid);
-			else
-				fprintf(fp, "\t %hu", vinfo->vid);
-			if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN)
+
+			if (!(vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END))
+				last_vid_start = vinfo->vid;
+			vcheck_ret = filter_vlan_check(vinfo);
+			if (!vcheck_ret)
 				continue;
-			if (vinfo->flags & BRIDGE_VLAN_INFO_PVID)
-				fprintf(fp, " PVID");
-			if (vinfo->flags & BRIDGE_VLAN_INFO_UNTAGGED)
-				fprintf(fp, " Egress Untagged");
-			fprintf(fp, "\n");
+			else if (vcheck_ret == 1)
+				print_vid_range(fp, last_vid_start, vinfo->vid,
+						vinfo->flags);
+			else
+				break;
 		}
 	}
 	fprintf(fp, "\n");
@@ -211,6 +245,11 @@ static int vlan_show(int argc, char **argv)
 			if (filter_dev)
 				duparg("dev", *argv);
 			filter_dev = *argv;
+		} else if (strcmp(*argv, "vid") == 0) {
+			NEXT_ARG();
+			if (filter_vlan)
+				duparg("vid", *argv);
+			filter_vlan = atoi(*argv);
 		}
 		argc--; argv++;
 	}
-- 
2.4.3
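
The tristate check can be exercised on its own. The sketch below copies the
logic of filter_vlan_check() into a userspace function that takes the filter
as a parameter; `struct vinfo_model` and the `VLAN_RANGE_*` names are
hypothetical, and the flag values are assumed to mirror
BRIDGE_VLAN_INFO_RANGE_BEGIN/END from <linux/if_bridge.h>.

```c
#include <stdint.h>

/* Assumed to mirror BRIDGE_VLAN_INFO_RANGE_BEGIN / _RANGE_END. */
#define VLAN_RANGE_BEGIN (1 << 3)
#define VLAN_RANGE_END   (1 << 4)

/* Hypothetical stand-in for struct bridge_vlan_info. */
struct vinfo_model {
	uint16_t flags;
	uint16_t vid;
};

/* -1: past the filtered vlan, stop walking this port
 *  0: skip the entry (range start, or below the filter)
 *  1: print the entry
 */
int vlan_check(const struct vinfo_model *v, unsigned int filter)
{
	if (filter && v->vid > filter && !(v->flags & VLAN_RANGE_END))
		return -1;
	if ((v->flags & VLAN_RANGE_BEGIN) || v->vid < filter)
		return 0;
	return 1;
}
```

For the "-c" example in the commit message (range 400-500, vid 450), the
range start is skipped and the range end is printed, which is why the caller
has to remember last_vid_start.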

^ permalink raw reply related

* [PATCH iproute2 2/3] bridge: mdb: add support to filter by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 13:18 UTC (permalink / raw)
  To: netdev; +Cc: stephen, roopa, Nikolay Aleksandrov
In-Reply-To: <1460380710-29583-1-git-send-email-nikolay@cumulusnetworks.com>

Add the optional keyword "vid" to bridge mdb show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to match
the add/del one - "vid".

Example:
$ bridge mdb show vid 200
dev br0 port eth2 grp 239.0.0.1 permanent vid 200

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 bridge/mdb.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index 842536ec003c..6c904f8e6ae8 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -24,12 +24,12 @@
 	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct br_port_msg))))
 #endif
 
-static unsigned int filter_index;
+static unsigned int filter_index, filter_vlan;
 
 static void usage(void)
 {
 	fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [permanent | temp] [vid VID]\n");
-	fprintf(stderr, "       bridge mdb {show} [ dev DEV ]\n");
+	fprintf(stderr, "       bridge mdb {show} [ dev DEV ] [ vid VID ]\n");
 	exit(-1);
 }
 
@@ -92,6 +92,8 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e,
 	const void *src;
 	int af;
 
+	if (filter_vlan && e->vid != filter_vlan)
+		return;
 	af = e->addr.proto == htons(ETH_P_IP) ? AF_INET : AF_INET6;
 	src = af == AF_INET ? (const void *)&e->addr.u.ip4 :
 			      (const void *)&e->addr.u.ip6;
@@ -195,6 +197,11 @@ static int mdb_show(int argc, char **argv)
 			if (filter_dev)
 				duparg("dev", *argv);
 			filter_dev = *argv;
+		} else if (strcmp(*argv, "vid") == 0) {
+			NEXT_ARG();
+			if (filter_vlan)
+				duparg("vid", *argv);
+			filter_vlan = atoi(*argv);
 		}
 		argc--; argv++;
 	}
-- 
2.4.3

^ permalink raw reply related

* [PATCH iproute2 1/3] bridge: fdb: add support to filter by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 13:18 UTC (permalink / raw)
  To: netdev; +Cc: stephen, roopa, Nikolay Aleksandrov
In-Reply-To: <1460380710-29583-1-git-send-email-nikolay@cumulusnetworks.com>

Add the optional keyword "vlan" to bridge fdb show so the user can request
filtering by a specific vlan id. Currently the filtering is implemented
only in user-space. The argument name has been chosen to match the
add/del one - "vlan".

Example:
$ bridge fdb show vlan 400
52:54:00:bf:57:16 dev eth2 vlan 400 master br0 permanent

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 bridge/fdb.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index df55e86df83f..be849f980a80 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -27,7 +27,7 @@
 #include "rt_names.h"
 #include "utils.h"
 
-static unsigned int filter_index;
+static unsigned int filter_index, filter_vlan;
 
 static void usage(void)
 {
@@ -35,7 +35,7 @@ static void usage(void)
 			"              [ self ] [ master ] [ use ] [ router ]\n"
 			"              [ local | static | dynamic ] [ dst IPADDR ] [ vlan VID ]\n"
 			"              [ port PORT] [ vni VNI ] [ via DEV ]\n");
-	fprintf(stderr, "       bridge fdb [ show [ br BRDEV ] [ brport DEV ] ]\n");
+	fprintf(stderr, "       bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan VID ] ]\n");
 	exit(-1);
 }
 
@@ -65,6 +65,7 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	struct ndmsg *r = NLMSG_DATA(n);
 	int len = n->nlmsg_len;
 	struct rtattr *tb[NDA_MAX+1];
+	__u16 vid = 0;
 
 	if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH) {
 		fprintf(stderr, "Not RTM_NEWNEIGH: %08x %08x %08x\n",
@@ -88,6 +89,12 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	parse_rtattr(tb, NDA_MAX, NDA_RTA(r),
 		     n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
 
+	if (tb[NDA_VLAN])
+		vid = rta_getattr_u16(tb[NDA_VLAN]);
+
+	if (filter_vlan && filter_vlan != vid)
+		return 0;
+
 	if (n->nlmsg_type == RTM_DELNEIGH)
 		fprintf(fp, "Deleted ");
 
@@ -115,11 +122,8 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 				    RTA_DATA(tb[NDA_DST])));
 	}
 
-	if (tb[NDA_VLAN]) {
-		__u16 vid = rta_getattr_u16(tb[NDA_VLAN]);
-
+	if (vid)
 		fprintf(fp, "vlan %hu ", vid);
-	}
 
 	if (tb[NDA_PORT])
 		fprintf(fp, "port %d ", ntohs(rta_getattr_u16(tb[NDA_PORT])));
@@ -190,6 +194,11 @@ static int fdb_show(int argc, char **argv)
 		} else if (strcmp(*argv, "br") == 0) {
 			NEXT_ARG();
 			br = *argv;
+		} else if (strcmp(*argv, "vlan") == 0) {
+			NEXT_ARG();
+			if (filter_vlan)
+				duparg("vlan", *argv);
+			filter_vlan = atoi(*argv);
 		} else {
 			if (matches(*argv, "help") == 0)
 				usage();
-- 
2.4.3
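
The patches in this set parse the filter with atoi(), which returns 0 for
non-numeric input, and 0 here means "no filter". As a possible alternative,
and not part of this series, a stricter parse could look like the sketch
below. `parse_vid` is a hypothetical helper that uses only the C library, and
the 1..4094 range is an assumption about which vlan ids are worth accepting.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a vlan id in the range 1..4094. Returns 0 and stores the id on
 * success, -1 on malformed or out-of-range input.
 */
int parse_vid(const char *arg, unsigned int *vid)
{
	char *end;
	unsigned long v;

	if (!arg || !*arg)
		return -1;
	errno = 0;
	v = strtoul(arg, &end, 10);
	if (errno || *end != '\0' || v < 1 || v > 4094)
		return -1;
	*vid = (unsigned int)v;
	return 0;
}
```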

^ permalink raw reply related

* [PATCH iproute2 0/3] bridge: filtering by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 13:18 UTC (permalink / raw)
  To: netdev; +Cc: stephen, roopa, Nikolay Aleksandrov

Hi,
This set adds support for filtering by a vlan id when showing fdb/mdb/vlan
entries. Currently the filtering is implemented entirely in user-space, but
the plan is to add kernel support as well. The vlan show part is also needed
for the future per-vlan statistics in order to be able to show them only for
a specific vlan. I plan to update the bridge man page soon as it's missing
other options too and it seemed inconsistent to add this given that there're
potential paragraphs missing, thus I'll post a separate patch for that.

Thank you,
 Nik

Nikolay Aleksandrov (3):
  bridge: fdb: add support to filter by vlan id
  bridge: mdb: add support to filter by vlan id
  bridge: vlan: add support to filter by vlan id

 bridge/fdb.c  | 21 ++++++++++++++------
 bridge/mdb.c  | 11 +++++++++--
 bridge/vlan.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++------------
 3 files changed, 75 insertions(+), 20 deletions(-)

-- 
2.4.3

^ permalink raw reply

* Re: [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Mel Gorman @ 2016-04-11 13:08 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Mel Gorman, lsf, linux-mm, netdev@vger.kernel.org, Brenden Blanco,
	James Bottomley, Tom Herbert, lsf-pc, Alexei Starovoitov
In-Reply-To: <20160411142639.1c5e520b@redhat.com>

On Mon, Apr 11, 2016 at 02:26:39PM +0200, Jesper Dangaard Brouer wrote:
> > Which bottleneck dominates -- the page allocator or the DMA API when
> > setting up coherent pages?
> >
> 
> It is actually both, but mostly DMA on non-x86 archs.  The need to
> support multiple archs then also causes a slowdown on x86, as a
> side effect.
> 
> On archs like PowerPC, the DMA API is the bottleneck.  To work around
> the cost of DMA calls, NIC drivers allocate large order (compound) pages
> (dma_map the compound page, hand out page fragments for the RX ring, and
> later dma_unmap when the last RX page fragment is seen).
> 

So, IMO only holding onto the DMA pages is justified, not a
recycle of order-0 pages built on top of the core allocator. For DMA pages,
it would take a bit of legwork but the per-cpu allocator could be split
and converted to hold arbitrary sized pages with a constructor/destructor
to do the DMA coherency step when pages are taken from or handed back to
the core allocator. I'm not volunteering to do that unfortunately but I
estimate it'd be a few days' work unless it needs to be per-CPU and NUMA
aware in which case the memory footprint will be high.

> > I'm wary of another page allocator API being introduced if it's for
> > performance reasons. In response to this thread, I spent two days on
> > a series that boosts performance of the allocator in the fast paths by
> > 11-18% to illustrate that there was low-hanging fruit for optimising. If
> > the one-LRU-per-node series was applied on top, there would be a further
> > boost to performance on the allocation side. It could be further boosted
> > if debugging checks and statistic updates were conditionally disabled by
> > the caller.
> 
> It is always great if you can optimize the page allocator.  IMHO the
> page allocator is too slow.

It's why I spent some time on it as any improvement in the allocator is
an unconditional win without requiring driver modifications.

> At least for my performance needs (67ns
> per packet, approx 201 cycles at 3GHz).  I've measured[1]
> alloc_pages(order=0) + __free_pages() to cost 277 cycles(tsc).
> 

It'd be worth retrying this with the branch

http://git.kernel.org/cgit/linux/kernel/git/mel/linux.git/log/?h=mm-vmscan-node-lru-v4r5

This is an unreleased series that contains both the page allocator
optimisations and the one-LRU-per-node series which in combination remove a
lot of code from the page allocator fast paths. I have no data on how the
combined series behaves but each series individually is known to improve
page allocator performance.

Once you have that, do a hackjob to remove the debugging checks from both the
alloc and free path and see what that leaves. They could be bypassed properly
with a __GFP_NOACCT flag used only by drivers that absolutely require pages
as quickly as possible and are willing to be less safe to get that performance.

I expect the free path to then be dominated by zone and pageblock
lookups which are much harder to remove. The zone lookup can be removed
if the caller knows exactly where the free pages need to go which is
unlikely. The pageblock lookup could be removed if it was coming from a
dedicated pool if the allocation side refills using pageblocks that are
always MIGRATE_UNMOVABLE.

> The trick described above, of allocating a higher order page and
> handing out page-fragments, also works around this page allocator
> bottleneck (on x86).
> 

Be aware that compound order allocs like this are a double edged sword as
they'll be fast sometimes and at other times require reclaim/compaction,
which can stall for prolonged periods of time.

> I've measured order 3 (32KB) alloc_pages(order=3) + __free_pages() to
> cost approx 500 cycles(tsc).  That was more expensive, BUT an order=3
> page (32KB) corresponds to 8 pages (32768/4096), thus 500/8 = 62.5
> cycles.  Usually a network RX-frame only needs to be 2048 bytes, thus
> the "bulk" effect speed up is x16 (32768/2048), thus 31.25 cycles.
> 
> I view this as a bulking trick... maybe the page allocator can just
> give us a bulking API? ;-)
> 

It could on the alloc side relatively easily using either a variation of
rmqueue_bulk exposed at a higher level populating a linked list (link via
page->lru) or an array supplied by the caller.  It's harder to bulk free
quickly as the pages being freed are not necessarily in the same pageblock
requiring lookups in the free path.

Tricky to get right, but preferable to a whole new allocator.
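
The amortisation arithmetic quoted above can be written out as a small
sketch. The helper names are hypothetical; the inputs are the figures quoted
in this thread (500 cycles for an order-3 alloc+free, 67ns per packet at
3GHz), not new measurements.

```c
/* Amortised cost, in cycles, of one fragment carved from a bulk allocation.
 * Assumes frag_bytes is non-zero and no larger than alloc_bytes.
 */
double cycles_per_fragment(double alloc_free_cycles,
			   unsigned int alloc_bytes, unsigned int frag_bytes)
{
	return alloc_free_cycles / (alloc_bytes / frag_bytes);
}

/* Cycle budget per packet for a given time budget and clock rate. */
double cycle_budget(double ns_per_packet, double ghz)
{
	return ns_per_packet * ghz;
}
```

With the quoted numbers this gives 62.5 cycles per 4096-byte page and 31.25
cycles per 2048-byte RX fragment, against a budget of 201 cycles per packet.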

> > The main reason another allocator concerns me is that those pages
> > are effectively pinned and cannot be reclaimed by the VM in low memory
> > situations. It ends up needing its own API for tuning the size and hoping
> > all the drivers get it right without causing OOM situations. It becomes
> > a slippery slope of introducing shrinkers, locking and complexity. Then
> > callers start getting concerned about NUMA locality and having to deal
> > with multiple lists to maintain performance. Ultimately, it ends up being
> > as slow as the page allocator and back to square 1 except now with more code.
> 
> The pages assigned to the RX ring queue are pinned like today.  The
> pages available in the pool could easily be reclaimed.
> 

How easy depends on how it's structured. If it's a global per-cpu list
then it's an IPI to all CPUs which is straight-forward to implement but
slow to execute. If it's per-driver then there needs to be a locked list
of all pools and locking on each individual pool which could offset some
of the performance benefit of using the pool in the first place.

> I actually think we are better off providing a generic page pool
> interface the drivers can use.  Instead of the situation where drivers
> and subsystems invent their own, which do not cooperate in OOM
> situations.
> 

If it's offsetting DMA setup/teardown then I'd be a bit happier. If it's
yet-another-page allocator to bypass the core allocator then I'm less happy.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply

* Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK
From: Michael S. Tsirkin @ 2016-04-11 12:54 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: marius vlad, Stefan Hajnoczi, kvm, netdev, Ian Campbell,
	Claudio Imbrenda, Matt Benjamin, Greg Kurz, virtualization,
	Christoffer Dall
In-Reply-To: <20160411104548.GA12826@stefanha-x1.localdomain>

On Mon, Apr 11, 2016 at 11:45:48AM +0100, Stefan Hajnoczi wrote:
> On Fri, Apr 08, 2016 at 04:35:05PM +0100, Ian Campbell wrote:
> > On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> > > This series is based on Michael Tsirkin's vhost branch (v4.5-rc6).
> > > 
> > > I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock but
> > > first I want to share the latest version of the code.  Several people are
> > > playing with vsock now so sharing the latest code should avoid duplicate work.
> > 
> > Thanks for this, I've been using it in my project and it mostly seems
> > fine.
> > 
> > One wrinkle I came across, which I'm not sure if it is by design or a
> > problem is that I can see this sequence coming from the guest (with
> > other activity in between):
> > 
> >     1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> >     2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> >     3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
> > 
> > I originally had my backend close things down at #2, however this meant
> > that when #3 arrived it was for a non-existent socket (or, worse, an
> > active one if the ports got reused). I checked v5 of the spec
> > proposal[0] which says:
> >     If these bits are set and there are no more virtqueue buffers
> >     pending the socket is disconnected.
> > 
> > but I'm not entirely sure if this behaviour contradicts this or not
> > (the bits have both been set at #2, but not at the same time).
> > 
> > BTW, how does one tell if there are no more virtqueue buffers pending
> > or not while processing the op?
> 
> #2 is odd.  The shutdown bits are sticky so they cannot be cleared once
> set.  I would have expected just #1 and #3.  The behavior you observe
> looks like a bug.
> 
> The spec text does not convey the meaning of OP_SHUTDOWN well.
> OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible
> for this connection.  "there are no more virtqueue buffers pending the
> socket" really means that this isn't an immediate close from the
> perspective of the application.  If the application still has unread rx
> buffers then the socket stays readable until the rx data has been fully
> read.

Yes but you also wrote:
	If these bits are set and there are no more virtqueue buffers
	pending the socket is disconnected.

How does the remote know that there are no buffers pending and so it's safe
to reuse the same source/destination address now?  Maybe the destination
should send RST at that point?
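
To make the "sticky bits" semantics discussed above concrete, here is a
minimal sketch.  It assumes hypothetical flag values and a hypothetical
struct conn with a counter of unread rx buffers; it is not the vhost-vsock
code, only an illustration of why the bits can be set but never cleared and
why having both bits set is not yet an immediate close.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define SHUTDOWN_RX	0x1u
#define SHUTDOWN_TX	0x2u
#define SHUTDOWN_BOTH	(SHUTDOWN_RX | SHUTDOWN_TX)

/* Hypothetical per-connection state. */
struct conn {
	uint32_t shutdown;		/* sticky: bits are only ever set */
	unsigned int rx_pending;	/* unread rx buffers */
};

/* Accumulate the flags of one OP_SHUTDOWN packet; never clears a bit. */
static void conn_op_shutdown(struct conn *c, uint32_t flags)
{
	c->shutdown |= flags & SHUTDOWN_BOTH;
}

/*
 * The connection may go away only when both directions are shut down
 * and the application has drained all pending rx data.
 */
static bool conn_can_disconnect(const struct conn *c)
{
	return (c->shutdown & SHUTDOWN_BOTH) == SHUTDOWN_BOTH &&
	       c->rx_pending == 0;
}
```

With this model, the sequence RX, then TX, then TX|RX leaves both bits set
already after the second packet; the third packet changes nothing.  The
sketch does not answer the open question of how the remote side learns that
rx_pending has reached zero.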



> > Another thing I noticed, which is really more to do with the generic
> > AF_VSOCK bits than anything to do with your patches, is that there are no
> > limitations on which vsock ports a non-privileged user can bind to and,
> > relatedly, that there is no netns support so e.g. users in unprivileged
> > containers can bind to any vsock port and talk to the host, which might
> > be undesirable. For my use for now I just went with the big hammer
> > approach of denying access from anything other than init_net
> > namespace[1] while I consider what the right answer is.
> 
> From the vhost point of view each netns should have its own AF_VSOCK
> namespace.  This way two containers could act as "the host" (CID 2) for
> their respective guests.


I wonder how this interacts with the disconnect on migration
idea that you discussed. Specifically, socket has to stay connected

^ permalink raw reply

* pull-request: wireless-drivers-next 2016-04-11
From: Kalle Valo @ 2016-04-11 12:48 UTC (permalink / raw)
  To: David Miller; +Cc: linux-wireless, netdev

Hi Dave,

here's a pull request for 4.7. More features, but nothing really
standing out. Please let me know if you have any problems.

Kalle


The following changes since commit 4da46cebbd3b4dc445195a9672c99c1353af5695:

  net/core/dev: Warn on a too-short GRO frame (2016-04-05 19:58:39 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git tags/wireless-drivers-next-for-davem-2016-04-11

for you to fetch changes up to 20ac1b325d8d526211b1276ecf9b64b7e8369f50:

  Merge ath-next from ath.git (2016-04-07 21:44:37 +0300)

----------------------------------------------------------------

wireless-drivers patches for 4.7

Major changes:

iwlwifi

* support for Link Quality measurement
* more work on 9000 devices and MSIx
* continuation of the Dynamic Queue Allocation work
* make the paging less memory hungry
* 9000 new Rx path
* removal of IWLWIFI_UAPSD Kconfig option

ath10k

* implement push-pull tx model using mac80211 software queuing support
* enable scan in AP mode (NL80211_FEATURE_AP_SCAN)

wil6210

* add basic PBSS (Personal Basic Service Set) support
* add initial P2P support
* add oob_mode module parameter

----------------------------------------------------------------
Amitkumar Karwar (2):
      mwifiex: fix Tx timeout issue during suspend test
      mwifiex: advertise low priority scan feature

Anilkumar Kolli (1):
      ath10k: fix debugfs pktlog_filter write

Aviya Erenfeld (2):
      iwlwifi: mvm: add LQM vendor command and notification
      iwlwifi: add a debugfs hook for LQM

Ayala Beker (1):
      iwlwifi: mvm: update GSCAN capabilities

Bob Copeland (3):
      ath5k: fix incorrect indentation
      ath9k: fix a misleading indentation
      ath9k_htc: fix up indents with spaces

Chaya Rachel Ivgi (2):
      iwlwifi: mvm: handle async temperature notification with unlocked mutex
      iwlwifi: mvm: remove uneeded D0I3 checking

Colin Ian King (4):
      iwlwifi: pcie: remove duplicate assignment of variable isr_stats
      wl12xx: remove redundant null check on wl->scan.ssid
      brcmfmac: sdio: remove unused variable retry_limit
      mwifiex: ie_list is an array, so no need to check if NULL

Dan Carpenter (1):
      brcmfmac: uninitialized "ret" variable

David Spinadel (1):
      iwlwifi: mvm: set aux STA ID in scan config

Dedy Lansky (1):
      wil6210: p2p initial support

Emmanuel Grumbach (6):
      iwlwifi: pcie: print error value as signed int
      iwlwifi: mvm: modify the max SP to infinite
      iwlwifi: add missing mutex_destroy statements
      iwlwifi: make uapsd_disable module param a bitmap
      iwlwifi: remove IWLWIFI_UAPSD Kconfig
      iwlwifi: remove IWL_*_UCODE_API_OK

Eva Rachel Retuya (1):
      iwlwifi: dvm: use alloc_ordered_workqueue()

Ganapathi Bhat (2):
      mwifiex: add support for GTK rekey offload
      mwifiex: add support for wakeup on GTK rekey failure

Geert Uytterhoeven (1):
      mwifiex: Spelling s/minmum/minimum/, s/bandwidth/bandwith/

Geliang Tang (4):
      ipw2x00: use to_pci_dev()
      wlcore: use to_delayed_work()
      wl1251: use to_delayed_work()
      rtlwifi: use to_delayed_work()

Golan Ben-Ami (2):
      iwlwifi: mvm: support dumping UMAC internal txfifos
      iwlwifi: store fw memory segments length and addresses in run-time

Grzegorz Bajorski (1):
      ath10k: deliver mgmt frames from htt to monitor vifs only

Haim Dreyfuss (2):
      iwlwifi: 9000: update device id and FW serial number
      iwlwifi: pcie: Fix index iteration on free_irq in MSIX mode

Hamad Kadmany (1):
      wil6210: Set permanent MAC address to wiphy

Ivan Safonov (1):
      ath9k: Remove unnecessary ?: operator

Jes Sorensen (10):
      rtl8xxxu: Change name of struct tx_desc to be more decriptive
      rtl8xxxu: Rename TX descriptor bits to map them to 32/40 byte descriptors
      rtl8xxxu: Correct txdesc40 gid definition
      rtl8xxxu: TXDESC_SHORT_GI is txdesc32 only
      rtl8xxxu: 8192eu uses txdesc40
      rtl8xxxu: Update some register definitions
      rtl8xxxu: Use enums for chip version numbers
      rtl8xxxu: Identify 8192eu rev A/B parts correctly
      rtl8xxxu: Use correct H2C calls for 8192eu
      rtl8xxxu: Do not set LDOA15 / LDOV12 on 8192eu

Jia-Ju Bai (4):
      iwl4965: Fix a null pointer dereference in il_tx_queue_free and il_cmd_queue_free
      b43: Fix memory leaks in b43_bus_dev_ssb_init and b43_bus_dev_bcma_init
      rtl818x_pci: Disable pci device in error handling code
      iwl4965: Fix a memory leak in error handling code of __il4965_up

Joe Perches (1):
      rtlwifi: btcoexist: Convert BTC_PRINTK to btc_<foo>_dbg

Johannes Berg (1):
      iwlwifi: mvm: remove is_data_qos variable in TX

Joseph Salisbury (1):
      ath5k: Change led pin configuration for compaq c700 laptop

Julian Calaby (1):
      iwl4965: Fix more memory leaks in __il4965_up()

Kalle Valo (2):
      Merge tag 'iwlwifi-next-for-kalle-2016-03-30' of https://git.kernel.org/.../iwlwifi/iwlwifi-next
      Merge ath-next from ath.git

Larry Finger (11):
      rtlwifi: rtl8723be: Add antenna select module parameter
      rtlwifi: btcoexist: Implement antenna selection
      rtlwifi: Fix Smatch warnings
      rtlwifi: btcoexist: Fix Smatch warning
      rtlwifi: rtl8188ee: Fix Smatch warnings
      rtlwifi: rtl8192c-common: Fix Smatch warning
      rtlwifi: rtl8192ee: Fix Smatch warning
      rtlwifi: rtl8192se: Fix Smatch warning
      rtlwifi: rtl8723ae: Fix Smatch warning
      rtlwifi: rtl8723be: Fix Smatch warnings
      rtlwifi: rtl8821ae: Fix Smatch warnings

Liad Kaufman (7):
      iwlwifi: mvm: support bss dynamic alloc/dealloc of queues
      iwlwifi: trans: fix iwl_trans_txq_scd_cfg.sta_id sign
      iwlwifi: mvm: use bss client queue for bss station
      iwlwifi: mvm: set sta_id in SCD_QUEUE_CONFIG cmd
      iwlwifi: mvm: allocate dedicated queue for cab in dqa mode
      iwlwifi: mvm: move cmd queue to be #0 in dqa mode
      iwlwifi: mvm: fix inconsistent lock in dqa mode

Lior David (10):
      wil6210: add support for discovery mode during scan
      wil6210: switch to generated wmi.h
      wil6210: basic PBSS/PCP support
      wil6210: P2P_DEVICE virtual interface support
      wil6210: fix race conditions in p2p listen and search
      wil6210: clean ioctl debug message
      wil6210: fix no_fw_recovery mode with change_virtual_intf
      wil6210: pass is_go flag to firmware
      wil6210: add oob_mode module parameter
      wil6210: allow empty WMI commands in debugfs wmi_send

Luca Coelho (3):
      iwlwifi: pcie: refcounting is not necessary anymore
      iwlwifi: mvm: add a scan timeout for regular scans
      iwlwifi: mvm: allow setting the thermal state in D0i3

Markus Elfring (6):
      ath9k_htc: Delete unnecessary variable initialisation
      brcmfmac: Delete unnecessary variable initialisation
      iwlegacy: Return directly if allocation fails in il_eeprom_init()
      rsi: Delete unnecessary variable initialisation
      rsi: Delete unnecessary variable initialisation
      rsi: Move variable initialisation into error code

Matti Gottlieb (2):
      iwlwifi: mvm: Decrease size of the paging download buffer
      iwlwifi: mvm: make sure FW contains the right amount of paging sections

Maya Erez (3):
      wil6210: remove BACK RX and TX workers
      wil6210: AP: prevent connecting to already connected station
      wil6210: add support for platform specific notification events

Miaoqing Pan (23):
      ath9k: Update QCA953x initvals
      ath9k: Update AR9003 2.2 initvals
      ath9k: Update AR933x initvals
      ath9k: Update AR9340 initvals
      ath9k: Update AR9462 initvals
      ath9k: Update AR9485 initvals
      ath9k: Update AR955x initvals
      ath9k: Update AR9565 initvals
      ath9k: Update QCA956x initvals
      ath9k: Update AR9580 initvals
      ath9k: enable manual peak cal for all ar9300 chips
      ath9k: use AR_SREV_9003_PCOEM to identify PCOEM chips
      ath9k: set correct peak detect threshold
      ath9k: define correct GPIO numbers and bits mask
      ath9k: make GPIO API to support both of WMAC and SOC
      ath9k: free GPIO resource for SOC GPIOs
      ath9k: cleanup led_pin initial
      ath9k: Allow platform override BTCoex pin
      ath9k: add bits definition of BTCoex MODE2/3 for SOC chips
      ath9k: fix BTCoex access invalid registers for SOC chips
      ath9k: fix BTCoex configuration for SOC chips
      ath9k: fix reg dump data bus error
      ath9k: fix rng high cpu load

Michal Kazior (16):
      ath10k: refactor tx code
      ath10k: unify txpath decision
      ath10k: refactor tx pending management
      ath10k: maintain peer_id for each sta and vif
      ath10k: add fast peer_map lookup
      ath10k: add new htt message generation/parsing logic
      ath10k: implement wake_tx_queue
      ath10k: implement updating shared htt txq state
      ath10k: store txq in skb_cb
      ath10k: keep track of queue depth per txq
      ath10k: implement push-pull tx
      ath10k: fix HTT Tx CE ring size
      ath10k: change htt tx desc/qcache peer limit config
      ath10k: fix tx hang
      ath10k: fix pull-push tx threshold handling
      ath10k: fix null deref if device crashes early

Mohammed Shafi Shajakhan (2):
      ath10k: enable debugfs provision to enable Peer Stats feature
      ath10k: enable parsing per station rx duration for 10.4

Oren Givon (1):
      iwlwifi: edit the 9000 series PCI IDs

Peter Oh (2):
      ath10k: set MAC timestamp in management Rx frame
      ath10k: parse Rx MAC timestamp in mgmt frame for FW 10.4

Raja Mani (6):
      ath10k: free cached fw bin contents when get board id fails
      dt: bindings: add new dt entry for pre calibration in qcom, ath10k.txt
      ath10k: pass cal data location as an argument to ath10k_download_cal_{file|dt}
      ath10k: move cal data len to hw_params
      ath10k: incorporate qca4019 cal data download sequence
      ath10k: introduce Extended Resource Config support for 10.4

Rajkumar Manoharan (15):
      ath10k: fix firmware assert in monitor mode
      ath10k: handle channel change htt event
      ath10k: move mgmt descriptor limit handle under mgmt_tx
      ath10k: speedup htt rx descriptor processing for tx completion
      ath10k: copy tx fetch indication message
      ath10k: remove unused fw_desc processing
      ath10k: cleanup amsdu processing for rx indication
      ath10k: speedup htt rx descriptor processing for rx_ind
      ath10k: register ath10k_htt_htc_t2h_msg_handler
      ath10k: cleanup copy engine receive next completion
      ath10k: reuse copy engine 5 (htt rx) descriptors
      ath10k: combine txrx and replenish task
      ath10k: fix calibration init sequence of qca99x0
      ath10k: remove unnecessary warning for probe response drops
      ath10k: fix unconditional num_mpdus_ready subtraction

Sara Sharon (11):
      iwlwifi: pcie: clear trans reference on queue stop
      iwlwifi: pcie: fix global table size
      iwlwifi: pcie: enable interrupts explicitly on resume
      iwlwifi: pcie: do not pad QoS AMSDU
      iwlwifi: mvm: add support for new TX CMD API
      iwlwifi: pcie: write to legacy register also in MQ
      iwlwifi: remove support for fw older than -16.ucode
      iwlwifi: mvm: report checksum is done also for IPv6 packets
      iwlwifi: pcie: request one more interrupt vector
      iwlwifi: mvm: improve RSS configuration
      iwlwifi: mvm: enable TCP/UDP checksum support for 9000 family

Shengzhen Li (1):
      mwifiex: check revision id while choosing PCIe firmware

Steve deRosier (1):
      ath6kl: ignore WMI_TXE_NOTIFY_EVENTID based on fw capability flags

Vasanthakumar Thiagarajan (1):
      ath10k: advertise force AP scan feature

Vishal Thanki (1):
      rt2x00usb: Use usb anchor to manage URB

Vladimir Kondratiev (1):
      wil6210: replay attack detection

Wei-Ning Huang (1):
      mwifiex: fix NULL pointer dereference error

Xinming Hu (4):
      mwifiex: remove redundant GFP_DMA flag
      mwifiex: schedule main workqueue for transmitting bridge packets
      mwifiex: AMSDU Rx frame handling in AP mode
      mwifiex: dump pcie scratch registers

 .../bindings/net/wireless/qcom,ath10k.txt          |   23 +-
 drivers/net/wireless/ath/ath10k/ce.c               |   44 +-
 drivers/net/wireless/ath/ath10k/ce.h               |   15 +-
 drivers/net/wireless/ath/ath10k/core.c             |  156 ++-
 drivers/net/wireless/ath/ath10k/core.h             |   41 +-
 drivers/net/wireless/ath/ath10k/debug.c            |  100 +-
 drivers/net/wireless/ath/ath10k/htt.c              |    2 +-
 drivers/net/wireless/ath/ath10k/htt.h              |   55 +-
 drivers/net/wireless/ath/ath10k/htt_rx.c           |  714 +++++++----
 drivers/net/wireless/ath/ath10k/htt_tx.c           |  291 ++++-
 drivers/net/wireless/ath/ath10k/hw.h               |    6 +-
 drivers/net/wireless/ath/ath10k/mac.c              |  546 +++++++-
 drivers/net/wireless/ath/ath10k/mac.h              |    6 +
 drivers/net/wireless/ath/ath10k/pci.c              |  106 +-
 drivers/net/wireless/ath/ath10k/txrx.c             |   37 +-
 drivers/net/wireless/ath/ath10k/txrx.h             |    4 +-
 drivers/net/wireless/ath/ath10k/wmi-ops.h          |   23 +
 drivers/net/wireless/ath/ath10k/wmi.c              |  132 +-
 drivers/net/wireless/ath/ath10k/wmi.h              |   54 +
 drivers/net/wireless/ath/ath5k/led.c               |    2 +-
 drivers/net/wireless/ath/ath5k/phy.c               |    2 +-
 drivers/net/wireless/ath/ath5k/reset.c             |    4 +-
 drivers/net/wireless/ath/ath6kl/wmi.c              |    5 +
 .../net/wireless/ath/ath9k/ar9003_2p2_initvals.h   |    4 +-
 drivers/net/wireless/ath/ath9k/ar9003_calib.c      |   44 +-
 drivers/net/wireless/ath/ath9k/ar9003_eeprom.c     |   10 +-
 drivers/net/wireless/ath/ath9k/ar9003_mci.c        |   39 +-
 drivers/net/wireless/ath/ath9k/ar9003_phy.c        |   10 +-
 .../net/wireless/ath/ath9k/ar9330_1p1_initvals.h   |    4 +-
 .../net/wireless/ath/ath9k/ar9330_1p2_initvals.h   |    4 +-
 drivers/net/wireless/ath/ath9k/ar9340_initvals.h   |    4 +-
 .../net/wireless/ath/ath9k/ar9462_2p0_initvals.h   |    4 +-
 .../net/wireless/ath/ath9k/ar9462_2p1_initvals.h   |    4 +-
 drivers/net/wireless/ath/ath9k/ar9485_initvals.h   |    4 +-
 drivers/net/wireless/ath/ath9k/ar953x_initvals.h   |    4 +-
 .../net/wireless/ath/ath9k/ar955x_1p0_initvals.h   |    2 +-
 .../net/wireless/ath/ath9k/ar9565_1p0_initvals.h   |    2 +-
 drivers/net/wireless/ath/ath9k/ar956x_initvals.h   |    2 +-
 .../net/wireless/ath/ath9k/ar9580_1p0_initvals.h   |    4 +-
 drivers/net/wireless/ath/ath9k/ath9k.h             |    4 -
 drivers/net/wireless/ath/ath9k/btcoex.c            |  138 +-
 drivers/net/wireless/ath/ath9k/btcoex.h            |    2 +
 drivers/net/wireless/ath/ath9k/debug.c             |   24 +-
 drivers/net/wireless/ath/ath9k/gpio.c              |   69 +-
 drivers/net/wireless/ath/ath9k/hif_usb.c           |    2 +-
 drivers/net/wireless/ath/ath9k/htc_drv_gpio.c      |    8 +-
 drivers/net/wireless/ath/ath9k/htc_drv_init.c      |   14 +-
 drivers/net/wireless/ath/ath9k/hw.c                |  267 ++--
 drivers/net/wireless/ath/ath9k/hw.h                |   11 +-
 drivers/net/wireless/ath/ath9k/init.c              |    1 -
 drivers/net/wireless/ath/ath9k/main.c              |    9 +-
 drivers/net/wireless/ath/ath9k/reg.h               |   90 +-
 drivers/net/wireless/ath/ath9k/rng.c               |   20 +-
 drivers/net/wireless/ath/wil6210/Makefile          |    1 +
 drivers/net/wireless/ath/wil6210/cfg80211.c        |  332 ++++-
 drivers/net/wireless/ath/wil6210/debugfs.c         |   59 +-
 drivers/net/wireless/ath/wil6210/interrupt.c       |    6 +-
 drivers/net/wireless/ath/wil6210/ioctl.c           |   11 +-
 drivers/net/wireless/ath/wil6210/main.c            |   81 +-
 drivers/net/wireless/ath/wil6210/netdev.c          |    7 +-
 drivers/net/wireless/ath/wil6210/p2p.c             |  253 ++++
 drivers/net/wireless/ath/wil6210/pcie_bus.c        |    1 +
 drivers/net/wireless/ath/wil6210/rx_reorder.c      |  204 +--
 drivers/net/wireless/ath/wil6210/trace.h           |   19 +-
 drivers/net/wireless/ath/wil6210/txrx.c            |   67 +-
 drivers/net/wireless/ath/wil6210/txrx.h            |   12 +-
 drivers/net/wireless/ath/wil6210/wil6210.h         |  110 +-
 drivers/net/wireless/ath/wil6210/wil_platform.h    |    8 +-
 drivers/net/wireless/ath/wil6210/wmi.c             |  134 +-
 drivers/net/wireless/ath/wil6210/wmi.h             | 1264 +++++++++----------
 drivers/net/wireless/broadcom/b43/main.c           |    6 +-
 .../wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c  |    2 +-
 .../wireless/broadcom/brcm80211/brcmfmac/sdio.c    |    5 +-
 drivers/net/wireless/intel/ipw2x00/ipw2100.c       |    2 +-
 drivers/net/wireless/intel/iwlegacy/4965-mac.c     |    3 +
 drivers/net/wireless/intel/iwlegacy/common.c       |   22 +-
 drivers/net/wireless/intel/iwlwifi/Kconfig         |   10 -
 drivers/net/wireless/intel/iwlwifi/dvm/main.c      |    2 +-
 drivers/net/wireless/intel/iwlwifi/iwl-1000.c      |   10 +-
 drivers/net/wireless/intel/iwlwifi/iwl-2000.c      |   18 +-
 drivers/net/wireless/intel/iwlwifi/iwl-5000.c      |   11 +-
 drivers/net/wireless/intel/iwlwifi/iwl-6000.c      |   20 +-
 drivers/net/wireless/intel/iwlwifi/iwl-7000.c      |   26 +-
 drivers/net/wireless/intel/iwlwifi/iwl-8000.c      |   13 +-
 drivers/net/wireless/intel/iwlwifi/iwl-9000.c      |   17 +-
 drivers/net/wireless/intel/iwlwifi/iwl-config.h    |    7 +-
 drivers/net/wireless/intel/iwlwifi/iwl-drv.c       |  100 +-
 .../net/wireless/intel/iwlwifi/iwl-fw-error-dump.h |    1 +
 drivers/net/wireless/intel/iwlwifi/iwl-fw-file.h   |   41 +-
 drivers/net/wireless/intel/iwlwifi/iwl-fw.h        |    2 +
 drivers/net/wireless/intel/iwlwifi/iwl-modparams.h |   10 +-
 drivers/net/wireless/intel/iwlwifi/iwl-prph.h      |   12 +
 drivers/net/wireless/intel/iwlwifi/iwl-trans.h     |    4 +-
 drivers/net/wireless/intel/iwlwifi/mvm/Makefile    |    2 +-
 drivers/net/wireless/intel/iwlwifi/mvm/coex.c      |   42 -
 .../net/wireless/intel/iwlwifi/mvm/coex_legacy.c   | 1315 --------------------
 drivers/net/wireless/intel/iwlwifi/mvm/constants.h |    1 -
 drivers/net/wireless/intel/iwlwifi/mvm/d3.c        |    2 +-
 .../net/wireless/intel/iwlwifi/mvm/debugfs-vif.c   |   85 ++
 drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c   |  169 +--
 drivers/net/wireless/intel/iwlwifi/mvm/fw-api-rx.h |   15 +-
 drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h |   35 +-
 drivers/net/wireless/intel/iwlwifi/mvm/fw-api.h    |  108 +-
 drivers/net/wireless/intel/iwlwifi/mvm/fw-dbg.c    |  140 ++-
 drivers/net/wireless/intel/iwlwifi/mvm/fw.c        |   54 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c  |   47 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c  |   75 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mvm.h       |   47 +-
 drivers/net/wireless/intel/iwlwifi/mvm/ops.c       |   34 +-
 drivers/net/wireless/intel/iwlwifi/mvm/power.c     |    2 +-
 drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c      |    9 +-
 drivers/net/wireless/intel/iwlwifi/mvm/scan.c      |   22 +
 drivers/net/wireless/intel/iwlwifi/mvm/sf.c        |    8 +-
 drivers/net/wireless/intel/iwlwifi/mvm/sta.c       |  262 +++-
 drivers/net/wireless/intel/iwlwifi/mvm/sta.h       |   87 +-
 drivers/net/wireless/intel/iwlwifi/mvm/tt.c        |   15 -
 drivers/net/wireless/intel/iwlwifi/mvm/tx.c        |  192 ++-
 drivers/net/wireless/intel/iwlwifi/mvm/utils.c     |  161 ++-
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c      |   16 +-
 drivers/net/wireless/intel/iwlwifi/pcie/internal.h |    6 +-
 drivers/net/wireless/intel/iwlwifi/pcie/rx.c       |   12 +-
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c    |   35 +-
 drivers/net/wireless/intel/iwlwifi/pcie/tx.c       |   80 +-
 .../net/wireless/marvell/mwifiex/11n_rxreorder.c   |    5 +-
 drivers/net/wireless/marvell/mwifiex/cfg80211.c    |   29 +-
 drivers/net/wireless/marvell/mwifiex/fw.h          |   11 +
 drivers/net/wireless/marvell/mwifiex/main.c        |    8 +-
 drivers/net/wireless/marvell/mwifiex/main.h        |    2 +
 drivers/net/wireless/marvell/mwifiex/pcie.c        |   98 +-
 drivers/net/wireless/marvell/mwifiex/pcie.h        |   18 +-
 drivers/net/wireless/marvell/mwifiex/sdio.c        |    7 +-
 drivers/net/wireless/marvell/mwifiex/sta_cmd.c     |   28 +
 drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c |    2 +
 drivers/net/wireless/marvell/mwifiex/sta_event.c   |    3 +
 drivers/net/wireless/marvell/mwifiex/sta_ioctl.c   |    3 +-
 drivers/net/wireless/marvell/mwifiex/tdls.c        |    2 +-
 drivers/net/wireless/marvell/mwifiex/uap_cmd.c     |    2 +-
 drivers/net/wireless/marvell/mwifiex/uap_txrx.c    |   92 ++
 drivers/net/wireless/ralink/rt2x00/rt2x00.h        |    3 +
 drivers/net/wireless/ralink/rt2x00/rt2x00dev.c     |    3 +
 drivers/net/wireless/ralink/rt2x00/rt2x00usb.c     |   21 +-
 drivers/net/wireless/realtek/rtl818x/rtl8180/dev.c |    4 +-
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c   |  163 ++-
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h   |  130 +-
 .../net/wireless/realtek/rtl8xxxu/rtl8xxxu_regs.h  |   31 +-
 .../realtek/rtlwifi/btcoexist/halbtc8192e2ant.c    |  847 ++++++-------
 .../realtek/rtlwifi/btcoexist/halbtc8723b1ant.c    |  611 +++++----
 .../realtek/rtlwifi/btcoexist/halbtc8723b2ant.c    |  865 ++++++-------
 .../realtek/rtlwifi/btcoexist/halbtc8821a1ant.c    |  652 +++++-----
 .../realtek/rtlwifi/btcoexist/halbtc8821a2ant.c    |  851 +++++++------
 .../realtek/rtlwifi/btcoexist/halbtcoutsrc.c       |   31 +-
 .../realtek/rtlwifi/btcoexist/halbtcoutsrc.h       |   19 +-
 .../wireless/realtek/rtlwifi/btcoexist/rtl_btc.c   |    5 +-
 drivers/net/wireless/realtek/rtlwifi/pci.c         |   39 +-
 .../net/wireless/realtek/rtlwifi/rtl8188ee/dm.c    |    2 +-
 .../net/wireless/realtek/rtlwifi/rtl8188ee/phy.c   |    3 +-
 .../wireless/realtek/rtlwifi/rtl8192c/dm_common.c  |    2 +-
 .../net/wireless/realtek/rtlwifi/rtl8192ee/trx.c   |    2 +-
 .../net/wireless/realtek/rtlwifi/rtl8192se/phy.c   |    2 +-
 .../wireless/realtek/rtlwifi/rtl8723ae/hal_btc.c   |    6 +-
 .../net/wireless/realtek/rtlwifi/rtl8723be/hw.c    |    5 +
 .../net/wireless/realtek/rtlwifi/rtl8723be/phy.c   |   10 +-
 .../net/wireless/realtek/rtlwifi/rtl8723be/rf.c    |    4 +-
 .../net/wireless/realtek/rtlwifi/rtl8723be/sw.c    |    3 +
 .../net/wireless/realtek/rtlwifi/rtl8821ae/dm.c    |    6 +-
 .../net/wireless/realtek/rtlwifi/rtl8821ae/phy.c   |    6 +-
 drivers/net/wireless/realtek/rtlwifi/wifi.h        |    5 +-
 drivers/net/wireless/rsi/rsi_91x_pkt.c             |   22 +-
 drivers/net/wireless/ti/wl1251/ps.c                |    2 +-
 drivers/net/wireless/ti/wl12xx/scan.c              |    2 +-
 drivers/net/wireless/ti/wlcore/main.c              |   10 +-
 drivers/net/wireless/ti/wlcore/ps.c                |    2 +-
 drivers/net/wireless/ti/wlcore/scan.c              |    2 +-
 include/linux/ath9k_platform.h                     |    4 +
 174 files changed, 7918 insertions(+), 5930 deletions(-)
 create mode 100644 drivers/net/wireless/ath/wil6210/p2p.c
 delete mode 100644 drivers/net/wireless/intel/iwlwifi/mvm/coex_legacy.c

-- 
Kalle Valo

^ permalink raw reply

* Re: [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Jesper Dangaard Brouer @ 2016-04-11 12:26 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lsf, linux-mm, netdev@vger.kernel.org, Brenden Blanco,
	James Bottomley, Tom Herbert, lsf-pc, Alexei Starovoitov, brouer
In-Reply-To: <20160411085819.GE21128@suse.de>

On Mon, 11 Apr 2016 09:58:19 +0100 Mel Gorman <mgorman@suse.de> wrote:

> On Thu, Apr 07, 2016 at 04:17:15PM +0200, Jesper Dangaard Brouer wrote:
> > (Topic proposal for MM-summit)
> > 
> > Network Interface Card (NIC) drivers and increasing speeds stress
> > the page-allocator (and DMA APIs).  A number of driver-specific
> > open-coded approaches exist that work around these bottlenecks in the
> > page allocator and DMA APIs, e.g. open-coded recycle mechanisms, and
> > allocating larger pages and handing out page "fragments".
> > 
> > I'm proposing a generic page-pool recycle facility, that can cover the
> > driver use-cases, increase performance and open up for zero-copy RX.
> >   
> 
> Which bottleneck dominates -- the page allocator or the DMA API when
> setting up coherent pages?
>

It is actually both, but mostly DMA on non-x86 archs.  The need to
support multiple archs then also causes a slowdown on x86, due to a
side-effect.

On archs like PowerPC, the DMA API is the bottleneck.  To work around
the cost of DMA calls, NIC drivers allocate high-order (compound) pages
(dma_map the compound page, hand out page-fragments for the RX ring, and
later dma_unmap when the last RX page-fragment is seen).

The unfortunate side-effect is that these RX page-fragments (which
contain packet data) need to be considered 'read-only', because a
dma_unmap call can be destructive.  Network packets need to be
modified (at minimum the time-to-live).  Thus, the netstack allocates new
writable memory, copies over the IP-headers, and adjusts the offset pointer
into the RX-page.  Avoiding the dma_unmap will (AFAIK) allow making
RX-pages writable.

The idea of the page-pool is to recycle pages back to the originating
device, so that we can avoid the need to call dma_unmap() and only call
dma_map() when setting up pages.
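
As a rough illustration of that recycling idea, here is a userspace sketch.
Everything in it is hypothetical (struct fake_page, struct page_pool_sketch,
the sketch_dma_* stand-ins and the tiny POOL_SIZE); it only counts simulated
map/unmap calls to show that a recycled page needs neither, and it is not a
proposed kernel API.

```c
#include <stddef.h>

#define POOL_SIZE 4	/* hypothetical, tiny on purpose */

/* Stand-in for a page; only tracks whether its DMA mapping is live. */
struct fake_page {
	int dma_mapped;
};

/* Hypothetical per-device pool of pages that are still DMA mapped. */
struct page_pool_sketch {
	struct fake_page *ring[POOL_SIZE];
	size_t count;
	unsigned long map_calls;
	unsigned long unmap_calls;
};

/* Stand-ins for dma_map/dma_unmap: they only flip a flag and count. */
static void sketch_dma_map(struct page_pool_sketch *pp, struct fake_page *pg)
{
	pg->dma_mapped = 1;
	pp->map_calls++;
}

static void sketch_dma_unmap(struct page_pool_sketch *pp, struct fake_page *pg)
{
	pg->dma_mapped = 0;
	pp->unmap_calls++;
}

/*
 * Get a page for the RX ring.  A recycled page is still mapped, so no
 * map call is needed.  'fresh' models a page newly obtained from the
 * page allocator and is only used (and mapped) when the pool is empty.
 */
static struct fake_page *pool_get(struct page_pool_sketch *pp,
				  struct fake_page *fresh)
{
	if (pp->count > 0)
		return pp->ring[--pp->count];
	sketch_dma_map(pp, fresh);
	return fresh;
}

/*
 * Return a page to its originating pool.  If there is room the mapping
 * is kept (returns 1); otherwise the page is unmapped so the caller can
 * hand it back to the page allocator (returns 0).
 */
static int pool_put(struct page_pool_sketch *pp, struct fake_page *pg)
{
	if (pp->count < POOL_SIZE) {
		pp->ring[pp->count++] = pg;
		return 1;
	}
	sketch_dma_unmap(pp, pg);
	return 0;
}
```

In steady state every pool_get() is served from the ring, so map_calls
stays at the number of pages set up initially and unmap_calls stays at zero.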

> I'm wary of another page allocator API being introduced if it's for
> performance reasons. In response to this thread, I spent two days on
> a series that boosts performance of the allocator in the fast paths by
> 11-18% to illustrate that there was low-hanging fruit for optimising. If
> the one-LRU-per-node series was applied on top, there would be a further
> boost to performance on the allocation side. It could be further boosted
> if debugging checks and statistic updates were conditionally disabled by
> the caller.

It is always great if you can optimize the page allocator.  IMHO the
page allocator is too slow, at least for my performance needs (67ns
per packet, approx 201 cycles at 3GHz).  I've measured[1]
alloc_pages(order=0) + __free_pages() to cost 277 cycles(tsc).

The trick described above, of allocating a higher order page and
handing out page-fragments, also works around this page allocator
bottleneck (on x86).

I've measured order 3 (32KB) alloc_pages(order=3) + __free_pages() to
cost approx 500 cycles(tsc).  That is more expensive, BUT an order=3
page (32KB) corresponds to 8 pages (32768/4096), thus 500/8 = 62.5
cycles per page.  Usually a network RX-frame only needs 2048 bytes, thus
the "bulk" effect speed-up is x16 (32768/2048), thus 31.25 cycles per
frame.

I view this as a bulking trick... maybe the page allocator can just
give us a bulking API? ;-)
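
The arithmetic above can be written down as a small sketch.  The helper
names are made up for illustration; the inputs are just the numbers quoted
in this mail (67ns at 3GHz, 277 and 500 cycles, 32768/4096/2048 bytes).

```c
/* Cycles available per packet, given the packet time and the clock. */
static double cycle_budget(double ns_per_packet, double ghz)
{
	return ns_per_packet * ghz;
}

/*
 * Amortised alloc+free cost per buffer when one allocation of
 * alloc_bytes is split into fragments of buf_bytes each.
 */
static double cycles_per_buffer(double alloc_cycles,
				unsigned int alloc_bytes,
				unsigned int buf_bytes)
{
	return alloc_cycles / (double)(alloc_bytes / buf_bytes);
}
```

cycle_budget(67, 3) gives the 201 cycle budget.  The order-0 cost of 277
cycles alone exceeds it, while cycles_per_buffer(500, 32768, 4096) is 62.5
and cycles_per_buffer(500, 32768, 2048) is 31.25.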

> The main reason another allocator concerns me is that those pages
> are effectively pinned and cannot be reclaimed by the VM in low memory
> situations. It ends up needing its own API for tuning the size and hoping
> all the drivers get it right without causing OOM situations. It becomes
> a slippery slope of introducing shrinkers, locking and complexity. Then
> callers start getting concerned about NUMA locality and having to deal
> with multiple lists to maintain performance. Ultimately, it ends up being
> as slow as the page allocator and back to square 1 except now with more code.

The pages assigned to the RX ring queue are pinned like today.  The
pages available in the pool could easily be reclaimed.

I actually think we are better off providing a generic page pool
interface the drivers can use, instead of the situation where drivers
and subsystems invent their own, which do not cooperate in OOM
situations.

For the networking fast forwarding use-case (NOT localhost delivery),
the page pool size would actually be limited to a fairly small
fixed size.  Packets will be hard dropped if exceeding this limit.
The idea is that you want to limit the maximum latency the system can
introduce when forwarding a packet, even in high overload situations.
There is a good argument for this in section 3.2 of Google's paper[2].
They limit the pool size to 3000 and calculate that this can introduce
at most 300 micro-sec of latency.
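
A minimal sketch of that bound, with hypothetical helper names.  The
100ns per-packet service time used in the note below is only what the two
quoted numbers imply (300 micro-sec / 3000 packets); it is my inference,
not a figure taken from the paper.

```c
/*
 * Worst-case queueing delay in ns when pool_size packets are already
 * queued and each takes ns_per_packet to be serviced.
 */
static unsigned long max_latency_ns(unsigned long pool_size,
				    unsigned long ns_per_packet)
{
	return pool_size * ns_per_packet;
}

/*
 * Admission check for a fixed-size pool: returns 1 if another packet
 * may take a page, 0 if it must be hard dropped.
 */
static int pool_admit(unsigned long in_use, unsigned long limit)
{
	return in_use < limit;
}
```

max_latency_ns(3000, 100) is 300000 ns, i.e. the 300 micro-sec bound; once
in_use reaches the limit, pool_admit() refuses further packets rather than
letting the queue (and the latency) grow.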

> If it's the DMA API that dominates then something may be required but it
> should rely on the existing page allocator to alloc/free from. It would
> also need something like drain_all_pages to force free everything in there
> in low memory situations. Remember that multiple instances private to
> drivers or tasks will require shrinker implementations and the complexity
> may get unwieldy.

I'll read up on the shrinker interface.

[1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench

[2] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Eric Dumazet @ 2016-04-11 12:13 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem, Ding Tianhong
In-Reply-To: <570B9126.9080806@huawei.com>

On Mon, 2016-04-11 at 19:57 +0800, Yang Yingliang wrote:
> 
> On 2016/4/8 22:44, Eric Dumazet wrote:
> > On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
> >
> >> I expanded tcp_adv_win_scale and tcp_rmem. It had no effect.
> >
> > Try :
> >
> > echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
> >
> > And restart your flows.
> >
> cat /proc/sys/net/ipv4/tcp_rmem
> 10240 2097152 10485760

What about leaving the default values?

$ cat /proc/sys/net/ipv4/tcp_rmem
4096	87380	6291456

> 
> echo 102400 20971520 104857600 > /proc/sys/net/ipv4/tcp_rmem
> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
> 
> It seems to have no effect.
> 

I have no idea what you did on the sender side to allow it to send more
than 1.5 MB then.
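
For reference, a small userspace sketch of how tcp_adv_win_scale turns
receive buffer space into window space.  It is modeled on my reading of
the kernel's tcp_win_from_space(); the function name and the explicit
scale parameter are made up for illustration, so treat it as an
approximation rather than the kernel code itself.

```c
/*
 * Userspace sketch of how tcp_adv_win_scale splits receive buffer space
 * between advertised window and bookkeeping overhead.
 *
 * adv_win_scale <= 0: window = space / 2^(-adv_win_scale)
 * adv_win_scale >  0: window = space - space / 2^adv_win_scale
 *
 * win_from_space() is a hypothetical name; the kernel reads the scale
 * from the sysctl instead of taking it as an argument.
 */
int win_from_space(int space, int adv_win_scale)
{
	if (adv_win_scale <= 0)
		return space >> (-adv_win_scale);
	return space - (space >> adv_win_scale);
}
```

With the default tcp_rmem maximum of 6291456 bytes and a scale of -2,
this yields 1572864 bytes, i.e. 1.5 MiB.  With the enlarged maximum of
104857600 bytes it yields 26214400 bytes.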

^ permalink raw reply

* Re: [PATCH net-next 09/11] fjes: Enhance changing MTU related work
From: Jiri Pirko @ 2016-04-11 12:03 UTC (permalink / raw)
  To: Taku Izumi; +Cc: davem, netdev
In-Reply-To: <1460362246-15380-1-git-send-email-izumi.taku@jp.fujitsu.com>

Mon, Apr 11, 2016 at 10:10:46AM CEST, izumi.taku@jp.fujitsu.com wrote:
>This patch enhances the fjes_change_mtu() method
>by introducing new flag named FJES_RX_MTU_CHANGING_DONE
>in rx_status. At the same time, default MTU value is
>changed into 65510 bytes.
>
>Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>

<snip>	
	
>@@ -793,19 +798,54 @@ static int fjes_change_mtu(struct net_device *netdev, int new_mtu)
> 			if (new_mtu == netdev->mtu)
> 				return 0;
> 
>-			if (running)
>-				fjes_close(netdev);
>+			ret = 0;
>+			break;
>+		}
>+	}
>+
>+	if (ret)
>+		return ret;
> 
>-			netdev->mtu = new_mtu;
>+	if (running) {
>+		for (epidx = 0; epidx < hw->max_epid; epidx++) {
>+			if (epidx == hw->my_epid)
>+				continue;
>+			hw->ep_shm_info[epidx].tx.info->v1i.rx_status &=
>+				~FJES_RX_MTU_CHANGING_DONE;
>+		}
>+		netif_tx_stop_all_queues(netdev);
>+		netif_carrier_off(netdev);
>+		cancel_work_sync(&adapter->tx_stall_task);
>+		napi_disable(&adapter->napi);
> 
>-			if (running)
>-				ret = fjes_open(netdev);
>+		msleep(1000);

Will it be enough? I would rather sleep 2000ms here, just to be sure :)
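
One way to read this remark is that a fixed sleep does not guarantee
anything.  A bounded poll on the condition being waited for is a common
alternative.  The sketch below is plain userspace C and only
illustrates the pattern; the status word, the mask and all function
names are hypothetical, and in the driver msleep() would take the place
of the nanosleep() helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Userspace stand-in for msleep(). */
static void sleep_ms(unsigned int ms)
{
	struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/*
 * Wait until all bits in mask are set in *status, checking every
 * step_ms milliseconds and giving up after timeout_ms milliseconds.
 * Returns true if the bits were seen, false on timeout.
 */
bool wait_for_bits(const volatile unsigned int *status, unsigned int mask,
		   unsigned int step_ms, unsigned int timeout_ms)
{
	unsigned int waited = 0;

	if (step_ms == 0)
		step_ms = 1;	/* avoid spinning forever without progress */

	while ((*status & mask) != mask) {
		if (waited >= timeout_ms)
			return false;
		sleep_ms(step_ms);
		waited += step_ms;
	}
	return true;
}
```

The caller then gets an explicit success or timeout result instead of
hoping that one second was long enough.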

^ permalink raw reply

* Re: [PATCH net-next 04/11] fjes: Add debugfs entry for statistics
From: Jiri Pirko @ 2016-04-11 12:01 UTC (permalink / raw)
  To: Taku Izumi; +Cc: davem, netdev
In-Reply-To: <1460362216-15165-1-git-send-email-izumi.taku@jp.fujitsu.com>

Mon, Apr 11, 2016 at 10:10:16AM CEST, izumi.taku@jp.fujitsu.com wrote:
>This patch introduces debugfs entry named "statistics"
>for statistics information.
>You can get net_stats information by reading
>/sys/kernel/debug/fjes/fjes.N/statistics file.
>This is useful for debugging.
>
>Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
>---
> drivers/net/fjes/fjes_debugfs.c | 72 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 72 insertions(+)
>
>diff --git a/drivers/net/fjes/fjes_debugfs.c b/drivers/net/fjes/fjes_debugfs.c
>index d2fd892..b0807c2 100644
>--- a/drivers/net/fjes/fjes_debugfs.c
>+++ b/drivers/net/fjes/fjes_debugfs.c
>@@ -31,6 +31,72 @@
> 
> static struct dentry *fjes_debug_root;
> 
>+static int fjes_dbg_stats_show(struct seq_file *m, void *v)
>+{
>+	struct fjes_adapter *adapter = m->private;
>+	struct fjes_hw *hw = &adapter->hw;
>+	int max_epid = hw->max_epid;
>+	int my_epid = hw->my_epid;
>+	int epidx;
>+
>+	seq_printf(m, "%41s", " ");
>+	for (epidx = 0; epidx < max_epid; epidx++)
>+		seq_printf(m, "%10s%d",
>+			   my_epid == epidx ? "(self)EP#" : "EP#", epidx);
>+	seq_printf(m, "\n");
>+
>+#define FJES_DEBUGFS_NET_STATS_ENTRY(X) do {				\
>+	seq_printf(m, "%-41s", #X);					\
>+	for (epidx = 0; epidx < max_epid; epidx++) {			\
>+		if (epidx == my_epid)					\
>+			seq_printf(m, "          -");			\
>+		else							\
>+			seq_printf(m, " %10llu",			\
>+				   hw->ep_shm_info[epidx].net_stats.X); \
>+	}								\
>+	seq_printf(m, "\n");						\
>+} while (0)
>+
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_packets);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(tx_packets);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_bytes);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(tx_bytes);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(tx_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_dropped);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(tx_dropped);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(multicast);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(collisions);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_length_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_over_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_crc_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_frame_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_fifo_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_missed_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(tx_aborted_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(tx_carrier_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(tx_fifo_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(tx_heartbeat_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(tx_window_errors);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(rx_compressed);
>+	FJES_DEBUGFS_NET_STATS_ENTRY(tx_compressed);

This patch is certainly wrong. You should use the existing, well-defined
stats API to expose this, not a custom debugfs blob.

^ permalink raw reply

* Re: [PATCH net-next 01/11] fjes: Add trace-gathering facility
From: Jiri Pirko @ 2016-04-11 11:59 UTC (permalink / raw)
  To: Taku Izumi; +Cc: davem, netdev
In-Reply-To: <1460362197-15033-1-git-send-email-izumi.taku@jp.fujitsu.com>

Mon, Apr 11, 2016 at 10:09:57AM CEST, izumi.taku@jp.fujitsu.com wrote:
>This patch introduces trace-gathering facility via ioctl.
>This data is useful for debugging this module and firmware.
>
>Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>

<snip>


>@@ -61,6 +62,7 @@ fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *);
> static int fjes_change_mtu(struct net_device *, int);
> static int fjes_vlan_rx_add_vid(struct net_device *, __be16 proto, u16);
> static int fjes_vlan_rx_kill_vid(struct net_device *, __be16 proto, u16);
>+static int fjes_ioctl(struct net_device *, struct ifreq *, int cmd);
> static void fjes_tx_retry(struct net_device *);
> 
> static int fjes_acpi_add(struct acpi_device *);
>@@ -242,6 +244,7 @@ static const struct net_device_ops fjes_netdev_ops = {
> 	.ndo_tx_timeout		= fjes_tx_retry,
> 	.ndo_vlan_rx_add_vid	= fjes_vlan_rx_add_vid,
> 	.ndo_vlan_rx_kill_vid = fjes_vlan_rx_kill_vid,
>+	.ndo_do_ioctl		= fjes_ioctl,
> };

<snip>


>+
>+static int fjes_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
>+{
>+	switch (cmd) {
>+	case FJES_IOCTL_TRACE_START:
>+		return fjes_ioctl_trace_start(netdev, rq);
>+	case FJES_IOCTL_TRACE_STOP:
>+		return fjes_ioctl_trace_stop(netdev, rq);
>+	case FJES_IOCTL_TRACE_GETCFG:
>+		return fjes_ioctl_trace_getcfg(netdev, rq);
>+	default:
>+		return -EOPNOTSUPP;
>+	}
>+}

This ioctl tracing patch looks scary to me. I don't think you should
add any new functionality via ioctl today.

^ permalink raw reply

* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Yang Yingliang @ 2016-04-11 11:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Ding Tianhong
In-Reply-To: <1460126665.6473.437.camel@edumazet-glaptop3.roam.corp.google.com>



On 2016/4/8 22:44, Eric Dumazet wrote:
> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
>
>> I expanded tcp_adv_win_scale and tcp_rmem. It had no effect.
>
> Try :
>
> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>
> And restart your flows.
>
cat /proc/sys/net/ipv4/tcp_rmem
10240 2097152 10485760

echo 102400 20971520 104857600 > /proc/sys/net/ipv4/tcp_rmem
echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale

It seems to have no effect.

^ permalink raw reply

* Re: [PATCH net-next 02/11] fjes: Add setting/getting register value feature via ioctl
From: Jiri Pirko @ 2016-04-11 11:56 UTC (permalink / raw)
  To: Taku Izumi; +Cc: davem, netdev
In-Reply-To: <1460362204-15078-1-git-send-email-izumi.taku@jp.fujitsu.com>

Mon, Apr 11, 2016 at 10:10:04AM CEST, izumi.taku@jp.fujitsu.com wrote:
>This patch introduces setting/getting register value feature
>via ioctl. This feature is useful for debugging.
>
>Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
>---
> drivers/net/fjes/fjes_ioctl.h |  7 +++++++
> drivers/net/fjes/fjes_main.c  | 39 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 46 insertions(+)
>
>diff --git a/drivers/net/fjes/fjes_ioctl.h b/drivers/net/fjes/fjes_ioctl.h
>index 35adfda..61619f7 100644
>--- a/drivers/net/fjes/fjes_ioctl.h
>+++ b/drivers/net/fjes/fjes_ioctl.h
>@@ -25,6 +25,8 @@
> #define FJES_IOCTL_TRACE_START		(SIOCDEVPRIVATE + 1)
> #define FJES_IOCTL_TRACE_STOP		(SIOCDEVPRIVATE + 2)
> #define FJES_IOCTL_TRACE_GETCFG		(SIOCDEVPRIVATE + 3)
>+#define FJES_IOCTL_DEV_GETREG		(SIOCDEVPRIVATE + 4)
>+#define FJES_IOCTL_DEV_SETREG		(SIOCDEVPRIVATE + 5)


This patch certainly looks wrong to me. Exposing read, and especially
write, access to registers via ioctl? I don't think so...




> 
> struct fjes_ioctl_trace_start_req_val {
> 	u32	mode;
>@@ -70,6 +72,11 @@ struct fjes_ioctl_trace_param {
> 	union fjes_ioctl_trace_res_val	res;
> };
> 
>+struct fjes_ioctl_dev_reg_param {
>+	u32	offset;
>+	u32	val;
>+};
>+
> #define FJES_IOCTL_TRACE_START_ERR_NORMAL		(0x0000)
> #define FJES_IOCTL_TRACE_START_ERR_BUSY			(0x0001)
> #define FJES_IOCTL_TRACE_START_ERR_PARAM		(0x0100)
>diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
>index bc6e31d..40cf65d 100644
>--- a/drivers/net/fjes/fjes_main.c
>+++ b/drivers/net/fjes/fjes_main.c
>@@ -977,6 +977,40 @@ static int fjes_ioctl_trace_getcfg(struct net_device *netdev, struct ifreq *rq)
> 	return 0;
> }
> 
>+static int fjes_ioctl_reg_read(struct net_device *netdev, struct ifreq *rq)
>+{
>+	struct fjes_adapter *adapter = netdev_priv(netdev);
>+	struct fjes_ioctl_dev_reg_param reg;
>+	struct fjes_hw *hw = &adapter->hw;
>+
>+	if (copy_from_user(&reg, rq->ifr_data, sizeof(reg)))
>+		return -EFAULT;
>+
>+	reg.val = rd32(reg.offset);
>+
>+	if (copy_to_user(rq->ifr_data, &reg, sizeof(reg)))
>+		return -EFAULT;
>+
>+	return 0;
>+}
>+
>+static int fjes_ioctl_reg_write(struct net_device *netdev, struct ifreq *rq)
>+{
>+	struct fjes_adapter *adapter = netdev_priv(netdev);
>+	struct fjes_ioctl_dev_reg_param reg;
>+	struct fjes_hw *hw = &adapter->hw;
>+
>+	if (copy_from_user(&reg, rq->ifr_data, sizeof(reg)))
>+		return -EFAULT;
>+
>+	wr32(reg.offset, reg.val);
>+
>+	if (copy_to_user(rq->ifr_data, &reg, sizeof(reg)))
>+		return -EFAULT;
>+
>+	return 0;
>+}
>+
> static int fjes_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
> {
> 	switch (cmd) {
>@@ -986,7 +1020,12 @@ static int fjes_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
> 		return fjes_ioctl_trace_stop(netdev, rq);
> 	case FJES_IOCTL_TRACE_GETCFG:
> 		return fjes_ioctl_trace_getcfg(netdev, rq);
>+	case FJES_IOCTL_DEV_GETREG:
>+		return fjes_ioctl_reg_read(netdev, rq);
>+	case FJES_IOCTL_DEV_SETREG:
>+		return fjes_ioctl_reg_write(netdev, rq);
> 	default:
>+
> 		return -EOPNOTSUPP;
> 	}
> }
>-- 
>2.4.3
>

^ permalink raw reply

* Re: AF_VSOCK status
From: Antoine Martin @ 2016-04-11 11:53 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev
In-Reply-To: <20160406092609.GA17538@stefanha-x1.localdomain>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

(snip)
> The patches are on the list (latest version sent last week): 
> http://comments.gmane.org/gmane.linux.kernel.virtualization/27455
> 
> They are only "Request For Comments" because the VIRTIO 
> specification changes have not been approved yet.  Once the spec
> is approved then the patches can be seriously considered for
> merging.
> 
> There will definitely be a v6 with Claudio Imbrenda's locking 
> fixes.
If that's any help, feel free to CC me and we'll test it.
(not sure how long I will stay subscribed to this high traffic list)

>> We now have a vsock transport merged into xpra, which works very 
>> well with the kernel and qemu versions found here: 
>> http://qemu-project.org/Features/VirtioVsock Congratulations on 
>> making this easy to use! Is the upcoming revised interface
>> likely to cause incompatibilities with existing binaries?
> 
> Userspace applications should not notice a difference.
Great.

>> It seems impossible for the host to connect to a guest: the
>> guest has to initiate the connection. Is this a feature / known 
>> limitation or am I missing something? For some of our use cases, 
>> it would be more practical to connect in the other direction.
> 
> host->guest connections have always been allowed.  I just checked 
> that it works with the latest code in my repo:
> 
> guest# nc-vsock -l 1234
> host# nc-vsock 3 1234
Sorry about that, it does work fine, I must have tested it wrong.
With our latest code:
* host connecting to a guest session:
guest# xpra start --bind-vsock=auto:1234 --start-child=xterm
host# xpra attach vsock:$THECID:1234
* guest out to the host (no need for knowing the CID):
host# xpra start --bind-vsock=auto:1234 --start-child=xterm
guest# xpra attach vsock:host:1234
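
For anyone wanting to do the same from C, here is a minimal sketch of
the client side using the AF_VSOCK socket family.  The helper names are
made up for illustration.  I am assuming that "vsock:host" corresponds
to the well-known host CID (VMADDR_CID_HOST), and that 3 is the guest
CID as in the nc-vsock example quoted above.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Fill in a vsock address for the given context ID (CID) and port. */
struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm sa;

	memset(&sa, 0, sizeof(sa));
	sa.svm_family = AF_VSOCK;
	sa.svm_cid = cid;
	sa.svm_port = port;
	return sa;
}

/* Connect a stream socket to cid:port; returns the fd, or -1 on error. */
int vsock_connect(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm sa = vsock_addr(cid, port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

From the guest, vsock_connect(VMADDR_CID_HOST, 1234) would reach a
listener on the host without knowing any CID in advance; from the host,
vsock_connect(3, 1234) would reach the guest whose CID is 3.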

>> In terms of raw performance, I am getting about 10Gbps on an 
>> Intel Skylake i7 (the data stream arrives from the OS socket
>> recv syscall split into 256KB chunks), that's good but not much
>> faster than virtio-net and since the packets are avoiding all
>> sorts of OS layer overheads I was hoping to get a little bit
>> closer to the ~200Gbps memory bandwidth that this CPU+RAM are
>> capable of. Am I dreaming or just doing it wrong?
> 
> virtio-vsock is not yet optimized but the priority is not to make 
> something faster than virtio-net.  virtio-vsock is not for 
> applications who are trying to squeeze out every last drop of 
> performance.  Instead the goal is to have a transport for 
> guest<->hypervisor services that need to be zero-configuration.
Understood. It does that and this is a big win for us already, it's
also faster than virtio-net it seems, so this was not a complaint.

>> How hard would it be to introduce a virtio mmap-like transport
>> of some sort so that the guest and host could share some memory 
>> region? I assume this would give us the best possible
>> performance when transferring large amounts of data? (we already
>> have a local mmap transport we could adapt)
> 
> Shared memory is beyond the scope of virtio-vsock and it's
> unlikely to be added.
I wasn't thinking of adding this to virtio-vsock, this would be a
separate backend.

> There are a few existing ways to achieve that without involving 
> virtio-vsock: vhost-user or ivshmem.
Yes, I've looked at those and they seem a bit overkill for what we
want to achieve. We don't want sharing with multiple guests, or
interrupts.
All we want is a chunk of host memory to be accessible from the guest.

Thanks
Antoine
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlcLkBsACgkQGK2zHPGK1ruJ0wCfbNkc5L0ewUBuI7DgTyuwGRBz
aZoAn2pEbrVAkLoCMOunCYQ1FoaDIETr
=qrz1
-----END PGP SIGNATURE-----

^ permalink raw reply

* RE: AP firmware for TI wl1251 wifi chip (wl1251-fw-ap.bin)
From: Machani, Yaniv @ 2016-04-11 11:49 UTC (permalink / raw)
  To: Pali Rohár, Pavel Machek
  Cc: Chalmers, Kevin, Kalle Valo, Davis, Andrew, Mishol, Guy,
	Arik Nemtsov, Gery Kahn, Felipe Balbi, David Woodhouse,
	Aaro Koskinen, Ben Hutchings, David Gnedt, Ivaylo Dimitrov,
	Sebastian Reichel, Tony Lindgren, Menon, Nishanth,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20160411114118.GT8413@pali>

On Mon, Apr 11, 2016 at 14:41:18, Pali Rohár wrote:
> On Sunday 10 April 2016 13:51:41 Pavel Machek wrote:
> > Is it "hardware can't do AP", "firmware can't do AP" or "current 
> > drivers do not support AP"?
> 

Neither the firmware nor the hardware is at a stage where it can be modified, so AP mode is not supported by either.

Regards,
Yaniv Machani



^ permalink raw reply
