Netdev List
* Re: [Lsf] [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Jesper Dangaard Brouer @ 2016-04-11 16:19 UTC (permalink / raw)
  To: Mel Gorman
  Cc: James Bottomley, netdev@vger.kernel.org, Brenden Blanco, lsf,
	linux-mm, Mel Gorman, Tom Herbert, lsf-pc, Alexei Starovoitov,
	brouer
In-Reply-To: <20160411130826.GB32073@techsingularity.net>


On Mon, 11 Apr 2016 14:08:27 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
> On Mon, Apr 11, 2016 at 02:26:39PM +0200, Jesper Dangaard Brouer wrote:
[...]
> > 
> > It is always great if you can optimize the page allocator.  IMHO the
> > page allocator is too slow.  
> 
> It's why I spent some time on it as any improvement in the allocator is
> an unconditional win without requiring driver modifications.
> 
> > At least for my performance needs (67ns
> > per packet, approx 201 cycles at 3GHz).  I've measured[1]
> > alloc_pages(order=0) + __free_pages() to cost 277 cycles(tsc).
> >   
> 
> It'd be worth retrying this with the branch
> 
> http://git.kernel.org/cgit/linux/kernel/git/mel/linux.git/log/?h=mm-vmscan-node-lru-v4r5
> 

The cost decreased to 228 cycles(tsc), but there is some variation;
sometimes it increases to 238 cycles(tsc).

Nice, but that is still a long way from my performance target, where I
can spend 201 cycles on the entire forwarding path....
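
For reference, the budget arithmetic can be written out as below.  This
is only a sketch: it assumes the 67ns figure comes from 10Gbit/s
wirespeed with minimum-size frames (64 bytes plus 20 bytes of preamble
and inter-frame gap), and the function names are made up for
illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Time on the wire for one frame, in nanoseconds.
 * Assumption: 20 bytes of per-frame overhead (preamble + inter-frame gap). */
static double ns_per_packet(double link_gbps, unsigned int frame_bytes)
{
	assert(link_gbps > 0.0);
	/* bits divided by Gbit/s gives nanoseconds directly */
	return (frame_bytes + 20) * 8 / link_gbps;
}

/* Cycles available per packet at a given clock rate. */
static double cycles_per_packet(double ns, double cpu_ghz)
{
	return ns * cpu_ghz;
}

/* Print the budget for 10Gbit/s, 64-byte frames and a 3GHz CPU. */
static void print_budget(void)
{
	double ns = ns_per_packet(10.0, 64);

	printf("%.1f ns per packet, %.1f cycles at 3GHz\n",
	       ns, cycles_per_packet(ns, 3.0));
}
```

Rounding the per-packet time down to 67ns gives the 201 cycles quoted
above, so a 228-cycle alloc+free pair alone already exceeds the budget.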


> This is an unreleased series that contains both the page allocator
> optimisations and the one-LRU-per-node series which in combination remove a
> lot of code from the page allocator fast paths. I have no data on how the
> combined series behaves but each series individually is known to improve
> page allocator performance.
>
> Once you have that, do a hackjob to remove the debugging checks from both the
> alloc and free path and see what that leaves. They could be bypassed properly
> with a __GFP_NOACCT flag used only by drivers that absolutely require pages
> as quickly as possible and willing to be less safe to get that performance.

I would be interested in testing/benchmarking a patch where you remove
the debugging checks...

You are also welcome to try out my benchmarking modules yourself:
 https://github.com/netoptimizer/prototype-kernel/blob/master/getting_started.rst

This is really simple stuff (for rapid prototyping); I'm just doing:
 modprobe page_bench01; rmmod page_bench01 ; dmesg | tail -n40

[...]
> 
> Be aware that compound order allocs like this are a double edged sword as
> it'll be fast sometimes and other times require reclaim/compaction which
> can stall for prolonged periods of time.

Yes, I've noticed that there can be fairly high variation when doing
compound order allocs, which is not so nice!  I really don't like these
variations....

Drivers also do tricks where they fall back to smaller order pages, e.g.
see the function mlx4_alloc_pages().  I've tried to simulate that
function here:
https://github.com/netoptimizer/prototype-kernel/blob/91d323fc53/kernel/mm/bench/page_bench01.c#L69

It does not seem very optimal. I tried to put the system under a bit of
memory pressure to cause alloc_pages() to fail, and then the results were
very bad, something like 2500 cycles, and it usually got pages of the
next smaller order.
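
The fallback pattern itself can be modelled in userspace roughly as
below.  This is a sketch, not the mlx4 code: the allocator hook and the
"pressured" allocator are hypothetical stand-ins, there only to show
that every failed attempt is paid for before a smaller order succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator hook: returns non-NULL on success. */
typedef void *(*alloc_fn)(unsigned int order);

/* Try the preferred order first, then step down one order at a time.
 * Returns the pointer and stores the order actually obtained in *got. */
static void *alloc_with_fallback(alloc_fn alloc, unsigned int preferred,
				 unsigned int min_order, unsigned int *got)
{
	assert(alloc && got && min_order <= preferred);
	for (unsigned int order = preferred; ; order--) {
		void *p = alloc(order);

		if (p) {
			*got = order;
			return p;
		}
		if (order == min_order)
			return NULL;	/* nothing left to try */
	}
}

/* Stand-in for an allocator under memory pressure: only order <= 2 works. */
static void *pressured_alloc(unsigned int order)
{
	static char backing[1];

	return order <= 2 ? backing : NULL;
}
```

Calling alloc_with_fallback(pressured_alloc, 3, 0, &got) makes one
failing attempt at order 3 and then succeeds at order 2.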


> > I've measured order 3 (32KB) alloc_pages(order=3) + __free_pages() to
> > cost approx 500 cycles(tsc).  That was more expensive, BUT an order=3
> > page 32Kb correspond to 8 pages (32768/4096), thus 500/8 = 62.5
> > cycles.  Usually a network RX-frame only need to be 2048 bytes, thus
> > the "bulk" effect speed up is x16 (32768/2048), thus 31.25 cycles.

The order=3 cost was reduced to 417 cycles(tsc), nice!  But I've also
seen it jump to 611 cycles.
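
The amortized numbers follow directly from the division in the quoted
text.  A small sketch (helper name and PAGE_BYTES are illustrative, and
a 4096-byte page is assumed):

```c
#include <assert.h>

#define PAGE_BYTES 4096u

/* Cost per buffer when one order-N allocation is carved into
 * buf_bytes-sized chunks. */
static double cycles_per_buf(double alloc_cycles, unsigned int order,
			     unsigned int buf_bytes)
{
	unsigned int total = PAGE_BYTES << order;

	assert(buf_bytes > 0 && buf_bytes <= total);
	return alloc_cycles / (total / buf_bytes);
}
```

With the new measurement, cycles_per_buf(417, 3, 2048) is 417/16, a bit
over 26 cycles per 2048-byte RX frame; the 611-cycle outlier is a bit
over 38 cycles per frame.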


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH net-next 00/11] FUJITSU Extended Socket driver version 1.1
From: David Miller @ 2016-04-11 15:56 UTC (permalink / raw)
  To: izumi.taku; +Cc: netdev
In-Reply-To: <1460362136-14968-1-git-send-email-izumi.taku@jp.fujitsu.com>


This submission is of an extremely low quality.

All of your ioctl additions are completely inappropriate, as are your
debugfs facilities.  You must remove all of them completely.


* Re: [PATCH net v2] net: sched: do not requeue a NULL skb
From: Eric Dumazet @ 2016-04-11 15:52 UTC (permalink / raw)
  To: Lars Persson; +Cc: Lars Persson, netdev, jhs, linux-kernel, xiyou.wangcong
In-Reply-To: <570BC024.1070504@axis.com>

On Mon, 2016-04-11 at 17:17 +0200, Lars Persson wrote:
> 
> On 04/11/2016 04:22 PM, Eric Dumazet wrote:
> > On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote:
> >
> >> I thought it would be prudent because the queue can be non-empty even for
> >> the case of skb=NULL. So should it be there in this patch, another patch
> >> or not at all ?
> >
> > Then maybe change return code ?
> >
> > It seems strange that a validate_xmit_skb_list() failure stops the
> > __qdisc_run() loop but schedules another round.
> >
> >
> 
> It was suggested by Cong Wang to return 0 in order to stop the loop. Do 
> you guys agree that the loop should be stopped for such failures ? Then 
> I will put the schedule call inside the if as you proposed earlier.

What are the causes of validate_xmit_skb_list() failures ?

If gso segmentations fail because of memory pressure, better free more
skbs right now.

In any case, having a single test " if (skb)  " sounds better to me,
to have a fast path.

So your first patch was probably a better idea.

v2 has two tests instead of one.


* [PATCH iproute2 v2 3/3] bridge: vlan: add support to filter by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 15:45 UTC (permalink / raw)
  To: netdev; +Cc: roopa, stephen, Nikolay Aleksandrov
In-Reply-To: <1460389516-1643-1-git-send-email-nikolay@cumulusnetworks.com>

Add the optional keyword "vid" to bridge vlan show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to
match the add/del one - "vid". This filtering can also be used with the
"-compressvlans" option to see which range a vlan falls in (if any).
It will also be used later to show only specific per-vlan statistics,
once kernel support for them is added.

Examples:
$ bridge vlan show vid 450
port	vlan ids
eth2	 450

$ bridge -c vlan show vid 450
port	vlan ids
eth2	 400-500

$ bridge vlan show vid 1
port	vlan ids
eth1	 1 PVID Egress Untagged
eth2	 1 PVID
br0	 1 PVID Egress Untagged

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v2: don't print ports which do not match the vlan filter.
The vcheck_ret == 1 case is handled implicitly (by falling through) to
avoid an extra nesting level, which would break a lot of lines and
produce an uglier result.
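
For readers who want to poke at the tristate logic outside of iproute2,
here is a userspace sketch of it.  The flag macros and struct are local
stand-ins (their values are illustrative, not the kernel's), and
find_range() is a hypothetical helper that models only the filtering
case of the print_vlan() loop.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the BRIDGE_VLAN_INFO_RANGE_* bits (illustrative). */
#define F_RANGE_BEGIN 0x1
#define F_RANGE_END   0x2

struct vlan_ent {
	uint16_t flags;
	uint16_t vid;
};

/* Same decision as filter_vlan_check(): -1 stop, 0 skip, 1 print. */
static int vlan_check(const struct vlan_ent *e, unsigned int filter)
{
	if (filter && e->vid > filter && !(e->flags & F_RANGE_END))
		return -1;
	if ((e->flags & F_RANGE_BEGIN) || e->vid < filter)
		return 0;
	return 1;
}

/* Walk one port's entries; on a match store the range that would be
 * printed and return 1, otherwise return 0. */
static int find_range(const struct vlan_ent *ents, int n, unsigned int filter,
		      uint16_t *start, uint16_t *end)
{
	uint16_t last_start = 0;

	assert(ents && start && end);
	for (int i = 0; i < n; i++) {
		int ret;

		/* remember where the current range (or single vid) began */
		if (!(ents[i].flags & F_RANGE_END))
			last_start = ents[i].vid;
		ret = vlan_check(&ents[i], filter);
		if (ret == -1)
			break;
		if (ret == 0)
			continue;
		*start = last_start;
		*end = ents[i].vid;
		return 1;
	}
	return 0;
}
```

With entries { 1, 400 (range begin), 500 (range end) } and a filter of
450, the range start is skipped and the range end is the entry that
gets printed, as 400-500.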

 bridge/vlan.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 50 insertions(+), 10 deletions(-)

diff --git a/bridge/vlan.c b/bridge/vlan.c
index ae588323d9b1..717025ae6eec 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -13,13 +13,13 @@
 #include "br_common.h"
 #include "utils.h"
 
-static unsigned int filter_index;
+static unsigned int filter_index, filter_vlan;
 
 static void usage(void)
 {
 	fprintf(stderr, "Usage: bridge vlan { add | del } vid VLAN_ID dev DEV [ pvid] [ untagged ]\n");
 	fprintf(stderr, "                                                     [ self ] [ master ]\n");
-	fprintf(stderr, "       bridge vlan { show } [ dev DEV ]\n");
+	fprintf(stderr, "       bridge vlan { show } [ dev DEV ] [ vid VLAN_ID ]\n");
 	exit(-1);
 }
 
@@ -138,6 +138,26 @@ static int vlan_modify(int cmd, int argc, char **argv)
 	return 0;
 }
 
+/* In order to use this function for both filtering and non-filtering cases
+ * we need to make it a tristate:
+ * return -1 - if filtering we've gone over so don't continue
+ * return  0 - skip entry and continue (applies to range start or to entries
+ *             which are less than filter_vlan)
+ * return  1 - print the entry and continue
+ */
+static int filter_vlan_check(struct bridge_vlan_info *vinfo)
+{
+	/* if we're filtering we should stop on the first greater entry */
+	if (filter_vlan && vinfo->vid > filter_vlan &&
+	    !(vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END))
+		return -1;
+	if ((vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) ||
+	    vinfo->vid < filter_vlan)
+		return 0;
+
+	return 1;
+}
+
 static int print_vlan(const struct sockaddr_nl *who,
 		      struct nlmsghdr *n,
 		      void *arg)
@@ -169,26 +189,40 @@ static int print_vlan(const struct sockaddr_nl *who,
 
 	/* if AF_SPEC isn't there, vlan table is not preset for this port */
 	if (!tb[IFLA_AF_SPEC]) {
-		fprintf(fp, "%s\tNone\n", ll_index_to_name(ifm->ifi_index));
+		if (!filter_vlan)
+			fprintf(fp, "%s\tNone\n",
+				ll_index_to_name(ifm->ifi_index));
 		return 0;
 	} else {
 		struct rtattr *i, *list = tb[IFLA_AF_SPEC];
 		int rem = RTA_PAYLOAD(list);
+		__u16 last_vid_start = 0;
 
-		fprintf(fp, "%s", ll_index_to_name(ifm->ifi_index));
+		if (!filter_vlan)
+			fprintf(fp, "%s", ll_index_to_name(ifm->ifi_index));
 		for (i = RTA_DATA(list); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
 			struct bridge_vlan_info *vinfo;
+			int vcheck_ret;
 
 			if (i->rta_type != IFLA_BRIDGE_VLAN_INFO)
 				continue;
 
 			vinfo = RTA_DATA(i);
-			if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END)
-				fprintf(fp, "-%hu", vinfo->vid);
-			else
-				fprintf(fp, "\t %hu", vinfo->vid);
-			if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN)
+
+			if (!(vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END))
+				last_vid_start = vinfo->vid;
+			vcheck_ret = filter_vlan_check(vinfo);
+			if (vcheck_ret == -1)
+				break;
+			else if (vcheck_ret == 0)
 				continue;
+
+			if (filter_vlan)
+				fprintf(fp, "%s",
+					ll_index_to_name(ifm->ifi_index));
+			fprintf(fp, "\t %hu", last_vid_start);
+			if (last_vid_start != vinfo->vid)
+				fprintf(fp, "-%hu", vinfo->vid);
 			if (vinfo->flags & BRIDGE_VLAN_INFO_PVID)
 				fprintf(fp, " PVID");
 			if (vinfo->flags & BRIDGE_VLAN_INFO_UNTAGGED)
@@ -196,7 +230,8 @@ static int print_vlan(const struct sockaddr_nl *who,
 			fprintf(fp, "\n");
 		}
 	}
-	fprintf(fp, "\n");
+	if (!filter_vlan)
+		fprintf(fp, "\n");
 	fflush(fp);
 	return 0;
 }
@@ -211,6 +246,11 @@ static int vlan_show(int argc, char **argv)
 			if (filter_dev)
 				duparg("dev", *argv);
 			filter_dev = *argv;
+		} else if (strcmp(*argv, "vid") == 0) {
+			NEXT_ARG();
+			if (filter_vlan)
+				duparg("vid", *argv);
+			filter_vlan = atoi(*argv);
 		}
 		argc--; argv++;
 	}
-- 
2.4.3


* [PATCH iproute2 v2 2/3] bridge: mdb: add support to filter by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 15:45 UTC (permalink / raw)
  To: netdev; +Cc: roopa, stephen, Nikolay Aleksandrov
In-Reply-To: <1460389516-1643-1-git-send-email-nikolay@cumulusnetworks.com>

Add the optional keyword "vid" to bridge mdb show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to match
the add/del one - "vid".

Example:
$ bridge mdb show vid 200
dev br0 port eth2 grp 239.0.0.1 permanent vid 200

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v2: no change

 bridge/mdb.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index 842536ec003c..6c904f8e6ae8 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -24,12 +24,12 @@
 	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct br_port_msg))))
 #endif
 
-static unsigned int filter_index;
+static unsigned int filter_index, filter_vlan;
 
 static void usage(void)
 {
 	fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [permanent | temp] [vid VID]\n");
-	fprintf(stderr, "       bridge mdb {show} [ dev DEV ]\n");
+	fprintf(stderr, "       bridge mdb {show} [ dev DEV ] [ vid VID ]\n");
 	exit(-1);
 }
 
@@ -92,6 +92,8 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e,
 	const void *src;
 	int af;
 
+	if (filter_vlan && e->vid != filter_vlan)
+		return;
 	af = e->addr.proto == htons(ETH_P_IP) ? AF_INET : AF_INET6;
 	src = af == AF_INET ? (const void *)&e->addr.u.ip4 :
 			      (const void *)&e->addr.u.ip6;
@@ -195,6 +197,11 @@ static int mdb_show(int argc, char **argv)
 			if (filter_dev)
 				duparg("dev", *argv);
 			filter_dev = *argv;
+		} else if (strcmp(*argv, "vid") == 0) {
+			NEXT_ARG();
+			if (filter_vlan)
+				duparg("vid", *argv);
+			filter_vlan = atoi(*argv);
 		}
 		argc--; argv++;
 	}
-- 
2.4.3


* [PATCH iproute2 v2 1/3] bridge: fdb: add support to filter by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 15:45 UTC (permalink / raw)
  To: netdev; +Cc: roopa, stephen, Nikolay Aleksandrov
In-Reply-To: <1460389516-1643-1-git-send-email-nikolay@cumulusnetworks.com>

Add the optional keyword "vlan" to bridge fdb show so the user can request
filtering by a specific vlan id. Currently the filtering is implemented
only in user-space. The argument name has been chosen to match the
add/del one - "vlan".

Example:
$ bridge fdb show vlan 400
52:54:00:bf:57:16 dev eth2 vlan 400 master br0 permanent

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v2: no change
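
A standalone model of the check this patch adds to print_fdb(), as a
sketch only (the helper name and its parameters are hypothetical).  It
shows one consequence of using 0 as "no filter": entries that carry no
vlan attribute are hidden whenever a filter is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* vid is 0 when the entry carries no NDA_VLAN attribute, and a filter
 * of 0 means "show everything". */
static bool fdb_entry_shown(unsigned int filter_vlan, bool has_vlan_attr,
			    uint16_t attr_vid)
{
	uint16_t vid = has_vlan_attr ? attr_vid : 0;

	/* Assumption of this sketch: VLAN IDs are 12-bit.  The patch
	 * itself does not range-check the argument. */
	assert(filter_vlan < 4096);
	return !(filter_vlan && filter_vlan != vid);
}
```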

 bridge/fdb.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index df55e86df83f..be849f980a80 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -27,7 +27,7 @@
 #include "rt_names.h"
 #include "utils.h"
 
-static unsigned int filter_index;
+static unsigned int filter_index, filter_vlan;
 
 static void usage(void)
 {
@@ -35,7 +35,7 @@ static void usage(void)
 			"              [ self ] [ master ] [ use ] [ router ]\n"
 			"              [ local | static | dynamic ] [ dst IPADDR ] [ vlan VID ]\n"
 			"              [ port PORT] [ vni VNI ] [ via DEV ]\n");
-	fprintf(stderr, "       bridge fdb [ show [ br BRDEV ] [ brport DEV ] ]\n");
+	fprintf(stderr, "       bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan VID ] ]\n");
 	exit(-1);
 }
 
@@ -65,6 +65,7 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	struct ndmsg *r = NLMSG_DATA(n);
 	int len = n->nlmsg_len;
 	struct rtattr *tb[NDA_MAX+1];
+	__u16 vid = 0;
 
 	if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH) {
 		fprintf(stderr, "Not RTM_NEWNEIGH: %08x %08x %08x\n",
@@ -88,6 +89,12 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	parse_rtattr(tb, NDA_MAX, NDA_RTA(r),
 		     n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
 
+	if (tb[NDA_VLAN])
+		vid = rta_getattr_u16(tb[NDA_VLAN]);
+
+	if (filter_vlan && filter_vlan != vid)
+		return 0;
+
 	if (n->nlmsg_type == RTM_DELNEIGH)
 		fprintf(fp, "Deleted ");
 
@@ -115,11 +122,8 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 				    RTA_DATA(tb[NDA_DST])));
 	}
 
-	if (tb[NDA_VLAN]) {
-		__u16 vid = rta_getattr_u16(tb[NDA_VLAN]);
-
+	if (vid)
 		fprintf(fp, "vlan %hu ", vid);
-	}
 
 	if (tb[NDA_PORT])
 		fprintf(fp, "port %d ", ntohs(rta_getattr_u16(tb[NDA_PORT])));
@@ -190,6 +194,11 @@ static int fdb_show(int argc, char **argv)
 		} else if (strcmp(*argv, "br") == 0) {
 			NEXT_ARG();
 			br = *argv;
+		} else if (strcmp(*argv, "vlan") == 0) {
+			NEXT_ARG();
+			if (filter_vlan)
+				duparg("vlan", *argv);
+			filter_vlan = atoi(*argv);
 		} else {
 			if (matches(*argv, "help") == 0)
 				usage();
-- 
2.4.3


* [PATCH iproute2 v2 0/3] bridge: filtering by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 15:45 UTC (permalink / raw)
  To: netdev; +Cc: roopa, stephen, Nikolay Aleksandrov
In-Reply-To: <570BC2FD.8010201@cumulusnetworks.com>

Hi,
This set adds support for filtering by a vlan id when showing fdb/mdb/vlan
entries. Currently the filtering is implemented entirely in user-space, but
the plan is to add kernel support as well. The vlan show part is also needed
for the future per-vlan statistics in order to be able to show them only for
a specific vlan. I plan to update the bridge man page soon as it's missing
other options too and it seemed inconsistent to add this given that there're
potential paragraphs missing, thus I'll post a separate patch for that.
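
One possible follow-up, sketched here and not part of the posted
patches: the vid/vlan argument is parsed with atoi(), so a non-numeric
argument becomes 0 and quietly disables the filter.  A stricter parser
could look like the hypothetical helper below (iproute2's own numeric
helpers would probably be the better fit in a real patch).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a VLAN ID in the range 1..4094.
 * Returns 0 on success, -1 on bad input.  Hypothetical helper. */
static int parse_vid(const char *arg, unsigned int *vid)
{
	char *end;
	unsigned long v;

	assert(arg && vid);
	if (*arg < '0' || *arg > '9')
		return -1;	/* rejects empty strings, signs, leading spaces */
	errno = 0;
	v = strtoul(arg, &end, 10);
	if (errno || *end != '\0' || v < 1 || v > 4094)
		return -1;
	*vid = (unsigned int)v;
	return 0;
}
```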

v2: in patch 03 print only the ports having the vlan instead of empty
"vlan ids" column

Thank you,
 Nik


Nikolay Aleksandrov (3):
  bridge: fdb: add support to filter by vlan id
  bridge: mdb: add support to filter by vlan id
  bridge: vlan: add support to filter by vlan id

 bridge/fdb.c  | 21 +++++++++++++++------
 bridge/mdb.c  | 11 +++++++++--
 bridge/vlan.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++----------
 3 files changed, 74 insertions(+), 18 deletions(-)

-- 
2.4.3


* Re: [PATCH iproute2 0/3] bridge: filtering by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 15:30 UTC (permalink / raw)
  To: netdev; +Cc: stephen, roopa
In-Reply-To: <1460380710-29583-1-git-send-email-nikolay@cumulusnetworks.com>

On 04/11/2016 03:18 PM, Nikolay Aleksandrov wrote:
> Hi,
> This set adds support for filtering by a vlan id when showing fdb/mdb/vlan
> entries. Currently the filtering is implemented entirely in user-space, but
> the plan is to add kernel support as well. The vlan show part is also needed
> for the future per-vlan statistics in order to be able to show them only for
> a specific vlan. I plan to update the bridge man page soon as it's missing
> other options too and it seemed inconsistent to add this given that there're
> potential paragraphs missing, thus I'll post a separate patch for that.
> 
> Thank you,
>  Nik
> 

Self-NAK, after discussing with colleagues, we think it'd be better not to print
the non-matching ports at all (right now they're printed with empty "vlan ids"
column). I'll post a v2 with updated patch 03.

Cheers,
 Nik


* [PATCH] mwifiex: fix possible NULL dereference
From: Sudip Mukherjee @ 2016-04-11 15:27 UTC (permalink / raw)
  To: Amitkumar Karwar, Nishant Sarmukadam, Kalle Valo
  Cc: linux-kernel, linux-wireless, netdev, Sudip Mukherjee

From: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>

We check card for NULL only after dereferencing it, so if it is NULL
we have already dereferenced it before the check. Let's dereference it
only after checking card for NULL.

Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
---
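
The pattern in isolation, as a userspace sketch (the struct and field
names are simplified stand-ins, not the driver's real definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the driver structures. */
struct dev { int id; };
struct card { struct dev *dev; };
struct adapter { struct card *card; };

/* Returns the device id, or -1 when no card is attached. */
static int card_dev_id(const struct adapter *adapter)
{
	const struct card *card;
	const struct dev *pdev;

	assert(adapter);
	card = adapter->card;
	if (!card)
		return -1;	/* nothing has been dereferenced yet */
	pdev = card->dev;	/* safe: card was checked above */
	return pdev ? pdev->id : -1;
}
```

Reading card->dev in the initializer, as the old code did, performs the
dereference before the NULL test can take effect.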
 drivers/net/wireless/marvell/mwifiex/pcie.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index edf8b07..84562d0 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -2884,10 +2884,11 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter)
 {
 	struct pcie_service_card *card = adapter->card;
 	const struct mwifiex_pcie_card_reg *reg;
-	struct pci_dev *pdev = card->dev;
+	struct pci_dev *pdev;
 	int i;
 
 	if (card) {
+		pdev = card->dev;
 		if (card->msix_enable) {
 			for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++)
 				synchronize_irq(card->msix_entries[i].vector);
-- 
1.9.1


* Re: [PATCH 1/9] net: mediatek: update the IRQ part of the binding document
From: Rob Herring @ 2016-04-11 15:24 UTC (permalink / raw)
  To: John Crispin
  Cc: David S. Miller, Felix Fietkau, Matthias Brugger,
	Sean Wang (王志亘), netdev, linux-mediatek,
	linux-kernel, devicetree
In-Reply-To: <1460051876-53135-1-git-send-email-blogic@openwrt.org>

On Thu, Apr 07, 2016 at 07:57:48PM +0200, John Crispin wrote:
> The current binding document only describes a single interrupt. Update the
> document by adding the 2 other interrupts.
> 
> The driver currently only uses a single interrupt. The HW is however able
> to use IRQ grouping to split TX and RX onto separate GIC irqs.

I assume you aren't breaking existing DTs, and the driver will continue 
to work with a single irq specified?

> 
> Signed-off-by: John Crispin <blogic@openwrt.org>
> Cc: devicetree@vger.kernel.org
> ---
>  Documentation/devicetree/bindings/net/mediatek-net.txt |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt
> index 5ca7929..2f142be 100644
> --- a/Documentation/devicetree/bindings/net/mediatek-net.txt
> +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
> @@ -9,7 +9,7 @@ have dual GMAC each represented by a child node..
>  Required properties:
>  - compatible: Should be "mediatek,mt7623-eth"
>  - reg: Address and length of the register set for the device
> -- interrupts: Should contain the frame engines interrupt
> +- interrupts: Should contain the three frame engines interrupts

Need to define what each irq is and the order.

>  - clocks: the clock used by the core
>  - clock-names: the names of the clock listed in the clocks property. These are
>  	"ethif", "esw", "gp2", "gp1"
> @@ -42,7 +42,9 @@ eth: ethernet@1b100000 {
>  		 <&ethsys CLK_ETHSYS_GP2>,
>  		 <&ethsys CLK_ETHSYS_GP1>;
>  	clock-names = "ethif", "esw", "gp2", "gp1";
> -	interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW>;
> +	interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW
> +		      GIC_SPI 199 IRQ_TYPE_LEVEL_LOW
> +		      GIC_SPI 198 IRQ_TYPE_LEVEL_LOW>;
>  	power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>;
>  	resets = <&ethsys MT2701_ETHSYS_ETH_RST>;
>  	reset-names = "eth";
> -- 
> 1.7.10.4


* Re: [PATCH net v2] net: sched: do not requeue a NULL skb
From: Lars Persson @ 2016-04-11 15:17 UTC (permalink / raw)
  To: Eric Dumazet, Lars Persson; +Cc: netdev, jhs, linux-kernel, xiyou.wangcong
In-Reply-To: <1460384551.6473.551.camel@edumazet-glaptop3.roam.corp.google.com>



On 04/11/2016 04:22 PM, Eric Dumazet wrote:
> On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote:
>
>> I thought it would be prudent because the queue can be non-empty even for
>> the case of skb=NULL. So should it be there in this patch, another patch
>> or not at all ?
>
> Then maybe change return code ?
>
> It seems strange that a validate_xmit_skb_list() failure stops the
> __qdisc_run() loop but schedules another round.
>
>

It was suggested by Cong Wang to return 0 in order to stop the loop. Do 
you guys agree that the loop should be stopped for such failures ? Then 
I will put the schedule call inside the if as you proposed earlier.

- Lars


* [PATCH net-next] vxlan: fix incorrect type
From: Jiri Benc @ 2016-04-11 15:06 UTC (permalink / raw)
  To: netdev; +Cc: Dan Carpenter

The protocol is 16bit, not 32bit.

Fixes: e1e5314de08ba ("vxlan: implement GPE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
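
To illustrate why 16 bits is the right width: the value being carried
is an EtherType.  The sketch below is a userspace approximation of the
mapping step, with constants and function name local to the example
(assumed from the VXLAN-GPE next-protocol numbering, not copied from
the driver).

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Illustrative next-protocol values; names are local to this sketch. */
#define NP_IPV4     0x01
#define NP_IPV6     0x02
#define NP_ETHERNET 0x03

/* Map a GPE next-protocol value to a 16-bit EtherType in network byte
 * order.  Returns 0 on success, -1 for an unknown value. */
static int gpe_np_to_proto(uint8_t np, uint16_t *protocol)
{
	assert(protocol);
	switch (np) {
	case NP_IPV4:
		*protocol = htons(0x0800);	/* ETH_P_IP */
		return 0;
	case NP_IPV6:
		*protocol = htons(0x86DD);	/* ETH_P_IPV6 */
		return 0;
	case NP_ETHERNET:
		*protocol = htons(0x6558);	/* ETH_P_TEB */
		return 0;
	}
	return -1;
}
```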
 drivers/net/vxlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9f3634064c92..7f697a3f00a4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1181,7 +1181,7 @@ out:
 }
 
 static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
-				__be32 *protocol,
+				__be16 *protocol,
 				struct sk_buff *skb, u32 vxflags)
 {
 	struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)unparsed;
@@ -1284,7 +1284,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
 	struct vxlanhdr unparsed;
 	struct vxlan_metadata _md;
 	struct vxlan_metadata *md = &_md;
-	__be32 protocol = htons(ETH_P_TEB);
+	__be16 protocol = htons(ETH_P_TEB);
 	bool raw_proto = false;
 	void *oiph;
 
-- 
1.8.3.1


* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Yang Yingliang @ 2016-04-11 14:42 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, dingtianhong
In-Reply-To: <1460135072.6473.441.camel@edumazet-glaptop3.roam.corp.google.com>



On 2016/4/9 1:04, Eric Dumazet wrote:
> On Fri, 2016-04-08 at 12:53 -0400, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Fri, 08 Apr 2016 07:44:25 -0700
>>
>>> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
>>>
>>>> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
>>>
>>> Try :
>>>
>>> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
>>>
>>> And restart your flows.
>>
>> I'm honestly beginning to suspect a bug in their driver and how they
>> handle skb->truesize.
>>
>> Yang, until you show us the driver you are using and how it handles
>> receive packets, we are largely in the dark about a major component
>> of this issue and that is entirely unfair to us.
>
> Apparently their skb->truesize and skb->len combinations are correct.
>
> I suspect an issue with rcvbuf autotuning on a bidirectional tcp traffic.
> We mostly focus on unidirectional flows, but they seem to use a mixed
> case.
>
> Also, fact that sendmsg() locks the socket for the duration of the call
> is problematic : I suspect their issues would mostly disappear by using
> smaller chunk sizes (ie 64KB per sendmsg() instead of 256KB).
Fewer packets are dropped when using 64KB chunks.
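
For anyone wanting to reproduce the smaller-chunk test, the sending
side can be sketched as below.  This is a generic userspace helper
written for illustration (name and signature are hypothetical), not the
application code used in the test.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send len bytes in pieces of at most chunk bytes, so that each send()
 * call covers less data.  Returns the number of bytes sent, or -1. */
static ssize_t send_chunked(int fd, const char *buf, size_t len, size_t chunk)
{
	size_t done = 0;

	assert(buf && chunk > 0);
	while (done < len) {
		size_t want = len - done < chunk ? len - done : chunk;
		ssize_t n = send(fd, buf + done, want, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		done += (size_t)n;	/* short sends are simply continued */
	}
	return (ssize_t)done;
}
```

The comparison above would then be send_chunked(fd, buf, len, 64 * 1024)
against the same call with 256 * 1024.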

>
> We also could add resched points in sendmsg() (processing backlog if it
> gets too hot), but I fear this would slow down the fast path.
>
>
>
>
>


* Re: [PATCH net v2] net: sched: do not requeue a NULL skb
From: Eric Dumazet @ 2016-04-11 14:22 UTC (permalink / raw)
  To: Lars Persson; +Cc: Lars Persson, netdev, jhs, linux-kernel, xiyou.wangcong
In-Reply-To: <570BA8C7.1000905@axis.com>

On Mon, 2016-04-11 at 15:38 +0200, Lars Persson wrote:

> I thought it would be prudent because the queue can be non-empty even for
> the case of skb=NULL. So should it be there in this patch, another patch 
> or not at all ?

Then maybe change return code ?

It seems strange that a validate_xmit_skb_list() failure stops the
__qdisc_run() loop but schedules another round.


* Re: [PATCH net v2] net: sched: do not requeue a NULL skb
From: Lars Persson @ 2016-04-11 13:38 UTC (permalink / raw)
  To: Eric Dumazet, Lars Persson; +Cc: netdev, jhs, linux-kernel, xiyou.wangcong
In-Reply-To: <1460380981.6473.544.camel@edumazet-glaptop3.roam.corp.google.com>



On 04/11/2016 03:23 PM, Eric Dumazet wrote:
> On Mon, 2016-04-11 at 08:24 +0200, Lars Persson wrote:
>> A failure in validate_xmit_skb_list() triggered an unconditional call
>> to dev_requeue_skb with skb=NULL. This slowly grows the queue
>> discipline's qlen count until all traffic through the queue stops.
>>
>> By introducing a NULL check in dev_requeue_skb it was also necessary
>> to make the __netif_schedule call conditional to avoid scheduling an
>> empty queue.
>>
>> Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
>> Signed-off-by: Lars Persson <larper@axis.com>
>> ---
>>   net/sched/sch_generic.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index f18c350..4e6a79c 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -47,10 +47,13 @@ EXPORT_SYMBOL(default_qdisc_ops);
>>
>>   static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>>   {
>> -	q->gso_skb = skb;
>> -	q->qstats.requeues++;
>> -	q->q.qlen++;	/* it's still part of the queue */
>> -	__netif_schedule(q);
>> +	if (skb) {
>> +		q->gso_skb = skb;
>> +		q->qstats.requeues++;
>> +		q->q.qlen++;	/* it's still part of the queue */
>> +	}
>> +	if (qdisc_qlen(q))
>> +		__netif_schedule(q);
>>
>>   	return 0;
>>   }
>
>
> Please always CC patch author when fixing a bug.
>
> Why add the extra if (qdisc_qlen(q)) test ?
>
> This seems unrelated to the bug fix, and probably should be part of a
> second patch targeting net-next tree.

I thought it would be prudent because the queue can be non-empty even for
the case of skb=NULL. So should it be there in this patch, another patch
or not at all ?

>
> Also please add a likely() clause
>
> if (likely(skb)) {
>          q->gso_skb = skb;
>          q->qstats.requeues++;
>          q->q.qlen++;    /* it's still part of the queue */
>          __netif_schedule(q);
> }

Will fix.

> Thanks !
>
>
>
>
>


* Re: [PATCH net v2] net: sched: do not requeue a NULL skb
From: Eric Dumazet @ 2016-04-11 13:23 UTC (permalink / raw)
  To: Lars Persson; +Cc: netdev, jhs, linux-kernel, xiyou.wangcong, Lars Persson
In-Reply-To: <1460355869-13539-1-git-send-email-larper@axis.com>

On Mon, 2016-04-11 at 08:24 +0200, Lars Persson wrote:
> A failure in validate_xmit_skb_list() triggered an unconditional call
> to dev_requeue_skb with skb=NULL. This slowly grows the queue
> discipline's qlen count until all traffic through the queue stops.
> 
> By introducing a NULL check in dev_requeue_skb it was also necessary
> to make the __netif_schedule call conditional to avoid scheduling an
> empty queue.
> 
> Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
> Signed-off-by: Lars Persson <larper@axis.com>
> ---
>  net/sched/sch_generic.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index f18c350..4e6a79c 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -47,10 +47,13 @@ EXPORT_SYMBOL(default_qdisc_ops);
>  
>  static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>  {
> -	q->gso_skb = skb;
> -	q->qstats.requeues++;
> -	q->q.qlen++;	/* it's still part of the queue */
> -	__netif_schedule(q);
> +	if (skb) {
> +		q->gso_skb = skb;
> +		q->qstats.requeues++;
> +		q->q.qlen++;	/* it's still part of the queue */
> +	}
> +	if (qdisc_qlen(q))
> +		__netif_schedule(q);
>  
>  	return 0;
>  }


Please always CC patch author when fixing a bug.

Why add the extra if (qdisc_qlen(q)) test ?

This seems unrelated to the bug fix, and probably should be part of a
second patch targeting net-next tree.

Also please add a likely() clause

if (likely(skb)) {
        q->gso_skb = skb;
        q->qstats.requeues++;
        q->q.qlen++;    /* it's still part of the queue */
        __netif_schedule(q);
}

Thanks !
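
To make the qlen drift described in the changelog concrete, here is a
tiny userspace model.  The struct is a stand-in for the handful of
qdisc fields involved, and the scheduled counter merely stands in for
calls to __netif_schedule(); none of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the qdisc fields touched by dev_requeue_skb(). */
struct model_qdisc {
	void *gso_skb;
	unsigned int requeues;
	unsigned int qlen;
	unsigned int scheduled;	/* counts would-be __netif_schedule() calls */
};

/* Behaviour before the fix: everything is unconditional. */
static void requeue_unconditional(void *skb, struct model_qdisc *q)
{
	assert(q);
	q->gso_skb = skb;
	q->requeues++;
	q->qlen++;		/* grows even when skb is NULL */
	q->scheduled++;
}

/* Variant with a single test guarding the whole body. */
static void requeue_checked(void *skb, struct model_qdisc *q)
{
	assert(q);
	if (skb) {
		q->gso_skb = skb;
		q->requeues++;
		q->qlen++;
		q->scheduled++;
	}
}
```

Feeding NULL three times into requeue_unconditional() leaves qlen at 3
with nothing actually queued, while requeue_checked() leaves it at 0.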


* [PATCH iproute2 3/3] bridge: vlan: add support to filter by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 13:18 UTC (permalink / raw)
  To: netdev; +Cc: stephen, roopa, Nikolay Aleksandrov
In-Reply-To: <1460380710-29583-1-git-send-email-nikolay@cumulusnetworks.com>

Add the optional keyword "vid" to bridge vlan show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to
match the add/del one - "vid". This filtering can also be used with the
"-compressvlans" option to see which range a vlan falls in (if any).
It will also be used later to show only specific per-vlan statistics,
once kernel support for them is added.

Examples:
$ bridge vlan show vid 450
port	vlan ids
eth1
eth2	 450

br0

$ bridge -c vlan show vid 450
port	vlan ids
eth1
eth2	 400-500

br0

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 bridge/vlan.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 51 insertions(+), 12 deletions(-)

diff --git a/bridge/vlan.c b/bridge/vlan.c
index ae588323d9b1..8e125c15f84c 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -13,13 +13,13 @@
 #include "br_common.h"
 #include "utils.h"
 
-static unsigned int filter_index;
+static unsigned int filter_index, filter_vlan;
 
 static void usage(void)
 {
 	fprintf(stderr, "Usage: bridge vlan { add | del } vid VLAN_ID dev DEV [ pvid] [ untagged ]\n");
 	fprintf(stderr, "                                                     [ self ] [ master ]\n");
-	fprintf(stderr, "       bridge vlan { show } [ dev DEV ]\n");
+	fprintf(stderr, "       bridge vlan { show } [ dev DEV ] [ vid VLAN_ID ]\n");
 	exit(-1);
 }
 
@@ -138,6 +138,38 @@ static int vlan_modify(int cmd, int argc, char **argv)
 	return 0;
 }
 
+static void print_vid_range(FILE *f, __u16 v_start, __u16 v_end, __u16 flags)
+{
+	fprintf(f, "\t %hu", v_start);
+	if (v_start != v_end)
+		fprintf(f, "-%hu", v_end);
+	if (flags & BRIDGE_VLAN_INFO_PVID)
+		fprintf(f, " PVID");
+	if (flags & BRIDGE_VLAN_INFO_UNTAGGED)
+		fprintf(f, " Egress Untagged");
+	fprintf(f, "\n");
+}
+
+/* In order to use this function for both filtering and non-filtering cases
+ * we need to make it a tristate:
+ * return -1 - if filtering we've gone over so don't continue
+ * return  0 - skip entry and continue (applies to range start or to entries
+ *             which are less than filter_vlan)
+ * return  1 - print the entry and continue
+ */
+static int filter_vlan_check(struct bridge_vlan_info *vinfo)
+{
+	/* if we're filtering we should stop on the first greater entry */
+	if (filter_vlan && vinfo->vid > filter_vlan &&
+	    !(vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END))
+		return -1;
+	if ((vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) ||
+	    vinfo->vid < filter_vlan)
+		return 0;
+
+	return 1;
+}
+
 static int print_vlan(const struct sockaddr_nl *who,
 		      struct nlmsghdr *n,
 		      void *arg)
@@ -174,26 +206,28 @@ static int print_vlan(const struct sockaddr_nl *who,
 	} else {
 		struct rtattr *i, *list = tb[IFLA_AF_SPEC];
 		int rem = RTA_PAYLOAD(list);
+		__u16 last_vid_start = 0;
 
 		fprintf(fp, "%s", ll_index_to_name(ifm->ifi_index));
 		for (i = RTA_DATA(list); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
 			struct bridge_vlan_info *vinfo;
+			int vcheck_ret;
 
 			if (i->rta_type != IFLA_BRIDGE_VLAN_INFO)
 				continue;
 
 			vinfo = RTA_DATA(i);
-			if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END)
-				fprintf(fp, "-%hu", vinfo->vid);
-			else
-				fprintf(fp, "\t %hu", vinfo->vid);
-			if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN)
+
+			if (!(vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END))
+				last_vid_start = vinfo->vid;
+			vcheck_ret = filter_vlan_check(vinfo);
+			if (!vcheck_ret)
 				continue;
-			if (vinfo->flags & BRIDGE_VLAN_INFO_PVID)
-				fprintf(fp, " PVID");
-			if (vinfo->flags & BRIDGE_VLAN_INFO_UNTAGGED)
-				fprintf(fp, " Egress Untagged");
-			fprintf(fp, "\n");
+			else if (vcheck_ret == 1)
+				print_vid_range(fp, last_vid_start, vinfo->vid,
+						vinfo->flags);
+			else
+				break;
 		}
 	}
 	fprintf(fp, "\n");
@@ -211,6 +245,11 @@ static int vlan_show(int argc, char **argv)
 			if (filter_dev)
 				duparg("dev", *argv);
 			filter_dev = *argv;
+		} else if (strcmp(*argv, "vid") == 0) {
+			NEXT_ARG();
+			if (filter_vlan)
+				duparg("vid", *argv);
+			filter_vlan = atoi(*argv);
 		}
 		argc--; argv++;
 	}
-- 
2.4.3
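
To make the tristate easier to follow outside the diff, here is a
self-contained sketch of the same check together with the loop that tracks
the range start. The toy_* names and the two flag values are local to this
sketch and are assumptions; toy_find_range() is a hypothetical helper that
returns the first match instead of printing it.

```c
#include <stddef.h>

/* Flag bits local to this sketch (assumed values, hypothetical names). */
#define TOY_RANGE_BEGIN (1 << 3)
#define TOY_RANGE_END   (1 << 4)

struct toy_vlan_info {
	unsigned short flags;
	unsigned short vid;
};

/* -1: past the filter, stop; 0: skip entry; 1: entry matches. */
static int toy_filter_check(const struct toy_vlan_info *v, unsigned int filter)
{
	/* when filtering, stop on the first greater entry that is not a range end */
	if (filter && v->vid > filter && !(v->flags & TOY_RANGE_END))
		return -1;
	/* range starts and entries below the filter are never reported */
	if ((v->flags & TOY_RANGE_BEGIN) || v->vid < filter)
		return 0;
	return 1;
}

/* Store the first matching [start, end] range; return 1 if one was found. */
static int toy_find_range(const struct toy_vlan_info *e, size_t n,
			  unsigned int filter,
			  unsigned short *start, unsigned short *end)
{
	unsigned short last_start = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;

		/* remember where the current range (or single vlan) began */
		if (!(e[i].flags & TOY_RANGE_END))
			last_start = e[i].vid;
		ret = toy_filter_check(&e[i], filter);
		if (ret < 0)
			break;
		if (ret == 0)
			continue;
		*start = last_start;
		*end = e[i].vid;
		return 1;
	}
	return 0;
}
```

With entries 100, 400 (range begin), 500 (range end) and 600, a filter of 450
yields the range 400-500, and a filter of 300 stops at the 400 entry without
a match.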


* [PATCH iproute2 2/3] bridge: mdb: add support to filter by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 13:18 UTC (permalink / raw)
  To: netdev; +Cc: stephen, roopa, Nikolay Aleksandrov
In-Reply-To: <1460380710-29583-1-git-send-email-nikolay@cumulusnetworks.com>

Add the optional keyword "vid" to bridge mdb show so the user can
request filtering by a specific vlan id. Currently the filtering is
implemented only in user-space. The argument name has been chosen to match
the add/del one - "vid".

Example:
$ bridge mdb show vid 200
dev br0 port eth2 grp 239.0.0.1 permanent vid 200

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 bridge/mdb.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index 842536ec003c..6c904f8e6ae8 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -24,12 +24,12 @@
 	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct br_port_msg))))
 #endif
 
-static unsigned int filter_index;
+static unsigned int filter_index, filter_vlan;
 
 static void usage(void)
 {
 	fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [permanent | temp] [vid VID]\n");
-	fprintf(stderr, "       bridge mdb {show} [ dev DEV ]\n");
+	fprintf(stderr, "       bridge mdb {show} [ dev DEV ] [ vid VID ]\n");
 	exit(-1);
 }
 
@@ -92,6 +92,8 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e,
 	const void *src;
 	int af;
 
+	if (filter_vlan && e->vid != filter_vlan)
+		return;
 	af = e->addr.proto == htons(ETH_P_IP) ? AF_INET : AF_INET6;
 	src = af == AF_INET ? (const void *)&e->addr.u.ip4 :
 			      (const void *)&e->addr.u.ip6;
@@ -195,6 +197,11 @@ static int mdb_show(int argc, char **argv)
 			if (filter_dev)
 				duparg("dev", *argv);
 			filter_dev = *argv;
+		} else if (strcmp(*argv, "vid") == 0) {
+			NEXT_ARG();
+			if (filter_vlan)
+				duparg("vid", *argv);
+			filter_vlan = atoi(*argv);
 		}
 		argc--; argv++;
 	}
-- 
2.4.3
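
A note on argument parsing: atoi() returns 0 when no digits can be converted,
so "vid foo" would leave filter_vlan at 0 and the filter would be silently
disabled. If stricter parsing were wanted, it could look roughly like the
sketch below. toy_parse_vid() is a hypothetical helper, not an existing
iproute2 function, and the 1..4094 bound assumes the usual 802.1Q usable
range.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a decimal VLAN ID in the range 1..4094.
 * Returns 0 and stores the value on success, -1 on any malformed input.
 */
static int toy_parse_vid(const char *arg, unsigned int *vid)
{
	char *end;
	unsigned long val;

	if (!arg || !*arg)
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 10);
	/* reject overflow, trailing garbage and out-of-range IDs */
	if (errno || *end != '\0' || val < 1 || val > 4094)
		return -1;

	*vid = (unsigned int)val;
	return 0;
}
```

A caller could then report a bad "vid" argument instead of showing unfiltered
output.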


* [PATCH iproute2 1/3] bridge: fdb: add support to filter by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 13:18 UTC (permalink / raw)
  To: netdev; +Cc: stephen, roopa, Nikolay Aleksandrov
In-Reply-To: <1460380710-29583-1-git-send-email-nikolay@cumulusnetworks.com>

Add the optional keyword "vlan" to bridge fdb show so the user can request
filtering by a specific vlan id. Currently the filtering is implemented
only in user-space. The argument name has been chosen to match the
add/del one - "vlan".

Example:
$ bridge fdb show vlan 400
52:54:00:bf:57:16 dev eth2 vlan 400 master br0 permanent

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 bridge/fdb.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index df55e86df83f..be849f980a80 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -27,7 +27,7 @@
 #include "rt_names.h"
 #include "utils.h"
 
-static unsigned int filter_index;
+static unsigned int filter_index, filter_vlan;
 
 static void usage(void)
 {
@@ -35,7 +35,7 @@ static void usage(void)
 			"              [ self ] [ master ] [ use ] [ router ]\n"
 			"              [ local | static | dynamic ] [ dst IPADDR ] [ vlan VID ]\n"
 			"              [ port PORT] [ vni VNI ] [ via DEV ]\n");
-	fprintf(stderr, "       bridge fdb [ show [ br BRDEV ] [ brport DEV ] ]\n");
+	fprintf(stderr, "       bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan VID ] ]\n");
 	exit(-1);
 }
 
@@ -65,6 +65,7 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	struct ndmsg *r = NLMSG_DATA(n);
 	int len = n->nlmsg_len;
 	struct rtattr *tb[NDA_MAX+1];
+	__u16 vid = 0;
 
 	if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH) {
 		fprintf(stderr, "Not RTM_NEWNEIGH: %08x %08x %08x\n",
@@ -88,6 +89,12 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	parse_rtattr(tb, NDA_MAX, NDA_RTA(r),
 		     n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
 
+	if (tb[NDA_VLAN])
+		vid = rta_getattr_u16(tb[NDA_VLAN]);
+
+	if (filter_vlan && filter_vlan != vid)
+		return 0;
+
 	if (n->nlmsg_type == RTM_DELNEIGH)
 		fprintf(fp, "Deleted ");
 
@@ -115,11 +122,8 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 				    RTA_DATA(tb[NDA_DST])));
 	}
 
-	if (tb[NDA_VLAN]) {
-		__u16 vid = rta_getattr_u16(tb[NDA_VLAN]);
-
+	if (vid)
 		fprintf(fp, "vlan %hu ", vid);
-	}
 
 	if (tb[NDA_PORT])
 		fprintf(fp, "port %d ", ntohs(rta_getattr_u16(tb[NDA_PORT])));
@@ -190,6 +194,11 @@ static int fdb_show(int argc, char **argv)
 		} else if (strcmp(*argv, "br") == 0) {
 			NEXT_ARG();
 			br = *argv;
+		} else if (strcmp(*argv, "vlan") == 0) {
+			NEXT_ARG();
+			if (filter_vlan)
+				duparg("vlan", *argv);
+			filter_vlan = atoi(*argv);
 		} else {
 			if (matches(*argv, "help") == 0)
 				usage();
-- 
2.4.3


* [PATCH iproute2 0/3] bridge: filtering by vlan id
From: Nikolay Aleksandrov @ 2016-04-11 13:18 UTC (permalink / raw)
  To: netdev; +Cc: stephen, roopa, Nikolay Aleksandrov

Hi,
This set adds support for filtering by a vlan id when showing fdb/mdb/vlan
entries. Currently the filtering is implemented entirely in user-space, but
the plan is to add kernel support as well. The vlan show part is also needed
for the future per-vlan statistics in order to be able to show them only for
a specific vlan. I plan to update the bridge man page soon as it's missing
other options too and it seemed inconsistent to add this given that there're
potential paragraphs missing, thus I'll post a separate patch for that.

Thank you,
 Nik

Nikolay Aleksandrov (3):
  bridge: fdb: add support to filter by vlan id
  bridge: mdb: add support to filter by vlan id
  bridge: vlan: add support to filter by vlan id

 bridge/fdb.c  | 21 ++++++++++++++------
 bridge/mdb.c  | 11 +++++++++--
 bridge/vlan.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++------------
 3 files changed, 75 insertions(+), 20 deletions(-)

-- 
2.4.3


* Re: [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Mel Gorman @ 2016-04-11 13:08 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Mel Gorman, lsf, linux-mm, netdev@vger.kernel.org, Brenden Blanco,
	James Bottomley, Tom Herbert, lsf-pc, Alexei Starovoitov
In-Reply-To: <20160411142639.1c5e520b@redhat.com>

On Mon, Apr 11, 2016 at 02:26:39PM +0200, Jesper Dangaard Brouer wrote:
> > Which bottleneck dominates -- the page allocator or the DMA API when
> > setting up coherent pages?
> >
> 
> It is actually both, but mostly DMA on non-x86 archs.  The need to
> support multiple archs, then also cause a slowdown on x86, due to a
> side-effect.
> 
> On arch's like PowerPC, the DMA API is the bottleneck.  To workaround
> the cost of DMA calls, NIC driver alloc large order (compound) pages.
> (dma_map compound page, handout page-fragments for RX ring, and later
> dma_unmap when last RX page-fragments is seen).
> 

So, IMO, holding onto DMA-mapped pages is all that is justified, not a
recycling layer for order-0 pages built on top of the core allocator. For DMA
pages, it would take a bit of legwork, but the per-cpu allocator could be split
and converted to hold arbitrarily sized pages, with a constructor/destructor
to do the DMA coherency step when pages are taken from or handed back to
the core allocator. I'm not volunteering to do that, unfortunately, but I
estimate it'd be a few days' work unless it needs to be per-CPU and NUMA
aware, in which case the memory footprint will be high.
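
As a rough userspace illustration of the constructor/destructor idea (not
kernel code): the constructor runs only when an object is taken from the
backing allocator and the destructor only when it is handed back, so
recycled objects skip both. Every toy_* name here is hypothetical, malloc()
stands in for the core allocator, and the two counters stand in for the DMA
map/unmap step.

```c
#include <stdlib.h>

#define TOY_POOL_MAX 8

struct toy_pool {
	void *slots[TOY_POOL_MAX];
	size_t count;
	size_t obj_size;
	void (*ctor)(void *obj);	/* run when taken from the backing allocator */
	void (*dtor)(void *obj);	/* run when handed back to it */
};

/* Counters standing in for the expensive DMA map/unmap calls. */
static unsigned int toy_maps, toy_unmaps;
static void toy_map(void *obj)   { (void)obj; toy_maps++; }
static void toy_unmap(void *obj) { (void)obj; toy_unmaps++; }

static void *toy_pool_get(struct toy_pool *p)
{
	void *obj;

	if (p->count)
		return p->slots[--p->count];	/* recycled: already constructed */
	obj = malloc(p->obj_size);
	if (obj && p->ctor)
		p->ctor(obj);
	return obj;
}

static void toy_pool_put(struct toy_pool *p, void *obj)
{
	if (p->count < TOY_POOL_MAX) {
		p->slots[p->count++] = obj;	/* keep it constructed for reuse */
		return;
	}
	if (p->dtor)
		p->dtor(obj);
	free(obj);
}

/* Hand every cached object back, e.g. under memory pressure. */
static size_t toy_pool_drain(struct toy_pool *p)
{
	size_t n = 0;

	while (p->count) {
		void *obj = p->slots[--p->count];

		if (p->dtor)
			p->dtor(obj);
		free(obj);
		n++;
	}
	return n;
}
```

The sketch is single-threaded on purpose; the per-CPU and NUMA questions
raised above are exactly what it leaves out.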

> > I'm wary of another page allocator API being introduced if it's for
> > performance reasons. In response to this thread, I spent two days on
> > a series that boosts performance of the allocator in the fast paths by
> > 11-18% to illustrate that there was low-hanging fruit for optimising. If
> > the one-LRU-per-node series was applied on top, there would be a further
> > boost to performance on the allocation side. It could be further boosted
> > if debugging checks and statistic updates were conditionally disabled by
> > the caller.
> 
> It is always great if you can optimize the page allocator.  IMHO the
> page allocator is too slow.

It's why I spent some time on it as any improvement in the allocator is
an unconditional win without requiring driver modifications.

> At least for my performance needs (67ns
> per packet, approx 201 cycles at 3GHz).  I've measured[1]
> alloc_pages(order=0) + __free_pages() to cost 277 cycles(tsc).
> 

It'd be worth retrying this with the branch

http://git.kernel.org/cgit/linux/kernel/git/mel/linux.git/log/?h=mm-vmscan-node-lru-v4r5

This is an unreleased series that contains both the page allocator
optimisations and the one-LRU-per-node series which in combination remove a
lot of code from the page allocator fast paths. I have no data on how the
combined series behaves but each series individually is known to improve
page allocator performance.

Once you have that, do a hackjob to remove the debugging checks from both the
alloc and free path and see what that leaves. They could be bypassed properly
with a __GFP_NOACCT flag used only by drivers that absolutely require pages
as quickly as possible and willing to be less safe to get that performance.

I then expect the free path to be dominated by zone and pageblock
lookups, which are much harder to remove. The zone lookup can be removed
if the caller knows exactly where the free pages need to go, which is
unlikely. The pageblock lookup could be removed if the page was coming from
a dedicated pool and the allocation side refills using pageblocks that are
always MIGRATE_UNMOVABLE.

> The trick described above, of allocating a higher order page and
> handing out page-fragments, also workaround this page allocator
> bottleneck (on x86).
> 

Be aware that compound order allocs like this are a double-edged sword:
they'll be fast sometimes and at other times require reclaim/compaction,
which can stall for prolonged periods of time.

> I've measured order 3 (32KB) alloc_pages(order=3) + __free_pages() to
> cost approx 500 cycles(tsc).  That was more expensive, BUT an order=3
> page 32Kb correspond to 8 pages (32768/4096), thus 500/8 = 62.5
> cycles.  Usually a network RX-frame only need to be 2048 bytes, thus
> the "bulk" effect speed up is x16 (32768/2048), thus 31.25 cycles.
> 
> I view this as a bulking trick... maybe the page allocator can just
> give us a bulking API? ;-)
> 

It could on the alloc side relatively easily using either a variation of
rmqueue_bulk exposed at a higher level populating a linked list (link via
page->lru) or an array supplied by the caller.  It's harder to bulk free
quickly as the pages being freed are not necessarily in the same pageblock
requiring lookups in the free path.

Tricky to get right, but preferable to a whole new allocator.
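
For reference, the cycle arithmetic quoted in this thread can be checked with
a few lines of C. The helper names are hypothetical; the inputs (67ns per
packet at 3GHz, approx 500 cycles for an order-3 alloc+free) are the figures
quoted above.

```c
/* Cycles available per packet: nanoseconds per packet times GHz. */
static double toy_cycle_budget(double ns_per_packet, double ghz)
{
	return ns_per_packet * ghz;
}

/* Cost per unit when one allocation is split into equal-sized units. */
static double toy_amortised_cost(double cycles, unsigned int alloc_bytes,
				 unsigned int unit_bytes)
{
	return cycles / (alloc_bytes / unit_bytes);
}

/*
 * toy_cycle_budget(67.0, 3.0)                -> 201 cycles per packet
 * toy_amortised_cost(500.0, 32768, 4096)     -> 62.5 cycles per 4K page
 * toy_amortised_cost(500.0, 32768, 2048)     -> 31.25 cycles per 2K RX frame
 */
```

Set against the measured 277 cycles for a single order-0 alloc+free, this is
where the "bulking" gain in the quoted text comes from.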

> > The main reason another allocator concerns me is that those pages
> > are effectively pinned and cannot be reclaimed by the VM in low memory
> > situations. It ends up needing its own API for tuning the size and hoping
> > all the drivers get it right without causing OOM situations. It becomes
> > a slippery slope of introducing shrinkers, locking and complexity. Then
> > callers start getting concerned about NUMA locality and having to deal
> > with multiple lists to maintain performance. Ultimately, it ends up being
> > as slow as the page allocator and back to square 1 except now with more code.
> 
> The pages assigned to the RX ring queue are pinned like today.  The
> pages avail in the pool could easily be reclaimed.
> 

How easy depends on how it's structured. If it's a global per-cpu list,
then it's an IPI to all CPUs, which is straightforward to implement but
slow to execute. If it's per-driver, then there needs to be a locked list
of all pools and locking on each individual pool, which could offset some
of the performance benefit of using the pool in the first place.

> I actually think we are better off providing a generic page pool
> interface the drivers can use.  Instead of the situation where drivers
> and subsystems invent their own, which does not cooperate in OOM
> situations.
> 

If it's offsetting DMA setup/teardown then I'd be a bit happier. If it's
yet-another-page allocator to bypass the core allocator then I'm less happy.

-- 
Mel Gorman
SUSE Labs



* Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK
From: Michael S. Tsirkin @ 2016-04-11 12:54 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: marius vlad, Stefan Hajnoczi, kvm, netdev, Ian Campbell,
	Claudio Imbrenda, Matt Benjamin, Greg Kurz, virtualization,
	Christoffer Dall
In-Reply-To: <20160411104548.GA12826@stefanha-x1.localdomain>

On Mon, Apr 11, 2016 at 11:45:48AM +0100, Stefan Hajnoczi wrote:
> On Fri, Apr 08, 2016 at 04:35:05PM +0100, Ian Campbell wrote:
> > On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> > > This series is based on Michael Tsirkin's vhost branch (v4.5-rc6).
> > > 
> > > I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock but
> > > first I want to share the latest version of the code.  Several people are
> > > playing with vsock now so sharing the latest code should avoid duplicate work.
> > 
> > Thanks for this, I've been using it in my project and it mostly seems
> > fine.
> > 
> > One wrinkle I came across, which I'm not sure if it is by design or a
> > problem is that I can see this sequence coming from the guest (with
> > other activity in between):
> > 
> >     1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> >     2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> >     3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
> > 
> > I originally had my backend close things down at #2, however this meant
> > that when #3 arrived it was for a non-existent socket (or, worse, an
> > active one if the ports got reused). I checked v5 of the spec
> > proposal[0] which says:
> >     If these bits are set and there are no more virtqueue buffers
> >     pending the socket is disconnected.
> > 
> > but I'm not entirely sure if this behaviour contradicts this or not
> > (the bits have both been set at #2, but not at the same time).
> > 
> > BTW, how does one tell if there are no more virtqueue buffers pending
> > or not while processing the op?
> 
> #2 is odd.  The shutdown bits are sticky so they cannot be cleared once
> set.  I would have expected just #1 and #3.  The behavior you observe
> looks like a bug.
> 
> The spec text does not convey the meaning of OP_SHUTDOWN well.
> OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible
> for this connection.  "there are no more virtqueue buffers pending the
> socket" really means that this isn't an immediate close from the
> perspective of the application.  If the application still has unread rx
> buffers then the socket stays readable until the rx data has been fully
> read.

Yes but you also wrote:
	If these bits are set and there are no more virtqueue buffers
	pending the socket is disconnected.

How does the remote know that there are no buffers pending, and so that it's
safe to reuse the same source/destination address now?  Maybe the destination
should send RST at that point?
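
To pin down the "sticky bits" reading discussed above, here is a small sketch
under stated assumptions: the flag values, struct toy_conn and both helpers
are hypothetical, and "no buffers pending" is modelled as a simple counter.
It is one possible interpretation, not what the spec proposal mandates.

```c
/* Hypothetical flag values for this sketch only. */
#define TOY_SHUTDOWN_RX   0x1u
#define TOY_SHUTDOWN_TX   0x2u
#define TOY_SHUTDOWN_BOTH (TOY_SHUTDOWN_RX | TOY_SHUTDOWN_TX)

struct toy_conn {
	unsigned int shutdown;		/* accumulated shutdown bits */
	unsigned int rx_pending;	/* buffers the application has not read yet */
};

/* Both directions are shut down and nothing is left to read. */
static int toy_can_disconnect(const struct toy_conn *c)
{
	return c->shutdown == TOY_SHUTDOWN_BOTH && c->rx_pending == 0;
}

/*
 * Apply one OP_SHUTDOWN. Bits are sticky: they are OR'd in and never
 * cleared. Returns 1 once the connection may be torn down.
 */
static int toy_op_shutdown(struct toy_conn *c, unsigned int flags)
{
	c->shutdown |= flags & TOY_SHUTDOWN_BOTH;
	return toy_can_disconnect(c);
}
```

Under this reading, the sequence RX then TX already has both bits set after
the second packet, so a third packet with TX|RX changes nothing. It does not
answer how the remote side learns that the pending buffers have drained.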



> > Another thing I noticed, which is really more to do with the generic
> > AF_VSOCK bits than anything to do with your patches is that there is no
> > limitations on which vsock ports a non-privileged user can bind to and
> > relatedly that there is no netns support so e.g. users in unprivileged
> > containers can bind to any vsock port and talk to the host, which might
> > be undesirable. For my use for now I just went with the big hammer
> > approach of denying access from anything other than init_net
> > namespace[1] while I consider what the right answer is.
> 
> From the vhost point of view each netns should have its own AF_VSOCK
> namespace.  This way two containers could act as "the host" (CID 2) for
> their respective guests.


I wonder how this interacts with the disconnect on migration
idea that you discussed. Specifically, socket has to stay connected


* pull-request: wireless-drivers-next 2016-04-11
From: Kalle Valo @ 2016-04-11 12:48 UTC (permalink / raw)
  To: David Miller; +Cc: linux-wireless, netdev

Hi Dave,

here's a pull request for 4.7. More features, but nothing really
standing out. Please let me know if you have any problems.

Kalle


The following changes since commit 4da46cebbd3b4dc445195a9672c99c1353af5695:

  net/core/dev: Warn on a too-short GRO frame (2016-04-05 19:58:39 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git tags/wireless-drivers-next-for-davem-2016-04-11

for you to fetch changes up to 20ac1b325d8d526211b1276ecf9b64b7e8369f50:

  Merge ath-next from ath.git (2016-04-07 21:44:37 +0300)

----------------------------------------------------------------

wireless-drivers patches for 4.7

Major changes:

iwlwifi

* support for Link Quality measurement
* more work on 9000 devices and MSIx
* continuation of the Dynamic Queue Allocation work
* make the paging less memory hungry
* 9000 new Rx path
* removal of IWLWIFI_UAPSD Kconfig option

ath10k

* implement push-pull tx model using mac80211 software queuing support
* enable scan in AP mode (NL80211_FEATURE_AP_SCAN)

wil6210

* add basic PBSS (Personal Basic Service Set) support
* add initial P2P support
* add oob_mode module parameter

----------------------------------------------------------------
Amitkumar Karwar (2):
      mwifiex: fix Tx timeout issue during suspend test
      mwifiex: advertise low priority scan feature

Anilkumar Kolli (1):
      ath10k: fix debugfs pktlog_filter write

Aviya Erenfeld (2):
      iwlwifi: mvm: add LQM vendor command and notification
      iwlwifi: add a debugfs hook for LQM

Ayala Beker (1):
      iwlwifi: mvm: update GSCAN capabilities

Bob Copeland (3):
      ath5k: fix incorrect indentation
      ath9k: fix a misleading indentation
      ath9k_htc: fix up indents with spaces

Chaya Rachel Ivgi (2):
      iwlwifi: mvm: handle async temperature notification with unlocked mutex
      iwlwifi: mvm: remove uneeded D0I3 checking

Colin Ian King (4):
      iwlwifi: pcie: remove duplicate assignment of variable isr_stats
      wl12xx: remove redundant null check on wl->scan.ssid
      brcmfmac: sdio: remove unused variable retry_limit
      mwifiex: ie_list is an array, so no need to check if NULL

Dan Carpenter (1):
      brcmfmac: uninitialized "ret" variable

David Spinadel (1):
      iwlwifi: mvm: set aux STA ID in scan config

Dedy Lansky (1):
      wil6210: p2p initial support

Emmanuel Grumbach (6):
      iwlwifi: pcie: print error value as signed int
      iwlwifi: mvm: modify the max SP to infinite
      iwlwifi: add missing mutex_destroy statements
      iwlwifi: make uapsd_disable module param a bitmap
      iwlwifi: remove IWLWIFI_UAPSD Kconfig
      iwlwifi: remove IWL_*_UCODE_API_OK

Eva Rachel Retuya (1):
      iwlwifi: dvm: use alloc_ordered_workqueue()

Ganapathi Bhat (2):
      mwifiex: add support for GTK rekey offload
      mwifiex: add support for wakeup on GTK rekey failure

Geert Uytterhoeven (1):
      mwifiex: Spelling s/minmum/minimum/, s/bandwidth/bandwith/

Geliang Tang (4):
      ipw2x00: use to_pci_dev()
      wlcore: use to_delayed_work()
      wl1251: use to_delayed_work()
      rtlwifi: use to_delayed_work()

Golan Ben-Ami (2):
      iwlwifi: mvm: support dumping UMAC internal txfifos
      iwlwifi: store fw memory segments length and addresses in run-time

Grzegorz Bajorski (1):
      ath10k: deliver mgmt frames from htt to monitor vifs only

Haim Dreyfuss (2):
      iwlwifi: 9000: update device id and FW serial number
      iwlwifi: pcie: Fix index iteration on free_irq in MSIX mode

Hamad Kadmany (1):
      wil6210: Set permanent MAC address to wiphy

Ivan Safonov (1):
      ath9k: Remove unnecessary ?: operator

Jes Sorensen (10):
      rtl8xxxu: Change name of struct tx_desc to be more decriptive
      rtl8xxxu: Rename TX descriptor bits to map them to 32/40 byte descriptors
      rtl8xxxu: Correct txdesc40 gid definition
      rtl8xxxu: TXDESC_SHORT_GI is txdesc32 only
      rtl8xxxu: 8192eu uses txdesc40
      rtl8xxxu: Update some register definitions
      rtl8xxxu: Use enums for chip version numbers
      rtl8xxxu: Identify 8192eu rev A/B parts correctly
      rtl8xxxu: Use correct H2C calls for 8192eu
      rtl8xxxu: Do not set LDOA15 / LDOV12 on 8192eu

Jia-Ju Bai (4):
      iwl4965: Fix a null pointer dereference in il_tx_queue_free and il_cmd_queue_free
      b43: Fix memory leaks in b43_bus_dev_ssb_init and b43_bus_dev_bcma_init
      rtl818x_pci: Disable pci device in error handling code
      iwl4965: Fix a memory leak in error handling code of __il4965_up

Joe Perches (1):
      rtlwifi: btcoexist: Convert BTC_PRINTK to btc_<foo>_dbg

Johannes Berg (1):
      iwlwifi: mvm: remove is_data_qos variable in TX

Joseph Salisbury (1):
      ath5k: Change led pin configuration for compaq c700 laptop

Julian Calaby (1):
      iwl4965: Fix more memory leaks in __il4965_up()

Kalle Valo (2):
      Merge tag 'iwlwifi-next-for-kalle-2016-03-30' of https://git.kernel.org/.../iwlwifi/iwlwifi-next
      Merge ath-next from ath.git

Larry Finger (11):
      rtlwifi: rtl8723be: Add antenna select module parameter
      rtlwifi: btcoexist: Implement antenna selection
      rtlwifi: Fix Smatch warnings
      rtlwifi: btcoexist: Fix Smatch warning
      rtlwifi: rtl8188ee: Fix Smatch warnings
      rtlwifi: rtl8192c-common: Fix Smatch warning
      rtlwifi: rtl8192ee: Fix Smatch warning
      rtlwifi: rtl8192se: Fix Smatch warning
      rtlwifi: rtl8723ae: Fix Smatch warning
      rtlwifi: rtl8723be: Fix Smatch warnings
      rtlwifi: rtl8821ae: Fix Smatch warnings

Liad Kaufman (7):
      iwlwifi: mvm: support bss dynamic alloc/dealloc of queues
      iwlwifi: trans: fix iwl_trans_txq_scd_cfg.sta_id sign
      iwlwifi: mvm: use bss client queue for bss station
      iwlwifi: mvm: set sta_id in SCD_QUEUE_CONFIG cmd
      iwlwifi: mvm: allocate dedicated queue for cab in dqa mode
      iwlwifi: mvm: move cmd queue to be #0 in dqa mode
      iwlwifi: mvm: fix inconsistent lock in dqa mode

Lior David (10):
      wil6210: add support for discovery mode during scan
      wil6210: switch to generated wmi.h
      wil6210: basic PBSS/PCP support
      wil6210: P2P_DEVICE virtual interface support
      wil6210: fix race conditions in p2p listen and search
      wil6210: clean ioctl debug message
      wil6210: fix no_fw_recovery mode with change_virtual_intf
      wil6210: pass is_go flag to firmware
      wil6210: add oob_mode module parameter
      wil6210: allow empty WMI commands in debugfs wmi_send

Luca Coelho (3):
      iwlwifi: pcie: refcounting is not necessary anymore
      iwlwifi: mvm: add a scan timeout for regular scans
      iwlwifi: mvm: allow setting the thermal state in D0i3

Markus Elfring (6):
      ath9k_htc: Delete unnecessary variable initialisation
      brcmfmac: Delete unnecessary variable initialisation
      iwlegacy: Return directly if allocation fails in il_eeprom_init()
      rsi: Delete unnecessary variable initialisation
      rsi: Delete unnecessary variable initialisation
      rsi: Move variable initialisation into error code

Matti Gottlieb (2):
      iwlwifi: mvm: Decrease size of the paging download buffer
      iwlwifi: mvm: make sure FW contains the right amount of paging sections

Maya Erez (3):
      wil6210: remove BACK RX and TX workers
      wil6210: AP: prevent connecting to already connected station
      wil6210: add support for platform specific notification events

Miaoqing Pan (23):
      ath9k: Update QCA953x initvals
      ath9k: Update AR9003 2.2 initvals
      ath9k: Update AR933x initvals
      ath9k: Update AR9340 initvals
      ath9k: Update AR9462 initvals
      ath9k: Update AR9485 initvals
      ath9k: Update AR955x initvals
      ath9k: Update AR9565 initvals
      ath9k: Update QCA956x initvals
      ath9k: Update AR9580 initvals
      ath9k: enable manual peak cal for all ar9300 chips
      ath9k: use AR_SREV_9003_PCOEM to identify PCOEM chips
      ath9k: set correct peak detect threshold
      ath9k: define correct GPIO numbers and bits mask
      ath9k: make GPIO API to support both of WMAC and SOC
      ath9k: free GPIO resource for SOC GPIOs
      ath9k: cleanup led_pin initial
      ath9k: Allow platform override BTCoex pin
      ath9k: add bits definition of BTCoex MODE2/3 for SOC chips
      ath9k: fix BTCoex access invalid registers for SOC chips
      ath9k: fix BTCoex configuration for SOC chips
      ath9k: fix reg dump data bus error
      ath9k: fix rng high cpu load

Michal Kazior (16):
      ath10k: refactor tx code
      ath10k: unify txpath decision
      ath10k: refactor tx pending management
      ath10k: maintain peer_id for each sta and vif
      ath10k: add fast peer_map lookup
      ath10k: add new htt message generation/parsing logic
      ath10k: implement wake_tx_queue
      ath10k: implement updating shared htt txq state
      ath10k: store txq in skb_cb
      ath10k: keep track of queue depth per txq
      ath10k: implement push-pull tx
      ath10k: fix HTT Tx CE ring size
      ath10k: change htt tx desc/qcache peer limit config
      ath10k: fix tx hang
      ath10k: fix pull-push tx threshold handling
      ath10k: fix null deref if device crashes early

Mohammed Shafi Shajakhan (2):
      ath10k: enable debugfs provision to enable Peer Stats feature
      ath10k: enable parsing per station rx duration for 10.4

Oren Givon (1):
      iwlwifi: edit the 9000 series PCI IDs

Peter Oh (2):
      ath10k: set MAC timestamp in management Rx frame
      ath10k: parse Rx MAC timestamp in mgmt frame for FW 10.4

Raja Mani (6):
      ath10k: free cached fw bin contents when get board id fails
      dt: bindings: add new dt entry for pre calibration in qcom, ath10k.txt
      ath10k: pass cal data location as an argument to ath10k_download_cal_{file|dt}
      ath10k: move cal data len to hw_params
      ath10k: incorporate qca4019 cal data download sequence
      ath10k: introduce Extended Resource Config support for 10.4

Rajkumar Manoharan (15):
      ath10k: fix firmware assert in monitor mode
      ath10k: handle channel change htt event
      ath10k: move mgmt descriptor limit handle under mgmt_tx
      ath10k: speedup htt rx descriptor processing for tx completion
      ath10k: copy tx fetch indication message
      ath10k: remove unused fw_desc processing
      ath10k: cleanup amsdu processing for rx indication
      ath10k: speedup htt rx descriptor processing for rx_ind
      ath10k: register ath10k_htt_htc_t2h_msg_handler
      ath10k: cleanup copy engine receive next completion
      ath10k: reuse copy engine 5 (htt rx) descriptors
      ath10k: combine txrx and replenish task
      ath10k: fix calibration init sequence of qca99x0
      ath10k: remove unnecessary warning for probe response drops
      ath10k: fix unconditional num_mpdus_ready subtraction

Sara Sharon (11):
      iwlwifi: pcie: clear trans reference on queue stop
      iwlwifi: pcie: fix global table size
      iwlwifi: pcie: enable interrupts explicitly on resume
      iwlwifi: pcie: do not pad QoS AMSDU
      iwlwifi: mvm: add support for new TX CMD API
      iwlwifi: pcie: write to legacy register also in MQ
      iwlwifi: remove support for fw older than -16.ucode
      iwlwifi: mvm: report checksum is done also for IPv6 packets
      iwlwifi: pcie: request one more interrupt vector
      iwlwifi: mvm: improve RSS configuration
      iwlwifi: mvm: enable TCP/UDP checksum support for 9000 family

Shengzhen Li (1):
      mwifiex: check revision id while choosing PCIe firmware

Steve deRosier (1):
      ath6kl: ignore WMI_TXE_NOTIFY_EVENTID based on fw capability flags

Vasanthakumar Thiagarajan (1):
      ath10k: advertise force AP scan feature

Vishal Thanki (1):
      rt2x00usb: Use usb anchor to manage URB

Vladimir Kondratiev (1):
      wil6210: replay attack detection

Wei-Ning Huang (1):
      mwifiex: fix NULL pointer dereference error

Xinming Hu (4):
      mwifiex: remove redundant GFP_DMA flag
      mwifiex: schedule main workqueue for transmitting bridge packets
      mwifiex: AMSDU Rx frame handling in AP mode
      mwifiex: dump pcie scratch registers

 .../bindings/net/wireless/qcom,ath10k.txt          |   23 +-
 drivers/net/wireless/ath/ath10k/ce.c               |   44 +-
 drivers/net/wireless/ath/ath10k/ce.h               |   15 +-
 drivers/net/wireless/ath/ath10k/core.c             |  156 ++-
 drivers/net/wireless/ath/ath10k/core.h             |   41 +-
 drivers/net/wireless/ath/ath10k/debug.c            |  100 +-
 drivers/net/wireless/ath/ath10k/htt.c              |    2 +-
 drivers/net/wireless/ath/ath10k/htt.h              |   55 +-
 drivers/net/wireless/ath/ath10k/htt_rx.c           |  714 +++++++----
 drivers/net/wireless/ath/ath10k/htt_tx.c           |  291 ++++-
 drivers/net/wireless/ath/ath10k/hw.h               |    6 +-
 drivers/net/wireless/ath/ath10k/mac.c              |  546 +++++++-
 drivers/net/wireless/ath/ath10k/mac.h              |    6 +
 drivers/net/wireless/ath/ath10k/pci.c              |  106 +-
 drivers/net/wireless/ath/ath10k/txrx.c             |   37 +-
 drivers/net/wireless/ath/ath10k/txrx.h             |    4 +-
 drivers/net/wireless/ath/ath10k/wmi-ops.h          |   23 +
 drivers/net/wireless/ath/ath10k/wmi.c              |  132 +-
 drivers/net/wireless/ath/ath10k/wmi.h              |   54 +
 drivers/net/wireless/ath/ath5k/led.c               |    2 +-
 drivers/net/wireless/ath/ath5k/phy.c               |    2 +-
 drivers/net/wireless/ath/ath5k/reset.c             |    4 +-
 drivers/net/wireless/ath/ath6kl/wmi.c              |    5 +
 .../net/wireless/ath/ath9k/ar9003_2p2_initvals.h   |    4 +-
 drivers/net/wireless/ath/ath9k/ar9003_calib.c      |   44 +-
 drivers/net/wireless/ath/ath9k/ar9003_eeprom.c     |   10 +-
 drivers/net/wireless/ath/ath9k/ar9003_mci.c        |   39 +-
 drivers/net/wireless/ath/ath9k/ar9003_phy.c        |   10 +-
 .../net/wireless/ath/ath9k/ar9330_1p1_initvals.h   |    4 +-
 .../net/wireless/ath/ath9k/ar9330_1p2_initvals.h   |    4 +-
 drivers/net/wireless/ath/ath9k/ar9340_initvals.h   |    4 +-
 .../net/wireless/ath/ath9k/ar9462_2p0_initvals.h   |    4 +-
 .../net/wireless/ath/ath9k/ar9462_2p1_initvals.h   |    4 +-
 drivers/net/wireless/ath/ath9k/ar9485_initvals.h   |    4 +-
 drivers/net/wireless/ath/ath9k/ar953x_initvals.h   |    4 +-
 .../net/wireless/ath/ath9k/ar955x_1p0_initvals.h   |    2 +-
 .../net/wireless/ath/ath9k/ar9565_1p0_initvals.h   |    2 +-
 drivers/net/wireless/ath/ath9k/ar956x_initvals.h   |    2 +-
 .../net/wireless/ath/ath9k/ar9580_1p0_initvals.h   |    4 +-
 drivers/net/wireless/ath/ath9k/ath9k.h             |    4 -
 drivers/net/wireless/ath/ath9k/btcoex.c            |  138 +-
 drivers/net/wireless/ath/ath9k/btcoex.h            |    2 +
 drivers/net/wireless/ath/ath9k/debug.c             |   24 +-
 drivers/net/wireless/ath/ath9k/gpio.c              |   69 +-
 drivers/net/wireless/ath/ath9k/hif_usb.c           |    2 +-
 drivers/net/wireless/ath/ath9k/htc_drv_gpio.c      |    8 +-
 drivers/net/wireless/ath/ath9k/htc_drv_init.c      |   14 +-
 drivers/net/wireless/ath/ath9k/hw.c                |  267 ++--
 drivers/net/wireless/ath/ath9k/hw.h                |   11 +-
 drivers/net/wireless/ath/ath9k/init.c              |    1 -
 drivers/net/wireless/ath/ath9k/main.c              |    9 +-
 drivers/net/wireless/ath/ath9k/reg.h               |   90 +-
 drivers/net/wireless/ath/ath9k/rng.c               |   20 +-
 drivers/net/wireless/ath/wil6210/Makefile          |    1 +
 drivers/net/wireless/ath/wil6210/cfg80211.c        |  332 ++++-
 drivers/net/wireless/ath/wil6210/debugfs.c         |   59 +-
 drivers/net/wireless/ath/wil6210/interrupt.c       |    6 +-
 drivers/net/wireless/ath/wil6210/ioctl.c           |   11 +-
 drivers/net/wireless/ath/wil6210/main.c            |   81 +-
 drivers/net/wireless/ath/wil6210/netdev.c          |    7 +-
 drivers/net/wireless/ath/wil6210/p2p.c             |  253 ++++
 drivers/net/wireless/ath/wil6210/pcie_bus.c        |    1 +
 drivers/net/wireless/ath/wil6210/rx_reorder.c      |  204 +--
 drivers/net/wireless/ath/wil6210/trace.h           |   19 +-
 drivers/net/wireless/ath/wil6210/txrx.c            |   67 +-
 drivers/net/wireless/ath/wil6210/txrx.h            |   12 +-
 drivers/net/wireless/ath/wil6210/wil6210.h         |  110 +-
 drivers/net/wireless/ath/wil6210/wil_platform.h    |    8 +-
 drivers/net/wireless/ath/wil6210/wmi.c             |  134 +-
 drivers/net/wireless/ath/wil6210/wmi.h             | 1264 +++++++++----------
 drivers/net/wireless/broadcom/b43/main.c           |    6 +-
 .../wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c  |    2 +-
 .../wireless/broadcom/brcm80211/brcmfmac/sdio.c    |    5 +-
 drivers/net/wireless/intel/ipw2x00/ipw2100.c       |    2 +-
 drivers/net/wireless/intel/iwlegacy/4965-mac.c     |    3 +
 drivers/net/wireless/intel/iwlegacy/common.c       |   22 +-
 drivers/net/wireless/intel/iwlwifi/Kconfig         |   10 -
 drivers/net/wireless/intel/iwlwifi/dvm/main.c      |    2 +-
 drivers/net/wireless/intel/iwlwifi/iwl-1000.c      |   10 +-
 drivers/net/wireless/intel/iwlwifi/iwl-2000.c      |   18 +-
 drivers/net/wireless/intel/iwlwifi/iwl-5000.c      |   11 +-
 drivers/net/wireless/intel/iwlwifi/iwl-6000.c      |   20 +-
 drivers/net/wireless/intel/iwlwifi/iwl-7000.c      |   26 +-
 drivers/net/wireless/intel/iwlwifi/iwl-8000.c      |   13 +-
 drivers/net/wireless/intel/iwlwifi/iwl-9000.c      |   17 +-
 drivers/net/wireless/intel/iwlwifi/iwl-config.h    |    7 +-
 drivers/net/wireless/intel/iwlwifi/iwl-drv.c       |  100 +-
 .../net/wireless/intel/iwlwifi/iwl-fw-error-dump.h |    1 +
 drivers/net/wireless/intel/iwlwifi/iwl-fw-file.h   |   41 +-
 drivers/net/wireless/intel/iwlwifi/iwl-fw.h        |    2 +
 drivers/net/wireless/intel/iwlwifi/iwl-modparams.h |   10 +-
 drivers/net/wireless/intel/iwlwifi/iwl-prph.h      |   12 +
 drivers/net/wireless/intel/iwlwifi/iwl-trans.h     |    4 +-
 drivers/net/wireless/intel/iwlwifi/mvm/Makefile    |    2 +-
 drivers/net/wireless/intel/iwlwifi/mvm/coex.c      |   42 -
 .../net/wireless/intel/iwlwifi/mvm/coex_legacy.c   | 1315 --------------------
 drivers/net/wireless/intel/iwlwifi/mvm/constants.h |    1 -
 drivers/net/wireless/intel/iwlwifi/mvm/d3.c        |    2 +-
 .../net/wireless/intel/iwlwifi/mvm/debugfs-vif.c   |   85 ++
 drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c   |  169 +--
 drivers/net/wireless/intel/iwlwifi/mvm/fw-api-rx.h |   15 +-
 drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h |   35 +-
 drivers/net/wireless/intel/iwlwifi/mvm/fw-api.h    |  108 +-
 drivers/net/wireless/intel/iwlwifi/mvm/fw-dbg.c    |  140 ++-
 drivers/net/wireless/intel/iwlwifi/mvm/fw.c        |   54 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c  |   47 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c  |   75 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mvm.h       |   47 +-
 drivers/net/wireless/intel/iwlwifi/mvm/ops.c       |   34 +-
 drivers/net/wireless/intel/iwlwifi/mvm/power.c     |    2 +-
 drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c      |    9 +-
 drivers/net/wireless/intel/iwlwifi/mvm/scan.c      |   22 +
 drivers/net/wireless/intel/iwlwifi/mvm/sf.c        |    8 +-
 drivers/net/wireless/intel/iwlwifi/mvm/sta.c       |  262 +++-
 drivers/net/wireless/intel/iwlwifi/mvm/sta.h       |   87 +-
 drivers/net/wireless/intel/iwlwifi/mvm/tt.c        |   15 -
 drivers/net/wireless/intel/iwlwifi/mvm/tx.c        |  192 ++-
 drivers/net/wireless/intel/iwlwifi/mvm/utils.c     |  161 ++-
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c      |   16 +-
 drivers/net/wireless/intel/iwlwifi/pcie/internal.h |    6 +-
 drivers/net/wireless/intel/iwlwifi/pcie/rx.c       |   12 +-
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c    |   35 +-
 drivers/net/wireless/intel/iwlwifi/pcie/tx.c       |   80 +-
 .../net/wireless/marvell/mwifiex/11n_rxreorder.c   |    5 +-
 drivers/net/wireless/marvell/mwifiex/cfg80211.c    |   29 +-
 drivers/net/wireless/marvell/mwifiex/fw.h          |   11 +
 drivers/net/wireless/marvell/mwifiex/main.c        |    8 +-
 drivers/net/wireless/marvell/mwifiex/main.h        |    2 +
 drivers/net/wireless/marvell/mwifiex/pcie.c        |   98 +-
 drivers/net/wireless/marvell/mwifiex/pcie.h        |   18 +-
 drivers/net/wireless/marvell/mwifiex/sdio.c        |    7 +-
 drivers/net/wireless/marvell/mwifiex/sta_cmd.c     |   28 +
 drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c |    2 +
 drivers/net/wireless/marvell/mwifiex/sta_event.c   |    3 +
 drivers/net/wireless/marvell/mwifiex/sta_ioctl.c   |    3 +-
 drivers/net/wireless/marvell/mwifiex/tdls.c        |    2 +-
 drivers/net/wireless/marvell/mwifiex/uap_cmd.c     |    2 +-
 drivers/net/wireless/marvell/mwifiex/uap_txrx.c    |   92 ++
 drivers/net/wireless/ralink/rt2x00/rt2x00.h        |    3 +
 drivers/net/wireless/ralink/rt2x00/rt2x00dev.c     |    3 +
 drivers/net/wireless/ralink/rt2x00/rt2x00usb.c     |   21 +-
 drivers/net/wireless/realtek/rtl818x/rtl8180/dev.c |    4 +-
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c   |  163 ++-
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h   |  130 +-
 .../net/wireless/realtek/rtl8xxxu/rtl8xxxu_regs.h  |   31 +-
 .../realtek/rtlwifi/btcoexist/halbtc8192e2ant.c    |  847 ++++++-------
 .../realtek/rtlwifi/btcoexist/halbtc8723b1ant.c    |  611 +++++----
 .../realtek/rtlwifi/btcoexist/halbtc8723b2ant.c    |  865 ++++++-------
 .../realtek/rtlwifi/btcoexist/halbtc8821a1ant.c    |  652 +++++-----
 .../realtek/rtlwifi/btcoexist/halbtc8821a2ant.c    |  851 +++++++------
 .../realtek/rtlwifi/btcoexist/halbtcoutsrc.c       |   31 +-
 .../realtek/rtlwifi/btcoexist/halbtcoutsrc.h       |   19 +-
 .../wireless/realtek/rtlwifi/btcoexist/rtl_btc.c   |    5 +-
 drivers/net/wireless/realtek/rtlwifi/pci.c         |   39 +-
 .../net/wireless/realtek/rtlwifi/rtl8188ee/dm.c    |    2 +-
 .../net/wireless/realtek/rtlwifi/rtl8188ee/phy.c   |    3 +-
 .../wireless/realtek/rtlwifi/rtl8192c/dm_common.c  |    2 +-
 .../net/wireless/realtek/rtlwifi/rtl8192ee/trx.c   |    2 +-
 .../net/wireless/realtek/rtlwifi/rtl8192se/phy.c   |    2 +-
 .../wireless/realtek/rtlwifi/rtl8723ae/hal_btc.c   |    6 +-
 .../net/wireless/realtek/rtlwifi/rtl8723be/hw.c    |    5 +
 .../net/wireless/realtek/rtlwifi/rtl8723be/phy.c   |   10 +-
 .../net/wireless/realtek/rtlwifi/rtl8723be/rf.c    |    4 +-
 .../net/wireless/realtek/rtlwifi/rtl8723be/sw.c    |    3 +
 .../net/wireless/realtek/rtlwifi/rtl8821ae/dm.c    |    6 +-
 .../net/wireless/realtek/rtlwifi/rtl8821ae/phy.c   |    6 +-
 drivers/net/wireless/realtek/rtlwifi/wifi.h        |    5 +-
 drivers/net/wireless/rsi/rsi_91x_pkt.c             |   22 +-
 drivers/net/wireless/ti/wl1251/ps.c                |    2 +-
 drivers/net/wireless/ti/wl12xx/scan.c              |    2 +-
 drivers/net/wireless/ti/wlcore/main.c              |   10 +-
 drivers/net/wireless/ti/wlcore/ps.c                |    2 +-
 drivers/net/wireless/ti/wlcore/scan.c              |    2 +-
 include/linux/ath9k_platform.h                     |    4 +
 174 files changed, 7918 insertions(+), 5930 deletions(-)
 create mode 100644 drivers/net/wireless/ath/wil6210/p2p.c
 delete mode 100644 drivers/net/wireless/intel/iwlwifi/mvm/coex_legacy.c

-- 
Kalle Valo

^ permalink raw reply

* Re: [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Jesper Dangaard Brouer @ 2016-04-11 12:26 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lsf, linux-mm, netdev@vger.kernel.org, Brenden Blanco,
	James Bottomley, Tom Herbert, lsf-pc, Alexei Starovoitov, brouer
In-Reply-To: <20160411085819.GE21128@suse.de>

 
On Mon, 11 Apr 2016 09:58:19 +0100 Mel Gorman <mgorman@suse.de> wrote:

> On Thu, Apr 07, 2016 at 04:17:15PM +0200, Jesper Dangaard Brouer wrote:
> > (Topic proposal for MM-summit)
> > 
> > Network Interface Cards (NIC) drivers, and increasing speeds stress
> > the page-allocator (and DMA APIs).  A number of driver specific
> > open-coded approaches exists that work-around these bottlenecks in the
> > page allocator and DMA APIs. E.g. open-coded recycle mechanisms, and
> > allocating larger pages and handing-out page "fragments".
> > 
> > I'm proposing a generic page-pool recycle facility, that can cover the
> > driver use-cases, increase performance and open up for zero-copy RX.
> >   
> 
> Which bottleneck dominates -- the page allocator or the DMA API when
> setting up coherent pages?
>

It is actually both, but mostly DMA on non-x86 archs.  The need to
support multiple archs then also causes a slowdown on x86, due to a
side-effect.

On archs like PowerPC, the DMA API is the bottleneck.  To work around
the cost of DMA calls, NIC drivers alloc large order (compound) pages.
(dma_map the compound page, hand out page-fragments for the RX ring,
and later dma_unmap when the last RX page-fragment is seen).

The unfortunate side-effect is that these RX page-fragments (which
contain packet data) need to be considered 'read-only', because a
dma_unmap call can be destructive.  Network packets need to be
modified (at minimum the time-to-live).  Thus, the netstack allocs new
writable memory, copies over the IP-headers, and adjusts the offset
pointer into the RX-page.  Avoiding the dma_unmap (AFAIK) will allow
us to make RX-pages writable.

The idea behind page-pool is to recycle pages back to the originating
device, so that we can avoid the need to call dma_unmap(), and only
call dma_map() when setting up pages.
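
The recycling idea above can be illustrated with a toy model (a
sketch only, not kernel code; PagePoolModel, simulate and the
counters are hypothetical names invented for this illustration).  It
counts how many map/unmap operations an RX ring would trigger with
and without a recycle pool, assuming every page returned by the stack
goes back to the originating pool:

```python
class PagePoolModel:
    """Toy model of a recycling page pool; counts DMA map/unmap calls."""

    def __init__(self, capacity):
        self.capacity = capacity      # max pages kept around for recycling
        self.free_pages = []          # pages that are still DMA-mapped
        self.next_id = 0
        self.dma_map_calls = 0
        self.dma_unmap_calls = 0

    def get_page(self):
        # Fast path: reuse a page that is still mapped for the device.
        if self.free_pages:
            return self.free_pages.pop()
        # Slow path: "allocate" a new page and map it once.
        self.dma_map_calls += 1
        page = self.next_id
        self.next_id += 1
        return page

    def put_page(self, page):
        # Recycle while there is room; otherwise unmap and release.
        if len(self.free_pages) < self.capacity:
            self.free_pages.append(page)
        else:
            self.dma_unmap_calls += 1


def simulate(packets, ring_size, capacity):
    """Return (dma_map_calls, dma_unmap_calls) after `packets` packets."""
    pool = PagePoolModel(capacity)
    ring = [pool.get_page() for _ in range(ring_size)]
    for i in range(packets):
        slot = i % ring_size
        pool.put_page(ring[slot])      # packet consumed, page returned
        ring[slot] = pool.get_page()   # refill the RX ring slot
    return pool.dma_map_calls, pool.dma_unmap_calls
```

With a pool at least as large as the churn, the model maps each page
once at setup and never unmaps; with capacity 0 (no recycling) it
maps and unmaps once per packet.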


> I'm wary of another page allocator API being introduced if it's for
> performance reasons. In response to this thread, I spent two days on
> a series that boosts performance of the allocator in the fast paths by
> 11-18% to illustrate that there was low-hanging fruit for optimising. If
> the one-LRU-per-node series was applied on top, there would be a further
> boost to performance on the allocation side. It could be further boosted
> if debugging checks and statistic updates were conditionally disabled by
> the caller.

It is always great if you can optimize the page allocator.  IMHO the
page allocator is too slow, at least for my performance needs (67ns
per packet, approx 201 cycles at 3GHz).  I've measured[1]
alloc_pages(order=0) + __free_pages() to cost 277 cycles(tsc).
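
The budget arithmetic can be written out as a short sketch.  The
function names are made up for illustration, and the link between
67ns and minimum-size Ethernet frames at 10Gbit/s is an assumption
that is not stated in this mail:

```python
ALLOC_FREE_CYCLES = 277   # measured alloc_pages(order=0) + __free_pages()


def cycle_budget(ns_per_packet, cpu_ghz):
    """Cycles available per packet: nanoseconds times cycles per ns."""
    return ns_per_packet * cpu_ghz


def min_frame_ns(link_gbps):
    """Assumption: 64-byte frame + 20 bytes preamble/inter-frame gap."""
    wire_bits = (64 + 20) * 8
    return wire_bits / link_gbps   # Gbit/s equals bits per nanosecond


budget = cycle_budget(67, 3)               # 201 cycles per packet
overrun = ALLOC_FREE_CYCLES - budget       # 76 cycles over budget
```

So a single order-0 alloc+free pair alone costs more than the whole
per-packet budget.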

The trick described above, of allocating a higher order page and
handing out page-fragments, also works around this page allocator
bottleneck (on x86).

I've measured order 3 (32KB) alloc_pages(order=3) + __free_pages() to
cost approx 500 cycles(tsc).  That was more expensive, BUT an order=3
page of 32KB corresponds to 8 pages (32768/4096), thus 500/8 = 62.5
cycles per page.  Usually a network RX-frame only needs to be 2048
bytes, thus the "bulk" effect speed up is x16 (32768/2048), thus
31.25 cycles per frame.
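
The amortization above, written out as a minimal sketch
(cost_per_chunk is a hypothetical helper; the cycle counts are the
measurements quoted in this mail):

```python
PAGE_SIZE = 4096


def cost_per_chunk(total_cycles, order, chunk_bytes):
    """Amortized alloc+free cost when one order-N page is split up."""
    region_bytes = PAGE_SIZE << order       # order 3 -> 32768 bytes
    chunks = region_bytes // chunk_bytes    # fragments handed out
    return total_cycles / chunks


per_page = cost_per_chunk(500, 3, 4096)     # 500 / 8  = 62.5 cycles
per_frame = cost_per_chunk(500, 3, 2048)    # 500 / 16 = 31.25 cycles
```

Both figures are well below the 277 cycles measured for a single
order-0 alloc+free pair.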

I view this as a bulking trick... maybe the page allocator can just
give us a bulking API? ;-)


> The main reason another allocator concerns me is that those pages
> are effectively pinned and cannot be reclaimed by the VM in low memory
> situations. It ends up needing its own API for tuning the size and hoping
> all the drivers get it right without causing OOM situations. It becomes
> a slippery slope of introducing shrinkers, locking and complexity. Then
> callers start getting concerned about NUMA locality and having to deal
> with multiple lists to maintain performance. Ultimately, it ends up being
> as slow as the page allocator and back to square 1 except now with more code.

The pages assigned to the RX ring queue are pinned like today.  The
pages available in the pool could easily be reclaimed.

I actually think we are better off providing a generic page pool
interface the drivers can use, instead of the situation where drivers
and subsystems invent their own, which do not cooperate in OOM
situations.

For the networking fast forwarding use-case (NOT localhost delivery),
the page pool size would actually be limited to a fairly small fixed
size.  Packets will be hard dropped if exceeding this limit.  The idea
is that you want to limit the maximum latency the system can introduce
when forwarding a packet, even in high overload situations.  There is
a good argument for this in section 3.2 of Google's paper[2].  They
limit the pool size to 3000 and calculate that this can introduce at
most 300 micro-sec latency.
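
A rough sketch of how a fixed pool size bounds queueing latency.  The
100ns per-packet service time is an assumption, derived here simply
by dividing 300 micro-sec by 3000 packets; it is not a number quoted
in this mail:

```python
def max_queue_latency_us(pool_size, ns_per_packet):
    """Worst case: a packet waits behind a completely full pool."""
    return pool_size * ns_per_packet / 1000   # ns -> micro-sec


paper_bound = max_queue_latency_us(3000, 100)   # 300.0 micro-sec
at_67ns = max_queue_latency_us(3000, 67)        # 201.0 micro-sec
```

The second line shows the same pool size combined with the 67ns
per-packet budget mentioned earlier in this mail.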


> If it's the DMA API that dominates then something may be required but it
> should rely on the existing page allocator to alloc/free from. It would
> also need something like drain_all_pages to force free everything in there
> in low memory situations. Remember that multiple instances private to
> drivers or tasks will require shrinker implementations and the complexity
> may get unwieldy.

I'll read up on the shrinker interface.


[1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench

[2] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply

* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Eric Dumazet @ 2016-04-11 12:13 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem, Ding Tianhong
In-Reply-To: <570B9126.9080806@huawei.com>

On Mon, 2016-04-11 at 19:57 +0800, Yang Yingliang wrote:
> 
> On 2016/4/8 22:44, Eric Dumazet wrote:
> > On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
> >
> >> I expanded tcp_adv_win_scale and tcp_rmem. It has no effect.
> >
> > Try :
> >
> > echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
> >
> > And restart your flows.
> >
> cat /proc/sys/net/ipv4/tcp_rmem
> 10240 2097152 10485760

What about leaving the default values ?

$ cat /proc/sys/net/ipv4/tcp_rmem
4096	87380	6291456

> 
> echo 102400 20971520 104857600 > /proc/sys/net/ipv4/tcp_rmem
> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
> 
> It seems to have no effect.
> 

I have no idea what you did on the sender side to allow it to send more
than 1.5 MB then.
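
For reference, a sketch of how tcp_adv_win_scale turns receive buffer
space into usable window, following the description in the kernel's
ip-sysctl documentation for kernels of this era (window_from_space is
a hypothetical name, and this is an illustration, not the kernel
source):

```python
def window_from_space(space, adv_win_scale):
    """Usable window for `space` bytes of receive buffer."""
    if adv_win_scale <= 0:
        # Zero or negative scale: window is space / 2^(-scale).
        return space >> (-adv_win_scale)
    # Positive scale: space / 2^scale is reserved as overhead.
    return space - (space >> adv_win_scale)


default_max = window_from_space(6291456, -2)    # 1572864 bytes
reported_mid = window_from_space(2097152, -2)   # 524288 bytes
```

With the default tcp_rmem maximum of 6291456 bytes and a scale of -2
this gives 1572864 bytes, i.e. 1.5 MiB, which may be where the 1.5 MB
figure above comes from, although the mail does not say so.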

^ permalink raw reply

