Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Pablo Neira Ayuso @ 2016-09-22  9:21 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Daniel Mack, htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160921184827.GA15732-4EA/1caXOu0mYvmMESoHnA@public.gmane.org>

On Wed, Sep 21, 2016 at 08:48:27PM +0200, Thomas Graf wrote:
> On 09/21/16 at 05:45pm, Pablo Neira Ayuso wrote:
> > On Tue, Sep 20, 2016 at 06:43:35PM +0200, Daniel Mack wrote:
> > > The point is that from an application's perspective, restricting the
> > > ability to bind a port and dropping packets that are being sent is a
> > > very different thing. Applications will start to behave differently if
> > > they can't bind to a port, and that's something we do not want to happen.
> > 
> > What is exactly the problem? Applications are not checking for return
> > value from bind? They should be fixed. If you want to collect
> > statistics, I see no reason why you couldn't collect them for every
> > EACCESS on each bind() call.
> 
> It's not about applications not checking the return value of bind().
> Unfortunately, many applications (or the respective libraries they use)
> retry on connect() failure but handle bind() errors as a hard failure
> and exit. Yes, it's an application or library bug but these
> applications have very specific exceptions how something fails.
> Sometimes even going from drop to RST will break applications.
> 
> Paranoia speaking: by returning errors where no error was returned
> before, undefined behaviour occurs. In Murphy speak: things break.
> 
> This is given and we can't fix it from the kernel side. Returning at
> system call level has many benefits but it's not always an option.
> 
> Adding the late hook does not prevent filtering at socket layer to
> also be added. I think we need both.

I have a hard time to buy this new specific hook, I think we should
shift focus of this debate, this is my proposal to untangle this:

You add a net/netfilter/nft_bpf.c expression that allows you to run
bpf programs from nf_tables. This expression can either run bpf
programs in a similar fashion to tc+bpf or run the bpf program that
you have attached to the cgroup.

To achieve this, I'd suggest you also add a new bpf chain type. That
new chain type would basically provide raw access to netfilter hooks
via nf_tables netlink interface.  This bpf chain would exclusively
take rules that use this new bpf expression.

I see good things on this proposal:

* This is consistent to what we offer via tc+bpf.

* It becomes easily visible to the user that a bpf program is running
  from the packet path, or any cgroup+bpf filtering is going on. Thus,
  no matter what those orchestrators do, this filtering becomes
  visible to sysadmins that are familiar with the existing command line
  tooling.

* You get access to all of the existing netfilter hooks in one go.

A side note on this: I would suggest this conversation focuses on
discussing aspects at a slightly higher level rather than counting raw
load and stores instructions...  I think this effort requires looking
at the whole forest, instead barfing at one single tree. Genericity
always comes at a slight cost, and to all those programmability fans
here, please remember we have a generic stack between hands after all.
So let's try to accomodate this new requirements in a way that makes
sense.

Thanks.

^ permalink raw reply

* Re: [PATCH] net: fec: set mac address unconditionally
From: Uwe Kleine-König @ 2016-09-22  9:30 UTC (permalink / raw)
  To: Gavin Schenk; +Cc: fugang.duan, netdev, kernel
In-Reply-To: <20160922060836.h3rayf37jk3w75lm@pengutronix.de>

On Thu, Sep 22, 2016 at 08:08:36AM +0200, Uwe Kleine-König wrote:
> Acked-by: Uwe Kleine-König <u.kleine-koenig@eckelmann.de>
huch, cut and paste borkage: This should be

	Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

of course.

Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply

* [PATCH iproute2 net-next] tc: m_vlan: Add vlan modify action
From: Shmulik Ladkani @ 2016-09-22  9:31 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Jamal Hadi Salim, Jiri Pirko, Shmulik Ladkani

The 'vlan modify' action allows to replace an existing 802.1q tag
according to user provided settings.
It accepts same arguments as the 'vlan push' action.

For example, this replaces vid 6 with vid 5:

 # tc filter add dev veth0 parent ffff: pref 1 protocol 802.1q \
      basic match 'meta(vlan mask 0xfff eq 6)' \
      action vlan modify id 5 continue

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 include/linux/tc_act/tc_vlan.h |  1 +
 man/man8/tc-vlan.8             | 25 ++++++++++++++++++------
 tc/m_vlan.c                    | 44 +++++++++++++++++++++++++++++++++---------
 3 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/include/linux/tc_act/tc_vlan.h b/include/linux/tc_act/tc_vlan.h
index be72b6e..bddb272 100644
--- a/include/linux/tc_act/tc_vlan.h
+++ b/include/linux/tc_act/tc_vlan.h
@@ -16,6 +16,7 @@
 
 #define TCA_VLAN_ACT_POP	1
 #define TCA_VLAN_ACT_PUSH	2
+#define TCA_VLAN_ACT_MODIFY	3
 
 struct tc_vlan {
 	tc_gen;
diff --git a/man/man8/tc-vlan.8 b/man/man8/tc-vlan.8
index 4d0c5c8..af3de1c 100644
--- a/man/man8/tc-vlan.8
+++ b/man/man8/tc-vlan.8
@@ -6,7 +6,7 @@ vlan - vlan manipulation module
 .in +8
 .ti -8
 .BR tc " ... " "action vlan" " { " pop " |"
-.IR PUSH " } [ " CONTROL " ]"
+.IR PUSH " | " MODIFY " } [ " CONTROL " ]"
 
 .ti -8
 .IR PUSH " := "
@@ -17,22 +17,30 @@ vlan - vlan manipulation module
 .BI id " VLANID"
 
 .ti -8
+.IR MODIFY " := "
+.BR modify " [ " protocol
+.IR VLANPROTO " ]"
+.BR " [ " priority
+.IR VLANPRIO " ] "
+.BI id " VLANID"
+
+.ti -8
 .IR CONTROL " := { "
 .BR reclassify " | " pipe " | " drop " | " continue " | " pass " }"
 .SH DESCRIPTION
 The
 .B vlan
 action allows to perform 802.1Q en- or decapsulation on a packet, reflected by
-the two operation modes
-.IR POP " and " PUSH .
+the operation modes
+.IR POP ", " PUSH " and " MODIFY .
 The
 .I POP
 mode is simple, as no further information is required to just drop the
 outer-most VLAN encapsulation. The
-.I PUSH
-mode on the other hand requires at least a
+.IR PUSH " and " MODIFY
+modes require at least a
 .I VLANID
-and allows to optionally choose the
+and allow to optionally choose the
 .I VLANPROTO
 to use.
 .SH OPTIONS
@@ -45,6 +53,11 @@ Encapsulation mode. Requires at least
 .B id
 option.
 .TP
+.B modify
+Replace mode. Existing 802.1Q tag is replaced. Requires at least
+.B id
+option.
+.TP
 .BI id " VLANID"
 Specify the VLAN ID to encapsulate into.
 .I VLANID
diff --git a/tc/m_vlan.c b/tc/m_vlan.c
index 05a63b4..6de1c49 100644
--- a/tc/m_vlan.c
+++ b/tc/m_vlan.c
@@ -23,6 +23,7 @@ static void explain(void)
 {
 	fprintf(stderr, "Usage: vlan pop\n");
 	fprintf(stderr, "       vlan push [ protocol VLANPROTO ] id VLANID [ priority VLANPRIO ] [CONTROL]\n");
+	fprintf(stderr, "       vlan modify [ protocol VLANPROTO ] id VLANID [ priority VLANPRIO ] [CONTROL]\n");
 	fprintf(stderr, "       VLANPROTO is one of 802.1Q or 802.1AD\n");
 	fprintf(stderr, "            with default: 802.1Q\n");
 	fprintf(stderr, "       CONTROL := reclassify | pipe | drop | continue | pass\n");
@@ -34,6 +35,21 @@ static void usage(void)
 	exit(-1);
 }
 
+static bool has_push_attribs(int action)
+{
+	return action == TCA_VLAN_ACT_PUSH || action == TCA_VLAN_ACT_MODIFY;
+}
+
+static const char *action_name(int action)
+{
+	static const char * const names[] = {
+		[TCA_VLAN_ACT_POP] = "pop",
+		[TCA_VLAN_ACT_PUSH] = "push",
+		[TCA_VLAN_ACT_MODIFY] = "modify",
+	};
+	return names[action];
+}
+
 static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
 		      int tca_id, struct nlmsghdr *n)
 {
@@ -71,9 +87,17 @@ static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
 				return -1;
 			}
 			action = TCA_VLAN_ACT_PUSH;
+		} else if (matches(*argv, "modify") == 0) {
+			if (action) {
+				fprintf(stderr, "unexpected \"%s\" - action already specified\n",
+					*argv);
+				explain();
+				return -1;
+			}
+			action = TCA_VLAN_ACT_MODIFY;
 		} else if (matches(*argv, "id") == 0) {
-			if (action != TCA_VLAN_ACT_PUSH) {
-				fprintf(stderr, "\"%s\" is only valid for push\n",
+			if (!has_push_attribs(action)) {
+				fprintf(stderr, "\"%s\" is only valid for push/modify\n",
 					*argv);
 				explain();
 				return -1;
@@ -83,8 +107,8 @@ static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
 				invarg("id is invalid", *argv);
 			id_set = 1;
 		} else if (matches(*argv, "protocol") == 0) {
-			if (action != TCA_VLAN_ACT_PUSH) {
-				fprintf(stderr, "\"%s\" is only valid for push\n",
+			if (!has_push_attribs(action)) {
+				fprintf(stderr, "\"%s\" is only valid for push/modify\n",
 					*argv);
 				explain();
 				return -1;
@@ -94,8 +118,8 @@ static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
 				invarg("protocol is invalid", *argv);
 			proto_set = 1;
 		} else if (matches(*argv, "priority") == 0) {
-			if (action != TCA_VLAN_ACT_PUSH) {
-				fprintf(stderr, "\"%s\" is only valid for push\n",
+			if (!has_push_attribs(action)) {
+				fprintf(stderr, "\"%s\" is only valid for push/modify\n",
 					*argv);
 				explain();
 				return -1;
@@ -129,8 +153,9 @@ static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
 		}
 	}
 
-	if (action == TCA_VLAN_ACT_PUSH && !id_set) {
-		fprintf(stderr, "id needs to be set for push\n");
+	if (has_push_attribs(action) && !id_set) {
+		fprintf(stderr, "id needs to be set for %s\n",
+			action_name(action));
 		explain();
 		return -1;
 	}
@@ -186,7 +211,8 @@ static int print_vlan(struct action_util *au, FILE *f, struct rtattr *arg)
 		fprintf(f, " pop");
 		break;
 	case TCA_VLAN_ACT_PUSH:
-		fprintf(f, " push");
+	case TCA_VLAN_ACT_MODIFY:
+		fprintf(f, " %s", action_name(parm->v_action));
 		if (tb[TCA_VLAN_PUSH_VLAN_ID]) {
 			val = rta_getattr_u16(tb[TCA_VLAN_PUSH_VLAN_ID]);
 			fprintf(f, " id %u", val);
-- 
1.9.1

^ permalink raw reply related

* [PATCH] ipvlan: fix building without netfilter
From: Arnd Bergmann @ 2016-09-22  9:40 UTC (permalink / raw)
  To: David S. Miller
  Cc: Arnd Bergmann, Mahesh Bandewar, David Ahern, Eric Dumazet, netdev,
	linux-kernel

The new l3s mode in ipvlan relies on netfilter interfaces, but
the ipvlan driver can be configured when CONFIG_NETFILTER is disabled,
leading to a build error:

drivers/net/ipvlan/ipvlan.h:132:22: error: 'struct nf_hook_state' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
drivers/net/ipvlan/ipvlan_main.c:14:27: error: array type has incomplete element type 'struct nf_hook_ops'
...

This adds a forward declaration for struct nf_hook_state, and hides
the newly added l3s code in an #ifdef.

Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ipvlan/ipvlan.h      | 1 +
 drivers/net/ipvlan/ipvlan_main.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 7e0732f5ea07..e239b90e42e0 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -128,6 +128,7 @@ bool ipvlan_addr_busy(struct ipvl_port *port, void *iaddr, bool is_v6);
 void ipvlan_ht_addr_del(struct ipvl_addr *addr);
 struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb,
 			      u16 proto);
+struct nf_hook_state;
 unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
 			     const struct nf_hook_state *state);
 #endif /* __IPVLAN_H */
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index f442eb366863..19e8b0b1b56f 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -9,6 +9,7 @@
 
 #include "ipvlan.h"
 
+#ifdef CONFIG_NETFILTER
 static u32 ipvl_nf_hook_refcnt = 0;
 
 static struct nf_hook_ops ipvl_nfops[] __read_mostly = {
@@ -25,6 +26,7 @@ static struct nf_hook_ops ipvl_nfops[] __read_mostly = {
 		.priority = INT_MAX,
 	},
 };
+#endif
 
 static struct l3mdev_ops ipvl_l3mdev_ops __read_mostly = {
 	.l3mdev_l3_rcv = ipvlan_l3_rcv,
@@ -37,6 +39,7 @@ static void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev)
 
 static int ipvlan_register_nf_hook(void)
 {
+#ifdef CONFIG_NETFILTER
 	int err = 0;
 
 	if (!ipvl_nf_hook_refcnt) {
@@ -48,15 +51,20 @@ static int ipvlan_register_nf_hook(void)
 	}
 
 	return err;
+#else
+	return -EINVAL;
+#endif
 }
 
 static void ipvlan_unregister_nf_hook(void)
 {
+#ifdef CONFIG_NETFILTER
 	WARN_ON(!ipvl_nf_hook_refcnt);
 
 	ipvl_nf_hook_refcnt--;
 	if (!ipvl_nf_hook_refcnt)
 		_nf_unregister_hooks(ipvl_nfops, ARRAY_SIZE(ipvl_nfops));
+#endif
 }
 
 static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
-- 
2.9.0

^ permalink raw reply related

* Re: [patch net-next 1/6] fib: introduce FIB notification infrastructure
From: Jiri Pirko @ 2016-09-22  9:43 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa,
	nikolay, linville, andy, f.fainelli, dsa, jhs, vivien.didelot,
	andrew, ivecera, kaber, john
In-Reply-To: <20160922051350.GA24119@splinter>

Thu, Sep 22, 2016 at 07:13:50AM CEST, idosch@idosch.org wrote:
>On Wed, Sep 21, 2016 at 01:53:09PM +0200, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> This allows to pass information about added/deleted FIB entries/rules to
>> whoever is interested. This is done in a very similar way as devinet
>> notifies address additions/removals.
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>
>[...]
>
>>  #include <linux/slab.h>
>>  #include <linux/export.h>
>>  #include <linux/vmalloc.h>
>> +#include <linux/notifier.h>
>>  #include <net/net_namespace.h>
>>  #include <net/ip.h>
>>  #include <net/protocol.h>
>> @@ -84,6 +85,44 @@
>>  #include <trace/events/fib.h>
>>  #include "fib_lookup.h"
>>  
>> +static BLOCKING_NOTIFIER_HEAD(fib_chain);
>> +
>> +int register_fib_notifier(struct notifier_block *nb)
>> +{
>> +	return blocking_notifier_chain_register(&fib_chain, nb);
>> +}
>> +EXPORT_SYMBOL(register_fib_notifier);
>
>If we remove and insert the switch driver, then the existing FIB entries
>should be replayed when we register our notification block. Otherwise,
>all of these entries will be missing from the switch's tables. I believe
>it should be handled like register_netdevice_notifier(), where
>"registration and up events are replayed".

Agreed. This patchset does not change this behaviour as the original
switchdev solution has the same "ordering" issue. We should take care of
it as a part of yours ordering patchsets.

^ permalink raw reply

* Re: [patch net-next 3/6] mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls
From: Jiri Pirko @ 2016-09-22  9:47 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa,
	nikolay, linville, andy, f.fainelli, dsa, jhs, vivien.didelot,
	andrew, ivecera, kaber, john
In-Reply-To: <20160922065102.GA13287@splinter>

Thu, Sep 22, 2016 at 08:51:02AM CEST, idosch@idosch.org wrote:
>On Wed, Sep 21, 2016 at 01:53:11PM +0200, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Until now, in order to offload a FIB entry to HW we use switchdev op.
>> However that has limits. Mainly in case we need to make the HW aware of
>> all route prefixes configured in kernel. HW needs to know those in order
>> to properly trap appropriate packets and pass the to kernel to do
>> the forwarding. Abort mechanism is now handled within the mlxsw driver.
>
>FWIW, I think it's smart to move abort into the driver instead of
>flushing all the routes from the namespace as before.
>
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>
>[...]
>
>> +static void mlxsw_sp_router_fib4_abort(struct mlxsw_sp *mlxsw_sp)
>> +{
>> +	char ralue_pl[MLXSW_REG_RALUE_LEN];
>> +	struct mlxsw_resources *resources;
>> +	struct mlxsw_sp_fib_entry *fib_entry;
>> +	struct mlxsw_sp_fib_entry *tmp;
>> +	struct mlxsw_sp_vr *vr;
>> +	int i;
>> +	int err;
>> +
>> +	resources = mlxsw_core_resources_get(mlxsw_sp->core);
>> +	for (i = 0; i < resources->max_virtual_routers; i++) {
>> +		vr = &mlxsw_sp->router.vrs[i];
>> +		if (!vr->used)
>> +			continue;
>> +
>> +		list_for_each_entry_safe(fib_entry, tmp,
>> +					 &vr->fib->entry_list, list) {
>> +			fib_info_offload_dec(fib_entry->fi);
>> +			mlxsw_sp_fib_entry_del(mlxsw_sp, fib_entry);
>> +			mlxsw_sp_fib_entry_remove(fib_entry->vr->fib,
>> +						  fib_entry);
>> +			mlxsw_sp_fib_entry_put_all(mlxsw_sp, fib_entry);
>
>If we now do the routing in slow path, then maybe it makes sense to also
>flush all the neighbour entries and prevent new neighbours from being
>programmed into the device?

Hmm, that makes sense. Since this patch does not change the existing
behaiour regarding this, I would like to address that in follow-up
patch.



>
>> +		}
>> +	}
>> +	mlxsw_sp->router.aborted = true;
>> +
>> +	mlxsw_reg_ralue_pack4(ralue_pl, MLXSW_SP_L3_PROTO_IPV4,
>> +			      MLXSW_REG_RALUE_OP_WRITE_WRITE, 0, 0, 0);
>
>I'm not sure about that, but the loop above removed all the tables from
>the device and now you are using table 0 again. Will this work w/o
>binding some tree to it (0?)?

You are right. Will fix.



>
>> +	mlxsw_reg_ralue_act_ip2me_pack(ralue_pl);
>> +	err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ralue), ralue_pl);
>> +	if (err)
>> +		dev_warn(mlxsw_sp->bus_info->dev, "Failed to set abort trap.\n");
>> +}
>
>Thanks

^ permalink raw reply

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Thomas Graf @ 2016-09-22  9:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Daniel Mack, htejun, daniel, ast, davem, kafai, fw, harald,
	netdev, sargun, cgroups
In-Reply-To: <20160922092138.GA12108@salvia>

On 09/22/16 at 11:21am, Pablo Neira Ayuso wrote:
> I have a hard time to buy this new specific hook, I think we should
> shift focus of this debate, this is my proposal to untangle this:
> 
> You add a net/netfilter/nft_bpf.c expression that allows you to run
> bpf programs from nf_tables. This expression can either run bpf
> programs in a similar fashion to tc+bpf or run the bpf program that
> you have attached to the cgroup.

So for every packet processed, you want to require the user to load
and run a (unJITed) nft program acting as a wrapper to run a JITed
BPF program? What it the benefit of this model compared to what Daniel
is proposing? The hooking point is the same. This only introduces
additional per packet overhead in the fast path. Am I missing something?

^ permalink raw reply

* Re: XDP (eXpress Data Path) documentation
From: Jesper Dangaard Brouer via iovisor-dev @ 2016-09-22 10:01 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Nathan Willis, Alexei Starovoitov,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org,
	Jonathan Corbet, linux-doc-u79uwXL29TY76Z2rM5mHXA, Saeed Mahameed
In-Reply-To: <CALx6S37W2qfceOUDnPvFM5PuGp4G3cWjgrQP5_W9Veosic61pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, 21 Sep 2016 17:03:24 -0700
Tom Herbert <tom-BjP2VixgY4xUbtYUoyoikg@public.gmane.org> wrote:

> On Tue, Sep 20, 2016 at 2:08 AM, Jesper Dangaard Brouer
> <brouer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > Hi all,
> >
> > As promised, I've started documenting the XDP eXpress Data Path):
> >
> >  [1] https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/index.html
> >
> > IMHO the documentation have reached a stage where it is useful for the
> > XDP project, BUT I request collaboration on improving the documentation
> > from all. (Native English speakers are encouraged to send grammar fixes ;-))
> >  
> Hi Jesper,
> 
> Thanks for taking the initiative on the this, The document reads more
> like a design doc than description right now, that's probably okay
> since we could use a design doc.

Yes, I fully agree.

I want to state very clearly, this document is not an attempt to hijack
the XDP project and control the "spec".  This is an attempt to collaborate.
We discuss things on the mailing list, each with our own vision of the
project, and most times we reach an agreement. But nobody document this
agreement. 

Month later, we make implementation choices that goes against these
agreements, because we simply forgot.  If someone remembers, we have to
reiterate the same arguments again (like it just happened with the
XDP_ABORTED action, my mistake).  And can anybody remember the
consensus around VLANs, it never got implemented that way...

I had to start the doc project somewhere, so I dumped my own vision
into the docs, and what I could remember from upstream discussions.
I need collaboration from others to adjust and "fix" my vision of the
project, into something that becomes a common ground we all can agree
on.

If some part of the docs provoke you, good, then you have a reason to
correct and fix it.  I'll do my best to keep an very open-mind about
any changes.  This should be a very "live" document.  


> Under "Important to understand" there are some disclaimers that XDP
> does not implement qdiscs or BQL and fairness otherwise. This is true
> for it's own traffic, but it does not (or at least should not) affect
> these mechanisms or normal stack traffic running simultaneously. I
> think we've made assumptions about fairness between XDP and non-XDP
> queues, we probably want to clarify fairness (and also validate
> whatever assumptions we've made with testing).

I love people pointing out mistakes in the documentation.  I want
update this ASAP when people point it out.  I'll even do the work of
integrating and committing these changes, for people too lazy to git
clone the repo themselves.

For you section Tom, I fully agree, but I don't know how to formulate
and adjust this in the text.

For people that want to edit the docs, notice the link "Edit on GitHub"
which takes you directly to the file you need to edit...



> > You wouldn't believe it: But this pretty looking documentation actually
> > follows the new Kernel documentation format.  It is actually just
> > ".rst" text files stored in my github repository under kernel/Documentation [2]
> >
> >  [2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/Documentation
> >
> > Thus, just git clone my repository and started editing and send me
> > patches (or github pull requests). Like:
> >
> >  $ git clone https://github.com/netoptimizer/prototype-kernel
> >  $ cd prototype-kernel/kernel/Documentation/
> >  $ make html
> >  $ firefox _build/html/index.html &
> >
> > This new documentation format combines the best of two worlds, pretty
> > online browser documentation with almost plain text files, and changes
> > being tracked via git commits [3] (and auto git hooks to generate the
> > readthedocs.org page). You got to love it! :-)
> >
> > --
> > Best regards,
> >   Jesper Dangaard Brouer
> >   MSc.CS, Principal Kernel Engineer at Red Hat
> >   Author of http://www.iptv-analyzer.org
> >   LinkedIn: http://www.linkedin.com/in/brouer
> >
> > [3] https://github.com/netoptimizer/prototype-kernel/commits/master  

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [Xen-devel] [PATCH resend] xen-netback: switch to threaded irq for control ring
From: Juergen Gross @ 2016-09-22 10:16 UTC (permalink / raw)
  To: Paul Durrant, xen-devel@lists.xenproject.org,
	netdev@vger.kernel.orga, linux-kernel@vger.kernel.org
  Cc: Wei Liu
In-Reply-To: <37d32749052f4ff887c5637671085389@AMSPEX02CL03.citrite.net>

On 22/09/16 11:09, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of
>> Juergen Gross
>> Sent: 22 September 2016 10:03
>> To: xen-devel@lists.xenproject.org; netdev@vger.kernel.orga; linux-
>> kernel@vger.kernel.org
>> Cc: Juergen Gross <jgross@suse.com>; Wei Liu <wei.liu2@citrix.com>
>> Subject: [Xen-devel] [PATCH resend] xen-netback: switch to threaded irq for
>> control ring
>>
>> Instead of open coding it use the threaded irq mechanism in xen-netback.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> How have you tested this change?

Only compile-tested and loaded the module. As this feature isn't being
used in linux netfront AFAIK it is not easily testable. OTOH the code
modification is rather limited and I've used the threaded irq in the
Xen scsiback driver myself, so I'm rather confident it will work.


Juergen

^ permalink raw reply

* RE: [Xen-devel] [PATCH resend] xen-netback: switch to threaded irq for control ring
From: Paul Durrant @ 2016-09-22 10:31 UTC (permalink / raw)
  To: Juergen Gross, xen-devel@lists.xenproject.org,
	netdev@vger.kernel.orga, linux-kernel@vger.kernel.org
  Cc: Wei Liu
In-Reply-To: <2f3c7f1b-6a36-8aaa-e10a-41a1d2cf37d8@suse.com>

> -----Original Message-----
> From: Juergen Gross [mailto:jgross@suse.com]
> Sent: 22 September 2016 11:17
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org;
> netdev@vger.kernel.orga <netdev@vger.kernel.org>; linux-
> kernel@vger.kernel.org
> Cc: Wei Liu <wei.liu2@citrix.com>
> Subject: Re: [Xen-devel] [PATCH resend] xen-netback: switch to threaded irq
> for control ring
> 
> On 22/09/16 11:09, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of
> >> Juergen Gross
> >> Sent: 22 September 2016 10:03
> >> To: xen-devel@lists.xenproject.org; netdev@vger.kernel.orga; linux-
> >> kernel@vger.kernel.org
> >> Cc: Juergen Gross <jgross@suse.com>; Wei Liu <wei.liu2@citrix.com>
> >> Subject: [Xen-devel] [PATCH resend] xen-netback: switch to threaded
> >> irq for control ring
> >>
> >> Instead of open coding it use the threaded irq mechanism in xen-netback.
> >>
> >> Signed-off-by: Juergen Gross <jgross@suse.com>
> >
> > How have you tested this change?
> 
> Only compile-tested and loaded the module. As this feature isn't being used
> in linux netfront AFAIK it is not easily testable. OTOH the code modification is
> rather limited and I've used the threaded irq in the Xen scsiback driver
> myself, so I'm rather confident it will work.
> 

OK. How about doing the rx interrupt/task too so that it can be easily tested with a linux netfront?

  Paul

> 
> Juergen


^ permalink raw reply

* Re: [PATCH net-next 2/3] udp: implement memory accounting helpers
From: Paolo Abeni @ 2016-09-22 10:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David S. Miller, James Morris, Trond Myklebust,
	Alexander Duyck, Daniel Borkmann, Eric Dumazet, Tom Herbert,
	Hannes Frederic Sowa, linux-nfs
In-Reply-To: <1474500682.23058.88.camel@edumazet-glaptop3.roam.corp.google.com>

Hi Eric,

On Wed, 2016-09-21 at 16:31 -0700, Eric Dumazet wrote:
> Also does inet_diag properly give the forward_alloc to user ?
> 
> $ ss -mua
> State      Recv-Q Send-Q Local Address:Port                 Peer Addres
> s:Port
> UNCONN     51584  0          *:52460      *:*                    
> 	 skmem:(r51584,rb327680,t0,tb327680,f1664,w0,o0,bl0,d575)

Thank you very much for reviewing this! 

My bad, there is still a race which leads to temporary negative values
of fwd. I feel the fix is trivial but it needs some investigation.

> Couldn't we instead use an union of an atomic_t and int for
> sk->sk_forward_alloc ?

That was our first attempt, but we had some issue on mem scheduling; if
we use:

   if (atomic_sub_return(size, &sk->sk_forward_alloc_atomic) < 0) {
        // fwd alloc
   }

that leads to inescapable, temporary, negative value for
sk->sk_forward_alloc.

Another option would be:

again:
        fwd = atomic_read(&sk->sk_forward_alloc_atomic);
        if (fwd > size) {
		if (atomic_cmpxchg(&sk->sk_forward_alloc_atomic, fwd, fwd - size) != fwd)
			goto again;
        } else 
            // fwd alloc

which would be bad under high contention.

I think that using a single atomic to track both scheduling and reclaim
requires a similar loop to resolve reclaim contention. They should be
very rare, especially if we can avoid reclaiming on dequeue, but still
will complicate the code.

Finally, pahole shows that the number of cacheline used by an udp socket
on x86_64 does not change adding the two new atomic fields.

> All udp queues/dequeues would manipulate the atomic_t using regular
> atomic ops and use a special skb destructor (instead of sock_rfree())

I tried the special skb destructor and discarded it, but thinking again,
now I feel it can simplify the code, regardless of the used schema.

We will still need to replace the current in-kernel
skb_free_datagram_locked() for udp with something else, so a bit of
noise in the sunrpc subsystem will remain.

> Also I would not bother 'reclaiming' forward_alloc at dequeue, unless
> udp is under memory pressure.

We discarded that option because it would change significantly the
system behavior, but if there is agreement on it, I think that it would
make a really relevant (positive) difference!

> Please share your performance numbers, thanks !

Tested using pktgen with random src port, 64 bytes packet, wire-speed on
an 10G link as sender and udp_sink by Jesper
(https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c) as receiver, using l4 tuple rxhash to stress the contention, and one or more udp_sink instance with reuseport. 

nr readers	Kpps (vanilla)	Kpps (patched)
1		241		600
3		1800		2200
6		3450		3950
9		5100		5400
12		6400		6650

Thank you for the very nice suggestions!

Paolo

^ permalink raw reply

* Re: [Xen-devel] [PATCH resend] xen-netback: switch to threaded irq for control ring
From: Juergen Gross @ 2016-09-22 10:39 UTC (permalink / raw)
  To: Paul Durrant, xen-devel@lists.xenproject.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
  Cc: Wei Liu
In-Reply-To: <78418d2acd0446e8bbbd634377b9e263@AMSPEX02CL03.citrite.net>

On 22/09/16 12:31, Paul Durrant wrote:
>> -----Original Message-----
>> From: Juergen Gross [mailto:jgross@suse.com]
>> Sent: 22 September 2016 11:17
>> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org;
>> netdev@vger.kernel.orga <netdev@vger.kernel.org>; linux-
>> kernel@vger.kernel.org
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> Subject: Re: [Xen-devel] [PATCH resend] xen-netback: switch to threaded irq
>> for control ring
>>
>> On 22/09/16 11:09, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of
>>>> Juergen Gross
>>>> Sent: 22 September 2016 10:03
>>>> To: xen-devel@lists.xenproject.org; netdev@vger.kernel.orga; linux-
>>>> kernel@vger.kernel.org
>>>> Cc: Juergen Gross <jgross@suse.com>; Wei Liu <wei.liu2@citrix.com>
>>>> Subject: [Xen-devel] [PATCH resend] xen-netback: switch to threaded
>>>> irq for control ring
>>>>
>>>> Instead of open coding it use the threaded irq mechanism in xen-netback.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>>> How have you tested this change?
>>
>> Only compile-tested and loaded the module. As this feature isn't being used
>> in linux netfront AFAIK it is not easily testable. OTOH the code modification is
>> rather limited and I've used the threaded irq in the Xen scsiback driver
>> myself, so I'm rather confident it will work.
>>
> 
> OK. How about doing the rx interrupt/task too so that it can be easily tested with a linux netfront?

I'd like to, but this would require some more work. The rx kthread isn't
activated by an event only, but by a timer, too. This isn't easy to map
to the threaded irq framework. If, however, you are confident the timer
isn't really necessary I'd be happy to provide a patch switching the
rx task to the threaded irq, too.

And to be honest: this wouldn't verify that the control ring related
patch is really working. The mechanism itself _is_ working as it is
already in use in xen-scsiback in a very similar environment.


Juergen

^ permalink raw reply

* RE: [Xen-devel] [PATCH resend] xen-netback: switch to threaded irq for control ring
From: Paul Durrant @ 2016-09-22 10:41 UTC (permalink / raw)
  To: Juergen Gross, xen-devel@lists.xenproject.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
  Cc: Wei Liu
In-Reply-To: <a6111109-2f9f-d804-8040-bde575d2a769@suse.com>

> -----Original Message-----
> From: Juergen Gross [mailto:jgross@suse.com]
> Sent: 22 September 2016 11:39
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Wei Liu <wei.liu2@citrix.com>
> Subject: Re: [Xen-devel] [PATCH resend] xen-netback: switch to threaded irq
> for control ring
> 
> On 22/09/16 12:31, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Juergen Gross [mailto:jgross@suse.com]
> >> Sent: 22 September 2016 11:17
> >> To: Paul Durrant <Paul.Durrant@citrix.com>;
> >> xen-devel@lists.xenproject.org; netdev@vger.kernel.orga
> >> <netdev@vger.kernel.org>; linux- kernel@vger.kernel.org
> >> Cc: Wei Liu <wei.liu2@citrix.com>
> >> Subject: Re: [Xen-devel] [PATCH resend] xen-netback: switch to
> >> threaded irq for control ring
> >>
> >> On 22/09/16 11:09, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf
> >>>> Of Juergen Gross
> >>>> Sent: 22 September 2016 10:03
> >>>> To: xen-devel@lists.xenproject.org; netdev@vger.kernel.orga; linux-
> >>>> kernel@vger.kernel.org
> >>>> Cc: Juergen Gross <jgross@suse.com>; Wei Liu <wei.liu2@citrix.com>
> >>>> Subject: [Xen-devel] [PATCH resend] xen-netback: switch to threaded
> >>>> irq for control ring
> >>>>
> >>>> Instead of open coding it use the threaded irq mechanism in xen-
> netback.
> >>>>
> >>>> Signed-off-by: Juergen Gross <jgross@suse.com>
> >>>
> >>> How have you tested this change?
> >>
> >> Only compile-tested and loaded the module. As this feature isn't
> >> being used in linux netfront AFAIK it is not easily testable. OTOH
> >> the code modification is rather limited and I've used the threaded
> >> irq in the Xen scsiback driver myself, so I'm rather confident it will work.
> >>
> >
> > OK. How about doing the rx interrupt/task too so that it can be easily
> tested with a linux netfront?
> 
> I'd like to, but this would require some more work. The rx kthread isn't
> activated by an event only, but by a timer, too. This isn't easy to map to the
> threaded irq framework. If, however, you are confident the timer isn't really
> necessary I'd be happy to provide a patch switching the rx task to the
> threaded irq, too.
> 
> And to be honest: this wouldn't verify that the control ring related patch is
> really working. The mechanism itself _is_ working as it is already in use in
> xen-scsiback in a very similar environment.
> 

Ok. If you have confidence then...

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> 
> Juergen

^ permalink raw reply

* Re: [PATCH resend 2] xen-netback: switch to threaded irq for control ring
From: Wei Liu @ 2016-09-22 10:55 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, netdev, linux-kernel, wei.liu2
In-Reply-To: <1474535185-15734-1-git-send-email-jgross@suse.com>

On Thu, Sep 22, 2016 at 11:06:25AM +0200, Juergen Gross wrote:
> Instead of open coding it use the threaded irq mechanism in
> xen-netback.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply

* [PATCH net] net: rtnl_register in net_ns_init need rtnl_lock
From: Hannes Frederic Sowa @ 2016-09-22 11:03 UTC (permalink / raw)
  To: netdev

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/core/net_namespace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b629b11d..a2ace299f28355 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -758,9 +758,11 @@ static int __init net_ns_init(void)
 
 	register_pernet_subsys(&net_ns_ops);
 
+	rtnl_lock();
 	rtnl_register(PF_UNSPEC, RTM_NEWNSID, rtnl_net_newid, NULL, NULL);
 	rtnl_register(PF_UNSPEC, RTM_GETNSID, rtnl_net_getid, rtnl_net_dumpid,
 		      NULL);
+	rtnl_unlock();
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v2] tcp: fix wrong checksum calculation on MTU probing
From: Sergei Shtylyov @ 2016-09-22 11:12 UTC (permalink / raw)
  To: Douglas Caetano dos Santos, David Miller
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <2923a9bc-c433-7546-cd4a-9f834bd43181@taghos.com.br>

Hello.

On 9/21/2016 9:26 PM, Douglas Caetano dos Santos wrote:

> With TCP MTU probing enabled and offload TX checksumming disabled,
> tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
> into the probe's SKB had an odd length. This was caused by the direct use
> of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
> fragment being copied, if needed. When this fragment was not the last, a
> subsequent call used the previous checksum without considering this
> padding.
>
> The effect was a stale connection in one way, as even retransmissions
> wouldn't solve the problem, because the checksum was never recalculated for
> the full SKB length.
>
> Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
> ---
>  net/ipv4/tcp_output.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index f53d0cc..767135e 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1968,10 +1968,12 @@ static int tcp_mtu_probe(struct sock *sk)
>  		copy = min_t(int, skb->len, probe_size - len);
>  		if (nskb->ip_summed)
>  			skb_copy_bits(skb, 0, skb_put(nskb, copy), copy);
> -		else
> -			nskb->csum = skb_copy_and_csum_bits(skb, 0,
> -							    skb_put(nskb, copy),
> -							    copy, nskb->csum);
> +		else {

    CodingStyle: now the first branch needs {} too.

[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] net: ethernet: mediatek: add the dts property to set if TRGMII supported on GMAC0
From: Sergei Shtylyov @ 2016-09-22 11:28 UTC (permalink / raw)
  To: sean.wang, john, davem
  Cc: nbd, netdev, linux-kernel, linux-mediatek, andrew, f.fainelli,
	keyhaede, objelf
In-Reply-To: <1474511636-11644-4-git-send-email-sean.wang@mediatek.com>

Hello.

On 9/22/2016 5:33 AM, sean.wang@mediatek.com wrote:

> From: Sean Wang <sean.wang@mediatek.com>
>
> Add the dts property for the capability if TRGMII supported on GAMC0
>
> Signed-off-by: Sean Wang <sean.wang@mediatek.com>
> ---
>  Documentation/devicetree/bindings/net/mediatek-net.txt | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt
> index 6103e55..7111278 100644
> --- a/Documentation/devicetree/bindings/net/mediatek-net.txt
> +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
> @@ -31,7 +31,10 @@ Optional properties:
>  Required properties:
>  - compatible: Should be "mediatek,eth-mac"
>  - reg: The number of the MAC
> -- phy-handle: see ethernet.txt file in the same directory.
> +- phy-handle: see ethernet.txt file in the same directory and
> +	the phy-mode "trgmii" required being provided when reg

   Since you've modified the generic parser of the "phy-mode" to add your 
"trgmii", you also need to update ethernet.txt...

> +	is equal to 0 and the MAC uses fixed-link to connect
> +	with inernal switch such as MT7530.

   Internal.

[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next v2 1/3] net: ethernet: mediatek: add extension of phy-mode for TRGMII
From: Sergei Shtylyov @ 2016-09-22 11:30 UTC (permalink / raw)
  To: sean.wang, john, davem
  Cc: nbd, netdev, linux-kernel, linux-mediatek, andrew, f.fainelli,
	keyhaede, objelf
In-Reply-To: <1474511636-11644-2-git-send-email-sean.wang@mediatek.com>

Hello.

On 9/22/2016 5:33 AM, sean.wang@mediatek.com wrote:

> From: Sean Wang <sean.wang@mediatek.com>
>
> adds PHY-mode "trgmii" as an extension for the operation
> mode of the PHY interface for PHY_INTERFACE_MODE_TRGMII.
> and adds a variable trgmii inside mtk_mac as the indication
> to make the difference between the MAC connected to internal
> switch or connected to external PHY by the given configuration
> on the board and then to perform the corresponding setup on
> TRGMII hardware module.
>
> Signed-off-by: Sean Wang <sean.wang@mediatek.com>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 ++
>  drivers/net/ethernet/mediatek/mtk_eth_soc.h | 3 +++
>  include/linux/phy.h                         | 3 +++
>  3 files changed, 8 insertions(+)
>
> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> index ca6b501..827f4bd 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> @@ -244,6 +244,8 @@ static int mtk_phy_connect(struct mtk_mac *mac)
>  		return -ENODEV;
>
>  	switch (of_get_phy_mode(np)) {
> +	case PHY_INTERFACE_MODE_TRGMII:
> +		mac->trgmii = true;
>  	case PHY_INTERFACE_MODE_RGMII_TXID:
>  	case PHY_INTERFACE_MODE_RGMII_RXID:
>  	case PHY_INTERFACE_MODE_RGMII_ID:
> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> index 7c5e534..e3b9525 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
> @@ -529,6 +529,8 @@ struct mtk_eth {
>   * @hw:			Backpointer to our main datastruture
>   * @hw_stats:		Packet statistics counter
>   * @phy_dev:		The attached PHY if available
> + * @trgmii		Indicate if the MAC uses TRGMII connected to internal
> +			switch
>   */
>  struct mtk_mac {
>  	int				id;
> @@ -539,6 +541,7 @@ struct mtk_mac {
>  	struct phy_device		*phy_dev;
>  	__be32				hwlro_ip[MTK_MAX_LRO_IP_CNT];
>  	int				hwlro_ip_cnt;
> +	bool				trgmii;

     I don't see where this is used.

[...]
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 2d24b28..e25f183 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -80,6 +80,7 @@ typedef enum {
>  	PHY_INTERFACE_MODE_XGMII,
>  	PHY_INTERFACE_MODE_MOCA,
>  	PHY_INTERFACE_MODE_QSGMII,
> +	PHY_INTERFACE_MODE_TRGMII,
>  	PHY_INTERFACE_MODE_MAX,
>  } phy_interface_t;
>
> @@ -123,6 +124,8 @@ static inline const char *phy_modes(phy_interface_t interface)
>  		return "moca";
>  	case PHY_INTERFACE_MODE_QSGMII:
>  		return "qsgmii";
> +	case PHY_INTERFACE_MODE_TRGMII:
> +		return "trgmii";
>  	default:
>  		return "unknown";
>  	}

    I think this should be done in a separate phylib patch.

MBR, Sergei

^ permalink raw reply

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Pablo Neira Ayuso @ 2016-09-22 12:05 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Daniel Mack, htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160922095411.GA5654-4EA/1caXOu0mYvmMESoHnA@public.gmane.org>

On Thu, Sep 22, 2016 at 11:54:11AM +0200, Thomas Graf wrote:
> On 09/22/16 at 11:21am, Pablo Neira Ayuso wrote:
> > I have a hard time to buy this new specific hook, I think we should
> > shift focus of this debate, this is my proposal to untangle this:
> >
> > You add a net/netfilter/nft_bpf.c expression that allows you to run
> > bpf programs from nf_tables. This expression can either run bpf
> > programs in a similar fashion to tc+bpf or run the bpf program that
> > you have attached to the cgroup.
>
> So for every packet processed, you want to require the user to load
> and run a (unJITed) nft program acting as a wrapper to run a JITed
> BPF program? What it the benefit of this model compared to what Daniel
> is proposing? The hooking point is the same. This only introduces
> additional per packet overhead in the fast path. Am I missing
> something?

Have a look at net/ipv4/netfilter/nft_chain_route_ipv4.c for instance.
In your case, you have to add a new chain type:

static const struct nf_chain_type nft_chain_bpf = {
        .name           = "bpf",
        .type           = NFT_CHAIN_T_BPF,
        ...
        .hooks          = {
                [NF_INET_LOCAL_IN]      = nft_do_bpf,
                [NF_INET_LOCAL_OUT]     = nft_do_bpf,
                [NF_INET_FORWARD]       = nft_do_bpf,
                [NF_INET_PRE_ROUTING]   = nft_do_bpf,
                [NF_INET_POST_ROUTING]  = nft_do_bpf,
        },
};

nft_do_bpf() is the raw netfilter hook that you register, this hook
will just execute to iterate over the list of bpf filters and run
them.

This new chain is created on demand, so no overhead if not needed, eg.

nft add table bpf
nft add chain bpf input { type bpf hook output priority 0\; }

Then, you add a rule for each bpf program you want to run, just like
tc+bpf.

Benefits are, rewording previous email:

* You get access to all of the existing netfilter hooks in one go
  to run bpf programs. No need for specific redundant hooks. This
  provides raw access to the netfilter hook, you define the little
  code that your hook runs before you bpf run invocation. So there
  is *no need to bloat the stack with more hooks, we use what we
  have.*

* This is consistent to what we offer via tc+bpf, similar design idea.
  Users are already familiar with this approach.

* It becomes easily visible to the user that a bpf program is running
  from whenever in the packet path, so from a sysadmin perspective is
  is easy to dump the configuration via netlink interface using the
  existing tooling in case that troubleshooting is required.

^ permalink raw reply

* Re: [PATCH net-next 0/9] rxrpc: Preparation for slow-start algorithm [ver #2]
From: David Miller @ 2016-09-22 12:15 UTC (permalink / raw)
  To: dhowells; +Cc: netdev, linux-afs, linux-kernel
In-Reply-To: <147453251987.14579.17230566511204847506.stgit@warthog.procyon.org.uk>

From: David Howells <dhowells@redhat.com>
Date: Thu, 22 Sep 2016 09:22:00 +0100

> Tagged thusly:
> 
> 	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
> 	rxrpc-rewrite-20160922-v2

Pulled, thanks David.

^ permalink raw reply

* Re: [PATCH net-next v2 0/3] add support for RGMII on GMAC0 through TRGMII hardware module
From: David Miller @ 2016-09-22 12:22 UTC (permalink / raw)
  To: sean.wang
  Cc: john, nbd, netdev, linux-kernel, linux-mediatek, andrew,
	f.fainelli, keyhaede, objelf
In-Reply-To: <1474511636-11644-1-git-send-email-sean.wang@mediatek.com>

From: <sean.wang@mediatek.com>
Date: Thu, 22 Sep 2016 10:33:53 +0800

> By default, GMAC0 is connected to built-in switch called
> MT7530 through the proprietary interface called Turbo RGMII
> (TRGMII). TRGMII also supports well for RGMII as generic external
> PHY uses but requires some slight changes to the setup of TRGMII 
> and doesn't have well support on current driver.
> 
> So this patchset
> 1) provides the slight changes of the setup for RGMII can work
>    through TRGMII
> 2) adds additional setting "trgmii" as PHY_INTERFACE_MODE_TRGMII 
>    about phy-mode on device tree to make GMAC0 distinguish which
>    mode it runs
> 3) changes dynamically source clock, TX/RX delay and interface
>    mode on TRGMII for adapting various link
> 
> Changes since v1:
> - fixed the style of comment which doesn't have a space at 
>    the beginning and end of comment lines
> - add support for phy-mode "trgmii" as PHY_INTERFACE_MODE_TRGMII 
>    into linux/phy.h
> - enhance the Documentation about device tree binding for trgmii
>   which is applicable only for GMAC0 which uses fixed-link

Series applied.

^ permalink raw reply

* Re: [PATCH net-next] net: ethernet: mediatek: use phydev from struct net_device
From: David Miller @ 2016-09-22 12:23 UTC (permalink / raw)
  To: sean.wang
  Cc: john, nbd, netdev, linux-kernel, linux-mediatek, keyhaede, objelf
In-Reply-To: <1474533215-22273-1-git-send-email-sean.wang@mediatek.com>

From: <sean.wang@mediatek.com>
Date: Thu, 22 Sep 2016 16:33:35 +0800

> From: Sean Wang <sean.wang@mediatek.com>
> 
> reuse phydev already in struct net_device instead of creating
> another new one in private structure.
> 
> Signed-off-by: Sean Wang <sean.wang@mediatek.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: ethernet: mediatek: remove superfluous local variable for phy address
From: David Miller @ 2016-09-22 12:23 UTC (permalink / raw)
  To: sean.wang
  Cc: john, nbd, netdev, linux-kernel, linux-mediatek, keyhaede, objelf
In-Reply-To: <1474533375-22423-1-git-send-email-sean.wang@mediatek.com>

From: <sean.wang@mediatek.com>
Date: Thu, 22 Sep 2016 16:36:15 +0800

> From: Sean Wang <sean.wang@mediatek.com>
> 
> remove the unused variable for parsing PHY address
> and the related logic for sanity test which would
> be all already handled done when of_mdiobus_register
> was called
> 
> Reported-by: Nelson Chang <nelson.chang@mediatek.com>
> Signed-off-by: Sean Wang <sean.wang@mediatek.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: ethernet: mediatek: use [get|set]_link_ksettings
From: David Miller @ 2016-09-22 12:23 UTC (permalink / raw)
  To: sean.wang
  Cc: john, nbd, netdev, linux-kernel, linux-mediatek, keyhaede, objelf
In-Reply-To: <1474533723-22629-1-git-send-email-sean.wang@mediatek.com>

From: <sean.wang@mediatek.com>
Date: Thu, 22 Sep 2016 16:42:03 +0800

> From: Sean Wang <sean.wang@mediatek.com>
> 
> 1) use new api [get|set]_link_ksettings instead
> of [get|set]_settings old ones.
> 
> 2) dev->phydev is sure being ready before calling
> these callbacks, so removing all the sanity check
> if it is existing.
> 
> Signed-off-by: Sean Wang <sean.wang@mediatek.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: ethernet: mediatek: get out of potential invalid pointer access
From: David Miller @ 2016-09-22 12:23 UTC (permalink / raw)
  To: sean.wang
  Cc: john, nbd, netdev, linux-kernel, linux-mediatek, keyhaede, objelf
In-Reply-To: <1474533856-22753-1-git-send-email-sean.wang@mediatek.com>

From: <sean.wang@mediatek.com>
Date: Thu, 22 Sep 2016 16:44:16 +0800

> From: Sean Wang <sean.wang@mediatek.com>
> 
> Potential dangerous invalid pointer might be accessed if
> the error happens when couple phy_device to net_device so
> cleanup the error path.
> 
> Signed-off-by: Sean Wang <sean.wang@mediatek.com>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox