Netdev List


* Re: [PATCH v3 net-next 16/16] tcp_bbr: add BBR congestion control
From: Rick Jones @ 2016-09-19 21:17 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger
  Cc: Neal Cardwell, David Miller, netdev, Van Jacobson, Yuchung Cheng,
	Nandita Dukkipati, Soheil Hassas Yeganeh
In-Reply-To: <CANn89i+MpNPYn=ewi_LioNctNePCuity_UXw5ieU7HFfoZTsGA@mail.gmail.com>

On 09/19/2016 02:10 PM, Eric Dumazet wrote:
> On Mon, Sep 19, 2016 at 1:57 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>
>> Looks good, but could I suggest a simple optimization.
>> All these parameters are immutable in the version of BBR you are submitting.
>> Why not make the values const? And eliminate the always true long-term bw estimate
>> variable?
>>
>
> We could do that.
>
> We used to have variables (aka module params) while BBR was cooking in
> our kernels ;)

Are there better than epsilon odds of someone perhaps wanting to poke 
those values as it gets exposure beyond Google?

happy benchmarking,

rick jones


* Re: [v3 PATCH 1/2] rhashtable: Add rhlist interface
From: Thomas Graf @ 2016-09-19 21:16 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Johannes Berg, David S. Miller, netdev, linux-wireless, tom,
	Ben Greear
In-Reply-To: <E1blwIz-0003Mz-HD@gondolin.me.apana.org.au>

On 09/19/16 at 07:00pm, Herbert Xu wrote:
> The insecure_elasticity setting is an ugly wart brought out by
> users who need to insert duplicate objects (that is, distinct
> objects with identical keys) into the same table.
> 
> In fact, those users have a much bigger problem.  Once those
> duplicate objects are inserted, they don't have an interface to
> find them (unless you count the walker interface which walks
> over the entire table).
> 
> Some users have resorted to doing a manual walk over the hash
> table which is of course broken because they don't handle the
> potential existence of multiple hash tables.  The result is that
> they will break sporadically when they encounter a hash table
> resize/rehash.
> 
> This patch provides a way out for those users, at the expense
> of an extra pointer per object.  Essentially each object is now
> a list of objects carrying the same key.  The hash table will
> only see the lists so nothing changes as far as rhashtable is
> concerned.
> 
> To use this new interface, you need to insert a struct rhlist_head
> into your objects instead of struct rhash_head.  While the hash
> table is unchanged, for type-safety you'll need to use struct
> rhltable instead of struct rhashtable.  All the existing interfaces
> have been duplicated for rhlist, including the hash table walker.
> 
> One missing feature is nulls marking because AFAIK the only potential
> user of it does not need duplicate objects.  Should anyone need
> this it shouldn't be too hard to add.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Nice, I like how this simplifies users! Is this suitable for
ILA as well?

Acked-by: Thomas Graf <tgraf@suug.ch>


* Re: [PATCH v3 net-next 16/16] tcp_bbr: add BBR congestion control
From: Eric Dumazet @ 2016-09-19 21:10 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Neal Cardwell, David Miller, netdev, Van Jacobson, Yuchung Cheng,
	Nandita Dukkipati, Soheil Hassas Yeganeh
In-Reply-To: <20160919135734.04ab5172@xeon-e3>

On Mon, Sep 19, 2016 at 1:57 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:

> Looks good, but could I suggest a simple optimization.
> All these parameters are immutable in the version of BBR you are submitting.
> Why not make the values const? And eliminate the always true long-term bw estimate
> variable?
>

We could do that.

We used to have variables (aka module params) while BBR was cooking in
our kernels ;)

Are you sure the generated code is indeed 'optimized'?


* Re: [PATCHv6 net-next 04/15] bpf: don't (ab)use instructions to store state
From: Daniel Borkmann @ 2016-09-19 21:03 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, kubakici
In-Reply-To: <1474211365-20088-5-git-send-email-jakub.kicinski@netronome.com>

Hi Jakub,

On 09/18/2016 05:09 PM, Jakub Kicinski wrote:
> Storing state in reserved fields of instructions makes
> it impossible to run verifier on programs already
> marked as read-only. Allocate and use an array of
> per-instruction state instead.
>
> While touching the error path rename and move existing
> jump target.
>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>

I believe there's still an issue here. Could you please double check
and confirm?

I rebased my locally pending stuff on top of your set and suddenly my
test case breaks. So I did a bisect and it pointed me to this commit
eventually.

[...]
> @@ -2697,11 +2706,8 @@ static int convert_ctx_accesses(struct verifier_env *env)
>   		else
>   			continue;
>
> -		if (insn->imm != PTR_TO_CTX) {
> -			/* clear internal mark */
> -			insn->imm = 0;
> +		if (env->insn_aux_data[i].ptr_type != PTR_TO_CTX)
>   			continue;
> -		}
>
>   		cnt = env->prog->aux->ops->
>   			convert_ctx_access(type, insn->dst_reg, insn->src_reg,

Looking at the code, I believe the issue is in the above snippet. In the
convert_ctx_accesses() rewrite loop, each time we bpf_patch_insn_single()
a program, the program can grow in size (due to the __sk_buff access
rewrite, for example). After the rewrite, we do 'i += insn_delta' to
adjust for processing the next insn.

However, env->insn_aux_data is allocated under the assumption that the
very initial, pre-verification prog->len doesn't change, right? So in
the above conversion, the access to env->insn_aux_data[i].ptr_type is
off, since after rewrites, index i may no longer map to the ptr_type
that was recorded for that instruction.

I noticed this with direct packet access where suddenly the data vs
data_end test failed and contained some "semi-random" value always
bailing out for me.
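
A minimal userspace sketch of the index drift described above. All types
and helpers here (struct insn, struct insn_aux, patch_insn(),
convert_ctx()) are hypothetical stand-ins, not the verifier's real
structures. It only illustrates why indexing the per-instruction state
with the post-rewrite 'i' goes wrong, and one possible way to keep the
mapping (subtracting the accumulated growth); the actual series may
resolve this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the verifier's types; not kernel code. */
enum ptr_type { NOT_A_PTR = 0, PTR_TO_CTX = 1 };

struct insn { int opcode; };
struct insn_aux { enum ptr_type ptr_type; };

#define MAX_INSNS 16

struct prog {
	struct insn insns[MAX_INSNS];
	size_t len;
};

/* Replace the instruction at 'off' with 'cnt' instructions and shift the
 * tail. This loosely models how a patch helper grows the program.
 */
void patch_insn(struct prog *p, size_t off, const struct insn *patch,
		size_t cnt)
{
	assert(off < p->len && cnt >= 1 && p->len + cnt - 1 <= MAX_INSNS);
	memmove(&p->insns[off + cnt], &p->insns[off + 1],
		(p->len - off - 1) * sizeof(p->insns[0]));
	memcpy(&p->insns[off], patch, cnt * sizeof(patch[0]));
	p->len += cnt - 1;
}

/* Expand every instruction whose recorded state says PTR_TO_CTX into
 * three instructions. 'aux' has one entry per ORIGINAL instruction.
 * With track_delta == false the state is indexed by the current 'i'
 * (the problem described above); with track_delta == true the growth
 * so far is subtracted first. Returns the number of expansions.
 */
size_t convert_ctx(struct prog *p, const struct insn_aux *aux,
		   size_t aux_len, bool track_delta)
{
	const struct insn expanded[3] = { {100}, {101}, {102} };
	size_t i, delta = 0, converted = 0;

	for (i = 0; i < p->len; i++) {
		size_t aux_idx = track_delta ? i - delta : i;

		/* A stale index would read unrelated or out-of-range state
		 * in real code; this sketch treats it as "not a ctx pointer".
		 */
		if (aux_idx >= aux_len ||
		    aux[aux_idx].ptr_type != PTR_TO_CTX)
			continue;

		patch_insn(p, i, expanded, 3);
		i += 2;		/* skip the instructions just inserted */
		delta += 2;	/* program grew by two instructions */
		converted++;
	}
	return converted;
}
```

With four instructions where the first two are ctx accesses, the
untracked variant expands only the first one, because the second
original instruction now sits at index 3 while its state is still at
index 1.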

Thanks,
Daniel


* Re: [PATCH v3 net-next 16/16] tcp_bbr: add BBR congestion control
From: Stephen Hemminger @ 2016-09-19 20:57 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David Miller, netdev, Van Jacobson, Yuchung Cheng,
	Nandita Dukkipati, Eric Dumazet, Soheil Hassas Yeganeh
In-Reply-To: <1474236233-28511-17-git-send-email-ncardwell@google.com>

On Sun, 18 Sep 2016 18:03:53 -0400
Neal Cardwell <ncardwell@google.com> wrote:

> +static int bbr_bw_rtts	= CYCLE_LEN + 2; /* win len of bw filter (in rounds) */
> +static u32 bbr_min_rtt_win_sec = 10;	 /* min RTT filter window (in sec) */
> +static u32 bbr_probe_rtt_mode_ms = 200;	 /* min ms at cwnd=4 in BBR_PROBE_RTT */
> +static int bbr_min_tso_rate	= 1200000;  /* skip TSO below here (bits/sec) */
> +
> +/* We use a high_gain value chosen to allow a smoothly increasing pacing rate
> + * that will double each RTT and send the same number of packets per RTT that
> + * an un-paced, slow-starting Reno or CUBIC flow would.
> + */
> +static int bbr_high_gain  = BBR_UNIT * 2885 / 1000 + 1;	/* 2/ln(2) */
> +static int bbr_drain_gain = BBR_UNIT * 1000 / 2885;	/* 1/high_gain */
> +static int bbr_cwnd_gain  = BBR_UNIT * 2;	/* gain for steady-state cwnd */
> +/* The pacing_gain values for the PROBE_BW gain cycle: */
> +static int bbr_pacing_gain[] = { BBR_UNIT * 5 / 4, BBR_UNIT * 3 / 4,
> +				 BBR_UNIT, BBR_UNIT, BBR_UNIT,
> +				 BBR_UNIT, BBR_UNIT, BBR_UNIT };
> +static u32 bbr_cycle_rand = 7;  /* randomize gain cycling phase over N phases */
> +
> +/* Try to keep at least this many packets in flight, if things go smoothly. For
> + * smooth functioning, a sliding window protocol ACKing every other packet
> + * needs at least 4 packets in flight.
> + */
> +static u32 bbr_cwnd_min_target	= 4;
> +
> +/* To estimate if BBR_STARTUP mode (i.e. high_gain) has filled pipe. */
> +static u32 bbr_full_bw_thresh = BBR_UNIT * 5 / 4;  /* bw up 1.25x per round? */
> +static u32 bbr_full_bw_cnt    = 3;    /* N rounds w/o bw growth -> pipe full */
> +
> +/* "long-term" ("LT") bandwidth estimator parameters: */
> +static bool bbr_lt_bw_estimator = true;	/* use the long-term bw estimate? */
> +static u32 bbr_lt_intvl_min_rtts = 4;	/* min rounds in sampling interval */
> +static u32 bbr_lt_loss_thresh = 50;	/*  lost/delivered > 20% -> "lossy" */
> +static u32 bbr_lt_conv_thresh = BBR_UNIT / 8;  /* bw diff <= 12.5% -> "close" */
> +static u32 bbr_lt_bw_max_rtts	= 48;	/* max # of round trips using lt_bw */
> +

Looks good, but could I suggest a simple optimization.
All these parameters are immutable in the version of BBR you are submitting.
Why not make the values const? And eliminate the always true long-term bw estimate
variable?

diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 76baa8a..9c270a2 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -90,40 +90,41 @@ struct bbr {
 
 #define CYCLE_LEN	8	/* number of phases in a pacing gain cycle */
 
-static int bbr_bw_rtts	= CYCLE_LEN + 2; /* win len of bw filter (in rounds) */
-static u32 bbr_min_rtt_win_sec = 10;	 /* min RTT filter window (in sec) */
-static u32 bbr_probe_rtt_mode_ms = 200;	 /* min ms at cwnd=4 in BBR_PROBE_RTT */
-static int bbr_min_tso_rate	= 1200000;  /* skip TSO below here (bits/sec) */
+static const int bbr_bw_rtts = CYCLE_LEN + 2;  /* win len of bw filter (in rounds) */
+static const u32 bbr_min_rtt_win_sec = 10;     /* min RTT filter window (in sec) */
+static const u32 bbr_probe_rtt_mode_ms = 200;  /* min ms at cwnd=4 in BBR_PROBE_RTT */
+static const int bbr_min_tso_rate  = 1200000;  /* skip TSO below here (bits/sec) */
 
 /* We use a high_gain value chosen to allow a smoothly increasing pacing rate
  * that will double each RTT and send the same number of packets per RTT that
  * an un-paced, slow-starting Reno or CUBIC flow would.
  */
-static int bbr_high_gain  = BBR_UNIT * 2885 / 1000 + 1;	/* 2/ln(2) */
-static int bbr_drain_gain = BBR_UNIT * 1000 / 2885;	/* 1/high_gain */
-static int bbr_cwnd_gain  = BBR_UNIT * 2;	/* gain for steady-state cwnd */
+static const int bbr_high_gain  = BBR_UNIT * 2885 / 1000 + 1;	/* 2/ln(2) */
+static const int bbr_drain_gain = BBR_UNIT * 1000 / 2885;	/* 1/high_gain */
+static const int bbr_cwnd_gain  = BBR_UNIT * 2;	/* gain for steady-state cwnd */
 /* The pacing_gain values for the PROBE_BW gain cycle: */
-static int bbr_pacing_gain[] = { BBR_UNIT * 5 / 4, BBR_UNIT * 3 / 4,
-				 BBR_UNIT, BBR_UNIT, BBR_UNIT,
-				 BBR_UNIT, BBR_UNIT, BBR_UNIT };
-static u32 bbr_cycle_rand = 7;  /* randomize gain cycling phase over N phases */
+static const int bbr_pacing_gain[] = {
+  BBR_UNIT * 5 / 4, BBR_UNIT * 3 / 4,
+  BBR_UNIT, BBR_UNIT, BBR_UNIT,
+  BBR_UNIT, BBR_UNIT, BBR_UNIT
+};
+static const u32 bbr_cycle_rand = 7;  /* randomize gain cycling phase over N phases */
 
 /* Try to keep at least this many packets in flight, if things go smoothly. For
  * smooth functioning, a sliding window protocol ACKing every other packet
  * needs at least 4 packets in flight.
  */
-static u32 bbr_cwnd_min_target	= 4;
+static const u32 bbr_cwnd_min_target	= 4;
 
 /* To estimate if BBR_STARTUP mode (i.e. high_gain) has filled pipe. */
-static u32 bbr_full_bw_thresh = BBR_UNIT * 5 / 4;  /* bw up 1.25x per round? */
-static u32 bbr_full_bw_cnt    = 3;    /* N rounds w/o bw growth -> pipe full */
+static const u32 bbr_full_bw_thresh = BBR_UNIT * 5 / 4;  /* bw up 1.25x per round? */
+static const u32 bbr_full_bw_cnt    = 3;    /* N rounds w/o bw growth -> pipe full */
 
 /* "long-term" ("LT") bandwidth estimator parameters: */
-static bool bbr_lt_bw_estimator = true;	/* use the long-term bw estimate? */
-static u32 bbr_lt_intvl_min_rtts = 4;	/* min rounds in sampling interval */
-static u32 bbr_lt_loss_thresh = 50;	/*  lost/delivered > 20% -> "lossy" */
-static u32 bbr_lt_conv_thresh = BBR_UNIT / 8;  /* bw diff <= 12.5% -> "close" */
-static u32 bbr_lt_bw_max_rtts	= 48;	/* max # of round trips using lt_bw */
+static const u32 bbr_lt_intvl_min_rtts = 4;	/* min rounds in sampling interval */
+static const u32 bbr_lt_loss_thresh = 50;	/*  lost/delivered > 20% -> "lossy" */
+static const u32 bbr_lt_conv_thresh = BBR_UNIT / 8;  /* bw diff <= 12.5% -> "close" */
+static const u32 bbr_lt_bw_max_rtts	= 48;	/* max # of round trips using lt_bw */
 
 /* Do we estimate that STARTUP filled the pipe? */
 static bool bbr_full_bw_reached(const struct sock *sk)
@@ -470,8 +471,7 @@ static void bbr_lt_bw_interval_done(struct sock *sk, u32 bw)
 	struct bbr *bbr = inet_csk_ca(sk);
 	u32 diff;
 
-	if (bbr->lt_bw &&  /* do we have bw from a previous interval? */
-	    bbr_lt_bw_estimator) {  /* using long-term bw estimator enabled? */
+	if (bbr->lt_bw) { /* do we have bw from a previous interval? */
 		/* Is new bw close to the lt_bw from the previous interval? */
 		diff = abs(bw - bbr->lt_bw);
 		if ((diff * BBR_UNIT <= bbr_lt_conv_thresh * bbr->lt_bw) ||
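
For readers following the parameter values in the diff above, here is a
small worked example of the fixed-point arithmetic behind them. It
assumes BBR_UNIT is 1 << 8, which is not shown in this thread; the
SKETCH_* names and the two helper functions are illustrative only and
are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: gains are fixed point with 8 fractional bits, so
 * BBR_UNIT (256) represents 1.0. This definition is not quoted above.
 */
#define BBR_SCALE 8
#define BBR_UNIT (1 << BBR_SCALE)

/* Same integer expressions as in the quoted parameters, evaluated at
 * compile time. The SKETCH_* names are made up for this example.
 */
enum {
	SKETCH_HIGH_GAIN   = BBR_UNIT * 2885 / 1000 + 1, /* 739, about 2.887 */
	SKETCH_DRAIN_GAIN  = BBR_UNIT * 1000 / 2885,     /* 88, about 0.344 */
	SKETCH_CWND_GAIN   = BBR_UNIT * 2,               /* 512, exactly 2.0 */
	SKETCH_LOSS_THRESH = 50,                         /* 50/256, about 20% */
	SKETCH_CONV_THRESH = BBR_UNIT / 8                /* 32/256 = 12.5% */
};

/* Scale 'value' by a gain expressed in BBR_UNIT fixed point. */
uint64_t apply_gain(uint64_t value, uint32_t gain)
{
	assert(gain <= UINT32_MAX / 100u); /* keep percent math in range too */
	return (value * gain) >> BBR_SCALE;
}

/* Express a BBR_UNIT-scaled ratio as an integer percentage, rounded down. */
unsigned int gain_to_percent(uint32_t gain)
{
	return gain * 100u / BBR_UNIT;
}
```

Because every right-hand side is an integer constant expression, the
values are the same whether the variables are declared const or not;
whether const changes the generated code is the separate question
raised in the replies.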


* [PATCH next] ipvlan: Fix dependency issue
From: Mahesh Bandewar @ 2016-09-19 20:56 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, David Miller, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

kbuild-build-bot reported that if NETFILTER is not selected, the
build fails pointing to netfilter symbols.

Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8768a625350d..95c32f2d7601 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -149,6 +149,7 @@ config IPVLAN
     tristate "IP-VLAN support"
     depends on INET
     depends on IPV6
+    depends on NETFILTER
     depends on NET_L3_MASTER_DEV
     ---help---
       This allows one to create virtual devices off of a main interface
-- 
2.8.0.rc3.226.g39d4020


* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Daniel Mack @ 2016-09-19 20:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160919203533.GA888@salvia>

On 09/19/2016 10:35 PM, Pablo Neira Ayuso wrote:
> On Mon, Sep 19, 2016 at 09:30:02PM +0200, Daniel Mack wrote:
>> On 09/19/2016 09:19 PM, Pablo Neira Ayuso wrote:

>>> Actually, did you look at Google's approach to this problem?  They
>>> want to control this at socket level, so you restrict what the process
>>> can actually bind. That is enforcing the policy way before you even
>>> send packets. On top of that, what they submitted is infrastructured
>>> so any process with CAP_NET_ADMIN can access that policy that is being
>>> applied and fetch a readable policy through kernel interface.
>>
>> Yes, I've seen what they propose, but I want this approach to support
>> accounting, and so the code has to look at each and every packet in
>> order to count bytes and packets. Do you know of any better place to put
>> the hook then?
> 
> Accounting is part of the usecase that fits into the "network
> introspection" idea that has been mentioned here, so you can achieve
> this by adding a hook that returns no verdict, so this becomes similar
> to the tracing infrastructure.

Why would we artificially limit the use-cases of this implementation if
the way it stands, both filtering and introspection are possible?

> Filtering packets with cgroups is braindead.

Filtering is done via eBPF, and cgroups are just the containers. I don't
see what's brain-dead in that approach. After all, accessing the cgroup
once we have a local socket is really fast, so the idea is kinda obvious.

> You have the means to ensure that processes send no packets via
> restricting port binding, there is no reason to do this any later for
> locally generated traffic.

Yes, restricting port binding can be done on top, if people are worried
about the performance overhead of a per-packet program.



Thanks,
Daniel


* [PATCH net-next 2/2] openvswitch: avoid resetting flow key while installing new flow.
From: Pravin B Shelar @ 2016-09-19 20:51 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar
In-Reply-To: <1474318260-16988-1-git-send-email-pshelar@ovn.org>

Since commit db74a3335e0f6 ("openvswitch: use percpu
flow stats"), flow alloc resets the flow-key, so there is
no need to reset the flow-key again when OVS is using a
newly allocated flow-key.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
---
 net/openvswitch/datapath.c     | 8 ++++----
 net/openvswitch/flow.c         | 2 --
 net/openvswitch/flow_netlink.c | 6 ++++--
 net/openvswitch/flow_netlink.h | 3 ++-
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 474e7a6..4d67ea8 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -955,7 +955,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	/* Extract key. */
-	ovs_match_init(&match, &new_flow->key, &mask);
+	ovs_match_init(&match, &new_flow->key, false, &mask);
 	error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
 				  a[OVS_FLOW_ATTR_MASK], log);
 	if (error)
@@ -1124,7 +1124,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 
 	ufid_present = ovs_nla_get_ufid(&sfid, a[OVS_FLOW_ATTR_UFID], log);
 	if (a[OVS_FLOW_ATTR_KEY]) {
-		ovs_match_init(&match, &key, &mask);
+		ovs_match_init(&match, &key, true, &mask);
 		error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
 					  a[OVS_FLOW_ATTR_MASK], log);
 	} else if (!ufid_present) {
@@ -1241,7 +1241,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
 
 	ufid_present = ovs_nla_get_ufid(&ufid, a[OVS_FLOW_ATTR_UFID], log);
 	if (a[OVS_FLOW_ATTR_KEY]) {
-		ovs_match_init(&match, &key, NULL);
+		ovs_match_init(&match, &key, true, NULL);
 		err = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY], NULL,
 					log);
 	} else if (!ufid_present) {
@@ -1300,7 +1300,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 
 	ufid_present = ovs_nla_get_ufid(&ufid, a[OVS_FLOW_ATTR_UFID], log);
 	if (a[OVS_FLOW_ATTR_KEY]) {
-		ovs_match_init(&match, &key, NULL);
+		ovs_match_init(&match, &key, true, NULL);
 		err = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
 					NULL, log);
 		if (unlikely(err))
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 0fa45439..634cc10 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -767,8 +767,6 @@ int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
 {
 	int err;
 
-	memset(key, 0, OVS_SW_FLOW_KEY_METADATA_SIZE);
-
 	/* Extract metadata from netlink attributes. */
 	err = ovs_nla_get_flow_metadata(net, attr, key, log);
 	if (err)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 8efa718..ae25ded 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1996,13 +1996,15 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 
 void ovs_match_init(struct sw_flow_match *match,
 		    struct sw_flow_key *key,
+		    bool reset_key,
 		    struct sw_flow_mask *mask)
 {
 	memset(match, 0, sizeof(*match));
 	match->key = key;
 	match->mask = mask;
 
-	memset(key, 0, sizeof(*key));
+	if (reset_key)
+		memset(key, 0, sizeof(*key));
 
 	if (mask) {
 		memset(&mask->key, 0, sizeof(mask->key));
@@ -2049,7 +2051,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	struct nlattr *a;
 	int err = 0, start, opts_type;
 
-	ovs_match_init(&match, &key, NULL);
+	ovs_match_init(&match, &key, true, NULL);
 	opts_type = ip_tun_from_nlattr(nla_data(attr), &match, false, log);
 	if (opts_type < 0)
 		return opts_type;
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 47dd142..45f9769 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -41,7 +41,8 @@ size_t ovs_tun_key_attr_size(void);
 size_t ovs_key_attr_size(void);
 
 void ovs_match_init(struct sw_flow_match *match,
-		    struct sw_flow_key *key, struct sw_flow_mask *mask);
+		    struct sw_flow_key *key, bool reset_key,
+		    struct sw_flow_mask *mask);
 
 int ovs_nla_put_key(const struct sw_flow_key *, const struct sw_flow_key *,
 		    int attr, bool is_mask, struct sk_buff *);
-- 
1.9.1
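
The reset_key idea in the patch above can be sketched in userspace
roughly as follows. The types struct flow_key and struct flow_match and
the function match_init() are simplified, hypothetical stand-ins for the
OVS structures, chosen only to show the conditional memset.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the OVS structures. */
struct flow_key { unsigned char bytes[64]; };
struct flow_match { struct flow_key *key; };

/* Initialise a match. The key is zeroed only when the caller asks for
 * it, so a key that the allocator already cleared is not cleared twice.
 */
void match_init(struct flow_match *match, struct flow_key *key,
		bool reset_key)
{
	assert(match && key);
	memset(match, 0, sizeof(*match));
	match->key = key;

	if (reset_key)
		memset(key, 0, sizeof(*key));
}
```

Callers with an on-stack key would pass true, as the set/get/del paths
do in the patch, while the new-flow path passes false because its key
comes from a freshly allocated flow.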


* [PATCH net-next 1/2] openvswitch: Fix Frame-size larger than 1024 bytes warning.
From: Pravin B Shelar @ 2016-09-19 20:50 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

There is no need to declare a separate key on the stack;
we can just use sw_flow->key to store the key directly.

This commit fixes the following warning:

net/openvswitch/datapath.c: In function ‘ovs_flow_cmd_new’:
net/openvswitch/datapath.c:1080:1: warning: the frame size of 1040 bytes
is larger than 1024 bytes [-Wframe-larger-than=]

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
---
 net/openvswitch/datapath.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0536ab3..474e7a6 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -928,7 +928,6 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	struct sw_flow_mask mask;
 	struct sk_buff *reply;
 	struct datapath *dp;
-	struct sw_flow_key key;
 	struct sw_flow_actions *acts;
 	struct sw_flow_match match;
 	u32 ufid_flags = ovs_nla_get_ufid_flags(a[OVS_FLOW_ATTR_UFID_FLAGS]);
@@ -956,20 +955,24 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	/* Extract key. */
-	ovs_match_init(&match, &key, &mask);
+	ovs_match_init(&match, &new_flow->key, &mask);
 	error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
 				  a[OVS_FLOW_ATTR_MASK], log);
 	if (error)
 		goto err_kfree_flow;
 
-	ovs_flow_mask_key(&new_flow->key, &key, true, &mask);
-
 	/* Extract flow identifier. */
 	error = ovs_nla_get_identifier(&new_flow->id, a[OVS_FLOW_ATTR_UFID],
-				       &key, log);
+				       &new_flow->key, log);
 	if (error)
 		goto err_kfree_flow;
 
+	/* unmasked key is needed to match when ufid is not used. */
+	if (ovs_identifier_is_key(&new_flow->id))
+		match.key = new_flow->id.unmasked_key;
+
+	ovs_flow_mask_key(&new_flow->key, &new_flow->key, true, &mask);
+
 	/* Validate actions. */
 	error = ovs_nla_copy_actions(net, a[OVS_FLOW_ATTR_ACTIONS],
 				     &new_flow->key, &acts, log);
@@ -996,7 +999,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	if (ovs_identifier_is_ufid(&new_flow->id))
 		flow = ovs_flow_tbl_lookup_ufid(&dp->table, &new_flow->id);
 	if (!flow)
-		flow = ovs_flow_tbl_lookup(&dp->table, &key);
+		flow = ovs_flow_tbl_lookup(&dp->table, &new_flow->key);
 	if (likely(!flow)) {
 		rcu_assign_pointer(new_flow->sf_acts, acts);
 
-- 
1.9.1


* Re: [PATCH] net: skbuff: Fix length validation in skb_vlan_pop()
From: pravin shelar @ 2016-09-19 20:46 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Jiri Pirko, David S . Miller, Linux Kernel Network Developers,
	Daniel Borkmann, Jamal Hadi Salim
In-Reply-To: <20160919230454.08e58116@halley>

On Mon, Sep 19, 2016 at 1:04 PM, Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
> Hi Pravin,
>
> On Sun, 18 Sep 2016 13:26:30 -0700 pravin shelar <pshelar@ovn.org> wrote:
>> > +++ b/net/core/skbuff.c
>> > @@ -4537,7 +4537,7 @@ int skb_vlan_pop(struct sk_buff *skb)
>> >         } else {
>> >                 if (unlikely((skb->protocol != htons(ETH_P_8021Q) &&
>> >                               skb->protocol != htons(ETH_P_8021AD)) ||
>> > -                            skb->len < VLAN_ETH_HLEN))
>> > +                            skb->mac_len < VLAN_ETH_HLEN))
>>
>> There is already check in __skb_vlan_pop() to validate skb for a vlan
>> header. So it is safe to drop this check entirely.
>
> Yep, I submitted a v2 with your suggestion, however I withdrew it, as
> there is a slight behavior difference noticeable by 'skb_vlan_pop' callers.
>
> Suppose the rare case where skb->len is too small.
>
> pre:
>   skb_vlan_pop returns 0 (at least for the correct tx path).
>   Meaning, callers do not see it as a failure.
> post:
>   skb_ensure_writable fails (!pskb_may_pull), therefore -ENOMEM returned
>   to the callers of 'skb_vlan_pop'.
>
> For ovs, it means do_execute_actions's loop is terminated, no further
> actions are executed, and skb gets freed.
>
> For tc act vlan, it means skb gets dropped.
>
> This actually makes sense, but do we want to present this change?
>
I think this is the correct behavior compared with the existing code.
And under memory pressure, the chances of a packet drop are higher even
without the change anyway.


* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Pablo Neira Ayuso @ 2016-09-19 20:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Mack, htejun, daniel, ast, davem, kafai, fw, harald,
	netdev, sargun, cgroups
In-Reply-To: <20160919201322.GA84770@ast-mbp.thefacebook.com>

On Mon, Sep 19, 2016 at 01:13:27PM -0700, Alexei Starovoitov wrote:
> On Mon, Sep 19, 2016 at 09:19:10PM +0200, Pablo Neira Ayuso wrote:
[...]
> > 2) This will turn the stack into a nightmare to debug I predict. If
> >    any process with CAP_NET_ADMIN can potentially attach bpf blobs
> >    via these hooks, we will have to include in the network stack
> 
> a process without CAP_NET_ADMIN can attach bpf blobs to
> system calls via seccomp. bpf is already used for security and policing.

That is a local mechanism, it applies to parent process and child
processes, just like SO_ATTACH_FILTER.

The usecase that we're discussing here enforces a global policy.


* Re: [RFC V3 PATCH 00/26] Kernel NET policy
From: Stephen Hemminger @ 2016-09-19 20:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kan.liang, davem, linux-kernel, netdev, jeffrey.t.kirsher, mingo,
	peterz, kuznet, jmorris, yoshfuji, kaber, akpm, keescook, viro,
	gorcunov, john.stultz, aduyck, ben, decot, fw, alexander.duyck,
	daniel, tom, rdunlap, xiyou.wangcong, hannes, alexei.starovoitov,
	jesse.brandeburg, andi
In-Reply-To: <1473695534.18970.75.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, 12 Sep 2016 08:52:14 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Mon, 2016-09-12 at 07:55 -0700, kan.liang@intel.com wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >   
> 
> > 
> >  Documentation/networking/netpolicy.txt |  157 ++++  
> 
> 
> I find this patch series very suspect, as
> Documentation/networking/scaling.txt is untouched.
> 
> I highly recommend you present your ideas at next netdev conference.
> 
> I really doubt the mailing lists are the best place to present your
> work, given the huge amount of code/layers you want to add in linux
> kernel.

Agreed.


* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Pablo Neira Ayuso @ 2016-09-19 20:35 UTC (permalink / raw)
  To: Daniel Mack
  Cc: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <ac88bb4c-ab7c-1f74-c7fd-79e523b50ae4-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>

On Mon, Sep 19, 2016 at 09:30:02PM +0200, Daniel Mack wrote:
> On 09/19/2016 09:19 PM, Pablo Neira Ayuso wrote:
> > On Mon, Sep 19, 2016 at 06:44:00PM +0200, Daniel Mack wrote:
> >> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> >> index 6001e78..5dc90aa 100644
> >> --- a/net/ipv6/ip6_output.c
> >> +++ b/net/ipv6/ip6_output.c
> >> @@ -39,6 +39,7 @@
> >>  #include <linux/module.h>
> >>  #include <linux/slab.h>
> >>  
> >> +#include <linux/bpf-cgroup.h>
> >>  #include <linux/netfilter.h>
> >>  #include <linux/netfilter_ipv6.h>
> >>  
> >> @@ -143,6 +144,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >>  {
> >>  	struct net_device *dev = skb_dst(skb)->dev;
> >>  	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
> >> +	int ret;
> >>  
> >>  	if (unlikely(idev->cnf.disable_ipv6)) {
> >>  		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
> >> @@ -150,6 +152,12 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >>  		return 0;
> >>  	}
> >>  
> >> +	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
> >> +	if (ret) {
> >> +		kfree_skb(skb);
> >> +		return ret;
> >> +	}
> > 
> > 1) If your goal is to filter packets, why so late? The sooner you
> >    enforce your policy, the less cycles you waste.
> > 
> > Actually, did you look at Google's approach to this problem?  They
> > want to control this at socket level, so you restrict what the process
> > can actually bind. That is enforcing the policy way before you even
> > send packets. On top of that, what they submitted is infrastructured
> > so any process with CAP_NET_ADMIN can access that policy that is being
> > applied and fetch a readable policy through kernel interface.
> 
> Yes, I've seen what they propose, but I want this approach to support
> accounting, and so the code has to look at each and every packet in
> order to count bytes and packets. Do you know of any better place to put
> the hook then?

Accounting is part of the usecase that fits into the "network
introspection" idea that has been mentioned here, so you can achieve
this by adding a hook that returns no verdict, so this becomes similar
to the tracing infrastructure.

> That said, I can well imagine more hooks types that also operate at port
> bind time. That would be easy to add on top.

Filtering packets with cgroups is braindead.

You have the means to ensure that processes send no packets via
restricting port binding, there is no reason to do this any later for
locally generated traffic.


* Re: [PATCH v2 net-next] MAINTAINERS: Add an entry for the core network DSA code
From: Vivien Didelot @ 2016-09-19 20:31 UTC (permalink / raw)
  To: Andrew Lunn, David Miller; +Cc: Florian Fainelli, netdev, Andrew Lunn
In-Reply-To: <1474226240-8083-1-git-send-email-andrew@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

> The core distributed switch architecture code currently does not have
> a MAINTAINERS entry, which results in some contributions not landing
> in the right people's inbox.
>
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>

Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Thanks,

        Vivien


* Re: [PATCH] net: ipv6: fallback to full lookup if table lookup is unsuitable
From: Vincent Bernat @ 2016-09-19 20:27 UTC (permalink / raw)
  To: David Miller; +Cc: dsa, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <20160919.005857.553516799273302087.davem@davemloft.net>

 ❦ 19 septembre 2016 06:58 CEST, David Miller <davem@davemloft.net> :

>> @@ -1808,6 +1808,30 @@ static struct rt6_info *ip6_nh_lookup_table(struct net *net,
>>  	return rt;
>>  }
>>  
>> +static int ip6_nh_valid(struct rt6_info *grt,
>> +			struct net_device **dev, struct inet6_dev **idev) {
>> +	int ret = 0;
>
> First, this is not formatted properly.  The opening brace should start
> on a new line.
>
> Second, please use "bool", "true", and "false" for the return value.

Noted for the next time. However, the v3 version of the patch doesn't
have the function anymore.
-- 
Avoid temporary variables.
            - The Elements of Programming Style (Kernighan & Plauger)


* Re: drr scheduler [mis]configuration question
From: Michal Soltys @ 2016-09-19 20:19 UTC (permalink / raw)
  To: netdev
In-Reply-To: <ea42bcea-2982-e34c-268e-41a2eb7a1e37@ziu.info>

> 
> At this point I'm a bit lost what I'm doing wrong.
> 

Just in case (erhm... for the sake of completeness and archiving), to answer
my own borderline silly question: handles and classids are in hex.
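
To make the point concrete, here is a minimal userspace sketch of reading a
"major:minor" handle with both halves taken as hexadecimal, so that "10:1"
means major 0x10 (16 decimal), not 10. The helper name parse_tc_handle() is
hypothetical and this is not the code tc itself uses; packing the major number
into the upper 16 bits and the minor into the lower 16 bits is assumed here.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "major:minor" with both halves read as hex.
 * An empty minor ("10:") is treated as 0. Returns 0 on success and -1 on
 * malformed input; on success *out holds (major << 16) | minor.
 */
static int parse_tc_handle(const char *s, uint32_t *out)
{
	char *end;
	unsigned long maj, min = 0;

	maj = strtoul(s, &end, 16);		/* base 16, not base 10 */
	if (end == s || *end != ':' || maj > 0xffff)
		return -1;

	s = end + 1;
	if (*s != '\0') {
		min = strtoul(s, &end, 16);
		if (end == s || *end != '\0' || min > 0xffff)
			return -1;
	}

	*out = (uint32_t)((maj << 16) | min);
	return 0;
}
```

With this reading, a class written as "10:1" and one written as "a:1" have
different major numbers (16 and 10 decimal), which is easy to get wrong when
the numbers were meant as decimal.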

^ permalink raw reply

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Alexei Starovoitov @ 2016-09-19 20:13 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Daniel Mack, htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160919191910.GA984@salvia>

On Mon, Sep 19, 2016 at 09:19:10PM +0200, Pablo Neira Ayuso wrote:
> On Mon, Sep 19, 2016 at 06:44:00PM +0200, Daniel Mack wrote:
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index 6001e78..5dc90aa 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -39,6 +39,7 @@
> >  #include <linux/module.h>
> >  #include <linux/slab.h>
> >  
> > +#include <linux/bpf-cgroup.h>
> >  #include <linux/netfilter.h>
> >  #include <linux/netfilter_ipv6.h>
> >  
> > @@ -143,6 +144,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >  {
> >  	struct net_device *dev = skb_dst(skb)->dev;
> >  	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
> > +	int ret;
> >  
> >  	if (unlikely(idev->cnf.disable_ipv6)) {
> >  		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
> > @@ -150,6 +152,12 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >  		return 0;
> >  	}
> >  
> > +	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
> > +	if (ret) {
> > +		kfree_skb(skb);
> > +		return ret;
> > +	}
> 
> 1) If your goal is to filter packets, why so late? The sooner you
>    enforce your policy, the fewer cycles you waste.
> 
> Actually, did you look at Google's approach to this problem?  They
> want to control this at socket level, so you restrict what the process
> can actually bind. That is enforcing the policy way before you even
> send packets. On top of that, what they submitted is infrastructured
> so any process with CAP_NET_ADMIN can access that policy that is being
> applied and fetch a readable policy through kernel interface.
> 
> 2) This will turn the stack into a nightmare to debug I predict. If
>    any process with CAP_NET_ADMIN can potentially attach bpf blobs
>    via these hooks, we will have to include in the network stack

a process without CAP_NET_ADMIN can attach bpf blobs to
system calls via seccomp. bpf is already used for security and policing.

>    traveling documentation something like: "Probably you have to check
>    that your orchestrator is not dropping your packets for some
>    reason". So I wonder how users will debug this and how the policy that
>    your orchestrator applies will be exposed to userspace.

As for bpf debuggability/visibility, there are various efforts under way:
for kernel side:
- ksym for jit-ed programs
- hash sum for prog code
- compact type information for maps and various pretty printers
- data flow analysis of the programs
for user space:
- from bpf asm reconstruct the program in the high level language
  (there is p4 to bpf, this effort is about bpf to p4)

^ permalink raw reply

* Re: [PATCH] net: skbuff: Fix length validation in skb_vlan_pop()
From: Shmulik Ladkani @ 2016-09-19 20:04 UTC (permalink / raw)
  To: pravin shelar
  Cc: Jiri Pirko, David S . Miller, Linux Kernel Network Developers,
	Daniel Borkmann, Jamal Hadi Salim
In-Reply-To: <CAOrHB_D2ytTK9WjfCPL87E1QomeXg+2GfAvo8N8xCcJULhAC=w@mail.gmail.com>

Hi Pravin,

On Sun, 18 Sep 2016 13:26:30 -0700 pravin shelar <pshelar@ovn.org> wrote:
> > +++ b/net/core/skbuff.c
> > @@ -4537,7 +4537,7 @@ int skb_vlan_pop(struct sk_buff *skb)
> >         } else {
> >                 if (unlikely((skb->protocol != htons(ETH_P_8021Q) &&
> >                               skb->protocol != htons(ETH_P_8021AD)) ||
> > -                            skb->len < VLAN_ETH_HLEN))
> > +                            skb->mac_len < VLAN_ETH_HLEN))  
> 
> There is already check in __skb_vlan_pop() to validate skb for a vlan
> header. So it is safe to drop this check entirely.

Yep, I submitted a v2 with your suggestion, however I withdrew it, as
there is a slight behavior difference noticeable by 'skb_vlan_pop' callers.

Suppose the rare case where skb->len is too small.

pre:
  skb_vlan_pop returns 0 (at least for the correct tx path).
  Meaning, callers do not see it as a failure.
post:
  skb_ensure_writable fails (!pskb_may_pull), therefore -ENOMEM returned
  to the callers of 'skb_vlan_pop'.

For ovs, it means do_execute_actions's loop is terminated, no further
actions are executed, and skb gets freed.

For tc act vlan, it means skb gets dropped.
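
The pre/post difference described above can be sketched as a toy userspace
model. It is only an illustration under stated assumptions: model_pop_pre()
and model_pop_post() are hypothetical names, they take a bare length instead
of an skb, and they do not reproduce the real skb_vlan_pop() code paths. The
only things taken from the discussion are the return values (0 before the
change, -ENOMEM after it) and the VLAN_ETH_HLEN threshold of 18 bytes.

```c
#include <errno.h>

#define VLAN_ETH_HLEN 18	/* 14-byte Ethernet header + 4-byte VLAN tag */

/*
 * Toy model of the "pre" behaviour: a frame that is too short is left
 * alone and the caller still sees success.
 */
static int model_pop_pre(unsigned int len, int *popped)
{
	*popped = len >= VLAN_ETH_HLEN;
	return 0;
}

/*
 * Toy model of the "post" behaviour: the same short frame now produces
 * an error, which callers treat as a failure.
 */
static int model_pop_post(unsigned int len, int *popped)
{
	*popped = len >= VLAN_ETH_HLEN;
	return *popped ? 0 : -ENOMEM;
}
```

For a normal-sized frame both models agree; they differ only for the rare
short frame, where the caller goes from "continue" to "stop and drop".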

This actually makes sense, but do we want to present this change?

Thanks,
Shmulik

^ permalink raw reply

* Re: [PATCH net] MAINTAINERS: Gary Zambrano's email is bouncing
From: Michael Chan @ 2016-09-19 20:03 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Joe Perches, netdev, David Miller, bcm-kernel-feedback-list,
	Rafał Miłecki, Hauke Mehrtens
In-Reply-To: <ceeadf96-6ec8-1931-e43c-1d992ce03b86@gmail.com>

On Mon, Sep 19, 2016 at 12:54 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 09/17/2016 04:39 PM, Michael Chan wrote:
>> On Sat, Sep 17, 2016 at 4:17 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>> 2016-09-17 15:51 GMT-07:00 Joe Perches <joe@perches.com>:
>>>> On Sat, 2016-09-17 at 15:27 -0700, Florian Fainelli wrote:
>>>>> Gary has not been with Broadcom for some time now, replace his address
>>>>> with the internal mailing-list used for other entries.
>>>>>
>>>>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>>>>> ---
>>>>> Michael,
>>>>>
>>>>> Since this is an old driver, not sure who could step up as a maintainer
>>>>> for b44?
>>>> []
>>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> []
>>>>> @@ -2500,8 +2500,8 @@ S:      Supported
>>>>
>>>>> F:    kernel/bpf/
>>>>>
>>>>>  BROADCOM B44 10/100 ETHERNET DRIVER
>>>>> -M:   Gary Zambrano <zambrano@broadcom.com>
>>>>>  L:   netdev@vger.kernel.org
>>>>> +M:   bcm-kernel-feedback-list@broadcom.com
>>>>>  S:   Supported
>>>>>  F:   drivers/net/ethernet/broadcom/b44.*
>>>>
>>>> Without an actual maintainer, this should really be
>>>> orphan and not supported.
>>>
>>> I would like to hear from Michael before concluding that
>>>
>>
>> I worked on this NIC more than 10 years ago.  Last time I
>> checked, I don't have this NIC anymore after moving offices several
>> times.
>>
>> I don't mind being the maintainer, if no one else who is more suitable
>> and has access to hardware wants to do it.
>
> Should I resubmit with your name as a maintainer or do you want to do it?

I will submit the patch in a day or 2.  Unless of course someone wants
to be the maintainer instead.

^ permalink raw reply

* Re: [PATCH net] MAINTAINERS: Gary Zambrano's email is bouncing
From: Florian Fainelli @ 2016-09-19 19:54 UTC (permalink / raw)
  To: Michael Chan
  Cc: Joe Perches, netdev, David Miller, bcm-kernel-feedback-list,
	Rafał Miłecki, Hauke Mehrtens
In-Reply-To: <CACKFLi=io5VjP+L+RVVef3FwmFt77aVFrJ+v8-4FyupNL_752w@mail.gmail.com>

On 09/17/2016 04:39 PM, Michael Chan wrote:
> On Sat, Sep 17, 2016 at 4:17 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>> 2016-09-17 15:51 GMT-07:00 Joe Perches <joe@perches.com>:
>>> On Sat, 2016-09-17 at 15:27 -0700, Florian Fainelli wrote:
>>>> Gary has not been with Broadcom for some time now, replace his address
>>>> with the internal mailing-list used for other entries.
>>>>
>>>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>>>> ---
>>>> Michael,
>>>>
>>>> Since this is an old driver, not sure who could step up as a maintainer
>>>> for b44?
>>> []
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> []
>>>> @@ -2500,8 +2500,8 @@ S:      Supported
>>>
>>>> F:    kernel/bpf/
>>>>
>>>>  BROADCOM B44 10/100 ETHERNET DRIVER
>>>> -M:   Gary Zambrano <zambrano@broadcom.com>
>>>>  L:   netdev@vger.kernel.org
>>>> +M:   bcm-kernel-feedback-list@broadcom.com
>>>>  S:   Supported
>>>>  F:   drivers/net/ethernet/broadcom/b44.*
>>>
>>> Without an actual maintainer, this should really be
>>> orphan and not supported.
>>
>> I would like to hear from Michael before concluding that
>>
> 
> I worked on this NIC more than 10 years ago.  Last time I
> checked, I don't have this NIC anymore after moving offices several
> times.
> 
> I don't mind being the maintainer, if no one else who is more suitable
> and has access to hardware wants to do it.

Should I resubmit with your name as a maintainer or do you want to do it?
-- 
Florian

^ permalink raw reply

* Re: [PATCH v2 net-next] MAINTAINERS: Add an entry for the core network DSA code
From: Florian Fainelli @ 2016-09-19 19:54 UTC (permalink / raw)
  To: Andrew Lunn, David Miller; +Cc: Vivien Didelot, netdev
In-Reply-To: <1474226240-8083-1-git-send-email-andrew@lunn.ch>

On 09/18/2016 12:17 PM, Andrew Lunn wrote:
> The core distributed switch architecture code currently does not have
> a MAINTAINERS entry, which results in some contributions not landing
> in the right people's inbox.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 04/10] bnxt_en: Added support for Secure Firmware Update
From: kbuild test robot @ 2016-09-19 19:48 UTC (permalink / raw)
  To: Michael Chan; +Cc: kbuild-all, davem, netdev
In-Reply-To: <1474271889-8229-5-git-send-email-michael.chan@broadcom.com>

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

Hi Rob,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Michael-Chan/bnxt-update-for-net-next/20160919-161506
config: s390-allmodconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390 

All errors (new ones prefixed by >>):

>> ERROR: "rtc_time64_to_tm" [drivers/net/ethernet/broadcom/bnxt/bnxt_en.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 42381 bytes --]

^ permalink raw reply

* Re: [PATCH 0/4] net-next: dsa: set_addr should be optional
From: Florian Fainelli @ 2016-09-19 19:41 UTC (permalink / raw)
  To: John Crispin, David S. Miller, Andrew Lunn; +Cc: netdev, linux-kernel
In-Reply-To: <1474291683-44167-1-git-send-email-john@phrozen.org>

On 09/19/2016 06:27 AM, John Crispin wrote:
> The Marvell driver is the only one that actually sets the switch's HW
> address. All other drivers have an empty stub. Fix this by making the
> callback optional.
> 
> John Crispin (4):
>   net-next: dsa: fix duplicate invocation of set_addr()
>   net-next: dsa: make the set_addr() operation optional
>   net-next: dsa: b53: remove empty set_addr() stub
>   net-next: dsa: qca8k: remove empty set_addr() stub

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* pull request: bluetooth-next 2016-09-19
From: Johan Hedberg @ 2016-09-19 19:37 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 6995 bytes --]

Hi Dave,

Here's the main bluetooth-next pull request for the 4.9 kernel.

 - Added new messages for monitor sockets for better mgmt tracing
 - Added local name and appearance support in scan response
 - Added new Qualcomm WCNSS SMD based HCI driver
 - Minor fixes & cleanup to 802.15.4 code
 - New USB ID to btusb driver
 - Added Marvell support to HCI UART driver
 - Add combined LED trigger for controller power
 - Other minor fixes here and there

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit e867e87ae88c54f741d1cabd1de536b4497a0504:

  Merge tag 'rxrpc-rewrite-20160917-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs (2016-09-19 01:52:21 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to af4168c5a925dc3b11b0246c2b91124327919f47:

  Bluetooth: Set appearance only for LE capable controllers (2016-09-19 21:48:22 +0300)

----------------------------------------------------------------
Alexander Aring (3):
      mac802154: set phy net namespace for new ifaces
      6lowpan: ndisc: no overreact if no short address is available
      fakelb: fix schedule while atomic

Aristeu Rozanski (2):
      mac802154: don't warn on unsupported frames
      mac802154: use rate limited warnings for malformed frames

Arnd Bergmann (1):
      Bluetooth: add WCNSS dependency for HCI driver

Bart Van Assche (1):
      Bluetooth: btusb, hci_intel: Fix wait_on_bit_timeout() return value checks

Bhaktipriya Shridhar (1):
      Bluetooth: Remove deprecated create_singlethread_workqueue

Bjorn Andersson (2):
      Bluetooth: Add HCI device identifier for Qualcomm SMD
      Bluetooth: Introduce Qualcomm WCNSS SMD based HCI driver

Colin Ian King (1):
      Bluetooth: btqca: remove null checks on edl->data as it is an array

Frédéric Dalleau (1):
      Bluetooth: Fix reason code used for rejecting SCO connections

Johan Hedberg (1):
      Bluetooth: mgmt: Fix sending redundant event for Advertising Instance

Kai-Heng Feng (1):
      Bluetooth: btusb: Add support for 0cf3:e009

Larry Finger (1):
      Bluetooth: btrtl: Add RTL8822BE Bluetooth device

Loic Poulain (3):
      Bluetooth: hci_bcm: Change protocol name
      Bluetooth: hci_uart: Add Nokia Protocol identifier
      Bluetooth: hci_uart: Add Marvell support

Marcel Holtmann (21):
      Bluetooth: Put led_trigger field behind CONFIG_BT_LEDS
      Bluetooth: Add combined LED trigger for controller power
      Bluetooth: Check SOL_HCI for raw socket options
      Bluetooth: Store control socket cookie and comm information
      Bluetooth: Introduce helper to pack mgmt version information
      Bluetooth: Add support for sending MGMT open and close to monitor
      Bluetooth: Add support for sending MGMT commands and events to monitor
      Bluetooth: Use individual flags for certain management events
      Bluetooth: Fix wrong Get Clock Information return parameters
      Bluetooth: Use command status event for Set IO Capability errors
      Bluetooth: Introduce helper functions for socket cookie handling
      Bluetooth: Use numbers for subsystem version string
      Bluetooth: Send control open and close only when cookie is present
      Bluetooth: Assign the channel early when binding HCI sockets
      Bluetooth: Add extra channel checks for control open/close messages
      Bluetooth: Send control open and close messages for HCI raw sockets
      Bluetooth: Handle HCI raw socket transition from unbound to bound
      Bluetooth: Add framework for Extended Controller Information
      Bluetooth: Send control open and close messages for HCI user channels
      Bluetooth: Fix wrong New Settings event when closing HCI User Channel
      Bluetooth: Increase the subsystem minor version number

Michał Narajowski (7):
      Bluetooth: Append local name and CoD to Extended Controller Info
      Bluetooth: Add support for local name in scan rsp
      Bluetooth: Add support for appearance in scan rsp
      Bluetooth: Factor appending EIR to separate helper
      Bluetooth: Add supported data types to ext info changed event
      Bluetooth: Fix missing ext info event when setting appearance
      Bluetooth: Set appearance only for LE capable controllers

Nicolas Iooss (1):
      Bluetooth: add printf format attribute to hci_set_[fh]w_info()

Szymon Janc (8):
      Bluetooth: btusb: Mark CW6622 devices to have broken link key commands
      Bluetooth: Fix not registering BR/EDR SMP channel with force_bredr flag
      Bluetooth: Remove unused parameter from tlv_data_is_valid function
      Bluetooth: Unify advertising instance flags check
      Bluetooth: Fix advertising instance validity check for flags
      Bluetooth: Increment management interface revision
      Bluetooth: Refactor read_ext_controller_info handler
      Bluetooth: Add appearance to Read Ext Controller Info command

Wei Yongjun (1):
      Bluetooth: Use kzalloc instead of kmalloc/memset

Wolfram Sang (1):
      Bluetooth: bcm203x: don't print error when allocating urb fails

 drivers/bluetooth/Kconfig         |  23 ++
 drivers/bluetooth/Makefile        |   2 +
 drivers/bluetooth/bcm203x.c       |   4 +-
 drivers/bluetooth/btqca.c         |   8 +-
 drivers/bluetooth/btqcomsmd.c     | 182 ++++++++++++++++
 drivers/bluetooth/btrtl.c         | 107 ++++++++--
 drivers/bluetooth/btusb.c         |  13 +-
 drivers/bluetooth/hci_bcm.c       |   2 +-
 drivers/bluetooth/hci_intel.c     |   6 +-
 drivers/bluetooth/hci_ldisc.c     |   6 +
 drivers/bluetooth/hci_mrvl.c      | 387 ++++++++++++++++++++++++++++++++++
 drivers/bluetooth/hci_qca.c       |   2 +-
 drivers/bluetooth/hci_uart.h      |   9 +-
 drivers/net/ieee802154/fakelb.c   |  14 +-
 include/net/bluetooth/bluetooth.h |   4 +-
 include/net/bluetooth/hci.h       |   7 +-
 include/net/bluetooth/hci_core.h  |  11 +-
 include/net/bluetooth/hci_mon.h   |   4 +
 include/net/bluetooth/mgmt.h      |  24 +++
 net/6lowpan/ndisc.c               |   2 -
 net/bluetooth/af_bluetooth.c      |  15 +-
 net/bluetooth/hci_core.c          |   1 +
 net/bluetooth/hci_request.c       |  49 +++--
 net/bluetooth/hci_request.h       |   5 +-
 net/bluetooth/hci_sock.c          | 396 ++++++++++++++++++++++++++++++++++-
 net/bluetooth/leds.c              |  27 +++
 net/bluetooth/leds.h              |  10 +
 net/bluetooth/mgmt.c              | 349 +++++++++++++++++++++++-------
 net/bluetooth/mgmt_util.c         |  66 +++++-
 net/bluetooth/smp.c               |   5 +-
 net/mac802154/iface.c             |   1 +
 net/mac802154/rx.c                |   9 +-
 32 files changed, 1600 insertions(+), 150 deletions(-)
 create mode 100644 drivers/bluetooth/btqcomsmd.c
 create mode 100644 drivers/bluetooth/hci_mrvl.c

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Daniel Mack @ 2016-09-19 19:30 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: htejun, daniel, ast, davem, kafai, fw, harald, netdev, sargun,
	cgroups
In-Reply-To: <20160919191910.GA984@salvia>

On 09/19/2016 09:19 PM, Pablo Neira Ayuso wrote:
> On Mon, Sep 19, 2016 at 06:44:00PM +0200, Daniel Mack wrote:
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index 6001e78..5dc90aa 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -39,6 +39,7 @@
>>  #include <linux/module.h>
>>  #include <linux/slab.h>
>>  
>> +#include <linux/bpf-cgroup.h>
>>  #include <linux/netfilter.h>
>>  #include <linux/netfilter_ipv6.h>
>>  
>> @@ -143,6 +144,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>>  {
>>  	struct net_device *dev = skb_dst(skb)->dev;
>>  	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
>> +	int ret;
>>  
>>  	if (unlikely(idev->cnf.disable_ipv6)) {
>>  		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
>> @@ -150,6 +152,12 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>>  		return 0;
>>  	}
>>  
>> +	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
>> +	if (ret) {
>> +		kfree_skb(skb);
>> +		return ret;
>> +	}
> 
> 1) If your goal is to filter packets, why so late? The sooner you
>    enforce your policy, the fewer cycles you waste.
> 
> Actually, did you look at Google's approach to this problem?  They
> want to control this at socket level, so you restrict what the process
> can actually bind. That is enforcing the policy way before you even
> send packets. On top of that, what they submitted is infrastructured
> so any process with CAP_NET_ADMIN can access that policy that is being
> applied and fetch a readable policy through kernel interface.

Yes, I've seen what they propose, but I want this approach to support
accounting, and so the code has to look at each and every packet in
order to count bytes and packets. Do you know of any better place to put
the hook then?

That said, I can well imagine more hook types that also operate at port
bind time. That would be easy to add on top.

> 2) This will turn the stack into a nightmare to debug I predict. If
>    any process with CAP_NET_ADMIN can potentially attach bpf blobs
>    via these hooks, we will have to include in the network stack
>    traveling documentation something like: "Probably you have to check
>    that your orchestrator is not dropping your packets for some
>    reason". So I wonder how users will debug this and how the policy that
>    your orchestrator applies will be exposed to userspace.

Sure, every new limitation mechanism adds another knob to look at if
things don't work. But apart from taking care at userspace level to make
the behavior as obvious as possible, I'm open to suggestions of how to
improve the transparency of attached eBPF programs on the kernel side.


Thanks,
Daniel

^ permalink raw reply
