Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net] net: xfrm: don't double-hold dst when sk_policy in use.
From: Steffen Klassert @ 2017-08-25  5:16 UTC (permalink / raw)
  To: Lorenzo Colitti; +Cc: netdev, weiwan, davem, misterikkit
In-Reply-To: <20170823081439.52796-1-lorenzo@google.com>

On Wed, Aug 23, 2017 at 05:14:39PM +0900, Lorenzo Colitti wrote:
> While removing dst_entry garbage collection, commit 52df157f17e5
> ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle")
> changed xfrm_resolve_and_create_bundle so it returns an xdst with
> a refcount of 1 instead of 0.
> 
> However, it did not delete the dst_hold performed by xfrm_lookup
> when a per-socket policy is in use. This means that when a
> socket policy is in use, dst entries returned by xfrm_lookup have
> a refcount of 2, and are not freed when no longer in use.
> 
> Cc: Wei Wang <weiwan@google.com>
> Fixes: 52df157f17 ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle")
> Tested: https://android-review.googlesource.com/417481
> Tested: https://android-review.googlesource.com/418659
> Tested: https://android-review.googlesource.com/424463
> Tested: https://android-review.googlesource.com/452776 passes on net-next
> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>

Patch applied, thanks a lot!

^ permalink raw reply

* Re: nf_nat_pptp 4.12.3 kernel lockup/reboot
From: Florian Westphal @ 2017-08-25  5:21 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Florian Westphal, Linux Kernel Network Developers
In-Reply-To: <f31c9ddfc1d4e02ff7fc16d3a0955841@nuclearcat.com>

Denys Fedoryshchenko <nuclearcat@nuclearcat.com> wrote:
> >>> I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling
> >>> approx 2gbps of pppoe users traffic) and noticed that after while server
> >>> rebooting(i have set reboot on panic and etc).
> >>> I can't run serial console, and in pstore / netconsole there is nothing.
> >>> Best i got is some very short message about softlockup in ipmi, but as
> >>> storage very limited there - it is near useless.
> >>>
> >>> By preliminary testing (can't do it much, as it's production) - it seems
> >>> following lines causing issue, they worked in 4.11.8 and no more in 4.12.3.
> >>
> >>Wild guess here, does this help?
> >>
> >>diff --git a/net/netfilter/nf_conntrack_helper.c
> >>b/net/netfilter/nf_conntrack_helper.c
> >>--- a/net/netfilter/nf_conntrack_helper.c
> >>+++ b/net/netfilter/nf_conntrack_helper.c
> >>@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct,
> >>struct nf_conn *tmpl,
> >>                help = nf_ct_helper_ext_add(ct, helper, flags);
> >>                if (help == NULL)
> >>                        return -ENOMEM;
> >>+              	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
> >
> >sigh, stupid typo, should be no ';' at the end above.
> Sorry, is there any plans to push this to 4.12 stable queue?

No, sorry, this patch adds the extension for all connections
that use a helper, but the nat extension is only used/required by pptp
helper (and masquerade).

Thing is that this patch should not be needed, I will have
to review pptp again, maybe i missed a case where the extension is not
added.

Do you happen to have an oops backtrace?

That might speed this up a bit.

^ permalink raw reply

* [PATCH] net: sxgbe: check memory allocation failure
From: Christophe JAILLET @ 2017-08-25  5:35 UTC (permalink / raw)
  To: bh74.an, ks.giri, vipul.pandya
  Cc: netdev, linux-kernel, kernel-janitors, Christophe JAILLET

Check memory allocation failure and return -ENOMEM in such a case, as
already done few lines below for another memory allocation.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
 drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c b/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c
index 73427e29df2a..fbd00cb0cb7d 100644
--- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c
+++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c
@@ -47,6 +47,8 @@ static int sxgbe_probe_config_dt(struct platform_device *pdev,
 	plat->mdio_bus_data = devm_kzalloc(&pdev->dev,
 					   sizeof(*plat->mdio_bus_data),
 					   GFP_KERNEL);
+	if (!plat->mdio_bus_data)
+		return -ENOMEM;
 
 	dma_cfg = devm_kzalloc(&pdev->dev, sizeof(*dma_cfg), GFP_KERNEL);
 	if (!dma_cfg)
-- 
2.11.0

^ permalink raw reply related

* RE: [PATCH v3 0/7] Add RSS to DPAA 1.x Ethernet driver
From: Madalin-cristian Bucur @ 2017-08-25  5:50 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20170824.114741.1529064786944246878.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Subject: Re: [PATCH v3 0/7] Add RSS to DPAA 1.x Ethernet driver
> 
> From: David Miller <davem@davemloft.net>
> Date: Thu, 24 Aug 2017 09:42:20 -0700 (PDT)
> 
> > From: Madalin Bucur <madalin.bucur@nxp.com>
> > Date: Thu, 24 Aug 2017 10:28:21 +0300
> >
> >> This patch set introduces Receive Side Scaling for the DPAA Ethernet
> >> driver. Documentation is updated with details related to the new
> >> feature and limitations that apply.
> >> Added also a small fix.
> >>
> >> v2: removed a C++ style comment
> >> v3: move struct fman to header file to avoid exporting a function
> >
> > Series applied, thanks.
> 
> Actually I'm reverting, this doesn't even compile.

Hi,

Sorry for this blunder, I've only tested on PPC, where it works.
Will come back with a proper patch set.

Madalin

^ permalink raw reply

* [PATCH net-next v5] net: stmmac: Delete dead code for MDIO registration
From: Romain Perier @ 2017-08-25  6:49 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue, Andrew Lunn,
	Florian Fainelli
  Cc: David S. Miller, netdev, linux-kernel, Romain Perier

This code is no longer used, the logging function was changed by commit
fbca164776e4 ("net: stmmac: Use the right logging function in stmmac_mdio_register").
It was previously showing information about the type of the IRQ, if it's
polled, ignored or a normal interrupt. As we don't want information loss,
I have moved this code to phy_attached_print().

Fixes: fbca164776e4 ("net: stmmac: Use the right logging function in stmmac_mdio_register")
Signed-off-by: Romain Perier <romain.perier@collabora.com>
---

Hello,

This is the continuity of "[PATCH v2 0/2] net: stmmac: phy logging fixes",
see https://lkml.org/lkml/2017/8/21/288

Changes in v5:
- Don't truncater commit fbca164776e4 in the message of *this* commit
- Fixed typo

Changes in v4:
- Don't truncate the commit subject for "Fixes"
- Fixed invalid sizeof() used with snprintf
- Added "net-next" prefix to the title of the commit

Changes in v3:
- Removed patch 2/2: "net: phy: Don't use drv when it is NULL in phy_attached_print",
  fixed upstream by fcd03e362b1c
  ("net: phy: Deal with unbound PHY driver in phy_attached_print()")
- Moved this code to phy_attached_print(), so we have more informations
  about the IRQ (poll, ignore or normal irq) and no information are loss.
- Re-worded commit message

 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 16 ----------------
 drivers/net/phy/phy_device.c                      | 21 ++++++++++++++++++---
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 72ec711fcba2..f5f37bfa1d58 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -248,9 +248,6 @@ int stmmac_mdio_register(struct net_device *ndev)
 	found = 0;
 	for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
 		struct phy_device *phydev = mdiobus_get_phy(new_bus, addr);
-		int act = 0;
-		char irq_num[4];
-		char *irq_str;
 
 		if (!phydev)
 			continue;
@@ -273,19 +270,6 @@ int stmmac_mdio_register(struct net_device *ndev)
 		if (priv->plat->phy_addr == -1)
 			priv->plat->phy_addr = addr;
 
-		act = (priv->plat->phy_addr == addr);
-		switch (phydev->irq) {
-		case PHY_POLL:
-			irq_str = "POLL";
-			break;
-		case PHY_IGNORE_INTERRUPT:
-			irq_str = "IGNORE";
-			break;
-		default:
-			sprintf(irq_num, "%d", phydev->irq);
-			irq_str = irq_num;
-			break;
-		}
 		phy_attached_info(phydev);
 		found = 1;
 	}
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 2f742ae5b92e..4f67cafda2ef 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -861,21 +861,36 @@ void phy_attached_info(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_attached_info);
 
-#define ATTACHED_FMT "attached PHY driver [%s] (mii_bus:phy_addr=%s, irq=%d)"
+#define ATTACHED_FMT "attached PHY driver [%s] (mii_bus:phy_addr=%s, irq=%s)"
 void phy_attached_print(struct phy_device *phydev, const char *fmt, ...)
 {
 	const char *drv_name = phydev->drv ? phydev->drv->name : "unbound";
+	char *irq_str;
+	char irq_num[4];
+
+	switch(phydev->irq) {
+	case PHY_POLL:
+		irq_str = "POLL";
+		break;
+	case PHY_IGNORE_INTERRUPT:
+		irq_str = "IGNORE";
+		break;
+	default:
+		snprintf(irq_num, sizeof(irq_num), "%d", phydev->irq);
+		irq_str = irq_num;
+		break;
+	}
 
 	if (!fmt) {
 		dev_info(&phydev->mdio.dev, ATTACHED_FMT "\n",
 			 drv_name, phydev_name(phydev),
-			 phydev->irq);
+			 irq_str);
 	} else {
 		va_list ap;
 
 		dev_info(&phydev->mdio.dev, ATTACHED_FMT,
 			 drv_name, phydev_name(phydev),
-			 phydev->irq);
+			 irq_str);
 
 		va_start(ap, fmt);
 		vprintk(fmt, ap);
-- 
2.11.0

^ permalink raw reply related

* [PATCH net] ipv6: Fix may be used uninitialized warning in rt6_check
From: Steffen Klassert @ 2017-08-25  7:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: Wei Wang, Eric Dumazet, Martin KaFai Lau, netdev

rt_cookie might be used uninitialized, fix this by
initializing it.

Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a9d3564..48c8c92 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1289,7 +1289,7 @@ static void rt6_dst_from_metrics_check(struct rt6_info *rt)
 
 static struct dst_entry *rt6_check(struct rt6_info *rt, u32 cookie)
 {
-	u32 rt_cookie;
+	u32 rt_cookie = 0;
 
 	if (!rt6_get_cookie_safe(rt, &rt_cookie) || rt_cookie != cookie)
 		return NULL;
-- 
2.7.4

^ permalink raw reply related

* Re: nf_nat_pptp 4.12.3 kernel lockup/reboot
From: Denys Fedoryshchenko @ 2017-08-25  7:15 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Linux Kernel Network Developers
In-Reply-To: <20170825052137.GF15739@breakpoint.cc>

On 2017-08-25 08:21, Florian Westphal wrote:
> Denys Fedoryshchenko <nuclearcat@nuclearcat.com> wrote:
>> >>> I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling
>> >>> approx 2gbps of pppoe users traffic) and noticed that after while server
>> >>> rebooting(i have set reboot on panic and etc).
>> >>> I can't run serial console, and in pstore / netconsole there is nothing.
>> >>> Best i got is some very short message about softlockup in ipmi, but as
>> >>> storage very limited there - it is near useless.
>> >>>
>> >>> By preliminary testing (can't do it much, as it's production) - it seems
>> >>> following lines causing issue, they worked in 4.11.8 and no more in 4.12.3.
>> >>
>> >>Wild guess here, does this help?
>> >>
>> >>diff --git a/net/netfilter/nf_conntrack_helper.c
>> >>b/net/netfilter/nf_conntrack_helper.c
>> >>--- a/net/netfilter/nf_conntrack_helper.c
>> >>+++ b/net/netfilter/nf_conntrack_helper.c
>> >>@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct,
>> >>struct nf_conn *tmpl,
>> >>                help = nf_ct_helper_ext_add(ct, helper, flags);
>> >>                if (help == NULL)
>> >>                        return -ENOMEM;
>> >>+              	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
>> >
>> >sigh, stupid typo, should be no ';' at the end above.
>> Sorry, is there any plans to push this to 4.12 stable queue?
> 
> No, sorry, this patch adds the extension for all connections
> that use a helper, but the nat extension is only used/required by pptp
> helper (and masquerade).
> 
> Thing is that this patch should not be needed, I will have
> to review pptp again, maybe i missed a case where the extension is not
> added.
> 
> Do you happen to have an oops backtrace?
> 
> That might speed this up a bit.
There is nothing in netconsole, and also nothing ERST pstore, i found 
reason just by guessing.
Its totally headless also (no screen, no serial console).
I can try to attach USB serial for serial console, but not sure it will 
help.
If there is any other way to catch - i can try it, but as it's 
production server, so i can't "crash it" more than once per day.

^ permalink raw reply

* Re: [net-next:master 1328/1341] drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:77:3: error: 'devlink_dpipe_header_ethernet' undeclared here (not in a function)
From: Jiri Pirko @ 2017-08-25  7:28 UTC (permalink / raw)
  To: David Miller; +Cc: fengguang.wu, arkadis, kbuild-all, netdev, jiri
In-Reply-To: <20170824.181150.1024909235376525254.davem@davemloft.net>

Fri, Aug 25, 2017 at 03:11:50AM CEST, davem@davemloft.net wrote:
>From: kbuild test robot <fengguang.wu@intel.com>
>Date: Fri, 25 Aug 2017 08:16:01 +0800
>
>> All error/warnings (new ones prefixed by >>):
>> 
>>>> drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:77:3: error: 'devlink_dpipe_header_ethernet' undeclared here (not in a function)
>>      &devlink_dpipe_header_ethernet,
>>       ^
>>>> drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:78:3: error: 'devlink_dpipe_header_ipv4' undeclared here (not in a function)
>>      &devlink_dpipe_header_ipv4,
>>       ^
>>    drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 'mlxsw_sp_dpipe_erif_table_init':
>>    drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:305:9: error: too few arguments to function 'devlink_dpipe_table_register'
>>      return devlink_dpipe_table_register(devlink,
>
>Jiri and co., I think you'll need some ifdef'ery to deal with this
>properly.

Will fix this. Thanks.

^ permalink raw reply

* [PATCH net-next RESEND] sched: sfq: drop packets after root qdisc lock is released
From: gfree.wind @ 2017-08-25  7:43 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri, davem, edumazet, netdev; +Cc: Gao Feng

From: Gao Feng <gfree.wind@vip.163.com>

The commit 520ac30f4551 ("net_sched: drop packets after root qdisc lock
is released) made a big change of tc for performance. But there are
some points which are not changed in SFQ enqueue operation.
1. Fail to find the SFQ hash slot;
2. When the queue is full;

Now use qdisc_drop instead free skb directly.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
---
 net/sched/sch_sfq.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 82469ef..8841f4d 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -292,7 +292,7 @@ static inline void slot_queue_add(struct sfq_slot *slot, struct sk_buff *skb)
 	slot->skblist_prev = skb;
 }
 
-static unsigned int sfq_drop(struct Qdisc *sch)
+static unsigned int sfq_drop(struct Qdisc *sch, struct sk_buff **to_free)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	sfq_index x, d = q->cur_depth;
@@ -310,9 +310,13 @@ static unsigned int sfq_drop(struct Qdisc *sch)
 		slot->backlog -= len;
 		sfq_dec(q, x);
 		sch->q.qlen--;
-		qdisc_qstats_drop(sch);
 		qdisc_qstats_backlog_dec(sch, skb);
-		kfree_skb(skb);
+		if (likely(to_free)) {
+			qdisc_drop(skb, sch, to_free);
+		} else {
+			qdisc_qstats_drop(sch);
+			kfree_skb(skb);
+		}
 		return len;
 	}
 
@@ -360,7 +364,7 @@ static int sfq_headdrop(const struct sfq_sched_data *q)
 	if (hash == 0) {
 		if (ret & __NET_XMIT_BYPASS)
 			qdisc_qstats_drop(sch);
-		kfree_skb(skb);
+		__qdisc_drop(skb, to_free);
 		return ret;
 	}
 	hash--;
@@ -465,7 +469,7 @@ static int sfq_headdrop(const struct sfq_sched_data *q)
 		return NET_XMIT_SUCCESS;
 
 	qlen = slot->qlen;
-	dropped = sfq_drop(sch);
+	dropped = sfq_drop(sch, to_free);
 	/* Return Congestion Notification only if we dropped a packet
 	 * from this flow.
 	 */
@@ -675,7 +679,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 
 	qlen = sch->q.qlen;
 	while (sch->q.qlen > q->limit)
-		dropped += sfq_drop(sch);
+		dropped += sfq_drop(sch, NULL);
 	qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
 
 	del_timer(&q->perturb_timer);
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net] ipv6: Fix may be used uninitialized warning in rt6_check
From: Steffen Klassert @ 2017-08-25  7:52 UTC (permalink / raw)
  To: David S. Miller; +Cc: Wei Wang, Eric Dumazet, Martin KaFai Lau, netdev
In-Reply-To: <20170825070542.GV31224@secunet.com>

On Fri, Aug 25, 2017 at 09:05:42AM +0200, Steffen Klassert wrote:
> rt_cookie might be used uninitialized, fix this by
> initializing it.
> 
> Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  net/ipv6/route.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index a9d3564..48c8c92 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1289,7 +1289,7 @@ static void rt6_dst_from_metrics_check(struct rt6_info *rt)
>  
>  static struct dst_entry *rt6_check(struct rt6_info *rt, u32 cookie)
>  {
> -	u32 rt_cookie;
> +	u32 rt_cookie = 0;
>  
>  	if (!rt6_get_cookie_safe(rt, &rt_cookie) || rt_cookie != cookie)
>  		return NULL;

The compiler warning seems to be a false positive, as
rt_cookie != cookie is only checked if rt6_get_cookie_safe
returns true in which case rt_cookie is initialized.

Please disregard this patch.

^ permalink raw reply

* [PATCH net-next v2 2/5] ipv6: sr: add support for encapsulation of L2 frames
From: David Lebrun @ 2017-08-25  7:56 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170825075648.5061-1-david.lebrun@uclouvain.be>

This patch implements the L2 frame encapsulation mechanism, referred to
as T.Encaps.L2 in the SRv6 specifications [1].

A new type of SRv6 tunnel mode is added (SEG6_IPTUN_MODE_L2ENCAP). It only
accepts packets with an existing MAC header (i.e., it will not work for
locally generated packets). The resulting packet looks like IPv6 -> SRH ->
Ethernet -> original L3 payload. The next header field of the SRH is set to
NEXTHDR_NONE.

[1] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 include/uapi/linux/seg6_iptunnel.h | 18 ++++++++++++++----
 net/ipv6/seg6_iptunnel.c           | 25 +++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/seg6_iptunnel.h b/include/uapi/linux/seg6_iptunnel.h
index b6e5a0a..b23df9f 100644
--- a/include/uapi/linux/seg6_iptunnel.h
+++ b/include/uapi/linux/seg6_iptunnel.h
@@ -33,16 +33,26 @@ struct seg6_iptunnel_encap {
 enum {
 	SEG6_IPTUN_MODE_INLINE,
 	SEG6_IPTUN_MODE_ENCAP,
+	SEG6_IPTUN_MODE_L2ENCAP,
 };
 
 #ifdef __KERNEL__
 
 static inline size_t seg6_lwt_headroom(struct seg6_iptunnel_encap *tuninfo)
 {
-	int encap = (tuninfo->mode == SEG6_IPTUN_MODE_ENCAP);
-
-	return ((tuninfo->srh->hdrlen + 1) << 3) +
-	       (encap * sizeof(struct ipv6hdr));
+	int head = 0;
+
+	switch (tuninfo->mode) {
+	case SEG6_IPTUN_MODE_INLINE:
+		break;
+	case SEG6_IPTUN_MODE_ENCAP:
+		head = sizeof(struct ipv6hdr);
+		break;
+	case SEG6_IPTUN_MODE_L2ENCAP:
+		return 0;
+	}
+
+	return ((tuninfo->srh->hdrlen + 1) << 3) + head;
 }
 
 #endif
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 5bec781..bd6cc68 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -240,6 +240,22 @@ static int seg6_do_srh(struct sk_buff *skb)
 
 		skb->protocol = htons(ETH_P_IPV6);
 		break;
+	case SEG6_IPTUN_MODE_L2ENCAP:
+		if (!skb_mac_header_was_set(skb))
+			return -EINVAL;
+
+		if (pskb_expand_head(skb, skb->mac_len, 0, GFP_ATOMIC) < 0)
+			return -ENOMEM;
+
+		skb_mac_header_rebuild(skb);
+		skb_push(skb, skb->mac_len);
+
+		err = seg6_do_srh_encap(skb, tinfo->srh, NEXTHDR_NONE);
+		if (err)
+			return err;
+
+		skb->protocol = htons(ETH_P_IPV6);
+		break;
 	}
 
 	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
@@ -386,6 +402,8 @@ static int seg6_build_state(struct nlattr *nla,
 		break;
 	case SEG6_IPTUN_MODE_ENCAP:
 		break;
+	case SEG6_IPTUN_MODE_L2ENCAP:
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -409,8 +427,11 @@ static int seg6_build_state(struct nlattr *nla,
 	memcpy(&slwt->tuninfo, tuninfo, tuninfo_len);
 
 	newts->type = LWTUNNEL_ENCAP_SEG6;
-	newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
-			LWTUNNEL_STATE_INPUT_REDIRECT;
+	newts->flags |= LWTUNNEL_STATE_INPUT_REDIRECT;
+
+	if (tuninfo->mode != SEG6_IPTUN_MODE_L2ENCAP)
+		newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT;
+
 	newts->headroom = seg6_lwt_headroom(tuninfo);
 
 	*ts = newts;
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next v2 0/5] net: updates for IPv6 Segment Routing
From: David Lebrun @ 2017-08-25  7:56 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun

v2: seg6_lwt_headroom() is not relevant for lwtunnel_input_redirect()
    use cases, and L2ENCAP only uses this redirection. Fix incoherence
    between arbitrary MAC header size support and fixed headroom
    computation by setting only LWTUNNEL_STATE_INPUT_REDIRECT for L2ENCAP
    mode.

This patch series provides several updates for the SRv6 implementation. The
first patch leverages the existing infrastructure to support encapsulation
of IPv4 packets. The second patch implements the T.Encaps.L2 SR function,
enabling to encapsulate an L2 Ethernet frame within an IPv6+SRH packet.
The last three patches update the seg6local lightweight tunnel, and mainly
implement four new actions: End.T, End.DX2, End.DX4 and End.DT6.

David Lebrun (5):
  ipv6: sr: add support for ip4ip6 encapsulation
  ipv6: sr: add support for encapsulation of L2 frames
  ipv6: sr: enforce IPv6 packets for seg6local lwt
  ipv6: sr: add helper functions for seg6local
  ipv6: sr: implement additional seg6local actions

 include/net/seg6.h                 |   3 +-
 include/uapi/linux/seg6_iptunnel.h |  18 ++-
 net/ipv6/Kconfig                   |   1 +
 net/ipv6/seg6_iptunnel.c           |  72 +++++++--
 net/ipv6/seg6_local.c              | 314 ++++++++++++++++++++++++++++---------
 5 files changed, 317 insertions(+), 91 deletions(-)

-- 
2.10.2

^ permalink raw reply

* [PATCH net-next v2 4/5] ipv6: sr: add helper functions for seg6local
From: David Lebrun @ 2017-08-25  7:56 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170825075648.5061-1-david.lebrun@uclouvain.be>

This patch adds three helper functions to be used with the seg6local packet
processing actions.

The decap_and_validate() function will be used by the End.D* actions, that
decapsulate an SR-enabled packet.

The advance_nextseg() function applies the fundamental operations to update
an SRH for the next segment.

The lookup_nexthop() function helps select the next-hop for the processed
SR packets. It supports an optional next-hop address to route the packet
specifically through it, and an optional routing table to use.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 net/ipv6/Kconfig      |   1 +
 net/ipv6/seg6_local.c | 189 ++++++++++++++++++++++++++------------------------
 2 files changed, 101 insertions(+), 89 deletions(-)

diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 0d72239..ea71e4b 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -308,6 +308,7 @@ config IPV6_SEG6_LWTUNNEL
 	depends on IPV6
 	select LWTUNNEL
 	select DST_CACHE
+	select IPV6_MULTIPLE_TABLES
 	---help---
 	  Support for encapsulation of packets within an outer IPv6
 	  header and a Segment Routing Header using the lightweight
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index c626325..26db4d3e 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -99,23 +99,105 @@ static struct ipv6_sr_hdr *get_and_validate_srh(struct sk_buff *skb)
 	return srh;
 }
 
+static bool decap_and_validate(struct sk_buff *skb, int proto)
+{
+	struct ipv6_sr_hdr *srh;
+	unsigned int off = 0;
+
+	srh = get_srh(skb);
+	if (srh && srh->segments_left > 0)
+		return false;
+
+#ifdef CONFIG_IPV6_SEG6_HMAC
+	if (srh && !seg6_hmac_validate_skb(skb))
+		return false;
+#endif
+
+	if (ipv6_find_hdr(skb, &off, proto, NULL, NULL) < 0)
+		return false;
+
+	if (!pskb_pull(skb, off))
+		return false;
+
+	skb_postpull_rcsum(skb, skb_network_header(skb), off);
+
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+	skb->encapsulation = 0;
+
+	return true;
+}
+
+static void advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr)
+{
+	struct in6_addr *addr;
+
+	srh->segments_left--;
+	addr = srh->segments + srh->segments_left;
+	*daddr = *addr;
+}
+
+static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+			   u32 tbl_id)
+{
+	struct net *net = dev_net(skb->dev);
+	struct ipv6hdr *hdr = ipv6_hdr(skb);
+	int flags = RT6_LOOKUP_F_HAS_SADDR;
+	struct dst_entry *dst = NULL;
+	struct rt6_info *rt;
+	struct flowi6 fl6;
+
+	fl6.flowi6_iif = skb->dev->ifindex;
+	fl6.daddr = nhaddr ? *nhaddr : hdr->daddr;
+	fl6.saddr = hdr->saddr;
+	fl6.flowlabel = ip6_flowinfo(hdr);
+	fl6.flowi6_mark = skb->mark;
+	fl6.flowi6_proto = hdr->nexthdr;
+
+	if (nhaddr)
+		fl6.flowi6_flags = FLOWI_FLAG_KNOWN_NH;
+
+	if (!tbl_id) {
+		dst = ip6_route_input_lookup(net, skb->dev, &fl6, flags);
+	} else {
+		struct fib6_table *table;
+
+		table = fib6_get_table(net, tbl_id);
+		if (!table)
+			goto out;
+
+		rt = ip6_pol_route(net, table, 0, &fl6, flags);
+		dst = &rt->dst;
+	}
+
+	if (dst && dst->dev->flags & IFF_LOOPBACK && !dst->error) {
+		dst_release(dst);
+		dst = NULL;
+	}
+
+out:
+	if (!dst) {
+		rt = net->ipv6.ip6_blk_hole_entry;
+		dst = &rt->dst;
+		dst_hold(dst);
+	}
+
+	skb_dst_drop(skb);
+	skb_dst_set(skb, dst);
+}
+
 /* regular endpoint function */
 static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 {
 	struct ipv6_sr_hdr *srh;
-	struct in6_addr *addr;
 
 	srh = get_and_validate_srh(skb);
 	if (!srh)
 		goto drop;
 
-	srh->segments_left--;
-	addr = srh->segments + srh->segments_left;
-
-	ipv6_hdr(skb)->daddr = *addr;
+	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-	skb_dst_drop(skb);
-	ip6_route_input(skb);
+	lookup_nexthop(skb, NULL, 0);
 
 	return dst_input(skb);
 
@@ -127,41 +209,15 @@ static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 /* regular endpoint, and forward to specified nexthop */
 static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 {
-	struct net *net = dev_net(skb->dev);
 	struct ipv6_sr_hdr *srh;
-	struct dst_entry *dst;
-	struct in6_addr *addr;
-	struct ipv6hdr *hdr;
-	struct flowi6 fl6;
-	int flags;
 
 	srh = get_and_validate_srh(skb);
 	if (!srh)
 		goto drop;
 
-	srh->segments_left--;
-	addr = srh->segments + srh->segments_left;
-
-	hdr = ipv6_hdr(skb);
-	hdr->daddr = *addr;
-
-	skb_dst_drop(skb);
-
-	fl6.flowi6_iif = skb->dev->ifindex;
-	fl6.daddr = slwt->nh6;
-	fl6.saddr = hdr->saddr;
-	fl6.flowlabel = ip6_flowinfo(hdr);
-	fl6.flowi6_mark = skb->mark;
-	fl6.flowi6_proto = hdr->nexthdr;
-
-	flags = RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_IFACE |
-		RT6_LOOKUP_F_REACHABLE;
+	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-	dst = ip6_route_input_lookup(net, skb->dev, &fl6, flags);
-	if (dst->dev->flags & IFF_LOOPBACK)
-		goto drop;
-
-	skb_dst_set(skb, dst);
+	lookup_nexthop(skb, &slwt->nh6, 0);
 
 	return dst_input(skb);
 
@@ -174,42 +230,18 @@ static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 static int input_action_end_dx6(struct sk_buff *skb,
 				struct seg6_local_lwt *slwt)
 {
-	struct net *net = dev_net(skb->dev);
-	struct ipv6hdr *inner_hdr;
-	struct ipv6_sr_hdr *srh;
-	struct dst_entry *dst;
-	unsigned int off = 0;
-	struct flowi6 fl6;
-	bool use_nh;
-	int flags;
+	struct in6_addr *nhaddr = NULL;
 
 	/* this function accepts IPv6 encapsulated packets, with either
 	 * an SRH with SL=0, or no SRH.
 	 */
 
-	srh = get_srh(skb);
-	if (srh && srh->segments_left > 0)
-		goto drop;
-
-#ifdef CONFIG_IPV6_SEG6_HMAC
-	if (srh && !seg6_hmac_validate_skb(skb))
+	if (!decap_and_validate(skb, IPPROTO_IPV6))
 		goto drop;
-#endif
 
-	if (ipv6_find_hdr(skb, &off, IPPROTO_IPV6, NULL, NULL) < 0)
-		goto drop;
-
-	if (!pskb_pull(skb, off))
+	if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
 		goto drop;
 
-	skb_postpull_rcsum(skb, skb_network_header(skb), off);
-
-	skb_reset_network_header(skb);
-	skb_reset_transport_header(skb);
-	skb->encapsulation = 0;
-
-	inner_hdr = ipv6_hdr(skb);
-
 	/* The inner packet is not associated to any local interface,
 	 * so we do not call netif_rx().
 	 *
@@ -217,26 +249,10 @@ static int input_action_end_dx6(struct sk_buff *skb,
 	 * inner packet's DA. Otherwise, use the specified nexthop.
 	 */
 
-	use_nh = !ipv6_addr_any(&slwt->nh6);
+	if (!ipv6_addr_any(&slwt->nh6))
+		nhaddr = &slwt->nh6;
 
-	skb_dst_drop(skb);
-
-	fl6.flowi6_iif = skb->dev->ifindex;
-	fl6.daddr = use_nh ? slwt->nh6 : inner_hdr->daddr;
-	fl6.saddr = inner_hdr->saddr;
-	fl6.flowlabel = ip6_flowinfo(inner_hdr);
-	fl6.flowi6_mark = skb->mark;
-	fl6.flowi6_proto = inner_hdr->nexthdr;
-
-	flags = RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_REACHABLE;
-	if (use_nh)
-		flags |= RT6_LOOKUP_F_IFACE;
-
-	dst = ip6_route_input_lookup(net, skb->dev, &fl6, flags);
-	if (dst->dev->flags & IFF_LOOPBACK)
-		goto drop;
-
-	skb_dst_set(skb, dst);
+	lookup_nexthop(skb, nhaddr, 0);
 
 	return dst_input(skb);
 drop:
@@ -261,8 +277,7 @@ static int input_action_end_b6(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
 	skb_set_transport_header(skb, sizeof(struct ipv6hdr));
 
-	skb_dst_drop(skb);
-	ip6_route_input(skb);
+	lookup_nexthop(skb, NULL, 0);
 
 	return dst_input(skb);
 
@@ -276,16 +291,13 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
 				     struct seg6_local_lwt *slwt)
 {
 	struct ipv6_sr_hdr *srh;
-	struct in6_addr *addr;
 	int err = -EINVAL;
 
 	srh = get_and_validate_srh(skb);
 	if (!srh)
 		goto drop;
 
-	srh->segments_left--;
-	addr = srh->segments + srh->segments_left;
-	ipv6_hdr(skb)->daddr = *addr;
+	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
 	skb_reset_inner_headers(skb);
 	skb->encapsulation = 1;
@@ -297,8 +309,7 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
 	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
 	skb_set_transport_header(skb, sizeof(struct ipv6hdr));
 
-	skb_dst_drop(skb);
-	ip6_route_input(skb);
+	lookup_nexthop(skb, NULL, 0);
 
 	return dst_input(skb);
 
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next v2 3/5] ipv6: sr: enforce IPv6 packets for seg6local lwt
From: David Lebrun @ 2017-08-25  7:56 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170825075648.5061-1-david.lebrun@uclouvain.be>

This patch ensures that the seg6local lightweight tunnel is used solely
with IPv6 routes and processes only IPv6 packets.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 net/ipv6/seg6_local.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 609b94e..c626325 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -357,6 +357,11 @@ static int seg6_local_input(struct sk_buff *skb)
 	struct seg6_action_desc *desc;
 	struct seg6_local_lwt *slwt;
 
+	if (skb->protocol != htons(ETH_P_IPV6)) {
+		kfree_skb(skb);
+		return -EINVAL;
+	}
+
 	slwt = seg6_local_lwtunnel(orig_dst->lwtstate);
 	desc = slwt->desc;
 
@@ -623,6 +628,9 @@ static int seg6_local_build_state(struct nlattr *nla, unsigned int family,
 	struct seg6_local_lwt *slwt;
 	int err;
 
+	if (family != AF_INET6)
+		return -EINVAL;
+
 	err = nla_parse_nested(tb, SEG6_LOCAL_MAX, nla, seg6_local_policy,
 			       extack);
 
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next v2 1/5] ipv6: sr: add support for ip4ip6 encapsulation
From: David Lebrun @ 2017-08-25  7:56 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170825075648.5061-1-david.lebrun@uclouvain.be>

This patch enables the SRv6 encapsulation mode to carry an IPv4 payload.
All the infrastructure was already present, I just had to add a parameter
to seg6_do_srh_encap() to specify the inner packet protocol, and perform
some additional checks.

Usage example:
ip route add 1.2.3.4 encap seg6 mode encap segs fc00::1,fc00::2 dev eth0

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 include/net/seg6.h       |  3 ++-
 net/ipv6/seg6_iptunnel.c | 47 +++++++++++++++++++++++++++++++++++++----------
 net/ipv6/seg6_local.c    |  2 +-
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/include/net/seg6.h b/include/net/seg6.h
index 5379f55..099bad5 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -60,7 +60,8 @@ extern int seg6_local_init(void);
 extern void seg6_local_exit(void);
 
 extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len);
-extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
+extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
+			     int proto);
 extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
 
 #endif
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 5012330..5bec781 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -91,7 +91,7 @@ static void set_tun_src(struct net *net, struct net_device *dev,
 }
 
 /* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
-int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
+int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
 {
 	struct net *net = dev_net(skb_dst(skb)->dev);
 	struct ipv6hdr *hdr, *inner_hdr;
@@ -116,15 +116,22 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
 	 * hlim will be decremented in ip6_forward() afterwards and
 	 * decapsulation will overwrite inner hlim with outer hlim
 	 */
-	ip6_flow_hdr(hdr, ip6_tclass(ip6_flowinfo(inner_hdr)),
-		     ip6_flowlabel(inner_hdr));
-	hdr->hop_limit = inner_hdr->hop_limit;
+
+	if (skb->protocol == htons(ETH_P_IPV6)) {
+		ip6_flow_hdr(hdr, ip6_tclass(ip6_flowinfo(inner_hdr)),
+			     ip6_flowlabel(inner_hdr));
+		hdr->hop_limit = inner_hdr->hop_limit;
+	} else {
+		ip6_flow_hdr(hdr, 0, 0);
+		hdr->hop_limit = ip6_dst_hoplimit(skb_dst(skb));
+	}
+
 	hdr->nexthdr = NEXTHDR_ROUTING;
 
 	isrh = (void *)hdr + sizeof(*hdr);
 	memcpy(isrh, osrh, hdrlen);
 
-	isrh->nexthdr = NEXTHDR_IPV6;
+	isrh->nexthdr = proto;
 
 	hdr->daddr = isrh->segments[isrh->first_segment];
 	set_tun_src(net, skb->dev, &hdr->daddr, &hdr->saddr);
@@ -199,7 +206,7 @@ static int seg6_do_srh(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
 	struct seg6_iptunnel_encap *tinfo;
-	int err = 0;
+	int proto, err = 0;
 
 	tinfo = seg6_encap_lwtunnel(dst->lwtstate);
 
@@ -210,17 +217,31 @@ static int seg6_do_srh(struct sk_buff *skb)
 
 	switch (tinfo->mode) {
 	case SEG6_IPTUN_MODE_INLINE:
+		if (skb->protocol != htons(ETH_P_IPV6))
+			return -EINVAL;
+
 		err = seg6_do_srh_inline(skb, tinfo->srh);
+		if (err)
+			return err;
+
 		skb_reset_inner_headers(skb);
 		break;
 	case SEG6_IPTUN_MODE_ENCAP:
-		err = seg6_do_srh_encap(skb, tinfo->srh);
+		if (skb->protocol == htons(ETH_P_IPV6))
+			proto = IPPROTO_IPV6;
+		else if (skb->protocol == htons(ETH_P_IP))
+			proto = IPPROTO_IPIP;
+		else
+			return -EINVAL;
+
+		err = seg6_do_srh_encap(skb, tinfo->srh, proto);
+		if (err)
+			return err;
+
+		skb->protocol = htons(ETH_P_IPV6);
 		break;
 	}
 
-	if (err)
-		return err;
-
 	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
 	skb_set_transport_header(skb, sizeof(struct ipv6hdr));
 
@@ -334,6 +355,9 @@ static int seg6_build_state(struct nlattr *nla,
 	struct seg6_lwt *slwt;
 	int err;
 
+	if (family != AF_INET && family != AF_INET6)
+		return -EINVAL;
+
 	err = nla_parse_nested(tb, SEG6_IPTUNNEL_MAX, nla,
 			       seg6_iptunnel_policy, extack);
 
@@ -356,6 +380,9 @@ static int seg6_build_state(struct nlattr *nla,
 
 	switch (tuninfo->mode) {
 	case SEG6_IPTUN_MODE_INLINE:
+		if (family != AF_INET6)
+			return -EINVAL;
+
 		break;
 	case SEG6_IPTUN_MODE_ENCAP:
 		break;
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 147680e..609b94e 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -290,7 +290,7 @@ static int input_action_end_b6_encap(struct sk_buff *skb,
 	skb_reset_inner_headers(skb);
 	skb->encapsulation = 1;
 
-	err = seg6_do_srh_encap(skb, slwt->srh);
+	err = seg6_do_srh_encap(skb, slwt->srh, IPPROTO_IPV6);
 	if (err)
 		goto drop;
 
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next v2 5/5] ipv6: sr: implement additional seg6local actions
From: David Lebrun @ 2017-08-25  7:58 UTC (permalink / raw)
  To: netdev; +Cc: David Lebrun
In-Reply-To: <20170825075648.5061-1-david.lebrun@uclouvain.be>

This patch implements the following seg6local actions.

- SEG6_LOCAL_ACTION_END_T: regular SRH processing and forward to the
  next-hop looked up in the specified routing table.

- SEG6_LOCAL_ACTION_END_DX2: decapsulate an L2 frame and forward it to
  the specified network interface.

- SEG6_LOCAL_ACTION_END_DX4: decapsulate an IPv4 packet and forward it,
  possibly to the specified next-hop.

- SEG6_LOCAL_ACTION_END_DT6: decapsulate an IPv6 packet and forward it
  to the next-hop looked up in the specified routing table.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
---
 net/ipv6/seg6_local.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 26db4d3e..9c1a885 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -30,6 +30,7 @@
 #ifdef CONFIG_IPV6_SEG6_HMAC
 #include <net/seg6_hmac.h>
 #endif
+#include <linux/etherdevice.h>
 
 struct seg6_local_lwt;
 
@@ -226,6 +227,82 @@ static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 	return -EINVAL;
 }
 
+static int input_action_end_t(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+	struct ipv6_sr_hdr *srh;
+
+	srh = get_and_validate_srh(skb);
+	if (!srh)
+		goto drop;
+
+	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
+
+	lookup_nexthop(skb, NULL, slwt->table);
+
+	return dst_input(skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
+/* decapsulate and forward inner L2 frame on specified interface */
+static int input_action_end_dx2(struct sk_buff *skb,
+				struct seg6_local_lwt *slwt)
+{
+	struct net *net = dev_net(skb->dev);
+	struct net_device *odev;
+	struct ethhdr *eth;
+
+	if (!decap_and_validate(skb, NEXTHDR_NONE))
+		goto drop;
+
+	if (!pskb_may_pull(skb, ETH_HLEN))
+		goto drop;
+
+	skb_reset_mac_header(skb);
+	eth = (struct ethhdr *)skb->data;
+
+	/* To determine the frame's protocol, we assume it is 802.3. This avoids
+	 * a call to eth_type_trans(), which is not really relevant for our
+	 * use case.
+	 */
+	if (!eth_proto_is_802_3(eth->h_proto))
+		goto drop;
+
+	odev = dev_get_by_index_rcu(net, slwt->oif);
+	if (!odev)
+		goto drop;
+
+	/* As we accept Ethernet frames, make sure the egress device is of
+	 * the correct type.
+	 */
+	if (odev->type != ARPHRD_ETHER)
+		goto drop;
+
+	if (!(odev->flags & IFF_UP) || !netif_carrier_ok(odev))
+		goto drop;
+
+	skb_orphan(skb);
+
+	if (skb_warn_if_lro(skb))
+		goto drop;
+
+	skb_forward_csum(skb);
+
+	if (skb->len - ETH_HLEN > odev->mtu)
+		goto drop;
+
+	skb->dev = odev;
+	skb->protocol = eth->h_proto;
+
+	return dev_queue_xmit(skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
 /* decapsulate and forward to specified nexthop */
 static int input_action_end_dx6(struct sk_buff *skb,
 				struct seg6_local_lwt *slwt)
@@ -260,6 +337,56 @@ static int input_action_end_dx6(struct sk_buff *skb,
 	return -EINVAL;
 }
 
+static int input_action_end_dx4(struct sk_buff *skb,
+				struct seg6_local_lwt *slwt)
+{
+	struct iphdr *iph;
+	__be32 nhaddr;
+	int err;
+
+	if (!decap_and_validate(skb, IPPROTO_IPIP))
+		goto drop;
+
+	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+		goto drop;
+
+	skb->protocol = htons(ETH_P_IP);
+
+	iph = ip_hdr(skb);
+
+	nhaddr = slwt->nh4.s_addr ?: iph->daddr;
+
+	skb_dst_drop(skb);
+
+	err = ip_route_input(skb, nhaddr, iph->saddr, 0, skb->dev);
+	if (err)
+		goto drop;
+
+	return dst_input(skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
+static int input_action_end_dt6(struct sk_buff *skb,
+				struct seg6_local_lwt *slwt)
+{
+	if (!decap_and_validate(skb, IPPROTO_IPV6))
+		goto drop;
+
+	if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+		goto drop;
+
+	lookup_nexthop(skb, NULL, slwt->table);
+
+	return dst_input(skb);
+
+drop:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+
 /* push an SRH on top of the current one */
 static int input_action_end_b6(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 {
@@ -330,11 +457,31 @@ static struct seg6_action_desc seg6_action_table[] = {
 		.input		= input_action_end_x,
 	},
 	{
+		.action		= SEG6_LOCAL_ACTION_END_T,
+		.attrs		= (1 << SEG6_LOCAL_TABLE),
+		.input		= input_action_end_t,
+	},
+	{
+		.action		= SEG6_LOCAL_ACTION_END_DX2,
+		.attrs		= (1 << SEG6_LOCAL_OIF),
+		.input		= input_action_end_dx2,
+	},
+	{
 		.action		= SEG6_LOCAL_ACTION_END_DX6,
 		.attrs		= (1 << SEG6_LOCAL_NH6),
 		.input		= input_action_end_dx6,
 	},
 	{
+		.action		= SEG6_LOCAL_ACTION_END_DX4,
+		.attrs		= (1 << SEG6_LOCAL_NH4),
+		.input		= input_action_end_dx4,
+	},
+	{
+		.action		= SEG6_LOCAL_ACTION_END_DT6,
+		.attrs		= (1 << SEG6_LOCAL_TABLE),
+		.input		= input_action_end_dt6,
+	},
+	{
 		.action		= SEG6_LOCAL_ACTION_END_B6,
 		.attrs		= (1 << SEG6_LOCAL_SRH),
 		.input		= input_action_end_b6,
-- 
2.10.2

^ permalink raw reply related

* Re: [PATCH net-next v7 01/10] selftest: Enhance kselftest_harness.h with a step mechanism
From: Mickaël Salaün @ 2017-08-25  7:58 UTC (permalink / raw)
  To: Alexei Starovoitov, Shuah Khan
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Matthew Garrett,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Tejun Heo, Thomas Graf, Will Drewry
In-Reply-To: <20170824023105.gt6lyn2snhczryik@ast-mbp>


[-- Attachment #1.1: Type: text/plain, Size: 1563 bytes --]



On 24/08/2017 04:31, Alexei Starovoitov wrote:
> On Mon, Aug 21, 2017 at 02:09:24AM +0200, Mickaël Salaün wrote:
>> This step mechanism may be useful to return an information about the
>> error without being able to write to TH_LOG_STREAM.
>>
>> Set _metadata->no_print to true to print this counter.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Shuah Khan <shuah@kernel.org>
>> Cc: Will Drewry <wad@chromium.org>
>> Link: https://lkml.kernel.org/r/CAGXu5j+D-FP8Kt9unNOqKrQJP4DYTpmgkJxWykZyrYiVPz3Y3Q@mail.gmail.com
>> ---
>>
>> This patch is intended to the kselftest tree:
>> https://lkml.kernel.org/r/20170806232337.4191-1-mic@digikod.net
>>
>> Changes since v6:
>> * add the step counter in assert/expect macros and use _metadata to
>>   enable the counter (suggested by Kees Cook)
>> ---
>>  tools/testing/selftests/kselftest_harness.h   | 31 ++++++++++++++++++++++-----
>>  tools/testing/selftests/seccomp/seccomp_bpf.c |  2 +-
>>  2 files changed, 27 insertions(+), 6 deletions(-)
> 
> is there a dependency on this in patches 2+ ?
> if not, I would send this patch to selftests right away.
> 
> 

The Landlock tests [patch 9/10] rely on it for now.

I sent it three weeks ago:
https://lkml.kernel.org/r/20170806232337.4191-1-mic@digikod.net

Anyway, until this patch is merged in the kselftest tree and then
available to net-next, I'll have to keep it here.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v7 03/10] bpf,landlock: Define an eBPF program type for a Landlock rule
From: Mickaël Salaün @ 2017-08-25  8:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Matthew Garrett,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf
In-Reply-To: <20170824022853.qbgiubmslaaeclxf@ast-mbp>


[-- Attachment #1.1: Type: text/plain, Size: 9633 bytes --]



On 24/08/2017 04:28, Alexei Starovoitov wrote:
> On Mon, Aug 21, 2017 at 02:09:26AM +0200, Mickaël Salaün wrote:
>> Add a new type of eBPF program used by Landlock rules.
>>
>> This new BPF program type will be registered with the Landlock LSM
>> initialization.
>>
>> Add an initial Landlock Kconfig.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: James Morris <james.l.morris@oracle.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Serge E. Hallyn <serge@hallyn.com>
>> ---
>>
>> Changes since v6:
>> * add 3 more sub-events: IOCTL, LOCK, FCNTL
>>   https://lkml.kernel.org/r/2fbc99a6-f190-f335-bd14-04bdeed35571@digikod.net
>> * rename LANDLOCK_VERSION to LANDLOCK_ABI to better reflect its purpose,
>>   and move it from landlock.h to common.h
>> * rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE: an eBPF
>>   program could be used for something else than a rule
>> * simplify struct landlock_context by removing the arch and syscall_nr fields
>> * remove all eBPF map functions call, remove ABILITY_WRITE
>> * refactor bpf_landlock_func_proto() (suggested by Kees Cook)
>> * constify pointers
>> * fix doc inclusion
>>
>> Changes since v5:
>> * rename file hooks.c to init.c
>> * fix spelling
>>
>> Changes since v4:
>> * merge a minimal (not enabled) LSM code and Kconfig in this commit
>>
>> Changes since v3:
>> * split commit
>> * revamp the landlock_context:
>>   * add arch, syscall_nr and syscall_cmd (ioctl, fcntl…) to be able to
>>     cross-check action with the event type
>>   * replace args array with dedicated fields to ease the addition of new
>>     fields
>> ---
>>  include/linux/bpf_types.h      |  3 ++
>>  include/uapi/linux/bpf.h       | 97 +++++++++++++++++++++++++++++++++++++++++
>>  security/Kconfig               |  1 +
>>  security/Makefile              |  2 +
>>  security/landlock/Kconfig      | 18 ++++++++
>>  security/landlock/Makefile     |  3 ++
>>  security/landlock/common.h     | 21 +++++++++
>>  security/landlock/init.c       | 98 ++++++++++++++++++++++++++++++++++++++++++
>>  tools/include/uapi/linux/bpf.h | 97 +++++++++++++++++++++++++++++++++++++++++
>>  9 files changed, 340 insertions(+)
>>  create mode 100644 security/landlock/Kconfig
>>  create mode 100644 security/landlock/Makefile
>>  create mode 100644 security/landlock/common.h
>>  create mode 100644 security/landlock/init.c
>>
>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
>> index 6f1a567667b8..8bac93970a47 100644
>> --- a/include/linux/bpf_types.h
>> +++ b/include/linux/bpf_types.h
>> @@ -18,6 +18,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe_prog_ops)
>>  BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint_prog_ops)
>>  BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event_prog_ops)
>>  #endif
>> +#ifdef CONFIG_SECURITY_LANDLOCK
>> +BPF_PROG_TYPE(BPF_PROG_TYPE_LANDLOCK_RULE, bpf_landlock_ops)
>> +#endif
>>  
>>  BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
>>  BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 8541ab85e432..20da634da941 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -129,6 +129,7 @@ enum bpf_prog_type {
>>  	BPF_PROG_TYPE_LWT_XMIT,
>>  	BPF_PROG_TYPE_SOCK_OPS,
>>  	BPF_PROG_TYPE_SK_SKB,
>> +	BPF_PROG_TYPE_LANDLOCK_RULE,
>>  };
>>  
>>  enum bpf_attach_type {
>> @@ -879,4 +880,100 @@ enum {
>>  #define TCP_BPF_IW		1001	/* Set TCP initial congestion window */
>>  #define TCP_BPF_SNDCWND_CLAMP	1002	/* Set sndcwnd_clamp */
>>  
>> +/**
>> + * enum landlock_subtype_event - event occurring when an action is performed on
>> + * a particular kernel object
>> + *
>> + * An event is a policy decision point which exposes the same context type
>> + * (especially the same arg[0-9] field types) for each rule execution.
>> + *
>> + * @LANDLOCK_SUBTYPE_EVENT_UNSPEC: invalid value
>> + * @LANDLOCK_SUBTYPE_EVENT_FS: generic filesystem event
>> + * @LANDLOCK_SUBTYPE_EVENT_FS_IOCTL: custom IOCTL sub-event
>> + * @LANDLOCK_SUBTYPE_EVENT_FS_LOCK: custom LOCK sub-event
>> + * @LANDLOCK_SUBTYPE_EVENT_FS_FCNTL: custom FCNTL sub-event
>> + */
>> +enum landlock_subtype_event {
>> +	LANDLOCK_SUBTYPE_EVENT_UNSPEC,
>> +	LANDLOCK_SUBTYPE_EVENT_FS,
>> +	LANDLOCK_SUBTYPE_EVENT_FS_IOCTL,
>> +	LANDLOCK_SUBTYPE_EVENT_FS_LOCK,
>> +	LANDLOCK_SUBTYPE_EVENT_FS_FCNTL,
>> +};
>> +#define _LANDLOCK_SUBTYPE_EVENT_LAST LANDLOCK_SUBTYPE_EVENT_FS_FCNTL
>> +
>> +/**
>> + * DOC: landlock_subtype_ability
>> + *
>> + * eBPF context and functions allowed for a rule
>> + *
>> + * - LANDLOCK_SUBTYPE_ABILITY_DEBUG: allows to do debug actions (e.g. writing
>> + *   logs), which may be dangerous and should only be used for rule testing
>> + */
>> +#define LANDLOCK_SUBTYPE_ABILITY_DEBUG		(1ULL << 0)
>> +#define _LANDLOCK_SUBTYPE_ABILITY_NB		1
>> +#define _LANDLOCK_SUBTYPE_ABILITY_MASK		((1ULL << _LANDLOCK_SUBTYPE_ABILITY_NB) - 1)
> 
> can you move the last two macros out of uapi?
> There is really no need for the implementation details to
> leak into uapi.

Right, I'll move them.

> 
>> +
>> +/*
>> + * Future options for a Landlock rule (e.g. run even if a previous rule denied
>> + * an action).
>> + */
>> +#define _LANDLOCK_SUBTYPE_OPTION_NB		0
>> +#define _LANDLOCK_SUBTYPE_OPTION_MASK		((1ULL << _LANDLOCK_SUBTYPE_OPTION_NB) - 1)
> 
> same here
> 
>> +
>> +/*
>> + * Status visible in the @status field of a context (e.g. already called in
>> + * this syscall session, with same args...).
>> + *
>> + * The @status field exposed to a rule shall depend on the rule version.
>> + */
>> +#define _LANDLOCK_SUBTYPE_STATUS_NB		0
>> +#define _LANDLOCK_SUBTYPE_STATUS_MASK		((1ULL << _LANDLOCK_SUBTYPE_STATUS_NB) - 1)
> 
> and here
> 
>> +
>> +/**
>> + * DOC: landlock_action_fs
>> + *
>> + * - %LANDLOCK_ACTION_FS_EXEC: execute a file or walk through a directory
>> + * - %LANDLOCK_ACTION_FS_WRITE: modify a file or a directory view (which
>> + *   include mount actions)
>> + * - %LANDLOCK_ACTION_FS_READ: read a file or a directory
>> + * - %LANDLOCK_ACTION_FS_NEW: create a file or a directory
>> + * - %LANDLOCK_ACTION_FS_GET: open or receive a file
>> + * - %LANDLOCK_ACTION_FS_REMOVE: unlink a file or remove a directory
>> + *
>> + * Each of the following actions are specific to syscall multiplexers. Each of
>> + * them trigger a dedicated Landlock event where their command can be read.
>> + *
>> + * - %LANDLOCK_ACTION_FS_IOCTL: ioctl command
>> + * - %LANDLOCK_ACTION_FS_LOCK: flock or fcntl lock command
>> + * - %LANDLOCK_ACTION_FS_FCNTL: fcntl command
>> + */
>> +#define LANDLOCK_ACTION_FS_EXEC			(1ULL << 0)
>> +#define LANDLOCK_ACTION_FS_WRITE		(1ULL << 1)
>> +#define LANDLOCK_ACTION_FS_READ			(1ULL << 2)
>> +#define LANDLOCK_ACTION_FS_NEW			(1ULL << 3)
>> +#define LANDLOCK_ACTION_FS_GET			(1ULL << 4)
>> +#define LANDLOCK_ACTION_FS_REMOVE		(1ULL << 5)
>> +#define LANDLOCK_ACTION_FS_IOCTL		(1ULL << 6)
>> +#define LANDLOCK_ACTION_FS_LOCK			(1ULL << 7)
>> +#define LANDLOCK_ACTION_FS_FCNTL		(1ULL << 8)
>> +#define _LANDLOCK_ACTION_FS_NB			9
>> +#define _LANDLOCK_ACTION_FS_MASK		((1ULL << _LANDLOCK_ACTION_FS_NB) - 1)
> 
> and here
> 
>> +
>> +
>> +/**
>> + * struct landlock_context - context accessible to a Landlock rule
>> + *
>> + * @status: bitfield for future use (LANDLOCK_SUBTYPE_STATUS_*)
>> + * @event: event type (&enum landlock_subtype_event)
>> + * @arg1: event's first optional argument
>> + * @arg2: event's second optional argument
>> + */
>> +struct landlock_context {
>> +	__u64 status;
>> +	__u64 event;
>> +	__u64 arg1;
>> +	__u64 arg2;
>> +};
> 
> looking at all the uapi additions.. probably worth moving them
> into separate .h like we did with bpf_perf_event.h
> How about include/uapi/linux/landlock.h ?
> It can include bpf.h first thing and then the rest of
> landlock related bits.
> so all, but BPF_PROG_TYPE_LANDLOCK_RULE will go there.

Sounds good.


> 
>> +++ b/security/landlock/Kconfig
>> @@ -0,0 +1,18 @@
>> +config SECURITY_LANDLOCK
>> +	bool "Landlock sandbox support"
>> +	depends on SECURITY
>> +	depends on BPF_SYSCALL
>> +	depends on SECCOMP_FILTER
>> +	default y
> 
> all new features need to start with default n

OK

> 
>> +static inline const struct bpf_func_proto *bpf_landlock_func_proto(
>> +		enum bpf_func_id func_id,
>> +		const union bpf_prog_subtype *prog_subtype)
>> +{
>> +	/* generic functions */
>> +	if (prog_subtype->landlock_rule.ability &
>> +			LANDLOCK_SUBTYPE_ABILITY_DEBUG) {
>> +		switch (func_id) {
>> +		case BPF_FUNC_get_current_comm:
>> +			return &bpf_get_current_comm_proto;
>> +		case BPF_FUNC_get_current_pid_tgid:
>> +			return &bpf_get_current_pid_tgid_proto;
>> +		case BPF_FUNC_get_current_uid_gid:
>> +			return &bpf_get_current_uid_gid_proto;
> 
> why current_*() helpers are 'debug' only?

This helpers should not be needed for access control. It sounds like a
really bad idea to rely on a command name, a PID or even the UID,
especially for a kind of hierarchy-based access control like Landlock. I
want to avoid people from using these features except for debug
purposes. This could be changed in the future if there is a legitimate
use case of some of these features, though.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v7 05/10] landlock: Add LSM hooks related to filesystem
From: Mickaël Salaün @ 2017-08-25  8:16 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Matthew Garrett,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf
In-Reply-To: <20170824025030.sxl2hkpcbzipb47y@ast-mbp>


[-- Attachment #1.1: Type: text/plain, Size: 2700 bytes --]


On 24/08/2017 04:50, Alexei Starovoitov wrote:
> On Mon, Aug 21, 2017 at 02:09:28AM +0200, Mickaël Salaün wrote:
>> Handle 33 filesystem-related LSM hooks for the Landlock filesystem
>> event: LANDLOCK_SUBTYPE_EVENT_FS.
>>
>> A Landlock event wrap LSM hooks for similar kernel object types (e.g.
>> struct file, struct path...). Multiple LSM hooks can trigger the same
>> Landlock event.
>>
>> Landlock handle nine coarse-grained actions: read, write, execute, new,
>> get, remove, ioctl, lock and fcntl. Each of them abstract LSM hook
>> access control in a way that can be extended in the future.
>>
>> The Landlock LSM hook registration is done after other LSM to only run
>> actions from user-space, via eBPF programs, if the access was granted by
>> major (privileged) LSMs.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> 
> ...
> 
>> +/* WRAP_ARG_SB */
>> +#define WRAP_ARG_SB_TYPE	WRAP_TYPE_FS
>> +#define WRAP_ARG_SB_DEC(arg)					\
>> +	EXPAND_C(WRAP_TYPE_FS) wrap_##arg =			\
>> +	{ .type = BPF_HANDLE_FS_TYPE_DENTRY, .dentry = arg->s_root };
>> +#define WRAP_ARG_SB_VAL(arg)	((uintptr_t)&wrap_##arg)
>> +#define WRAP_ARG_SB_OK(arg)	(arg && arg->s_root)
> ...
> 
>> +HOOK_NEW_FS(sb_remount, 2,
>> +	struct super_block *, sb,
>> +	void *, data,
>> +	WRAP_ARG_SB, sb,
>> +	WRAP_ARG_RAW, LANDLOCK_ACTION_FS_WRITE
>> +);
> 
> this looks wrong. casting super_block to dentry?

This is called when remounting a block device. The WRAP_ARG_SB take the
sb->s_root as a dentry, it is not a cast. What do you expect from this hook?

> 
>> +/* a directory inode contains only one dentry */
>> +HOOK_NEW_FS(inode_create, 3,
>> +	struct inode *, dir,
>> +	struct dentry *, dentry,
>> +	umode_t, mode,
>> +	WRAP_ARG_INODE, dir,
>> +	WRAP_ARG_RAW, LANDLOCK_ACTION_FS_WRITE
>> +);
> 
> more general question: why you're not wrapping all useful
> arguments? Like in the above dentry can be acted upon
> by the landlock rule and it's readily available...

The context used for the FS event must have the exact same types for all
calls. This event is meant to be generic but we can add more specific
ones if needed, like I do with FS_IOCTL.

The idea is to enable people to write simple rules, while being able to
write fine grain rules for special cases (e.g. IOCTL) if needed.

> 
> The limitation of only 2 args looks odd.
> Is it a hard limitation ? how hard to extend?

It's not a hard limit at all. Actually, the FS_FNCTL event should have
three arguments (I'll add them in the next series): FS handle, FCNTL
command and FCNTL argument. I made sure that it's really easy to add
more arguments to the context of an event.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v7 08/10] bpf: Add a Landlock sandbox example
From: Mickaël Salaün @ 2017-08-25  8:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Matthew Garrett,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf
In-Reply-To: <20170824025901.cpppy4nn5xv2ao24@ast-mbp>


[-- Attachment #1.1: Type: text/plain, Size: 3333 bytes --]



On 24/08/2017 04:59, Alexei Starovoitov wrote:
> On Mon, Aug 21, 2017 at 02:09:31AM +0200, Mickaël Salaün wrote:
>> Add a basic sandbox tool to create a process isolated from some part of
>> the system. This sandbox create a read-only environment. It is only
>> allowed to write to a character device such as a TTY:
>>
>>   # :> X
>>   # echo $?
>>   0
>>   # ./samples/bpf/landlock1 /bin/sh -i
>>   Launching a new sandboxed process.
>>   # :> Y
>>   cannot create Y: Operation not permitted
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> 
> ...
> 
>> +SEC("landlock1")
>> +static int landlock_fs_prog1(struct landlock_context *ctx)
>> +{
>> +	char fmt_error_mode[] = "landlock1: error: get_mode:%lld\n";
>> +	char fmt_error_access[] = "landlock1: error: access denied\n";
>> +	long long ret;
>> +
>> +	/*
>> +	 * The argument ctx->arg2 contains bitflags of actions for which the
>> +	 * rule is run.  The flag LANDLOCK_ACTION_FS_WRITE means that a write
>> +	 * is requested by one of the userspace processes restricted by this
>> +	 * rule. The following test allows any actions which does not include a
>> +	 * write.
>> +	 */
>> +	if (!(ctx->arg2 & LANDLOCK_ACTION_FS_WRITE))
>> +		return 0;
>> +
>> +	/*
>> +	 * The argument ctx->arg1 is a file handle for which the process want
>> +	 * to access. The function bpf_handle_fs_get_mode() return the mode of
>> +	 * a file (e.g. S_IFBLK, S_IFDIR, S_IFREG...). If there is an error,
>> +	 * for example if the argument is not a file handle, then an
>> +	 * -errno value is returned. Otherwise the caller get the file mode as
>> +	 *  with stat(2).
>> +	 */
>> +	ret = bpf_handle_fs_get_mode((void *)ctx->arg1);
>> +	if (ret < 0) {
>> +
>> +		/*
>> +		 * The bpf_trace_printk() function enable to write in the
>> +		 * kernel eBPF debug log, accessible through
>> +		 * /sys/kernel/debug/tracing/trace_pipe . To be allowed to call
>> +		 * this function, a Landlock rule must have the
>> +		 * LANDLOCK_SUBTYPE_ABILITY_DEBUG ability, which is only
>> +		 * allowed for CAP_SYS_ADMIN.
>> +		 */
>> +		bpf_trace_printk(fmt_error_mode, sizeof(fmt_error_mode), ret);
>> +		return 1;
>> +	}
>> +
>> +	/*
>> +	 * This check allows the action on the file if it is a directory or a
>> +	 * pipe. Otherwise, a message is printed to the eBPF log.
>> +	 */
>> +	if (S_ISCHR(ret) || S_ISFIFO(ret))
>> +		return 0;
>> +	bpf_trace_printk(fmt_error_access, sizeof(fmt_error_access));
>> +	return 1;
>> +}
>> +
>> +/*
>> + * This subtype enable to set the ABI, which ensure that the eBPF context and
>> + * program behavior will be compatible with this Landlock rule.
>> + */
>> +SEC("subtype")
>> +static const union bpf_prog_subtype _subtype = {
>> +	.landlock_rule = {
>> +		.abi = 1,
>> +		.event = LANDLOCK_SUBTYPE_EVENT_FS,
>> +		.ability = LANDLOCK_SUBTYPE_ABILITY_DEBUG,
>> +	}
>> +};
> 
> from rule writer perspective can you somehow merge subtype definition
> with the program? It seems they go hand in hand.
> Like section name of the program can be:
> SEC("landlock_rule1/event=fs/ability=debug")
> static int landlock_fs_prog1(struct landlock_context *ctx)...
> and the loader can parse this string and prepare appropriate
> data structures for the kernel.

Right, I'll try that.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [EXT] Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
From: Russell King - ARM Linux @ 2017-08-25  8:19 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: Stefan Chulski, Andrew Lunn, davem@davemloft.net, kishon@ti.com,
	jason@lakedaemon.net, sebastian.hesselbarth@gmail.com,
	gregory.clement@free-electrons.com,
	thomas.petazzoni@free-electrons.com, Nadav Haklai,
	linux-kernel@vger.kernel.org, mw@semihalf.com,
	miquel.raynal@free-electrons.com, netdev@vger.kernel.org
In-Reply-To: <20170824171418.GI28443@kwain>

On Thu, Aug 24, 2017 at 07:14:18PM +0200, Antoine Tenart wrote:
> On Thu, Aug 24, 2017 at 05:08:29PM +0000, Stefan Chulski wrote:
> > > > Imagine phylib is using the copper Ethernet PHY, but the MAC is using
> > > > the SFP port. Somebody pulls out the copper cable, phylib says the
> > > > link is down, turns the carrier off and calls the callback. Not good,
> > > > since your SFP cable is still plugged in...  Ethtool is
> > > > returning/setting stuff in the Copper Ethernet PHY, when in fact you
> > > > intend to be setting SFP settings.
> > > 
> > > I see what could be the issue but I do not understand one aspect though:
> > > how could we switch from one PHY to another, as there's only one output
> > > between the SoC (and so a given GoP#) and the board. So if a given PHY can
> > > handle multiple modes I see, but in the other case a muxing somewhere would
> > > be needed? Or did I miss something?
> > 
> > I think PHY name and PHY mode struct that describe here both MAC to
> > PHY and PHY to PHY connection create confusion...  Serdes IP lane
> > doesn't care if connector is SFP, RJ45 or direct attached cable.
> > mvpp22_comphy_init only configures MAC to PHY
> > connection. SFI for 10G(KR in mainline), SGMII for 1G and HS_SGMII for
> > 2.5G.
> 
> So maybe one confusion was to name them PHY_MODE_10GKR and
> PHY_MODE_SGMII. It could be PHY_MODE_10G and PHY_MODE_1G instead.

SGMII mode supports 100M and 10M as well using data repetition, so 1G
makes it look like those speeds are not supported.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply

* [PATCH] hinic: skb_pad() frees on error
From: Dan Carpenter @ 2017-08-25  8:24 UTC (permalink / raw)
  To: Aviad Krawczyk; +Cc: netdev, kernel-janitors

The skb_pad() function frees the skb on error, so this code has a double
free.

Fixes: 00e57a6d4ad3 ("net-next/hinic: Add Tx operation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_tx.c b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
index 5bf6a32faa46..abe3e38cd342 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
@@ -192,7 +192,7 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	if (skb->len < MIN_SKB_LEN) {
 		if (skb_pad(skb, MIN_SKB_LEN - skb->len)) {
 			netdev_err(netdev, "Failed to pad skb\n");
-			goto skb_error;
+			goto update_error_stats;
 		}
 
 		skb->len = MIN_SKB_LEN;
@@ -237,6 +237,7 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 skb_error:
 	dev_kfree_skb_any(skb);
 
+update_error_stats:
 	u64_stats_update_begin(&txq->txq_stats.syncp);
 	txq->txq_stats.tx_dropped++;
 	u64_stats_update_end(&txq->txq_stats.syncp);

^ permalink raw reply related

* Re: [Patch net-next v2 1/4] net_sched: get rid of more forward declarations
From: Jiri Pirko @ 2017-08-25  8:45 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, jhs
In-Reply-To: <20170824235130.28503-2-xiyou.wangcong@gmail.com>

Fri, Aug 25, 2017 at 01:51:27AM CEST, xiyou.wangcong@gmail.com wrote:
>This is not needed if we move them up properly.
>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [Patch net-next v2 2/4] net_sched: introduce tclass_del_notify()
From: Jiri Pirko @ 2017-08-25  8:54 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, jhs
In-Reply-To: <20170824235130.28503-3-xiyou.wangcong@gmail.com>

Fri, Aug 25, 2017 at 01:51:28AM CEST, xiyou.wangcong@gmail.com wrote:
>Like for TC actions, ->delete() is a special case,
>we have to prepare and fill the notification before delete
>otherwise would get use-after-free after we remove the
>reference count.
>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>---

[...]


>+static int tclass_del_notify(struct net *net,
>+			     const struct Qdisc_class_ops *cops,
>+			     struct sk_buff *oskb, struct nlmsghdr *n,
>+			     struct Qdisc *q, unsigned long cl)
>+{
>+	u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
>+	struct sk_buff *skb;
>+	int err = 0;
>+
>+	if (!cops->delete)
>+		return -EOPNOTSUPP;
>+
>+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
>+	if (!skb)
>+		return -ENOBUFS;
>+
>+	if (tc_fill_tclass(skb, q, cl, portid, n->nlmsg_seq, 0,
>+			   RTM_DELTCLASS) < 0) {
>+		kfree_skb(skb);
>+		return -EINVAL;
>+	}
>+
>+	err = cops->delete(q, cl);
>+	if (err) {
>+		kfree_skb(skb);
>+		return err;
>+	}
>+
>+	return rtnetlink_send(skb, net, portid, RTNLGRP_TC,
>+			      n->nlmsg_flags & NLM_F_ECHO);

There is a lot of code duplication with tclass_notify function. Don't
you rather want to split tclass_notify into tclass_notify_prepare and
tclass_notify_send and use these 2 from both tclass_notify and
tclass_del_notify?

^ permalink raw reply

* Re: [Patch net-next v2 3/4] net_sched: remove tc class reference counting
From: Jiri Pirko @ 2017-08-25  9:00 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, jhs, fw
In-Reply-To: <20170824235130.28503-4-xiyou.wangcong@gmail.com>

Fri, Aug 25, 2017 at 01:51:29AM CEST, xiyou.wangcong@gmail.com wrote:
>For TC classes, their ->get() and ->put() are always paired, and the
>reference counting is completely useless, because:
>
>1) For class modification and dumping paths, we already hold RTNL lock,
>   so all of these ->get(),->change(),->put() are atomic.

There is ongoing initiative by Florian to avoid taking RTNL for some
rtnetlink calls. I think that for dumping it could be done in tc as well.
Don't we need the refcnt then?


>
>2) For filter bindiing/unbinding, we use other reference counter than
>   this one, and they should have RTNL lock too.
>
>3) For ->qlen_notify(), it is special because it is called on ->enqueue()
>   path, but we already hold qdisc tree lock there, and we hold this
>   tree lock when graft or delete the class too, so it should not be gone
>   or changed until we release the tree lock.
>
>Therefore, this patch removes ->get() and ->put(), but:
>
>1) Adds a new ->find() to find the pointer to a class by classid, no
>   refcnt.
>
>2) Move the original class destroy upon the last refcnt into ->delete(),
>   right after releasing tree lock. This is fine because the class is
>   already removed from hash when holding the lock.
>
>For those who also use ->put() as ->unbind(), just rename them to reflect
>this change.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox