* [PATCH net-next v1 2/5] if_addr: add IFA_IF_NETNSID
From: Christian Brauner @ 2018-09-03 4:37 UTC
To: netdev, linux-kernel
Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
fw, ktkhai, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel,
Christian Brauner
In-Reply-To: <20180903043717.20136-1-christian@brauner.io>
This adds a new IFA_IF_NETNSID property to be used by address families such
as PF_INET and PF_INET6.
The IFA_IF_NETNSID property can be used to send a network namespace
identifier as part of a request. If an IFA_IF_NETNSID property is present,
it will be used to retrieve the target network namespace in which the
request is to be made.
Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
v0->v1:
- unchanged
Note, I did not change the property name to IFA_TARGET_NSID as there
was no clear agreement what would be preferred. My personal preference
is to keep the IFA_IF_NETNSID name because it aligns naturally with
the IFLA_IF_NETNSID property for RTM_*LINK requests. Jiri seems to
prefer this name too.
However, if there is agreement that another property name makes more
sense I'm happy to send a v2 that changes this.
---
include/uapi/linux/if_addr.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/if_addr.h b/include/uapi/linux/if_addr.h
index ebaf5701c9db..0e0cd588cac0 100644
--- a/include/uapi/linux/if_addr.h
+++ b/include/uapi/linux/if_addr.h
@@ -34,6 +34,7 @@ enum {
IFA_MULTICAST,
IFA_FLAGS,
IFA_RT_PRIORITY, /* u32, priority/metric for prefix route */
+ IFA_IF_NETNSID,
__IFA_MAX,
};
--
2.17.1
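
For illustration only (not part of the patch): a minimal userspace sketch of
how an RTM_GETADDR dump request could carry the new property. It assumes the
attribute holds a signed 32-bit id, as IFLA_IF_NETNSID does, and that the
attribute takes the enum slot right after IFA_RT_PRIORITY as in the hunk
above. SKETCH_IFA_IF_NETNSID, struct getaddr_req and build_getaddr_req() are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

/* Assumption: the slot this patch gives the new attribute (the enum entry
 * right after IFA_RT_PRIORITY). Not taken from any installed header. */
#define SKETCH_IFA_IF_NETNSID (IFA_RT_PRIORITY + 1)

/* Netlink header, address message header, then room for one s32 attribute. */
struct getaddr_req {
	struct nlmsghdr nh;
	struct ifaddrmsg ifa;
	char attrbuf[RTA_SPACE(sizeof(int32_t))];
};

/* Fill in an RTM_GETADDR dump request that names a target netns id.
 * Returns the total message length to pass to send(). */
static size_t build_getaddr_req(struct getaddr_req *req,
				unsigned char family, int32_t netnsid)
{
	struct rtattr *rta;

	assert(req != NULL);
	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
	req->nh.nlmsg_type = RTM_GETADDR;
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->ifa.ifa_family = family;

	/* Append the attribute after the aligned fixed-size part. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nh.nlmsg_len));
	rta->rta_type = SKETCH_IFA_IF_NETNSID;
	rta->rta_len = RTA_LENGTH(sizeof(netnsid));
	memcpy(RTA_DATA(rta), &netnsid, sizeof(netnsid));

	req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) +
			    RTA_ALIGN(rta->rta_len);
	return req->nh.nlmsg_len;
}
```

A caller would send the returned number of bytes on an AF_NETLINK /
NETLINK_ROUTE socket; whether the kernel honours the attribute depends on the
ipv4/ipv6 patches later in this series.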
* [PATCH net-next v1 1/5] rtnetlink: add rtnl_get_net_ns_capable()
From: Christian Brauner @ 2018-09-03 4:37 UTC
To: netdev, linux-kernel
Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
fw, ktkhai, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel,
Christian Brauner
In-Reply-To: <20180903043717.20136-1-christian@brauner.io>
get_target_net() will be used in follow-up patches in ipv{4,6} codepaths to
retrieve network namespaces based on network namespace identifiers. So
remove the static qualifier and declare it in the rtnetlink header. Also,
rename it to rtnl_get_net_ns_capable() to make it obvious what this
function is doing.
Signed-off-by: Christian Brauner <christian@brauner.io>
---
v0->v1:
- export rtnl_get_net_ns_capable().
Kbuild reported a build failure when ipv6 is built as a module. This was
caused by rtnl_get_net_ns_capable() not being exported. Fix this by
exporting it.
---
include/net/rtnetlink.h | 1 +
net/core/rtnetlink.c | 17 +++++++++++++----
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 0bbaa5488423..cf26e5aacac4 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -165,6 +165,7 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm);
int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len,
struct netlink_ext_ack *exterr);
+struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid);
#define MODULE_ALIAS_RTNL_LINK(kind) MODULE_ALIAS("rtnl-link-" kind)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 24431e578310..30645d9a9801 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1841,7 +1841,15 @@ static bool link_dump_filtered(struct net_device *dev,
return false;
}
-static struct net *get_target_net(struct sock *sk, int netnsid)
+/**
+ * rtnl_get_net_ns_capable - Get netns if sufficiently privileged.
+ * @sk: netlink socket
+ * @netnsid: network namespace identifier
+ *
+ * Returns the network namespace identified by netnsid on success or an error
+ * pointer on failure.
+ */
+struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid)
{
struct net *net;
@@ -1858,6 +1866,7 @@ static struct net *get_target_net(struct sock *sk, int netnsid)
}
return net;
}
+EXPORT_SYMBOL_GPL(rtnl_get_net_ns_capable);
static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
{
@@ -1893,7 +1902,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
ifla_policy, NULL) >= 0) {
if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
- tgt_net = get_target_net(skb->sk, netnsid);
+ tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
if (IS_ERR(tgt_net)) {
tgt_net = net;
netnsid = -1;
@@ -2761,7 +2770,7 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
- tgt_net = get_target_net(NETLINK_CB(skb).sk, netnsid);
+ tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);
if (IS_ERR(tgt_net))
return PTR_ERR(tgt_net);
}
@@ -3171,7 +3180,7 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
- tgt_net = get_target_net(NETLINK_CB(skb).sk, netnsid);
+ tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);
if (IS_ERR(tgt_net))
return PTR_ERR(tgt_net);
}
--
2.17.1
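
For illustration only (not part of the patch): the callers above handle the
returned error pointer in two different ways. The dump path falls back to the
caller's own namespace, while the dellink/getlink paths propagate the error.
A minimal userspace sketch of that convention follows; every sketch_* name,
the lookup rule and the -EINVAL value are assumptions made for this example
and only imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(). */
static inline void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-SKETCH_MAX_ERRNO);
}

static inline long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct sketch_net { int id; };

static struct sketch_net sketch_known = { .id = 1 };

/* Hypothetical lookup: only id 1 resolves, anything else is an error. */
static struct sketch_net *sketch_get_net(int netnsid)
{
	if (netnsid == sketch_known.id)
		return &sketch_known;
	return sketch_err_ptr(-EINVAL);
}

/* Dump-style caller: on error, fall back to the current namespace. */
static struct sketch_net *dump_target(struct sketch_net *cur, int *netnsid)
{
	struct sketch_net *tgt;

	assert(cur != NULL && netnsid != NULL);
	tgt = sketch_get_net(*netnsid);
	if (sketch_is_err(tgt)) {
		tgt = cur;
		*netnsid = -1;
	}
	return tgt;
}

/* Request-style caller: hand the error code back to the requester. */
static long doit_target(int netnsid, struct sketch_net **out)
{
	struct sketch_net *tgt = sketch_get_net(netnsid);

	if (sketch_is_err(tgt))
		return sketch_ptr_err(tgt);
	*out = tgt;
	return 0;
}
```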
* [PATCH net-next v1 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Christian Brauner @ 2018-09-03 4:37 UTC
To: netdev, linux-kernel
Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
fw, ktkhai, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel,
Christian Brauner
Hey,
# v1 introduction:
The only functional change is the export of rtnl_get_net_ns_capable()
which is needed in case ipv6 is built as a module.
Note, I did not change the property name to IFA_TARGET_NSID as there was
no clear agreement what would be preferred. My personal preference is to
keep the IFA_IF_NETNSID name because it aligns naturally with the
IFLA_IF_NETNSID property for RTM_*LINK requests. Jiri seems to prefer
this name too.
However, if there is agreement that another property name makes more
sense I'm happy to send a v2 that changes this.
## Performance:
To test this patchset I performed 1 million getifaddrs() requests
against a network namespace containing 5 interfaces (lo, eth{0-4}). The
first test used a network namespace aware getifaddrs() implementation I
wrote and the second test used the traditional setns() + getifaddrs()
method. The results show that this patchset allows userspace to cut
retrieval time in half:
1. netns_getifaddrs(): 82 microseconds
2. setns() + getifaddrs(): 162 microseconds
# v0 introduction:
A while back we introduced and enabled IFLA_IF_NETNSID in
RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
to significant performance increases since it allows userspace to avoid
taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
interfaces from the netns associated with the netns_fd. Especially when a
lot of network namespaces are in use, using setns() becomes increasingly
problematic when performance matters.
Usually, RTM_GETLINK requests are followed by RTM_GETADDR requests (cf.
getifaddrs() style functions and friends). But currently, RTM_GETADDR
requests do not support a property similar to the IFLA_IF_NETNSID
property for RTM_*LINK requests.
This is problematic since userspace can retrieve interfaces from another
network namespace by sending an IFLA_IF_NETNSID property along with an
RTM_GETLINK request but is still forced to use the legacy setns() style of
retrieving interfaces in RTM_GETADDR requests.
The goal of this series is to make it possible to perform RTM_GETADDR
requests on different network namespaces. To this end a new IFA_IF_NETNSID
property for RTM_*ADDR requests is introduced. It can be used to send a
network namespace identifier along in RTM_*ADDR requests. The network
namespace identifier will be used to retrieve the target network namespace
in which the request is supposed to be fulfilled. This aligns the behavior
of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
## Security:
- The caller must have assigned a valid network namespace identifier for
the target network namespace.
- The caller must have CAP_NET_ADMIN in the owning user namespace of the
target network namespace.
Thanks!
Christian
[1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
[2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
[3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
[4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
[5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
Christian Brauner (5):
rtnetlink: add rtnl_get_net_ns_capable()
if_addr: add IFA_IF_NETNSID
ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
rtnetlink: move type calculation out of loop
include/net/rtnetlink.h | 1 +
include/uapi/linux/if_addr.h | 1 +
net/core/rtnetlink.c | 19 +++++++---
net/ipv4/devinet.c | 38 +++++++++++++++-----
net/ipv6/addrconf.c | 70 ++++++++++++++++++++++++++++--------
5 files changed, 101 insertions(+), 28 deletions(-)
--
2.17.1
* Re: [PATCH v4] 9p: Add refcount to p9_req_t
From: Dominique Martinet @ 2018-09-03 4:36 UTC
To: Tomas Bortoli
Cc: Eric Van Hensbergen, Latchesar Ionkov, v9fs-developer, netdev,
linux-kernel, syzkaller, Dominique Martinet
In-Reply-To: <96b44210-3c4d-b5c9-0806-ad4b53fe911f@gmail.com>
Tomas Bortoli wrote on Fri, Aug 31, 2018:
> On 08/30/2018 12:52 PM, Dominique Martinet wrote:
> > From: Tomas Bortoli <tomasbortoli@gmail.com>
> >
> > To avoid use-after-free(s), use a refcount to keep track of the
> > usable references to any instantiated struct p9_req_t.
> >
> > This commit adds p9_req_put(), p9_req_get() and p9_req_try_get() as
> > wrappers to kref_put(), kref_get() and kref_get_unless_zero().
> > These are used by the client and the transports to keep track of
> > valid requests' references.
> >
> > p9_free_req() is added back and used as callback by kref_put().
> >
> > Add SLAB_TYPESAFE_BY_RCU as it ensures that the memory freed by
> > kmem_cache_free() will not be reused for another type until the rcu
> > synchronisation period is over, so an address gotten under rcu read
> > lock is safe to inc_ref() without corrupting random memory while
> > the lock is held.
> >
> > Co-developed-by: Dominique Martinet <dominique.martinet@cea.fr>
> > Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
> > Reported-by: syzbot+467050c1ce275af2a5b8@syzkaller.appspotmail.com
> > Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
> > ---
> > v3:
> > - add req put if virtio zc request fails
> > - add req put if cancelled callback is not defined for virtio
> > - (incorrectly) add req put in rdma cancelled callback
> >
> > v4:
> > - removed rdma's cancelled callback put again
> > - changed the else if no cancelled callback into actually giving virtio
> > a callback, xen does not need to call put in that case either because
> > both functions rely on tag_lookup to find the request. trans_fd only
> > needs to put in cancelled because it also keeps the req in a list around
> > for cancel.
> > - add req put for trans xen's request(), I'm not sure why that one was
> > missing either..
> >
> > And with that I believe I am done testing all four transports.
> > I'll do a second round of tests next week just to make sure, but it
> > should be good enough™
> > Sorry for the multiple iterations.
>
> LGTM, thanks Dominique!
Thanks.
I've pushed this with the other patches to my '9p-next' branch, which
will get merged to linux-next today/tomorrow, so they can soak up some
syzbot testing as well.
That doesn't mean they cannot get reviews anymore, so don't be shy!
Tomas, I didn't see you reply about the 'rename req to rreq' requested
patch for trans_fd, but it's trivial so if you're not going to do it I
will submit something around next week.
--
Dominique
* Re: [PATCH v7 1/4] gpiolib: Pass bitmaps, not integer arrays, to get/set array
From: Matthew Wilcox @ 2018-09-03 4:31 UTC
To: Janusz Krzysztofik
Cc: Linus Walleij, Jonathan Corbet, Miguel Ojeda Sandonis,
Peter Korsgaard, Peter Rosin, Ulf Hansson, Andrew Lunn,
Florian Fainelli, David S. Miller, Dominik Brodowski,
Greg Kroah-Hartman, Kishon Vijay Abraham I, Lars-Peter Clausen,
Michael Hennerich, Jonathan Cameron, Hartmut Knaack,
Peter Meerwald-Stadler, Jiri Slaby, Willy Tarreau
In-Reply-To: <20180902120144.6855-2-jmkrzyszt@gmail.com>
> +++ b/drivers/auxdisplay/hd44780.c
> @@ -62,17 +62,12 @@ static void hd44780_strobe_gpio(struct hd44780 *hd)
> /* write to an LCD panel register in 8 bit GPIO mode */
> static void hd44780_write_gpio8(struct hd44780 *hd, u8 val, unsigned int rs)
> {
> - int values[10]; /* for DATA[0-7], RS, RW */
> - unsigned int i, n;
> -
> - for (i = 0; i < 8; i++)
> - values[PIN_DATA0 + i] = !!(val & BIT(i));
> - values[PIN_CTRL_RS] = rs;
> - n = 9;
> - if (hd->pins[PIN_CTRL_RW]) {
> - values[PIN_CTRL_RW] = 0;
> - n++;
> - }
> + DECLARE_BITMAP(values, 10); /* for DATA[0-7], RS, RW */
> + unsigned int n;
> +
> + *values = val;
> + __assign_bit(8, values, rs);
> + n = hd->pins[PIN_CTRL_RW] ? 10 : 9;
Doesn't this assume little endian bitmaps? Has anyone tested this on
big-endian machines?
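
For illustration only: one way to reason about the question is to model the
bitmap as an array of unsigned long in which bit n lives in word
n / BITS_PER_LONG under the mask 1UL << (n % BITS_PER_LONG). Under that
assumption, storing val into the first word is a value assignment rather than
a byte copy, so the bit positions do not depend on byte order. The sketch_*
helpers below are hypothetical userspace stand-ins, and the sketch does not
settle whether the driver change is correct in every configuration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Read bit nr from a bitmap stored as an array of unsigned long. */
static int sketch_test_bit(unsigned int nr, const unsigned long *map)
{
	return (int)((map[nr / SKETCH_BITS_PER_LONG] >>
		      (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}

/* Set or clear bit nr, non-atomically. */
static void sketch_assign_bit(unsigned int nr, unsigned long *map, int value)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);

	if (value)
		map[nr / SKETCH_BITS_PER_LONG] |= mask;
	else
		map[nr / SKETCH_BITS_PER_LONG] &= ~mask;
}

/* Mirrors the quoted hunk: DATA[0-7] come from val, bit 8 from rs. */
static void sketch_pack(unsigned long *values, unsigned char val, int rs)
{
	assert(values != NULL);
	*values = val;			/* value store into word 0 */
	sketch_assign_bit(8, values, rs);
}
```

Byte order would matter if the value were copied into the bitmap's storage
byte by byte (for example with memcpy() into the first byte) instead of being
assigned to the first word.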
* Re: [PATCH net-next 0/4] mlx5e IPoIB stats
From: David Miller @ 2018-09-02 23:23 UTC
To: tariqt; +Cc: netdev, eranbe, saeedm, ferasda
In-Reply-To: <1535915530-2874-1-git-send-email-tariqt@mellanox.com>
From: Tariq Toukan <tariqt@mellanox.com>
Date: Sun, 2 Sep 2018 22:12:06 +0300
> I am temporarily covering Saeed with the mlx5 submissions.
Ok, thanks for letting me know.
> This patchset by Feras contains statistics enhancements and NDO
> implementation for the mlx5e IPoIB driver.
>
> Series generated against net-next commit:
> 2d5c28859839 net: bgmac: remove set but not used variable 'err'
Series applied, thanks Tariq.
* [PATCH ipsec-next 1/2] xfrm: reset transport header back to network header after all input transforms have been applied
From: Sowmini Varadhan @ 2018-09-02 23:18 UTC
To: netdev, steffen.klassert; +Cc: davem, sowmini.varadhan
In-Reply-To: <cover.1535712205.git.sowmini.varadhan@oracle.com>
A policy may have been set up with multiple transforms (e.g., ESP
and ipcomp). In this situation, the ingress IPsec processing
iterates in xfrm_input() and applies each transform in turn,
processing the nexthdr to find any additional xfrm that may apply.
This patch resets the transport header back to network header
only after the last transformation so that subsequent xfrms
can find the correct transport header.
Suggested-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/ipv4/xfrm4_input.c | 1 +
net/ipv4/xfrm4_mode_transport.c | 4 +---
net/ipv6/xfrm6_input.c | 1 +
net/ipv6/xfrm6_mode_transport.c | 4 +---
4 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index bcfc00e..f8de248 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -67,6 +67,7 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
if (xo && (xo->flags & XFRM_GRO)) {
skb_mac_header_rebuild(skb);
+ skb_reset_transport_header(skb);
return 0;
}
diff --git a/net/ipv4/xfrm4_mode_transport.c b/net/ipv4/xfrm4_mode_transport.c
index 3d36644..1ad2c2c 100644
--- a/net/ipv4/xfrm4_mode_transport.c
+++ b/net/ipv4/xfrm4_mode_transport.c
@@ -46,7 +46,6 @@ static int xfrm4_transport_output(struct xfrm_state *x, struct sk_buff *skb)
static int xfrm4_transport_input(struct xfrm_state *x, struct sk_buff *skb)
{
int ihl = skb->data - skb_transport_header(skb);
- struct xfrm_offload *xo = xfrm_offload(skb);
if (skb->transport_header != skb->network_header) {
memmove(skb_transport_header(skb),
@@ -54,8 +53,7 @@ static int xfrm4_transport_input(struct xfrm_state *x, struct sk_buff *skb)
skb->network_header = skb->transport_header;
}
ip_hdr(skb)->tot_len = htons(skb->len + ihl);
- if (!xo || !(xo->flags & XFRM_GRO))
- skb_reset_transport_header(skb);
+ skb_reset_transport_header(skb);
return 0;
}
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 841f4a0..9ef490d 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -59,6 +59,7 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
if (xo && (xo->flags & XFRM_GRO)) {
skb_mac_header_rebuild(skb);
+ skb_reset_transport_header(skb);
return -1;
}
diff --git a/net/ipv6/xfrm6_mode_transport.c b/net/ipv6/xfrm6_mode_transport.c
index 9ad07a9..3c29da5 100644
--- a/net/ipv6/xfrm6_mode_transport.c
+++ b/net/ipv6/xfrm6_mode_transport.c
@@ -51,7 +51,6 @@ static int xfrm6_transport_output(struct xfrm_state *x, struct sk_buff *skb)
static int xfrm6_transport_input(struct xfrm_state *x, struct sk_buff *skb)
{
int ihl = skb->data - skb_transport_header(skb);
- struct xfrm_offload *xo = xfrm_offload(skb);
if (skb->transport_header != skb->network_header) {
memmove(skb_transport_header(skb),
@@ -60,8 +59,7 @@ static int xfrm6_transport_input(struct xfrm_state *x, struct sk_buff *skb)
}
ipv6_hdr(skb)->payload_len = htons(skb->len + ihl -
sizeof(struct ipv6hdr));
- if (!xo || !(xo->flags & XFRM_GRO))
- skb_reset_transport_header(skb);
+ skb_reset_transport_header(skb);
return 0;
}
--
1.7.1
* [PATCH ipsec-next 2/2] xfrm: reset crypto_done when iterating over multiple input xfrms
From: Sowmini Varadhan @ 2018-09-02 23:18 UTC
To: netdev, steffen.klassert; +Cc: davem, sowmini.varadhan
In-Reply-To: <cover.1535712205.git.sowmini.varadhan@oracle.com>
We only support one offloaded xfrm (we do not have devices that
can handle more than one offload), so reset crypto_done when
iterating over multiple transforms in xfrm_input(), so that we
can invoke the appropriate x->type->input for the non-offloaded
transforms.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/xfrm/xfrm_input.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index b89c9c7..be3520e 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -458,6 +458,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
goto drop;
}
+ crypto_done = false;
} while (!err);
err = xfrm_rcv_cb(skb, family, x->type->proto, 0);
--
1.7.1
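
For illustration only (not part of the patch): a toy model of the loop shows
why the flag has to be cleared after each pass. It assumes that a set
crypto_done flag makes the loop take an offload-completion path instead of
calling the software input handler. All sketch_* names and the counters are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-transform counters so the chosen path can be observed. */
struct sketch_xfrm {
	int sw_input_calls;	/* stands in for x->type->input() */
	int offload_tail_calls;	/* assumed offload-completion path */
};

/* Walk n transforms. Only the first one may have been offloaded. */
static void sketch_input_loop(struct sketch_xfrm *x, size_t n,
			      bool first_offloaded, bool reset_crypto_done)
{
	bool crypto_done = first_offloaded;
	size_t i;

	assert(x != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (crypto_done)
			x[i].offload_tail_calls++;
		else
			x[i].sw_input_calls++;

		/* The one-line fix: later transforms were not offloaded. */
		if (reset_crypto_done)
			crypto_done = false;
	}
}
```

With the reset, the second transform goes through the software handler.
Without it, the stale flag sends the second transform down the offload path
even though no device processed it.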
* [PATCH ipsec-next 0/2] xfrm: bug fixes when processing multiple transforms
From: Sowmini Varadhan @ 2018-09-02 23:18 UTC
To: netdev, steffen.klassert; +Cc: davem, sowmini.varadhan
This series contains bug fixes that were encountered when I set
up a libreswan tunnel using the config below, which will set up
an IPsec policy involving 2 tmpls.
type=transport
compress=yes
esp=aes_gcm_c-128-null # offloaded to Niantic
auto=start
The non-offload test case uses esp=aes_gcm_c-256-null.
Each patch has a technical description of the contents of the fix.
Sowmini Varadhan (2):
xfrm: reset transport header back to network header after all input
transforms have been applied
xfrm: reset crypto_done when iterating over multiple input xfrms
net/ipv4/xfrm4_input.c | 1 +
net/ipv4/xfrm4_mode_transport.c | 4 +---
net/ipv6/xfrm6_input.c | 1 +
net/ipv6/xfrm6_mode_transport.c | 4 +---
net/xfrm/xfrm_input.c | 1 +
5 files changed, 5 insertions(+), 6 deletions(-)
* Re: [PATCH net-next 0/2] Full phylink support for mv88e6352
From: David Miller @ 2018-09-02 23:17 UTC
To: andrew; +Cc: f.fainelli, vivien.didelot, cphealy, netdev
In-Reply-To: <1535904795-17405-1-git-send-email-andrew@lunn.ch>
From: Andrew Lunn <andrew@lunn.ch>
Date: Sun, 2 Sep 2018 18:13:13 +0200
> These two patches implement full phylink support for the mv88e6352
> family, when using an SFP connected to its SERDES interface. This adds
> interrupt support to the SERDES, so that we get interrupts on link
> up/down, and then make calls to phydev_link_change().
>
> The first patch is a minor bug fix, which does not seem to affect any
> current features, so I'm not submitting it for stable. It is however
> required for configuring SERDES interrupts.
Series applied, thanks Andrew.
* Re: [PATCH] uapi: Fix linux/rds.h userspace compilation errors.
From: David Miller @ 2018-09-02 23:15 UTC
To: vlee; +Cc: netdev
In-Reply-To: <20180901212027.25031-1-vlee@freedesktop.org>
From: Vinson Lee <vlee@freedesktop.org>
Date: Sat, 1 Sep 2018 21:20:27 +0000
> Include linux/in6.h for struct in6_addr.
...
> Fixes: b7ff8b1036f0 ("rds: Extend RDS API for IPv6 support")
> Signed-off-by: Vinson Lee <vlee@freedesktop.org>
> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Applied, thanks.
* Re: pull-request: bpf 2018-09-02
From: David Miller @ 2018-09-02 22:53 UTC
To: daniel; +Cc: ast, netdev
In-Reply-To: <20180902212031.11246-1-daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sun, 2 Sep 2018 23:20:31 +0200
> The following pull-request contains BPF updates for your *net* tree.
>
> The main changes are:
>
> 1) Fix one remaining buggy offset override in sockmap's bpf_msg_pull_data()
> when linearizing multiple scatterlist elements, from Tushar.
>
> 2) Fix BPF sockmap's misuse of ULP when a collision with another ULP is
> found on map update where it would release existing ULP. syzbot found and
> triggered this a couple of times now, fix from John.
>
> 3) Add missing xskmap type to bpftool so it will properly show the type
> on map dump, from Prashant.
>
> Please consider pulling these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Pulled, thanks Daniel.
* Re: [PATCH] net: scm: Fix a possible sleep-in-atomic-context bug in scm_fp_copy()
From: Jia-Ju Bai @ 2018-09-03 1:43 UTC
To: David Miller
Cc: ktkhai, viro, adobriyan, dvlasenk, xiyou.wangcong, netdev,
linux-kernel
In-Reply-To: <20180902.160144.542360312136980090.davem@davemloft.net>
Thanks for your reply.
On 2018/9/3 7:01, David Miller wrote:
> From: Jia-Ju Bai <baijiaju1990@gmail.com>
> Date: Sat, 1 Sep 2018 18:00:26 +0800
>
>> The kernel module may sleep while holding a spinlock.
>>
>> The function call paths (from bottom to top) in Linux-4.16 are:
>>
>> [FUNC] kmalloc(GFP_KERNEL)
>> net/core/scm.c, 85: kmalloc in scm_fp_copy
>> net/core/scm.c, 161: scm_fp_copy in __scm_send
>> ./include/net/scm.h, 88: __scm_send in scm_send
>> net/unix/af_unix.c, 1600: scm_send in maybe_init_creds
>> net/unix/af_unix.c, 1983: maybe_init_creds in unix_stream_sendpage
>> net/unix/af_unix.c, 1973: spin_lock in unix_stream_sendpage
> Please, do a full analysis of the code for these changes you are
> submitting.
>
> Read maybe_init_creds(), it sets msg.msg_controllen to zero.
>
> struct msghdr msg = { .msg_controllen = 0 };
>
> When that is zero, __scm_send() is never called.
Oh, I did not notice this, sorry...
> static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
> struct scm_cookie *scm, bool forcecreds)
> {
> ...
> if (msg->msg_controllen <= 0)
> return 0;
> return __scm_send(sock, msg, scm);
>
> If this bug existed, sleeping in atomic warnings would be triggering
> all the time and people would report that.
Sorry for this false positive.
I will check the code more carefully before submitting my patches.
Best wishes,
Jia-Ju Bai
* Re: [PATCH] isdn: mISDN: tei: Fix a sleep-in-atomic-context bug in create_teimgr()
From: Jia-Ju Bai @ 2018-09-03 1:40 UTC
To: isdn; +Cc: netdev, linux-kernel
In-Reply-To: <3ecd32b2-81e5-038e-edc9-fd06d6e21851@linux-pingi.de>
On 2018/9/3 0:31, isdn@linux-pingi.de wrote:
> Hi,
>
> I do not understand the analysis and do not see that the spinlock is a
> problem here.
> I think your DSAC analyzer assumes that the FUNC_PTR mgr_ctrl call calls
> the mgr_ctrl in tei.c, but in reality it calls l2->ch.ctrl(), which is the
> function in layer2.c, not tei.c. And the function in layer2.c should not
> do any GFP_KERNEL allocation.
>
> Same for your second reported issue.
Okay, thanks for your reply.
My analysis handles function pointers using the function type and
structure field, but it currently cannot distinguish between different
variables of the same type and field.
I will try to improve my tool, thanks for your explanation.
Best wishes,
Jia-Ju Bai
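
For illustration only: a minimal sketch of the false positive discussed in
this thread. Two objects share one struct type and one function-pointer
field, so matching on type and field alone treats both callees as candidates
for every ch->ctrl() call, although each instance is bound to exactly one of
them. All sketch_* names are hypothetical, and the return value merely stands
in for "this callee may sleep".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_channel {
	bool (*ctrl)(void);	/* same type and field for every instance */
};

/* Pretend this one performs a sleeping allocation. */
static bool sketch_tei_ctrl(void)
{
	return true;
}

/* Pretend this one never sleeps. */
static bool sketch_layer2_ctrl(void)
{
	return false;
}

static struct sketch_channel sketch_tei_ch = { .ctrl = sketch_tei_ctrl };
static struct sketch_channel sketch_l2_ch = { .ctrl = sketch_layer2_ctrl };

/* Imagine a spinlock is held here. Only the callee bound to this
 * particular instance runs, not every function stored in a ctrl field. */
static bool sketch_call_under_lock(const struct sketch_channel *ch)
{
	assert(ch != NULL && ch->ctrl != NULL);
	return ch->ctrl();
}
```

Telling the two call sites apart requires tracking which object reaches the
call, not only the field's type.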
* pull-request: bpf 2018-09-02
From: Daniel Borkmann @ 2018-09-02 21:20 UTC
To: davem; +Cc: daniel, ast, netdev
Hi David,
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix one remaining buggy offset override in sockmap's bpf_msg_pull_data()
when linearizing multiple scatterlist elements, from Tushar.
2) Fix BPF sockmap's misuse of ULP when a collision with another ULP is
found on map update where it would release existing ULP. syzbot found and
triggered this a couple of times now, fix from John.
3) Add missing xskmap type to bpftool so it will properly show the type
on map dump, from Prashant.
Please consider pulling these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Thanks a lot!
----------------------------------------------------------------
The following changes since commit 93bbadd6e0a2a58e49d265b9b1aa58e621b60a26:
ipv6: don't get lwtstate twice in ip6_rt_copy_init() (2018-09-01 17:42:12 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
for you to fetch changes up to 597222f72a94118f593e4f32bf58ae7e049a0df1:
bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULP (2018-09-02 22:31:10 +0200)
----------------------------------------------------------------
John Fastabend (1):
bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULP
Prashant Bhole (1):
tools/bpf: bpftool, add xskmap in map types
Tushar Dave (1):
bpf: Fix bpf_msg_pull_data()
kernel/bpf/sockmap.c | 12 +++++++++++-
net/core/filter.c | 7 +++----
tools/bpf/bpftool/map.c | 1 +
3 files changed, 15 insertions(+), 5 deletions(-)
* Re: [PATCH net] net/ipv6: Only update MTU metric if it set
From: David Miller @ 2018-09-02 21:04 UTC
To: dsahern; +Cc: netdev, medhefgo, dsahern
In-Reply-To: <20180830211543.27111-1-dsahern@kernel.org>
From: dsahern@kernel.org
Date: Thu, 30 Aug 2018 14:15:43 -0700
> From: David Ahern <dsahern@gmail.com>
>
> Jan reported a regression after an update to 4.18.5. In this case the ipv6
> default route is set up by systemd-networkd based on data from an RA. The
> RA contains an MTU of 1492 which is used when the route is first inserted
> but then systemd-networkd pushes down updates to the default route
> without the mtu set.
>
> Prior to the change to fib6_info, metrics such as MTU were held in the
> dst_entry and rt6i_pmtu in rt6_info contained an update to the mtu if
> any. ip6_mtu would look at rt6i_pmtu first and use it if set. If not,
> the value from the metrics is used if it is set and finally falling
> back to the idev value.
>
> After the fib6_info change metrics are contained in the fib6_info struct
> and there is no equivalent to rt6i_pmtu. To maintain consistency with
> the old behavior the new code should only reset the MTU in the metrics
> if the route update has it set.
>
> Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
> Reported-by: Jan Janssen <medhefgo@web.de>
> Signed-off-by: David Ahern <dsahern@gmail.com>
Applied and queued up for -stable, thanks David.
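
For illustration only: the lookup order and the update rule described in the
quoted commit message can be sketched as two small functions. Treating 0 as
"not set" is an assumption of this sketch, and the sketch_* names are
hypothetical rather than the kernel's.

```c
#include <assert.h>

/* Old lookup order: path MTU first, then the route metric, then the
 * device value. A value of 0 means "not set" in this sketch. */
static unsigned int sketch_old_ip6_mtu(unsigned int pmtu,
				       unsigned int metric_mtu,
				       unsigned int idev_mtu)
{
	assert(idev_mtu != 0);	/* the final fallback must be valid */
	if (pmtu)
		return pmtu;
	if (metric_mtu)
		return metric_mtu;
	return idev_mtu;
}

/* Rule described by the fix: a route update only replaces the stored
 * MTU metric when the update actually carries an MTU. */
static unsigned int sketch_update_metric_mtu(unsigned int current_mtu,
					     unsigned int update_mtu)
{
	return update_mtu ? update_mtu : current_mtu;
}
```

With the values from the report, an update without an MTU leaves the stored
1492 in place.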
* Re: [bpf-next 1/3] flow_dissector: implements flow dissector BPF hook
From: Daniel Borkmann @ 2018-09-02 21:03 UTC
To: Petar Penkov, netdev
Cc: davem, ast, simon.horman, ecree, songliubraving, tom,
Petar Penkov, Willem de Bruijn
In-Reply-To: <20180830182301.89435-2-peterpenkov96@gmail.com>
On 08/30/2018 08:22 PM, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
>
> Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and
> attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector
> path. The BPF program is per-network namespace.
>
> Signed-off-by: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
[...]
> + err = check_flow_keys_access(env, off, size);
> + if (!err && t == BPF_READ && value_regno >= 0)
> + mark_reg_unknown(env, regs, value_regno);
> } else {
> verbose(env, "R%d invalid mem access '%s'\n", regno,
> reg_type_str[reg->type]);
> @@ -1925,6 +1954,8 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
> case PTR_TO_PACKET_META:
> return check_packet_access(env, regno, reg->off, access_size,
> zero_size_allowed);
> + case PTR_TO_FLOW_KEYS:
> + return check_flow_keys_access(env, reg->off, access_size);
> case PTR_TO_MAP_VALUE:
> return check_map_access(env, regno, reg->off, access_size,
> zero_size_allowed);
> @@ -3976,6 +4007,7 @@ static bool may_access_skb(enum bpf_prog_type type)
> case BPF_PROG_TYPE_SOCKET_FILTER:
> case BPF_PROG_TYPE_SCHED_CLS:
> case BPF_PROG_TYPE_SCHED_ACT:
> + case BPF_PROG_TYPE_FLOW_DISSECTOR:
> return true;
This one should not be added here. It would allow LD_ABS to be used, but
you already have direct packet access as well as the bpf_skb_load_bytes() helper
enabled. The downside of LD_ABS is that its error path exits the BPF prog with
return 0 for historical reasons without the user realizing it (here: mapping to BPF_OK).
So we should not encourage use of LD_ABS/IND anymore in eBPF context and
avoid surprises.
> default:
> return false;
> @@ -4451,6 +4483,7 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,
> case PTR_TO_CTX:
> case CONST_PTR_TO_MAP:
> case PTR_TO_PACKET_END:
> + case PTR_TO_FLOW_KEYS:
> /* Only valid matches are exact, which memcmp() above
> * would have accepted
> */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c25eb36f1320..0143b9c0c67e 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5092,6 +5092,17 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> }
> }
>
> +static const struct bpf_func_proto *
> +flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> + switch (func_id) {
> + case BPF_FUNC_skb_load_bytes:
> + return &bpf_skb_load_bytes_proto;
It probably makes sense to also enable the bpf_skb_pull_data helper here, so that
non-linear data can be pulled in once for direct packet access.
> + default:
> + return bpf_base_func_proto(func_id);
> + }
> +}
> +
> static const struct bpf_func_proto *
> lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> {
> @@ -5207,6 +5218,7 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
> case bpf_ctx_range(struct __sk_buff, data):
> case bpf_ctx_range(struct __sk_buff, data_meta):
> case bpf_ctx_range(struct __sk_buff, data_end):
> + case bpf_ctx_range(struct __sk_buff, flow_keys):
> if (size != size_default)
> return false;
> break;
> @@ -5235,6 +5247,7 @@ static bool sk_filter_is_valid_access(int off, int size,
> case bpf_ctx_range(struct __sk_buff, data):
> case bpf_ctx_range(struct __sk_buff, data_meta):
> case bpf_ctx_range(struct __sk_buff, data_end):
> + case bpf_ctx_range(struct __sk_buff, flow_keys):
> case bpf_ctx_range_till(struct __sk_buff, family, local_port):
[...]
Thanks,
Daniel
* Re: [PATCH net-next 4/5] ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
From: Christian Brauner @ 2018-09-03 1:18 UTC
To: kbuild test robot
Cc: kbuild-all, netdev, linux-kernel, davem, kuznet, yoshfuji,
pombredanne, kstewart, gregkh, dsahern, fw, ktkhai, lucien.xin,
jakub.kicinski, jbenc, nicolas.dichtel
In-Reply-To: <201808310127.xJh9cWoD%fengguang.wu@intel.com>
On Fri, Aug 31, 2018 at 02:41:45AM +0800, kbuild test robot wrote:
> Hi Christian,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on net-next/master]
>
> url: https://github.com/0day-ci/linux/commits/Christian-Brauner/rtnetlink-add-IFA_IF_NETNSID-for-RTM_GETADDR/20180830-194411
> config: x86_64-randconfig-s1-08302022 (attached as .config)
> compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All errors (new ones prefixed by >>):
>
> >> ERROR: "rtnl_get_net_ns_capable" [net/ipv6/ipv6.ko] undefined!
Fwiw, this is triggered when ipv6 is built as a module. The first
version of my patch did not take that into account and left
rtnl_get_net_ns_capable() unexported. The second version will export
that function and fix this issue.
Christian
^ permalink raw reply
* Re: [PATCH net-next] net/sched: fix type of htb statistics
From: David Miller @ 2018-09-02 20:57 UTC (permalink / raw)
To: florent.fourcot; +Cc: netdev
In-Reply-To: <20180830143923.25122-1-florent.fourcot@wifirst.fr>
From: Florent Fourcot <florent.fourcot@wifirst.fr>
Date: Thu, 30 Aug 2018 16:39:23 +0200
> tokens and ctokens are defined as s64 in htb_class structure,
> and clamped to 32bits value during netlink dumps:
>
> cl->xstats.tokens = clamp_t(s64, PSCHED_NS2TICKS(cl->tokens),
> INT_MIN, INT_MAX);
>
> Defining it as u32 is working since userspace (tc) is printing it as
> signed int, but a correct definition from the beginning is probably
> better.
>
> In the same time, 'giants' structure member is unused since years, so
> update the comment to mark it unused.
>
> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Looks good, applied.
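
For readers following along, the clamping described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: clamp_s64(), tokens_to_xstats() and legacy_u32_roundtrip() are hypothetical names, the PSCHED_NS2TICKS() conversion is left out, and none of this is the kernel code itself.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for clamp_t(s64, v, lo, hi): bound v to [lo, hi]. */
static int64_t clamp_s64(int64_t v, int64_t lo, int64_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Squeeze a 64-bit token count into the signed 32-bit range, as the dump
 * path in the commit message does. The tick conversion is omitted here.
 */
static int32_t tokens_to_xstats(int64_t tokens)
{
	return (int32_t)clamp_s64(tokens, INT_MIN, INT_MAX);
}

/*
 * Why a u32 field happened to work: the bit pattern survives a trip through
 * an unsigned field as long as the reader converts it back to a signed int.
 * The conversion back is implementation-defined before C23; it behaves as
 * shown on the usual two's-complement compilers.
 */
static int32_t legacy_u32_roundtrip(int32_t v)
{
	uint32_t stored = (uint32_t)v;

	return (int32_t)stored;
}
```

A negative token count such as -5 passes through tokens_to_xstats() unchanged, while values beyond the 32-bit range are pinned to INT_MIN or INT_MAX. Declaring the field as signed simply makes the type match what is stored in it.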
^ permalink raw reply
* Re: [PATCH 2/2] net: ethernet: cpsw-phy-sel: prefer phandle for phy sel
From: David Miller @ 2018-09-02 20:52 UTC (permalink / raw)
To: tony
Cc: netdev, linux-omap, devicetree, andrew, grygorii.strashko,
ivan.khoronzhuk, mark.rutland, m-karicheri2, robh+dt
In-Reply-To: <20180829150024.43210-2-tony@atomide.com>
From: Tony Lindgren <tony@atomide.com>
Date: Wed, 29 Aug 2018 08:00:24 -0700
> The cpsw-phy-sel device is not a child of the cpsw interconnect target
> module. It lives in the system control module.
>
> Let's fix this issue by trying to use cpsw-phy-sel phandle first if it
> exists and if not fall back to current usage of trying to find the
> cpsw-phy-sel child. That way the phy sel driver can be a child of the
> system control module where it belongs in the device tree.
>
> Without this fix, we cannot have a proper interconnect target module
> hierarchy in device tree for things like genpd.
>
> Note that deferred probe is mostly not supported by cpsw and this patch
> does not attempt to fix that. In case deferred probe support is needed,
> this could be added to cpsw_slave_open() and phy_connect() so they start
> handling and returning errors.
>
> For documenting it, looks like the cpsw-phy-sel is used for all cpsw device
> tree nodes. It's missing the related binding documentation, so let's also
> update the binding documentation accordingly.
>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
Applied.
^ permalink raw reply
* Re: [PATCH 1/2] dt-bindings: net: cpsw: Document cpsw-phy-sel usage but prefer phandle
From: David Miller @ 2018-09-02 20:52 UTC (permalink / raw)
To: tony
Cc: netdev, linux-omap, devicetree, andrew, grygorii.strashko,
ivan.khoronzhuk, mark.rutland, m-karicheri2, robh+dt
In-Reply-To: <20180829150024.43210-1-tony@atomide.com>
From: Tony Lindgren <tony@atomide.com>
Date: Wed, 29 Aug 2018 08:00:23 -0700
> The current cpsw usage for cpsw-phy-sel is undocumented but is used for
> all the boards using cpsw. And cpsw-phy-sel is not really a child of
> the cpsw device, it lives in the system control module instead.
>
> Let's document the existing usage, and improve it a bit where we prefer
> to use a phandle instead of a child device for it. That way we can
> properly describe the hardware in dts files for things like genpd.
>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
Applied.
^ permalink raw reply
* Re: [bpf PATCH v2] bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULP
From: Daniel Borkmann @ 2018-09-02 20:40 UTC (permalink / raw)
To: John Fastabend, ast; +Cc: netdev
In-Reply-To: <20180831042502.15257.35453.stgit@john-Precision-Tower-5810>
On 08/31/2018 06:25 AM, John Fastabend wrote:
> Currently we check sk_user_data is non NULL to determine if the sk
> exists in a map. However, this is not sufficient to ensure the psock
> or the ULP ops are not in use by another user, such as kcm or TLS. To
> avoid this when adding a sock to a map also verify it is of the
> correct ULP type. Additionally, when releasing a psock verify that
> it is the TCP_ULP_BPF type before releasing the ULP. The error case
> where we abort an update due to ULP collision can cause this error
> path.
>
> For example,
>
> __sock_map_ctx_update_elem()
> [...]
> err = tcp_set_ulp_id(sock, TCP_ULP_BPF) <- collides with TLS
> if (err) <- so err out here
> goto out_free
> [...]
> out_free:
> smap_release_sock() <- calling tcp_cleanup_ulp releases the
> TLS ULP incorrectly.
>
> Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Applied to bpf, thanks John!
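
The error path from the commit message can be modelled in a few lines of userspace C. This is a rough sketch, not the sockmap code: struct fake_sock, enum fake_ulp_id and the fake_*() functions are hypothetical stand-ins for the socket, the ULP identifiers, tcp_set_ulp_id() and smap_release_sock().

```c
#include <stddef.h>

/* Hypothetical ULP identifiers; only the BPF/TLS distinction matters here. */
enum fake_ulp_id { FAKE_ULP_NONE, FAKE_ULP_TLS, FAKE_ULP_BPF };

/* Hypothetical socket model: which ULP owns it, and how often one was torn down. */
struct fake_sock {
	enum fake_ulp_id ulp;
	int ulp_released;
};

/* Stand-in for setting the ULP: fails if another ULP is already attached. */
static int fake_set_ulp(struct fake_sock *sk, enum fake_ulp_id id)
{
	if (sk->ulp != FAKE_ULP_NONE)
		return -1;
	sk->ulp = id;
	return 0;
}

/* Stand-in for the release path: only tear down a ULP that is the BPF one. */
static void fake_release_sock(struct fake_sock *sk)
{
	if (sk->ulp != FAKE_ULP_BPF)
		return;	/* owned by someone else (e.g. TLS), leave it alone */
	sk->ulp = FAKE_ULP_NONE;
	sk->ulp_released++;
}

/* Stand-in for the map update: a collision takes the shared error path. */
static int fake_map_update(struct fake_sock *sk)
{
	int err;

	err = fake_set_ulp(sk, FAKE_ULP_BPF);
	if (err)
		goto out_free;
	return 0;

out_free:
	fake_release_sock(sk);
	return err;
}
```

With the type check in fake_release_sock(), a socket that already carries the TLS ULP fails the update and keeps its TLS ULP. Without that check, the out_free path would release a ULP the map never owned, which is the misuse the patch describes.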
^ permalink raw reply
* Re: [PATCH net 0/2] igmp: fix two incorrect unsolicit report count issues
From: David Miller @ 2018-09-02 20:39 UTC (permalink / raw)
To: liuhangbin; +Cc: netdev
In-Reply-To: <1535537171-24533-1-git-send-email-liuhangbin@gmail.com>
From: Hangbin Liu <liuhangbin@gmail.com>
Date: Wed, 29 Aug 2018 18:06:07 +0800
> Just like the subject, fix two minor igmp unsolicit report count issues.
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH bpf-next] tools/bpf: bpftool, add xskmap in map types
From: Daniel Borkmann @ 2018-09-02 20:39 UTC (permalink / raw)
To: Jakub Kicinski, Prashant Bhole; +Cc: Alexei Starovoitov, Quentin Monnet, netdev
In-Reply-To: <20180831104532.22093b6d@cakuba.netronome.com>
On 08/31/2018 10:45 AM, Jakub Kicinski wrote:
> On Fri, 31 Aug 2018 15:32:42 +0900, Prashant Bhole wrote:
>> When listed all maps, bpftool currently shows (null) for xskmap.
>> Added xskmap type in map_type_name[] to show correct type.
>>
>> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
>
> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>
> Thank you! I feel tempted to suggest considering the bpf tree, but
> perhaps that's a stretch..
Applied it to bpf, thanks guys!
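
As a small aside on why a missing table entry shows up as (null): a name table indexed by map type leaves any slot without an initializer as a NULL pointer, and glibc's printf() prints "(null)" when handed one for %s. The sketch below uses hypothetical names (enum fake_map_type, fake_map_type_name[], fake_type_name()) and a shortened list of types; it is not the bpftool source.

```c
#include <stddef.h>

/* Hypothetical, shortened list of map types. */
enum fake_map_type {
	FAKE_MAP_HASH,
	FAKE_MAP_ARRAY,
	FAKE_MAP_SOCKHASH,
	FAKE_MAP_XSKMAP,
	FAKE_MAP_MAX,
};

/* Name table with designated initializers; unlisted slots are NULL. */
static const char * const fake_map_type_name[FAKE_MAP_MAX] = {
	[FAKE_MAP_HASH]		= "hash",
	[FAKE_MAP_ARRAY]	= "array",
	[FAKE_MAP_SOCKHASH]	= "sockhash",
	/* no entry for FAKE_MAP_XSKMAP, so that slot stays NULL */
};

/* Look up a name; returns NULL for out-of-range or unnamed types. */
static const char *fake_type_name(unsigned int type)
{
	if (type >= FAKE_MAP_MAX)
		return NULL;
	return fake_map_type_name[type];
}
```

Here fake_type_name(FAKE_MAP_XSKMAP) returns NULL until a line such as [FAKE_MAP_XSKMAP] = "xskmap" is added to the table, which mirrors the kind of one-line fix the patch makes.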
^ permalink raw reply
* Re: [PATCH net] bpf: Fix bpf_msg_pull_data()
From: Daniel Borkmann @ 2018-09-02 20:39 UTC (permalink / raw)
To: Tushar Dave, john.fastabend, ast, davem, netdev; +Cc: sowmini.varadhan
In-Reply-To: <1535751916-11880-1-git-send-email-tushar.n.dave@oracle.com>
On 08/31/2018 11:45 PM, Tushar Dave wrote:
> Helper bpf_msg_pull_data() mistakenly reuses variable 'offset' while
> linearizing multiple scatterlist elements. Variable 'offset' is used
> to find first starting scatterlist element
> i.e. msg->data = sg_virt(&sg[first_sg]) + start - offset"
>
> Use different variable name while linearizing multiple scatterlist
> elements so that value contained in variable 'offset' won't get
> overwritten.
>
> Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
> Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Applied to bpf, thanks Tushar!
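
The variable-reuse problem can be illustrated outside the kernel. The sketch below is a simplified model under stated assumptions: segments are just an array of lengths, and find_data_offset() is a hypothetical function, not bpf_msg_pull_data(). It keeps 'offset' for the start of the first segment and uses a separate 'copy' counter while walking the elements to linearize, so that start - offset is still valid afterwards.

```c
#include <stddef.h>

/*
 * Given segment lengths, find the segment that contains byte 'start' and
 * count how many bytes must be linearized to cover up to byte 'end'.
 * Returns the position of 'start' inside the first segment, or -1 if the
 * range does not fit. 'offset' is written only by the first loop.
 */
static long find_data_offset(const size_t *seg_len, size_t nsegs,
			     size_t start, size_t end,
			     size_t *first_sg, size_t *copy_out)
{
	size_t offset = 0;	/* byte offset where segment i begins */
	size_t copy = 0;	/* separate counter for the linearized bytes */
	size_t need, i, j;

	if (end < start)
		return -1;

	/* Step 1: locate the segment containing 'start'. */
	for (i = 0; i < nsegs; i++) {
		if (start < offset + seg_len[i])
			break;
		offset += seg_len[i];
	}
	if (i == nsegs)
		return -1;

	/* Step 2: sum segment lengths into 'copy', leaving 'offset' alone. */
	need = end - offset;
	for (j = i; j < nsegs && copy < need; j++)
		copy += seg_len[j];
	if (copy < need)
		return -1;

	*first_sg = i;
	*copy_out = copy;

	/* Still correct because 'offset' was not overwritten in step 2. */
	return (long)(start - offset);
}
```

With segment lengths {100, 50, 200}, start 120 and end 260, the first segment is index 1, 250 bytes are linearized, and the data begins 20 bytes into that segment. Had step 2 accumulated into 'offset' instead of 'copy', the final subtraction would have used the wrong base.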
^ permalink raw reply