* [net-next 1/3] tipc: set default MTU for UDP media
From: GhantaKrishnamurthy MohanKrishna @ 2018-04-19 9:06 UTC (permalink / raw)
To: tipc-discussion, jon.maloy, maloy, ying.xue,
mohan.krishna.ghanta.krishnamurthy, netdev, davem
In-Reply-To: <1524128780-2550-1-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Currently, all bearers are configured with MTU value same as the
underlying L2 device. However, in case of bearers with media type
UDP, higher throughput is possible with a fixed and higher emulated
MTU value than adapting to the underlying L2 MTU.
In this commit, we introduce a parameter mtu in struct tipc_media
and a default value is set for UDP. A default value of 14k
was determined by experimentation and found to have a higher throughput
than 16k. MTU for UDP bearers are assigned the above set value of
media MTU.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
---
include/uapi/linux/tipc_config.h | 5 +++++
net/tipc/udp_media.c | 4 ++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/tipc_config.h b/include/uapi/linux/tipc_config.h
index 3f29e3c8ed06..4b2c93b1934c 100644
--- a/include/uapi/linux/tipc_config.h
+++ b/include/uapi/linux/tipc_config.h
@@ -185,6 +185,11 @@
#define TIPC_DEF_LINK_WIN 50
#define TIPC_MAX_LINK_WIN 8191
+/*
+ * Default MTU for UDP media
+ */
+
+#define TIPC_DEF_LINK_UDP_MTU 14000
struct tipc_node_info {
__be32 addr; /* network address of node */
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index e7d91f5d5cae..9783101bc4a9 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -713,8 +713,7 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
err = -EINVAL;
goto err;
}
- b->mtu = dev->mtu - sizeof(struct iphdr)
- - sizeof(struct udphdr);
+ b->mtu = b->media->mtu;
#if IS_ENABLED(CONFIG_IPV6)
} else if (local.proto == htons(ETH_P_IPV6)) {
udp_conf.family = AF_INET6;
@@ -803,6 +802,7 @@ struct tipc_media udp_media_info = {
.priority = TIPC_DEF_LINK_PRI,
.tolerance = TIPC_DEF_LINK_TOL,
.window = TIPC_DEF_LINK_WIN,
+ .mtu = TIPC_DEF_LINK_UDP_MTU,
.type_id = TIPC_MEDIA_TYPE_UDP,
.hwaddr_len = 0,
.name = "udp"
--
2.1.4
^ permalink raw reply related
* [net-next 2/3] tipc: implement configuration of UDP media MTU
From: GhantaKrishnamurthy MohanKrishna @ 2018-04-19 9:06 UTC (permalink / raw)
To: tipc-discussion, jon.maloy, maloy, ying.xue,
mohan.krishna.ghanta.krishnamurthy, netdev, davem
In-Reply-To: <1524128780-2550-1-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>
In previous commit, we changed the default emulated MTU for UDP bearers
to 14k.
This commit adds the functionality to set/change the default value
by configuring new MTU for UDP media. UDP bearer(s) have to be disabled
and enabled back for the new MTU to take effect.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
---
include/uapi/linux/tipc_netlink.h | 1 +
net/tipc/bearer.c | 13 +++++++++++++
net/tipc/bearer.h | 3 +++
net/tipc/udp_media.h | 14 ++++++++++++++
4 files changed, 31 insertions(+)
diff --git a/include/uapi/linux/tipc_netlink.h b/include/uapi/linux/tipc_netlink.h
index 0affb682e5e3..85c11982c89b 100644
--- a/include/uapi/linux/tipc_netlink.h
+++ b/include/uapi/linux/tipc_netlink.h
@@ -266,6 +266,7 @@ enum {
TIPC_NLA_PROP_PRIO, /* u32 */
TIPC_NLA_PROP_TOL, /* u32 */
TIPC_NLA_PROP_WIN, /* u32 */
+ TIPC_NLA_PROP_MTU, /* u32 */
__TIPC_NLA_PROP_MAX,
TIPC_NLA_PROP_MAX = __TIPC_NLA_PROP_MAX - 1
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index f7d47c89d658..a22caf9e5a18 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -1029,6 +1029,9 @@ static int __tipc_nl_add_media(struct tipc_nl_msg *msg,
goto prop_msg_full;
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, media->window))
goto prop_msg_full;
+ if (media->type_id == TIPC_MEDIA_TYPE_UDP)
+ if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, media->mtu))
+ goto prop_msg_full;
nla_nest_end(msg->skb, prop);
nla_nest_end(msg->skb, attrs);
@@ -1158,6 +1161,16 @@ int __tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info)
m->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
if (props[TIPC_NLA_PROP_WIN])
m->window = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
+ if (props[TIPC_NLA_PROP_MTU]) {
+ if (m->type_id != TIPC_MEDIA_TYPE_UDP)
+ return -EINVAL;
+#ifdef CONFIG_TIPC_MEDIA_UDP
+ if (tipc_udp_mtu_bad(nla_get_u32
+ (props[TIPC_NLA_PROP_MTU])))
+ return -EINVAL;
+ m->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]);
+#endif
+ }
}
return 0;
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 6efcee63a381..394290cbbb1d 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -94,6 +94,8 @@ struct tipc_bearer;
* @priority: default link (and bearer) priority
* @tolerance: default time (in ms) before declaring link failure
* @window: default window (in packets) before declaring link congestion
+ * @mtu: max packet size bearer can support for media type not dependent on
+ * underlying device MTU
* @type_id: TIPC media identifier
* @hwaddr_len: TIPC media address len
* @name: media name
@@ -118,6 +120,7 @@ struct tipc_media {
u32 priority;
u32 tolerance;
u32 window;
+ u32 mtu;
u32 type_id;
u32 hwaddr_len;
char name[TIPC_MAX_MEDIA_NAME];
diff --git a/net/tipc/udp_media.h b/net/tipc/udp_media.h
index 281bbae87726..e7455cc73e16 100644
--- a/net/tipc/udp_media.h
+++ b/net/tipc/udp_media.h
@@ -38,9 +38,23 @@
#ifndef _TIPC_UDP_MEDIA_H
#define _TIPC_UDP_MEDIA_H
+#include <linux/ip.h>
+#include <linux/udp.h>
+
int tipc_udp_nl_bearer_add(struct tipc_bearer *b, struct nlattr *attr);
int tipc_udp_nl_add_bearer_data(struct tipc_nl_msg *msg, struct tipc_bearer *b);
int tipc_udp_nl_dump_remoteip(struct sk_buff *skb, struct netlink_callback *cb);
+/* check if configured MTU is too low for tipc headers */
+static inline bool tipc_udp_mtu_bad(u32 mtu)
+{
+ if (mtu >= (TIPC_MIN_BEARER_MTU + sizeof(struct iphdr) +
+ sizeof(struct udphdr)))
+ return false;
+
+ pr_warn("MTU too low for tipc bearer\n");
+ return true;
+}
+
#endif
#endif
--
2.1.4
^ permalink raw reply related
* [net-next 3/3] tipc: confgiure and apply UDP bearer MTU on running links
From: GhantaKrishnamurthy MohanKrishna @ 2018-04-19 9:06 UTC (permalink / raw)
To: tipc-discussion, jon.maloy, maloy, ying.xue,
mohan.krishna.ghanta.krishnamurthy, netdev, davem
In-Reply-To: <1524128780-2550-1-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Currently, we have option to configure MTU of UDP media. The configured
MTU takes effect on the links going up after that moment. I.e, a user
has to reset bearer to have new value applied across its links. This is
confusing and disturbing on a running cluster.
We now introduce the functionality to change the default UDP bearer MTU
in struct tipc_bearer. Additionally, the links are updated dynamically,
without any need for a reset, when bearer value is changed. We leverage
the existing per-link functionality and the design being symetrical to
the confguration of link tolerance.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
---
net/tipc/bearer.c | 16 +++++++++++++++-
net/tipc/node.c | 12 +++++++++---
net/tipc/node.h | 2 +-
3 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index a22caf9e5a18..2dfb492a7c94 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -697,6 +697,9 @@ static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
goto prop_msg_full;
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->window))
goto prop_msg_full;
+ if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP)
+ if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, bearer->mtu))
+ goto prop_msg_full;
nla_nest_end(msg->skb, prop);
@@ -979,12 +982,23 @@ int __tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info)
if (props[TIPC_NLA_PROP_TOL]) {
b->tolerance = nla_get_u32(props[TIPC_NLA_PROP_TOL]);
- tipc_node_apply_tolerance(net, b);
+ tipc_node_apply_property(net, b, TIPC_NLA_PROP_TOL);
}
if (props[TIPC_NLA_PROP_PRIO])
b->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
if (props[TIPC_NLA_PROP_WIN])
b->window = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
+ if (props[TIPC_NLA_PROP_MTU]) {
+ if (b->media->type_id != TIPC_MEDIA_TYPE_UDP)
+ return -EINVAL;
+#ifdef CONFIG_TIPC_MEDIA_UDP
+ if (tipc_udp_mtu_bad(nla_get_u32
+ (props[TIPC_NLA_PROP_MTU])))
+ return -EINVAL;
+ b->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]);
+ tipc_node_apply_property(net, b, TIPC_NLA_PROP_MTU);
+#endif
+ }
}
return 0;
diff --git a/net/tipc/node.c b/net/tipc/node.c
index c77dd2f3c589..b71e4e376bb9 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1681,7 +1681,8 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
kfree_skb(skb);
}
-void tipc_node_apply_tolerance(struct net *net, struct tipc_bearer *b)
+void tipc_node_apply_property(struct net *net, struct tipc_bearer *b,
+ int prop)
{
struct tipc_net *tn = tipc_net(net);
int bearer_id = b->identity;
@@ -1696,8 +1697,13 @@ void tipc_node_apply_tolerance(struct net *net, struct tipc_bearer *b)
list_for_each_entry_rcu(n, &tn->node_list, list) {
tipc_node_write_lock(n);
e = &n->links[bearer_id];
- if (e->link)
- tipc_link_set_tolerance(e->link, b->tolerance, &xmitq);
+ if (e->link) {
+ if (prop == TIPC_NLA_PROP_TOL)
+ tipc_link_set_tolerance(e->link, b->tolerance,
+ &xmitq);
+ else if (prop == TIPC_NLA_PROP_MTU)
+ tipc_link_set_mtu(e->link, b->mtu);
+ }
tipc_node_write_unlock(n);
tipc_bearer_xmit(net, bearer_id, &xmitq, &e->maddr);
}
diff --git a/net/tipc/node.h b/net/tipc/node.h
index f24b83500df1..bb271a37c93f 100644
--- a/net/tipc/node.h
+++ b/net/tipc/node.h
@@ -67,7 +67,7 @@ void tipc_node_check_dest(struct net *net, u32 onode, u8 *peer_id128,
struct tipc_media_addr *maddr,
bool *respond, bool *dupl_addr);
void tipc_node_delete_links(struct net *net, int bearer_id);
-void tipc_node_apply_tolerance(struct net *net, struct tipc_bearer *b);
+void tipc_node_apply_property(struct net *net, struct tipc_bearer *b, int prop);
int tipc_node_get_linkname(struct net *net, u32 bearer_id, u32 node,
char *linkname, size_t len);
int tipc_node_xmit(struct net *net, struct sk_buff_head *list, u32 dnode,
--
2.1.4
^ permalink raw reply related
* Re: [bpf-next PATCH 1/3] bpf: add id to map tracepoint
From: Jesper Dangaard Brouer @ 2018-04-19 9:08 UTC (permalink / raw)
To: Sebastiano Miano
Cc: netdev, ast, daniel, mingo, rostedt, fulvio.risso, brouer
In-Reply-To: <152406544797.3465.16927588919039069579.stgit@localhost.localdomain>
On Wed, 18 Apr 2018 17:30:48 +0200
Sebastiano Miano <sebastiano.miano@polito.it> wrote:
> This patch adds the map id to the bpf tracepoints
> that can be used when monitoring or inspecting map
> related functions.
>
> Signed-off-by: Sebastiano Miano <sebastiano.miano@polito.it>
> Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Thanks you for doing this. I've needed this before when
troubleshooting my XDP programs (specifically xdp_ddos01_blacklist[1]).
E.g. when I want to verify that my tools are doing the right thing, I
can now find the XDP prog id via 'ip link' or bpftool, and list the map
IDs used by the prog tool (via bpftool), and now use perf to record map
changes, which now have the needed IDs I can filter on. Before, I
could not tell the difference if the program was updating the correct
map (which were a mistake I ran into).
Perf record even support supplying filters on the cmdline, like:
perf record -e bpf:bpf_map_* -a --filter 'id == 2 || id == 1' sleep 100
And yes, doing filtering this way is slow, compared to doing it via a
bpf_prog inside the kernel, which Sebastiano already provide a sample
on howto do. But I just needed a way to find the bug in my program,
not any high speed usage.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_ddos01_blacklist_cmdline.c
^ permalink raw reply
* Re: [PATCH] net: phy: marvell: clear wol event before setting it
From: Jisheng Zhang @ 2018-04-19 9:09 UTC (permalink / raw)
To: Bhadram Varka
Cc: Andrew Lunn, Florian Fainelli, David S. Miller,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jingju Hou
In-Reply-To: <e1409c9493f64188a71b44f69ccb24cf@bgmail102.nvidia.com>
On Thu, 19 Apr 2018 09:00:40 +0000 Bhadram Varka wrote:
> Hi,
>
> > -----Original Message-----
> > From: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> > Sent: Thursday, April 19, 2018 2:24 PM
> > To: Bhadram Varka <vbhadram@nvidia.com>
> > Cc: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>;
> > David S. Miller <davem@davemloft.net>; netdev@vger.kernel.org; linux-
> > kernel@vger.kernel.org; Jingju Hou <Jingju.Hou@synaptics.com>
> > Subject: Re: [PATCH] net: phy: marvell: clear wol event before setting it
> >
> > Hi,
> >
> > On Thu, 19 Apr 2018 08:38:45 +0000 Bhadram Varka wrote:
> >
> > > Hi,
> > >
> > > > -----Original Message-----
> > > > From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> > > > Behalf Of Jisheng Zhang
> > > > Sent: Thursday, April 19, 2018 1:33 PM
> > > > To: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli
> > > > <f.fainelli@gmail.com>; David S. Miller <davem@davemloft.net>
> > > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jingju Hou
> > > > <Jingju.Hou@synaptics.com>
> > > > Subject: [PATCH] net: phy: marvell: clear wol event before setting
> > > > it
> > > >
> > > > From: Jingju Hou <Jingju.Hou@synaptics.com>
> > > >
> > > > If WOL event happened once, the LED[2] interrupt pin will not be
> > > > cleared unless reading the CSISR register. So clear the WOL event before
> > enabling it.
> > > >
> > > > Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
> > > > Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
> > > > ---
> > > > drivers/net/phy/marvell.c | 9 +++++++++
> > > > 1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> > > > index c22e8e383247..b6abe1cbc84b 100644
> > > > --- a/drivers/net/phy/marvell.c
> > > > +++ b/drivers/net/phy/marvell.c
> > > > @@ -115,6 +115,9 @@
> > > > /* WOL Event Interrupt Enable */
> > > > #define MII_88E1318S_PHY_CSIER_WOL_EIE BIT(7)
> > > >
> > > > +/* Copper Specific Interrupt Status Register */
> > > > +#define MII_88E1318S_PHY_CSISR 0x13
> > > > +
> > >
> > > There is already macro to represent this register - MII_M1011_IEVENT. Do we
> > need this macro ?
> >
> > Good point. Will use MII_M1011_IEVENT instead in v2.
> >
> > >
> > > > /* LED Timer Control Register */
> > > > #define MII_88E1318S_PHY_LED_TCR 0x12
> > > > #define MII_88E1318S_PHY_LED_TCR_FORCE_INT BIT(15)
> > > > @@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device
> > > > *phydev,
> > > > if (err < 0)
> > > > goto error;
> > > >
> > > > + /* If WOL event happened once, the LED[2] interrupt pin
> > > > + * will not be cleared unless reading the CSISR register.
> > > > + * So clear the WOL event first before enabling it.
> > > > + */
> > > > + phy_read(phydev, MII_88E1318S_PHY_CSISR);
> > >
> > > This part of the operation already taken care by ack_interrupt and
> > > did_interrupt [....] .ack_interrupt = &marvell_ack_interrupt,
> > > .did_interrupt = &m88e1121_did_interrupt, [...]
> > >
> > > If at all WOL event occurred marvell_ack_interrupt will take care of clearing the
> > interrupt status register.
> > > Am I missing anything here ?
> >
> > If there's no valid irq for phy, the ack_interrupt/did_interrupt won't be called.
>
> Which means that the PHY is not having Interrupt pin ?
No valid irq doesn't mean "not having interrupt pin". they are different
>
> Generally through PHY interrupt will wake up the system right. If there is no interrupt pin then how the system will wake up the from suspend for the magic packet.?
>
IIRC, the phy irq isn't necessary for WOL. The phy interrupt pin isn't
necessarily taken as "interrupt"
PS: Did you use outlook as your email client? it's not suitable
for kernel mail list.
Thanks
^ permalink raw reply
* Re: [bpf-next PATCH 2/3] bpf: add id to prog tracepoint
From: Jesper Dangaard Brouer @ 2018-04-19 9:10 UTC (permalink / raw)
To: Sebastiano Miano
Cc: netdev, ast, daniel, mingo, rostedt, fulvio.risso, brouer
In-Reply-To: <152406545357.3465.11912320259045335712.stgit@localhost.localdomain>
On Wed, 18 Apr 2018 17:30:53 +0200
Sebastiano Miano <sebastiano.miano@polito.it> wrote:
> This patch adds the prog id to the bpf tracepoints
> that can be used when monitoring or inspecting prog
> related functions.
>
> Signed-off-by: Sebastiano Miano <sebastiano.miano@polito.it>
> Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: [bpf-next PATCH 3/3] bpf: add sample program to trace map events
From: Jesper Dangaard Brouer @ 2018-04-19 9:20 UTC (permalink / raw)
To: Sebastiano Miano
Cc: netdev, ast, daniel, mingo, rostedt, fulvio.risso, brouer
In-Reply-To: <152406545918.3465.14253635905960610284.stgit@localhost.localdomain>
On Wed, 18 Apr 2018 17:30:59 +0200
Sebastiano Miano <sebastiano.miano@polito.it> wrote:
> This patch adds a sample program, called trace_map_events,
> that shows how to capture map events and filter them based on
> the map id.
>
> The program accepts a list of map IDs, via the -i command line
> option, and filters all the map events related to those IDs (i.e.,
> map_create/update/lookup/next_key).
> If no IDs are specified, all map events are listed and no filtering
> is performed.
>
> Sample usage:
>
> # trace_map_events -i <map_id1> -i <map_id2> -i <map_id3> ...
>
> Signed-off-by: Sebastiano Miano <sebastiano.miano@polito.it>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
I have tested it works:
$ sudo ./trace_map_events -i 2
Init bpf_perf_event for cpu:0
Init bpf_perf_event for cpu:1
Init bpf_perf_event for cpu:2
Init bpf_perf_event for cpu:3
Init bpf_perf_event for cpu:4
Init bpf_perf_event for cpu:5
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
LOOKUP event for map id: 2 and type: 6
Waiting for map events...
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: [PATCH net-next] team: account for oper state
From: George Wilkie @ 2018-04-19 9:33 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev
In-Reply-To: <20180418191732.GB1922@nanopsycho>
On Wed, Apr 18, 2018 at 09:17:32PM +0200, Jiri Pirko wrote:
> Wed, Apr 18, 2018 at 05:33:12PM CEST, gwilkie@vyatta.att-mail.com wrote:
> >On Wed, Apr 18, 2018 at 04:58:22PM +0200, Jiri Pirko wrote:
> >> Wed, Apr 18, 2018 at 03:35:49PM CEST, gwilkie@vyatta.att-mail.com wrote:
> >> >On Wed, Apr 18, 2018 at 02:56:44PM +0200, Jiri Pirko wrote:
> >> >> Wed, Apr 18, 2018 at 12:29:50PM CEST, gwilkie@vyatta.att-mail.com wrote:
> >> >> >Account for operational state when determining port linkup state,
> >> >> >as per Documentation/networking/operstates.txt.
> >> >>
> >> >> Could you please point me to the exact place in the document where this
> >> >> is suggested?
> >> >>
> >> >
> >> >Various places cover it I think.
> >> >
> >> >In 1. Introduction:
> >> >"interface is not usable just because the admin enabled it"
> >> >"userspace must be granted the possibility to
> >> >influence operational state"
> >> >
> >> >In 4. Setting from userspace:
> >> >"the userspace application can set IFLA_OPERSTATE
> >> >to IF_OPER_DORMANT or IF_OPER_UP as long as the driver does not set
> >> >netif_carrier_off() or netif_dormant_on()"
> >> >
> >> >We have a use case where we want to set the oper state of the team ports based
> >> >on whether they are actually usable or not (as opposed to just admin up).
> >>
> >> Are you running a supplicant there or what is the use-case?
> >>
> >
> >We are using tun/tap interfaces for the team ports with the physical interfaces
> >under the control of a user process.
> >
> >> How is this handle in other drivers like bond, openvswitch, bridge, etc?
> >
> >It looks like bridge is using it, looking at br_port_carrier_check() and
> >br_add_if().
>
> Okay, so why do you still need to check netif_carrier_ok?
> Looks like netif_oper_up is enough, right?
Yes, I was being overly cautious. Replacing netif_carrier_ok with netif_oper_up
works OK. I'll send updated patch.
Cheers.
>
>
> >
> >Cheers.
> >
> >>
> >> >
> >> >Cheers.
> >> >
> >> >>
> >> >> >
> >> >> >Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
> >> >> >---
> >> >> > drivers/net/team/team.c | 3 ++-
> >> >> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >> >> >
> >> >> >diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> >> >> >index a6c6ce19eeee..231264a05e55 100644
> >> >> >--- a/drivers/net/team/team.c
> >> >> >+++ b/drivers/net/team/team.c
> >> >> >@@ -2918,7 +2918,8 @@ static int team_device_event(struct notifier_block *unused,
> >> >> > case NETDEV_CHANGE:
> >> >> > if (netif_running(port->dev))
> >> >> > team_port_change_check(port,
> >> >> >- !!netif_carrier_ok(port->dev));
> >> >> >+ !!(netif_carrier_ok(port->dev) &&
> >> >> >+ netif_oper_up(port->dev)));
> >> >> > break;
> >> >> > case NETDEV_UNREGISTER:
> >> >> > team_del_slave(port->team->dev, dev);
> >> >> >--
> >> >> >2.11.0
> >> >> >
>
>
^ permalink raw reply
* Re: [PATCH bpf-next v3 2/8] bpf: add documentation for eBPF helpers (01-11)
From: Daniel Borkmann @ 2018-04-19 10:02 UTC (permalink / raw)
To: Quentin Monnet, ast; +Cc: netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180417143438.7018-3-quentin.monnet@netronome.com>
On 04/17/2018 04:34 PM, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
>
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
>
> This patch contains descriptions for the following helper functions, all
> written by Alexei:
>
> - bpf_map_lookup_elem()
> - bpf_map_update_elem()
> - bpf_map_delete_elem()
> - bpf_probe_read()
> - bpf_ktime_get_ns()
> - bpf_trace_printk()
> - bpf_skb_store_bytes()
> - bpf_l3_csum_replace()
> - bpf_l4_csum_replace()
> - bpf_tail_call()
> - bpf_clone_redirect()
>
> v3:
> - bpf_map_lookup_elem(): Fix description of restrictions for flags
> related to the existence of the entry.
> - bpf_trace_printk(): State that trace_pipe can be configured. Fix
> return value in case an unknown format specifier is met. Add a note on
> kernel log notice when the helper is used. Edit example.
> - bpf_tail_call(): Improve comment on stack inheritance.
> - bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag.
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Thanks for doing all this work, Quentin!
Just some small improvements while reading over it:
> ---
> include/uapi/linux/bpf.h | 210 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 210 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 45f77f01e672..02b7d522b3c0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -381,6 +381,216 @@ union bpf_attr {
> * intentional, removing them would break paragraphs for rst2man.
> *
> * Start of BPF helper function descriptions:
> + *
> + * void *bpf_map_lookup_elem(struct bpf_map *map, void *key)
const void *key
> + * Description
> + * Perform a lookup in *map* for an entry associated to *key*.
> + * Return
> + * Map value associated to *key*, or **NULL** if no entry was
> + * found.
> + *
> + * int bpf_map_update_elem(struct bpf_map *map, void *key, void *value, u64 flags)
const void *key, const void *value
> + * Description
> + * Add or update the value of the entry associated to *key* in
> + * *map* with *value*. *flags* is one of:
> + *
> + * **BPF_NOEXIST**
> + * The entry for *key* must not exist in the map.
> + * **BPF_EXIST**
> + * The entry for *key* must already exist in the map.
> + * **BPF_ANY**
> + * No condition on the existence of the entry for *key*.
> + *
> + * Flag value **BPF_NOEXIST** cannot be used for maps of types
> + * **BPF_MAP_TYPE_ARRAY** or **BPF_MAP_TYPE_PERCPU_ARRAY** (all
> + * elements always exist), the helper would return an error.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_map_delete_elem(struct bpf_map *map, void *key)
const void *key
> + * Description
> + * Delete entry with *key* from *map*.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_probe_read(void *dst, u32 size, const void *src)
> + * Description
> + * For tracing programs, safely attempt to read *size* bytes from
> + * address *src* and store the data in *dst*.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * u64 bpf_ktime_get_ns(void)
> + * Description
> + * Return the time elapsed since system boot, in nanoseconds.
> + * Return
> + * Current *ktime*.
> + *
> + * int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
> + * Description
> + * This helper is a "printk()-like" facility for debugging. It
> + * prints a message defined by format *fmt* (of size *fmt_size*)
> + * to file *\/sys/kernel/debug/tracing/trace* from DebugFS, if
> + * available. It can take up to three additional **u64**
> + * arguments (as an eBPF helpers, the total number of arguments is
> + * limited to five).
> + *
> + * Each time the helper is called, it appends a line to the trace.
> + * The format of the trace is customizable, and the exact output
> + * one will get depends on the options set in
> + * *\/sys/kernel/debug/tracing/trace_options* (see also the
> + * *README* file under the same directory). However, it usually
> + * defaults to something like:
> + *
> + * ::
> + *
> + * telnet-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg>
> + *
> + * In the above:
> + *
> + * * ``telnet`` is the name of the current task.
> + * * ``470`` is the PID of the current task.
> + * * ``001`` is the CPU number on which the task is
> + * running.
> + * * In ``.N..``, each character refers to a set of
> + * options (whether irqs are enabled, scheduling
> + * options, whether hard/softirqs are running, level of
> + * preempt_disabled respectively). **N** means that
> + * **TIF_NEED_RESCHED** and **PREEMPT_NEED_RESCHED**
> + * are set.
> + * * ``419421.045894`` is a timestamp.
> + * * ``0x00000001`` is a fake value used by BPF for the
> + * instruction pointer register.
> + * * ``<formatted msg>`` is the message formatted with
> + * *fmt*.
> + *
> + * The conversion specifiers supported by *fmt* are similar, but
> + * more limited than for printk(). They are **%d**, **%i**,
> + * **%u**, **%x**, **%ld**, **%li**, **%lu**, **%lx**, **%lld**,
> + * **%lli**, **%llu**, **%llx**, **%p**, **%s**. No modifier (size
> + * of field, padding with zeroes, etc.) is available, and the
> + * helper will return **-EINVAL** (but print nothing) if it
> + * encounters an unknown specifier.
> + *
> + * Also, note that **bpf_trace_printk**\ () is slow, and should
> + * only be used for debugging purposes. For this reason, a notice
> + * bloc (spanning several lines) is printed to kernel logs and
> + * states that the helper should not be used "for production use"
> + * the first time this helper is used (or more precisely, when
> + * **trace_printk**\ () buffers are allocated). For passing values
> + * to user space, perf events should be preferred.
> + * Return
> + * The number of bytes written to the buffer, or a negative error
> + * in case of failure.
> + *
> + * int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
> + * Description
> + * Store *len* bytes from address *from* into the packet
> + * associated to *skb*, at *offset*. *flags* are a combination of
> + * **BPF_F_RECOMPUTE_CSUM** (automatically recompute the
> + * checksum for the packet after storing the bytes) and
> + * **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\
> + * **->swhash** and *skb*\ **->l4hash** to 0).
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
Perhaps: to change the underlying packet buffer (it's not the data itself but
a potential reallocation to e.g. unclone the skb which otherwise would lead to
a use-after-free if not forced to be invalidated from verifier).
> + * previously done by the verifier are invalidated and must be
> + * performed again.
I would add something like "if used in combination with direct packet access"
where it's the only place this comment is relevant.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_l3_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64 to, u64 size)
> + * Description
> + * Recompute the IP checksum for the packet associated to *skb*.
Perhaps we could say something like 'L3 (e.g. IP)'. The helper itself has zero
knowledge of the underlying protocol used. It's solely replacing the csum and
updating skb->csum when necessary.
> + * Computation is incremental, so the helper must know the former
> + * value of the header field that was modified (*from*), the new
> + * value of this field (*to*), and the number of bytes (2 or 4)
> + * for this field, stored in *size*. Alternatively, it is possible
> + * to store the difference between the previous and the new values
> + * of the header field in *to*, by setting *from* and *size* to 0.
I would add that this works in combination with csum_diff() helper, allows for
more flexibility and to handle sizes larger than 2 or 4.
> + * For both methods, *offset* indicates the location of the IP
> + * checksum within the packet.
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_l4_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64 to, u64 flags)
> + * Description
> + * Recompute the TCP or UDP checksum for the packet associated to
See prior comment, L4 (e.g. TCP, UDP or ICMP).
> + * *skb*. Computation is incremental, so the helper must know the
> + * former value of the header field that was modified (*from*),
> + * the new value of this field (*to*), and the number of bytes (2
> + * or 4) for this field, stored on the lowest four bits of
> + * *flags*. Alternatively, it is possible to store the difference
> + * between the previous and the new values of the header field in
> + * *to*, by setting *from* and the four lowest bits of *flags* to
Same reference for csum_diff().
> + * 0. For both methods, *offset* indicates the location of the IP
> + * checksum within the packet. In addition to the size of the
> + * field, *flags* can be added (bitwise OR) actual flags. With
> + * **BPF_F_MARK_MANGLED_0**, a null checksum is left untouched
> + * (unless **BPF_F_MARK_ENFORCE** is added as well), and for
> + * updates resulting in a null checksum the value is set to
> + * **CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR**
> + * indicates the checksum is to be computed against a
> + * pseudo-header.
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_tail_call(void *ctx, struct bpf_map *prog_array_map, u32 index)
> + * Description
> + * This special helper is used to trigger a "tail call", or in
> + * other words, to jump into another eBPF program. The same stack
> + * frame is used (but values on stack and in registers for the
> + * caller are not accessible to the callee). This mechanism allows
> + * for program chaining, either for raising the maximum number of
> + * available eBPF instructions, or to execute given programs in
> + * conditional blocks. For security reasons, there is an upper
> + * limit to the number of successive tail calls that can be
> + * performed.
> + *
> + * Upon call of this helper, the program attempts to jump into a
> + * program referenced at index *index* in *prog_array_map*, a
> + * special map of type **BPF_MAP_TYPE_PROG_ARRAY**, and passes
> + * *ctx*, a pointer to the context.
> + *
> + * If the call succeeds, the kernel immediately runs the first
> + * instruction of the new program. This is not a function call,
> + * and it never goes back to the previous program. If the call
Nit: s/goes back/returns/
> + * fails, then the helper has no effect, and the caller continues
> + * to run its own instructions. A call can fail if the destination
Maybe: s/to run its own/to run its subsequent/
> + * program for the jump does not exist (i.e. *index* is superior
> + * to the number of entries in *prog_array_map*), or if the
> + * maximum number of tail calls has been reached for this chain of
> + * programs. This limit is defined in the kernel by the macro
> + * **MAX_TAIL_CALL_CNT** (not accessible to user space), which
> + * is currently set to 32.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags)
> + * Description
> + * Clone and redirect the packet associated to *skb* to another
> + * net device of index *ifindex*. Both ingress and egress
> + * interfaces can be used for redirection. The **BPF_F_INGRESS**
> + * value in *flags* is used to make the distinction (ingress path
> + * is selected if the flag is present, egress path otherwise).
> + * This is the only flag supported for now.
Would probably make sense to describe the relation to bpf_redirect() helper
in one sentence, meaning, that bpf_clone_redirect() has the associated cost
duplicating the skb but the helper can be used out of the BPF prog whereas
this is not the case with bpf_redirect() which is therefore more efficient
but handled through an action code where the redirect happens after the BPF
prog returned.
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
>
^ permalink raw reply
* Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
From: Thomas Gleixner @ 2018-04-19 10:03 UTC (permalink / raw)
To: Jesus Sanchez-Palencia
Cc: netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes, richardcochran,
anna-maria, henrik, John Stultz, levi.pearson, edumazet, willemb,
mlichvar
In-Reply-To: <3d1c27a2-7a21-ab80-e92b-47b415b70548@intel.com>
On Wed, 11 Apr 2018, Jesus Sanchez-Palencia wrote:
> On 04/11/2018 01:16 PM, Thomas Gleixner wrote:
> >> So there is a "clockid" that can be used for the full hw offload modes. On this
> >> case, the txtimes are in reference to the NIC's PTP clock, and, as discussed, we
> >> can't just use a clockid that was computed from the fd pointing to /dev/ptpX .
> >
> > And the NICs PTP clock is CLOCK_TAI, so there should be no reason to have
> > yet another clock, right?
>
> Just breaking this down a bit, yes, TAI is the network time base, and the NICs
> PTP clock use that because PTP is (commonly) based on TAI. After the PHCs have
> been synchronized over the network (e.g. with ptp4l), my understanding is that
> if applications want to use the clockid_t CLOCK_TAI as a network clock reference
> it's required that something (i.e. phc2sys) is synchronizing the PHCs and the
> system clock, and also that something calls adjtime to apply the TAI vs UTC
> offset to CLOCK_TAI.
>
> If we are fine with those 'dependencies', then I agree there is no need for
> another clock.
>
> I was thinking about the full offload use-cases, thus when no scheduling is
> happening inside the qdiscs. Applications could just read the time from the PHC
> clocks directly without having to rely on any of the above. On this case,
> userspace would use DYNAMIC_CLOCK just to flag that this is the case, but I must
> admit it's not clear to me how common of a use-case that is, or even if it makes
> sense.
I don't think it makes a lot of sense because the only use case for that is
a full user space scheduler which routes _ALL_ traffic. I don't think
that's something which we want to proliferate.
So I'd rather start off with the CLOCK_TAI assumption and if the need
really arises we can discuss that separately. So you can take a clockid
into account when designing the ABI, but have it CLOCK_TAI only for the
start.
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH] net: phy: marvell: clear wol event before setting it
From: Bhadram Varka @ 2018-04-19 10:05 UTC (permalink / raw)
To: Jisheng Zhang
Cc: Andrew Lunn, Florian Fainelli, David S. Miller,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jingju Hou
In-Reply-To: <20180419170932.7a0b88fb@xhacker.debian>
HiJisheng,
On 4/19/2018 2:39 PM, Jisheng Zhang wrote:
> On Thu, 19 Apr 2018 09:00:40 +0000 Bhadram Varka wrote:
>
>> Hi,
>>
>>> -----Original Message-----
>>> From: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
>>> Sent: Thursday, April 19, 2018 2:24 PM
>>> To: Bhadram Varka <vbhadram@nvidia.com>
>>> Cc: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>;
>>> David S. Miller <davem@davemloft.net>; netdev@vger.kernel.org; linux-
>>> kernel@vger.kernel.org; Jingju Hou <Jingju.Hou@synaptics.com>
>>> Subject: Re: [PATCH] net: phy: marvell: clear wol event before setting it
>>>
>>> Hi,
>>>
>>> On Thu, 19 Apr 2018 08:38:45 +0000 Bhadram Varka wrote:
>>>
>>>> Hi,
>>>>
>>>>> -----Original Message-----
>>>>> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
>>>>> Behalf Of Jisheng Zhang
>>>>> Sent: Thursday, April 19, 2018 1:33 PM
>>>>> To: Andrew Lunn <andrew@lunn.ch>; Florian Fainelli
>>>>> <f.fainelli@gmail.com>; David S. Miller <davem@davemloft.net>
>>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jingju Hou
>>>>> <Jingju.Hou@synaptics.com>
>>>>> Subject: [PATCH] net: phy: marvell: clear wol event before setting
>>>>> it
>>>>>
>>>>> From: Jingju Hou <Jingju.Hou@synaptics.com>
>>>>>
>>>>> If WOL event happened once, the LED[2] interrupt pin will not be
>>>>> cleared unless reading the CSISR register. So clear the WOL event before
>>> enabling it.
>>>>> Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com>
>>>>> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
>>>>> ---
>>>>> drivers/net/phy/marvell.c | 9 +++++++++
>>>>> 1 file changed, 9 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
>>>>> index c22e8e383247..b6abe1cbc84b 100644
>>>>> --- a/drivers/net/phy/marvell.c
>>>>> +++ b/drivers/net/phy/marvell.c
>>>>> @@ -115,6 +115,9 @@
>>>>> /* WOL Event Interrupt Enable */
>>>>> #define MII_88E1318S_PHY_CSIER_WOL_EIE BIT(7)
>>>>>
>>>>> +/* Copper Specific Interrupt Status Register */
>>>>> +#define MII_88E1318S_PHY_CSISR 0x13
>>>>> +
>>>> There is already macro to represent this register - MII_M1011_IEVENT. Do we
>>> need this macro ?
>>>
>>> Good point. Will use MII_M1011_IEVENT instead in v2.
>>>
>>>>
>>>>> /* LED Timer Control Register */
>>>>> #define MII_88E1318S_PHY_LED_TCR 0x12
>>>>> #define MII_88E1318S_PHY_LED_TCR_FORCE_INT BIT(15)
>>>>> @@ -1393,6 +1396,12 @@ static int m88e1318_set_wol(struct phy_device
>>>>> *phydev,
>>>>> if (err < 0)
>>>>> goto error;
>>>>>
>>>>> + /* If WOL event happened once, the LED[2] interrupt pin
>>>>> + * will not be cleared unless reading the CSISR register.
>>>>> + * So clear the WOL event first before enabling it.
>>>>> + */
>>>>> + phy_read(phydev, MII_88E1318S_PHY_CSISR);
>>>> This part of the operation already taken care by ack_interrupt and
>>>> did_interrupt [....] .ack_interrupt = &marvell_ack_interrupt,
>>>> .did_interrupt = &m88e1121_did_interrupt, [...]
>>>>
>>>> If at all WOL event occurred marvell_ack_interrupt will take care of clearing the
>>> interrupt status register.
>>>> Am I missing anything here ?
>>> If there's no valid irq for phy, the ack_interrupt/did_interrupt won't be called.
>> Which means that the PHY is not having Interrupt pin ?
> No valid irq doesn't mean "not having interrupt pin". they are different
Okay. If there is WoL event through magic packet then its valid irq for
the PHY right.
Then phy_interrupt will be called from there ack/did_interrupts will be
called. So it clears WoL interrupt.
>
>> Generally through PHY interrupt will wake up the system right. If there is no interrupt pin then how the system will wake up the from suspend for the magic packet.?
>>
> IIRC, the phy irq isn't necessary for WOL. The phy interrupt pin isn't
> necessarily taken as "interrupt"
Please correct me if I am wrong. In this case how the system will wake
up from the SC7.There has to be wake capable irq/gpio pin to do this
operation.
Thanks,
Bhadram.
^ permalink raw reply
* [PATCH iproute2-next 1/2] man: ip link: document GRE tunnels
From: Sabrina Dubroca @ 2018-04-19 10:22 UTC (permalink / raw)
To: netdev; +Cc: dsahern, Sabrina Dubroca
GRE tunnels are currently only documented together with IPIP and SIT
tunnels, but they actually have very different configuration
options. Let's separate them.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
man/man8/ip-link.8.in | 152 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 148 insertions(+), 4 deletions(-)
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 5dee9fcd627a..77ab8a3b9723 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -693,13 +693,13 @@ tunnel.
.in -8
.TP
-GRE, IPIP, SIT Type Support
-For a link of types
-.I GRE/IPIP/SIT
+IPIP, SIT Type Support
+For a link of type
+.IR IPIP or SIT
the following additional arguments are supported:
.BI "ip link add " DEVICE
-.BR type " { " gre " | " ipip " | " sit " }"
+.BR type " { " ipip " | " sit " }"
.BI " remote " ADDR " local " ADDR
[
.BR encap " { " fou " | " gue " | " none " }"
@@ -764,6 +764,150 @@ IPv6-Over-IPv4 is not supported for IPIP.
- make this tunnel externally controlled
.RB "(e.g. " "ip route encap" ).
+.in -8
+.TP
+GRE Type Support
+For a link of type
+.IR GRE " or " GRETAP
+the following additional arguments are supported:
+
+.BI "ip link add " DEVICE
+.BR type " { " gre " | " gretap " }"
+.BI " remote " ADDR " local " ADDR
+[
+.RB [ i | o ] seq
+] [
+.RB [ i | o ] key
+.I KEY
+] [
+.RB [ i | o ] csum
+] [
+.BI ttl " TTL "
+] [
+.BI tos " TOS "
+] [
+.RB [ no ] pmtudisc
+] [
+.RB [ no ] ignore-df
+] [
+.BI dev " PHYS_DEV "
+] [
+.BR encap " { " fou " | " gue " | " none " }"
+] [
+.BR encap-sport " { " \fIPORT " | " auto " }"
+] [
+.BI "encap-dport " PORT
+] [
+.RB [ no ] encap-csum
+] [
+.RB [ no ] encap-remcsum
+] [
+.BR external
+]
+
+.in +8
+.sp
+.BI remote " ADDR "
+- specifies the remote address of the tunnel.
+
+.sp
+.BI local " ADDR "
+- specifies the fixed local address for tunneled packets.
+It must be an address on another interface on this host.
+
+.sp
+.RB [ i | o ] seq
+- serialize packets.
+The
+.B oseq
+flag enables sequencing of outgoing packets.
+The
+.B iseq
+flag requires that all input packets are serialized.
+
+.sp
+.RB [ i | o ] key
+.I KEY
+- use keyed GRE with key
+.IR KEY ". "KEY
+is either a number or an IPv4 address-like dotted quad.
+The
+.B key
+parameter specifies the same key to use in both directions.
+The
+.BR ikey " and " okey
+parameters specify different keys for input and output.
+
+.sp
+.RB [ i | o ] csum
+- generate/require checksums for tunneled packets.
+The
+.B ocsum
+flag calculates checksums for outgoing packets.
+The
+.B icsum
+flag requires that all input packets have the correct
+checksum. The
+.B csum
+flag is equivalent to the combination
+.B "icsum ocsum" .
+
+.sp
+.BI ttl " TTL"
+- specifies the TTL value to use in outgoing packets.
+
+.sp
+.BI tos " TOS"
+- specifies the TOS value to use in outgoing packets.
+
+.sp
+.RB [ no ] pmtudisc
+- enables/disables Path MTU Discovery on this tunnel.
+It is enabled by default. Note that a fixed ttl is incompatible
+with this option: tunneling with a fixed ttl always makes pmtu
+discovery.
+
+.sp
+.RB [ no ] ignore-df
+- enables/disables IPv4 DF suppression on this tunnel.
+Normally datagrams that exceed the MTU will be fragmented; the presence
+of the DF flag inhibits this, resulting instead in an ICMP Unreachable
+(Fragmentation Required) message. Enabling this attribute casues the
+DF flag to be ignored.
+
+.sp
+.BI dev " PHYS_DEV"
+- specifies the physical device to use for tunnel endpoint communication.
+
+.sp
+.BR encap " { " fou " | " gue " | " none " }"
+- specifies type of secondary UDP encapsulation. "fou" indicates
+Foo-Over-UDP, "gue" indicates Generic UDP Encapsulation.
+
+.sp
+.BR encap-sport " { " \fIPORT " | " auto " }"
+- specifies the source port in UDP encapsulation.
+.IR PORT
+indicates the port by number, "auto"
+indicates that the port number should be chosen automatically
+(the kernel picks a flow based on the flow hash of the
+encapsulated packet).
+
+.sp
+.RB [ no ] encap-csum
+- specifies if UDP checksums are enabled in the secondary
+encapsulation.
+
+.sp
+.RB [ no ] encap-remcsum
+- specifies if Remote Checksum Offload is enabled. This is only
+applicable for Generic UDP Encapsulation.
+
+.sp
+.BR external
+- make this tunnel externally controlled
+.RB "(e.g. " "ip route encap" ).
+
.in -8
.TP
--
2.17.0
^ permalink raw reply related
* [PATCH iproute2-next 2/2] gre/gre6: allow clearing {,i,o}{key,seq,csum} flags
From: Sabrina Dubroca @ 2018-04-19 10:22 UTC (permalink / raw)
To: netdev; +Cc: dsahern, Sabrina Dubroca
In-Reply-To: <a1bfb22fa7ce8940e7b4430d1aa0a9ed81b33d6f.1524133203.git.sd@queasysnail.net>
Currently, iproute allows setting those flags, but it's impossible to
clear them, since their current value is fetched from the kernel and
then we OR in the additional flags passed on the command line.
Add no* variants to allow clearing them.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
ip/link_gre.c | 27 ++++++++++++++++++++++++---
ip/link_gre6.c | 27 ++++++++++++++++++++++++---
man/man8/ip-link.8.in | 27 ++++++++++++++++++---------
3 files changed, 66 insertions(+), 15 deletions(-)
diff --git a/ip/link_gre.c b/ip/link_gre.c
index bc1cee8fbca2..4c8849f7051d 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -31,9 +31,9 @@ static void gre_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
);
fprintf(f,
" [ local ADDR ]\n"
- " [ [i|o]seq ]\n"
- " [ [i|o]key KEY ]\n"
- " [ [i|o]csum ]\n"
+ " [ [no][i|o]seq ]\n"
+ " [ [i|o]key KEY | no[i|o]key ]\n"
+ " [ [no][i|o]csum ]\n"
" [ ttl TTL ]\n"
" [ tos TOS ]\n"
" [ [no]pmtudisc ]\n"
@@ -210,28 +210,49 @@ get_failed:
iflags |= GRE_KEY;
oflags |= GRE_KEY;
ikey = okey = tnl_parse_key("key", *argv);
+ } else if (!matches(*argv, "nokey")) {
+ iflags &= ~GRE_KEY;
+ oflags &= ~GRE_KEY;
} else if (!matches(*argv, "ikey")) {
NEXT_ARG();
iflags |= GRE_KEY;
ikey = tnl_parse_key("ikey", *argv);
+ } else if (!matches(*argv, "noikey")) {
+ iflags &= ~GRE_KEY;
} else if (!matches(*argv, "okey")) {
NEXT_ARG();
oflags |= GRE_KEY;
okey = tnl_parse_key("okey", *argv);
+ } else if (!matches(*argv, "noikey")) {
+ iflags &= ~GRE_KEY;
} else if (!matches(*argv, "seq")) {
iflags |= GRE_SEQ;
oflags |= GRE_SEQ;
+ } else if (!matches(*argv, "noseq")) {
+ iflags &= ~GRE_SEQ;
+ oflags &= ~GRE_SEQ;
} else if (!matches(*argv, "iseq")) {
iflags |= GRE_SEQ;
+ } else if (!matches(*argv, "noiseq")) {
+ iflags &= ~GRE_SEQ;
} else if (!matches(*argv, "oseq")) {
oflags |= GRE_SEQ;
+ } else if (!matches(*argv, "nooseq")) {
+ oflags &= ~GRE_SEQ;
} else if (!matches(*argv, "csum")) {
iflags |= GRE_CSUM;
oflags |= GRE_CSUM;
+ } else if (!matches(*argv, "nocsum")) {
+ iflags &= ~GRE_CSUM;
+ oflags &= ~GRE_CSUM;
} else if (!matches(*argv, "icsum")) {
iflags |= GRE_CSUM;
+ } else if (!matches(*argv, "noicsum")) {
+ iflags &= ~GRE_CSUM;
} else if (!matches(*argv, "ocsum")) {
oflags |= GRE_CSUM;
+ } else if (!matches(*argv, "noocsum")) {
+ oflags &= ~GRE_CSUM;
} else if (!matches(*argv, "nopmtudisc")) {
pmtudisc = 0;
} else if (!matches(*argv, "pmtudisc")) {
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index a6fe0b73d235..542da0c3ccc9 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -38,9 +38,9 @@ static void gre_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
);
fprintf(f,
" [ local ADDR ]\n"
- " [ [i|o]seq ]\n"
- " [ [i|o]key KEY ]\n"
- " [ [i|o]csum ]\n"
+ " [ [no][i|o]seq ]\n"
+ " [ [i|o]key KEY | no[i|o]key ]\n"
+ " [ [no][i|o]csum ]\n"
" [ hoplimit TTL ]\n"
" [ encaplimit ELIM ]\n"
" [ tclass TCLASS ]\n"
@@ -220,28 +220,49 @@ get_failed:
iflags |= GRE_KEY;
oflags |= GRE_KEY;
ikey = okey = tnl_parse_key("key", *argv);
+ } else if (!matches(*argv, "nokey")) {
+ iflags &= ~GRE_KEY;
+ oflags &= ~GRE_KEY;
} else if (!matches(*argv, "ikey")) {
NEXT_ARG();
iflags |= GRE_KEY;
ikey = tnl_parse_key("ikey", *argv);
+ } else if (!matches(*argv, "noikey")) {
+ iflags &= ~GRE_KEY;
} else if (!matches(*argv, "okey")) {
NEXT_ARG();
oflags |= GRE_KEY;
okey = tnl_parse_key("okey", *argv);
+ } else if (!matches(*argv, "noikey")) {
+ iflags &= ~GRE_KEY;
} else if (!matches(*argv, "seq")) {
iflags |= GRE_SEQ;
oflags |= GRE_SEQ;
+ } else if (!matches(*argv, "noseq")) {
+ iflags &= ~GRE_SEQ;
+ oflags &= ~GRE_SEQ;
} else if (!matches(*argv, "iseq")) {
iflags |= GRE_SEQ;
+ } else if (!matches(*argv, "noiseq")) {
+ iflags &= ~GRE_SEQ;
} else if (!matches(*argv, "oseq")) {
oflags |= GRE_SEQ;
+ } else if (!matches(*argv, "nooseq")) {
+ oflags &= ~GRE_SEQ;
} else if (!matches(*argv, "csum")) {
iflags |= GRE_CSUM;
oflags |= GRE_CSUM;
+ } else if (!matches(*argv, "nocsum")) {
+ iflags &= ~GRE_CSUM;
+ oflags &= ~GRE_CSUM;
} else if (!matches(*argv, "icsum")) {
iflags |= GRE_CSUM;
+ } else if (!matches(*argv, "noicsum")) {
+ iflags &= ~GRE_CSUM;
} else if (!matches(*argv, "ocsum")) {
oflags |= GRE_CSUM;
+ } else if (!matches(*argv, "noocsum")) {
+ oflags &= ~GRE_CSUM;
} else if (!matches(*argv, "remote")) {
NEXT_ARG();
get_addr(&daddr, *argv, AF_INET6);
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 77ab8a3b9723..83ef3cae54b9 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -775,12 +775,14 @@ the following additional arguments are supported:
.BR type " { " gre " | " gretap " }"
.BI " remote " ADDR " local " ADDR
[
-.RB [ i | o ] seq
+.RB [ no ] "" [ i | o ] seq
] [
.RB [ i | o ] key
.I KEY
+|
+.BR no [ i | o ] key
] [
-.RB [ i | o ] csum
+.RB [ no ] "" [ i | o ] csum
] [
.BI ttl " TTL "
] [
@@ -816,7 +818,7 @@ the following additional arguments are supported:
It must be an address on another interface on this host.
.sp
-.RB [ i | o ] seq
+.RB [ no ] "" [ i | o ] seq
- serialize packets.
The
.B oseq
@@ -828,6 +830,8 @@ flag requires that all input packets are serialized.
.sp
.RB [ i | o ] key
.I KEY
+|
+.BR no [ i | o ] key
- use keyed GRE with key
.IR KEY ". "KEY
is either a number or an IPv4 address-like dotted quad.
@@ -839,7 +843,7 @@ The
parameters specify different keys for input and output.
.sp
-.RB [ i | o ] csum
+.RB [ no ] "" [ i | o ] csum
- generate/require checksums for tunneled packets.
The
.B ocsum
@@ -920,12 +924,14 @@ the following additional arguments are supported:
.BR type " { " ip6gre " | " ip6gretap " }"
.BI remote " ADDR " local " ADDR"
[
-.RB [ i | o ] seq
+.RB [ no ] "" [ i | o ] seq
] [
.RB [ i | o ] key
.I KEY
+|
+.BR no [ i | o ] key
] [
-.RB [ i | o ] csum
+.RB [ no ] "" [ i | o ] csum
] [
.BI hoplimit " TTL "
] [
@@ -955,7 +961,7 @@ the following additional arguments are supported:
It must be an address on another interface on this host.
.sp
-.RB [ i | o ] seq
+.RB [ no ] "" [ i | o ] seq
- serialize packets.
The
.B oseq
@@ -965,7 +971,10 @@ The
flag requires that all input packets are serialized.
.sp
-.RB [ i | o ] key " \fIKEY"
+.RB [ i | o ] key
+.I KEY
+|
+.BR no [ i | o ] key
- use keyed GRE with key
.IR KEY ". "KEY
is either a number or an IPv4 address-like dotted quad.
@@ -977,7 +986,7 @@ The
parameters specify different keys for input and output.
.sp
-.RB [ i | o ] csum
+.RB [ no ] "" [ i | o ] csum
- generate/require checksums for tunneled packets.
The
.B ocsum
--
2.17.0
^ permalink raw reply related
* [PATCH net-next v2] team: account for oper state
From: George Wilkie @ 2018-04-19 10:24 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev
Account for operational state when determining port linkup state,
as per Documentation/networking/operstates.txt.
Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
---
drivers/net/team/team.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index a6c6ce19eeee..ed4d109f40f3 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2918,7 +2918,7 @@ static int team_device_event(struct notifier_block *unused,
case NETDEV_CHANGE:
if (netif_running(port->dev))
team_port_change_check(port,
- !!netif_carrier_ok(port->dev));
+ !!(netif_oper_up(port->dev)));
break;
case NETDEV_UNREGISTER:
team_del_slave(port->team->dev, dev);
--
2.11.0
^ permalink raw reply related
* Re: [PATCH net-next v2] team: account for oper state
From: George Wilkie @ 2018-04-19 10:28 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev
In-Reply-To: <20180419102415.549-1-gwilkie@vyatta.att-mail.com>
On Thu, Apr 19, 2018 at 11:24:15AM +0100, George Wilkie wrote:
> Account for operational state when determining port linkup state,
> as per Documentation/networking/operstates.txt.
>
> Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
> ---
> drivers/net/team/team.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> index a6c6ce19eeee..ed4d109f40f3 100644
> --- a/drivers/net/team/team.c
> +++ b/drivers/net/team/team.c
> @@ -2918,7 +2918,7 @@ static int team_device_event(struct notifier_block *unused,
> case NETDEV_CHANGE:
> if (netif_running(port->dev))
> team_port_change_check(port,
> - !!netif_carrier_ok(port->dev));
> + !!(netif_oper_up(port->dev)));
Bah, forgot to remove the extra ().
> break;
> case NETDEV_UNREGISTER:
> team_del_slave(port->team->dev, dev);
> --
> 2.11.0
>
>
^ permalink raw reply
* Re: [PATCH bpf-next v3 3/8] bpf: add documentation for eBPF helpers (12-22)
From: Daniel Borkmann @ 2018-04-19 10:29 UTC (permalink / raw)
To: Quentin Monnet, ast; +Cc: netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180417143438.7018-4-quentin.monnet@netronome.com>
On 04/17/2018 04:34 PM, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
>
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
>
> This patch contains descriptions for the following helper functions, all
> written by Alexei:
>
> - bpf_get_current_pid_tgid()
> - bpf_get_current_uid_gid()
> - bpf_get_current_comm()
> - bpf_skb_vlan_push()
> - bpf_skb_vlan_pop()
> - bpf_skb_get_tunnel_key()
> - bpf_skb_set_tunnel_key()
> - bpf_redirect()
> - bpf_perf_event_output()
> - bpf_get_stackid()
> - bpf_get_current_task()
>
> v3:
> - bpf_skb_get_tunnel_key(): Change and improve description and example.
> - bpf_redirect(): Improve description of BPF_F_INGRESS flag.
> - bpf_perf_event_output(): Fix first sentence of description. Delete
> wrong statement on context being evaluated as a struct pt_reg. Remove
> the long yet incomplete example.
> - bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
> configurable.
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---
> include/uapi/linux/bpf.h | 225 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 225 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 02b7d522b3c0..c59bf5b28164 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -591,6 +591,231 @@ union bpf_attr {
> * performed again.
> * Return
> * 0 on success, or a negative error in case of failure.
> + *
> + * u64 bpf_get_current_pid_tgid(void)
> + * Return
> + * A 64-bit integer containing the current tgid and pid, and
> + * created as such:
> + * *current_task*\ **->tgid << 32 \|**
> + * *current_task*\ **->pid**.
> + *
> + * u64 bpf_get_current_uid_gid(void)
> + * Return
> + * A 64-bit integer containing the current GID and UID, and
> + * created as such: *current_gid* **<< 32 \|** *current_uid*.
> + *
> + * int bpf_get_current_comm(char *buf, u32 size_of_buf)
> + * Description
> + * Copy the **comm** attribute of the current task into *buf* of
> + * *size_of_buf*. The **comm** attribute contains the name of
> + * the executable (excluding the path) for the current task. The
> + * *size_of_buf* must be strictly positive. On success, the
> + * helper makes sure that the *buf* is NUL-terminated. On failure,
> + * it is filled with zeroes.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
> + * Description
> + * Push a *vlan_tci* (VLAN tag control information) of protocol
> + * *vlan_proto* to the packet associated to *skb*, then update
> + * the checksum. Note that if *vlan_proto* is different from
> + * **ETH_P_8021Q** and **ETH_P_8021AD**, it is considered to
> + * be **ETH_P_8021Q**.
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_vlan_pop(struct sk_buff *skb)
> + * Description
> + * Pop a VLAN header from the packet associated to *skb*.
> + *
> + * A call to this helper is susceptible to change data from the
> + * packet. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, u32 size, u64 flags)
> + * Description
> + * Get tunnel metadata. This helper takes a pointer *key* to an
> + * empty **struct bpf_tunnel_key** of **size**, that will be
> + * filled with tunnel metadata for the packet associated to *skb*.
> + * The *flags* can be set to **BPF_F_TUNINFO_IPV6**, which
> + * indicates that the tunnel is based on IPv6 protocol instead of
> + * IPv4.
> + *
> + * The **struct bpf_tunnel_key** is an object that generalizes the
> + * principal parameters used by various tunneling protocols into a
> + * single struct. This way, it can be used to easily make a
> + * decision based on the contents of the encapsulation header,
> + * "summarized" in this struct. In particular, it holds the IP
> + * address of the remote end (IPv4 or IPv6, depending on the case)
> + * in *key*\ **->remote_ipv4** or *key*\ **->remote_ipv6**.
I would also mention the tunnel_id which is typically mapped to a vni, allowing
to make this id programmable together with bpf_skb_set_tunnel_key() helper.
> + * Let's imagine that the following code is part of a program
> + * attached to the TC ingress interface, on one end of a GRE
> + * tunnel, and is supposed to filter out all messages coming from
> + * remote ends with IPv4 address other than 10.0.0.1:
> + *
> + * ::
> + *
> + * int ret;
> + * struct bpf_tunnel_key key = {};
> + *
> + * ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
> + * if (ret < 0)
> + * return TC_ACT_SHOT; // drop packet
> + *
> + * if (key.remote_ipv4 != 0x0a000001)
> + * return TC_ACT_SHOT; // drop packet
> + *
> + * return TC_ACT_OK; // accept packet
Lets also add a small sentence that this interface can be used with all
encap devs that can operate in 'collect metadata' mode, where instead of
having one netdevice per specific configuration, the 'collect metadata'
mode only requires a single device where the configuration can be extracted
from those BPF helpers. Could also mentioned this can be used together with
vxlan, geneve, gre and ipip tunnels.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_skb_set_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, u32 size, u64 flags)
> + * Description
> + * Populate tunnel metadata for packet associated to *skb.* The
> + * tunnel metadata is set to the contents of *key*, of *size*. The
> + * *flags* can be set to a combination of the following values:
> + *
> + * **BPF_F_TUNINFO_IPV6**
> + * Indicate that the tunnel is based on IPv6 protocol
> + * instead of IPv4.
> + * **BPF_F_ZERO_CSUM_TX**
> + * For IPv4 packets, add a flag to tunnel metadata
> + * indicating that checksum computation should be skipped
> + * and checksum set to zeroes.
> + * **BPF_F_DONT_FRAGMENT**
> + * Add a flag to tunnel metadata indicating that the
> + * packet should not be fragmented.
> + * **BPF_F_SEQ_NUMBER**
> + * Add a flag to tunnel metadata indicating that a
> + * sequence number should be added to tunnel header before
> + * sending the packet. This flag was added for GRE
> + * encapsulation, but might be used with other protocols
> + * as well in the future.
> + *
> + * Here is a typical usage on the transmit path:
> + *
> + * ::
> + *
> + * struct bpf_tunnel_key key;
> + * populate key ...
> + * bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
> + * bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
See above, maybe this can just reference bpf_skb_get_tunnel_key() from here.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_redirect(u32 ifindex, u64 flags)
> + * Description
> + * Redirect the packet to another net device of index *ifindex*.
> + * This helper is somewhat similar to **bpf_clone_redirect**\
> + * (), except that the packet is not cloned, which provides
> + * increased performance.
> + *
> + * Save for XDP, both ingress and egress interfaces can be used
s/Save/Same/ ?
> + * for redirection. The **BPF_F_INGRESS** value in *flags* is used
(In XDP case, BPF_F_INGRESS cannot be used.)
> + * to make the distinction (ingress path is selected if the flag
> + * is present, egress path otherwise). Currently, XDP only
> + * supports redirection to the egress interface, and accepts no
> + * flag at all.
> + * Return
> + * For XDP, the helper returns **XDP_REDIRECT** on success or
> + * **XDP_ABORT** on error. For other program types, the values
> + * are **TC_ACT_REDIRECT** on success or **TC_ACT_SHOT** on
> + * error.
> + *
> + * int bpf_perf_event_output(struct pt_reg *ctx, struct bpf_map *map, u64 flags, void *data, u64 size)
> + * Description
> + * Write raw *data* blob into a special BPF perf event held by
> + * *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. This perf
> + * event must have the following attributes: **PERF_SAMPLE_RAW**
> + * as **sample_type**, **PERF_TYPE_SOFTWARE** as **type**, and
> + * **PERF_COUNT_SW_BPF_OUTPUT** as **config**.
> + *
> + * The *flags* are used to indicate the index in *map* for which
> + * the value must be put, masked with **BPF_F_INDEX_MASK**.
> + * Alternatively, *flags* can be set to **BPF_F_CURRENT_CPU**
> + * to indicate that the index of the current CPU core should be
> + * used.
> + *
> + * The value to write, of *size*, is passed through eBPF stack and
> + * pointed by *data*.
> + *
> + * The context of the program *ctx* needs also be passed to the
> + * helper.
> + *
> + * On user space, a program willing to read the values needs to
> + * call **perf_event_open**\ () on the perf event (either for
> + * one or for all CPUs) and to store the file descriptor into the
> + * *map*. This must be done before the eBPF program can send data
> + * into it. An example is available in file
> + * *samples/bpf/trace_output_user.c* in the Linux kernel source
> + * tree (the eBPF program counterpart is in
> + * *samples/bpf/trace_output_kern.c*).
> + *
> + * **bpf_perf_event_output**\ () achieves better performance
> + * than **bpf_trace_printk**\ () for sharing data with user
> + * space, and is much better suitable for streaming data from eBPF
> + * programs.
Would also mentioned that this helper can be used out of tc and XDP BPF
programs as well and allows for passing i) only custom structs, ii) only
packet payload, or iii) a combination of both to user space listeners.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_get_stackid(struct pt_reg *ctx, struct bpf_map *map, u64 flags)
> + * Description
> + * Walk a user or a kernel stack and return its id. To achieve
> + * this, the helper needs *ctx*, which is a pointer to the context
> + * on which the tracing program is executed, and a pointer to a
> + * *map* of type **BPF_MAP_TYPE_STACK_TRACE**.
> + *
> + * The last argument, *flags*, holds the number of stack frames to
> + * skip (from 0 to 255), masked with
> + * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
> + * a combination of the following flags:
> + *
> + * **BPF_F_USER_STACK**
> + * Collect a user space stack instead of a kernel stack.
> + * **BPF_F_FAST_STACK_CMP**
> + * Compare stacks by hash only.
> + * **BPF_F_REUSE_STACKID**
> + * If two different stacks hash into the same *stackid*,
> + * discard the old one.
> + *
> + * The stack id retrieved is a 32 bit long integer handle which
> + * can be further combined with other data (including other stack
> + * ids) and used as a key into maps. This can be useful for
> + * generating a variety of graphs (such as flame graphs or off-cpu
> + * graphs).
> + *
> + * For walking a stack, this helper is an improvement over
> + * **bpf_probe_read**\ (), which can be used with unrolled loops
> + * but is not efficient and consumes a lot of eBPF instructions.
> + * Instead, **bpf_get_stackid**\ () can collect up to
> + * **PERF_MAX_STACK_DEPTH** both kernel and user frames. Note that
> + * this limit can be controlled with the **sysctl** program, and
> + * that it should be manually increased in order to profile long
> + * user stacks (such as stacks for Java programs). To do so, use:
> + *
> + * ::
> + *
> + * # sysctl kernel.perf_event_max_stack=<new value>
> + *
> + * Return
> + * The positive or null stack id on success, or a negative error
> + * in case of failure.
> + *
> + * u64 bpf_get_current_task(void)
> + * Return
> + * A pointer to the current task struct.
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
>
^ permalink raw reply
* [PATCH net-next v3] team: account for oper state
From: George Wilkie @ 2018-04-19 10:34 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev
Account for operational state when determining port linkup state,
as per Documentation/networking/operstates.txt.
Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
---
drivers/net/team/team.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index a6c6ce19eeee..8a8611095ca0 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2918,7 +2918,7 @@ static int team_device_event(struct notifier_block *unused,
case NETDEV_CHANGE:
if (netif_running(port->dev))
team_port_change_check(port,
- !!netif_carrier_ok(port->dev));
+ !!netif_oper_up(port->dev));
break;
case NETDEV_UNREGISTER:
team_del_slave(port->team->dev, dev);
--
2.11.0
^ permalink raw reply related
* Re: [PATCH iproute2-next 2/2] gre/gre6: allow clearing {,i,o}{key,seq,csum} flags
From: Sabrina Dubroca @ 2018-04-19 10:40 UTC (permalink / raw)
To: netdev; +Cc: dsahern
In-Reply-To: <bdb1a801f15694d884f823e6caa6b1d6022173dd.1524133203.git.sd@queasysnail.net>
2018-04-19, 12:22:42 +0200, Sabrina Dubroca wrote:
> @@ -210,28 +210,49 @@ get_failed:
> iflags |= GRE_KEY;
> oflags |= GRE_KEY;
> ikey = okey = tnl_parse_key("key", *argv);
> + } else if (!matches(*argv, "nokey")) {
> + iflags &= ~GRE_KEY;
> + oflags &= ~GRE_KEY;
> } else if (!matches(*argv, "ikey")) {
> NEXT_ARG();
> iflags |= GRE_KEY;
> ikey = tnl_parse_key("ikey", *argv);
> + } else if (!matches(*argv, "noikey")) {
> + iflags &= ~GRE_KEY;
> } else if (!matches(*argv, "okey")) {
> NEXT_ARG();
> oflags |= GRE_KEY;
> okey = tnl_parse_key("okey", *argv);
> + } else if (!matches(*argv, "noikey")) {
> + iflags &= ~GRE_KEY;
Sorry, posted the wrong version. I'll send v2 after I've had a bucket
of coffee.
--
Sabrina
^ permalink raw reply
* [PATCH net 0/6] s390/qeth: fixes 2018-04-19
From: Julian Wiedmann @ 2018-04-19 10:52 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
Stefan Raspl, Ursula Braun, Julian Wiedmann
Hi Dave,
new mail address, same old me.
Please apply the following qeth fixes for 4.17. The common theme
seems to be error handling improvements in various areas of cmd IO.
Patches 1-3 should also go back to stable.
Thank you,
Julian
Julian Wiedmann (6):
s390/qeth: fix error handling in adapter command callbacks
s390/qeth: avoid control IO completion stalls
s390/qeth: handle failure on workqueue creation
s390/qeth: fix MAC address update sequence
s390/qeth: fix request-side race during cmd IO timeout
s390/qeth: use Read device to query hypervisor for MAC
drivers/s390/net/qeth_core.h | 2 -
drivers/s390/net/qeth_core_main.c | 158 +++++++++++++++++---------------------
drivers/s390/net/qeth_core_mpc.h | 12 +++
drivers/s390/net/qeth_l2_main.c | 59 +++++++-------
4 files changed, 116 insertions(+), 115 deletions(-)
--
2.13.5
^ permalink raw reply
* [PATCH net 2/6] s390/qeth: avoid control IO completion stalls
From: Julian Wiedmann @ 2018-04-19 10:52 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20180419105211.83935-1-jwi@linux.ibm.com>
From: Julian Wiedmann <jwi@linux.vnet.ibm.com>
For control IO, qeth currently tracks the index of the buffer that it
expects to complete the next IO on each qeth_channel. If the channel
presents an IRQ while this buffer has not yet completed, no completion
processing for _any_ completed buffer takes place.
So if the 'next buffer' is skipped for any sort of reason* (eg. when it
is released due to error conditions, before the IO is started), the
buffer obviously won't switch to PROCESSED until it is eventually
allocated for a _different_ IO and completes.
Until this happens, all completion processing on that channel stalls
and pending requests possibly time out.
As a fix, remove the whole 'next buffer' logic and simply process any
IO buffer right when it completes. A channel will never have more than
one IO pending, so there's no risk of processing out-of-sequence.
*Note: currently just one location in the code really handles this problem,
by advancing the 'next' index manually.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
drivers/s390/net/qeth_core.h | 2 --
drivers/s390/net/qeth_core_main.c | 22 +++++-----------------
2 files changed, 5 insertions(+), 19 deletions(-)
diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index 4326715dc13e..78b98b3e7efa 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -557,7 +557,6 @@ enum qeth_prot_versions {
enum qeth_cmd_buffer_state {
BUF_STATE_FREE,
BUF_STATE_LOCKED,
- BUF_STATE_PROCESSED,
};
enum qeth_cq {
@@ -601,7 +600,6 @@ struct qeth_channel {
struct qeth_cmd_buffer iob[QETH_CMD_BUFFER_NO];
atomic_t irq_pending;
int io_buf_no;
- int buf_no;
};
/**
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 36bc94088de1..5ec47c6ebaa6 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -818,7 +818,6 @@ void qeth_clear_cmd_buffers(struct qeth_channel *channel)
for (cnt = 0; cnt < QETH_CMD_BUFFER_NO; cnt++)
qeth_release_buffer(channel, &channel->iob[cnt]);
- channel->buf_no = 0;
channel->io_buf_no = 0;
}
EXPORT_SYMBOL_GPL(qeth_clear_cmd_buffers);
@@ -924,7 +923,6 @@ static int qeth_setup_channel(struct qeth_channel *channel)
kfree(channel->iob[cnt].data);
return -ENOMEM;
}
- channel->buf_no = 0;
channel->io_buf_no = 0;
atomic_set(&channel->irq_pending, 0);
spin_lock_init(&channel->iob_lock);
@@ -1100,11 +1098,9 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm,
{
int rc;
int cstat, dstat;
- struct qeth_cmd_buffer *buffer;
struct qeth_channel *channel;
struct qeth_card *card;
struct qeth_cmd_buffer *iob;
- __u8 index;
if (__qeth_check_irb_error(cdev, intparm, irb))
return;
@@ -1182,25 +1178,18 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm,
channel->state = CH_STATE_RCD_DONE;
goto out;
}
- if (intparm) {
- buffer = (struct qeth_cmd_buffer *) __va((addr_t)intparm);
- buffer->state = BUF_STATE_PROCESSED;
- }
if (channel == &card->data)
return;
if (channel == &card->read &&
channel->state == CH_STATE_UP)
__qeth_issue_next_read(card);
- iob = channel->iob;
- index = channel->buf_no;
- while (iob[index].state == BUF_STATE_PROCESSED) {
- if (iob[index].callback != NULL)
- iob[index].callback(channel, iob + index);
-
- index = (index + 1) % QETH_CMD_BUFFER_NO;
+ if (intparm) {
+ iob = (struct qeth_cmd_buffer *) __va((addr_t)intparm);
+ if (iob->callback)
+ iob->callback(iob->channel, iob);
}
- channel->buf_no = index;
+
out:
wake_up(&card->wait_q);
return;
@@ -2214,7 +2203,6 @@ int qeth_send_control_data(struct qeth_card *card, int len,
error:
atomic_set(&card->write.irq_pending, 0);
qeth_release_buffer(iob->channel, iob);
- card->write.buf_no = (card->write.buf_no + 1) % QETH_CMD_BUFFER_NO;
rc = reply->rc;
qeth_put_reply(reply);
return rc;
--
2.13.5
^ permalink raw reply related
* [PATCH net 1/6] s390/qeth: fix error handling in adapter command callbacks
From: Julian Wiedmann @ 2018-04-19 10:52 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20180419105211.83935-1-jwi@linux.ibm.com>
From: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Make sure to check both return code fields before(!) processing the
command response. Otherwise we risk operating on invalid data.
This matches an earlier fix for SETASSPARMS commands, see
commit ad3cbf613329 ("s390/qeth: fix error handling in checksum cmd callback").
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
drivers/s390/net/qeth_core_main.c | 85 +++++++++++++++++----------------------
1 file changed, 37 insertions(+), 48 deletions(-)
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 04fefa5bb08d..36bc94088de1 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -3033,28 +3033,23 @@ static int qeth_send_startlan(struct qeth_card *card)
return rc;
}
-static int qeth_default_setadapterparms_cb(struct qeth_card *card,
- struct qeth_reply *reply, unsigned long data)
+static int qeth_setadpparms_inspect_rc(struct qeth_ipa_cmd *cmd)
{
- struct qeth_ipa_cmd *cmd;
-
- QETH_CARD_TEXT(card, 4, "defadpcb");
-
- cmd = (struct qeth_ipa_cmd *) data;
- if (cmd->hdr.return_code == 0)
+ if (!cmd->hdr.return_code)
cmd->hdr.return_code =
cmd->data.setadapterparms.hdr.return_code;
- return 0;
+ return cmd->hdr.return_code;
}
static int qeth_query_setadapterparms_cb(struct qeth_card *card,
struct qeth_reply *reply, unsigned long data)
{
- struct qeth_ipa_cmd *cmd;
+ struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *) data;
QETH_CARD_TEXT(card, 3, "quyadpcb");
+ if (qeth_setadpparms_inspect_rc(cmd))
+ return 0;
- cmd = (struct qeth_ipa_cmd *) data;
if (cmd->data.setadapterparms.data.query_cmds_supp.lan_type & 0x7f) {
card->info.link_type =
cmd->data.setadapterparms.data.query_cmds_supp.lan_type;
@@ -3062,7 +3057,7 @@ static int qeth_query_setadapterparms_cb(struct qeth_card *card,
}
card->options.adp.supported_funcs =
cmd->data.setadapterparms.data.query_cmds_supp.supported_cmds;
- return qeth_default_setadapterparms_cb(card, reply, (unsigned long)cmd);
+ return 0;
}
static struct qeth_cmd_buffer *qeth_get_adapter_cmd(struct qeth_card *card,
@@ -3154,22 +3149,20 @@ EXPORT_SYMBOL_GPL(qeth_query_ipassists);
static int qeth_query_switch_attributes_cb(struct qeth_card *card,
struct qeth_reply *reply, unsigned long data)
{
- struct qeth_ipa_cmd *cmd;
- struct qeth_switch_info *sw_info;
+ struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *) data;
struct qeth_query_switch_attributes *attrs;
+ struct qeth_switch_info *sw_info;
QETH_CARD_TEXT(card, 2, "qswiatcb");
- cmd = (struct qeth_ipa_cmd *) data;
- sw_info = (struct qeth_switch_info *)reply->param;
- if (cmd->data.setadapterparms.hdr.return_code == 0) {
- attrs = &cmd->data.setadapterparms.data.query_switch_attributes;
- sw_info->capabilities = attrs->capabilities;
- sw_info->settings = attrs->settings;
- QETH_CARD_TEXT_(card, 2, "%04x%04x", sw_info->capabilities,
- sw_info->settings);
- }
- qeth_default_setadapterparms_cb(card, reply, (unsigned long) cmd);
+ if (qeth_setadpparms_inspect_rc(cmd))
+ return 0;
+ sw_info = (struct qeth_switch_info *)reply->param;
+ attrs = &cmd->data.setadapterparms.data.query_switch_attributes;
+ sw_info->capabilities = attrs->capabilities;
+ sw_info->settings = attrs->settings;
+ QETH_CARD_TEXT_(card, 2, "%04x%04x", sw_info->capabilities,
+ sw_info->settings);
return 0;
}
@@ -4207,16 +4200,13 @@ EXPORT_SYMBOL_GPL(qeth_do_send_packet);
static int qeth_setadp_promisc_mode_cb(struct qeth_card *card,
struct qeth_reply *reply, unsigned long data)
{
- struct qeth_ipa_cmd *cmd;
+ struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *) data;
struct qeth_ipacmd_setadpparms *setparms;
QETH_CARD_TEXT(card, 4, "prmadpcb");
- cmd = (struct qeth_ipa_cmd *) data;
setparms = &(cmd->data.setadapterparms);
-
- qeth_default_setadapterparms_cb(card, reply, (unsigned long)cmd);
- if (cmd->hdr.return_code) {
+ if (qeth_setadpparms_inspect_rc(cmd)) {
QETH_CARD_TEXT_(card, 4, "prmrc%x", cmd->hdr.return_code);
setparms->data.mode = SET_PROMISC_MODE_OFF;
}
@@ -4286,18 +4276,18 @@ EXPORT_SYMBOL_GPL(qeth_get_stats);
static int qeth_setadpparms_change_macaddr_cb(struct qeth_card *card,
struct qeth_reply *reply, unsigned long data)
{
- struct qeth_ipa_cmd *cmd;
+ struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *) data;
QETH_CARD_TEXT(card, 4, "chgmaccb");
+ if (qeth_setadpparms_inspect_rc(cmd))
+ return 0;
- cmd = (struct qeth_ipa_cmd *) data;
if (!card->options.layer2 ||
!(card->info.mac_bits & QETH_LAYER2_MAC_READ)) {
ether_addr_copy(card->dev->dev_addr,
cmd->data.setadapterparms.data.change_addr.addr);
card->info.mac_bits |= QETH_LAYER2_MAC_READ;
}
- qeth_default_setadapterparms_cb(card, reply, (unsigned long) cmd);
return 0;
}
@@ -4328,13 +4318,15 @@ EXPORT_SYMBOL_GPL(qeth_setadpparms_change_macaddr);
static int qeth_setadpparms_set_access_ctrl_cb(struct qeth_card *card,
struct qeth_reply *reply, unsigned long data)
{
- struct qeth_ipa_cmd *cmd;
+ struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *) data;
struct qeth_set_access_ctrl *access_ctrl_req;
int fallback = *(int *)reply->param;
QETH_CARD_TEXT(card, 4, "setaccb");
+ if (cmd->hdr.return_code)
+ return 0;
+ qeth_setadpparms_inspect_rc(cmd);
- cmd = (struct qeth_ipa_cmd *) data;
access_ctrl_req = &cmd->data.setadapterparms.data.set_access_ctrl;
QETH_DBF_TEXT_(SETUP, 2, "setaccb");
QETH_DBF_TEXT_(SETUP, 2, "%s", card->gdev->dev.kobj.name);
@@ -4407,7 +4399,6 @@ static int qeth_setadpparms_set_access_ctrl_cb(struct qeth_card *card,
card->options.isolation = card->options.prev_isolation;
break;
}
- qeth_default_setadapterparms_cb(card, reply, (unsigned long) cmd);
return 0;
}
@@ -4695,14 +4686,15 @@ static int qeth_snmp_command(struct qeth_card *card, char __user *udata)
static int qeth_setadpparms_query_oat_cb(struct qeth_card *card,
struct qeth_reply *reply, unsigned long data)
{
- struct qeth_ipa_cmd *cmd;
+ struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *)data;
struct qeth_qoat_priv *priv;
char *resdata;
int resdatalen;
QETH_CARD_TEXT(card, 3, "qoatcb");
+ if (qeth_setadpparms_inspect_rc(cmd))
+ return 0;
- cmd = (struct qeth_ipa_cmd *)data;
priv = (struct qeth_qoat_priv *)reply->param;
resdatalen = cmd->data.setadapterparms.hdr.cmdlength;
resdata = (char *)data + 28;
@@ -4796,21 +4788,18 @@ static int qeth_query_oat_command(struct qeth_card *card, char __user *udata)
static int qeth_query_card_info_cb(struct qeth_card *card,
struct qeth_reply *reply, unsigned long data)
{
- struct qeth_ipa_cmd *cmd;
+ struct carrier_info *carrier_info = (struct carrier_info *)reply->param;
+ struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *)data;
struct qeth_query_card_info *card_info;
- struct carrier_info *carrier_info;
QETH_CARD_TEXT(card, 2, "qcrdincb");
- carrier_info = (struct carrier_info *)reply->param;
- cmd = (struct qeth_ipa_cmd *)data;
- card_info = &cmd->data.setadapterparms.data.card_info;
- if (cmd->data.setadapterparms.hdr.return_code == 0) {
- carrier_info->card_type = card_info->card_type;
- carrier_info->port_mode = card_info->port_mode;
- carrier_info->port_speed = card_info->port_speed;
- }
+ if (qeth_setadpparms_inspect_rc(cmd))
+ return 0;
- qeth_default_setadapterparms_cb(card, reply, (unsigned long) cmd);
+ card_info = &cmd->data.setadapterparms.data.card_info;
+ carrier_info->card_type = card_info->card_type;
+ carrier_info->port_mode = card_info->port_mode;
+ carrier_info->port_speed = card_info->port_speed;
return 0;
}
--
2.13.5
^ permalink raw reply related
* [PATCH net 3/6] s390/qeth: handle failure on workqueue creation
From: Julian Wiedmann @ 2018-04-19 10:52 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20180419105211.83935-1-jwi@linux.ibm.com>
Creating the global workqueue during driver init may fail, deal with it.
Also, destroy the created workqueue on any subsequent error.
Fixes: 0f54761d167f ("qeth: Support VEPA mode")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
drivers/s390/net/qeth_core_main.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 5ec47c6ebaa6..9a08b545d018 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -6540,10 +6540,14 @@ static int __init qeth_core_init(void)
mutex_init(&qeth_mod_mutex);
qeth_wq = create_singlethread_workqueue("qeth_wq");
+ if (!qeth_wq) {
+ rc = -ENOMEM;
+ goto out_err;
+ }
rc = qeth_register_dbf_views();
if (rc)
- goto out_err;
+ goto dbf_err;
qeth_core_root_dev = root_device_register("qeth");
rc = PTR_ERR_OR_ZERO(qeth_core_root_dev);
if (rc)
@@ -6580,6 +6584,8 @@ static int __init qeth_core_init(void)
root_device_unregister(qeth_core_root_dev);
register_err:
qeth_unregister_dbf_views();
+dbf_err:
+ destroy_workqueue(qeth_wq);
out_err:
pr_err("Initializing the qeth device driver failed\n");
return rc;
--
2.13.5
^ permalink raw reply related
* [PATCH net 4/6] s390/qeth: fix MAC address update sequence
From: Julian Wiedmann @ 2018-04-19 10:52 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20180419105211.83935-1-jwi@linux.ibm.com>
From: Julian Wiedmann <jwi@linux.vnet.ibm.com>
When changing the MAC address on a L2 qeth device, current code first
unregisters the old address, then registers the new one.
If HW rejects the new address (or the IO fails), the device ends up with
no operable address at all.
Re-order the code flow so that the old address only gets dropped if the
new address was registered successfully. While at it, add logic to catch
some corner-cases.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
drivers/s390/net/qeth_l2_main.c | 55 +++++++++++++++++++++++------------------
1 file changed, 31 insertions(+), 24 deletions(-)
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index 50a313806dde..830ca56a62e5 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -122,13 +122,10 @@ static int qeth_l2_send_setmac(struct qeth_card *card, __u8 *mac)
QETH_CARD_TEXT(card, 2, "L2Setmac");
rc = qeth_l2_send_setdelmac(card, mac, IPA_CMD_SETVMAC);
if (rc == 0) {
- card->info.mac_bits |= QETH_LAYER2_MAC_REGISTERED;
- ether_addr_copy(card->dev->dev_addr, mac);
dev_info(&card->gdev->dev,
- "MAC address %pM successfully registered on device %s\n",
- card->dev->dev_addr, card->dev->name);
+ "MAC address %pM successfully registered on device %s\n",
+ mac, card->dev->name);
} else {
- card->info.mac_bits &= ~QETH_LAYER2_MAC_REGISTERED;
switch (rc) {
case -EEXIST:
dev_warn(&card->gdev->dev,
@@ -143,19 +140,6 @@ static int qeth_l2_send_setmac(struct qeth_card *card, __u8 *mac)
return rc;
}
-static int qeth_l2_send_delmac(struct qeth_card *card, __u8 *mac)
-{
- int rc;
-
- QETH_CARD_TEXT(card, 2, "L2Delmac");
- if (!(card->info.mac_bits & QETH_LAYER2_MAC_REGISTERED))
- return 0;
- rc = qeth_l2_send_setdelmac(card, mac, IPA_CMD_DELVMAC);
- if (rc == 0)
- card->info.mac_bits &= ~QETH_LAYER2_MAC_REGISTERED;
- return rc;
-}
-
static int qeth_l2_write_mac(struct qeth_card *card, u8 *mac)
{
enum qeth_ipa_cmds cmd = is_multicast_ether_addr_64bits(mac) ?
@@ -520,6 +504,7 @@ static int qeth_l2_set_mac_address(struct net_device *dev, void *p)
{
struct sockaddr *addr = p;
struct qeth_card *card = dev->ml_priv;
+ u8 old_addr[ETH_ALEN];
int rc = 0;
QETH_CARD_TEXT(card, 3, "setmac");
@@ -531,14 +516,35 @@ static int qeth_l2_set_mac_address(struct net_device *dev, void *p)
return -EOPNOTSUPP;
}
QETH_CARD_HEX(card, 3, addr->sa_data, ETH_ALEN);
+ if (!is_valid_ether_addr(addr->sa_data))
+ return -EADDRNOTAVAIL;
+
if (qeth_wait_for_threads(card, QETH_RECOVER_THREAD)) {
QETH_CARD_TEXT(card, 3, "setmcREC");
return -ERESTARTSYS;
}
- rc = qeth_l2_send_delmac(card, &card->dev->dev_addr[0]);
- if (!rc || (rc == -ENOENT))
- rc = qeth_l2_send_setmac(card, addr->sa_data);
- return rc ? -EINVAL : 0;
+
+ if (!qeth_card_hw_is_reachable(card)) {
+ ether_addr_copy(dev->dev_addr, addr->sa_data);
+ return 0;
+ }
+
+ /* don't register the same address twice */
+ if (ether_addr_equal_64bits(dev->dev_addr, addr->sa_data) &&
+ (card->info.mac_bits & QETH_LAYER2_MAC_REGISTERED))
+ return 0;
+
+ /* add the new address, switch over, drop the old */
+ rc = qeth_l2_send_setmac(card, addr->sa_data);
+ if (rc)
+ return rc;
+ ether_addr_copy(old_addr, dev->dev_addr);
+ ether_addr_copy(dev->dev_addr, addr->sa_data);
+
+ if (card->info.mac_bits & QETH_LAYER2_MAC_REGISTERED)
+ qeth_l2_remove_mac(card, old_addr);
+ card->info.mac_bits |= QETH_LAYER2_MAC_REGISTERED;
+ return 0;
}
static void qeth_promisc_to_bridge(struct qeth_card *card)
@@ -1068,8 +1074,9 @@ static int __qeth_l2_set_online(struct ccwgroup_device *gdev, int recovery_mode)
goto out_remove;
}
- if (card->info.type != QETH_CARD_TYPE_OSN)
- qeth_l2_send_setmac(card, &card->dev->dev_addr[0]);
+ if (card->info.type != QETH_CARD_TYPE_OSN &&
+ !qeth_l2_send_setmac(card, card->dev->dev_addr))
+ card->info.mac_bits |= QETH_LAYER2_MAC_REGISTERED;
if (qeth_is_diagass_supported(card, QETH_DIAGS_CMD_TRAP)) {
if (card->info.hwtrap &&
--
2.13.5
^ permalink raw reply related
* [PATCH net 5/6] s390/qeth: fix request-side race during cmd IO timeout
From: Julian Wiedmann @ 2018-04-19 10:52 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20180419105211.83935-1-jwi@linux.ibm.com>
From: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Submitting a cmd IO request (usually on the WRITE device, but for IDX
also on the READ device) is currently done with ccw_device_start()
and a manual timeout in the caller.
On timeout, the caller cleans up the related resources (eg. IO buffer).
But 1) the IO might still be active and utilize those resources, and
2) when the IO completes, qeth_irq() will attempt to clean up the
same resources again.
Instead of introducing additional resource locking, switch to
ccw_device_start_timeout() to ensure IO termination after timeout, and
let the IRQ handler alone deal with cleaning up after a request.
This also removes a stray write->irq_pending reset from
clear_ipacmd_list(). The routine doesn't terminate any pending IO on
the WRITE device, so this should be handled properly via IO timeout
in the IRQ handler.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
drivers/s390/net/qeth_core_main.c | 51 ++++++++++++++++++++-------------------
drivers/s390/net/qeth_core_mpc.h | 12 +++++++++
drivers/s390/net/qeth_l2_main.c | 4 +--
3 files changed, 40 insertions(+), 27 deletions(-)
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 9a08b545d018..9b22d5d496ae 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -706,7 +706,6 @@ void qeth_clear_ipacmd_list(struct qeth_card *card)
qeth_put_reply(reply);
}
spin_unlock_irqrestore(&card->lock, flags);
- atomic_set(&card->write.irq_pending, 0);
}
EXPORT_SYMBOL_GPL(qeth_clear_ipacmd_list);
@@ -1098,14 +1097,9 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm,
{
int rc;
int cstat, dstat;
+ struct qeth_cmd_buffer *iob = NULL;
struct qeth_channel *channel;
struct qeth_card *card;
- struct qeth_cmd_buffer *iob;
-
- if (__qeth_check_irb_error(cdev, intparm, irb))
- return;
- cstat = irb->scsw.cmd.cstat;
- dstat = irb->scsw.cmd.dstat;
card = CARD_FROM_CDEV(cdev);
if (!card)
@@ -1123,6 +1117,19 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm,
channel = &card->data;
QETH_CARD_TEXT(card, 5, "data");
}
+
+ if (qeth_intparm_is_iob(intparm))
+ iob = (struct qeth_cmd_buffer *) __va((addr_t)intparm);
+
+ if (__qeth_check_irb_error(cdev, intparm, irb)) {
+ /* IO was terminated, free its resources. */
+ if (iob)
+ qeth_release_buffer(iob->channel, iob);
+ atomic_set(&channel->irq_pending, 0);
+ wake_up(&card->wait_q);
+ return;
+ }
+
atomic_set(&channel->irq_pending, 0);
if (irb->scsw.cmd.fctl & (SCSW_FCTL_CLEAR_FUNC))
@@ -1146,6 +1153,10 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm,
/* we don't have to handle this further */
intparm = 0;
}
+
+ cstat = irb->scsw.cmd.cstat;
+ dstat = irb->scsw.cmd.dstat;
+
if ((dstat & DEV_STAT_UNIT_EXCEP) ||
(dstat & DEV_STAT_UNIT_CHECK) ||
(cstat)) {
@@ -1184,11 +1195,8 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm,
channel->state == CH_STATE_UP)
__qeth_issue_next_read(card);
- if (intparm) {
- iob = (struct qeth_cmd_buffer *) __va((addr_t)intparm);
- if (iob->callback)
- iob->callback(iob->channel, iob);
- }
+ if (iob && iob->callback)
+ iob->callback(iob->channel, iob);
out:
wake_up(&card->wait_q);
@@ -1859,8 +1867,8 @@ static int qeth_idx_activate_get_answer(struct qeth_channel *channel,
atomic_cmpxchg(&channel->irq_pending, 0, 1) == 0);
QETH_DBF_TEXT(SETUP, 6, "noirqpnd");
spin_lock_irqsave(get_ccwdev_lock(channel->ccwdev), flags);
- rc = ccw_device_start(channel->ccwdev,
- &channel->ccw, (addr_t) iob, 0, 0);
+ rc = ccw_device_start_timeout(channel->ccwdev, &channel->ccw,
+ (addr_t) iob, 0, 0, QETH_TIMEOUT);
spin_unlock_irqrestore(get_ccwdev_lock(channel->ccwdev), flags);
if (rc) {
@@ -1877,7 +1885,6 @@ static int qeth_idx_activate_get_answer(struct qeth_channel *channel,
if (channel->state != CH_STATE_UP) {
rc = -ETIME;
QETH_DBF_TEXT_(SETUP, 2, "3err%d", rc);
- qeth_clear_cmd_buffers(channel);
} else
rc = 0;
return rc;
@@ -1931,8 +1938,8 @@ static int qeth_idx_activate_channel(struct qeth_channel *channel,
atomic_cmpxchg(&channel->irq_pending, 0, 1) == 0);
QETH_DBF_TEXT(SETUP, 6, "noirqpnd");
spin_lock_irqsave(get_ccwdev_lock(channel->ccwdev), flags);
- rc = ccw_device_start(channel->ccwdev,
- &channel->ccw, (addr_t) iob, 0, 0);
+ rc = ccw_device_start_timeout(channel->ccwdev, &channel->ccw,
+ (addr_t) iob, 0, 0, QETH_TIMEOUT);
spin_unlock_irqrestore(get_ccwdev_lock(channel->ccwdev), flags);
if (rc) {
@@ -1953,7 +1960,6 @@ static int qeth_idx_activate_channel(struct qeth_channel *channel,
QETH_DBF_MESSAGE(2, "%s IDX activate timed out\n",
dev_name(&channel->ccwdev->dev));
QETH_DBF_TEXT_(SETUP, 2, "2err%d", -ETIME);
- qeth_clear_cmd_buffers(channel);
return -ETIME;
}
return qeth_idx_activate_get_answer(channel, idx_reply_cb);
@@ -2155,8 +2161,8 @@ int qeth_send_control_data(struct qeth_card *card, int len,
QETH_CARD_TEXT(card, 6, "noirqpnd");
spin_lock_irqsave(get_ccwdev_lock(card->write.ccwdev), flags);
- rc = ccw_device_start(card->write.ccwdev, &card->write.ccw,
- (addr_t) iob, 0, 0);
+ rc = ccw_device_start_timeout(CARD_WDEV(card), &card->write.ccw,
+ (addr_t) iob, 0, 0, event_timeout);
spin_unlock_irqrestore(get_ccwdev_lock(card->write.ccwdev), flags);
if (rc) {
QETH_DBF_MESSAGE(2, "%s qeth_send_control_data: "
@@ -2188,8 +2194,6 @@ int qeth_send_control_data(struct qeth_card *card, int len,
}
}
- if (reply->rc == -EIO)
- goto error;
rc = reply->rc;
qeth_put_reply(reply);
return rc;
@@ -2200,9 +2204,6 @@ int qeth_send_control_data(struct qeth_card *card, int len,
list_del_init(&reply->list);
spin_unlock_irqrestore(&reply->card->lock, flags);
atomic_inc(&reply->received);
-error:
- atomic_set(&card->write.irq_pending, 0);
- qeth_release_buffer(iob->channel, iob);
rc = reply->rc;
qeth_put_reply(reply);
return rc;
diff --git a/drivers/s390/net/qeth_core_mpc.h b/drivers/s390/net/qeth_core_mpc.h
index 619f897b4bb0..f4d1ec0b8f5a 100644
--- a/drivers/s390/net/qeth_core_mpc.h
+++ b/drivers/s390/net/qeth_core_mpc.h
@@ -35,6 +35,18 @@ extern unsigned char IPA_PDU_HEADER[];
#define QETH_HALT_CHANNEL_PARM -11
#define QETH_RCD_PARM -12
+static inline bool qeth_intparm_is_iob(unsigned long intparm)
+{
+ switch (intparm) {
+ case QETH_CLEAR_CHANNEL_PARM:
+ case QETH_HALT_CHANNEL_PARM:
+ case QETH_RCD_PARM:
+ case 0:
+ return false;
+ }
+ return true;
+}
+
/*****************************************************************************/
/* IP Assist related definitions */
/*****************************************************************************/
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index 830ca56a62e5..2f1c13226a7a 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1346,8 +1346,8 @@ static int qeth_osn_send_control_data(struct qeth_card *card, int len,
qeth_prepare_control_data(card, len, iob);
QETH_CARD_TEXT(card, 6, "osnoirqp");
spin_lock_irqsave(get_ccwdev_lock(card->write.ccwdev), flags);
- rc = ccw_device_start(card->write.ccwdev, &card->write.ccw,
- (addr_t) iob, 0, 0);
+ rc = ccw_device_start_timeout(CARD_WDEV(card), &card->write.ccw,
+ (addr_t) iob, 0, 0, QETH_IPA_TIMEOUT);
spin_unlock_irqrestore(get_ccwdev_lock(card->write.ccwdev), flags);
if (rc) {
QETH_DBF_MESSAGE(2, "qeth_osn_send_control_data: "
--
2.13.5
^ permalink raw reply related
* [PATCH net 6/6] s390/qeth: use Read device to query hypervisor for MAC
From: Julian Wiedmann @ 2018-04-19 10:52 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20180419105211.83935-1-jwi@linux.ibm.com>
From: Julian Wiedmann <jwi@linux.vnet.ibm.com>
For z/VM NICs, qeth needs to consider which of the three CCW devices in
an MPC group it uses for requesting a managed MAC address.
On the Base device, the hypervisor returns a default MAC which is
pre-assigned when creating the NIC (this MAC is also returned by the
READ MAC primitive). Querying any other device results in the allocation
of an additional MAC address.
For consistency with READ MAC and to avoid using up more addresses than
necessary, it is preferable to use the NIC's default MAC. So switch the
the diag26c over to using a NIC's Read device, which should always be
identical to the Base device.
Fixes: ec61bd2fd2a2 ("s390/qeth: use diag26c to get MAC address on L2")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
drivers/s390/net/qeth_core_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 9b22d5d496ae..dffd820731f2 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -4835,7 +4835,7 @@ int qeth_vm_request_mac(struct qeth_card *card)
goto out;
}
- ccw_device_get_id(CARD_DDEV(card), &id);
+ ccw_device_get_id(CARD_RDEV(card), &id);
request->resp_buf_len = sizeof(*response);
request->resp_version = DIAG26C_VERSION2;
request->op_code = DIAG26C_GET_MAC;
--
2.13.5
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox