Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Jakub Kicinski @ 2019-08-07 18:25 UTC (permalink / raw)
  To: Tao Ren
  Cc: Samuel Mendoza-Jonas, David S . Miller, netdev, linux-kernel,
	openbmc, William Kennington, Joel Stanley, Andrew Lunn
In-Reply-To: <20190807002118.164360-1-taoren@fb.com>

On Tue, 6 Aug 2019 17:21:18 -0700, Tao Ren wrote:
> Currently BMC's MAC address is calculated by adding 1 to NCSI NIC's base
> MAC address when CONFIG_NCSI_OEM_CMD_GET_MAC option is enabled. The logic
> doesn't work for platforms with different BMC MAC offset: for example,
> Facebook Yamp BMC's MAC address is calculated by adding 2 to NIC's base
> MAC address ("BaseMAC + 1" is reserved for Host use).
> 
> This patch adds NET_NCSI_MC_MAC_OFFSET config option to customize offset
> between NIC's Base MAC address and BMC's MAC address. Its default value is
> set to 1 to avoid breaking existing users.
> 
> Signed-off-by: Tao Ren <taoren@fb.com>

Maybe someone more knowledgeable like Andrew has an opinion here, 
but to me it seems a bit strange to encode what seems to be platfrom
information in the kernel config :(

> diff --git a/net/ncsi/Kconfig b/net/ncsi/Kconfig
> index 2f1e5756c03a..be8efe1ed99e 100644
> --- a/net/ncsi/Kconfig
> +++ b/net/ncsi/Kconfig
> @@ -17,3 +17,11 @@ config NCSI_OEM_CMD_GET_MAC
>  	---help---
>  	  This allows to get MAC address from NCSI firmware and set them back to
>  		controller.
> +config NET_NCSI_MC_MAC_OFFSET
> +	int
> +	prompt "Offset of Management Controller's MAC Address"
> +	depends on NCSI_OEM_CMD_GET_MAC
> +	default 1
> +	help
> +	  This defines the offset between Network Controller's (base) MAC
> +	  address and Management Controller's MAC address.
> diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
> index 7581bf919885..24a791f9ebf5 100644
> --- a/net/ncsi/ncsi-rsp.c
> +++ b/net/ncsi/ncsi-rsp.c
> @@ -656,6 +656,11 @@ static int ncsi_rsp_handler_oem_bcm_gma(struct ncsi_request *nr)
>  	struct ncsi_rsp_oem_pkt *rsp;
>  	struct sockaddr saddr;
>  	int ret = 0;
> +#ifdef CONFIG_NET_NCSI_MC_MAC_OFFSET
> +	int mac_offset = CONFIG_NET_NCSI_MC_MAC_OFFSET;
> +#else
> +	int mac_offset = 1;
> +#endif
>  
>  	/* Get the response header */
>  	rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp);
> @@ -663,8 +668,14 @@ static int ncsi_rsp_handler_oem_bcm_gma(struct ncsi_request *nr)
>  	saddr.sa_family = ndev->type;
>  	ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
>  	memcpy(saddr.sa_data, &rsp->data[BCM_MAC_ADDR_OFFSET], ETH_ALEN);
> -	/* Increase mac address by 1 for BMC's address */
> -	eth_addr_inc((u8 *)saddr.sa_data);
> +
> +	/* Management Controller's MAC address is calculated by adding
> +	 * the offset to Network Controller's (base) MAC address.
> +	 * Note: negative offset is "ignored", and BMC will use the Base
> +	 * MAC address in this case.
> +	 */
> +	while (mac_offset-- > 0)
> +		eth_addr_inc((u8 *)saddr.sa_data);
>  	if (!is_valid_ether_addr((const u8 *)saddr.sa_data))
>  		return -ENXIO;
>  


^ permalink raw reply

* Re: [v3,2/4] tools: bpftool: add net detach command to detach XDP on interface
From: Maciej Fijalkowski @ 2019-08-07 18:30 UTC (permalink / raw)
  To: Y Song; +Cc: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <CAH3MdRW4LgdLoqSpLsWUOwjnNhJA1sodHqSD2Z14JY6aHMaKxg@mail.gmail.com>

On Wed, 7 Aug 2019 10:02:04 -0700
Y Song <ys114321@gmail.com> wrote:

> On Tue, Aug 6, 2019 at 7:25 PM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> >
> > By this commit, using `bpftool net detach`, the attached XDP prog can
> > be detached. Detaching the BPF prog will be done through libbpf
> > 'bpf_set_link_xdp_fd' with the progfd set to -1.
> >
> > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > ---
> >  tools/bpf/bpftool/net.c | 42 ++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 41 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > index c05a3fac5cac..7be96acb08e0 100644
> > --- a/tools/bpf/bpftool/net.c
> > +++ b/tools/bpf/bpftool/net.c
> > @@ -343,6 +343,43 @@ static int do_attach(int argc, char **argv)
> >         return 0;
> >  }
> >
> > +static int do_detach(int argc, char **argv)
> > +{
> > +       enum net_attach_type attach_type;
> > +       int progfd, ifindex, err = 0;
> > +
> > +       /* parse detach args */
> > +       if (!REQ_ARGS(3))
> > +               return -EINVAL;
> > +
> > +       attach_type = parse_attach_type(*argv);
> > +       if (attach_type == max_net_attach_type) {
> > +               p_err("invalid net attach/detach type");
> > +               return -EINVAL;
> > +       }
> > +
> > +       NEXT_ARG();
> > +       ifindex = net_parse_dev(&argc, &argv);
> > +       if (ifindex < 1)
> > +               return -EINVAL;
> > +
> > +       /* detach xdp prog */
> > +       progfd = -1;
> > +       if (is_prefix("xdp", attach_type_strings[attach_type]))
> > +               err = do_attach_detach_xdp(progfd, attach_type, ifindex, NULL);  
> 
> I found an issue here. This is probably related to do_attach_detach_xdp.
> 
> -bash-4.4$ sudo ./bpftool net attach x pinned /sys/fs/bpf/xdp_example dev v1
> -bash-4.4$ sudo ./bpftool net
> xdp:
> v1(4) driver id 1172
> 
> tc:
> eth0(2) clsact/ingress fbflow_icmp id 29 act []
> eth0(2) clsact/egress cls_fg_dscp_section id 27 act []
> eth0(2) clsact/egress fbflow_egress id 28
> eth0(2) clsact/egress fbflow_sslwall_egress id 35
> 
> flow_dissector:
> 
> -bash-4.4$ sudo ./bpftool net detach x dev v2

Shouldn't this be v1 as dev?

> -bash-4.4$ sudo ./bpftool net
> xdp:
> v1(4) driver id 1172
> 
> tc:
> eth0(2) clsact/ingress fbflow_icmp id 29 act []
> eth0(2) clsact/egress cls_fg_dscp_section id 27 act []
> eth0(2) clsact/egress fbflow_egress id 28
> eth0(2) clsact/egress fbflow_sslwall_egress id 35
> 
> flow_dissector:
> 
> -bash-4.4$
> 
> Basically detaching may fail due to wrong dev name or wrong type, etc.
> But the tool did not return an error. Is this expected?
> This may be related to this funciton "bpf_set_link_xdp_fd()".
> So this patch itself should be okay.
> 
> > +
> > +       if (err < 0) {
> > +               p_err("interface %s detach failed",
> > +                     attach_type_strings[attach_type]);
> > +               return err;
> > +       }
> > +
> > +       if (json_output)
> > +               jsonw_null(json_wtr);
> > +
> > +       return 0;
> > +}
> > +
> >  static int do_show(int argc, char **argv)
> >  {
> >         struct bpf_attach_info attach_info = {};
> > @@ -419,6 +456,7 @@ static int do_help(int argc, char **argv)
> >         fprintf(stderr,
> >                 "Usage: %s %s { show | list } [dev <devname>]\n"
> >                 "       %s %s attach ATTACH_TYPE PROG dev <devname> [ overwrite ]\n"
> > +               "       %s %s detach ATTACH_TYPE dev <devname>\n"
> >                 "       %s %s help\n"
> >                 "\n"
> >                 "       " HELP_SPEC_PROGRAM "\n"
> > @@ -429,7 +467,8 @@ static int do_help(int argc, char **argv)
> >                 "      to dump program attachments. For program types\n"
> >                 "      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
> >                 "      consult iproute2.\n",
> > -               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
> > +               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
> > +               bin_name, argv[-2]);
> >
> >         return 0;
> >  }
> > @@ -438,6 +477,7 @@ static const struct cmd cmds[] = {
> >         { "show",       do_show },
> >         { "list",       do_show },
> >         { "attach",     do_attach },
> > +       { "detach",     do_detach },
> >         { "help",       do_help },
> >         { 0 }
> >  };
> > --
> > 2.20.1
> >  


^ permalink raw reply

* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Andrew Lunn @ 2019-08-07 18:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tao Ren, Samuel Mendoza-Jonas, David S . Miller, netdev,
	linux-kernel, openbmc, William Kennington, Joel Stanley
In-Reply-To: <20190807112518.644a21a2@cakuba.netronome.com>

On Wed, Aug 07, 2019 at 11:25:18AM -0700, Jakub Kicinski wrote:
> On Tue, 6 Aug 2019 17:21:18 -0700, Tao Ren wrote:
> > Currently BMC's MAC address is calculated by adding 1 to NCSI NIC's base
> > MAC address when CONFIG_NCSI_OEM_CMD_GET_MAC option is enabled. The logic
> > doesn't work for platforms with different BMC MAC offset: for example,
> > Facebook Yamp BMC's MAC address is calculated by adding 2 to NIC's base
> > MAC address ("BaseMAC + 1" is reserved for Host use).
> > 
> > This patch adds NET_NCSI_MC_MAC_OFFSET config option to customize offset
> > between NIC's Base MAC address and BMC's MAC address. Its default value is
> > set to 1 to avoid breaking existing users.
> > 
> > Signed-off-by: Tao Ren <taoren@fb.com>
> 
> Maybe someone more knowledgeable like Andrew has an opinion here, 
> but to me it seems a bit strange to encode what seems to be platfrom
> information in the kernel config :(

Yes, this is not a good idea. It makes it impossible to have a 'BMC
distro' kernel which you install on a number of different BMCs.

A device tree property would be better. Ideally it would be the
standard local-mac-address, or mac-address.

  Andrew

^ permalink raw reply

* Re: [PATCH net v2] net/tls: prevent skb_orphan() from leaking TLS plain text with offload
From: Willem de Bruijn @ 2019-08-07 18:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Willem de Bruijn, David Miller, Network Development, davejwatson,
	borisp, aviadye, John Fastabend, Daniel Borkmann, Eric Dumazet,
	Alexei Starovoitov, oss-drivers
In-Reply-To: <20190807110042.690cf50a@cakuba.netronome.com>

On Wed, Aug 7, 2019 at 2:01 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Wed, 7 Aug 2019 12:59:00 -0400, Willem de Bruijn wrote:
> > On Wed, Aug 7, 2019 at 2:06 AM Jakub Kicinski wrote:
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index d57b0cc995a0..0f9619b0892f 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -1992,6 +1992,20 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
> > >  }
> > >  EXPORT_SYMBOL(skb_set_owner_w);
> > >
> > > +static bool can_skb_orphan_partial(const struct sk_buff *skb)
> > > +{
> > > +#ifdef CONFIG_TLS_DEVICE
> > > +       /* Drivers depend on in-order delivery for crypto offload,
> > > +        * partial orphan breaks out-of-order-OK logic.
> > > +        */
> > > +       if (skb->decrypted)
> > > +               return false;
> > > +#endif
> > > +       return (IS_ENABLED(CONFIG_INET) &&
> > > +               skb->destructor == tcp_wfree) ||
> >
> > Please add parentheses around IS_ENABLED(CONFIG_INET) &&
> > skb->destructor == tcp_wfree
>
> Mm.. there are parenthesis around them, maybe I'm being slow,
> could you show me how?

I mean

    return (skb->destructor == sock_wfree ||
               (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree))

In other words, (a || (b && c)) instead of (a || b && c). Though the
existing code also eschews the extra parentheses.

> > I was also surprised that this works when tcp_wfree is not defined if
> > !CONFIG_INET. But apparently it does (at -O2?) :)
>
> I was surprised to but in essence it should work the same as
>
>         if (IS_ENABLED(CONFIG_xyz))
>                 call_some_xyz_code();
>
> from compiler's perspective, and we do that a lot. Perhaps kbuild
> bot will prove us wrong :)
>
> > > @@ -984,6 +984,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
> > >                         if (!skb)
> > >                                 goto wait_for_memory;
> > >
> > > +#ifdef CONFIG_TLS_DEVICE
> > > +                       skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED);
> > > +#endif
> >
> > Nothing is stopping userspace from passing this new flag. In send
> > (tcp_sendmsg_locked) it is ignored. But can it reach do_tcp_sendpages
> > through tcp_bpf_sendmsg?
>
> Ah, I think you're right, thanks for checking that :( I don't entirely
> follow how 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect through
> ULP") is safe then.
>
> One option would be to clear the flags kernel would previously ignore
> in tcp_bpf_sendmsg(). But I feel like we should just go back to marking
> the socket, since we don't need the per-message flexibility of a flag.
>
> WDYT?

I don't feel strongly either way. Passing flags from send through
tcp_bpf_sendmsg is probably unintentional, so should probably be
addressed anyway? Then this is a bit simpler.

> > >                         skb_entail(sk, skb);
> > >                         copy = size_goal;
> > >                 }
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index 6e4afc48d7bb..979520e46e33 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -1320,6 +1320,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
> > >         buff = sk_stream_alloc_skb(sk, nsize, gfp, true);
> > >         if (!buff)
> > >                 return -ENOMEM; /* We'll just try again later. */
> > > +       skb_copy_decrypted(buff, skb);
> >
> > This code has to copy timestamps, tx_flags, zerocopy state and now
> > this in three locations. Eventually we'll want a single helper for all
> > of them..
>
> Ack, should I take an action on that for net-next or was it a
> note-to-self? :)

Note-to-self :)

As a matter of fact, your patch showed me that we actually miss the
tstamp case in tcp_mtu_probe..

^ permalink raw reply

* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: Jakub Kicinski @ 2019-08-07 18:49 UTC (permalink / raw)
  To: David Ahern
  Cc: Andrew Lunn, Jiri Pirko, netdev, davem, mlxsw, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet
In-Reply-To: <e0047c07-11a0-423c-9560-3806328a0d76@gmail.com>

On Tue, 6 Aug 2019 20:33:47 -0600, David Ahern wrote:
> Some time back supported was added for devlink 'resources'. The idea is
> that hardware (mlxsw) has limited resources (e.g., memory) that can be
> allocated in certain ways (e.g., kvd for mlxsw) thus implementing
> restrictions on the number of programmable entries (e.g., routes,
> neighbors) by userspace.
> 
> I contend:
> 
> 1. The kernel is an analogy to the hardware: it is programmed by
> userspace, has limited resources (e.g., memory), and that users want to
> control (e.g., limit) the number of networking entities that can be
> programmed - routes, rules, nexthop objects etc and by address family
> (ipv4, ipv6).

Memory hierarchy for ASIC is more complex and changes more often than
we want to change the model and kernel ABIs. The API in devlink is
intended for TCAM partitioning.

> 2. A consistent operational model across use cases - s/w forwarding, XDP
> forwarding and hardware forwarding - is good for users deploying systems
> based on the Linux networking stack. This aligns with my basic point at
> LPC last November about better integration of XDP and kernel tables.
> 
> The existing devlink API is the right one for all use cases. Most
> notably that the kernel can mimic the hardware from a resource
> management. Trying to say 'use cgroups for s/w forwarding and devlink
> for h/w forwarding' is complicating the lives of users. It is just a
> model and models can apply to more than some rigid definition.

This argument holds no water. Only a tiny fraction of Linux networking
users will have an high performance forwarding ASIC attached to their
CPUs. So we'll make 99.9% of users who never seen devlink learn the
tool for device control to control kernel resource?

Perhaps I'm misinterpreting your point there.

> As for the namespace piece of this, the kernel's tables for networking
> are *per namespace*, and so the resource controller must be per
> namespace. This aligns with another consistent theme I have promoted
> over the years - the ability to divide up a single ASIC into multiple,
> virtual switches which are managed per namespace. This is a very popular
> feature from a certain legacy vendor and one that would be good for open
> networking to achieve. This is the basis of my response last week about
> the devlink instance per namespace, and I thought Jiri was moving in
> that direction until our chat today. Jiri's intention is something
> different; we can discuss that on the next version of his patches.

Resource limits per namespace make perfect sense. Just not configured
via devlink..

^ permalink raw reply

* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: Jakub Kicinski @ 2019-08-07 18:57 UTC (permalink / raw)
  To: David Ahern
  Cc: Andrew Lunn, Jiri Pirko, netdev, davem, mlxsw, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet
In-Reply-To: <153eb34b-05dd-4a85-88d8-e5723f41bbe3@gmail.com>

On Tue, 6 Aug 2019 21:10:40 -0600, David Ahern wrote:
> On 8/6/19 8:59 PM, Andrew Lunn wrote:
> > However, zoom out a bit, from networking to the whole kernel. In
> > general, across the kernel as a whole, resource management is done
> > with cgroups. cgroups is the consistent operational model across the
> > kernel as a whole.
> > 
> > So i think you need a second leg to your argument. You have said why
> > devlink is the right way to do this. But you should also be able to
> > say to Tejun Heo why cgroups is the wrong way to do this, going
> > against the kernel as a whole model. Why is networking special?
> >   
> 
> So you are saying mlxsw should be using a cgroups based API for its
> resources? netdevsim is for testing kernel APIs sans hardware. Is that
> not what the fib controller netdevsim is doing? It is from my perspective.

Why would all the drivers have to pay attention to resource limits?
Shouldn't we try to implement that at a higher layer?

> I am not the one arguing to change code and functionality that has
> existed for 16 months. I am arguing that the existing resource
> controller satisfies all existing goals (testing in kernel APIs) and
> even satisfies additional ones - like a consistent user experience
> managing networking resources. ie.., I see no reason to change what exists.

Please don't use the netdevsim code as an argument that something
already exists. The only legitimate use of that code is to validate
the devlink resource API and that the notifier can fail the insertion.

We try to encourage adding tests and are generally more willing to
merge test code. Possible abuse of that for establishing precedents 
is worrying.

^ permalink raw reply

* Re: [PATCH iproute2-next] rdma: Add driver QP type string
From: David Ahern @ 2019-08-07 19:07 UTC (permalink / raw)
  To: Gal Pressman, Stephen Hemminger; +Cc: netdev, linux-rdma, Leon Romanovsky
In-Reply-To: <20190804080756.58364-1-galpress@amazon.com>

On 8/4/19 2:07 AM, Gal Pressman wrote:
> RDMA resource tracker now tracks driver QPs as well, add driver QP type
> string to qp_types_to_str function.
> 
> Signed-off-by: Gal Pressman <galpress@amazon.com>
> ---
>  rdma/res.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 

applied to iproute2-next. Thanks



^ permalink raw reply

* Re: [PATCH iproute2-next v2] ip tunnel: add json output
From: David Ahern @ 2019-08-07 19:08 UTC (permalink / raw)
  To: Andrea Claudi, netdev; +Cc: stephen, dsahern
In-Reply-To: <a7a161117f3a68e5a0cea008a8ca7e80b42bf2fa.1564766777.git.aclaudi@redhat.com>

On 8/2/19 11:38 AM, Andrea Claudi wrote:
> Add json support on iptunnel and ip6tunnel.
> The plain text output format should remain the same.
> 
> Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
> ---
> Changes since v1:
>  * Use print_color_* for ifname and ip addresses;
>  * Use print_null() instead of print_bool() where appropriate;
>  * Reduce indentation level on tnl_print_gre_flags()
> ---
>  ip/ip6tunnel.c | 66 ++++++++++++++++++++++++---------------
>  ip/iptunnel.c  | 84 ++++++++++++++++++++++++++++++++------------------
>  ip/tunnel.c    | 37 +++++++++++++++++-----
>  3 files changed, 125 insertions(+), 62 deletions(-)
> 

applied to iproute2-next. Thanks



^ permalink raw reply

* [PATCH RFC] net: phy: add support for clause 37 auto-negotiation
From: Heiner Kallweit @ 2019-08-07 19:11 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

This patch adds support for clause 37 1000Base-X auto-negotiation.
It's compile-tested only as I don't have fiber equipment.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 139 +++++++++++++++++++++++++++++++++++
 include/linux/phy.h          |   5 ++
 2 files changed, 144 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 7ddd91df9..f24e3e7e5 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1617,6 +1617,40 @@ static int genphy_config_advert(struct phy_device *phydev)
 	return changed;
 }
 
+/**
+ * genphy_c37_config_advert - sanitize and advertise auto-negotiation parameters
+ * @phydev: target phy_device struct
+ *
+ * Description: Writes MII_ADVERTISE with the appropriate values,
+ *   after sanitizing the values to make sure we only advertise
+ *   what is supported.  Returns < 0 on error, 0 if the PHY's advertisement
+ *   hasn't changed, and > 0 if it has changed. This function is intended
+ *   for Clause 37 1000Base-X mode.
+ */
+static int genphy_c37_config_advert(struct phy_device *phydev)
+{
+	u16 adv = 0;
+
+	/* Only allow advertising what this PHY supports */
+	linkmode_and(phydev->advertising, phydev->advertising,
+		     phydev->supported);
+
+	if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseX_Full_BIT,
+			      phydev->advertising))
+		adv |= ADVERTISE_1000XFULL;
+	if (linkmode_test_bit(ETHTOOL_LINK_MODE_Pause_BIT,
+			      phydev->advertising))
+		adv |= ADVERTISE_1000XPAUSE;
+	if (linkmode_test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
+			      phydev->advertising))
+		adv |= ADVERTISE_1000XPSE_ASYM;
+
+	return phy_modify_changed(phydev, MII_ADVERTISE,
+				  ADVERTISE_1000XFULL | ADVERTISE_1000XPAUSE |
+				  ADVERTISE_1000XHALF | ADVERTISE_1000XPSE_ASYM,
+				  adv);
+}
+
 /**
  * genphy_config_eee_advert - disable unwanted eee mode advertisement
  * @phydev: target phy_device struct
@@ -1726,6 +1760,54 @@ int genphy_config_aneg(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(genphy_config_aneg);
 
+/**
+ * genphy_c37_config_aneg - restart auto-negotiation or write BMCR
+ * @phydev: target phy_device struct
+ *
+ * Description: If auto-negotiation is enabled, we configure the
+ *   advertising, and then restart auto-negotiation.  If it is not
+ *   enabled, then we write the BMCR. This function is intended
+ *   for use with Clause 37 1000Base-X mode.
+ */
+int genphy_c37_config_aneg(struct phy_device *phydev)
+{
+	int err, changed;
+
+	if (AUTONEG_ENABLE != phydev->autoneg)
+		return genphy_setup_forced(phydev);
+
+	err = phy_modify(phydev, MII_BMCR, BMCR_SPEED1000 | BMCR_SPEED100,
+			 BMCR_SPEED1000);
+	if (err)
+		return err;
+
+	changed = genphy_c37_config_advert(phydev);
+	if (changed < 0) /* error */
+		return changed;
+
+	if (!changed) {
+		/* Advertisement hasn't changed, but maybe aneg was never on to
+		 * begin with?  Or maybe phy was isolated?
+		 */
+		int ctl = phy_read(phydev, MII_BMCR);
+
+		if (ctl < 0)
+			return ctl;
+
+		if (!(ctl & BMCR_ANENABLE) || (ctl & BMCR_ISOLATE))
+			changed = 1; /* do restart aneg */
+	}
+
+	/* Only restart aneg if we are advertising something different
+	 * than we were before.
+	 */
+	if (changed > 0)
+		return genphy_restart_aneg(phydev);
+
+	return 0;
+}
+EXPORT_SYMBOL(genphy_c37_config_aneg);
+
 /**
  * genphy_aneg_done - return auto-negotiation status
  * @phydev: target phy_device struct
@@ -1864,6 +1946,63 @@ int genphy_read_status(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(genphy_read_status);
 
+/**
+ * genphy_c37_read_status - check the link status and update current link state
+ * @phydev: target phy_device struct
+ *
+ * Description: Check the link, then figure out the current state
+ *   by comparing what we advertise with what the link partner
+ *   advertises. This function is for Clause 37 1000Base-X mode.
+ */
+int genphy_c37_read_status(struct phy_device *phydev)
+{
+	int lpa, err, old_link = phydev->link;
+
+	/* Update the link, but return if there was an error */
+	err = genphy_update_link(phydev);
+	if (err)
+		return err;
+
+	/* why bother the PHY if nothing can have changed */
+	if (phydev->autoneg == AUTONEG_ENABLE && old_link && phydev->link)
+		return 0;
+
+	phydev->duplex = DUPLEX_UNKNOWN;
+	phydev->pause = 0;
+	phydev->asym_pause = 0;
+
+	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
+		lpa = phy_read(phydev, MII_LPA);
+		if (lpa < 0)
+			return lpa;
+
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+				 phydev->lp_advertising, lpa & LPA_LPACK);
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseX_Full_BIT,
+				 phydev->lp_advertising, lpa & LPA_1000XFULL);
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_Pause_BIT,
+				 phydev->lp_advertising, lpa & LPA_1000XPAUSE);
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
+				 phydev->lp_advertising,
+				 lpa & LPA_1000XPAUSE_ASYM);
+
+		phy_resolve_aneg_linkmode(phydev);
+	} else if (phydev->autoneg == AUTONEG_DISABLE) {
+		int bmcr = phy_read(phydev, MII_BMCR);
+
+		if (bmcr < 0)
+			return bmcr;
+
+		if (bmcr & BMCR_FULLDPLX)
+			phydev->duplex = DUPLEX_FULL;
+		else
+			phydev->duplex = DUPLEX_HALF;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(genphy_c37_read_status);
+
 /**
  * genphy_soft_reset - software reset the PHY via BMCR_RESET bit
  * @phydev: target phy_device struct
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 462b90b73..36cbdfe84 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1077,6 +1077,11 @@ int genphy_suspend(struct phy_device *phydev);
 int genphy_resume(struct phy_device *phydev);
 int genphy_loopback(struct phy_device *phydev, bool enable);
 int genphy_soft_reset(struct phy_device *phydev);
+
+/* Clause 37 */
+int genphy_c37_aneg_done(struct phy_device *phydev);
+int genphy_c37_read_status(struct phy_device *phydev);
+
 static inline int genphy_no_soft_reset(struct phy_device *phydev)
 {
 	return 0;
-- 
2.22.0


^ permalink raw reply related

* Re: [v3,2/4] tools: bpftool: add net detach command to detach XDP on interface
From: Jakub Kicinski @ 2019-08-07 19:14 UTC (permalink / raw)
  To: Y Song; +Cc: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <CAH3MdRW4LgdLoqSpLsWUOwjnNhJA1sodHqSD2Z14JY6aHMaKxg@mail.gmail.com>

On Wed, 7 Aug 2019 10:02:04 -0700, Y Song wrote:
> -bash-4.4$ sudo ./bpftool net detach x dev v2
> -bash-4.4$ sudo ./bpftool net
> xdp:
> v1(4) driver id 1172
> 
> tc:
> eth0(2) clsact/ingress fbflow_icmp id 29 act []
> eth0(2) clsact/egress cls_fg_dscp_section id 27 act []
> eth0(2) clsact/egress fbflow_egress id 28
> eth0(2) clsact/egress fbflow_sslwall_egress id 35
> 
> flow_dissector:
> 
> -bash-4.4$
> 
> Basically detaching may fail due to wrong dev name or wrong type, etc.
> But the tool did not return an error. Is this expected?
> This may be related to this funciton "bpf_set_link_xdp_fd()".
> So this patch itself should be okay.

Yup, I'm pretty sure kernel doesn't return errors on unset if there 
is no program on the interface.

^ permalink raw reply

* Re: [PATCH net-next v4 2/2] net: phy: broadcom: add 1000Base-X support for BCM54616S
From: Heiner Kallweit @ 2019-08-07 19:18 UTC (permalink / raw)
  To: Tao Ren, Andrew Lunn, Florian Fainelli, David S . Miller,
	Arun Parameswaran, Justin Chen, Vladimir Oltean,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	openbmc@lists.ozlabs.org
In-Reply-To: <fe0d39ea-91f3-0cac-f13b-3d46ea1748a3@fb.com>

On 06.08.2019 23:42, Tao Ren wrote:
> Hi Andrew / Heiner / Vladimir,
> 
> On 8/6/19 2:09 PM, Tao Ren wrote:
>> The BCM54616S PHY cannot work properly in RGMII->1000Base-KX mode (for
>> example, on Facebook CMM BMC platform), mainly because genphy functions
>> are designed for copper links, and 1000Base-X (clause 37) auto negotiation
>> needs to be handled differently.
>>
>> This patch enables 1000Base-X support for BCM54616S by customizing 3
>> driver callbacks:
>>
>>   - probe: probe callback detects PHY's operation mode based on
>>     INTERF_SEL[1:0] pins and 1000X/100FX selection bit in SerDES 100-FX
>>     Control register.
>>
>>   - config_aneg: bcm54616s_config_aneg_1000bx function is added for auto
>>     negotiation in 1000Base-X mode.
>>
>>   - read_status: BCM54616S and BCM5482 PHY share the same read_status
>>     callback which manually set link speed and duplex mode in 1000Base-X
>>     mode.
>>
>> Signed-off-by: Tao Ren <taoren@fb.com>
> 
> I customized config_aneg function for BCM54616S 1000Base-X mode and link-down issue is also fixed: the patch is tested on Facebook CMM and Minipack BMC and everything looks normal. Please kindly review when you have bandwidth and let me know if you have further suggestions.
> 
> BTW, I would be happy to help if we decide to add a set of genphy functions for clause 37, although that may mean I need more help/guidance from you :-)

You want to have standard clause 37 aneg and this should be generic in phylib.
I hacked together a first version that is compile-tested only:
https://patchwork.ozlabs.org/patch/1143631/
It supports fixed mode too.

It doesn't support half duplex mode because phylib doesn't know 1000BaseX HD yet.
Not sure whether half duplex mode is used at all in reality.

You could test the new core functions in your own config_aneg and read_status
callback implementations.

> 
> 
> Cheers,
> 
> Tao
> 
Heiner

^ permalink raw reply

* Re: [PATCH v4 bpf-next 02/14] libbpf: convert libbpf code to use new btf helpers
From: Alexei Starovoitov @ 2019-08-07 19:30 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, ast, daniel, yhs, andrii.nakryiko, kernel-team
In-Reply-To: <20190807053806.1534571-3-andriin@fb.com>

On Tue, Aug 06, 2019 at 10:37:54PM -0700, Andrii Nakryiko wrote:
> Simplify code by relying on newly added BTF helper functions.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
..
>  
> -	for (i = 0, vsi = (struct btf_var_secinfo *)(t + 1);
> -	     i < vars; i++, vsi++) {
> +	for (i = 0, vsi = (void *)btf_var_secinfos(t); i < vars; i++, vsi++) {

> +			struct btf_member *m = (void *)btf_members(t);
...
>  		case BTF_KIND_ENUM: {
> -			struct btf_enum *m = (struct btf_enum *)(t + 1);
> -			__u16 vlen = BTF_INFO_VLEN(t->info);
> +			struct btf_enum *m = (void *)btf_enum(t);
> +			__u16 vlen = btf_vlen(t);
...
>  		case BTF_KIND_FUNC_PROTO: {
> -			struct btf_param *m = (struct btf_param *)(t + 1);
> -			__u16 vlen = BTF_INFO_VLEN(t->info);
> +			struct btf_param *m = (void *)btf_params(t);
> +			__u16 vlen = btf_vlen(t);

So all of these 'void *' type hacks are only to drop const-ness ?
May be the helpers shouldn't be taking const then?


^ permalink raw reply

* Re: [PATCH v4 bpf-next 01/14] libbpf: add helpers for working with BTF types
From: Alexei Starovoitov @ 2019-08-07 19:31 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, ast, daniel, yhs, andrii.nakryiko, kernel-team
In-Reply-To: <20190807053806.1534571-2-andriin@fb.com>

On Tue, Aug 06, 2019 at 10:37:53PM -0700, Andrii Nakryiko wrote:
> Add lots of frequently used helpers that simplify working with BTF
> types.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
..
> +/* get bitfield size of a member, assuming t is BTF_KIND_STRUCT or
> + * BTF_KIND_UNION. If member is not a bitfield, zero is returned. */

Invalid comment style.


^ permalink raw reply

* Re: [PATCH net] netdevsim: Restore per-network namespace accounting for fib entries
From: David Ahern @ 2019-08-07 19:31 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jakub Kicinski, David Ahern, davem, netdev
In-Reply-To: <20190807130739.GA2201@nanopsycho>

On 8/7/19 7:07 AM, Jiri Pirko wrote:
> 
> Yeah. I believe it was a mistake to add it in the first place. Abuses
> netdevsim for something it is not. I'm fine to use devlink the way you
> want to after we conclude 2), but outside netdevsim.
> 
> Again, netdevsim is there for config api testing purposes. If things
> got broken, it is not that bit deal. I broke the way it is
> instantiated significantly for example (iplink->sysfs).
>

netdevsim was created as a way of testing hardware focused kernel APIs
and code paths without hardware. yes?

The devlink api was added to netdevsim to test fib notifiers failing by
handlers. The notifiers were added for mlxsw - a hardware driver - as a
way to get notifications of fib changes.

The easiest way to test the error paths was to code limits to fib
entries very similar to what mlxsw implements with its 'devlink
resource' implementation. ie., to implement 'devlink resource' for a
software emulated device. netdevsim was the most logical place for it.

See the commit logs starting at:

commit b349e0b5ec5d7be57ac243fb08ae8b994c928165
Merge: 6e2135ce54b7 37923ed6b8ce
Author: David S. Miller <davem@davemloft.net>
Date:   Thu Mar 29 14:10:31 2018 -0400

    Merge branch 'net-Allow-FIB-notifiers-to-fail-add-and-replace'

    David Ahern says:

    ====================
    net: Allow FIB notifiers to fail add and replace

Everything about that implementation is using the s/w device to test
code paths targeted at hardware offloads.

^ permalink raw reply

* [PATCH net-next] r8169: allocate rx buffers using alloc_pages_node
From: Heiner Kallweit @ 2019-08-07 19:38 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org

We allocate 16kb per rx buffer, so we can avoid some overhead by using
alloc_pages_node directly instead of bothering kmalloc_node. Due to
this change buffers are page-aligned now, therefore the alignment check
can be removed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 45 ++++++++++-------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 3c7af6669..4f0b32280 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -642,7 +642,7 @@ struct rtl8169_private {
 	struct RxDesc *RxDescArray;	/* 256-aligned Rx descriptor ring */
 	dma_addr_t TxPhyAddr;
 	dma_addr_t RxPhyAddr;
-	void *Rx_databuff[NUM_RX_DESC];	/* Rx data buffers */
+	struct page *Rx_databuff[NUM_RX_DESC];	/* Rx data buffers */
 	struct ring_info tx_skb[NUM_TX_DESC];	/* Tx data buffers */
 	u16 cp_cmd;
 	u16 irq_mask;
@@ -5261,12 +5261,13 @@ static inline void rtl8169_make_unusable_by_asic(struct RxDesc *desc)
 }
 
 static void rtl8169_free_rx_databuff(struct rtl8169_private *tp,
-				     void **data_buff, struct RxDesc *desc)
+				     struct page **data_buff,
+				     struct RxDesc *desc)
 {
-	dma_unmap_single(tp_to_dev(tp), le64_to_cpu(desc->addr),
-			 R8169_RX_BUF_SIZE, DMA_FROM_DEVICE);
+	dma_unmap_page(tp_to_dev(tp), le64_to_cpu(desc->addr),
+		       R8169_RX_BUF_SIZE, DMA_FROM_DEVICE);
 
-	kfree(*data_buff);
+	__free_pages(*data_buff, get_order(R8169_RX_BUF_SIZE));
 	*data_buff = NULL;
 	rtl8169_make_unusable_by_asic(desc);
 }
@@ -5281,38 +5282,30 @@ static inline void rtl8169_mark_to_asic(struct RxDesc *desc)
 	desc->opts1 = cpu_to_le32(DescOwn | eor | R8169_RX_BUF_SIZE);
 }
 
-static struct sk_buff *rtl8169_alloc_rx_data(struct rtl8169_private *tp,
-					     struct RxDesc *desc)
+static struct page *rtl8169_alloc_rx_data(struct rtl8169_private *tp,
+					  struct RxDesc *desc)
 {
-	void *data;
-	dma_addr_t mapping;
 	struct device *d = tp_to_dev(tp);
 	int node = dev_to_node(d);
+	dma_addr_t mapping;
+	struct page *data;
 
-	data = kmalloc_node(R8169_RX_BUF_SIZE, GFP_KERNEL, node);
+	data = alloc_pages_node(node, GFP_KERNEL, get_order(R8169_RX_BUF_SIZE));
 	if (!data)
 		return NULL;
 
-	/* Memory should be properly aligned, but better check. */
-	if (!IS_ALIGNED((unsigned long)data, 8)) {
-		netdev_err_once(tp->dev, "RX buffer not 8-byte-aligned\n");
-		goto err_out;
-	}
-
-	mapping = dma_map_single(d, data, R8169_RX_BUF_SIZE, DMA_FROM_DEVICE);
+	mapping = dma_map_page(d, data, 0, R8169_RX_BUF_SIZE, DMA_FROM_DEVICE);
 	if (unlikely(dma_mapping_error(d, mapping))) {
 		if (net_ratelimit())
 			netif_err(tp, drv, tp->dev, "Failed to map RX DMA!\n");
-		goto err_out;
+		__free_pages(data, get_order(R8169_RX_BUF_SIZE));
+		return NULL;
 	}
 
 	desc->addr = cpu_to_le64(mapping);
 	rtl8169_mark_to_asic(desc);
-	return data;
 
-err_out:
-	kfree(data);
-	return NULL;
+	return data;
 }
 
 static void rtl8169_rx_clear(struct rtl8169_private *tp)
@@ -5337,7 +5330,7 @@ static int rtl8169_rx_fill(struct rtl8169_private *tp)
 	unsigned int i;
 
 	for (i = 0; i < NUM_RX_DESC; i++) {
-		void *data;
+		struct page *data;
 
 		data = rtl8169_alloc_rx_data(tp, tp->RxDescArray + i);
 		if (!data) {
@@ -5892,6 +5885,7 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 
 	for (rx_left = min(budget, NUM_RX_DESC); rx_left > 0; rx_left--, cur_rx++) {
 		unsigned int entry = cur_rx % NUM_RX_DESC;
+		const void *rx_buf = page_address(tp->Rx_databuff[entry]);
 		struct RxDesc *desc = tp->RxDescArray + entry;
 		u32 status;
 
@@ -5946,9 +5940,8 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 				goto release_descriptor;
 			}
 
-			prefetch(tp->Rx_databuff[entry]);
-			skb_copy_to_linear_data(skb, tp->Rx_databuff[entry],
-						pkt_size);
+			prefetch(rx_buf);
+			skb_copy_to_linear_data(skb, rx_buf, pkt_size);
 			skb->tail += pkt_size;
 			skb->len = pkt_size;
 
-- 
2.22.0


^ permalink raw reply related

* Why is this WiFi booster so popular across the World?
From:  Horace Harrington @ 2019-08-07 19:24 UTC (permalink / raw)
  To: netdev

Internet providers make big money by overcharging you for faster internet lines. Could this little device really be the one solution that we've all been waiting for?

Frequency: 2.4Ghz
Wireless Rate: 300Mbps
Interface: 1 10/100Mbps WAN/LAN RJ45 Ports

Amazing new technology find out here! http://bit.ly/asdf411gt

------------------------
If you'd prefer not to receive future emails, Unsubscribe Here http://bit.ly/itr4fgy
11041 Santa Monica Blvd. #301 Los Angeles, CA 90025

^ permalink raw reply

* Re: [PATCH net v2] net/tls: prevent skb_orphan() from leaking TLS plain text with offload
From: Willem de Bruijn @ 2019-08-07 19:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, Network Development, davejwatson, borisp, aviadye,
	John Fastabend, Daniel Borkmann, Eric Dumazet, Alexei Starovoitov,
	oss-drivers
In-Reply-To: <CA+FuTSeR1QqAZVTLQ-mJ8iHi+h+ghbrGyT6TWATTecQSbQP6sA@mail.gmail.com>

On Wed, Aug 7, 2019 at 2:47 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Wed, Aug 7, 2019 at 2:01 PM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> >
> > On Wed, 7 Aug 2019 12:59:00 -0400, Willem de Bruijn wrote:
> > > On Wed, Aug 7, 2019 at 2:06 AM Jakub Kicinski wrote:
> > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > index d57b0cc995a0..0f9619b0892f 100644
> > > > --- a/net/core/sock.c
> > > > +++ b/net/core/sock.c
> > > > @@ -1992,6 +1992,20 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
> > > >  }
> > > >  EXPORT_SYMBOL(skb_set_owner_w);
> > > >
> > > > +static bool can_skb_orphan_partial(const struct sk_buff *skb)
> > > > +{
> > > > +#ifdef CONFIG_TLS_DEVICE
> > > > +       /* Drivers depend on in-order delivery for crypto offload,
> > > > +        * partial orphan breaks out-of-order-OK logic.
> > > > +        */
> > > > +       if (skb->decrypted)
> > > > +               return false;
> > > > +#endif
> > > > +       return (IS_ENABLED(CONFIG_INET) &&
> > > > +               skb->destructor == tcp_wfree) ||
> > >
> > > Please add parentheses around IS_ENABLED(CONFIG_INET) &&
> > > skb->destructor == tcp_wfree
> >
> > Mm.. there are parenthesis around them, maybe I'm being slow,
> > could you show me how?
>
> I mean
>
>     return (skb->destructor == sock_wfree ||
>                (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree))
>
> In other words, (a || (b && c)) instead of (a || b && c). Though the
> existing code also eschews the extra parentheses.

No, ignore that last bit. It uses #ifdef, of course.

^ permalink raw reply

* Re: [PATCH v4 bpf-next 01/14] libbpf: add helpers for working with BTF types
From: Andrii Nakryiko @ 2019-08-07 19:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song, Kernel Team
In-Reply-To: <20190807193110.p5flmxojmdjdg4dj@ast-mbp>

On Wed, Aug 7, 2019 at 12:31 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 06, 2019 at 10:37:53PM -0700, Andrii Nakryiko wrote:
> > Add lots of frequently used helpers that simplify working with BTF
> > types.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ..
> > +/* get bitfield size of a member, assuming t is BTF_KIND_STRUCT or
> > + * BTF_KIND_UNION. If member is not a bitfield, zero is returned. */
>
> Invalid comment style.
>

oops, fixing

^ permalink raw reply

* [PATCH net 0/2] Fix batched event generation for skbedit action
From: Roman Mashak @ 2019-08-07 19:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak

When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However it does not consider that the action sizes may vary and
require different skb sizes.

For example, consider the following script adding 32 entries with all
supported skbedit parameters and cookie (in order to maximize netlink
messages size):

% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"

$TC actions flush action skbedit
for i in `seq 1 $1`;
do
   cmd="action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd \
               ptype host inheritdsfield \
               index $i cookie aabbccddeeff112233445566778800a1 "
   args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%

patch 1 adds callback in tc_action_ops of skbedit action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().

patch 2 updates the TDC test suite with relevant skbedit test cases.

Roman Mashak (2):
  net sched: update skbedit action for batched events operations
  tc-testing: updated skbedit action tests with batch create/delete

 net/sched/act_skbedit.c                            | 12 ++++++
 .../tc-testing/tc-tests/actions/skbedit.json       | 47 ++++++++++++++++++++++
 2 files changed, 59 insertions(+)

-- 
2.7.4


^ permalink raw reply

* [PATCH net 1/2] net sched: update skbedit action for batched events operations
From: Roman Mashak @ 2019-08-07 19:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1565207849-11442-1-git-send-email-mrv@mojatatu.com>

Add get_fill_size() routine used to calculate the action size
when building a batch of events.

Fixes: ca9b0e27e ("pkt_action: add new action skbedit")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 net/sched/act_skbedit.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index b100870f02a6..37dced00b63d 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -307,6 +307,17 @@ static int tcf_skbedit_search(struct net *net, struct tc_action **a, u32 index)
 	return tcf_idr_search(tn, a, index);
 }
 
+static size_t tcf_skbedit_get_fill_size(const struct tc_action *act)
+{
+	return nla_total_size(sizeof(struct tc_skbedit))
+		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_PRIORITY */
+		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_QUEUE_MAPPING */
+		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MARK */
+		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_PTYPE */
+		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MASK */
+		+ nla_total_size_64bit(sizeof(u64)); /* TCA_SKBEDIT_FLAGS */
+}
+
 static struct tc_action_ops act_skbedit_ops = {
 	.kind		=	"skbedit",
 	.id		=	TCA_ID_SKBEDIT,
@@ -316,6 +327,7 @@ static struct tc_action_ops act_skbedit_ops = {
 	.init		=	tcf_skbedit_init,
 	.cleanup	=	tcf_skbedit_cleanup,
 	.walk		=	tcf_skbedit_walker,
+	.get_fill_size	=	tcf_skbedit_get_fill_size,
 	.lookup		=	tcf_skbedit_search,
 	.size		=	sizeof(struct tcf_skbedit),
 };
-- 
2.7.4


^ permalink raw reply related

* [PATCH net 2/2] tc-testing: updated skbedit action tests with batch create/delete
From: Roman Mashak @ 2019-08-07 19:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1565207849-11442-1-git-send-email-mrv@mojatatu.com>

Update TDC tests with cases varifying ability of TC to install or delete
batches of skbedit actions.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 .../tc-testing/tc-tests/actions/skbedit.json       | 47 ++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json b/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
index bf5ebf59c2d4..9cdd2e31ac2c 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
@@ -670,5 +670,52 @@
         "teardown": [
             "$TC actions flush action skbedit"
         ]
+    },
+    {
+        "id": "630c",
+        "name": "Add batch of 32 skbedit actions with all parameters and cookie",
+        "category": [
+            "actions",
+            "skbedit"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action skbedit",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd ptype host inheritdsfield index \\$i cookie aabbccddeeff112233445566778800a1 \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action skbedit",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "32",
+        "teardown": [
+            "$TC actions flush action skbedit"
+        ]
+    },
+    {
+        "id": "706d",
+        "name": "Delete batch of 32 skbedit actions with all parameters",
+        "category": [
+            "actions",
+            "skbedit"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action skbedit",
+                0,
+                1,
+                255
+            ],
+            "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd ptype host inheritdsfield index \\$i \\\"; args=\\\"\\$args\\$cmd\\\"; done && $TC actions add \\$args\""
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action skbedit index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action skbedit",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "0",
+        "teardown": []
     }
 ]
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH v4 bpf-next 02/14] libbpf: convert libbpf code to use new btf helpers
From: Andrii Nakryiko @ 2019-08-07 19:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song, Kernel Team
In-Reply-To: <20190807193011.g2zuaapc2uvvr4h6@ast-mbp>

On Wed, Aug 7, 2019 at 12:30 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 06, 2019 at 10:37:54PM -0700, Andrii Nakryiko wrote:
> > Simplify code by relying on newly added BTF helper functions.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ..
> >
> > -     for (i = 0, vsi = (struct btf_var_secinfo *)(t + 1);
> > -          i < vars; i++, vsi++) {
> > +     for (i = 0, vsi = (void *)btf_var_secinfos(t); i < vars; i++, vsi++) {
>
> > +                     struct btf_member *m = (void *)btf_members(t);
> ...
> >               case BTF_KIND_ENUM: {
> > -                     struct btf_enum *m = (struct btf_enum *)(t + 1);
> > -                     __u16 vlen = BTF_INFO_VLEN(t->info);
> > +                     struct btf_enum *m = (void *)btf_enum(t);
> > +                     __u16 vlen = btf_vlen(t);
> ...
> >               case BTF_KIND_FUNC_PROTO: {
> > -                     struct btf_param *m = (struct btf_param *)(t + 1);
> > -                     __u16 vlen = BTF_INFO_VLEN(t->info);
> > +                     struct btf_param *m = (void *)btf_params(t);
> > +                     __u16 vlen = btf_vlen(t);
>
> So all of these 'void *' type hacks are only to drop const-ness ?

Yes.

> May be the helpers shouldn't be taking const then?
>

Probably not, because then we'll have much wider-spread problem of
casting const pointers into non-const when passing btf_type into
helpers.
I think const as a default is the right choice, because normally BTF
is immutable and btf__type_by_id is returning const pointer, etc.
That's typical and expected use-case. btf_dedup and BTF sanitization +
datasec size setting pieces are an exception that have to modify BTF
types in place before passing it to user.

So realistically I think we can just leave it as (void *), or I can do
explicit non-const type casts, or we can just not use helpers for
mutable cases. Do you have a preference?

^ permalink raw reply

* Re: [PATCH v4 bpf-next 02/14] libbpf: convert libbpf code to use new btf helpers
From: Alexei Starovoitov @ 2019-08-07 20:01 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Daniel Borkmann, Yonghong Song,
	Kernel Team
In-Reply-To: <CAEf4BzahxLWRVNcNWpba7_7CbbQgN8k0RU8Ya1XCK8j4rPQ0NQ@mail.gmail.com>

On 8/7/19 12:59 PM, Andrii Nakryiko wrote:
> On Wed, Aug 7, 2019 at 12:30 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Tue, Aug 06, 2019 at 10:37:54PM -0700, Andrii Nakryiko wrote:
>>> Simplify code by relying on newly added BTF helper functions.
>>>
>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>> ..
>>>
>>> -     for (i = 0, vsi = (struct btf_var_secinfo *)(t + 1);
>>> -          i < vars; i++, vsi++) {
>>> +     for (i = 0, vsi = (void *)btf_var_secinfos(t); i < vars; i++, vsi++) {
>>
>>> +                     struct btf_member *m = (void *)btf_members(t);
>> ...
>>>                case BTF_KIND_ENUM: {
>>> -                     struct btf_enum *m = (struct btf_enum *)(t + 1);
>>> -                     __u16 vlen = BTF_INFO_VLEN(t->info);
>>> +                     struct btf_enum *m = (void *)btf_enum(t);
>>> +                     __u16 vlen = btf_vlen(t);
>> ...
>>>                case BTF_KIND_FUNC_PROTO: {
>>> -                     struct btf_param *m = (struct btf_param *)(t + 1);
>>> -                     __u16 vlen = BTF_INFO_VLEN(t->info);
>>> +                     struct btf_param *m = (void *)btf_params(t);
>>> +                     __u16 vlen = btf_vlen(t);
>>
>> So all of these 'void *' type hacks are only to drop const-ness ?
> 
> Yes.
> 
>> May be the helpers shouldn't be taking const then?
>>
> 
> Probably not, because then we'll have much wider-spread problem of
> casting const pointers into non-const when passing btf_type into
> helpers.
> I think const as a default is the right choice, because normally BTF
> is immutable and btf__type_by_id is returning const pointer, etc.
> That's typical and expected use-case. btf_dedup and BTF sanitization +
> datasec size setting pieces are an exception that have to modify BTF
> types in place before passing it to user.
> 
> So realistically I think we can just leave it as (void *), or I can do
> explicit non-const type casts, or we can just not use helpers for
> mutable cases. Do you have a preference?

Hmm. Take a const into the helper and drop it there?
I'd like to avoid all these 'void *'.

^ permalink raw reply

* Re: [PATCH v4 bpf-next 02/14] libbpf: convert libbpf code to use new btf helpers
From: Andrii Nakryiko @ 2019-08-07 20:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, Networking,
	Daniel Borkmann, Yonghong Song, Kernel Team
In-Reply-To: <151d13d1-e894-56cc-4690-4661c8afc65e@fb.com>

On Wed, Aug 7, 2019 at 1:01 PM Alexei Starovoitov <ast@fb.com> wrote:
>
> On 8/7/19 12:59 PM, Andrii Nakryiko wrote:
> > On Wed, Aug 7, 2019 at 12:30 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >>
> >> On Tue, Aug 06, 2019 at 10:37:54PM -0700, Andrii Nakryiko wrote:
> >>> Simplify code by relying on newly added BTF helper functions.
> >>>
> >>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >> ..
> >>>
> >>> -     for (i = 0, vsi = (struct btf_var_secinfo *)(t + 1);
> >>> -          i < vars; i++, vsi++) {
> >>> +     for (i = 0, vsi = (void *)btf_var_secinfos(t); i < vars; i++, vsi++) {
> >>
> >>> +                     struct btf_member *m = (void *)btf_members(t);
> >> ...
> >>>                case BTF_KIND_ENUM: {
> >>> -                     struct btf_enum *m = (struct btf_enum *)(t + 1);
> >>> -                     __u16 vlen = BTF_INFO_VLEN(t->info);
> >>> +                     struct btf_enum *m = (void *)btf_enum(t);
> >>> +                     __u16 vlen = btf_vlen(t);
> >> ...
> >>>                case BTF_KIND_FUNC_PROTO: {
> >>> -                     struct btf_param *m = (struct btf_param *)(t + 1);
> >>> -                     __u16 vlen = BTF_INFO_VLEN(t->info);
> >>> +                     struct btf_param *m = (void *)btf_params(t);
> >>> +                     __u16 vlen = btf_vlen(t);
> >>
> >> So all of these 'void *' type hacks are only to drop const-ness ?
> >
> > Yes.
> >
> >> May be the helpers shouldn't be taking const then?
> >>
> >
> > Probably not, because then we'll have much wider-spread problem of
> > casting const pointers into non-const when passing btf_type into
> > helpers.
> > I think const as a default is the right choice, because normally BTF
> > is immutable and btf__type_by_id is returning const pointer, etc.
> > That's typical and expected use-case. btf_dedup and BTF sanitization +
> > datasec size setting pieces are an exception that have to modify BTF
> > types in place before passing it to user.
> >
> > So realistically I think we can just leave it as (void *), or I can do
> > explicit non-const type casts, or we can just not use helpers for
> > mutable cases. Do you have a preference?
>
> Hmm. Take a const into the helper and drop it there?
> I'd like to avoid all these 'void *'.

Well, I guess it's a way to do this as well. Given it's C, I guess
it's acceptable to be so free about constness. I'll update patches.

^ permalink raw reply

* Re: [v3,2/4] tools: bpftool: add net detach command to detach XDP on interface
From: Y Song @ 2019-08-07 20:12 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190807203041.000020a8@gmail.com>

On Wed, Aug 7, 2019 at 11:30 AM Maciej Fijalkowski
<maciejromanfijalkowski@gmail.com> wrote:
>
> On Wed, 7 Aug 2019 10:02:04 -0700
> Y Song <ys114321@gmail.com> wrote:
>
> > On Tue, Aug 6, 2019 at 7:25 PM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> > >
> > > By this commit, using `bpftool net detach`, the attached XDP prog can
> > > be detached. Detaching the BPF prog will be done through libbpf
> > > 'bpf_set_link_xdp_fd' with the progfd set to -1.
> > >
> > > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > > ---
> > >  tools/bpf/bpftool/net.c | 42 ++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 41 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > > index c05a3fac5cac..7be96acb08e0 100644
> > > --- a/tools/bpf/bpftool/net.c
> > > +++ b/tools/bpf/bpftool/net.c
> > > @@ -343,6 +343,43 @@ static int do_attach(int argc, char **argv)
> > >         return 0;
> > >  }
> > >
> > > +static int do_detach(int argc, char **argv)
> > > +{
> > > +       enum net_attach_type attach_type;
> > > +       int progfd, ifindex, err = 0;
> > > +
> > > +       /* parse detach args */
> > > +       if (!REQ_ARGS(3))
> > > +               return -EINVAL;
> > > +
> > > +       attach_type = parse_attach_type(*argv);
> > > +       if (attach_type == max_net_attach_type) {
> > > +               p_err("invalid net attach/detach type");
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       NEXT_ARG();
> > > +       ifindex = net_parse_dev(&argc, &argv);
> > > +       if (ifindex < 1)
> > > +               return -EINVAL;
> > > +
> > > +       /* detach xdp prog */
> > > +       progfd = -1;
> > > +       if (is_prefix("xdp", attach_type_strings[attach_type]))
> > > +               err = do_attach_detach_xdp(progfd, attach_type, ifindex, NULL);
> >
> > I found an issue here. This is probably related to do_attach_detach_xdp.
> >
> > -bash-4.4$ sudo ./bpftool net attach x pinned /sys/fs/bpf/xdp_example dev v1
> > -bash-4.4$ sudo ./bpftool net
> > xdp:
> > v1(4) driver id 1172
> >
> > tc:
> > eth0(2) clsact/ingress fbflow_icmp id 29 act []
> > eth0(2) clsact/egress cls_fg_dscp_section id 27 act []
> > eth0(2) clsact/egress fbflow_egress id 28
> > eth0(2) clsact/egress fbflow_sslwall_egress id 35
> >
> > flow_dissector:
> >
> > -bash-4.4$ sudo ./bpftool net detach x dev v2
>
> Shouldn't this be v1 as dev?

I am testing a scenario where with wrong devname
we did not return an error.

Yes, if dev "v1", it works as expected.

>
> > -bash-4.4$ sudo ./bpftool net
> > xdp:
> > v1(4) driver id 1172
> >
> > tc:
> > eth0(2) clsact/ingress fbflow_icmp id 29 act []
> > eth0(2) clsact/egress cls_fg_dscp_section id 27 act []
> > eth0(2) clsact/egress fbflow_egress id 28
> > eth0(2) clsact/egress fbflow_sslwall_egress id 35
> >
> > flow_dissector:
> >
> > -bash-4.4$
> >
> > Basically detaching may fail due to wrong dev name or wrong type, etc.
> > But the tool did not return an error. Is this expected?
> > This may be related to this funciton "bpf_set_link_xdp_fd()".
> > So this patch itself should be okay.
> >
> > > +
> > > +       if (err < 0) {
> > > +               p_err("interface %s detach failed",
> > > +                     attach_type_strings[attach_type]);
> > > +               return err;
> > > +       }
> > > +
> > > +       if (json_output)
> > > +               jsonw_null(json_wtr);
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  static int do_show(int argc, char **argv)
> > >  {
> > >         struct bpf_attach_info attach_info = {};
> > > @@ -419,6 +456,7 @@ static int do_help(int argc, char **argv)
> > >         fprintf(stderr,
> > >                 "Usage: %s %s { show | list } [dev <devname>]\n"
> > >                 "       %s %s attach ATTACH_TYPE PROG dev <devname> [ overwrite ]\n"
> > > +               "       %s %s detach ATTACH_TYPE dev <devname>\n"
> > >                 "       %s %s help\n"
> > >                 "\n"
> > >                 "       " HELP_SPEC_PROGRAM "\n"
> > > @@ -429,7 +467,8 @@ static int do_help(int argc, char **argv)
> > >                 "      to dump program attachments. For program types\n"
> > >                 "      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
> > >                 "      consult iproute2.\n",
> > > -               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
> > > +               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
> > > +               bin_name, argv[-2]);
> > >
> > >         return 0;
> > >  }
> > > @@ -438,6 +477,7 @@ static const struct cmd cmds[] = {
> > >         { "show",       do_show },
> > >         { "list",       do_show },
> > >         { "attach",     do_attach },
> > > +       { "detach",     do_detach },
> > >         { "help",       do_help },
> > >         { 0 }
> > >  };
> > > --
> > > 2.20.1
> > >
>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox