Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] mvpp2: refactor MTU change code
From: David Miller @ 2019-07-27 20:24 UTC (permalink / raw)
  To: mcroce; +Cc: netdev, antoine.tenart, maxime.chevallier, mw, stefanc,
	linux-kernel
In-Reply-To: <20190725231931.24073-1-mcroce@redhat.com>

From: Matteo Croce <mcroce@redhat.com>
Date: Fri, 26 Jul 2019 01:19:31 +0200

> The MTU change code can call napi_disable() with the device already down,
> leading to a deadlock. Also, lot of code is duplicated unnecessarily.
> 
> Rework mvpp2_change_mtu() to avoid the deadlock and remove duplicated code.
> 
> Signed-off-by: Matteo Croce <mcroce@redhat.com>

Please resubmit with the Fixes: tag.

Please do not line break the Fixes: tag no matter how many characters
it ends up being in total.

Thanks.

^ permalink raw reply

* Re: [PATCH 1/2] net: ipv4: Fix a possible null-pointer dereference in inet_csk_rebuild_route()
From: David Miller @ 2019-07-27 20:25 UTC (permalink / raw)
  To: baijiaju1990; +Cc: kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <20190726022534.24994-1-baijiaju1990@gmail.com>

From: Jia-Ju Bai <baijiaju1990@gmail.com>
Date: Fri, 26 Jul 2019 10:25:34 +0800

> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index f5c163d4771b..27d9d80f3401 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1073,7 +1073,10 @@ static struct dst_entry *inet_csk_rebuild_route(struct sock *sk, struct flowi *f
>  		sk_setup_caps(sk, &rt->dst);
>  	rcu_read_unlock();
>  
> -	return &rt->dst;
> +	if (rt)
> +		return &rt->dst;
> +	else
> +		return NULL;

If rt is NULL, &rt->dst is also NULL as has been pointed out to you.

Please fix your automated tools to understand this case before submitting
more changes.

Thank you.

^ permalink raw reply

* Re: [PATCH 2/2] net: ipv6: Fix a possible null-pointer dereference in vti6_link_config()
From: David Miller @ 2019-07-27 20:27 UTC (permalink / raw)
  To: baijiaju1990; +Cc: kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <20190726080321.4466-1-baijiaju1990@gmail.com>

From: Jia-Ju Bai <baijiaju1990@gmail.com>
Date: Fri, 26 Jul 2019 16:03:21 +0800

> @@ -646,9 +646,10 @@ static void vti6_link_config(struct ip6_tnl *t, bool keep_mtu)
>  						 &p->raddr, &p->laddr,
>  						 p->link, NULL, strict);
>  
> -		if (rt)
> +		if (rt) {
>  			tdev = rt->dst.dev;
> -		ip6_rt_put(rt);
> +			ip6_rt_put(rt);
> +		}

ip6_rt_put() can take a NULL argument without any problem.

I want to make it clear that because of mistakes of this nature, and
how frequently you make them, it is very tiring and exhausting to
review your changes...

^ permalink raw reply

* Re: [PATCH] isdn: mISDN: hfcsusb: Fix possible null-pointer dereferences in start_isoc_chain()
From: David Miller @ 2019-07-27 20:29 UTC (permalink / raw)
  To: baijiaju1990
  Cc: isdn, pakki001, tranmanphong, gregkh, rfontana, gustavo, tglx,
	netdev, linux-kernel
In-Reply-To: <20190726082736.8195-1-baijiaju1990@gmail.com>

From: Jia-Ju Bai <baijiaju1990@gmail.com>
Date: Fri, 26 Jul 2019 16:27:36 +0800

> In start_isoc_chain(), usb_alloc_urb() on line 1392 may fail 
> and return NULL. At this time, fifo->iso[i].urb is assigned to NULL.
> 
> Then, fifo->iso[i].urb is used at some places, such as:
> LINE 1405:    fill_isoc_urb(fifo->iso[i].urb, ...)
>                   urb->number_of_packets = num_packets;
>                   urb->transfer_flags = URB_ISO_ASAP;
>                   urb->actual_length = 0;
>                   urb->interval = interval;
> LINE 1416:    fifo->iso[i].urb->...
> LINE 1419:    fifo->iso[i].urb->...
> 
> Thus, possible null-pointer dereferences may occur.
> 
> To fix these bugs, "continue" is added to avoid using fifo->iso[i].urb
> when it is NULL.
> 
> These bugs are found by a static analysis tool STCheck written by us.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: neigh: remove redundant assignment to variable bucket
From: David Miller @ 2019-07-27 20:32 UTC (permalink / raw)
  To: colin.king; +Cc: dsahern, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20190726094611.3597-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Fri, 26 Jul 2019 10:46:11 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> The variable bucket is being initialized with a value that is never
> read and it is being updated later with a new value in a following
> for-loop. The initialization is redundant and can be removed.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: stmmac: Make MDIO bus reset optional
From: David Miller @ 2019-07-27 20:52 UTC (permalink / raw)
  To: thierry.reding
  Cc: joabreu, alexandre.torgue, peppe.cavallaro, jonathanh, netdev,
	linux-tegra, linux-arm-kernel
In-Reply-To: <20190726102741.27872-1-thierry.reding@gmail.com>

From: Thierry Reding <thierry.reding@gmail.com>
Date: Fri, 26 Jul 2019 12:27:40 +0200

> From: Thierry Reding <treding@nvidia.com>
> 
> The Tegra EQOS driver already resets the MDIO bus at probe time via the
> reset GPIO specified in the phy-reset-gpios device tree property. There
> is no need to reset the bus again later on.
> 
> This avoids the need to query the device tree for the snps,reset GPIO,
> which is not part of the Tegra EQOS device tree bindings. This quiesces
> an error message from the generic bus reset code if it doesn't find the
> snps,reset related delays.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 2/2] net: stmmac: Do not request stmmaceth clock
From: David Miller @ 2019-07-27 20:52 UTC (permalink / raw)
  To: thierry.reding
  Cc: joabreu, alexandre.torgue, peppe.cavallaro, jonathanh, netdev,
	linux-tegra, linux-arm-kernel
In-Reply-To: <20190726102741.27872-2-thierry.reding@gmail.com>

From: Thierry Reding <thierry.reding@gmail.com>
Date: Fri, 26 Jul 2019 12:27:41 +0200

> From: Thierry Reding <treding@nvidia.com>
> 
> The stmmaceth clock is specified by the slave_bus and apb_pclk clocks in
> the device tree bindings for snps,dwc-qos-ethernet-4.10 compatible nodes
> of this IP.
> 
> The subdrivers for these bindings will be requesting the stmmac clock
> correctly at a later point, so there is no need to request it here and
> cause an error message to be printed to the kernel log.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: rds: Fix possible null-pointer dereferences in rds_rdma_cm_event_handler_cmn()
From: David Miller @ 2019-07-27 20:58 UTC (permalink / raw)
  To: baijiaju1990
  Cc: santosh.shilimkar, netdev, linux-rdma, rds-devel, linux-kernel
In-Reply-To: <20190726141705.9585-1-baijiaju1990@gmail.com>

From: Jia-Ju Bai <baijiaju1990@gmail.com>
Date: Fri, 26 Jul 2019 22:17:05 +0800

> In rds_rdma_cm_event_handler_cmn(), there are some if statements to
> check whether conn is NULL, such as on lines 65, 96 and 112.
> But conn is not checked before being used on line 108:
>     trans->cm_connect_complete(conn, event);
> and on lines 140-143:
>     rdsdebug("DISCONNECT event - dropping connection "
>             "%pI6c->%pI6c\n", &conn->c_laddr,
>             &conn->c_faddr);
>     rds_conn_drop(conn);
> 
> Thus, possible null-pointer dereferences may occur.
> 
> To fix these bugs, conn is checked before being used.
> 
> These bugs are found by a static analysis tool STCheck written by us.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] qed: Add API for configuring NVM attributes.
From: David Miller @ 2019-07-27 21:00 UTC (permalink / raw)
  To: skalluru; +Cc: netdev, mkalderon, aelior
In-Reply-To: <20190726155215.25151-2-skalluru@marvell.com>

From: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Date: Fri, 26 Jul 2019 08:52:14 -0700

> +int qed_mcp_nvm_set_cfg(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt,
> +			u16 option_id, u8 entity_id, u16 flags, u8 *p_buf,
> +			u32 len)
> +{
> +	u32 mb_param = 0, resp, param;
> +	int rc;
 ...
> +	rc = qed_mcp_nvm_wr_cmd(p_hwfn, p_ptt,
> +				DRV_MSG_CODE_SET_NVM_CFG_OPTION,
> +				mb_param, &resp, &param, len, (u32 *)p_buf);
> +
> +	return rc;

'rc' is completely unnecessary, please just return the function result
directly.

Thank you.

^ permalink raw reply

* Re: [PATCH] drivers: net: xgene: Move status variable declaration into CONFIG_ACPI block
From: David Miller @ 2019-07-27 21:18 UTC (permalink / raw)
  To: natechancellor
  Cc: iyappan, keyur, quan, netdev, linux-kernel, skunberg.kelsey
In-Reply-To: <20190726162037.37308-1-natechancellor@gmail.com>

From: Nathan Chancellor <natechancellor@gmail.com>
Date: Fri, 26 Jul 2019 09:20:37 -0700

> When CONFIG_ACPI is unset (arm allyesconfig), status is unused.
> 
> drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c:383:14: warning:
> unused variable 'status' [-Wunused-variable]
>         acpi_status status;
>                     ^
> drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c:440:14: warning:
> unused variable 'status' [-Wunused-variable]
>         acpi_status status;
>                     ^
> drivers/net/ethernet/apm/xgene/xgene_enet_hw.c:697:14: warning: unused
> variable 'status' [-Wunused-variable]
>         acpi_status status;
>                     ^
> 
> Move the declaration into the CONFIG_ACPI block so that there are no
> compiler warnings.
> 
> Fixes: 570d785ba46b ("drivers: net: xgene: Remove acpi_has_method() calls")
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next v2] mlx4/en_netdev: allow offloading VXLAN over VLAN
From: David Miller @ 2019-07-27 21:22 UTC (permalink / raw)
  To: dcaratti; +Cc: pabeni, marcelo.leitner, saeedm, tariqt, netdev
In-Reply-To: <2beb05557960e04aa588ecc09e9ee5e5a19fc651.1564164688.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Fri, 26 Jul 2019 20:18:12 +0200

> ConnectX-3 Pro can offload transmission of VLAN packets with VXLAN inside:
> enable tunnel offloads in dev->vlan_features, like it's done with other
> NIC drivers (e.g. be2net and ixgbe).
> 
> It's no more necessary to change dev->hw_enc_features when VXLAN are added
> or removed, since .ndo_features_check() already checks for VXLAN packet
> where the UDP destination port matches the configured value. Just set
> dev->hw_enc_features when the NIC is initialized, so that overlying VLAN
> can correctly inherit the tunnel offload capabilities.
> 
> Changes since v1:
> - avoid flipping hw_enc_features, instead of calling netdev notifiers,
>   thanks to Saeed Mahameed
> - squash two patches into a single one
> 
> CC: Paolo Abeni <pabeni@redhat.com>
> CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] r8169: align setting PME with vendor driver
From: David Miller @ 2019-07-27 21:23 UTC (permalink / raw)
  To: hkallweit1; +Cc: nic_swsd, netdev
In-Reply-To: <dfc84691-5643-63be-6338-55fe56df18b9@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Fri, 26 Jul 2019 20:56:20 +0200

> Align setting PME with the vendor driver. PMEnable is writable on
> RTL8169 only, on later chip versions it's read-only. PME_SIGNAL is
> used on chip versions from RTL8168evl with the exception of the
> RTL8168f family.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] ipv6: remove printk
From: David Miller @ 2019-07-27 21:24 UTC (permalink / raw)
  To: jonathan.lemon; +Cc: netdev, kernel-team
In-Reply-To: <20190726191609.3601197-1-jonathan.lemon@gmail.com>

From: Jonathan Lemon <jonathan.lemon@gmail.com>
Date: Fri, 26 Jul 2019 12:16:09 -0700

> ipv6_find_hdr() prints a non-rate limited error message
> when it cannot find an ipv6 header at a specific offset.
> This could be used as a DoS, so just remove it.
> 
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/4] r8169: improve HW csum and TSO handling
From: David Miller @ 2019-07-27 21:25 UTC (permalink / raw)
  To: hkallweit1; +Cc: nic_swsd, netdev
In-Reply-To: <d347af97-0b46-6c71-37ef-46ce2b46f4df@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Fri, 26 Jul 2019 21:47:30 +0200

> This series:
> - delegates more tasks from the driver to the core
> - enables HW csum and TSO per default
> - copies quirks for buggy chip versions from vendor driver

Series applied, thanks Heiner.

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: dsa: mt7530: Add support for port 5
From: René van Dorst @ 2019-07-27 21:26 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: netdev, frank-w, sean.wang, f.fainelli, davem, matthias.bgg,
	andrew, vivien.didelot, john, linux-mediatek, linux-mips, robh+dt,
	devicetree
In-Reply-To: <20190727185315.GU1330@shell.armlinux.org.uk>

Quoting Russell King - ARM Linux admin <linux@armlinux.org.uk>:

> On Wed, Jul 24, 2019 at 09:25:49PM +0200, René van Dorst wrote:
>> Adding support for port 5.
>>
>> Port 5 can muxed/interface to:
>> - internal 5th GMAC of the switch; can be used as 2nd CPU port or as
>>   extra port with an external phy for a 6th ethernet port.
>> - internal PHY of port 0 or 4; Used in most applications so that port 0
>>   or 4 is the WAN port and interfaces with the 2nd GMAC of the SOC.
>
> ...
>
>> @@ -1381,15 +1506,19 @@ static void mt7530_phylink_validate(struct  
>> dsa_switch *ds, int port,
>>  	phylink_set_port_modes(mask);
>>  	phylink_set(mask, Autoneg);
>>
>> -	if (state->interface != PHY_INTERFACE_MODE_TRGMII) {
>> +	if (state->interface == PHY_INTERFACE_MODE_TRGMII) {
>> +		phylink_set(mask, 1000baseT_Full);
>> +	} else {
>>  		phylink_set(mask, 10baseT_Half);
>>  		phylink_set(mask, 10baseT_Full);
>>  		phylink_set(mask, 100baseT_Half);
>>  		phylink_set(mask, 100baseT_Full);
>> -		phylink_set(mask, 1000baseT_Half);
>> -	}
>>
>> -	phylink_set(mask, 1000baseT_Full);
>> +		if (state->interface != PHY_INTERFACE_MODE_MII) {
>> +			phylink_set(mask, 1000baseT_Half);
>> +			phylink_set(mask, 1000baseT_Full);
>> +		}
>> +	}
>

Hi Russell,

Thanks for your review and many useful comments and explanations.

> As port 5 could use an external PHY, and it supports gigabit speeds,
> consider that the PHY may provide not only copper but also fiber
> connectivity, so port 5 should probably also have 1000baseX modes
> too, which would allow such a PHY to bridge the switch to fiber.

I shall add the 1000baseX modes.

My device, Ubiquiti EdgeRouter X SFP, has this setup.
Port 5 is connected to a at8033 phy which acts as a RGMII-SerDes converter for
the SFP cage. According to the datasheet it only support 100BASE-FX and
1000BASE-X. With bootstrap resistors the PHY is put in RGMII-SerDes 1000BASE-X
mode.

The problem I had is that the current mainline driver doesn't support  
this mode
so I had to hack it in myself [0][1]. I probably doing the wrong thing with my
phy driver. Driver works for me, it detects a link and sets-up a 1gbit link.
So I can test port 5. But the driver may report all the wrong values to
PHYLIB/PHYLINK. But now that I learned more about it I can revise the driver.


By reading your previous emails, my setup could official not support the
FiberStore SFP-GB-GE-T module, because it requests a SGMII interface.
But my PHY only supports 1000BaseX and my code currently doesn't error out.

dmesg output of this module:
[    3.382637] sfp sfp_lan5: module FiberStore       SFP-GB-GE-T       
rev B    sn <snip>      dc 19-12-17
[    3.402048] sfp sfp_lan5:   unknown/unspecified connector, encoding  
8b10b, nominal bitrate 1.3Gbps +0% -0%
[    3.421268] sfp sfp_lan5:   1000BaseSX- 1000BaseLX- 1000BaseCX-  
1000BaseT+ 100BaseLX- 100BaseFX- BaseBX10- BasePX-
[    3.441867] sfp sfp_lan5:   10GBaseSR- 10GBaseLR- 10GBaseLRM- 10GBaseER-
[    3.455208] sfp sfp_lan5:   Copper length: 100m
[    3.464225] sfp sfp_lan5:   Options: txdisable
[    3.473066] sfp sfp_lan5:   Diagnostics:
[    3.481034] sfp sfp_lan5: Unknown/unsupported extended compliance  
code: 0x01
[    3.495069] Atheros 8031 ethernet mdio-bus:07: SFP interface sgmii

What is the best way to do it in case of SGMII interface request?
Return that we don't support SGMII or report that we only support 1  
mode and no
auto-negotiation?

Greats,

René

>

> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up


[0]:  
https://github.com/vDorst/linux-1/commit/dad5d6ec65cfa99c204e9756b3fc234071709292
[1]:  
https://github.com/vDorst/linux-1/commit/a3aa74e84796604ab8619cfaf1c299c115a8736f



^ permalink raw reply

* Re: [PATCH] gigaset: stop maintaining seperately
From: David Miller @ 2019-07-27 21:27 UTC (permalink / raw)
  To: pebolle; +Cc: tilman, hjlipp, arnd, kkeil, netdev, linux-kernel
In-Reply-To: <20190726220541.28783-1-pebolle@tiscali.nl>

From: Paul Bolle <pebolle@tiscali.nl>
Date: Sat, 27 Jul 2019 00:05:41 +0200

> The Dutch consumer grade ISDN network will be shut down on September 1,
> 2019. This means I'll be converted to some sort of VOIP shortly. At that
> point it would be unwise to try to maintain the gigaset driver, even for
> odd fixes as I do. So I'll stop maintaining it as a seperate driver and
> bump support to CAPI in staging. De facto this means the driver will be
> unmaintained, since no-one seems to be working on CAPI.
> 
> I've lighty tested the hardware specific modules of this driver (bas-gigaset,
> ser-gigaset, and usb-gigaset) for v5.3-rc1. The basic functionality appears to
> be working. It's unclear whether anyone still cares. I'm aware of only one
> person sort of using the driver a few years ago.
> 
> Thanks to Karsten Keil for the ISDN subsystems gigaset was using (I4L and
> CAPI). And many thanks to Hansjoerg Lipp and Tilman Schmidt for writing and
> upstreaming this driver.
> 
> Signed-off-by: Paul Bolle <pebolle@tiscali.nl>

Applied.

^ permalink raw reply

* Re: [PATCH net] net: phylink: Fix flow control for fixed-link
From: David Miller @ 2019-07-27 21:28 UTC (permalink / raw)
  To: opensource; +Cc: linux, andrew, f.fainelli, hkallweit1, netdev
In-Reply-To: <20190727094011.14024-1-opensource@vdorst.com>

From: René van Dorst <opensource@vdorst.com>
Date: Sat, 27 Jul 2019 11:40:11 +0200

> In phylink_parse_fixedlink() the pl->link_config.advertising bits are AND
> with pl->supported, pl->supported is zeroed and only the speed/duplex
> modes and MII bits are set.
> So pl->link_config.advertising always loses the flow control/pause bits.
> 
> By setting Pause and Asym_Pause bits in pl->supported, the flow control
> work again when devicetree "pause" is set in fixes-link node and the MAC
> advertise that is supports pause.
> 
> Results with this patch.
> 
> Legend:
> - DT = 'Pause' is set in the fixed-link in devicetree.
> - validate() = ‘Yes’ means phylink_set(mask, Pause) is set in the
>   validate().
> - flow = results reported my link is Up line.
> 
> +-----+------------+-------+
> | DT  | validate() | flow  |
> +-----+------------+-------+
> | Yes | Yes        | rx/tx |
> | No  | Yes        | off   |
> | Yes | No         | off   |
> +-----+------------+-------+
> 
> Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
> Signed-off-by: René van Dorst <opensource@vdorst.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH bpf-next 02/10] libbpf: implement BPF CO-RE offset relocation algorithm
From: Yonghong Song @ 2019-07-27 21:29 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, Networking,
	Daniel Borkmann, Kernel Team
In-Reply-To: <CAEf4Bzbt4+mT8GfQG4xMj4tCnWd2ZqJiY3r8cwOankFFQo8wWA@mail.gmail.com>



On 7/27/19 11:24 AM, Andrii Nakryiko wrote:
> On Sat, Jul 27, 2019 at 10:00 AM Alexei Starovoitov <ast@fb.com> wrote:
>>
>> On 7/26/19 11:25 PM, Andrii Nakryiko wrote:
>>>>> +     } else if (class == BPF_ST && BPF_MODE(insn->code) == BPF_MEM) {
>>>>> +             if (insn->imm != orig_off)
>>>>> +                     return -EINVAL;
>>>>> +             insn->imm = new_off;
>>>>> +             pr_debug("prog '%s': patched insn #%d (ST | MEM) imm %d -> %d\n",
>>>>> +                      bpf_program__title(prog, false),
>>>>> +                      insn_idx, orig_off, new_off);
>>>> I'm pretty sure llvm was not capable of emitting BPF_ST insn.
>>>> When did that change?
>>> I just looked at possible instructions that could have 32-bit
>>> immediate value. This is `*(rX) = offsetof(struct s, field)`, which I
>>> though is conceivable. Do you think I should drop it?
>>
>> Just trying to point out that since it's not emitted by llvm
>> this code is likely untested ?
>> Or you've created a bpf asm test for this?
> 
> 
> Yeah, it's untested right now. Let me try to come up with LLVM
> assembly + relocation (not yet sure how/whether builtin works with
> inline assembly), if that works out, I'll leave this, if not, I'll
> drop BPF_ST|BPF_MEM part.

FYI. The llvm does not have assembly code format for BPF_ST instructions 
as it does not generate code for it. So inline asm through llvm won't 
work. llvm disasseembler won't be able to decode BPF_ST either.

>>
>>

^ permalink raw reply

* Re: [PATCH net] Revert ("r8169: remove 1000/Half from supported modes")
From: David Miller @ 2019-07-27 21:29 UTC (permalink / raw)
  To: hkallweit1; +Cc: nic_swsd, netdev, berny156
In-Reply-To: <56f11453-59fd-3990-7f32-52820fee238e@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 27 Jul 2019 12:32:28 +0200

> This reverts commit a6851c613fd7fccc5d1f28d5d8a0cbe9b0f4e8cc.
> It was reported that RTL8111b successfully finishes 1000/Full autoneg
> but no data flows. Reverting the original patch fixes the issue.
> It seems to be a HW issue with the integrated RTL8211B PHY. This PHY
> version used also e.g. on RTL8168d, so better revert the original patch.
> 
> Reported-by: Bernhard Held <berny156@gmx.de>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] r8169: don't use MSI before RTL8168d
From: David Miller @ 2019-07-27 21:31 UTC (permalink / raw)
  To: hkallweit1; +Cc: nic_swsd, netdev, dragic.dusan
In-Reply-To: <c9f89cfc-ec16-62dc-a975-1b614941e723@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 27 Jul 2019 12:43:31 +0200

> It was reported that after resuming from suspend network fails with
> error "do_IRQ: 3.38 No irq handler for vector", see [0]. Enabling WoL
> can work around the issue, but the only actual fix is to disable MSI.
> So let's mimic the behavior of the vendor driver and disable MSI on
> all chip versions before RTL8168d.
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=204079
> 
> Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
> Reported-by: Dušan Dragić <dragic.dusan@gmail.com>
> Tested-by: Dušan Dragić <dragic.dusan@gmail.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
> This version of the fix applies from 5.3 only. I'll submit a separate
> version for previous kernel versions.

Applied and queued up for -stable, thanks for the backport.

^ permalink raw reply

* Re: [PATCH net-next 0/3] mlxsw: spectrum_acl: Forbid unsupported filters
From: David Miller @ 2019-07-27 21:32 UTC (permalink / raw)
  To: idosch; +Cc: netdev, jiri, mlxsw, idosch
In-Reply-To: <20190727173257.6848-1-idosch@idosch.org>

From: Ido Schimmel <idosch@idosch.org>
Date: Sat, 27 Jul 2019 20:32:54 +0300

> From: Ido Schimmel <idosch@mellanox.com>
> 
> Patches #1-#2 make mlxsw reject unsupported egress filters. These
> include filters that match on VLAN and filters associated with a
> redirect action. Patch #1 rejects such filters when they are configured
> on egress and patch #2 rejects such filters when they are configured in
> a shared block that user tries to bind to egress.
> 
> Patch #3 forbids matching on reserved TCP flags as this is not supported
> by the current keys that mlxsw uses.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH bpf-next 02/10] libbpf: implement BPF CO-RE offset relocation algorithm
From: Andrii Nakryiko @ 2019-07-27 21:36 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Alexei Starovoitov, Andrii Nakryiko, bpf,
	Networking, Daniel Borkmann, Kernel Team
In-Reply-To: <363f7363-7031-3160-9f5f-583a1662fe25@fb.com>

On Sat, Jul 27, 2019 at 2:29 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/27/19 11:24 AM, Andrii Nakryiko wrote:
> > On Sat, Jul 27, 2019 at 10:00 AM Alexei Starovoitov <ast@fb.com> wrote:
> >>
> >> On 7/26/19 11:25 PM, Andrii Nakryiko wrote:
> >>>>> +     } else if (class == BPF_ST && BPF_MODE(insn->code) == BPF_MEM) {
> >>>>> +             if (insn->imm != orig_off)
> >>>>> +                     return -EINVAL;
> >>>>> +             insn->imm = new_off;
> >>>>> +             pr_debug("prog '%s': patched insn #%d (ST | MEM) imm %d -> %d\n",
> >>>>> +                      bpf_program__title(prog, false),
> >>>>> +                      insn_idx, orig_off, new_off);
> >>>> I'm pretty sure llvm was not capable of emitting BPF_ST insn.
> >>>> When did that change?
> >>> I just looked at possible instructions that could have 32-bit
> >>> immediate value. This is `*(rX) = offsetof(struct s, field)`, which I
> >>> though is conceivable. Do you think I should drop it?
> >>
> >> Just trying to point out that since it's not emitted by llvm
> >> this code is likely untested ?
> >> Or you've created a bpf asm test for this?
> >
> >
> > Yeah, it's untested right now. Let me try to come up with LLVM
> > assembly + relocation (not yet sure how/whether builtin works with
> > inline assembly), if that works out, I'll leave this, if not, I'll
> > drop BPF_ST|BPF_MEM part.
>
> FYI. The llvm does not have assembly code format for BPF_ST instructions
> as it does not generate code for it. So inline asm through llvm won't
> work. llvm disasseembler won't be able to decode BPF_ST either.

Well then, I'll just drop it for now. Thanks!

>
> >>
> >>

^ permalink raw reply

* Re: [PATCH] tcp: add new tcp_mtu_probe_floor sysctl
From: Josh Hunt @ 2019-07-27 23:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, David Miller
In-Reply-To: <CANn89iJtw_XrU-F0-frE=P6egH99kF0W0kTzReK701LmigcJ4Q@mail.gmail.com>

On 7/27/19 12:05 AM, Eric Dumazet wrote:
> On Sat, Jul 27, 2019 at 4:23 AM Josh Hunt <johunt@akamai.com> wrote:
>>
>> The current implementation of TCP MTU probing can considerably
>> underestimate the MTU on lossy connections allowing the MSS to get down to
>> 48. We have found that in almost all of these cases on our networks these
>> paths can handle much larger MTUs meaning the connections are being
>> artificially limited. Even though TCP MTU probing can raise the MSS back up
>> we have seen this not to be the case causing connections to be "stuck" with
>> an MSS of 48 when heavy loss is present.
>>
>> Prior to pushing out this change we could not keep TCP MTU probing enabled
>> b/c of the above reasons. Now with a reasonble floor set we've had it
>> enabled for the past 6 months.
> 
> And what reasonable value have you used ???

Reasonable for some may not be reasonable for others hence the new 
sysctl :) We're currently running with a fairly high value based off of 
the v6 min MTU minus headers and options, etc. We went conservative with 
our setting initially as it seemed a reasonable first step when 
re-enabling TCP MTU probing since with no configurable floor we saw a # 
of cases where connections were using severely reduced mss b/c of loss 
and not b/c of actual path restriction. I plan to reevaluate the setting 
at some point, but since the probing method is still the same it means 
the same clients who got stuck with mss of 48 before will land at 
whatever floor we set. Looking forward we are interested in trying to 
improve TCP MTU probing so it does not penalize clients like this.

A suggestion for a more reasonable floor default would be 512, which is 
the same as the min_pmtu. Given both mechanisms are trying to achieve 
the same goal it seems like they should have a similar min/floor.

> 
>>
>> The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
>> administrators the ability to control the floor of MSS probing.
>>
>> Signed-off-by: Josh Hunt <johunt@akamai.com>
>> ---
>>   Documentation/networking/ip-sysctl.txt | 6 ++++++
>>   include/net/netns/ipv4.h               | 1 +
>>   net/ipv4/sysctl_net_ipv4.c             | 9 +++++++++
>>   net/ipv4/tcp_ipv4.c                    | 1 +
>>   net/ipv4/tcp_timer.c                   | 2 +-
>>   5 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
>> index df33674799b5..49e95f438ed7 100644
>> --- a/Documentation/networking/ip-sysctl.txt
>> +++ b/Documentation/networking/ip-sysctl.txt
>> @@ -256,6 +256,12 @@ tcp_base_mss - INTEGER
>>          Path MTU discovery (MTU probing).  If MTU probing is enabled,
>>          this is the initial MSS used by the connection.
>>
>> +tcp_mtu_probe_floor - INTEGER
>> +       If MTU probing is enabled this caps the minimum MSS used for search_low
>> +       for the connection.
>> +
>> +       Default : 48
>> +
>>   tcp_min_snd_mss - INTEGER
>>          TCP SYN and SYNACK messages usually advertise an ADVMSS option,
>>          as described in RFC 1122 and RFC 6691.
>> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
>> index bc24a8ec1ce5..c0c0791b1912 100644
>> --- a/include/net/netns/ipv4.h
>> +++ b/include/net/netns/ipv4.h
>> @@ -116,6 +116,7 @@ struct netns_ipv4 {
>>          int sysctl_tcp_l3mdev_accept;
>>   #endif
>>          int sysctl_tcp_mtu_probing;
>> +       int sysctl_tcp_mtu_probe_floor;
>>          int sysctl_tcp_base_mss;
>>          int sysctl_tcp_min_snd_mss;
>>          int sysctl_tcp_probe_threshold;
>> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
>> index 0b980e841927..59ded25acd04 100644
>> --- a/net/ipv4/sysctl_net_ipv4.c
>> +++ b/net/ipv4/sysctl_net_ipv4.c
>> @@ -820,6 +820,15 @@ static struct ctl_table ipv4_net_table[] = {
>>                  .extra2         = &tcp_min_snd_mss_max,
>>          },
>>          {
>> +               .procname       = "tcp_mtu_probe_floor",
>> +               .data           = &init_net.ipv4.sysctl_tcp_mtu_probe_floor,
>> +               .maxlen         = sizeof(int),
>> +               .mode           = 0644,
>> +               .proc_handler   = proc_dointvec_minmax,
>> +               .extra1         = &tcp_min_snd_mss_min,
>> +               .extra2         = &tcp_min_snd_mss_max,
>> +       },
>> +       {
>>                  .procname       = "tcp_probe_threshold",
>>                  .data           = &init_net.ipv4.sysctl_tcp_probe_threshold,
>>                  .maxlen         = sizeof(int),
>> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
>> index d57641cb3477..e0a372676329 100644
>> --- a/net/ipv4/tcp_ipv4.c
>> +++ b/net/ipv4/tcp_ipv4.c
>> @@ -2637,6 +2637,7 @@ static int __net_init tcp_sk_init(struct net *net)
>>          net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS;
>>          net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD;
>>          net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL;
>> +       net->ipv4.sysctl_tcp_mtu_probe_floor = TCP_MIN_SND_MSS;
>>
>>          net->ipv4.sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME;
>>          net->ipv4.sysctl_tcp_keepalive_probes = TCP_KEEPALIVE_PROBES;
>> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
>> index c801cd37cc2a..dbd9d2d0ee63 100644
>> --- a/net/ipv4/tcp_timer.c
>> +++ b/net/ipv4/tcp_timer.c
>> @@ -154,7 +154,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
>>          } else {
>>                  mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
>>                  mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
>> -               mss = max(mss, 68 - tcp_sk(sk)->tcp_header_len);
>> +               mss = max(mss, net->ipv4.sysctl_tcp_mtu_probe_floor);
>>                  mss = max(mss, net->ipv4.sysctl_tcp_min_snd_mss);
>>                  icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
>>          }
> 
> 
> Existing sysctl should be enough ?

I don't think so. Changing tcp_min_snd_mss could impact clients that 
really want/need a small mss. When you added the new sysctl I tried to 
analyze the mss values we're seeing to understand what we could possibly 
raise it to. While not a huge amount, we see more clients than I 
expected announcing mss values in the 180-512 range. Given that I would 
not feel comfortable setting tcp_min_snd_mss to say 512 as I suggested 
above.

> 
> tcp_min_snd_mss  documentation could be slightly updated.
> 
> And maybe its default value could be raised a bit.
> 

Thanks
Josh

^ permalink raw reply

* [PATCH] rocker: fix memory leaks of fib_work on two error return paths
From: Colin King @ 2019-07-27 23:37 UTC (permalink / raw)
  To: David Ahern, Jiri Pirko, David S . Miller, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Currently there are two error return paths that leak memory allocated
to fib_work. Fix this by kfree'ing fib_work before returning.

Addresses-Coverity: ("Resource leak")
Fixes: 19a9d136f198 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway")
Fixes: dbcc4fa718ee ("rocker: Fail attempts to use routes with nexthop objects")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/rocker/rocker_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 079f459c73a5..2c5d3f5b84dd 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2208,10 +2208,12 @@ static int rocker_router_fib_event(struct notifier_block *nb,
 
 			if (fen_info->fi->fib_nh_is_v6) {
 				NL_SET_ERR_MSG_MOD(info->extack, "IPv6 gateway with IPv4 route is not supported");
+				kfree(fib_work);
 				return notifier_from_errno(-EINVAL);
 			}
 			if (fen_info->fi->nh) {
 				NL_SET_ERR_MSG_MOD(info->extack, "IPv4 route with nexthop objects is not supported");
+				kfree(fib_work);
 				return notifier_from_errno(-EINVAL);
 			}
 		}
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH bpf-next 02/10] libbpf: implement BPF CO-RE offset relocation algorithm
From: Song Liu @ 2019-07-28  0:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song, Kernel Team
In-Reply-To: <CAEf4BzbwzPsPonkqvGwS-FOWtWYQHQP=PwdVuVEkuEevrUKHWQ@mail.gmail.com>



> On Jul 27, 2019, at 12:09 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Sat, Jul 27, 2019 at 11:59 AM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Jul 26, 2019, at 11:11 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Thu, Jul 25, 2019 at 12:32 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jul 24, 2019, at 12:27 PM, Andrii Nakryiko <andriin@fb.com> wrote:
>>>>> 
>>>>> This patch implements the core logic for BPF CO-RE offsets relocations.
>>>>> All the details are described in code comments.
>>>> 
>>>> Some description in the change log is still useful. Please at least
>>>> copy-paste key comments here.
>>> 
>>> OK, will add some more.
>>> 
>>>> 
>>>> And, this is looooong. I think it is totally possible to split it into
>>>> multiple smaller patches.
>>> 
>>> I don't really know how to split it further without hurting reviewing
>>> by artificially splitting related code into separate patches. Remove
>>> any single function and algorithm will be incomplete.
>>> 
>>> Let me give you some high-level overview of how pieces are put
>>> together. There are 9 non-trivial functions, let's go over their
>>> purpose in the orderd in which they are defined in file:
>>> 
>>> 1. bpf_core_spec_parse()
>>> 
>>> This one take bpf_offset_reloc's type_id and accessor string
>>> ("0:1:2:3") and parses it into more convenient bpf_core_spec
>>> datastructure, which has calculated offset and high-level spec
>>> "steps": either named field or array access.
>>> 
>>> 2. bpf_core_find_cands()
>>> 
>>> Given local type name, finds all possible target BTF types with same
>>> name (modulo "flavor" differences, ___flavor suffix is just ignored).
>>> 
>>> 3. bpf_core_fields_are_compat()
>>> 
>>> Given local and target field match, checks that their types are
>>> compatible (so that we don't accidentally match, e.g., int against
>>> struct).
>>> 
>>> 4. bpf_core_match_member()
>>> 
>>> Given named local field, find corresponding field in target struct. To
>>> understand why it's not trivial, here's an example:
>>> 
>>> Local type:
>>> 
>>> struct s___local {
>>> int a;
>>> };
>>> 
>>> Target type:
>>> 
>>> struct s___target {
>>> struct {
>>>   union {
>>>     int a;
>>>   };
>>> };
>>> };
>>> 
>>> For both cases you can access a as s.a, but in local case, field a is
>>> immediately inside s___local, while for s___target, you have to
>>> traverse two levels deeper into anonymous fields to get to an `a`
>>> inside anonymous union.
>>> 
>>> So this function find that `a` by doing exhaustive search across all
>>> named field and anonymous struct/unions. But otherwise it's pretty
>>> straightforward recursive function.
>>> 
>>> bpf_core_spec_match()
>>> 
>>> Just goes over high-level spec steps in local spec and tries to figure
>>> out both high-level and low-level steps for targe type. Consider the
>>> above example. For both structs accessing s.a is one high-level step,
>>> but for s___local it's single low-level step (just another :0 in spec
>>> string), while for s___target it's three low-level steps: ":0:0:0",
>>> one step for each BTF type we need to traverse.
>>> 
>>> Array access is simpler, it's always one high-level and one low-level step.
>>> 
>>> bpf_core_reloc_insn()
>>> 
>>> Once we match local and target specs and have local and target
>>> offsets, do the relocations - check that instruction has expected
>>> local offset and replace it with target offset.
>>> 
>>> bpf_core_find_kernel_btf()
>>> 
>>> This is the only function that can be moved into separate patch, but
>>> it's also very simple. It just iterates over few known possible
>>> locations for vmlinux image and once found, tries to parse .BTF out of
>>> it, to be used as target BTF.
>>> 
>>> bpf_core_reloc_offset()
>>> 
>>> It combines all the above functions to perform single relocation.
>>> Parse spec, get candidates, for each candidate try to find matching
>>> target spec. All candidates that matched are cached for given local
>>> root type.
>> 
>> Thanks for these explanation. They are really helpful.
>> 
>> I think some example explaining each step of bpf_core_reloc_offset()
>> will be very helpful. Something like:
>> 
>> Example:
>> 
>> struct s {
>>        int a;
>>        struct {
>>                int b;
>>                bool c;
>>        };
>> };
>> 
>> To get offset for c, we do:
>> 
>> bpf_core_reloc_offset() {
>> 
>>        /* input data: xxx */
>> 
>>        /* first step: bpf_core_spec_parse() */
>> 
>>        /* data after first step */
>> 
>>        /* second step: bpf_core_find_cands() */
>> 
>>        /* candidate A and B after second step */
>> 
>>        ...
>> }
>> 
>> Well, it requires quite some work to document this way. Please let me
>> know if you feel this is an overkill.
> 
> Yeah :) And it's not just work, but I think it's bad if comments
> become too specific and document very low-level steps, because code
> might evolve and comments can quickly get out of sync and just add to
> confusion. Which is why I tried to document high-level ideas, leaving
> it up to the source code to be the ultimate reference of minutia
> details.

Fair enough. 

> 
>> 
>>> 
>>> bpf_core_reloc_offsets()
>>> 
>>> High-level coordination. Iterate over all per-program .BTF.ext offset
>>> reloc sections, each relocation within them. Find corresponding
>>> program and try to apply relocations one by one.
>>> 
>>> 
>>> I think the only non-obvious part here is to understand that
>>> relocation records local raw spec with every single anonymous type
>>> traversal, which is not that useful when we try to match it against
>>> target type, which can have very different composition, but still the
>>> same field access pattern, from C language standpoint (which hides all
>>> those anonymous type traversals from programmer).
>>> 
>>> But it should be pretty clear now, plus also check tests, they have
>>> lots of cases showing what's compatible and what's not.
>> 
>> I see. I will review the tests.
>> 
>>>>> 
>>>>> static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
>>>>> -                                                  __u32 id)
>>>>> +                                                  __u32 id,
>>>>> +                                                  __u32 *res_id)
>>>>> {
>>>>>     const struct btf_type *t = btf__type_by_id(btf, id);
>>>> 
>>>> Maybe have a local "__u32 rid;"
>>>> 
>>>>> 
>>>>> +     if (res_id)
>>>>> +             *res_id = id;
>>>>> +
>>>> 
>>>> and do "rid = id;" here
>>>> 
>>>>>     while (true) {
>>>>>             switch (BTF_INFO_KIND(t->info)) {
>>>>>             case BTF_KIND_VOLATILE:
>>>>>             case BTF_KIND_CONST:
>>>>>             case BTF_KIND_RESTRICT:
>>>>>             case BTF_KIND_TYPEDEF:
>>>>> +                     if (res_id)
>>>>> +                             *res_id = t->type;
>>>> and here
>>>> 
>>>>>                     t = btf__type_by_id(btf, t->type);
>>>>>                     break;
>>>>>             default:
>>>> and "*res_id = rid;" right before return?
>>> 
>>> Sure, but why?
>> 
>> I think it is cleaner that way. But feel free to ignore if you
>> think otherwise.
>> 
>>> 
>>>> 
>>>>> @@ -1041,7 +1049,7 @@ static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
>>>>> static bool get_map_field_int(const char *map_name, const struct btf *btf,
>>>>>                           const struct btf_type *def,
>>>>>                           const struct btf_member *m, __u32 *res) {
>>> 
>>> [...]
>>> 
>>>>> +struct bpf_core_spec {
>>>>> +     const struct btf *btf;
>>>>> +     /* high-level spec: named fields and array indicies only */
>>>> 
>>>> typo: indices
>>> 
>>> thanks!
>>> 
>>>> 
>>>>> +     struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN];
>>>>> +     /* high-level spec length */
>>>>> +     int len;
>>>>> +     /* raw, low-level spec: 1-to-1 with accessor spec string */
>>>>> +     int raw_spec[BPF_CORE_SPEC_MAX_LEN];
>>>>> +     /* raw spec length */
>>>>> +     int raw_len;
>>>>> +     /* field byte offset represented by spec */
>>>>> +     __u32 offset;
>>>>> +};
>>> 
>>> [...]
>>> 
>>>>> + *
>>>>> + *   int x = &s->a[3]; // access string = '0:1:2:3'
>>>>> + *
>>>>> + * Low-level spec has 1:1 mapping with each element of access string (it's
>>>>> + * just a parsed access string representation): [0, 1, 2, 3].
>>>>> + *
>>>>> + * High-level spec will capture only 3 points:
>>>>> + *   - intial zero-index access by pointer (&s->... is the same as &s[0]...);
>>>>> + *   - field 'a' access (corresponds to '2' in low-level spec);
>>>>> + *   - array element #3 access (corresponds to '3' in low-level spec).
>>>>> + *
>>>>> + */
>>>> 
>>>> IIUC, high-level points are subset of low-level points. How about we introduce
>>>> "anonymous" high-level points, so that high-level points and low-level points
>>>> are 1:1 mapping?
>>> 
>>> No, that will just hurt and complicate things. See above explanation
>>> about why we need high-level points (it's what you as C programmer try
>>> to achieve vs low-level spec is what C-language does in reality, with
>>> all the anonymous struct/union traversal).
>>> 
>>> What's wrong with this separation? Think about it as recording
>>> "intent" (high-level spec) vs "mechanics" (low-level spec, how exactly
>>> to achieve that intent, in excruciating details).
>> 
>> There is nothing wrong with separation. I just personally think it is
>> cleaner the other way. That's why I raised the question.
>> 
>> I will go with your assessment, as you looked into this much more than
>> I did. :-)
> 
> For me it's a machine view of the problem (raw spec) vs human view of
> the problem (high-level spec, which resembles how you think about this
> in C code). I'll keep it separate unless it proves to be problematic
> going forward.

[...]

>>> 
>>> No. spec->offset is carefully maintained across multiple low-level
>>> steps, as we traverse down embedded structs/unions.
>>> 
>>> Think about, e.g.:
>>> 
>>> struct s {
>>>   int a;
>>>   struct {
>>>       int b;
>>>   };
>>> };
>>> 
>>> Imagine you are trying to match s.b access. With what you propose
>>> you'll end up with offset 0, but it should be 4.
>> 
>> Hmm... this is just for i == 0, right? Which line updated spec->offset
>> after "memset(spec, 0, sizeof(*spec));"?
> 
> Ah, I missed that you are referring to the special i == 0 case. I can
> do assignment, yes, you are right. I'll probably also extract it out
> of the loop to make it less confusing.

Yes, please. 

> 
>>>>> +                     continue;
>>>>> +             }
>>>> 
>>>> Maybe pull i == 0 case out of the for loop?

As I mentioned earlier. ;-)

>>>> 
>>>>> +
>>>>> +             if (btf_is_composite(t)) {
>>> 
>>> [...]
>>> 
>>>>> +
>>>>> +     if (spec->len == 0)
>>>>> +             return -EINVAL;
>>>> 
>>>> Can this ever happen?
>>> 
>>> Not really, because I already check raw_len == 0 and exit with error.
>>> I'll remove.
>>> 
>>>> 
>>>>> +
>>>>> +     return 0;
>>>>> +}
>>>>> +
>>> 

[...]


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox