* Re: [PATCH v2 net] net: sonic: replace dev_kfree_skb in sonic_send_packet
From: David Miller @ 2019-09-11 8:14 UTC (permalink / raw)
To: maowenan; +Cc: tsbogend, netdev, linux-kernel, kernel-janitors
In-Reply-To: <20190911013623.36897-1-maowenan@huawei.com>
From: Mao Wenan <maowenan@huawei.com>
Date: Wed, 11 Sep 2019 09:36:23 +0800
> sonic_send_packet will be processed in irq or non-irq
> context, so it would better use dev_kfree_skb_any
> instead of dev_kfree_skb.
>
> Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: qrtr: fix memort leak in qrtr_tun_write_iter
From: David Miller @ 2019-09-11 8:13 UTC (permalink / raw)
To: navid.emamdoost; +Cc: emamd001, smccaman, kjlu, netdev, linux-kernel
In-Reply-To: <20190911003748.26841-1-navid.emamdoost@gmail.com>
From: Navid Emamdoost <navid.emamdoost@gmail.com>
Date: Tue, 10 Sep 2019 19:37:45 -0500
> In qrtr_tun_write_iter the allocated kbuf should be release in case of
> error happening.
>
> Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Shouldn't it also be freed in case of success too?
^ permalink raw reply
* [PATCH iproute2-next] rdma: Check comm string before print in print_comm()
From: Leon Romanovsky @ 2019-09-11 8:12 UTC (permalink / raw)
To: David Ahern
Cc: Mark Zhang, netdev, RDMA mailing list, Stephen Hemminger,
Leon Romanovsky
From: Mark Zhang <markz@mellanox.com>
Broken kernels (not-upstream) can provide wrong empty "comm" field.
It causes to segfault while printing in JSON format.
Fixes: 8ecac46a60ff ("rdma: Add QP resource tracking information")
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
rdma/res.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/rdma/res.c b/rdma/res.c
index 97a7b964..6003006e 100644
--- a/rdma/res.c
+++ b/rdma/res.c
@@ -161,6 +161,9 @@ void print_comm(struct rd *rd, const char *str, struct nlattr **nla_line)
{
char tmp[18];
+ if (!str)
+ return;
+
if (rd->json_output) {
/* Don't beatify output in JSON format */
jsonw_string_field(rd->jw, "comm", str);
--
2.20.1
^ permalink raw reply related
* Re: [PATCH] wimax: i2400: fix memory leak
From: David Miller @ 2019-09-11 8:10 UTC (permalink / raw)
To: navid.emamdoost
Cc: emamd001, smccaman, kjlu, inaky.perez-gonzalez, linux-wimax,
netdev, linux-kernel
In-Reply-To: <20190910230210.19012-1-navid.emamdoost@gmail.com>
From: Navid Emamdoost <navid.emamdoost@gmail.com>
Date: Tue, 10 Sep 2019 18:01:40 -0500
> In i2400m_op_rfkill_sw_toggle cmd buffer should be released along with
> skb response.
>
> Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Applied.
Good thing nobody uses wimax.
^ permalink raw reply
* Re: [PATCH V2 net-next 0/7] net: hns3: add a feature & bugfixes & cleanups
From: David Miller @ 2019-09-11 8:09 UTC (permalink / raw)
To: tanhuazhong
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>
From: Huazhong Tan <tanhuazhong@huawei.com>
Date: Wed, 11 Sep 2019 10:40:32 +0800
> This patch-set includes a VF feature, bugfixes and cleanups for the HNS3
> ethernet controller driver.
Series applied.
^ permalink raw reply
* Re: [PATCH] net: Remove the source address setting in connect() for UDP
From: David Miller @ 2019-09-11 7:57 UTC (permalink / raw)
To: enkechen; +Cc: kuznet, yoshfuji, netdev, linux-kernel, xe-linux-external
In-Reply-To: <324B00C3-4526-4026-809B-299634E49368@cisco.com>
From: "Enke Chen (enkechen)" <enkechen@cisco.com>
Date: Tue, 10 Sep 2019 23:55:59 +0000
> Do you still have concerns about backward compatibility of the fix?
I'm not convinced by your arguments and I am also completely swamped at
LPC2019 running the networking track this week.
^ permalink raw reply
* [PATCH 00/11] Add support for software nodes to gpiolib
From: Dmitry Torokhov @ 2019-09-11 7:52 UTC (permalink / raw)
To: Linus Walleij
Cc: Andy Shevchenko, Mika Westerberg, linux-kernel, linux-gpio,
Andrew Lunn, Andrzej Hajda, Bartosz Golaszewski, Daniel Vetter,
David Airlie, David S. Miller, Florian Fainelli, Heiner Kallweit,
Jernej Skrabec, Jonas Karlman, Laurent Pinchart, Neil Armstrong,
Russell King, dri-devel, linux-acpi, netdev
This series attempts to add support for software nodes to gpiolib, using
software node references that were introduced recently. This allows us
to convert more drivers to the generic device properties and drop
support for custom platform data:
static const struct software_node gpio_bank_b_node = {
|-------.name = "B",
};
static const struct property_entry simone_key_enter_props[] = {
|-------PROPERTY_ENTRY_U32("linux,code", KEY_ENTER),
|-------PROPERTY_ENTRY_STRING("label", "enter"),
|-------PROPERTY_ENTRY_REF("gpios", &gpio_bank_b_node, 123, GPIO_ACTIVE_LOW),
|-------{ }
};
If we agree in principle, I would like to have the very first 3 patches
in an immutable branch off maybe -rc8 so that it can be pulled into
individual subsystems so that patches switching various drivers to
fwnode_gpiod_get_index() could be applied.
Thanks,
Dmitry
Dmitry Torokhov (11):
gpiolib: of: add a fallback for wlf,reset GPIO name
gpiolib: introduce devm_fwnode_gpiod_get_index()
gpiolib: introduce fwnode_gpiod_get_index()
net: phylink: switch to using fwnode_gpiod_get_index()
net: mdio: switch to using fwnode_gpiod_get_index()
drm/bridge: ti-tfp410: switch to using fwnode_gpiod_get_index()
gpliolib: make fwnode_get_named_gpiod() static
gpiolib: of: tease apart of_find_gpio()
gpiolib: of: tease apart acpi_find_gpio()
gpiolib: consolidate fwnode GPIO lookups
gpiolib: add support for software nodes
drivers/gpio/Makefile | 1 +
drivers/gpio/gpiolib-acpi.c | 153 ++++++++++++++----------
drivers/gpio/gpiolib-acpi.h | 21 ++--
drivers/gpio/gpiolib-devres.c | 33 ++----
drivers/gpio/gpiolib-of.c | 159 ++++++++++++++-----------
drivers/gpio/gpiolib-of.h | 26 ++--
drivers/gpio/gpiolib-swnode.c | 92 +++++++++++++++
drivers/gpio/gpiolib-swnode.h | 13 ++
drivers/gpio/gpiolib.c | 184 ++++++++++++++++-------------
drivers/gpu/drm/bridge/ti-tfp410.c | 4 +-
drivers/net/phy/mdio_bus.c | 4 +-
drivers/net/phy/phylink.c | 4 +-
include/linux/gpio/consumer.h | 53 ++++++---
13 files changed, 471 insertions(+), 276 deletions(-)
create mode 100644 drivers/gpio/gpiolib-swnode.c
create mode 100644 drivers/gpio/gpiolib-swnode.h
--
2.23.0.162.g0b9fbb3734-goog
^ permalink raw reply
* [PATCH 05/11] net: mdio: switch to using fwnode_gpiod_get_index()
From: Dmitry Torokhov @ 2019-09-11 7:52 UTC (permalink / raw)
To: Linus Walleij
Cc: Andy Shevchenko, Mika Westerberg, linux-kernel, linux-gpio,
Andrew Lunn, David S. Miller, Florian Fainelli, Heiner Kallweit,
netdev
In-Reply-To: <20190911075215.78047-1-dmitry.torokhov@gmail.com>
Instead of fwnode_get_named_gpiod() that I plan to hide away, let's use
the new fwnode_gpiod_get_index() that mimics gpiod_get_index(), bit
works with arbitrary firmware node.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
drivers/net/phy/mdio_bus.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index ce940871331e..9ca51d678123 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -46,8 +46,8 @@ static int mdiobus_register_gpiod(struct mdio_device *mdiodev)
/* Deassert the optional reset signal */
if (mdiodev->dev.of_node)
- gpiod = fwnode_get_named_gpiod(&mdiodev->dev.of_node->fwnode,
- "reset-gpios", 0, GPIOD_OUT_LOW,
+ gpiod = fwnode_gpiod_get_index(&mdiodev->dev.of_node->fwnode,
+ "reset", 0, GPIOD_OUT_LOW,
"PHY reset");
if (IS_ERR(gpiod)) {
if (PTR_ERR(gpiod) == -ENOENT || PTR_ERR(gpiod) == -ENOSYS)
--
2.23.0.162.g0b9fbb3734-goog
^ permalink raw reply related
* [PATCH 04/11] net: phylink: switch to using fwnode_gpiod_get_index()
From: Dmitry Torokhov @ 2019-09-11 7:52 UTC (permalink / raw)
To: Linus Walleij
Cc: Andy Shevchenko, Mika Westerberg, linux-kernel, linux-gpio,
Andrew Lunn, David S. Miller, Florian Fainelli, Heiner Kallweit,
Russell King, netdev
In-Reply-To: <20190911075215.78047-1-dmitry.torokhov@gmail.com>
Instead of fwnode_get_named_gpiod() that I plan to hide away, let's use
the new fwnode_gpiod_get_index() that mimics gpiod_get_index(), bit
works with arbitrary firmware node.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
drivers/net/phy/phylink.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index a45c5de96ab1..14b608991445 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -168,8 +168,8 @@ static int phylink_parse_fixedlink(struct phylink *pl,
pl->link_config.pause |= MLO_PAUSE_ASYM;
if (ret == 0) {
- desc = fwnode_get_named_gpiod(fixed_node, "link-gpios",
- 0, GPIOD_IN, "?");
+ desc = fwnode_gpiod_get_index(fixed_node, "link", 0,
+ GPIOD_IN, "?");
if (!IS_ERR(desc))
pl->link_gpio = desc;
--
2.23.0.162.g0b9fbb3734-goog
^ permalink raw reply related
* Re: [PATCH] bpf: validate bpf_func when BPF_JIT is enabled
From: Yonghong Song @ 2019-09-11 7:42 UTC (permalink / raw)
To: Sami Tolvanen
Cc: Alexei Starovoitov, Daniel Borkmann, Kees Cook, Martin Lau,
Song Liu, netdev@vger.kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org, Toke Høiland-Jørgensen,
Björn Töpel
In-Reply-To: <20190910172253.GA164966@google.com>
On 9/10/19 6:22 PM, Sami Tolvanen wrote:
> On Tue, Sep 10, 2019 at 08:37:19AM +0000, Yonghong Song wrote:
>> You did not mention BPF_BINARY_HEADER_MAGIC and added member
>> of `magic` in bpf_binary_header. Could you add some details
>> on what is the purpose for this `magic` member?
>
> Sure, I'll add a description to the next version.
>
> The magic is a random number used to identify bpf_binary_header in
> memory. The purpose of this patch is to limit the possible call
> targets for the function pointer and checking for the magic helps
> ensure we are jumping to a page that contains a jited function,
> instead of allowing calls to arbitrary targets.
>
> This is particularly useful when combined with the compiler-based
> Control-Flow Integrity (CFI) mitigation, which Google started shipping
> in Pixel kernels last year. The compiler injects checks to all
> indirect calls, but cannot obviously validate jumps to dynamically
> generated code.
>
>>> +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
>>> +{
>>> + const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
>>> +
>>> + if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
>>> + return prog->bpf_func(ctx, prog->insnsi);
>>> +
>>> + if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>>> + !arch_bpf_jit_check_func(prog))) {
>>> + WARN(1, "attempt to jump to an invalid address");
>>> + return 0;
>>> + }
>>> +
>>> + return prog->bpf_func(ctx, prog->insnsi);
>>> +}
>
>> The above can be rewritten as
>> if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
>> hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>> !arch_bpf_jit_check_func(prog))) {
>> WARN(1, "attempt to jump to an invalid address");
>> return 0;
>> }
>
> That doesn't look quite equivalent, but yes, this can be rewritten as a
Indeed, I made a mistake. Your below change is correct.
> single if statement like this:
>
> if ((IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) ||
> prog->jited) &&
> (hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> !arch_bpf_jit_check_func(prog)))
>
> I think splitting the interpreter and JIT paths would be more readable,
> but I can certainly change this if you prefer.
How about this:
if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
goto out;
if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
!arch_bpf_jit_check_func(prog))) {
WARN(1, "attempt to jump to an invalid address");
return 0;
}
out:
return prog->bpf_func(ctx, prog->insnsi);
>
>> BPF_PROG_RUN() will be called during xdp fast path.
>> Have you measured how much slowdown the above change could
>> cost for the performance?
>
> I have not measured the overhead, but it shouldn't be significant. Is
> there a particular benchmark you'd like me to run?
I am not an expert in XDP testing. Toke, Björn, could you give some
suggestions what to test for XDP performance here?
>
> Sami
>
^ permalink raw reply
* Re: Q: fixed link
From: Ranran @ 2019-09-11 7:25 UTC (permalink / raw)
To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20190908090551.GC28580@lunn.ch>
On Sun, Sep 8, 2019 at 12:05 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sun, Sep 08, 2019 at 10:30:51AM +0300, Ranran wrote:
> > Hello,
> >
> > In documentation of fixed-link it is said:"
> > Some Ethernet MACs have a "fixed link", and are not connected to a
> > normal MDIO-managed PHY device. For those situations, a Device Tree
> > binding allows to describe a "fixed link".
> > "
> > Does it mean, that on using unmanaged switch ("no cpu" mode), it is
> > better be used with fixed-link ?
>
> Hi Ranran
>
> Is there a MAC to MAC connection, or PHY to PHY connection?
>
> If the interface MAC is directly connected to the switch MAC, fixed
> link is what you should use. The fixed link will then tell the
> interface MAC what speed it should use.
>
> If you have back to back PHYs, you need a PHY driver for the PHY
> connected to the interface MAC, to configure its speed, duplex
> etc. The dumb switch should be controlling its PHY, and auto-neg will
> probably work.
>
> Andrew
Hi,
We are using RGMII mode with the switch, which probably means
mac-to-mac (as far as I understand).
Does it mean we can choose between these 2 options:
1. configure dsa switch in device tree
2. configure fixed link in device tree
Thanks,
ranran
^ permalink raw reply
* Re: ixgbe: driver drops packets routed from an IPSec interface with a "bad sa_idx" error
From: Shannon Nelson @ 2019-09-11 7:17 UTC (permalink / raw)
To: Steffen Klassert, Michael Marley; +Cc: netdev, Jeff Kirsher
In-Reply-To: <20190911061547.GR2879@gauss3.secunet.de>
On 9/10/19 11:15 PM, Steffen Klassert wrote:
> On Tue, Sep 10, 2019 at 06:53:30PM -0400, Michael Marley wrote:
>> StrongSwan has hardware offload disabled by default, and I didn't enable
>> it explicitly. I also already tried turning off all those switches with
>> ethtool and it has no effect. This doesn't surprise me though, because
>> as I said, I don't actually have the IPSec connection running over the
>> ixgbe device. The IPSec connection runs over another network adapter
>> that doesn't support IPSec offload at all. The problem comes when
>> traffic received over the IPSec interface is then routed back out
>> (unencrypted) through the ixgbe device into the local network.
>
> Seems like the ixgbe driver tries to use the sec_path
> from RX to setup an offload at the TX side.
>
> Can you please try this (completely untested) patch?
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 9bcae44e9883..ae31bd57127c 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -36,6 +36,7 @@
> #include <net/vxlan.h>
> #include <net/mpls.h>
> #include <net/xdp_sock.h>
> +#include <net/xfrm.h>
>
> #include "ixgbe.h"
> #include "ixgbe_common.h"
> @@ -8696,7 +8697,7 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
> #endif /* IXGBE_FCOE */
>
> #ifdef CONFIG_IXGBE_IPSEC
> - if (secpath_exists(skb) &&
> + if (xfrm_offload(skb) &&
> !ixgbe_ipsec_tx(tx_ring, first, &ipsec_tx))
> goto out_drop;
> #endif
Thanks, Steffen, that looks right to me.
sln
^ permalink raw reply
* Re: [PATCH v2 5/5] ARM: dts: ls1021a-tsn: Use the DSPI controller in poll mode
From: Geert Uytterhoeven @ 2019-09-11 7:00 UTC (permalink / raw)
To: Shawn Guo
Cc: Vladimir Oltean, Mark Brown, linux-spi, lkml,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
netdev, Rob Herring
In-Reply-To: <20190911063350.GB17142@dragon>
On Wed, Sep 11, 2019 at 8:34 AM Shawn Guo <shawnguo@kernel.org> wrote:
> On Tue, Aug 27, 2019 at 09:16:39PM +0300, Vladimir Oltean wrote:
> > On Tue, 27 Aug 2019 at 21:13, Mark Brown <broonie@kernel.org> wrote:
> > > On Tue, Aug 27, 2019 at 09:06:14PM +0300, Vladimir Oltean wrote:
> > > > On Tue, 27 Aug 2019 at 21:05, Mark Brown <broonie@kernel.org> wrote:
> > > > > On Mon, Aug 26, 2019 at 04:10:51PM +0300, Vladimir Oltean wrote:
> > >
> > > > > > I noticed you skipped applying this patch, and I'm not sure that Shawn
> > > > > > will review it/take it.
> > > > > > Do you have a better suggestion how I can achieve putting the DSPI
> > > > > > driver in poll mode for this board? A Kconfig option maybe?
> > >
> > > > > DT changes go through the relevant platform trees, not the
> > > > > subsystem trees, so it's not something I'd expect to apply.
> > >
> > > > But at least is it something that you expect to see done through a
> > > > device tree change?
> > >
> > > Well, it's not ideal - if it performs better all the time the
> > > driver should probably just do it unconditionally. If there's
> > > some threashold where it tends to perform better then the driver
> > > should check for that but IIRC it sounds like the interrupt just
> > > isn't at all helpful here.
> >
> > I can't seem to find any situation where it performs worse. Hence my
> > question on whether it's a better idea to condition this behavior on a
> > Kconfig option rather than a DT blob which may or may not be in sync.
>
> DT is a description of hardware not condition for software behavior,
> where module parameter is usually used for.
+1
DT says the interrupt line is wired.
The driver should know if it should make use of the interrupt, or not.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* Re: [PATCH v2 5/5] ARM: dts: ls1021a-tsn: Use the DSPI controller in poll mode
From: Shawn Guo @ 2019-09-11 6:33 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Mark Brown, linux-spi, lkml, devicetree, netdev, Rob Herring
In-Reply-To: <CA+h21hqMVdsUjBdtiHKtKGpyvuaOf25tc4h-GdDEBQqa3EB7tw@mail.gmail.com>
On Tue, Aug 27, 2019 at 09:16:39PM +0300, Vladimir Oltean wrote:
> On Tue, 27 Aug 2019 at 21:13, Mark Brown <broonie@kernel.org> wrote:
> >
> > On Tue, Aug 27, 2019 at 09:06:14PM +0300, Vladimir Oltean wrote:
> > > On Tue, 27 Aug 2019 at 21:05, Mark Brown <broonie@kernel.org> wrote:
> > > > On Mon, Aug 26, 2019 at 04:10:51PM +0300, Vladimir Oltean wrote:
> >
> > > > > I noticed you skipped applying this patch, and I'm not sure that Shawn
> > > > > will review it/take it.
> > > > > Do you have a better suggestion how I can achieve putting the DSPI
> > > > > driver in poll mode for this board? A Kconfig option maybe?
> >
> > > > DT changes go through the relevant platform trees, not the
> > > > subsystem trees, so it's not something I'd expect to apply.
> >
> > > But at least is it something that you expect to see done through a
> > > device tree change?
> >
> > Well, it's not ideal - if it performs better all the time the
> > driver should probably just do it unconditionally. If there's
> > some threashold where it tends to perform better then the driver
> > should check for that but IIRC it sounds like the interrupt just
> > isn't at all helpful here.
>
> I can't seem to find any situation where it performs worse. Hence my
> question on whether it's a better idea to condition this behavior on a
> Kconfig option rather than a DT blob which may or may not be in sync.
DT is a description of hardware not condition for software behavior,
where module parameter is usually used for.
Shawn
^ permalink raw reply
* Re: [patch net-next v2 3/3] net: devlink: move reload fail indication to devlink core and expose to user
From: Jiri Pirko @ 2019-09-11 6:17 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev@vger.kernel.org, davem@davemloft.net, dsahern@gmail.com,
jakub.kicinski@netronome.com, Tariq Toukan, mlxsw
In-Reply-To: <20190908112534.GA27998@splinter>
Sun, Sep 08, 2019 at 01:25:36PM CEST, idosch@mellanox.com wrote:
>On Sat, Sep 07, 2019 at 10:54:00PM +0200, Jiri Pirko wrote:
>> +bool devlink_is_reload_failed(struct devlink *devlink)
>
>Forgot to mention that this can be 'const'
ok.
>
>> +{
>> + return devlink->reload_failed;
>> +}
>> +EXPORT_SYMBOL_GPL(devlink_is_reload_failed);
^ permalink raw reply
* Re: [patch net-next v2 3/3] net: devlink: move reload fail indication to devlink core and expose to user
From: Jiri Pirko @ 2019-09-11 6:17 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev@vger.kernel.org, davem@davemloft.net, dsahern@gmail.com,
jakub.kicinski@netronome.com, Tariq Toukan, mlxsw
In-Reply-To: <20190908103928.GA21777@splinter>
Sun, Sep 08, 2019 at 12:39:32PM CEST, idosch@mellanox.com wrote:
>On Sat, Sep 07, 2019 at 10:54:00PM +0200, Jiri Pirko wrote:
>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>> index 546e75dd74ac..7cb5e8c5ae0d 100644
>> --- a/include/uapi/linux/devlink.h
>> +++ b/include/uapi/linux/devlink.h
>> @@ -410,6 +410,8 @@ enum devlink_attr {
>> DEVLINK_ATTR_TRAP_METADATA, /* nested */
>> DEVLINK_ATTR_TRAP_GROUP_NAME, /* string */
>>
>> + DEVLINK_ATTR_RELOAD_FAILED, /* u8 0 or 1 */
>> +
>> /* add new attributes above here, update the policy in devlink.c */
>>
>> __DEVLINK_ATTR_MAX,
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index 1e3a2288b0b2..e00a4a643d17 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -471,6 +471,8 @@ static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink,
>>
>> if (devlink_nl_put_handle(msg, devlink))
>> goto nla_put_failure;
>> + if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_FAILED, devlink->reload_failed))
>
>Why not use NLA_FLAG for this?
Bacause older kernel does not return this. So the user would not know if
everything is fine or if the kernel does not support the attribute.
Using u8 instead allows it.
>
>> + goto nla_put_failure;
>>
>> genlmsg_end(msg, hdr);
>> return 0;
>> @@ -2677,6 +2679,21 @@ static bool devlink_reload_supported(struct devlink *devlink)
>> return devlink->ops->reload_down && devlink->ops->reload_up;
>> }
^ permalink raw reply
* [PATCH v4 1/2] PTP: introduce new versions of IOCTLs
From: Felipe Balbi @ 2019-09-11 6:16 UTC (permalink / raw)
To: Richard Cochran; +Cc: Christopher S Hall, netdev, linux-kernel, Felipe Balbi
The current version of the IOCTL have a small problem which prevents us
from extending the API by making use of reserved fields. In these new
IOCTLs, we are now making sure that flags and rsv fields are zero which
will allow us to extend the API in the future.
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
---
Changes since v3:
- Remove bogus bitwise negation
Changes since v2:
- Define PTP_{PEROUT,EXTTS}_VALID_FLAGS
- Fix comment above PTP_*_FLAGS
Changes since v1:
- Add a blank line after memset()
- Move memset(req) to the three places where it's needed
- Fix the accidental removal of GETFUNC and SETFUNC
drivers/ptp/ptp_chardev.c | 63 ++++++++++++++++++++++++++++++++++
include/uapi/linux/ptp_clock.h | 24 ++++++++++++-
2 files changed, 86 insertions(+), 1 deletion(-)
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index 18ffe449efdf..9c18476d8d10 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -126,7 +126,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
switch (cmd) {
case PTP_CLOCK_GETCAPS:
+ case PTP_CLOCK_GETCAPS2:
memset(&caps, 0, sizeof(caps));
+
caps.max_adj = ptp->info->max_adj;
caps.n_alarm = ptp->info->n_alarm;
caps.n_ext_ts = ptp->info->n_ext_ts;
@@ -139,11 +141,24 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
case PTP_EXTTS_REQUEST:
+ case PTP_EXTTS_REQUEST2:
+ memset(&req, 0, sizeof(req));
+
if (copy_from_user(&req.extts, (void __user *)arg,
sizeof(req.extts))) {
err = -EFAULT;
break;
}
+ if (((req.extts.flags & ~PTP_EXTTS_VALID_FLAGS) ||
+ req.extts.rsv[0] || req.extts.rsv[1]) &&
+ cmd == PTP_EXTTS_REQUEST2) {
+ err = -EINVAL;
+ break;
+ } else if (cmd == PTP_EXTTS_REQUEST) {
+ req.extts.flags &= ~PTP_EXTTS_VALID_FLAGS;
+ req.extts.rsv[0] = 0;
+ req.extts.rsv[1] = 0;
+ }
if (req.extts.index >= ops->n_ext_ts) {
err = -EINVAL;
break;
@@ -154,11 +169,27 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
case PTP_PEROUT_REQUEST:
+ case PTP_PEROUT_REQUEST2:
+ memset(&req, 0, sizeof(req));
+
if (copy_from_user(&req.perout, (void __user *)arg,
sizeof(req.perout))) {
err = -EFAULT;
break;
}
+ if (((req.perout.flags & ~PTP_PEROUT_VALID_FLAGS) ||
+ req.perout.rsv[0] || req.perout.rsv[1] ||
+ req.perout.rsv[2] || req.perout.rsv[3]) &&
+ cmd == PTP_PEROUT_REQUEST2) {
+ err = -EINVAL;
+ break;
+ } else if (cmd == PTP_PEROUT_REQUEST) {
+ req.perout.flags &= ~PTP_PEROUT_VALID_FLAGS;
+ req.perout.rsv[0] = 0;
+ req.perout.rsv[1] = 0;
+ req.perout.rsv[2] = 0;
+ req.perout.rsv[3] = 0;
+ }
if (req.perout.index >= ops->n_per_out) {
err = -EINVAL;
break;
@@ -169,6 +200,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
case PTP_ENABLE_PPS:
+ case PTP_ENABLE_PPS2:
+ memset(&req, 0, sizeof(req));
+
if (!capable(CAP_SYS_TIME))
return -EPERM;
req.type = PTP_CLK_REQ_PPS;
@@ -177,6 +211,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
case PTP_SYS_OFFSET_PRECISE:
+ case PTP_SYS_OFFSET_PRECISE2:
if (!ptp->info->getcrosststamp) {
err = -EOPNOTSUPP;
break;
@@ -201,6 +236,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
case PTP_SYS_OFFSET_EXTENDED:
+ case PTP_SYS_OFFSET_EXTENDED2:
if (!ptp->info->gettimex64) {
err = -EOPNOTSUPP;
break;
@@ -232,6 +268,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
case PTP_SYS_OFFSET:
+ case PTP_SYS_OFFSET2:
sysoff = memdup_user((void __user *)arg, sizeof(*sysoff));
if (IS_ERR(sysoff)) {
err = PTR_ERR(sysoff);
@@ -266,10 +303,23 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
case PTP_PIN_GETFUNC:
+ case PTP_PIN_GETFUNC2:
if (copy_from_user(&pd, (void __user *)arg, sizeof(pd))) {
err = -EFAULT;
break;
}
+ if ((pd.rsv[0] || pd.rsv[1] || pd.rsv[2]
+ || pd.rsv[3] || pd.rsv[4])
+ && cmd == PTP_PIN_GETFUNC2) {
+ err = -EINVAL;
+ break;
+ } else if (cmd == PTP_PIN_GETFUNC) {
+ pd.rsv[0] = 0;
+ pd.rsv[1] = 0;
+ pd.rsv[2] = 0;
+ pd.rsv[3] = 0;
+ pd.rsv[4] = 0;
+ }
pin_index = pd.index;
if (pin_index >= ops->n_pins) {
err = -EINVAL;
@@ -285,10 +335,23 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
case PTP_PIN_SETFUNC:
+ case PTP_PIN_SETFUNC2:
if (copy_from_user(&pd, (void __user *)arg, sizeof(pd))) {
err = -EFAULT;
break;
}
+ if ((pd.rsv[0] || pd.rsv[1] || pd.rsv[2]
+ || pd.rsv[3] || pd.rsv[4])
+ && cmd == PTP_PIN_SETFUNC2) {
+ err = -EINVAL;
+ break;
+ } else if (cmd == PTP_PIN_SETFUNC) {
+ pd.rsv[0] = 0;
+ pd.rsv[1] = 0;
+ pd.rsv[2] = 0;
+ pd.rsv[3] = 0;
+ pd.rsv[4] = 0;
+ }
pin_index = pd.index;
if (pin_index >= ops->n_pins) {
err = -EINVAL;
diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
index 1bc794ad957a..9a0af3511b68 100644
--- a/include/uapi/linux/ptp_clock.h
+++ b/include/uapi/linux/ptp_clock.h
@@ -25,10 +25,20 @@
#include <linux/ioctl.h>
#include <linux/types.h>
-/* PTP_xxx bits, for the flags field within the request structures. */
+/*
+ * Bits of the ptp_extts_request.flags field:
+ */
#define PTP_ENABLE_FEATURE (1<<0)
#define PTP_RISING_EDGE (1<<1)
#define PTP_FALLING_EDGE (1<<2)
+#define PTP_EXTTS_VALID_FLAGS (PTP_ENABLE_FEATURE | \
+ PTP_RISING_EDGE | \
+ PTP_FALLING_EDGE)
+
+/*
+ * Bits of the ptp_perout_request.flags field:
+ */
+#define PTP_PEROUT_VALID_FLAGS (0)
/*
* struct ptp_clock_time - represents a time value
@@ -149,6 +159,18 @@ struct ptp_pin_desc {
#define PTP_SYS_OFFSET_EXTENDED \
_IOWR(PTP_CLK_MAGIC, 9, struct ptp_sys_offset_extended)
+#define PTP_CLOCK_GETCAPS2 _IOR(PTP_CLK_MAGIC, 10, struct ptp_clock_caps)
+#define PTP_EXTTS_REQUEST2 _IOW(PTP_CLK_MAGIC, 11, struct ptp_extts_request)
+#define PTP_PEROUT_REQUEST2 _IOW(PTP_CLK_MAGIC, 12, struct ptp_perout_request)
+#define PTP_ENABLE_PPS2 _IOW(PTP_CLK_MAGIC, 13, int)
+#define PTP_SYS_OFFSET2 _IOW(PTP_CLK_MAGIC, 14, struct ptp_sys_offset)
+#define PTP_PIN_GETFUNC2 _IOWR(PTP_CLK_MAGIC, 15, struct ptp_pin_desc)
+#define PTP_PIN_SETFUNC2 _IOW(PTP_CLK_MAGIC, 16, struct ptp_pin_desc)
+#define PTP_SYS_OFFSET_PRECISE2 \
+ _IOWR(PTP_CLK_MAGIC, 17, struct ptp_sys_offset_precise)
+#define PTP_SYS_OFFSET_EXTENDED2 \
+ _IOWR(PTP_CLK_MAGIC, 18, struct ptp_sys_offset_extended)
+
struct ptp_extts_event {
struct ptp_clock_time t; /* Time event occured. */
unsigned int index; /* Which channel produced the event. */
--
2.23.0
^ permalink raw reply related
* [PATCH v4 2/2] PTP: add support for one-shot output
From: Felipe Balbi @ 2019-09-11 6:16 UTC (permalink / raw)
To: Richard Cochran; +Cc: Christopher S Hall, netdev, linux-kernel, Felipe Balbi
In-Reply-To: <20190911061622.774006-1-felipe.balbi@linux.intel.com>
Some controllers allow for a one-shot output pulse, in contrast to
periodic output. Now that we have extensible versions of our IOCTLs, we
can finally make use of the 'flags' field to pass a bit telling driver
that if we want one-shot pulse output.
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
---
Changes since v3:
- Remove bogus bitwise negation
Changes since v2:
- Add _PEROUT_ to bit macro
Changes since v1:
- remove comment from .flags field
include/uapi/linux/ptp_clock.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
index 9a0af3511b68..f16301015949 100644
--- a/include/uapi/linux/ptp_clock.h
+++ b/include/uapi/linux/ptp_clock.h
@@ -38,8 +38,8 @@
/*
* Bits of the ptp_perout_request.flags field:
*/
-#define PTP_PEROUT_VALID_FLAGS (0)
-
+#define PTP_PEROUT_ONE_SHOT (1<<0)
+#define PTP_PEROUT_VALID_FLAGS (PTP_PEROUT_ONE_SHOT)
/*
* struct ptp_clock_time - represents a time value
*
@@ -77,7 +77,7 @@ struct ptp_perout_request {
struct ptp_clock_time start; /* Absolute start time. */
struct ptp_clock_time period; /* Desired period, zero means disable. */
unsigned int index; /* Which channel to configure. */
- unsigned int flags; /* Reserved for future use. */
+ unsigned int flags;
unsigned int rsv[4]; /* Reserved for future use. */
};
--
2.23.0
^ permalink raw reply related
* Re: ixgbe: driver drops packets routed from an IPSec interface with a "bad sa_idx" error
From: Steffen Klassert @ 2019-09-11 6:15 UTC (permalink / raw)
To: Michael Marley; +Cc: Shannon Nelson, netdev, Jeff Kirsher
In-Reply-To: <6ab15854-154a-2c7c-b429-7ba6dfe785ae@michaelmarley.com>
On Tue, Sep 10, 2019 at 06:53:30PM -0400, Michael Marley wrote:
>
> StrongSwan has hardware offload disabled by default, and I didn't enable
> it explicitly. I also already tried turning off all those switches with
> ethtool and it has no effect. This doesn't surprise me though, because
> as I said, I don't actually have the IPSec connection running over the
> ixgbe device. The IPSec connection runs over another network adapter
> that doesn't support IPSec offload at all. The problem comes when
> traffic received over the IPSec interface is then routed back out
> (unencrypted) through the ixgbe device into the local network.
Seems like the ixgbe driver tries to use the sec_path
from RX to setup an offload at the TX side.
Can you please try this (completely untested) patch?
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 9bcae44e9883..ae31bd57127c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -36,6 +36,7 @@
#include <net/vxlan.h>
#include <net/mpls.h>
#include <net/xdp_sock.h>
+#include <net/xfrm.h>
#include "ixgbe.h"
#include "ixgbe_common.h"
@@ -8696,7 +8697,7 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
#endif /* IXGBE_FCOE */
#ifdef CONFIG_IXGBE_IPSEC
- if (secpath_exists(skb) &&
+ if (xfrm_offload(skb) &&
!ixgbe_ipsec_tx(tx_ring, first, &ipsec_tx))
goto out_drop;
#endif
^ permalink raw reply related
* Re: [PATCH v3 1/2] PTP: introduce new versions of IOCTLs
From: Felipe Balbi @ 2019-09-11 6:08 UTC (permalink / raw)
To: Richard Cochran; +Cc: Christopher S Hall, netdev, linux-kernel
In-Reply-To: <20190910154452.GB4016@localhost>
[-- Attachment #1: Type: text/plain, Size: 705 bytes --]
Hi,
Richard Cochran <richardcochran@gmail.com> writes:
> On Mon, Sep 09, 2019 at 10:59:39AM +0300, Felipe Balbi wrote:
>
>> case PTP_PEROUT_REQUEST:
>> + case PTP_PEROUT_REQUEST2:
>
> ...
>
>> + if (((req.perout.flags & ~PTP_PEROUT_VALID_FLAGS) ||
>> + req.perout.rsv[0] || req.perout.rsv[1] ||
>> + req.perout.rsv[2] || req.perout.rsv[3]) &&
>> + cmd == PTP_PEROUT_REQUEST2) {
>> + err = -EINVAL;
>> + break;
>
> ...
>
>> +/*
>> + * Bits of the ptp_perout_request.flags field:
>> + */
>> +#define PTP_PEROUT_VALID_FLAGS (~0)
>
> I think you meant (0) here, or I am confused...
Argh. What a brain fart!
Sorry about that. I'll go fix that ASAP.
--
balbi
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply
* Re: linux-next: Signed-off-by missing for commit in the net-next tree
From: Luciano Coelho @ 2019-09-11 5:24 UTC (permalink / raw)
To: Stephen Rothwell, David Miller, Networking
Cc: Linux Next Mailing List, Linux Kernel Mailing List, Alex Malamud
In-Reply-To: <20190911004229.74d2763a@canb.auug.org.au>
On Wed, 2019-09-11 at 00:42 +1000, Stephen Rothwell wrote:
> Hi all,
>
> Commit
>
> aa43ae121675 ("iwlwifi: LTR updates")
>
> is missing a Signed-off-by from its committer.
Oops, that was my fault. What can we do about it? Is it enough if I
give my s-o-b publicly here?
I hereby sign off this change:
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
--
Cheers,
Luca.
^ permalink raw reply
* RE: VRF Issue Since kernel 5
From: Gowen @ 2019-09-11 5:09 UTC (permalink / raw)
To: David Ahern, Alexis Bauvin; +Cc: netdev@vger.kernel.org
In-Reply-To: <db3f6cd0-aa28-0883-715c-3e1eaeb7fd1e@gmail.com>
Thanks for the link - that's really useful. I did re-order ip rules Friday (I think) - no change
-----Original Message-----
From: David Ahern <dsahern@gmail.com>
Sent: 10 September 2019 17:36
To: Alexis Bauvin <abauvin@online.net>; Gowen <gowen@potatocomputing.co.uk>
Cc: netdev@vger.kernel.org
Subject: Re: VRF Issue Since kernel 5
On 9/9/19 1:01 PM, Alexis Bauvin wrote:
> Could you try swapping the local and l3mdev rules?
>
> `ip rule del pref 0; ip rule add from all lookup local pref 1001`
yes, the rules should be re-ordered so that local rule is after l3mdev rule (VRF is implemented as policy routing). In general, I would reverse the order of those commands to ensure no breakage.
Also, 5.0 I think it was (too many kernel versions) added a new l3mdev sysctl (raw_l3mdev_accept). Check all 3 of them and nmake sure they are set properly for your use case.
These slides do not cover 5.0 changes but are still the best collection of notes on VRF:
http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf
^ permalink raw reply
* Re: [PATCH] net: stmmac: socfpga: re-use the `interface` parameter from platform data
From: Ardelean, Alexandru @ 2019-09-11 5:03 UTC (permalink / raw)
To: davem@davemloft.net
Cc: linux-stm32@st-md-mailman.stormreply.com, joabreu@synopsys.com,
linux-kernel@vger.kernel.org, mcoquelin.stm32@gmail.com,
netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
peppe.cavallaro@st.com, alexandre.torgue@st.com
In-Reply-To: <20190910.174653.800353422834551780.davem@davemloft.net>
On Tue, 2019-09-10 at 17:46 +0200, David Miller wrote:
> [External]
>
> From: David Miller <davem@davemloft.net>
> Date: Tue, 10 Sep 2019 17:45:44 +0200 (CEST)
>
> > From: Alexandru Ardelean <alexandru.ardelean@analog.com>
> > Date: Fri, 6 Sep 2019 15:30:54 +0300
> >
> > > The socfpga sub-driver defines an `interface` field in the `socfpga_dwmac`
> > > struct and parses it on init.
> > >
> > > The shared `stmmac_probe_config_dt()` function also parses this from the
> > > device-tree and makes it available on the returned `plat_data` (which is
> > > the same data available via `netdev_priv()`).
> > >
> > > All that's needed now is to dig that information out, via some
> > > `dev_get_drvdata()` && `netdev_priv()` calls and re-use it.
> > >
> > > Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
> >
> > This doesn't build even on net-next.
>
Right.
My bad.
I think I got confused with multiple/cross-testing and probably this change didn't even get compiled.
Apologies for this.
Will send a good version.
Alex
> Specifically:
>
> drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c: In function ‘socfpga_gen5_set_phy_mode’:
> drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c:264:44: error: ‘phymode’ undeclared (first use in this function);
> did you mean ‘phy_modes’?
> 264 | dev_err(dwmac->dev, "bad phy mode %d\n", phymode);
> | ^~~~~~~
> ./include/linux/device.h:1499:32: note: in definition of macro ‘dev_err’
> 1499 | _dev_err(dev, dev_fmt(fmt), ##__VA_ARGS__)
> | ^~~~~~~~~~~
> drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c:264:44: note: each undeclared identifier is reported only once for
> each function it appears in
> 264 | dev_err(dwmac->dev, "bad phy mode %d\n", phymode);
> | ^~~~~~~
> ./include/linux/device.h:1499:32: note: in definition of macro ‘dev_err’
> 1499 | _dev_err(dev, dev_fmt(fmt), ##__VA_ARGS__)
> | ^~~~~~~~~~~
> drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c: In function ‘socfpga_gen10_set_phy_mode’:
> drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c:340:6: error: ‘phymode’ undeclared (first use in this function);
> did you mean ‘phy_modes’?
> 340 | phymode == PHY_INTERFACE_MODE_MII ||
> | ^~~~~~~
> | phy_modes
^ permalink raw reply
* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
From: Carlos Antonio Neira Bustos @ 2019-09-11 4:33 UTC (permalink / raw)
To: Yonghong Song
Cc: netdev@vger.kernel.org, ebiederm@xmission.com, brouer@redhat.com,
bpf@vger.kernel.org
In-Reply-To: <c5ecd602-8b45-94bd-96e8-2264a88a3c09@fb.com>
Thanks for reviewing the rest of the code,
I'll include these changes in ver 11.
Bests
On Tue, Sep 10, 2019 at 10:46:45PM +0000, Yonghong Song wrote:
>
>
> On 9/6/19 4:09 PM, Carlos Neira wrote:
> > This helper(bpf_get_current_pidns_info) obtains the active namespace from
> > current and returns pid, tgid, device and namespace id as seen from that
> > namespace, allowing to instrument a process inside a container.
> >
> > Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > ---
> > include/linux/bpf.h | 1 +
> > include/uapi/linux/bpf.h | 35 +++++++++++++++++++-
> > kernel/bpf/core.c | 1 +
> > kernel/bpf/helpers.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++
> > kernel/trace/bpf_trace.c | 2 ++
> > 5 files changed, 124 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 5b9d22338606..819cb1c84be0 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1055,6 +1055,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> > extern const struct bpf_func_proto bpf_strtol_proto;
> > extern const struct bpf_func_proto bpf_strtoul_proto;
> > extern const struct bpf_func_proto bpf_tcp_sock_proto;
> > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
> >
> > /* Shared helpers among cBPF and eBPF. */
> > void bpf_user_rnd_init_once(void);
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index b5889257cc33..3ec9aa1438b7 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2747,6 +2747,32 @@ union bpf_attr {
> > * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> > *
> > * **-EPROTONOSUPPORT** IP packet version is not 4 or 6
> > + *
> > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> > + * Description
> > + * Get tgid, pid and namespace id as seen by the current namespace,
> > + * and device major/minor numbers from /proc/self/ns/pid. Such
> > + * information is stored in *pidns* of size *size*.
> > + *
> > + * This helper is used when pid filtering is needed inside a
> > + * container as bpf_get_current_tgid() helper always returns the
> > + * pid id as seen by the root namespace.
> > + * Return
> > + * 0 on success
> > + *
> > + * On failure, the returned value is one of the following:
> > + *
> > + * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + * or tgid of the current task.
> > + *
> > + * **-ENOENT** if /proc/self/ns/pid does not exists.
> > + *
> > + * **-ENOENT** if /proc/self/ns does not exists.
> > + *
> > + * **-ENOMEM** if helper internal allocation fails.
>
> -ENOMEM can be removed.
>
> > + *
> > + * **-EPERM** if not able to call helper.
> > + *
> > */
> > #define __BPF_FUNC_MAPPER(FN) \
> > FN(unspec), \
> > @@ -2859,7 +2885,8 @@ union bpf_attr {
> > FN(sk_storage_get), \
> > FN(sk_storage_delete), \
> > FN(send_signal), \
> > - FN(tcp_gen_syncookie),
> > + FN(tcp_gen_syncookie), \
> > + FN(get_current_pidns_info),
> >
> > /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> > * function eBPF program intends to call
> > @@ -3610,4 +3637,10 @@ struct bpf_sockopt {
> > __s32 retval;
> > };
> >
> > +struct bpf_pidns_info {
> > + __u32 dev; /* dev_t from /proc/self/ns/pid inode */
>
> /* dev_t of pid namespace pseudo file (typically /proc/seelf/ns/pid)
> after following symbolic link */
>
> > + __u32 nsid;
> > + __u32 tgid;
> > + __u32 pid;
> > +};
> > #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 8191a7db2777..3159f2a0188c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> > const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> > const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> > const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> > +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
> >
> > const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> > {
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5e28718928ca..8dbe6347893c 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -11,6 +11,11 @@
> > #include <linux/uidgid.h>
> > #include <linux/filter.h>
> > #include <linux/ctype.h>
> > +#include <linux/pid_namespace.h>
> > +#include <linux/kdev_t.h>
> > +#include <linux/stat.h>
> > +#include <linux/namei.h>
> > +#include <linux/version.h>
> >
> > #include "../../lib/kstrtox.h"
> >
> > @@ -312,6 +317,87 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> > preempt_enable();
> > }
> >
> > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > + size)
> > +{
> > + const char *pidns_path = "/proc/self/ns/pid";
> > + struct pid_namespace *pidns = NULL;
> > + struct filename *fname = NULL;
> > + struct inode *inode;
> > + struct path kp;
> > + pid_t tgid = 0;
> > + pid_t pid = 0;
> > + int ret = -EINVAL;
> > + int len;
> > +
> > + if (unlikely(in_interrupt() ||
> > + current->flags & (PF_KTHREAD | PF_EXITING)))
> > + return -EPERM;
> > +
> > + if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > + return -EINVAL;
> > +
> > + pidns = task_active_pid_ns(current);
> > + if (unlikely(!pidns))
> > + return -ENOENT;
> > +
> > + pidns_info->nsid = pidns->ns.inum;
> > + pid = task_pid_nr_ns(current, pidns);
> > + if (unlikely(!pid))
> > + goto clear;
> > +
> > + tgid = task_tgid_nr_ns(current, pidns);
> > + if (unlikely(!tgid))
> > + goto clear;
> > +
> > + pidns_info->tgid = (u32) tgid;
> > + pidns_info->pid = (u32) pid;
> > +
> [...]
> > + fname = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> > + if (unlikely(!fname)) {
> > + ret = -ENOMEM;
> > + goto clear;
> > + }
> > + const size_t fnamesize = offsetof(struct filename, iname[1]);
> > + struct filename *tmp;
> > +
> > + tmp = kmalloc(fnamesize, GFP_ATOMIC);
> > + if (unlikely(!tmp)) {
> > + __putname(fname);
> > + ret = -ENOMEM;
> > + goto clear;
> > + }
> > +
> > + tmp->name = (char *)fname;
> > + fname = tmp;
> > + len = strlen(pidns_path) + 1;
> > + memcpy((char *)fname->name, pidns_path, len);
> > + fname->uptr = NULL;
> > + fname->aname = NULL;
> > + fname->refcnt = 1;
> > +
> > + ret = filename_lookup(AT_FDCWD, fname, 0, &kp, NULL);
> > + if (ret)
> > + goto clear;
> > +
> > + inode = d_backing_inode(kp.dentry);
> > + pidns_info->dev = (u32)inode->i_rdev;
> The above can bee replaced with new nsfs interface function
> ns_get_inum_dev().
> > +
> > + return 0;
> > +
> > +clear:
> > + memset((void *)pidns_info, 0, (size_t) size);
> > + return ret;
> > +}
> > +
> > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> > + .func = bpf_get_current_pidns_info,
> > + .gpl_only = false,
> > + .ret_type = RET_INTEGER,
> > + .arg1_type = ARG_PTR_TO_UNINIT_MEM,
> > + .arg2_type = ARG_CONST_SIZE,
> > +};
> > +
> > #ifdef CONFIG_CGROUPS
> > BPF_CALL_0(bpf_get_current_cgroup_id)
> > {
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ca1255d14576..5e1dc22765a5 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > #endif
> > case BPF_FUNC_send_signal:
> > return &bpf_send_signal_proto;
> > + case BPF_FUNC_get_current_pidns_info:
> > + return &bpf_get_current_pidns_info_proto;
> > default:
> > return NULL;
> > }
> >
^ permalink raw reply
* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
From: Carlos Antonio Neira Bustos @ 2019-09-11 4:32 UTC (permalink / raw)
To: Yonghong Song
Cc: Al Viro, netdev@vger.kernel.org, ebiederm@xmission.com,
brouer@redhat.com, bpf@vger.kernel.org
In-Reply-To: <dadf3657-2648-14ef-35ee-e09efb2cdb3e@fb.com>
On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
Thanks a lot Yonghong.
I'll include this patch when submitting changes for version 11 of
this patch.
>
> Carlos,
>
> Discussed with Eric today for what is the best way to get
> the device number for a namespace. The following patch seems
> a reasonable start although Eric would like to see
> how the helper is used in order to decide whether the
> interface looks right.
>
> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
> Author: Yonghong Song <yhs@fb.com>
> Date: Mon Sep 9 21:50:51 2019 -0700
>
> nsfs: add an interface function ns_get_inum_dev()
>
> This patch added an interface function
> ns_get_inum_dev(). Given a ns_common structure,
> the function returns the inode and device
> numbers. The function will be used later
> by a newly added bpf helper.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
>
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index a0431642c6b5..a603c6fc3f54 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
> return ERR_PTR(-EINVAL);
> }
>
> +/* Get the device number for the current task pidns.
> + */
> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
> +{
> + *inum = ns->inum;
> + *dev = nsfs_mnt->mnt_sb->s_dev;
> +}
> +
> static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry)
> {
> struct inode *inode = d_inode(dentry);
> diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
> index d31cb6215905..b8fc680cdf1a 100644
> --- a/include/linux/proc_ns.h
> +++ b/include/linux/proc_ns.h
> @@ -81,6 +81,7 @@ extern void *ns_get_path(struct path *path, struct
> task_struct *task,
> typedef struct ns_common *ns_get_path_helper_t(void *);
> extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t
> ns_get_cb,
> void *private_data);
> +extern void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev);
>
> extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
> const struct proc_ns_operations *ns_ops);
>
> Could you put the above change and patch #1 and then have
> all your other patches? In your kernel change, please use
> interface function ns_get_inum_dev() to get pidns inode number
> and dev number.
>
> On 9/9/19 6:45 PM, Carlos Antonio Neira Bustos wrote:
> > Thanks a lot, Al Viro and Yonghong for taking the time to review this patch and
> > provide technical insights needed on this one.
> > But how do we move this forward?
> > Al Viro's review is clear that this will not work and we should strip the name
> > resolution code (thanks for your detailed analysis).
> > As there is currently only one instance of the nsfs device on the system,
> > I think we could leave out the retrieval of the pidns device number and address it
> > when the situation changes.
> > What do you think?
> >
> >
> > On Sat, Sep 07, 2019 at 06:34:39AM +0000, Yonghong Song wrote:
> >>
> >>
> >> On 9/6/19 5:10 PM, Al Viro wrote:
> >>> On Fri, Sep 06, 2019 at 11:21:14PM +0000, Yonghong Song wrote:
> >>>
> >>>> -bash-4.4$ readlink /proc/self/ns/pid
> >>>> pid:[4026531836]
> >>>> -bash-4.4$ stat /proc/self/ns/pid
> >>>> File: ‘/proc/self/ns/pid’ -> ‘pid:[4026531836]’
> >>>> Size: 0 Blocks: 0 IO Block: 1024 symbolic link
> >>>> Device: 4h/4d Inode: 344795989 Links: 1
> >>>> Access: (0777/lrwxrwxrwx) Uid: (128203/ yhs) Gid: ( 100/ users)
> >>>> Context: user_u:base_r:base_t
> >>>> Access: 2019-09-06 16:06:09.431616380 -0700
> >>>> Modify: 2019-09-06 16:06:09.431616380 -0700
> >>>> Change: 2019-09-06 16:06:09.431616380 -0700
> >>>> Birth: -
> >>>> -bash-4.4$
> >>>>
> >>>> Based on a discussion with Eric Biederman back in 2019 Linux
> >>>> Plumbers, Eric suggested that to uniquely identify a
> >>>> namespace, device id (major/minor) number should also
> >>>> be included. Although today's kernel implementation
> >>>> has the same device for all namespace pseudo files,
> >>>> but from uapi perspective, device id should be included.
> >>>>
> >>>> That is the reason why we try to get device id which holds
> >>>> pid namespace pseudo file.
> >>>>
> >>>> Do you have a better suggestion on how to get
> >>>> the device id for 'current' pid namespace? Or from design, we
> >>>> really should not care about device id at all?
> >>>
> >>> What the hell is "device id for pid namespace"? This is the
> >>> first time I've heard about that mystery object, so it's
> >>> hard to tell where it could be found.
> >>>
> >>> I can tell you what device numbers are involved in the areas
> >>> you seem to be looking in.
> >>>
> >>> 1) there's whatever device number that gets assigned to
> >>> (this) procfs instance. That, ironically, _is_ per-pidns, but
> >>> that of the procfs instance, not that of your process (and
> >>> those can be different). That's what you get in ->st_dev
> >>> when doing lstat() of anything in /proc (assuming that
> >>> procfs is mounted there, in the first place). NOTE:
> >>> that's lstat(2), not stat(2). stat(1) uses lstat(2),
> >>> unless given -L (in which case it's stat(2) time). The
> >>> difference:
> >>>
> >>> root@kvm1:~# stat /proc/self/ns/pid
> >>> File: /proc/self/ns/pid -> pid:[4026531836]
> >>> Size: 0 Blocks: 0 IO Block: 1024 symbolic link
> >>> Device: 4h/4d Inode: 17396 Links: 1
> >>> Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
> >>> Access: 2019-09-06 19:43:11.871312319 -0400
> >>> Modify: 2019-09-06 19:43:11.871312319 -0400
> >>> Change: 2019-09-06 19:43:11.871312319 -0400
> >>> Birth: -
> >>> root@kvm1:~# stat -L /proc/self/ns/pid
> >>> File: /proc/self/ns/pid
> >>> Size: 0 Blocks: 0 IO Block: 4096 regular empty file
> >>> Device: 3h/3d Inode: 4026531836 Links: 1
> >>> Access: (0444/-r--r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
> >>> Access: 2019-09-06 19:43:15.955313293 -0400
> >>> Modify: 2019-09-06 19:43:15.955313293 -0400
> >>> Change: 2019-09-06 19:43:15.955313293 -0400
> >>> Birth: -
> >>>
> >>> The former is lstat, the latter - stat.
> >>>
> >>> 2) device number of the filesystem where the symlink target lives.
> >>> In this case, it's nsfs and there's only one instance on the entire
> >>> system. _That_ would be obtained by looking at st_dev in stat(2) on
> >>> /proc/self/ns/pid (0:3 above).
> >>>
> >>> 3) device number *OF* the symlink. That would be st_rdev in lstat(2).
> >>> There's none - it's a symlink, not a character or block device. It's
> >>> always zero and always will be zero.
> >>>
> >>> 4) the same for the target; st_rdev in stat(2) results and again,
> >>> there's no such beast - it's neither character nor block device.
> >>>
> >>> Your code is looking at (3). Please, reread any textbook on Unix
> >>> in the section that would cover stat(2) and discussion of the
> >>> difference between st_dev and st_rdev.
> >>>
> >>> I have no idea what Eric had been talking about - it's hard to
> >>> reconstruct by what you said so far. Making nsfs per-userns,
> >>> perhaps? But that makes no sense whatsoever, not that userns
> >>> ever had... Cheap shots aside, I really can't guess what that's
> >>> about. Sorry.
> >>
> >> Thanks for the detailed information. The device number we want
> >> is nsfs. Indeed, currently, there is only one instance
> >> on the entire system. But not exactly sure what is the possibility
> >> to have more than one nsfs device in the future. Maybe per-userns
> >> or any other criteria?
> >>
> >>>
> >>> In any case, pathname resolution is *NOT* for the situations where
> >>> you can't block. Even if it's procfs (and from the same pidns as
> >>> the process) mounted there, there is no promise that the target
> >>> of /proc/self has already been looked up and not evicted from
> >>> memory since then. And in case of cache miss pathwalk will
> >>> have to call ->lookup(), which requires locking the directory
> >>> (rw_sem, shared). You can't do that in such context.
> >>>
> >>> And that doesn't even go into the possibility that process has
> >>> something very different mounted on /proc.
> >>>
> >>> Again, I don't know what it is that you want to get to, but
> >>> I would strongly recommend finding a way to get to that data
> >>> that would not involve going anywhere near pathname resolution.
> >>>
> >>> How would you expect the userland to work with that value,
> >>> whatever it might be? If it's just a 32bit field that will
> >>> never be read, you might as well store there the same value
> >>> you store now (0, that is) in much cheaper and safer way ;-)
> >>
> >> Suppose inside pid namespace, user can pass the device number,
> >> say n1, (`stat -L /proc/self/ns/pid`) to bpf program (through map
> >> or JIT). At runtime, bpf program will try to get device number,
> >> say n2, for the 'current' process. If n1 is not the same as
> >> n2, that means they are not in the same namespace. 'current'
> >> is in the same pid namespace as the user iff
> >> n1 == n2 and also pidns id is the same for 'current' and
> >> the one with `lsns -t pid`.
> >>
> >> Are you aware of any way to get the pidns device number
> >> for 'current' without going through the pathname
> >> lookup?
> >>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox