* Re: [PATCH net-next v3 2/3] flow_offload: support get tcf block immediately
From: wenxu @ 2019-07-27 8:02 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: pablo, fw, netfilter-devel, netdev
In-Reply-To: <20190726175245.4467d94b@cakuba.netronome.com>
On 2019/7/27 8:52, Jakub Kicinski wrote:
> On Fri, 26 Jul 2019 21:34:06 +0800, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> The new flow-indr-block can't get the tcf_block directly,
>> so provide a callback to find the tcf block immediately
>> when the device registers and contains an ingress block.
>>
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
> Please CC people who gave you feedback on your subsequent submissions.
>
>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>> index 66f89bc..3b2e848 100644
>> --- a/include/net/flow_offload.h
>> +++ b/include/net/flow_offload.h
>> @@ -391,6 +391,10 @@ struct flow_indr_block_dev {
>> struct flow_block *flow_block;
>> };
>>
>> +typedef void flow_indr_get_default_block_t(struct flow_indr_block_dev *indr_dev);
>> +
>> +void flow_indr_set_default_block_cb(flow_indr_get_default_block_t *cb);
>> +
>> struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
>>
>> int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
>> diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
>> index 9f1ae67..db8469d 100644
>> --- a/net/core/flow_offload.c
>> +++ b/net/core/flow_offload.c
>> @@ -298,6 +298,14 @@ struct flow_indr_block_dev *
>> }
>> EXPORT_SYMBOL(flow_indr_block_dev_lookup);
>>
>> +static flow_indr_get_default_block_t *flow_indr_get_default_block;
> This static variable which can only be set to the TC's callback really
> is not a great API design :/
So, any advice? Should I just call the function in the tc subsystem under #ifdef NET_CLSXXX?
>
^ permalink raw reply
* Re: [PATCH net-next v3 1/3] flow_offload: move tc indirect block to flow offload
From: wenxu @ 2019-07-27 8:05 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: pablo, fw, netfilter-devel, netdev
In-Reply-To: <20190726175627.7c146f94@cakuba.netronome.com>
On 2019/7/27 8:56, Jakub Kicinski wrote:
> On Fri, 26 Jul 2019 21:34:05 +0800, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> Move the tc indirect block to flow_offload and rename
>> it to flow indirect block. nf_tables can then use the
>> indr block architecture.
>>
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>> index 00b9aab..66f89bc 100644
>> --- a/include/net/flow_offload.h
>> +++ b/include/net/flow_offload.h
>> @@ -4,6 +4,7 @@
>> #include <linux/kernel.h>
>> #include <linux/list.h>
>> #include <net/flow_dissector.h>
>> +#include <linux/rhashtable.h>
>>
>> struct flow_match {
>> struct flow_dissector *dissector;
>> @@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block)
>> INIT_LIST_HEAD(&flow_block->cb_list);
>> }
>>
>> +typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
>> + enum tc_setup_type type, void *type_data);
>> +
>> +struct flow_indr_block_cb {
>> + struct list_head list;
>> + void *cb_priv;
>> + flow_indr_block_bind_cb_t *cb;
>> + void *cb_ident;
>> +};
>> +
>> +typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
>> + struct flow_block *flow_block,
>> + struct flow_indr_block_cb *indr_block_cb,
>> + enum flow_block_command command);
>> +
>> +struct flow_indr_block_dev {
>> + struct rhash_head ht_node;
>> + struct net_device *dev;
>> + unsigned int refcnt;
>> + struct list_head cb_list;
>> + flow_indr_block_ing_cmd_t *ing_cmd_cb;
>> + struct flow_block *flow_block;
> TC can only have one block per device. Now with nftables offload we can
> have multiple blocks. Could you elaborate how this is solved?
>
>> +};
The nftables offload only works on a netdev base chain. Each device is limited to one netdev base chain.
^ permalink raw reply
* Re: next-20190723: bpf/seccomp - systemd/journald issue?
From: Sedat Dilek @ 2019-07-27 8:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Yonghong Song, Alexei Starovoitov, Daniel Borkmann, Martin Lau,
Song Liu, netdev@vger.kernel.org, bpf@vger.kernel.org,
Clang-Built-Linux ML, Kees Cook, Nick Desaulniers,
Nathan Chancellor
In-Reply-To: <CA+icZUXGPCgdJzxTO+8W0EzNLZEQ88J_wusp7fPfEkNE2RoXJA@mail.gmail.com>
On Sat, Jul 27, 2019 at 9:36 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Sat, Jul 27, 2019 at 4:24 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Jul 26, 2019 at 2:19 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > >
> > > On Fri, Jul 26, 2019 at 11:10 PM Yonghong Song <yhs@fb.com> wrote:
> > > >
> > > >
> > > >
> > > > On 7/26/19 2:02 PM, Sedat Dilek wrote:
> > > > > On Fri, Jul 26, 2019 at 10:38 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > > > >>
> > > > >> Hi Yonghong Song,
> > > > >>
> > > > >> On Fri, Jul 26, 2019 at 5:45 PM Yonghong Song <yhs@fb.com> wrote:
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> On 7/26/19 1:26 AM, Sedat Dilek wrote:
> > > > >>>> Hi,
> > > > >>>>
> > > > >>>> I have opened a new issue in the ClangBuiltLinux issue tracker.
> > > > >>>
> > > > >>> Glad to know clang 9 has asm goto support and now It can compile
> > > > >>> kernel again.
> > > > >>>
> > > > >>
> > > > >> Yupp.
> > > > >>
> > > > >>>>
> > > > >>>> I am seeing a problem in the area bpf/seccomp causing
> > > > >>>> systemd/journald/udevd services to fail.
> > > > >>>>
> > > > >>>> [Fri Jul 26 08:08:43 2019] systemd[453]: systemd-udevd.service: Failed
> > > > >>>> to connect stdout to the journal socket, ignoring: Connection refused
> > > > >>>>
> > > > >>>> This happens when I use the (LLVM) LLD ld.lld-9 linker but not with
> > > > >>>> BFD linker ld.bfd on Debian/buster AMD64.
> > > > >>>> In both cases I use clang-9 (prerelease).
> > > > >>>
> > > > >>> Looks like it is a lld bug.
> > > > >>>
> > > > >>> I see the stack trace has __bpf_prog_run32() which is used by
> > > > >>> kernel bpf interpreter. Could you try to enable bpf jit
> > > > >>> sysctl net.core.bpf_jit_enable = 1
> > > > >>> If this passed, it will prove it is interpreter related.
> > > > >>>
> > > > >>
> > > > >> After...
> > > > >>
> > > > >> sysctl -w net.core.bpf_jit_enable=1
> > > > >>
> > > > >> I can start all failed systemd services.
> > > > >>
> > > > >> systemd-journald.service
> > > > >> systemd-udevd.service
> > > > >> haveged.service
> > > > >>
> > > > >> This is in maintenance mode.
> > > > >>
> > > > >> What is next: Do set a permanent sysctl setting for net.core.bpf_jit_enable?
> > > > >>
> > > > >
> > > > > This is what I did:
> > > >
> > > > I probably won't have cycles to debug this potential lld issue.
> > > > Maybe you already did, I suggest you put enough reproducible
> > > > details in the bug you filed against lld so they can take a look.
> > > >
> > >
> > > I understand and will put the journalctl-log into the CBL issue
> > > tracker and update informations.
> > >
> > > Thanks for your help understanding the BPF correlations.
> > >
> > > Is setting 'net.core.bpf_jit_enable = 2' helpful here?
> >
> > jit_enable=1 is enough.
> > Or use CONFIG_BPF_JIT_ALWAYS_ON to workaround.
> >
> > It sounds like clang miscompiles interpreter.
Just to clarify:
This does not happen with clang-9 + ld.bfd (GNU/ld linker).
> > modprobe test_bpf
> > should be able to point out which part of interpreter is broken.
>
> Maybe we need something like...
>
> "bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()"
>
> ...for clang?
>
Not sure if something like GCC's...
-fgcse
Perform a global common subexpression elimination pass. This pass also
performs global constant and copy propagation.
Note: When compiling a program using computed gotos, a GCC extension,
you may get better run-time performance if you disable the global
common subexpression elimination pass by adding -fno-gcse to the
command line.
Enabled at levels -O2, -O3, -Os.
...is available for clang.
I tried the following, hoping to turn off "global common subexpression elimination":
diff --git a/arch/x86/net/Makefile b/arch/x86/net/Makefile
index 383c87300b0d..92f934a1e9ff 100644
--- a/arch/x86/net/Makefile
+++ b/arch/x86/net/Makefile
@@ -3,6 +3,8 @@
# Arch-specific network modules
#
+KBUILD_CFLAGS += -O0
+
ifeq ($(CONFIG_X86_32),y)
obj-$(CONFIG_BPF_JIT) += bpf_jit_comp32.o
else
Still see...
BROKEN: test_bpf: #294 BPF_MAXINSNS: Jump, gap, jump, ... jited:0
- Sedat -
^ permalink raw reply related
* Re: [PATCH] net: key: af_key: Fix possible null-pointer dereferences in pfkey_send_policy_notify()
From: Steffen Klassert @ 2019-07-27 8:32 UTC (permalink / raw)
To: Jeremy Sowden; +Cc: Jia-Ju Bai, herbert, davem, netdev, linux-kernel
In-Reply-To: <20190726201555.GA4745@azazel.net>
On Fri, Jul 26, 2019 at 09:15:55PM +0100, Jeremy Sowden wrote:
> On 2019-07-26, at 11:45:14 +0200, Steffen Klassert wrote:
> > On Wed, Jul 24, 2019 at 05:35:09PM +0800, Jia-Ju Bai wrote:
> > >
> > > diff --git a/net/key/af_key.c b/net/key/af_key.c
> > > index b67ed3a8486c..ced54144d5fd 100644
> > > --- a/net/key/af_key.c
> > > +++ b/net/key/af_key.c
> > > @@ -3087,6 +3087,8 @@ static int pfkey_send_policy_notify(struct xfrm_policy *xp, int dir, const struc
> > > case XFRM_MSG_DELPOLICY:
> > > case XFRM_MSG_NEWPOLICY:
> > > case XFRM_MSG_UPDPOLICY:
> > > + if (!xp)
> > > + break;
> >
> > I think this can not happen. Who sends one of these notifications
> > without a pointer to the policy?
>
> I had a quick grep and found two places where km_policy_notify is passed
> NULL as the policy:
>
> $ grep -rn '\<km_policy_notify(NULL,' net/
> net/xfrm/xfrm_user.c:2154: km_policy_notify(NULL, 0, &c);
> net/key/af_key.c:2788: km_policy_notify(NULL, 0, &c);
>
> They occur in xfrm_flush_policy() and pfkey_spdflush() respectively.
Yes, but these two send a XFRM_MSG_FLUSHPOLICY notify.
This does not trigger the code that is changed here.
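
A minimal userspace sketch of the point above, assuming a simplified model of the notifier: the enum values and demo_* names are hypothetical stand-ins, not the kernel's definitions. Only the shape of the switch mirrors the quoted hunk. It shows that a flush event with a NULL policy never reaches the cases that read the policy.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the XFRM_MSG_* event codes. */
enum demo_event {
	DEMO_MSG_NEWPOLICY,
	DEMO_MSG_DELPOLICY,
	DEMO_MSG_UPDPOLICY,
	DEMO_MSG_FLUSHPOLICY,
};

/* Hypothetical stand-in for struct xfrm_policy. */
struct demo_policy {
	int index;
};

/* Returns 1 if the policy pointer was read, 0 if the event ignored it. */
int demo_send_policy_notify(const struct demo_policy *xp, enum demo_event event)
{
	switch (event) {
	case DEMO_MSG_DELPOLICY:
	case DEMO_MSG_NEWPOLICY:
	case DEMO_MSG_UPDPOLICY:
		/* Only these events read the policy; callers pass a valid xp. */
		(void)xp->index;
		return 1;
	case DEMO_MSG_FLUSHPOLICY:
		/* A flush carries no single policy, so xp may be NULL here. */
		return 0;
	}
	return 0;
}
```

In this model, calling demo_send_policy_notify(NULL, DEMO_MSG_FLUSHPOLICY) is safe because the flush case never touches xp.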
^ permalink raw reply
* Re: [patch iproute2 1/2] tc: action: fix crash caused by incorrect *argv check
From: Jiri Pirko @ 2019-07-27 8:36 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: chrims, netdev, sthemmin, dsahern, alexanderk, mlxsw
In-Reply-To: <20190726124707.2c53d6a4@hermes.lan>
Fri, Jul 26, 2019 at 09:47:07PM CEST, stephen@networkplumber.org wrote:
>On Tue, 23 Jul 2019 13:25:37 +0200
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> One cannot depend on *argv being null in case of no arg is left on the
>> command line. For example in batch mode, this is not always true. Check
>> argc instead to prevent crash.
>>
>> Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
>> Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>> tc/m_action.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tc/m_action.c b/tc/m_action.c
>> index ab6bc0ad28ff..0f9c3a27795d 100644
>> --- a/tc/m_action.c
>> +++ b/tc/m_action.c
>> @@ -222,7 +222,7 @@ done0:
>> goto bad_val;
>> }
>>
>> - if (*argv && strcmp(*argv, "cookie") == 0) {
>> + if (argc && strcmp(*argv, "cookie") == 0) {
>> size_t slen;
>>
>> NEXT_ARG();
>
>
>The logic here is broken at end of file.
>
> do {
> if (getcmdline(&line_next, &len, stdin) == -1)
> lastline = true;
>
> largc_next = makeargs(line_next, largv_next, 100);
> bs_enabled_next = batchsize_enabled(largc_next, largv_next);
> if (bs_enabled) {
> struct batch_
>
>
>getcmdline() will return -1 at end of file.
>The code will call make_args on an uninitialized pointer.
>
>I see lots of other unnecessary complexity in the whole batch logic.
>It needs to be rewritten.
>
>Rather than me fixing the code, I am probably going to revert.
I agree. This is a mess :(
>
>commit 485d0c6001c4aa134b99c86913d6a7089b7b2ab0
>Author: Chris Mi <chrism@mellanox.com>
>Date: Fri Jan 12 14:13:16 2018 +0900
>
> tc: Add batchsize feature for filter and actions
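
The argc-versus-*argv point from the patch description can be sketched in plain C. This is an illustration only: demo_has_cookie() is a hypothetical helper, not a function in iproute2. It assumes the caller keeps argc in sync with argv but does not guarantee a NULL terminator after the last argument, which is the batch-mode situation the patch describes.

```c
#include <string.h>

/*
 * Returns 1 if the next argument is "cookie", 0 otherwise.
 * The argc test comes first, so *argv is never read when no
 * arguments are left, even if argv lacks a NULL terminator.
 */
int demo_has_cookie(int argc, char **argv)
{
	return argc > 0 && strcmp(*argv, "cookie") == 0;
}
```

Testing `*argv` instead would read one slot past the last argument whenever the vector is not NULL-terminated.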
^ permalink raw reply
* Re: [PATCH net-next 3/3] net: dsa: mt7530: Add support for port 5
From: Florian Fainelli @ 2019-07-27 8:42 UTC (permalink / raw)
To: René van Dorst, netdev
Cc: frank-w, sean.wang, linux, davem, matthias.bgg, andrew,
vivien.didelot, john, linux-mediatek, linux-mips, robh+dt,
devicetree
In-Reply-To: <20190724192549.24615-4-opensource@vdorst.com>
On 7/24/2019 9:25 PM, René van Dorst wrote:
> Adding support for port 5.
>
> Port 5 can muxed/interface to:
> - internal 5th GMAC of the switch; can be used as 2nd CPU port or as
> extra port with an external phy for a 6th ethernet port.
> - internal PHY of port 0 or 4; Used in most applications so that port 0
> or 4 is the WAN port and interfaces with the 2nd GMAC of the SOC.
>
> Signed-off-by: René van Dorst <opensource@vdorst.com>
[snip]
> + /* Setup port 5 */
> + priv->p5_intf_sel = P5_DISABLED;
> + interface = PHY_INTERFACE_MODE_NA;
> +
> + if (!dsa_is_unused_port(ds, 5)) {
> + priv->p5_intf_sel = P5_INTF_SEL_GMAC5;
> + interface = of_get_phy_mode(ds->ports[5].dn);
> + } else {
> + /* Scan the ethernet nodes. Look for GMAC1, Lookup used phy */
> + for_each_child_of_node(dn, mac_np) {
> + if (!of_device_is_compatible(mac_np,
> + "mediatek,eth-mac"))
> + continue;
> + _id = of_get_property(mac_np, "reg", NULL);
> + if (be32_to_cpup(_id) != 1)
> + continue;
> +
> + interface = of_get_phy_mode(mac_np);
> + phy_node = of_parse_phandle(mac_np, "phy-handle", 0);
> +
> + if (phy_node->parent == priv->dev->of_node->parent) {
> + _id = of_get_property(phy_node, "reg", NULL);
> + id = be32_to_cpup(_id);
> + if (id == 0)
> + priv->p5_intf_sel = P5_INTF_SEL_PHY_P0;
> + if (id == 4)
> + priv->p5_intf_sel = P5_INTF_SEL_PHY_P4;
Can you use of_mdio_parse_addr() here?
--
Florian
^ permalink raw reply
* Re: [PATCH] isdn/gigaset: check endpoint null in gigaset_probe
From: Paul Bolle @ 2019-07-27 9:36 UTC (permalink / raw)
To: Phong Tran, isdn, gregkh
Cc: gigaset307x-common, netdev, linux-kernel, linux-kernel-mentees,
syzbot+35b1c403a14f5c89eba7
In-Reply-To: <24cd0b70-45e6-ea98-fc8f-b25fbf6e817f@gmail.com>
Hi Phong,
Phong Tran wrote on Sat 27-07-2019 at 08:56 [+0700]:
> On 7/26/19 9:22 PM, Paul Bolle wrote:
> > Phong Tran wrote on Fri 26-07-2019 at 20:35 [+0700]:
> > > diff --git a/drivers/isdn/gigaset/usb-gigaset.c b/drivers/isdn/gigaset/usb-gigaset.c
> > > index 1b9b43659bdf..2e011f3db59e 100644
> > > --- a/drivers/isdn/gigaset/usb-gigaset.c
> > > +++ b/drivers/isdn/gigaset/usb-gigaset.c
> > > @@ -703,6 +703,10 @@ static int gigaset_probe(struct usb_interface *interface,
> > > usb_set_intfdata(interface, cs);
> > >
> > > endpoint = &hostif->endpoint[0].desc;
> > > + if (!endpoint) {
> > > + dev_err(cs->dev, "Couldn't get control endpoint\n");
> > > + return -ENODEV;
> > > + }
> >
> > When can this happen? Is this one of those bugs that one can only trigger with
> > a specially crafted (evil) usb device?
> >
>
> Yes, in my understanding, this only happens with random test of syzbot.
Looking at this again, I note the code is taking the address of a struct
usb_endpoint_descriptor that's stored somewhere in memory. That address can't
be NULL, can it?
So I haven't even looked at the fuzzer's report here, but I don't see how this
patch could help. It only adds dead code. Am I missing something and should I
drink even more coffee this Saturday morning?
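
A small userspace sketch of the address-of argument, using hypothetical demo_* types rather than the USB core's structures. It assumes the endpoint storage is an embedded array, which is a simplification of the real layout. Under that assumption the address of an element's member is never NULL, so testing the pointer is dead code. The count check is shown only as one hypothetical alternative that can actually fail; whether it matches the fuzzer report is not established here.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the USB descriptor types. */
struct demo_endpoint_desc {
	unsigned short wMaxPacketSize;
};

struct demo_host_endpoint {
	struct demo_endpoint_desc desc;
};

struct demo_host_interface {
	unsigned char num_endpoints;           /* endpoints actually present */
	struct demo_host_endpoint endpoint[2]; /* embedded storage (assumption) */
};

/* Address of a member inside a live object: this cannot be NULL. */
const struct demo_endpoint_desc *
demo_first_desc(const struct demo_host_interface *hostif)
{
	return &hostif->endpoint[0].desc;
}

/* Hypothetical alternative: validate the count. Returns 0 or -1. */
int demo_check_endpoints(const struct demo_host_interface *hostif,
			 unsigned int needed)
{
	return hostif->num_endpoints >= needed ? 0 : -1;
}
```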
> > > buffer_size = le16_to_cpu(endpoint->wMaxPacketSize);
> > > ucs->bulk_out_size = buffer_size;
> > > @@ -722,6 +726,11 @@ static int gigaset_probe(struct usb_interface *interface,
> > >
> > Please note that I'm very close to getting cut off from the ISDN network, so
> > the chances of being able to test this on a live system are getting small.
> >
>
> This bug can be invalid now. Do you agree?
It's just that your patch arrived while I was busy doing my last ever test of
the gigaset driver. So please don't expect me to put much time in this report
(see
https://lwn.net/ml/linux-kernel/20190726220541.28783-1-pebolle%40tiscali.nl/
).
Thanks,
Paul Bolle
^ permalink raw reply
* [PATCH net] net: phylink: Fix flow control for fixed-link
From: René van Dorst @ 2019-07-27 9:40 UTC (permalink / raw)
To: Russell King, Andrew Lunn, Florian Fainelli, Heiner Kallweit,
David S . Miller
Cc: netdev, René van Dorst
In phylink_parse_fixedlink() the pl->link_config.advertising bits are ANDed
with pl->supported. At that point pl->supported has been zeroed and holds
only the speed/duplex mode and MII bits, so pl->link_config.advertising
always loses the flow control/pause bits.
By setting the Pause and Asym_Pause bits in pl->supported, flow control
works again when "pause" is set in the devicetree fixed-link node and the
MAC advertises that it supports pause.
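
The masking effect can be sketched with plain bit flags. This is a simplified model, not the phylink code: the DEMO_BIT_* values and demo_fixedlink_advertising() are hypothetical, and a uint32_t stands in for the kernel's link mode bitmap.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define DEMO_BIT_MII        (1u << 0)
#define DEMO_BIT_1000FULL   (1u << 1)
#define DEMO_BIT_PAUSE      (1u << 2)
#define DEMO_BIT_ASYM_PAUSE (1u << 3)

/*
 * Models the fixed-link path: rebuild "supported" from scratch,
 * then AND the advertising mask with it. keep_pause selects
 * whether the two pause bits are added, as the patch does.
 */
uint32_t demo_fixedlink_advertising(uint32_t advertising,
				    uint32_t speed_bit, int keep_pause)
{
	uint32_t supported = 0;                 /* models linkmode_zero() */

	supported |= DEMO_BIT_MII;
	if (keep_pause)
		supported |= DEMO_BIT_PAUSE | DEMO_BIT_ASYM_PAUSE;
	supported |= speed_bit;

	return advertising & supported;         /* pause survives only if supported */
}
```

With keep_pause set to 0 the pause bits in the advertising mask are cleared by the AND; with keep_pause set to 1 they pass through unchanged.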
Results with this patch.
Legend:
- DT = 'Pause' is set in the fixed-link in devicetree.
- validate() = 'Yes' means phylink_set(mask, Pause) is called in
validate().
- flow = result reported by the "Link is Up" line.
+-----+------------+-------+
| DT | validate() | flow |
+-----+------------+-------+
| Yes | Yes | rx/tx |
| No | Yes | off |
| Yes | No | off |
+-----+------------+-------+
Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: René van Dorst <opensource@vdorst.com>
---
drivers/net/phy/phylink.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 5d0af041b8f9..a6aebaa14338 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -216,6 +216,8 @@ static int phylink_parse_fixedlink(struct phylink *pl,
pl->supported, true);
linkmode_zero(pl->supported);
phylink_set(pl->supported, MII);
+ phylink_set(pl->supported, Pause);
+ phylink_set(pl->supported, Asym_Pause);
if (s) {
__set_bit(s->bit, pl->supported);
} else {
--
2.20.1
^ permalink raw reply related
* [patch net-next 0/3] net: devlink: Finish network namespace support
From: Jiri Pirko @ 2019-07-27 9:44 UTC (permalink / raw)
To: netdev; +Cc: davem, jakub.kicinski, sthemmin, dsahern, mlxsw
From: Jiri Pirko <jiri@mellanox.com>
Devlink has accounted for network namespaces from the beginning, but the
instances have been fixed to init_net. The first patch allows the user
to move existing devlink instances into namespaces:
$ devlink dev
netdevsim/netdevsim1
$ ip netns add ns1
$ devlink dev set netdevsim/netdevsim1 netns ns1
$ devlink -N ns1 dev
netdevsim/netdevsim1
The last patch allows the user to create a new netdevsim instance directly
inside the network namespace of the caller.
Jiri Pirko (3):
net: devlink: allow to change namespaces
net: devlink: export devlink net set/get helpers
netdevsim: create devlink and netdev instances in namespace
drivers/net/netdevsim/bus.c | 1 +
drivers/net/netdevsim/dev.c | 17 ++--
drivers/net/netdevsim/netdev.c | 4 +-
drivers/net/netdevsim/netdevsim.h | 5 +-
include/net/devlink.h | 3 +
include/uapi/linux/devlink.h | 4 +
net/core/devlink.c | 128 ++++++++++++++++++++++++++++--
7 files changed, 148 insertions(+), 14 deletions(-)
--
2.21.0
^ permalink raw reply
* [patch net-next 1/3] net: devlink: allow to change namespaces
From: Jiri Pirko @ 2019-07-27 9:44 UTC (permalink / raw)
To: netdev; +Cc: davem, jakub.kicinski, sthemmin, dsahern, mlxsw
In-Reply-To: <20190727094459.26345-1-jiri@resnulli.us>
From: Jiri Pirko <jiri@mellanox.com>
All devlink instances are created in init_net and stay there for their
lifetime. Allow the user to move devlink instances into
namespaces.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
include/uapi/linux/devlink.h | 4 ++
net/core/devlink.c | 112 ++++++++++++++++++++++++++++++++++-
2 files changed, 113 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index ffc993256527..95f0a1edab99 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -348,6 +348,10 @@ enum devlink_attr {
DEVLINK_ATTR_PORT_PCI_PF_NUMBER, /* u16 */
DEVLINK_ATTR_PORT_PCI_VF_NUMBER, /* u16 */
+ DEVLINK_ATTR_NETNS_FD, /* u32 */
+ DEVLINK_ATTR_NETNS_PID, /* u32 */
+ DEVLINK_ATTR_NETNS_ID, /* u32 */
+
/* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 4f40aeace902..ec024462e7d4 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -439,8 +439,16 @@ static void devlink_nl_post_doit(const struct genl_ops *ops,
{
struct devlink *devlink;
- devlink = devlink_get_from_info(info);
- if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
+ /* When devlink changes netns, it would not be found
+ * by devlink_get_from_info(). So try if it is stored first.
+ */
+ if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK) {
+ devlink = info->user_ptr[0];
+ } else {
+ devlink = devlink_get_from_info(info);
+ WARN_ON(IS_ERR(devlink));
+ }
+ if (!IS_ERR(devlink) && ~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
mutex_unlock(&devlink->lock);
mutex_unlock(&devlink_mutex);
}
@@ -645,6 +653,70 @@ static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info)
return genlmsg_reply(msg, info);
}
+static struct net *devlink_netns_get(struct sk_buff *skb,
+ struct devlink *devlink,
+ struct genl_info *info)
+{
+ struct nlattr *netns_pid_attr = info->attrs[DEVLINK_ATTR_NETNS_PID];
+ struct nlattr *netns_fd_attr = info->attrs[DEVLINK_ATTR_NETNS_FD];
+ struct nlattr *netns_id_attr = info->attrs[DEVLINK_ATTR_NETNS_ID];
+ struct net *net;
+
+ if ((netns_pid_attr && (netns_fd_attr || netns_id_attr)) ||
+ (netns_fd_attr && (netns_pid_attr || netns_id_attr)) ||
+ (netns_id_attr && (netns_pid_attr || netns_fd_attr))) {
+ NL_SET_ERR_MSG(info->extack, "multiple netns identifying attributes specified");
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (netns_pid_attr) {
+ net = get_net_ns_by_pid(nla_get_u32(netns_pid_attr));
+ } else if (netns_fd_attr) {
+ net = get_net_ns_by_fd(nla_get_u32(netns_fd_attr));
+ } else if (netns_id_attr) {
+ net = get_net_ns_by_id(sock_net(skb->sk),
+ nla_get_u32(netns_id_attr));
+ if (!net)
+ net = ERR_PTR(-EINVAL);
+ }
+ if (IS_ERR(net)) {
+ NL_SET_ERR_MSG(info->extack, "Unknown network namespace");
+ return ERR_PTR(-EINVAL);
+ }
+ if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
+ put_net(net);
+ return ERR_PTR(-EPERM);
+ }
+ return net;
+}
+
+static void devlink_netns_change(struct devlink *devlink, struct net *net)
+{
+ if (net_eq(devlink_net(devlink), net))
+ return;
+ devlink_notify(devlink, DEVLINK_CMD_DEL);
+ devlink_net_set(devlink, net);
+ devlink_notify(devlink, DEVLINK_CMD_NEW);
+}
+
+static int devlink_nl_cmd_set_doit(struct sk_buff *skb, struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+
+ if (info->attrs[DEVLINK_ATTR_NETNS_PID] ||
+ info->attrs[DEVLINK_ATTR_NETNS_FD] ||
+ info->attrs[DEVLINK_ATTR_NETNS_ID]) {
+ struct net *net;
+
+ net = devlink_netns_get(skb, devlink, info);
+ if (IS_ERR(net))
+ return PTR_ERR(net);
+ devlink_netns_change(devlink, net);
+ put_net(net);
+ }
+ return 0;
+}
+
static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg,
struct netlink_callback *cb)
{
@@ -5184,6 +5256,9 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = { .type = NLA_NUL_STRING },
+ [DEVLINK_ATTR_NETNS_PID] = { .type = NLA_U32 },
+ [DEVLINK_ATTR_NETNS_FD] = { .type = NLA_U32 },
+ [DEVLINK_ATTR_NETNS_ID] = { .type = NLA_U32 },
};
static const struct genl_ops devlink_nl_ops[] = {
@@ -5195,6 +5270,13 @@ static const struct genl_ops devlink_nl_ops[] = {
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
+ {
+ .cmd = DEVLINK_CMD_SET,
+ .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+ .doit = devlink_nl_cmd_set_doit,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ },
{
.cmd = DEVLINK_CMD_PORT_GET,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
@@ -6955,9 +7037,33 @@ int devlink_compat_switch_id_get(struct net_device *dev,
return 0;
}
+static void __net_exit devlink_pernet_exit(struct net *net)
+{
+ struct devlink *devlink;
+
+ mutex_lock(&devlink_mutex);
+ list_for_each_entry(devlink, &devlink_list, list)
+ if (net_eq(devlink_net(devlink), net))
+ devlink_netns_change(devlink, &init_net);
+ mutex_unlock(&devlink_mutex);
+}
+
+static struct pernet_operations __net_initdata devlink_pernet_ops = {
+ .exit = devlink_pernet_exit,
+};
+
static int __init devlink_init(void)
{
- return genl_register_family(&devlink_nl_family);
+ int err;
+
+ err = genl_register_family(&devlink_nl_family);
+ if (err)
+ goto out;
+ err = register_pernet_device(&devlink_pernet_ops);
+
+out:
+ WARN_ON(err);
+ return err;
}
subsys_initcall(devlink_init);
--
2.21.0
^ permalink raw reply related
* [patch net-next 2/3] net: devlink: export devlink net set/get helpers
From: Jiri Pirko @ 2019-07-27 9:44 UTC (permalink / raw)
To: netdev; +Cc: davem, jakub.kicinski, sthemmin, dsahern, mlxsw
In-Reply-To: <20190727094459.26345-1-jiri@resnulli.us>
From: Jiri Pirko <jiri@mellanox.com>
Allow drivers to set/get the net struct for a devlink instance. Set is only
allowed for a newly allocated devlink instance.
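
The "set only before registration" rule can be modelled in userspace as follows. The demo_* types and functions are hypothetical stand-ins; the patch itself uses WARN_ON() and returns void, whereas this sketch returns a bool so the outcome is observable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net and struct devlink. */
struct demo_net {
	int id;
};

struct demo_devlink {
	struct demo_net *net;
	bool registered;
};

/* Returns true if the namespace was changed, false if refused. */
bool demo_devlink_net_set(struct demo_devlink *dl, struct demo_net *net)
{
	if (dl->registered)
		return false;	/* the patch warns and returns at this point */
	dl->net = net;
	return true;
}

/* Models registration: after this, the exported setter refuses changes. */
void demo_devlink_register(struct demo_devlink *dl)
{
	dl->registered = true;
}
```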
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
include/net/devlink.h | 3 +++
net/core/devlink.c | 18 ++++++++++++++----
2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index bc36f942a7d5..98b89eabd73a 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -35,6 +35,7 @@ struct devlink {
struct device *dev;
possible_net_t _net;
struct mutex lock;
+ bool registered;
char priv[0] __aligned(NETDEV_ALIGN);
};
@@ -591,6 +592,8 @@ static inline struct devlink *netdev_to_devlink(struct net_device *dev)
struct ib_device;
+struct net *devlink_net(const struct devlink *devlink);
+void devlink_net_set(struct devlink *devlink, struct net *net);
struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
int devlink_register(struct devlink *devlink, struct device *dev);
void devlink_unregister(struct devlink *devlink);
diff --git a/net/core/devlink.c b/net/core/devlink.c
index ec024462e7d4..ad57058ed0d5 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -92,16 +92,25 @@ static LIST_HEAD(devlink_list);
*/
static DEFINE_MUTEX(devlink_mutex);
-static struct net *devlink_net(const struct devlink *devlink)
+struct net *devlink_net(const struct devlink *devlink)
{
return read_pnet(&devlink->_net);
}
+EXPORT_SYMBOL_GPL(devlink_net);
-static void devlink_net_set(struct devlink *devlink, struct net *net)
+static void __devlink_net_set(struct devlink *devlink, struct net *net)
{
write_pnet(&devlink->_net, net);
}
+void devlink_net_set(struct devlink *devlink, struct net *net)
+{
+ if (WARN_ON(devlink->registered))
+ return;
+ __devlink_net_set(devlink, net);
+}
+EXPORT_SYMBOL_GPL(devlink_net_set);
+
static struct devlink *devlink_get_from_attrs(struct net *net,
struct nlattr **attrs)
{
@@ -695,7 +704,7 @@ static void devlink_netns_change(struct devlink *devlink, struct net *net)
if (net_eq(devlink_net(devlink), net))
return;
devlink_notify(devlink, DEVLINK_CMD_DEL);
- devlink_net_set(devlink, net);
+ __devlink_net_set(devlink, net);
devlink_notify(devlink, DEVLINK_CMD_NEW);
}
@@ -5602,7 +5611,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
if (!devlink)
return NULL;
devlink->ops = ops;
- devlink_net_set(devlink, &init_net);
+ __devlink_net_set(devlink, &init_net);
INIT_LIST_HEAD(&devlink->port_list);
INIT_LIST_HEAD(&devlink->sb_list);
INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
@@ -5626,6 +5635,7 @@ int devlink_register(struct devlink *devlink, struct device *dev)
{
mutex_lock(&devlink_mutex);
devlink->dev = dev;
+ devlink->registered = true;
list_add_tail(&devlink->list, &devlink_list);
devlink_notify(devlink, DEVLINK_CMD_NEW);
mutex_unlock(&devlink_mutex);
--
2.21.0
^ permalink raw reply related
* [patch net-next 3/3] netdevsim: create devlink and netdev instances in namespace
From: Jiri Pirko @ 2019-07-27 9:44 UTC (permalink / raw)
To: netdev; +Cc: davem, jakub.kicinski, sthemmin, dsahern, mlxsw
In-Reply-To: <20190727094459.26345-1-jiri@resnulli.us>
From: Jiri Pirko <jiri@mellanox.com>
When the user creates a new netdevsim instance using the sysfs bus file,
create the devlink instance and the related netdev instance in the namespace
of the caller.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
drivers/net/netdevsim/bus.c | 1 +
drivers/net/netdevsim/dev.c | 17 +++++++++++------
drivers/net/netdevsim/netdev.c | 4 +++-
drivers/net/netdevsim/netdevsim.h | 5 ++++-
4 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/drivers/net/netdevsim/bus.c b/drivers/net/netdevsim/bus.c
index 1a0ff3d7747b..6aeed0c600f8 100644
--- a/drivers/net/netdevsim/bus.c
+++ b/drivers/net/netdevsim/bus.c
@@ -283,6 +283,7 @@ nsim_bus_dev_new(unsigned int id, unsigned int port_count)
nsim_bus_dev->dev.bus = &nsim_bus;
nsim_bus_dev->dev.type = &nsim_bus_dev_type;
nsim_bus_dev->port_count = port_count;
+ nsim_bus_dev->initial_net = current->nsproxy->net_ns;
err = device_register(&nsim_bus_dev->dev);
if (err)
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index c5c417a3c0ce..685dd21f5500 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -268,7 +268,8 @@ static const struct devlink_ops nsim_dev_devlink_ops = {
};
static struct nsim_dev *
-nsim_dev_create(struct nsim_bus_dev *nsim_bus_dev, unsigned int port_count)
+nsim_dev_create(struct net *net, struct nsim_bus_dev *nsim_bus_dev,
+ unsigned int port_count)
{
struct nsim_dev *nsim_dev;
struct devlink *devlink;
@@ -277,6 +278,7 @@ nsim_dev_create(struct nsim_bus_dev *nsim_bus_dev, unsigned int port_count)
devlink = devlink_alloc(&nsim_dev_devlink_ops, sizeof(*nsim_dev));
if (!devlink)
return ERR_PTR(-ENOMEM);
+ devlink_net_set(devlink, net);
nsim_dev = devlink_priv(devlink);
nsim_dev->nsim_bus_dev = nsim_bus_dev;
nsim_dev->switch_id.id_len = sizeof(nsim_dev->switch_id.id);
@@ -335,7 +337,7 @@ static void nsim_dev_destroy(struct nsim_dev *nsim_dev)
devlink_free(devlink);
}
-static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
+static int __nsim_dev_port_add(struct net *net, struct nsim_dev *nsim_dev,
unsigned int port_index)
{
struct nsim_dev_port *nsim_dev_port;
@@ -361,7 +363,7 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
if (err)
goto err_dl_port_unregister;
- nsim_dev_port->ns = nsim_create(nsim_dev, nsim_dev_port);
+ nsim_dev_port->ns = nsim_create(net, nsim_dev, nsim_dev_port);
if (IS_ERR(nsim_dev_port->ns)) {
err = PTR_ERR(nsim_dev_port->ns);
goto err_port_debugfs_exit;
@@ -404,17 +406,19 @@ static void nsim_dev_port_del_all(struct nsim_dev *nsim_dev)
int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
{
+ struct net *initial_net = nsim_bus_dev->initial_net;
struct nsim_dev *nsim_dev;
int i;
int err;
- nsim_dev = nsim_dev_create(nsim_bus_dev, nsim_bus_dev->port_count);
+ nsim_dev = nsim_dev_create(initial_net, nsim_bus_dev,
+ nsim_bus_dev->port_count);
if (IS_ERR(nsim_dev))
return PTR_ERR(nsim_dev);
dev_set_drvdata(&nsim_bus_dev->dev, nsim_dev);
for (i = 0; i < nsim_bus_dev->port_count; i++) {
- err = __nsim_dev_port_add(nsim_dev, i);
+ err = __nsim_dev_port_add(initial_net, nsim_dev, i);
if (err)
goto err_port_del_all;
}
@@ -449,13 +453,14 @@ int nsim_dev_port_add(struct nsim_bus_dev *nsim_bus_dev,
unsigned int port_index)
{
struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
+ struct net *net = devlink_net(priv_to_devlink(nsim_dev));
int err;
mutex_lock(&nsim_dev->port_list_lock);
if (__nsim_dev_port_lookup(nsim_dev, port_index))
err = -EEXIST;
else
- err = __nsim_dev_port_add(nsim_dev, port_index);
+ err = __nsim_dev_port_add(net, nsim_dev, port_index);
mutex_unlock(&nsim_dev->port_list_lock);
return err;
}
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 0740940f41b1..25c7de7a4a31 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -280,7 +280,8 @@ static void nsim_setup(struct net_device *dev)
}
struct netdevsim *
-nsim_create(struct nsim_dev *nsim_dev, struct nsim_dev_port *nsim_dev_port)
+nsim_create(struct net *net, struct nsim_dev *nsim_dev,
+ struct nsim_dev_port *nsim_dev_port)
{
struct net_device *dev;
struct netdevsim *ns;
@@ -290,6 +291,7 @@ nsim_create(struct nsim_dev *nsim_dev, struct nsim_dev_port *nsim_dev_port)
if (!dev)
return ERR_PTR(-ENOMEM);
+ dev_net_set(dev, net);
ns = netdev_priv(dev);
ns->netdev = dev;
ns->nsim_dev = nsim_dev;
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 79c05af2a7c0..cdf53d0e0c49 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -19,6 +19,7 @@
#include <linux/netdevice.h>
#include <linux/u64_stats_sync.h>
#include <net/devlink.h>
+#include <net/net_namespace.h>
#include <net/xdp.h>
#define DRV_NAME "netdevsim"
@@ -75,7 +76,8 @@ struct netdevsim {
};
struct netdevsim *
-nsim_create(struct nsim_dev *nsim_dev, struct nsim_dev_port *nsim_dev_port);
+nsim_create(struct net *net, struct nsim_dev *nsim_dev,
+ struct nsim_dev_port *nsim_dev_port);
void nsim_destroy(struct netdevsim *ns);
#ifdef CONFIG_BPF_SYSCALL
@@ -213,6 +215,7 @@ struct nsim_bus_dev {
struct device dev;
struct list_head list;
unsigned int port_count;
+ struct net *initial_net;
unsigned int num_vfs;
struct nsim_vf_config *vfconfigs;
};
--
2.21.0
^ permalink raw reply related
* Re: KASAN: use-after-free Read in lock_sock_nested
From: syzbot @ 2019-07-27 9:45 UTC (permalink / raw)
To: davem, linux-hams, linux-kernel, netdev, ralf, syzkaller-bugs
In-Reply-To: <0000000000007a5aad057e7748c9@google.com>
syzbot has found a reproducer for the following crash on:
HEAD commit: 3ea54d9b Merge tag 'docs-5.3-1' of git://git.lwn.net/linux
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16a66564600000
kernel config: https://syzkaller.appspot.com/x/.config?x=195ab3ca46c2e324
dashboard link: https://syzkaller.appspot.com/bug?extid=500c69d1e21d970e461b
compiler: clang version 9.0.0 (/home/glider/llvm/clang
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=145318b4600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=14ac7b78600000
Bisection is inconclusive: the bug happens on the oldest tested release.
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=11c610a7200000
final crash: https://syzkaller.appspot.com/x/report.txt?x=13c610a7200000
console output: https://syzkaller.appspot.com/x/log.txt?x=15c610a7200000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+500c69d1e21d970e461b@syzkaller.appspotmail.com
==================================================================
BUG: KASAN: use-after-free in debug_spin_lock_before
kernel/locking/spinlock_debug.c:83 [inline]
BUG: KASAN: use-after-free in do_raw_spin_lock+0x295/0x3a0
kernel/locking/spinlock_debug.c:112
Read of size 4 at addr ffff88809f0acf0c by task syz-executor847/10804
CPU: 0 PID: 10804 Comm: syz-executor847 Not tainted 5.3.0-rc1+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113
print_address_description+0x75/0x5b0 mm/kasan/report.c:351
__kasan_report+0x14b/0x1c0 mm/kasan/report.c:482
kasan_report+0x26/0x50 mm/kasan/common.c:612
__asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
do_raw_spin_lock+0x295/0x3a0 kernel/locking/spinlock_debug.c:112
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline]
_raw_spin_lock_bh+0x40/0x50 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:343 [inline]
lock_sock_nested+0x45/0x120 net/core/sock.c:2917
lock_sock include/net/sock.h:1522 [inline]
nr_getname+0x5b/0x220 net/netrom/af_netrom.c:838
__sys_accept4+0x63a/0x9a0 net/socket.c:1759
__do_sys_accept4 net/socket.c:1789 [inline]
__se_sys_accept4 net/socket.c:1786 [inline]
__x64_sys_accept4+0x9a/0xb0 net/socket.c:1786
do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4480e9
Code: e8 ac e7 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 4b 06 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f43bf6ced88 EFLAGS: 00000246 ORIG_RAX: 0000000000000120
RAX: ffffffffffffffda RBX: 00000000006ddc38 RCX: 00000000004480e9
RDX: 0000000000000000 RSI: 0000000020000b00 RDI: 0000000000000004
RBP: 00000000006ddc30 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006ddc3c
R13: 00007ffd18de174f R14: 00007f43bf6cf9c0 R15: 00000000006ddc3c
Allocated by task 0:
save_stack mm/kasan/common.c:69 [inline]
set_track mm/kasan/common.c:77 [inline]
__kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:487
kasan_kmalloc+0x9/0x10 mm/kasan/common.c:501
__do_kmalloc mm/slab.c:3655 [inline]
__kmalloc+0x254/0x340 mm/slab.c:3664
kmalloc include/linux/slab.h:557 [inline]
sk_prot_alloc+0xb0/0x290 net/core/sock.c:1603
sk_alloc+0x38/0x950 net/core/sock.c:1657
nr_make_new net/netrom/af_netrom.c:476 [inline]
nr_rx_frame+0xabc/0x1e40 net/netrom/af_netrom.c:959
nr_loopback_timer+0x6a/0x140 net/netrom/nr_loopback.c:59
call_timer_fn+0xec/0x200 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers+0x7cd/0x9c0 kernel/time/timer.c:1685
run_timer_softirq+0x4a/0x90 kernel/time/timer.c:1698
__do_softirq+0x333/0x7c4 arch/x86/include/asm/paravirt.h:778
Freed by task 10804:
save_stack mm/kasan/common.c:69 [inline]
set_track mm/kasan/common.c:77 [inline]
__kasan_slab_free+0x12a/0x1e0 mm/kasan/common.c:449
kasan_slab_free+0xe/0x10 mm/kasan/common.c:457
__cache_free mm/slab.c:3425 [inline]
kfree+0x115/0x200 mm/slab.c:3756
sk_prot_free net/core/sock.c:1640 [inline]
__sk_destruct+0x567/0x660 net/core/sock.c:1726
sk_destruct net/core/sock.c:1734 [inline]
__sk_free+0x317/0x3e0 net/core/sock.c:1745
sk_free net/core/sock.c:1756 [inline]
sock_put include/net/sock.h:1725 [inline]
sock_efree+0x60/0x80 net/core/sock.c:2042
skb_release_head_state+0x100/0x220 net/core/skbuff.c:652
skb_release_all net/core/skbuff.c:663 [inline]
__kfree_skb+0x25/0x170 net/core/skbuff.c:679
kfree_skb+0x6f/0xb0 net/core/skbuff.c:697
nr_accept+0x4ef/0x650 net/netrom/af_netrom.c:819
__sys_accept4+0x5bc/0x9a0 net/socket.c:1754
__do_sys_accept4 net/socket.c:1789 [inline]
__se_sys_accept4 net/socket.c:1786 [inline]
__x64_sys_accept4+0x9a/0xb0 net/socket.c:1786
do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff88809f0ace80
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 140 bytes inside of
2048-byte region [ffff88809f0ace80, ffff88809f0ad680)
The buggy address belongs to the page:
page:ffffea00027c2b00 refcount:1 mapcount:0 mapping:ffff8880aa400e00
index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea0002704708 ffffea0002695508 ffff8880aa400e00
raw: 0000000000000000 ffff88809f0ac600 0000000100000003 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88809f0ace00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88809f0ace80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88809f0acf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88809f0acf80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88809f0ad000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
------------[ cut here ]------------
ODEBUG: activate not available (active state 0) object type: timer_list
hint: nr_t1timer_expiry+0x0/0x400 net/netrom/nr_timer.c:46
WARNING: CPU: 0 PID: 10804 at lib/debugobjects.c:484 debug_print_object
lib/debugobjects.c:481 [inline]
WARNING: CPU: 0 PID: 10804 at lib/debugobjects.c:484
debug_object_activate+0x33d/0x6f0 lib/debugobjects.c:680
Modules linked in:
CPU: 0 PID: 10804 Comm: syz-executor847 Tainted: G B
5.3.0-rc1+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:debug_print_object lib/debugobjects.c:481 [inline]
RIP: 0010:debug_object_activate+0x33d/0x6f0 lib/debugobjects.c:680
Code: f7 e8 f7 01 4a fe 4d 8b 06 48 c7 c7 ca 56 88 88 48 c7 c6 f0 2d a1 88
48 c7 c2 e3 69 81 88 31 c9 49 89 d9 31 c0 e8 63 6d e0 fd <0f> 0b 48 ba 00
00 00 00 00 fc ff df ff 05 65 92 95 05 49 83 c6 20
RSP: 0018:ffff88809633faa8 EFLAGS: 00010046
RAX: a65408733c6cb800 RBX: ffffffff86dc84e0 RCX: ffff8880a8b08440
RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000
RBP: ffff88809633faf0 R08: ffffffff816063f4 R09: ffffed1015d440c2
R10: ffffed1015d440c2 R11: 0000000000000000 R12: ffff8880a10bfd70
R13: 1ffff11014217fae R14: ffffffff88cd9fc0 R15: ffff88809f0ad358
FS: 00007f43bf6cf700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000008c7b1000 CR4: 00000000001406f0
Call Trace:
debug_timer_activate kernel/time/timer.c:710 [inline]
__mod_timer+0x960/0x16e0 kernel/time/timer.c:1035
mod_timer+0x1f/0x30 kernel/time/timer.c:1096
sk_reset_timer+0x22/0x50 net/core/sock.c:2821
nr_start_t1timer+0x78/0x90 net/netrom/nr_timer.c:52
nr_release+0x238/0x390 net/netrom/af_netrom.c:537
__sock_release net/socket.c:590 [inline]
sock_close+0xe1/0x260 net/socket.c:1268
__fput+0x2e4/0x740 fs/file_table.c:280
____fput+0x15/0x20 fs/file_table.c:313
task_work_run+0x17e/0x1b0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:163 [inline]
prepare_exit_to_usermode+0x459/0x580 arch/x86/entry/common.c:194
syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:274
do_syscall_64+0x126/0x140 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4480e9
Code: e8 ac e7 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 4b 06 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f43bf6ced88 EFLAGS: 00000246 ORIG_RAX: 0000000000000120
RAX: fffffffffffffff2 RBX: 00000000006ddc38 RCX: 00000000004480e9
RDX: 0000000000000000 RSI: 0000000020000b00 RDI: 0000000000000004
RBP: 00000000006ddc30 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006ddc3c
R13: 00007ffd18de174f R14: 00007f43bf6cf9c0 R15: 00000000006ddc3c
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffff81484c09>]
copy_process+0x1589/0x5bc0 kernel/fork.c:1960
softirqs last enabled at (0): [<ffffffff81484c7f>]
copy_process+0x15ff/0x5bc0 kernel/fork.c:1963
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 41aab9a9be4009d5 ]---
^ permalink raw reply
* Re: KASAN: use-after-free Read in nr_release
From: syzbot @ 2019-07-27 9:55 UTC (permalink / raw)
To: davem, hdanton, linux-hams, linux-kernel, netdev, ralf,
syzkaller-bugs, xiyou.wangcong
In-Reply-To: <0000000000007e8b70058acbd60f@google.com>
syzbot has found a reproducer for the following crash on:
HEAD commit: 3ea54d9b Merge tag 'docs-5.3-1' of git://git.lwn.net/linux
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=140a9794600000
kernel config: https://syzkaller.appspot.com/x/.config?x=4809ada7c73f0407
dashboard link: https://syzkaller.appspot.com/bug?extid=6eaef7158b19e3fec3a0
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=156f25a2600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13fae0e8600000
The bug was bisected to:
commit c8c8218ec5af5d2598381883acbefbf604e56b5e
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date: Thu Jun 27 21:30:58 2019 +0000
netrom: fix a memory leak in nr_rx_frame()
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10a3bcd0600000
final crash: https://syzkaller.appspot.com/x/report.txt?x=12a3bcd0600000
console output: https://syzkaller.appspot.com/x/log.txt?x=14a3bcd0600000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+6eaef7158b19e3fec3a0@syzkaller.appspotmail.com
Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")
==================================================================
BUG: KASAN: use-after-free in atomic_read
include/asm-generic/atomic-instrumented.h:26 [inline]
BUG: KASAN: use-after-free in refcount_inc_not_zero_checked+0x81/0x200
lib/refcount.c:123
Read of size 4 at addr ffff888093589300 by task syz-executor118/6083
CPU: 1 PID: 6083 Comm: syz-executor118 Not tainted 5.3.0-rc1+ #85
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
__kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
kasan_report+0x12/0x17 mm/kasan/common.c:612
check_memory_region_inline mm/kasan/generic.c:185 [inline]
check_memory_region+0x134/0x1a0 mm/kasan/generic.c:192
__kasan_check_read+0x11/0x20 mm/kasan/common.c:92
atomic_read include/asm-generic/atomic-instrumented.h:26 [inline]
refcount_inc_not_zero_checked+0x81/0x200 lib/refcount.c:123
refcount_inc_checked+0x17/0x70 lib/refcount.c:156
sock_hold include/net/sock.h:649 [inline]
nr_release+0x62/0x3e0 net/netrom/af_netrom.c:520
__sock_release+0xce/0x280 net/socket.c:590
sock_close+0x1e/0x30 net/socket.c:1268
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x406901
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 24 1a 00 00 c3 48
83 ec 08 e8 6a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48
89 c2 e8 b3 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007ffede643710 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000406901
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00000000006dcc30 R08: 0000000120080522 R09: 0000000120080522
R10: 00007ffede643730 R11: 0000000000000293 R12: 00007ffede643760
R13: 0000000000000004 R14: 00000000006dcc3c R15: 0000000000000064
Allocated by task 0:
save_stack+0x23/0x90 mm/kasan/common.c:69
set_track mm/kasan/common.c:77 [inline]
__kasan_kmalloc mm/kasan/common.c:487 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:460
kasan_kmalloc+0x9/0x10 mm/kasan/common.c:501
__do_kmalloc mm/slab.c:3655 [inline]
__kmalloc+0x163/0x770 mm/slab.c:3664
kmalloc include/linux/slab.h:557 [inline]
sk_prot_alloc+0x23a/0x310 net/core/sock.c:1603
sk_alloc+0x39/0xf70 net/core/sock.c:1657
nr_make_new net/netrom/af_netrom.c:476 [inline]
nr_rx_frame+0x733/0x1e73 net/netrom/af_netrom.c:959
nr_loopback_timer+0x7b/0x170 net/netrom/nr_loopback.c:59
call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x697/0x17a0 kernel/time/timer.c:1698
__do_softirq+0x262/0x98c kernel/softirq.c:292
Freed by task 6086:
save_stack+0x23/0x90 mm/kasan/common.c:69
set_track mm/kasan/common.c:77 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:449
kasan_slab_free+0xe/0x10 mm/kasan/common.c:457
__cache_free mm/slab.c:3425 [inline]
kfree+0x10a/0x2c0 mm/slab.c:3756
sk_prot_free net/core/sock.c:1640 [inline]
__sk_destruct+0x4f7/0x6e0 net/core/sock.c:1726
sk_destruct+0x86/0xa0 net/core/sock.c:1734
__sk_free+0xfb/0x360 net/core/sock.c:1745
sk_free+0x42/0x50 net/core/sock.c:1756
sock_put include/net/sock.h:1725 [inline]
sock_efree+0x61/0x80 net/core/sock.c:2042
skb_release_head_state+0xeb/0x250 net/core/skbuff.c:652
skb_release_all+0x16/0x60 net/core/skbuff.c:663
__kfree_skb net/core/skbuff.c:679 [inline]
kfree_skb net/core/skbuff.c:697 [inline]
kfree_skb+0x101/0x3c0 net/core/skbuff.c:691
nr_accept+0x56e/0x700 net/netrom/af_netrom.c:819
__sys_accept4+0x34e/0x6a0 net/socket.c:1754
__do_sys_accept4 net/socket.c:1789 [inline]
__se_sys_accept4 net/socket.c:1786 [inline]
__x64_sys_accept4+0x97/0xf0 net/socket.c:1786
do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff888093589280
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 128 bytes inside of
2048-byte region [ffff888093589280, ffff888093589a80)
The buggy address belongs to the page:
page:ffffea00024d6200 refcount:1 mapcount:0 mapping:ffff8880aa400e00
index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000228a508 ffffea00024cef08 ffff8880aa400e00
raw: 0000000000000000 ffff888093588180 0000000100000003 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888093589200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888093589280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff888093589300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888093589380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888093589400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
------------[ cut here ]------------
ODEBUG: activate not available (active state 0) object type: timer_list
hint: nr_t1timer_expiry+0x0/0x340 net/netrom/nr_timer.c:157
WARNING: CPU: 1 PID: 6083 at lib/debugobjects.c:481
debug_print_object+0x168/0x250 lib/debugobjects.c:481
Modules linked in:
CPU: 1 PID: 6083 Comm: syz-executor118 Tainted: G B
5.3.0-rc1+ #85
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:debug_print_object+0x168/0x250 lib/debugobjects.c:481
Code: dd e0 30 c6 87 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 b5 00 00 00 48
8b 14 dd e0 30 c6 87 48 c7 c7 e0 25 c6 87 e8 50 c3 05 fe <0f> 0b 83 05 33
4d 67 06 01 48 83 c4 20 5b 41 5c 41 5d 41 5e 5d c3
RSP: 0018:ffff888089fbfaf0 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815c5bd6 RDI: ffffed10113f7f50
RBP: ffff888089fbfb30 R08: ffff8880924ea140 R09: fffffbfff134ac80
R10: fffffbfff134ac7f R11: ffffffff89a563ff R12: 0000000000000001
R13: ffffffff88db6460 R14: ffffffff8161fa40 R15: 1ffff110113f7f6c
FS: 0000555555b49880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f63ae9fde78 CR3: 000000009873d000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
debug_object_activate+0x2e5/0x470 lib/debugobjects.c:680
debug_timer_activate kernel/time/timer.c:710 [inline]
__mod_timer kernel/time/timer.c:1035 [inline]
mod_timer+0x452/0xc10 kernel/time/timer.c:1096
sk_reset_timer+0x24/0x60 net/core/sock.c:2821
nr_start_t1timer+0x6e/0xa0 net/netrom/nr_timer.c:52
nr_release+0x1de/0x3e0 net/netrom/af_netrom.c:537
__sock_release+0xce/0x280 net/socket.c:590
sock_close+0x1e/0x30 net/socket.c:1268
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x406901
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 24 1a 00 00 c3 48
83 ec 08 e8 6a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48
89 c2 e8 b3 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007ffede643710 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000406901
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00000000006dcc30 R08: 0000000120080522 R09: 0000000120080522
R10: 00007ffede643730 R11: 0000000000000293 R12: 00007ffede643760
R13: 0000000000000004 R14: 00000000006dcc3c R15: 0000000000000064
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffff81437c05>]
copy_process+0x1815/0x6b00 kernel/fork.c:1960
softirqs last enabled at (0): [<ffffffff81437cac>]
copy_process+0x18bc/0x6b00 kernel/fork.c:1963
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace f5b6f61236ba2f96 ]---
------------[ cut here ]------------
^ permalink raw reply
* [patch iproute2 1/2] devlink: introduce cmdline option to switch to a different namespace
From: Jiri Pirko @ 2019-07-27 10:05 UTC (permalink / raw)
To: netdev; +Cc: davem, jakub.kicinski, sthemmin, dsahern, mlxsw
In-Reply-To: <20190727094459.26345-1-jiri@resnulli.us>
From: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
devlink/devlink.c | 12 ++++++++++--
man/man8/devlink.8 | 4 ++++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/devlink/devlink.c b/devlink/devlink.c
index d8197ea3a478..9242cc05ad0c 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -32,6 +32,7 @@
#include "mnlg.h"
#include "json_writer.h"
#include "utils.h"
+#include "namespace.h"
#define ESWITCH_MODE_LEGACY "legacy"
#define ESWITCH_MODE_SWITCHDEV "switchdev"
@@ -6332,7 +6333,7 @@ static int cmd_health(struct dl *dl)
static void help(void)
{
pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
- " devlink [ -f[orce] ] -b[atch] filename\n"
+ " devlink [ -f[orce] ] -b[atch] filename -N[etns] netnsname\n"
"where OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health }\n"
" OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] }\n");
}
@@ -6478,6 +6479,7 @@ int main(int argc, char **argv)
{ "json", no_argument, NULL, 'j' },
{ "pretty", no_argument, NULL, 'p' },
{ "verbose", no_argument, NULL, 'v' },
+ { "Netns", required_argument, NULL, 'N' },
{ NULL, 0, NULL, 0 }
};
const char *batch_file = NULL;
@@ -6493,7 +6495,7 @@ int main(int argc, char **argv)
return EXIT_FAILURE;
}
- while ((opt = getopt_long(argc, argv, "Vfb:njpv",
+ while ((opt = getopt_long(argc, argv, "Vfb:njpvN:",
long_options, NULL)) >= 0) {
switch (opt) {
@@ -6519,6 +6521,12 @@ int main(int argc, char **argv)
case 'v':
dl->verbose = true;
break;
+ case 'N':
+ if (netns_switch(optarg)) {
+ ret = EXIT_FAILURE;
+ goto dl_free;
+ }
+ break;
default:
pr_err("Unknown option.\n");
help();
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index 13d4dcd908b3..9fc9b034eefe 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -51,6 +51,10 @@ When combined with -j generate a pretty JSON output.
.BR "\-v" , " --verbose"
Turn on verbose output.
+.TP
+.BR "\-N", " \-Netns " <NETNSNAME>
+Switches to the specified network namespace.
+
.SS
.I OBJECT
--
2.21.0
^ permalink raw reply related
* [patch iproute2 2/2] devlink: add support for network namespace change
From: Jiri Pirko @ 2019-07-27 10:05 UTC (permalink / raw)
To: netdev; +Cc: davem, jakub.kicinski, sthemmin, dsahern, mlxsw
In-Reply-To: <20190727094459.26345-1-jiri@resnulli.us>
From: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
devlink/devlink.c | 54 +++++++++++++++++++++++++++++++++++-
include/uapi/linux/devlink.h | 4 +++
man/man8/devlink-dev.8 | 12 ++++++++
3 files changed, 69 insertions(+), 1 deletion(-)
diff --git a/devlink/devlink.c b/devlink/devlink.c
index 9242cc05ad0c..a39bd8771d8b 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -235,6 +235,7 @@ static void ifname_map_free(struct ifname_map *ifname_map)
#define DL_OPT_HEALTH_REPORTER_NAME BIT(27)
#define DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD BIT(27)
#define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER BIT(28)
+#define DL_OPT_NETNS BIT(29)
struct dl_opts {
uint32_t present; /* flags of present items */
@@ -271,6 +272,8 @@ struct dl_opts {
const char *reporter_name;
uint64_t reporter_graceful_period;
bool reporter_auto_recover;
+ bool netns_is_pid;
+ uint32_t netns;
};
struct dl {
@@ -1331,6 +1334,22 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
if (err)
return err;
o_found |= DL_OPT_HEALTH_REPORTER_AUTO_RECOVER;
+ } else if (dl_argv_match(dl, "netns") &&
+ (o_all & DL_OPT_NETNS)) {
+ const char *netns_str;
+
+ dl_arg_inc(dl);
+ err = dl_argv_str(dl, &netns_str);
+ if (err)
+ return err;
+ opts->netns = netns_get_fd(netns_str);
+ if (opts->netns < 0) {
+ err = dl_argv_uint32_t(dl, &opts->netns);
+ if (err)
+ return err;
+ opts->netns_is_pid = true;
+ }
+ o_found |= DL_OPT_NETNS;
} else {
pr_err("Unknown option \"%s\"\n", dl_argv(dl));
return -EINVAL;
@@ -1444,7 +1463,11 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
if (opts->present & DL_OPT_HEALTH_REPORTER_AUTO_RECOVER)
mnl_attr_put_u8(nlh, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
opts->reporter_auto_recover);
-
+ if (opts->present & DL_OPT_NETNS)
+ mnl_attr_put_u32(nlh,
+ opts->netns_is_pid ? DEVLINK_ATTR_NETNS_PID :
+ DEVLINK_ATTR_NETNS_FD,
+ opts->netns);
}
static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
@@ -1499,6 +1522,7 @@ static bool dl_dump_filter(struct dl *dl, struct nlattr **tb)
static void cmd_dev_help(void)
{
pr_err("Usage: devlink dev show [ DEV ]\n");
+ pr_err(" devlink dev set DEV netns { PID | NAME | ID }\n");
pr_err(" devlink dev eswitch set DEV [ mode { legacy | switchdev } ]\n");
pr_err(" [ inline-mode { none | link | network | transport } ]\n");
pr_err(" [ encap { disable | enable } ]\n");
@@ -2551,6 +2575,31 @@ static int cmd_dev_show(struct dl *dl)
return err;
}
+static void cmd_dev_set_help(void)
+{
+ pr_err("Usage: devlink dev set DEV netns { PID | NAME | ID }\n");
+}
+
+static int cmd_dev_set(struct dl *dl)
+{
+ struct nlmsghdr *nlh;
+ int err;
+
+ if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
+ cmd_dev_set_help();
+ return 0;
+ }
+
+ nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SET,
+ NLM_F_REQUEST | NLM_F_ACK);
+
+ err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE, DL_OPT_NETNS);
+ if (err)
+ return err;
+
+ return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
static void cmd_dev_reload_help(void)
{
pr_err("Usage: devlink dev reload [ DEV ]\n");
@@ -2747,6 +2796,9 @@ static int cmd_dev(struct dl *dl)
dl_argv_match(dl, "list") || dl_no_arg(dl)) {
dl_arg_inc(dl);
return cmd_dev_show(dl);
+ } else if (dl_argv_match(dl, "set")) {
+ dl_arg_inc(dl);
+ return cmd_dev_set(dl);
} else if (dl_argv_match(dl, "eswitch")) {
dl_arg_inc(dl);
return cmd_dev_eswitch(dl);
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index fc195cbd66f4..bc1869993e20 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -348,6 +348,10 @@ enum devlink_attr {
DEVLINK_ATTR_PORT_PCI_PF_NUMBER, /* u16 */
DEVLINK_ATTR_PORT_PCI_VF_NUMBER, /* u16 */
+ DEVLINK_ATTR_NETNS_FD, /* u32 */
+ DEVLINK_ATTR_NETNS_PID, /* u32 */
+ DEVLINK_ATTR_NETNS_ID, /* u32 */
+
/* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX,
diff --git a/man/man8/devlink-dev.8 b/man/man8/devlink-dev.8
index 1804463b2321..0e1a5523fa7b 100644
--- a/man/man8/devlink-dev.8
+++ b/man/man8/devlink-dev.8
@@ -25,6 +25,13 @@ devlink-dev \- devlink device configuration
.ti -8
.B devlink dev help
+.ti -8
+.BR "devlink dev set"
+.IR DEV
+.RI "[ "
+.BI "netns { " PID " | " NAME " | " ID " }
+.RI "]"
+
.ti -8
.BR "devlink dev eswitch set"
.IR DEV
@@ -92,6 +99,11 @@ Format is:
.in +2
BUS_NAME/BUS_ADDRESS
+.SS devlink dev set - sets devlink device attributes
+
+.TP
+.BI "netns { " PID " | " NAME " | " ID " }
+
.SS devlink dev eswitch show - display devlink device eswitch attributes
.SS devlink dev eswitch set - sets devlink device eswitch attributes
--
2.21.0
^ permalink raw reply related
* Re: [patch iproute2 1/2] devlink: introduce cmdline option to switch to a different namespace
From: Toke Høiland-Jørgensen @ 2019-07-27 10:12 UTC (permalink / raw)
To: Jiri Pirko, netdev; +Cc: davem, jakub.kicinski, sthemmin, dsahern, mlxsw
In-Reply-To: <20190727100544.28649-1-jiri@resnulli.us>
Jiri Pirko <jiri@resnulli.us> writes:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
> devlink/devlink.c | 12 ++++++++++--
> man/man8/devlink.8 | 4 ++++
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/devlink/devlink.c b/devlink/devlink.c
> index d8197ea3a478..9242cc05ad0c 100644
> --- a/devlink/devlink.c
> +++ b/devlink/devlink.c
> @@ -32,6 +32,7 @@
> #include "mnlg.h"
> #include "json_writer.h"
> #include "utils.h"
> +#include "namespace.h"
>
> #define ESWITCH_MODE_LEGACY "legacy"
> #define ESWITCH_MODE_SWITCHDEV "switchdev"
> @@ -6332,7 +6333,7 @@ static int cmd_health(struct dl *dl)
> static void help(void)
> {
> pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
> - " devlink [ -f[orce] ] -b[atch] filename\n"
> + " devlink [ -f[orce] ] -b[atch] filename -N[etns]
> netnsname\n"
'ip' uses lower-case n for this; why not be consistent?
-Toke
^ permalink raw reply
* Re: memory leak in new_inode_pseudo (2)
From: syzbot @ 2019-07-27 10:16 UTC (permalink / raw)
To: axboe, axboe, catalin.marinas, davem, linux-block, linux-kernel,
linux-mm, michaelcallahan, netdev, syzkaller-bugs
In-Reply-To: <000000000000111cbe058dc7754d@google.com>
syzbot has bisected this bug to:
commit a21f2a3ec62abe2e06500d6550659a0ff5624fbb
Author: Michael Callahan <michaelcallahan@fb.com>
Date: Tue May 3 15:12:49 2016 +0000
block: Minor blk_account_io_start usage cleanup
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13565e92600000
start commit: be8454af Merge tag 'drm-next-2019-07-16' of git://anongit...
git tree: upstream
final crash: https://syzkaller.appspot.com/x/report.txt?x=10d65e92600000
console output: https://syzkaller.appspot.com/x/log.txt?x=17565e92600000
kernel config: https://syzkaller.appspot.com/x/.config?x=d23a1a7bf85c5250
dashboard link: https://syzkaller.appspot.com/bug?extid=e682cca30bc101a4d9d9
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=155c5800600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1738f800600000
Reported-by: syzbot+e682cca30bc101a4d9d9@syzkaller.appspotmail.com
Fixes: a21f2a3ec62a ("block: Minor blk_account_io_start usage cleanup")
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
^ permalink raw reply
* Re: [patch iproute2 1/2] devlink: introduce cmdline option to switch to a different namespace
From: Jiri Pirko @ 2019-07-27 10:21 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: netdev, davem, jakub.kicinski, sthemmin, dsahern, mlxsw
In-Reply-To: <87ef2bwztr.fsf@toke.dk>
Sat, Jul 27, 2019 at 12:12:48PM CEST, toke@redhat.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>> devlink/devlink.c | 12 ++++++++++--
>> man/man8/devlink.8 | 4 ++++
>> 2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/devlink/devlink.c b/devlink/devlink.c
>> index d8197ea3a478..9242cc05ad0c 100644
>> --- a/devlink/devlink.c
>> +++ b/devlink/devlink.c
>> @@ -32,6 +32,7 @@
>> #include "mnlg.h"
>> #include "json_writer.h"
>> #include "utils.h"
>> +#include "namespace.h"
>>
>> #define ESWITCH_MODE_LEGACY "legacy"
>> #define ESWITCH_MODE_SWITCHDEV "switchdev"
>> @@ -6332,7 +6333,7 @@ static int cmd_health(struct dl *dl)
>> static void help(void)
>> {
>> pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
>> - " devlink [ -f[orce] ] -b[atch] filename\n"
>> + " devlink [ -f[orce] ] -b[atch] filename -N[etns] netnsname\n"
>
>'ip' uses lower-case n for this; why not be consistent?
Because "n" is taken :/
>
>-Toke
^ permalink raw reply
* Re: [patch iproute2 1/2] devlink: introduce cmdline option to switch to a different namespace
From: Toke Høiland-Jørgensen @ 2019-07-27 10:25 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, davem, jakub.kicinski, sthemmin, dsahern, mlxsw
In-Reply-To: <20190727102116.GC2843@nanopsycho>
Jiri Pirko <jiri@resnulli.us> writes:
> Sat, Jul 27, 2019 at 12:12:48PM CEST, toke@redhat.com wrote:
>>Jiri Pirko <jiri@resnulli.us> writes:
>>
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>> devlink/devlink.c | 12 ++++++++++--
>>> man/man8/devlink.8 | 4 ++++
>>> 2 files changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/devlink/devlink.c b/devlink/devlink.c
>>> index d8197ea3a478..9242cc05ad0c 100644
>>> --- a/devlink/devlink.c
>>> +++ b/devlink/devlink.c
>>> @@ -32,6 +32,7 @@
>>> #include "mnlg.h"
>>> #include "json_writer.h"
>>> #include "utils.h"
>>> +#include "namespace.h"
>>>
>>> #define ESWITCH_MODE_LEGACY "legacy"
>>> #define ESWITCH_MODE_SWITCHDEV "switchdev"
>>> @@ -6332,7 +6333,7 @@ static int cmd_health(struct dl *dl)
>>> static void help(void)
>>> {
>>> pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
>>> - " devlink [ -f[orce] ] -b[atch] filename\n"
>>> + " devlink [ -f[orce] ] -b[atch] filename -N[etns] netnsname\n"
>>
>>'ip' uses lower-case n for this; why not be consistent?
>
> Because "n" is taken :/
Ah, right, that was right there on the line below in the patch context.
Oops, my bad (and too bad!)
-Toke
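
For readers following along, here is a minimal sketch of why a short option letter that is already in use forces the new option onto a different letter. It assumes, purely for illustration, that -n is bound to a boolean flag (called no_nice_names here) and that the new namespace option takes an argument; the struct, the function name and the long option names are hypothetical and are not taken from the patch.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option storage; field names are illustrative only. */
struct opts {
	int no_nice_names;   /* set by -n (assumed to be taken already) */
	const char *netns;   /* set by -N <name> */
};

/* Parse argv; returns 0 on success, -1 on an unknown option. */
static int parse_opts(int argc, char **argv, struct opts *o)
{
	static const struct option long_opts[] = {
		{ "no-nice-names", no_argument,       NULL, 'n' },
		{ "Netns",         required_argument, NULL, 'N' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	optind = 1; /* restart scanning in case getopt was used before */
	/* "nN:" : 'n' takes no argument, 'N' requires one. The two
	 * letters are distinct cases, so they cannot collide.
	 */
	while ((c = getopt_long(argc, argv, "nN:", long_opts, NULL)) != -1) {
		switch (c) {
		case 'n':
			o->no_nice_names = 1;
			break;
		case 'N':
			o->netns = optarg;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Since getopt treats each character of the option string as one option, a single letter cannot serve both as a plain flag and as an option with an argument, which is why the upper-case letter is the practical choice here.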
^ permalink raw reply
* [PATCH net] Revert ("r8169: remove 1000/Half from supported modes")
From: Heiner Kallweit @ 2019-07-27 10:32 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Bernhard Held
This reverts commit a6851c613fd7fccc5d1f28d5d8a0cbe9b0f4e8cc.
It was reported that RTL8111b successfully finishes 1000/Full autoneg
but no data flows. Reverting the original patch fixes the issue.
It seems to be a HW issue with the integrated RTL8211B PHY. This PHY
version is also used e.g. on RTL8168d, so it is better to revert the original patch.
Reported-by: Bernhard Held <berny156@gmx.de>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 6272115b2..a71dd669a 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -6136,10 +6136,7 @@ static int r8169_phy_connect(struct rtl8169_private *tp)
if (ret)
return ret;
- if (tp->supports_gmii)
- phy_remove_link_mode(phydev,
- ETHTOOL_LINK_MODE_1000baseT_Half_BIT);
- else
+ if (!tp->supports_gmii)
phy_set_max_speed(phydev, SPEED_100);
phy_support_asym_pause(phydev);
--
2.22.0
^ permalink raw reply related
* Re: [REGRESSION] 5.3-rc1: r8169: remove 1000/Half from supported modes
From: Heiner Kallweit @ 2019-07-27 10:27 UTC (permalink / raw)
To: Bernhard Held, linux-kernel, netdev; +Cc: David S. Miller
In-Reply-To: <9af99856-2e5d-0e3a-34d3-0582da869919@gmx.de>
On 26.07.2019 22:45, Bernhard Held wrote:
> On 26.07.19 at 22:24, Heiner Kallweit wrote:
>> On 26.07.2019 22:16, Bernhard Held wrote:
>>> Hi Heiner,
>>>
>>> with commit a6851c613fd7 "r8169: remove 1000/Half from supported modes" my RTL8111B GB-link stops working. It thinks that it established a link, however nothing is actually transmitted. Setting the mode with `mii-tool -F 100baseTx-HD` establishes a successful connection.
>>>
>> Can you provide standard ethtool output w/ and w/o this patch? Also a full dmesg output
>> with the patch would be helpful.
>> Is "100baseTx-HD" a typo and you mean GBit? And any special reason why you set half duplex?
>>
>
> The requested files are attached.
>
Looks all normal. So it seems to be a HW issue with the integrated PHY (RTL8211B).
This PHY version is used also e.g. in RTL8168d. So better revert the original change.
> mii-tool doesn't offer GBit settings. I used HD only while playing around, both FD and HD are working.
>
> Hope it helps!
> Bernhard
Heiner
^ permalink raw reply
* [PATCH net] r8169: don't use MSI before RTL8168d
From: Heiner Kallweit @ 2019-07-27 10:43 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Dušan Dragić
It was reported that after resuming from suspend network fails with
error "do_IRQ: 3.38 No irq handler for vector", see [0]. Enabling WoL
can work around the issue, but the only actual fix is to disable MSI.
So let's mimic the behavior of the vendor driver and disable MSI on
all chip versions before RTL8168d.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=204079
Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
Reported-by: Dušan Dragić <dragic.dusan@gmail.com>
Tested-by: Dušan Dragić <dragic.dusan@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
This version of the fix applies from 5.3 only. I'll submit a separate
version for previous kernel versions.
---
drivers/net/ethernet/realtek/r8169_main.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index a71dd669a..e1dd6ea60 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -6586,13 +6586,18 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
{
unsigned int flags;
- if (tp->mac_version <= RTL_GIGA_MAC_VER_06) {
+ switch (tp->mac_version) {
+ case RTL_GIGA_MAC_VER_02 ... RTL_GIGA_MAC_VER_06:
rtl_unlock_config_regs(tp);
RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
rtl_lock_config_regs(tp);
+ /* fall through */
+ case RTL_GIGA_MAC_VER_07 ... RTL_GIGA_MAC_VER_24:
flags = PCI_IRQ_LEGACY;
- } else {
+ break;
+ default:
flags = PCI_IRQ_ALL_TYPES;
+ break;
}
return pci_alloc_irq_vectors(tp->pci_dev, 1, 1, flags);
--
2.22.0
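
The control flow of the new switch can be tried out in user space with a small sketch. It is not the driver code: the version constants, the flag values and the msi_enable_bit field are hypothetical stand-ins, and only the shape of the switch (GNU C case ranges plus a deliberate fall-through) mirrors the patch.

```c
/* Hypothetical stand-ins for the driver's chip version enum. */
enum { VER_02 = 2, VER_06 = 6, VER_07 = 7, VER_24 = 24 };

/* Illustrative IRQ type flags, not the kernel's definitions. */
#define IRQ_LEGACY    0x1u
#define IRQ_MSI       0x2u
#define IRQ_MSIX      0x4u
#define IRQ_ALL_TYPES (IRQ_LEGACY | IRQ_MSI | IRQ_MSIX)

struct nic {
	int mac_version;
	int msi_enable_bit; /* models the MSIEnable bit in Config2 */
};

static unsigned int pick_irq_flags(struct nic *nic)
{
	unsigned int flags;

	switch (nic->mac_version) {
	case VER_02 ... VER_06:      /* GNU C case range extension */
		/* Oldest chips: also clear the MSI enable bit. */
		nic->msi_enable_bit = 0;
		/* fall through */
	case VER_07 ... VER_24:
		/* Everything before the cut-off gets legacy IRQs only. */
		flags = IRQ_LEGACY;
		break;
	default:
		flags = IRQ_ALL_TYPES;
		break;
	}
	return flags;
}
```

The fall-through is what lets the oldest range share the legacy-interrupt assignment with the second range while still doing its extra register write first.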
^ permalink raw reply related
* Re: [PATCH 4.4 stable net] net: tcp: Fix use-after-free in tcp_write_xmit
From: maowenan @ 2019-07-27 10:44 UTC (permalink / raw)
To: Greg KH; +Cc: stable, davem, netdev, linux-kernel
In-Reply-To: <a5965aac-7de2-3c3f-349d-8894ae1b897b@huawei.com>
On 2019/7/24 20:13, maowenan wrote:
>
>
> On 2019/7/24 19:05, Greg KH wrote:
>> On Wed, Jul 24, 2019 at 05:17:15PM +0800, Mao Wenan wrote:
>>> There is one report about tcp_write_xmit use-after-free with version 4.4.136:
>>>
>>> BUG: KASAN: use-after-free in tcp_skb_pcount include/net/tcp.h:796 [inline]
>>> BUG: KASAN: use-after-free in tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
>>> BUG: KASAN: use-after-free in tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
>>> Read of size 2 at addr ffff8801d6fc87b0 by task syz-executor408/4195
>>>
>>> CPU: 0 PID: 4195 Comm: syz-executor408 Not tainted 4.4.136-gfb7e319 #59
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>> 0000000000000000 7d8f38ecc03be946 ffff8801d73b7710 ffffffff81e0edad
>>> ffffea00075bf200 ffff8801d6fc87b0 0000000000000000 ffff8801d6fc87b0
>>> dffffc0000000000 ffff8801d73b7748 ffffffff815159b6 ffff8801d6fc87b0
>>> Call Trace:
>>> [<ffffffff81e0edad>] __dump_stack lib/dump_stack.c:15 [inline]
>>> [<ffffffff81e0edad>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
>>> [<ffffffff815159b6>] print_address_description+0x6c/0x216 mm/kasan/report.c:252
>>> [<ffffffff81515cd5>] kasan_report_error mm/kasan/report.c:351 [inline]
>>> [<ffffffff81515cd5>] kasan_report.cold.7+0x175/0x2f7 mm/kasan/report.c:408
>>> [<ffffffff814f9784>] __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:427
>>> [<ffffffff83286582>] tcp_skb_pcount include/net/tcp.h:796 [inline]
>>> [<ffffffff83286582>] tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
>>> [<ffffffff83286582>] tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
>>> [<ffffffff83287a40>] __tcp_push_pending_frames+0xa0/0x290 net/ipv4/tcp_output.c:2307
>>> [<ffffffff8328e966>] tcp_send_fin+0x176/0xab0 net/ipv4/tcp_output.c:2883
>>> [<ffffffff8324c0d0>] tcp_close+0xca0/0xf70 net/ipv4/tcp.c:2112
>>> [<ffffffff832f8d0f>] inet_release+0xff/0x1d0 net/ipv4/af_inet.c:435
>>> [<ffffffff82f1a156>] sock_release+0x96/0x1c0 net/socket.c:586
>>> [<ffffffff82f1a296>] sock_close+0x16/0x20 net/socket.c:1037
>>> [<ffffffff81522da5>] __fput+0x235/0x6f0 fs/file_table.c:208
>>> [<ffffffff815232e5>] ____fput+0x15/0x20 fs/file_table.c:244
>>> [<ffffffff8118bd7f>] task_work_run+0x10f/0x190 kernel/task_work.c:115
>>> [<ffffffff81135285>] exit_task_work include/linux/task_work.h:21 [inline]
>>> [<ffffffff81135285>] do_exit+0x9e5/0x26b0 kernel/exit.c:759
>>> [<ffffffff8113b1d1>] do_group_exit+0x111/0x330 kernel/exit.c:889
>>> [<ffffffff8115e5cc>] get_signal+0x4ec/0x14b0 kernel/signal.c:2321
>>> [<ffffffff8100e02b>] do_signal+0x8b/0x1d30 arch/x86/kernel/signal.c:712
>>> [<ffffffff8100360a>] exit_to_usermode_loop+0x11a/0x160 arch/x86/entry/common.c:248
>>> [<ffffffff81006535>] prepare_exit_to_usermode arch/x86/entry/common.c:283 [inline]
>>> [<ffffffff81006535>] syscall_return_slowpath+0x1b5/0x1f0 arch/x86/entry/common.c:348
>>> [<ffffffff838c29b5>] int_ret_from_sys_call+0x25/0xa3
>>>
>>> Allocated by task 4194:
>>> [<ffffffff810341d6>] save_stack_trace+0x26/0x50 arch/x86/kernel/stacktrace.c:63
>>> [<ffffffff814f8873>] save_stack+0x43/0xd0 mm/kasan/kasan.c:512
>>> [<ffffffff814f8b57>] set_track mm/kasan/kasan.c:524 [inline]
>>> [<ffffffff814f8b57>] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:616
>>> [<ffffffff814f9122>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:554
>>> [<ffffffff814f4c1e>] slab_post_alloc_hook mm/slub.c:1349 [inline]
>>> [<ffffffff814f4c1e>] slab_alloc_node mm/slub.c:2615 [inline]
>>> [<ffffffff814f4c1e>] slab_alloc mm/slub.c:2623 [inline]
>>> [<ffffffff814f4c1e>] kmem_cache_alloc+0xbe/0x2a0 mm/slub.c:2628
>>> [<ffffffff82f380a6>] kmem_cache_alloc_node include/linux/slab.h:350 [inline]
>>> [<ffffffff82f380a6>] __alloc_skb+0xe6/0x600 net/core/skbuff.c:218
>>> [<ffffffff832466c3>] alloc_skb_fclone include/linux/skbuff.h:856 [inline]
>>> [<ffffffff832466c3>] sk_stream_alloc_skb+0xa3/0x5d0 net/ipv4/tcp.c:833
>>> [<ffffffff83249164>] tcp_sendmsg+0xd34/0x2b00 net/ipv4/tcp.c:1178
>>> [<ffffffff83300ef3>] inet_sendmsg+0x203/0x4d0 net/ipv4/af_inet.c:755
>>> [<ffffffff82f1e1fc>] sock_sendmsg_nosec net/socket.c:625 [inline]
>>> [<ffffffff82f1e1fc>] sock_sendmsg+0xcc/0x110 net/socket.c:635
>>> [<ffffffff82f1eedc>] SYSC_sendto+0x21c/0x370 net/socket.c:1665
>>> [<ffffffff82f21560>] SyS_sendto+0x40/0x50 net/socket.c:1633
>>> [<ffffffff838c2825>] entry_SYSCALL_64_fastpath+0x22/0x9e
>>>
>>> Freed by task 4194:
>>> [<ffffffff810341d6>] save_stack_trace+0x26/0x50 arch/x86/kernel/stacktrace.c:63
>>> [<ffffffff814f8873>] save_stack+0x43/0xd0 mm/kasan/kasan.c:512
>>> [<ffffffff814f91a2>] set_track mm/kasan/kasan.c:524 [inline]
>>> [<ffffffff814f91a2>] kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:589
>>> [<ffffffff814f632e>] slab_free_hook mm/slub.c:1383 [inline]
>>> [<ffffffff814f632e>] slab_free_freelist_hook mm/slub.c:1405 [inline]
>>> [<ffffffff814f632e>] slab_free mm/slub.c:2859 [inline]
>>> [<ffffffff814f632e>] kmem_cache_free+0xbe/0x340 mm/slub.c:2881
>>> [<ffffffff82f3527f>] kfree_skbmem+0xcf/0x100 net/core/skbuff.c:635
>>> [<ffffffff82f372fd>] __kfree_skb+0x1d/0x20 net/core/skbuff.c:676
>>> [<ffffffff83288834>] sk_wmem_free_skb include/net/sock.h:1447 [inline]
>>> [<ffffffff83288834>] tcp_write_queue_purge include/net/tcp.h:1460 [inline]
>>> [<ffffffff83288834>] tcp_connect_init net/ipv4/tcp_output.c:3122 [inline]
>>> [<ffffffff83288834>] tcp_connect+0xb24/0x30c0 net/ipv4/tcp_output.c:3261
>>> [<ffffffff8329b991>] tcp_v4_connect+0xf31/0x1890 net/ipv4/tcp_ipv4.c:246
>>> [<ffffffff832f9ca9>] __inet_stream_connect+0x2a9/0xc30 net/ipv4/af_inet.c:615
>>> [<ffffffff832fa685>] inet_stream_connect+0x55/0xa0 net/ipv4/af_inet.c:676
>>> [<ffffffff82f1eb78>] SYSC_connect+0x1b8/0x300 net/socket.c:1557
>>> [<ffffffff82f214b4>] SyS_connect+0x24/0x30 net/socket.c:1538
>>> [<ffffffff838c2825>] entry_SYSCALL_64_fastpath+0x22/0x9e
>>>
>>> Syzkaller reproducer():
>>> r0 = socket$packet(0x11, 0x3, 0x300)
>>> r1 = socket$inet_tcp(0x2, 0x1, 0x0)
>>> bind$inet(r1, &(0x7f0000000300)={0x2, 0x4e21, @multicast1}, 0x10)
>>> connect$inet(r1, &(0x7f0000000140)={0x2, 0x1000004e21, @loopback}, 0x10)
>>> recvmmsg(r1, &(0x7f0000001e40)=[{{0x0, 0x0, &(0x7f0000000100)=[{&(0x7f00000005c0)=""/88, 0x58}], 0x1}}], 0x1, 0x40000000, 0x0)
>>> sendto$inet(r1, &(0x7f0000000000)="e2f7ad5b661c761edf", 0x9, 0x8080, 0x0, 0x0)
>>> r2 = fcntl$dupfd(r1, 0x0, r0)
>>> connect$unix(r2, &(0x7f00000001c0)=@file={0x0, './file0\x00'}, 0x6e)
>>>
>>> C repro link: https://syzkaller.appspot.com/text?tag=ReproC&x=14db474f800000
>>>
>>> This happens because tcp_connect_init() calls tcp_write_queue_purge(), which
>>> frees every skb in the write queue but does not reset sk->sk_send_head to NULL.
>>> tcp_write_xmit() then tries to send an skb that tcp_write_queue_purge() has
>>> already freed, which is the use-after-free.
>>>
>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>> ---
>>> include/net/tcp.h | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>>> index bf8a0dae977a..8f8aace28cf8 100644
>>> --- a/include/net/tcp.h
>>> +++ b/include/net/tcp.h
>>> @@ -1457,6 +1457,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
>>>
>>> while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
>>> sk_wmem_free_skb(sk, skb);
>>> + sk->sk_send_head = NULL;
>>> sk_mem_reclaim(sk);
>>> tcp_clear_all_retrans_hints(tcp_sk(sk));
>>> inet_csk(sk)->icsk_backoff = 0;
>>
>> Does this correspond with a specific commit that is already in Linus's
>> tree? If not, why, did we change/mess something up when doing
>> backports, or is the code just that different?
>>
>> Also, is this needed in 4.9.y, 4.14.y, 4.19.y, and/or 5.2.y? Why just
>> 4.4.y?
Greg,
I have tested the latest stable trees:
4.4.186 oops
4.9.151 oops
4.14.106 NO oops
This patch can simply fix them.
>
> Is it the commit 75c119afe14f? It does not use sk_send_head to indicate whether it has skb to be sent.
>
> commit 75c119afe14f74b4dd967d75ed9f57ab6c0ef045
> Author: Eric Dumazet <edumazet@google.com>
> Date: Thu Oct 5 22:21:27 2017 -0700
>
> tcp: implement rb-tree based retransmit queue
>
>
> static inline struct sk_buff *tcp_send_head(const struct sock *sk)
> {
> - return sk->sk_send_head;
> + return skb_peek(&sk->sk_write_queue);
> }
>
>
>
>>
>> thanks,
>>
>> greg k-h
>>
>> .
>>
>
>
> .
>
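
The dangling-pointer pattern discussed in this thread can be modelled in user space with a few lines of C. This is only a sketch under stated assumptions: struct sock_model, enqueue() and purge() are hypothetical names, and the model keeps a separate "next to send" pointer next to the queue, the way the 4.4 code keeps sk_send_head next to sk_write_queue.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an skb on the write queue. */
struct buf {
	struct buf *next;
};

/* Hypothetical model of a socket with a cached send pointer. */
struct sock_model {
	struct buf *queue_head;
	struct buf *queue_tail;
	struct buf *send_head; /* next buffer to transmit */
};

/* Append one buffer; returns 0 on success, -1 on allocation failure. */
static int enqueue(struct sock_model *sk)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (!b)
		return -1;
	if (sk->queue_tail)
		sk->queue_tail->next = b;
	else
		sk->queue_head = b;
	sk->queue_tail = b;
	if (!sk->send_head)
		sk->send_head = b;
	return 0;
}

/* Free every queued buffer and drop the cached pointer as well. */
static void purge(struct sock_model *sk)
{
	struct buf *b;

	while ((b = sk->queue_head) != NULL) {
		sk->queue_head = b->next;
		free(b);
	}
	sk->queue_tail = NULL;
	/* Without this reset, send_head would still point at freed
	 * memory and a later transmit attempt would dereference it.
	 */
	sk->send_head = NULL;
}
```

In this model the last assignment in purge() plays the role of the one-line fix: the queue and the cached pointer are two views of the same memory, so clearing one without the other leaves a stale reference behind.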
^ permalink raw reply
* [PATCH net] r8169: don't use MSI before RTL8168d
From: Heiner Kallweit @ 2019-07-27 10:45 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Dušan Dragić
It was reported that after resuming from suspend network fails with
error "do_IRQ: 3.38 No irq handler for vector", see [0]. Enabling WoL
can work around the issue, but the only actual fix is to disable MSI.
So let's mimic the behavior of the vendor driver and disable MSI on
all chip versions before RTL8168d.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=204079
Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
Reported-by: Dušan Dragić <dragic.dusan@gmail.com>
Tested-by: Dušan Dragić <dragic.dusan@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
This version of the fix applies on kernel versions up to 5.2.
---
drivers/net/ethernet/realtek/r8169.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 6d176be51..038a034ee 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7105,13 +7105,18 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
{
unsigned int flags;
- if (tp->mac_version <= RTL_GIGA_MAC_VER_06) {
+ switch (tp->mac_version) {
+ case RTL_GIGA_MAC_VER_02 ... RTL_GIGA_MAC_VER_06:
rtl_unlock_config_regs(tp);
RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
rtl_lock_config_regs(tp);
+ /* fall through */
+ case RTL_GIGA_MAC_VER_07 ... RTL_GIGA_MAC_VER_24:
flags = PCI_IRQ_LEGACY;
- } else {
+ break;
+ default:
flags = PCI_IRQ_ALL_TYPES;
+ break;
}
return pci_alloc_irq_vectors(tp->pci_dev, 1, 1, flags);
--
2.22.0
^ permalink raw reply related