* Re: [PATCH v4 net-next 1/6] lwt: Add net to build_state argument
From: Roopa Prabhu @ 2017-12-19 15:47 UTC (permalink / raw)
To: Tom Herbert; +Cc: David Miller, netdev, rohit
In-Reply-To: <20171215182800.10248-2-tom@quantonium.net>
On Fri, Dec 15, 2017 at 10:27 AM, Tom Herbert <tom@quantonium.net> wrote:
> Users of LWT need to know net if they want to have per net operations
> in LWT.
>
> Signed-off-by: Tom Herbert <tom@quantonium.net>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
^ permalink raw reply
* Re: [PATCH] net: arc_emac: fix arc_emac_rx() error paths
From: Alexander Kochetkov @ 2017-12-19 15:49 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel, f.fainelli, edumazet
In-Reply-To: <20171219.102254.1959679351283055093.davem@davemloft.net>
> 19 дек. 2017 г., в 18:22, David Miller <davem@davemloft.net> написал(а):
>
> From: Alexander Kochetkov <al.kochet@gmail.com>
> Date: Fri, 15 Dec 2017 20:20:06 +0300
>
>> arc_emac_rx() has some issues found by code review.
>>
>> In case netdev_alloc_skb_ip_align() or dma_map_single() failure
>> rx fifo entry will not be returned to EMAC.
>>
>> In case dma_map_single() failure previously allocated skb became
>> lost to driver. At the same time address of newly allocated skb
>> will not be provided to EMAC.
>>
>> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
>
> This patch adds quite a few bugs, here is one which shows this is not
> functionally tested:
May be I don’t understand correctly, but I don’t see bug here.
Wrong dma mapping usage should immediately manifest itself (kernel
instability, koops and so on). The patch was tested on rk3188 and work fine for me.
Also I did simulations of netdev_alloc_skb_ip_align() and dma_map_single()
faults and can confirm that new error handling work.
Could someone else with ARC EMAC test the patch? I would be very grateful for the help.
Florian or Eric, can you test it on your hardware?
>> +
>> + /* unmap previosly mapped skb */
>> + dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
>> + dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
>
> And then you unmap it. "addr" is the new DMA mapping, not the old one.
The old mapping should be unmapped here. It refer to old skb what contains already
received packet. Because buffer doesn’t belong to EMAC anymore.
That old mapping point to old skb buffer. And netif_receive_skb() will be
called for old skb after that.
The new mapping «addr" will be unmapped then EMAC receive
new packet. Later by the code (after netif_receive_skb()) there are lines for
saving «addr» value:
rx_buff->skb = skb;
dma_unmap_addr_set(rx_buff, addr, addr);
dma_unmap_len_set(rx_buff, len, EMAC_BUFFER_SIZE);
Regards,
Alexander.
^ permalink raw reply
* Re: [PATCH net-next v5 0/5] Introduce NETIF_F_GRO_HW
From: David Miller @ 2017-12-19 15:50 UTC (permalink / raw)
To: michael.chan; +Cc: netdev, andrew.gospodarek
In-Reply-To: <1513411784-17653-1-git-send-email-michael.chan@broadcom.com>
From: Michael Chan <michael.chan@broadcom.com>
Date: Sat, 16 Dec 2017 03:09:39 -0500
> Introduce NETIF_F_GRO_HW feature flag and convert drivers that support
> hardware GRO to use the new flag.
Series applied, thanks for following through with this work.
^ permalink raw reply
* Re: [RFC PATCH net-next] tools/bpf: fix build with binutils >= 2.28
From: Quentin Monnet @ 2017-12-19 15:57 UTC (permalink / raw)
To: Roman Gushchin, netdev
Cc: linux-kernel, kernel-team, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann
In-Reply-To: <20171219143821.26291-1-guro@fb.com>
Hi Roman, thanks for working on this!
2017-12-19 14:38 UTC+0000 ~ Roman Gushchin <guro@fb.com>
> Bpftool build is broken with binutils version 2.28 and later.
> The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> in the binutils repo, which changed the disassembler() function
> signature.
>
> Fix this by checking binutils version and use an appropriate
> disassembler() signature.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> ---
> tools/bpf/Makefile | 6 ++++++
> tools/bpf/bpf_jit_disasm.c | 7 +++++++
> tools/bpf/bpftool/Makefile | 6 ++++++
> tools/bpf/bpftool/jit_disasm.c | 5 +++++
> 4 files changed, 24 insertions(+)
>
> diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
> index 07a6697466ef..3fd32fd0daa1 100644
> --- a/tools/bpf/Makefile
> +++ b/tools/bpf/Makefile
> @@ -6,8 +6,14 @@ LEX = flex
> YACC = bison
> MAKE = make
>
> +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
> +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> +
> CFLAGS += -Wall -O2
> CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
> +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
> +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
>
> %.yacc.c: %.y
> $(YACC) -o $@ -d $<
> diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
> index 75bf526a0168..3ef7c8bdc0f3 100644
> --- a/tools/bpf/bpf_jit_disasm.c
> +++ b/tools/bpf/bpf_jit_disasm.c
> @@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int opcodes)
>
> disassemble_init_for_target(&info);
>
> +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
> + disassemble = disassembler(bfd_get_arch(bfdf),
> + bfd_big_endian(bfdf),
> + bfd_get_mach(bfdf),
> + bfdf);
> +#else
> disassemble = disassembler(bfdf);
> +#endif
> assert(disassemble);
>
> do {
> diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> index 3f17ad317512..94ad51bf14b5 100644
> --- a/tools/bpf/bpftool/Makefile
> +++ b/tools/bpf/bpftool/Makefile
> @@ -40,6 +40,12 @@ CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
> CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
> LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
>
> +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
This does not seem to be portable. I tried that on Ubuntu and `readelf
-v` returns "GNU readelf (GNU Binutils for Ubuntu) 2.26.1", and
BINUTILS_VER catches "Binutils".
> +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
> +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
> +
> INSTALL ?= install
> RM ?= rm -f
>
> diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
> index 1551d3918d4c..eaa7127e9eeb 100644
> --- a/tools/bpf/bpftool/jit_disasm.c
> +++ b/tools/bpf/bpftool/jit_disasm.c
> @@ -107,7 +107,12 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
>
> disassemble_init_for_target(&info);
>
> +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
> + disassemble = disassembler(bfd_get_arch(bfdf), bfd_big_endian(bfdf),
> + bfd_get_mach(bfdf), bfdf);
> +#else
> disassemble = disassembler(bfdf);
> +#endif
> assert(disassemble);
>
> if (json_output)
I discussed this issue with Jakub recently, and one suggestion he had
was to look in tools/build/feature to add a new "feature", by trying to
compile short programs, for making the distinction between binutils
versions. It probably requires more work, but could be more robust than
parsing the version from the command line?
Best,
Quentin
^ permalink raw reply
* Re: [PATCH net-next 1/1] forcedeth: remove duplicate structure member in xmit
From: David Miller @ 2017-12-19 15:57 UTC (permalink / raw)
To: yanjun.zhu; +Cc: netdev, stephen
In-Reply-To: <1513416663-9321-1-git-send-email-yanjun.zhu@oracle.com>
From: Zhu Yanjun <yanjun.zhu@oracle.com>
Date: Sat, 16 Dec 2017 04:31:03 -0500
> Since both first_tx_ctx and tx_skb are the head of tx ctx, it not
> necessary to use two structure members to statically indicate
> the head of tx ctx. So first_tx_ctx is removed.
>
> CC: Srinivas Eeda <srinivas.eeda@oracle.com>
> CC: Joe Jin <joe.jin@oracle.com>
> CC: Junxiao Bi <junxiao.bi@oracle.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Applied, thank you.
^ permalink raw reply
* Re: [PATCH iproute2 1/3] iplink: Improve index parameter handling
From: Stephen Hemminger @ 2017-12-19 15:59 UTC (permalink / raw)
To: Serhey Popovich; +Cc: netdev
In-Reply-To: <18e98fc6-4145-0bb9-143d-d4305d22bdc8@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2680 bytes --]
On Mon, 18 Dec 2017 23:37:09 +0200
Serhey Popovich <serhe.popovych@gmail.com> wrote:
> Stephen Hemminger wrote:
> > On Mon, 18 Dec 2017 23:02:07 +0200
> > Serhey Popovich <serhe.popovych@gmail.com> wrote:
> >
> >> Stephen Hemminger wrote:
> >>> On Mon, 18 Dec 2017 20:54:06 +0200
> >>> Serhey Popovych <serhe.popovych@gmail.com> wrote:
> >>>
> >>>> diff --git a/ip/iplink.c b/ip/iplink.c
> >>>> index 1e685cc..4f9c169 100644
> >>>> --- a/ip/iplink.c
> >>>> +++ b/ip/iplink.c
> >>>> @@ -586,8 +586,10 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
> >>>> *name = *argv;
> >>>> } else if (strcmp(*argv, "index") == 0) {
> >>>> NEXT_ARG();
> >>>> + if (*index)
> >>>> + duparg("index", *argv);
> >>>> *index = atoi(*argv);
> >>>> - if (*index < 0)
> >>>> + if (*index <= 0)
> >>>
> >>> Why not use strtoul instead of atoi?
> >> Do not see reason for strtoul() instead atoi():
> >>
> >> 1) main arg: indexes in kernel represented as "int", which is
> >> signed. <= 0 values are reserved for various special purposes
> >> (see net/core/fib_rules.c on how device matching implemented).
> >>
> >> Configuring network device manually with index <= 0 is not correct
> >> (however possible). Kernel itself never chooses ifindex <= 0.
> >>
> >> Having unsigned int > 0x7fffffff actually means index <= 0.
> >>
> >> 2) this is not single place in iproute2 where it is used: not
> >> going to remove last user.
> >>
> >> 3) make changes clear and transparent for review.
> >
> > I would rather all of iproute2 correctly handles unsigned values.
> > Too much code is old K&R style C "the world is an int" and "who needs
> > to check for negative".
>
> You are right :(. I'm just trying to improve things a bit.
>
> >
> > There already is get_unsigned() in iproute2 util functions.
> This is good one based on strtoul(). But do we want to submit say
> index = (unsigned int)2147483648(0x7fffffff) to the kernel that is
> illegal from it's perspective?
>
> Or do you mean I can prepare treewide change to replace atoi() with
> get_unsigned()/get_integer() where appropriate?
>
> We already check if (*index < 0) since commit 3c682146aeff
> (iplink: forbid negative ifindex and modifying ifindex), and I just
> put index == 0 in the same range of invalid indexes.
>
The legacy BSD ABI for interfaces uses int, so that sets the upper
bound for kernel.
The netlink ABI limit is u32 for ifindex so technically 1..UINT32_MAX are
possible values but kernel is bound by BSD mistake.
I will take the original patch.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH net-next 0/2] net: speedup geneve/vxlan tunnels dismantle
From: David Miller @ 2017-12-19 16:00 UTC (permalink / raw)
To: yanhaishuang; +Cc: edumazet, netdev, linux-kernel
In-Reply-To: <1513418090-8595-1-git-send-email-yanhaishuang@cmss.chinamobile.com>
From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 16 Dec 2017 17:54:48 +0800
> This patch series add batching to vxlan/geneve tunnels so that netns
> dismantles are less costly.
Series applied, thank you.
^ permalink raw reply
* Re: [PATCH] isdn: avm: Handle return value of skb_dequeue()
From: David Miller @ 2017-12-19 16:04 UTC (permalink / raw)
To: arvind.yadav.cs; +Cc: isdn, stephen, johannes.berg, linux-kernel, netdev
In-Reply-To: <5818ec104ed4354db3625ac8c7741d771719e4d4.1513453258.git.arvind.yadav.cs@gmail.com>
From: Arvind Yadav <arvind.yadav.cs@gmail.com>
Date: Sun, 17 Dec 2017 01:17:23 +0530
> skb_dequeue() will return NULL for an empty list or a pointer
> to the head element.
>
> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
There is no code path where this is possible.
The two call sites:
1) Explicitly add an SKB to the queue
OR
2) Explcitily guard the call with an skb_queue_empty() check
Therefore the stated condition is not possible.
^ permalink raw reply
* Re: [PATCH iproute2 1/3] iplink: Improve index parameter handling
From: Serhey Popovich @ 2017-12-19 16:05 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20171219075951.7aca0d53@xeon-e3>
[-- Attachment #1.1: Type: text/plain, Size: 2783 bytes --]
Stephen Hemminger wrote:
> On Mon, 18 Dec 2017 23:37:09 +0200
> Serhey Popovich <serhe.popovych@gmail.com> wrote:
>
>> Stephen Hemminger wrote:
>>> On Mon, 18 Dec 2017 23:02:07 +0200
>>> Serhey Popovich <serhe.popovych@gmail.com> wrote:
>>>
>>>> Stephen Hemminger wrote:
>>>>> On Mon, 18 Dec 2017 20:54:06 +0200
>>>>> Serhey Popovych <serhe.popovych@gmail.com> wrote:
>>>>>
>>>>>> diff --git a/ip/iplink.c b/ip/iplink.c
>>>>>> index 1e685cc..4f9c169 100644
>>>>>> --- a/ip/iplink.c
>>>>>> +++ b/ip/iplink.c
>>>>>> @@ -586,8 +586,10 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
>>>>>> *name = *argv;
>>>>>> } else if (strcmp(*argv, "index") == 0) {
>>>>>> NEXT_ARG();
>>>>>> + if (*index)
>>>>>> + duparg("index", *argv);
>>>>>> *index = atoi(*argv);
>>>>>> - if (*index < 0)
>>>>>> + if (*index <= 0)
>>>>>
>>>>> Why not use strtoul instead of atoi?
>>>> Do not see reason for strtoul() instead atoi():
>>>>
>>>> 1) main arg: indexes in kernel represented as "int", which is
>>>> signed. <= 0 values are reserved for various special purposes
>>>> (see net/core/fib_rules.c on how device matching implemented).
>>>>
>>>> Configuring network device manually with index <= 0 is not correct
>>>> (however possible). Kernel itself never chooses ifindex <= 0.
>>>>
>>>> Having unsigned int > 0x7fffffff actually means index <= 0.
>>>>
>>>> 2) this is not single place in iproute2 where it is used: not
>>>> going to remove last user.
>>>>
>>>> 3) make changes clear and transparent for review.
>>>
>>> I would rather all of iproute2 correctly handles unsigned values.
>>> Too much code is old K&R style C "the world is an int" and "who needs
>>> to check for negative".
>>
>> You are right :(. I'm just trying to improve things a bit.
>>
>>>
>>> There already is get_unsigned() in iproute2 util functions.
>> This is good one based on strtoul(). But do we want to submit say
>> index = (unsigned int)2147483648(0x7fffffff) to the kernel that is
>> illegal from it's perspective?
>>
>> Or do you mean I can prepare treewide change to replace atoi() with
>> get_unsigned()/get_integer() where appropriate?
>>
>> We already check if (*index < 0) since commit 3c682146aeff
>> (iplink: forbid negative ifindex and modifying ifindex), and I just
>> put index == 0 in the same range of invalid indexes.
>>
>
> The legacy BSD ABI for interfaces uses int, so that sets the upper
> bound for kernel.
>
> The netlink ABI limit is u32 for ifindex so technically 1..UINT32_MAX are
> possible values but kernel is bound by BSD mistake.
Thank you for in depth explanation!
>
> I will take the original patch.
>
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply
* Re: [PATCH 0/4] bcm63xx_enet: remove mac_id usage
From: David Miller @ 2017-12-19 16:07 UTC (permalink / raw)
To: jonas.gorski
Cc: netdev, linux-mips, ralf, f.fainelli, bcm-kernel-feedback-list
In-Reply-To: <20171217160255.30342-1-jonas.gorski@gmail.com>
From: Jonas Gorski <jonas.gorski@gmail.com>
Date: Sun, 17 Dec 2017 17:02:51 +0100
> This patchset aims at reducing the platform device id number usage with
> the target of making it eventually possible to probe the driver through OF.
>
> Runtested on BCM6358.
>
> Since the patches touch mostly net/, they should go through net-next.
Series applied, thank you.
^ permalink raw reply
* Re: [patch net] mlxsw: spectrum_router: Remove batch neighbour deletion causing FW bug
From: David Miller @ 2017-12-19 16:08 UTC (permalink / raw)
To: jiri; +Cc: netdev, petrm, idosch, mlxsw
In-Reply-To: <20171217161643.2725-1-jiri@resnulli.us>
From: Jiri Pirko <jiri@resnulli.us>
Date: Sun, 17 Dec 2017 17:16:43 +0100
> From: Petr Machata <petrm@mellanox.com>
>
> This reverts commit 63dd00fa3e524c27cc0509190084ab147ecc8ae2.
>
> RAUHT DELETE_ALL seems to trigger a bug in FW. That manifests by later
> calls to RAUHT ADD of an IPv6 neighbor to fail with "bad parameter"
> error code.
>
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> Fixes: 63dd00fa3e52 ("mlxsw: spectrum_router: Add batch neighbour deletion")
> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Applied.
^ permalink raw reply
* Re: [patch net-next] mlxsw: spectrum: Add "spectrum" prefix macro
From: David Miller @ 2017-12-19 16:09 UTC (permalink / raw)
To: joe; +Cc: jiri, netdev, arkadis, idosch, mlxsw
In-Reply-To: <1513527900.31581.27.camel@perches.com>
From: Joe Perches <joe@perches.com>
Date: Sun, 17 Dec 2017 08:25:00 -0800
> On Sun, 2017-12-17 at 17:15 +0100, Jiri Pirko wrote:
>> From: Arkadi Sharshevsky <arkadis@mellanox.com>
>>
>> Add "spectrum" string prefix macro for error strings.
> []
>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
> []
>> @@ -4168,13 +4168,11 @@ mlxsw_sp_master_lag_check(struct mlxsw_sp *mlxsw_sp,
>> u16 lag_id;
>>
>> if (mlxsw_sp_lag_index_get(mlxsw_sp, lag_dev, &lag_id) != 0) {
>> - NL_SET_ERR_MSG(extack,
>> - "spectrum: Exceeded number of supported LAG devices");
>> + NL_SET_ERR_MSG(extack, MLXSW_SP_PREFIX "Exceeded number of supported LAG devices");
>
> Perhaps use NL_SET_ERR_MSG_MOD instead.
Yeah that probably makes sense.
^ permalink raw reply
* Re: [RFC PATCH net-next] tools/bpf: fix build with binutils >= 2.28
From: Roman Gushchin @ 2017-12-19 16:10 UTC (permalink / raw)
To: Quentin Monnet
Cc: netdev, linux-kernel, kernel-team, Jakub Kicinski,
Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <79acdc04-bba9-c9d5-a651-57d0e9628653@netronome.com>
On Tue, Dec 19, 2017 at 03:57:02PM +0000, Quentin Monnet wrote:
> Hi Roman, thanks for working on this!
>
> 2017-12-19 14:38 UTC+0000 ~ Roman Gushchin <guro@fb.com>
> > Bpftool build is broken with binutils version 2.28 and later.
> > The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> > in the binutils repo, which changed the disassembler() function
> > signature.
> >
> > Fix this by checking binutils version and use an appropriate
> > disassembler() signature.
> >
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > ---
> > tools/bpf/Makefile | 6 ++++++
> > tools/bpf/bpf_jit_disasm.c | 7 +++++++
> > tools/bpf/bpftool/Makefile | 6 ++++++
> > tools/bpf/bpftool/jit_disasm.c | 5 +++++
> > 4 files changed, 24 insertions(+)
> >
> > diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
> > index 07a6697466ef..3fd32fd0daa1 100644
> > --- a/tools/bpf/Makefile
> > +++ b/tools/bpf/Makefile
> > @@ -6,8 +6,14 @@ LEX = flex
> > YACC = bison
> > MAKE = make
> >
> > +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
> > +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +
> > CFLAGS += -Wall -O2
> > CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
> > +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
> > +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
> >
> > %.yacc.c: %.y
> > $(YACC) -o $@ -d $<
> > diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
> > index 75bf526a0168..3ef7c8bdc0f3 100644
> > --- a/tools/bpf/bpf_jit_disasm.c
> > +++ b/tools/bpf/bpf_jit_disasm.c
> > @@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int opcodes)
> >
> > disassemble_init_for_target(&info);
> >
> > +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
> > + disassemble = disassembler(bfd_get_arch(bfdf),
> > + bfd_big_endian(bfdf),
> > + bfd_get_mach(bfdf),
> > + bfdf);
> > +#else
> > disassemble = disassembler(bfdf);
> > +#endif
> > assert(disassemble);
> >
> > do {
> > diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> > index 3f17ad317512..94ad51bf14b5 100644
> > --- a/tools/bpf/bpftool/Makefile
> > +++ b/tools/bpf/bpftool/Makefile
> > @@ -40,6 +40,12 @@ CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
> > CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
> > LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
> >
> > +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
>
> This does not seem to be portable. I tried that on Ubuntu and `readelf
> -v` returns "GNU readelf (GNU Binutils for Ubuntu) 2.26.1", and
> BINUTILS_VER catches "Binutils".
>
> > +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
> > +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
> > +
> > INSTALL ?= install
> > RM ?= rm -f
> >
> > diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
> > index 1551d3918d4c..eaa7127e9eeb 100644
> > --- a/tools/bpf/bpftool/jit_disasm.c
> > +++ b/tools/bpf/bpftool/jit_disasm.c
> > @@ -107,7 +107,12 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
> >
> > disassemble_init_for_target(&info);
> >
> > +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
> > + disassemble = disassembler(bfd_get_arch(bfdf), bfd_big_endian(bfdf),
> > + bfd_get_mach(bfdf), bfdf);
> > +#else
> > disassemble = disassembler(bfdf);
> > +#endif
> > assert(disassemble);
> >
> > if (json_output)
>
> I discussed this issue with Jakub recently, and one suggestion he had
> was to look in tools/build/feature to add a new "feature", by trying to
> compile short programs, for making the distinction between binutils
> versions. It probably requires more work, but could be more robust than
> parsing the version from the command line?
Hm, might be an option. Parsing readelf output is pretty ugly, here I agree.
In general it feels more like a binutils issue, so we have to workaround it
in either way.
Is Jakub or someone else working on it?
Thanks!
^ permalink raw reply
* Re: [PATCH net] vxlan: update skb dst pmtu on tx path
From: David Miller @ 2017-12-19 16:12 UTC (permalink / raw)
To: lucien.xin; +Cc: netdev, jbenc
In-Reply-To: <62a0508ac6fde19a62a81f0a5451d39b8a07d722.1513578056.git.lucien.xin@gmail.com>
From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 18 Dec 2017 14:20:56 +0800
> Unlike ip tunnels, now vxlan doesn't do any pmtu update for
> upper dst pmtu, even if it doesn't match the lower dst pmtu
> any more.
>
> The problem can be reproduced when reducing the vxlan lower
> dev's pmtu when running netperf. In jianlin's testing, the
> performance went to 1/7 of the previous.
>
> This patch is to update the upper dst pmtu to match the lower
> dst pmtu on tx path so that packets can be sent out even when
> lower dev's pmtu has been changed.
>
> It also works for metadata dst.
>
> Note that this patch doesn't process any pmtu icmp packet.
> But even in the future, the support for pmtu icmp packets
> process of udp tunnels will also needs this.
>
> The same thing will be done for geneve in another patch.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
Yikes...
You're going to have to find a way to fix this without
invoking ->update_pmtu() on every single transmit. That's
really excessive, especially for an operation which is
going to be a NOP %99.9999 of the time.
We need some way, instead, for the MTU change event to propagate
properly. I know this might be hard, but doing this in the transmit
handler on every packet to deal with it is not the way to go.
Thanks.
^ permalink raw reply
* Re: [net] Revert "net: core: maybe return -EEXIST in __dev_alloc_name"
From: Johannes Berg @ 2017-12-19 16:15 UTC (permalink / raw)
To: Michael Ellerman
Cc: netdev@vger.kernel.org, Jouni Malinen, Rasmus Villemoes,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <87vah29a1m.fsf@concordia.ellerman.id.au>
Hi,
> This revert seems to have broken networking on one of my powerpc
> machines, according to git bisect.
Fun!
TBH, I only looked at the immediate problem we ran into, and reverted
what was causing it. I don't think we saw the follow-up problem you're
seeing.
> The symptom is DHCP fails and I don't get a link, I didn't dig any
> further than that. I can if it's helpful.
>
> I think the problem is that 87c320e51519 ("net: core: dev_get_valid_name
> is now the same as dev_alloc_name_ns") only makes sense while
> d6f295e9def0 remains in the tree.
>
> ie. before the entire series, dev_get_valid_name() would return EEXIST,
> and that was retained when 87c320e51519 was merged, but now that
> d6f295e9def0 has been reverted dev_get_valid_name() is returning ENFILE.
>
> I can get the network up again if I also revert 87c320e51519 ("net:
> core: dev_get_valid_name is now the same as dev_alloc_name_ns"), or with
> the gross patch below.
Makes sense. I guess that should be reverted too then, or even your
"gross" patch applied.
johannes
^ permalink raw reply
* Re: [patch net-next] mlxsw: spectrum: Add "spectrum" prefix macro
From: Jiri Pirko @ 2017-12-19 16:17 UTC (permalink / raw)
To: David Miller; +Cc: joe, netdev, arkadis, idosch, mlxsw
In-Reply-To: <20171219.110906.560509930655165996.davem@davemloft.net>
Tue, Dec 19, 2017 at 05:09:06PM CET, davem@davemloft.net wrote:
>From: Joe Perches <joe@perches.com>
>Date: Sun, 17 Dec 2017 08:25:00 -0800
>
>> On Sun, 2017-12-17 at 17:15 +0100, Jiri Pirko wrote:
>>> From: Arkadi Sharshevsky <arkadis@mellanox.com>
>>>
>>> Add "spectrum" string prefix macro for error strings.
>> []
>>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>> []
>>> @@ -4168,13 +4168,11 @@ mlxsw_sp_master_lag_check(struct mlxsw_sp *mlxsw_sp,
>>> u16 lag_id;
>>>
>>> if (mlxsw_sp_lag_index_get(mlxsw_sp, lag_dev, &lag_id) != 0) {
>>> - NL_SET_ERR_MSG(extack,
>>> - "spectrum: Exceeded number of supported LAG devices");
>>> + NL_SET_ERR_MSG(extack, MLXSW_SP_PREFIX "Exceeded number of supported LAG devices");
>>
>> Perhaps use NL_SET_ERR_MSG_MOD instead.
>
>Yeah that probably makes sense.
Yes. Thanks
^ permalink raw reply
* Re: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure.
From: Marcelo Ricardo Leitner @ 2017-12-19 16:18 UTC (permalink / raw)
To: Ilya Lesokhin
Cc: netdev@vger.kernel.org, davem@davemloft.net, davejwatson@fb.com,
tom@herbertland.com, hannes@stressinduktion.org, Boris Pismenny,
Aviad Yehezkel, Liran Liss
In-Reply-To: <AM4PR0501MB272304FEA861384F0F7895BCD40F0@AM4PR0501MB2723.eurprd05.prod.outlook.com>
On Tue, Dec 19, 2017 at 03:38:16PM +0000, Ilya Lesokhin wrote:
> Tuesday, December 19, 2017 5:12 PM, Marcelo Ricardo Leitner wrote:
>
> > > I'm not quite sure what you mean by "no net_device's are registered"
> > > Presumably you mean there is no device that implements the
> > > NETIF_F_HW_TLS_TX capability yet.
> >
> > Not really. Let me try again. This patchset is using the expression "tls_device".
> > When I read that, I expect a new interface type, like a tunnel, that would be
> > created on top of another interface that has the offloading capability. That's
> > why I'm confused. IMHO "tls_offload" is a better fit. Makes sense?
> >
>
> We don't expose a new interface. An existing netdev does the offload.
>
> The xfrm layer also calls the offload layer xfrm_device and It also doesn't need to
> add another interface to offload ipsec to a netdev.
Hm right, there is xfrm_dev_init() and others, but there is also
XFRM_OFFLOAD as the config define and not XFRM_DEVICE.
>
> I thought about calling it tls_hw or tls_hw_offload.
> The problem is that the important distinction here is that the
> offload is done by a netdev.
> tls_sw can also use hw offload if you have the required
> memory to memory crypto engine and crypto_alloc_aead("gcm(aes)", 0, 0);
> decides on using it.
Now I can see the confusion in both ways, thanks.
And now I don't have a preference either.
Marcelo
^ permalink raw reply
* Re: [PATCH iproute2 0/3] ip/tunnels: Reuse code, vti6 zero endpoint support and minor cleanup
From: Stephen Hemminger @ 2017-12-19 16:19 UTC (permalink / raw)
To: Serhey Popovych; +Cc: netdev
In-Reply-To: <1513619285-23187-1-git-send-email-serhe.popovych@gmail.com>
On Mon, 18 Dec 2017 19:48:02 +0200
Serhey Popovych <serhe.popovych@gmail.com> wrote:
> In this series I present next set of improvements:
>
> 1) Use tnl_parse_key() to avoid code duplication in tunnel
> configuration via netlink code.
>
> 2) Trivial: use IN6ADDR_ANY_INIT instead of open coded
> initialization of local/remote endpoint in ip6tnl code.
>
> 3) Trivial: drop additional checks for zero endpoint
> in vti6 code. This completes and unifies support for
> unconfiguring local/remote endpoint for tunnel.
>
> See individual patch description message for details.
>
> Thanks,
> Serhii
>
> Serhey Popovych (3):
> ip/tunnel: Use tnl_parse_key() to parse tunnel key
> link_ip6tnl: Use IN6ADDR_ANY_INIT to initialize local/remote
> endpoints
> link_vti6: Always add local/remote endpoint attributes
>
> ip/link_gre.c | 45 +++++----------------------------------------
> ip/link_gre6.c | 45 +++++----------------------------------------
> ip/link_ip6tnl.c | 4 ++--
> ip/link_vti.c | 45 +++++----------------------------------------
> ip/link_vti6.c | 51 +++++++--------------------------------------------
> ip/tunnel.c | 5 +++--
> 6 files changed, 27 insertions(+), 168 deletions(-)
>
Applied, thanks
^ permalink raw reply
* Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
From: Willem de Bruijn @ 2017-12-19 16:19 UTC (permalink / raw)
To: Jason Wang; +Cc: David Hill, Paolo Bonzini, kvm, Willem de Bruijn, netdev
In-Reply-To: <cd0ea113-34ff-d24a-b798-819dfb536c76@redhat.com>
>> It looks like the first bad commit would be the following:
>>
>> [jenkins@zappa linux-stable-new]$ sudo bash bisect.sh -g
>> 3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit
>> commit 3ece782693c4b64d588dd217868558ab9a19bfe7
>> Author: Willem de Bruijn <willemb@google.com>
>> Date: Thu Aug 3 16:29:38 2017 -0400
>>
>> sock: skb_copy_ubufs support for compound pages
>>
>> Refine skb_copy_ubufs to support compound pages. With upcoming TCP
>> zerocopy sendmsg, such fragments may appear.
>>
>> The existing code replaces each page one for one. Splitting each
>> compound page into an independent number of regular pages can result
>> in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.
>>
>> Instead, fill all destination pages but the last to PAGE_SIZE.
>> Split the existing alloc + copy loop into separate stages:
>> 1. compute bytelength and minimum number of pages to store this.
>> 2. allocate
>> 3. copy, filling each page except the last to PAGE_SIZE bytes
>> 4. update skb frag array
>>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> :040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2
>> 6ecf86d9f06a2d98946f531f1e4cf803de071b10 M include
>> :040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb
>> 4fc8384362693e4619fab39b0a945f6f2349226b M net
>>
>> Here is the bisect log:
>
>
> Thanks for the hard bisecting.
>
> Cc netdev and Willem.
This is being discussed in
http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>
David also previously reported this at
https://bugzilla.kernel.org/show_bug.cgi?id=197861
which has a pointer to the above thread, too. Let's discuss this in a
single thread. I have suggested a fix there.
Thanks for bisecting. Please also test the patch in the above thread
if possible.
^ permalink raw reply
* Re: INFO: task hung in bpf_exit_net
From: Xin Long @ 2017-12-19 16:21 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: syzbot, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs,
David Miller, David Ahern, Florian Westphal, Daniel Borkmann,
jakub.kicinski, mschiffer, Vladislav Yasevich, Jiri Benc, netdev
In-Reply-To: <CACT4Y+Ze8t3ipFyhfy9XfmvUw4tob+dAUiVMQ-xHimYDXpSiRw@mail.gmail.com>
On Tue, Dec 19, 2017 at 8:47 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Dec 19, 2017 at 1:36 PM, syzbot
> <bot+21b498fc12cf2041655f8e1eeae0733807d794b3@syzkaller.appspotmail.com>
> wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 7ceb97a071e80f1b5e4cd5a36de135612a836388
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>>
>> Unfortunately, I don't have any reproducer for this bug yet.
>>
>>
>> sctp: sctp_transport_update_pmtu: Reported pmtu 508 too low, using default
>> minimum of 512
>> INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
>> Not tainted 4.15.0-rc2-next-20171205+ #59
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> kworker/u4:0 D15808 5 2 0x80000000
>> Workqueue: netns cleanup_net
>> Call Trace:
>> context_switch kernel/sched/core.c:2800 [inline]
>> __schedule+0x8eb/0x2060 kernel/sched/core.c:3376
>> schedule+0xf5/0x430 kernel/sched/core.c:3435
>> schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3493
>> __mutex_lock_common kernel/locking/mutex.c:833 [inline]
>> __mutex_lock+0xaad/0x1a80 kernel/locking/mutex.c:893
>> mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>> rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
>> tc_action_net_exit include/net/act_api.h:125 [inline]
>> bpf_exit_net+0x1a2/0x340 net/sched/act_bpf.c:408
>> ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:142
>> cleanup_net+0x5c7/0xb60 net/core/net_namespace.c:484
>> process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113
>> worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>> kthread+0x37a/0x440 kernel/kthread.c:238
>> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:517
>>
[...]
>> Call Trace:
>> serial_in drivers/tty/serial/8250/8250.h:111 [inline]
>> wait_for_xmitr+0x93/0x1e0 drivers/tty/serial/8250/8250_port.c:2033
I saw this call trace on both 'bpf_exit_net task hung' and
'cleanup_net task hung'.
Note when cpu is here, it's still holding the rtnl_lock in these 2 cases,
one is in nl80211_dump_interface(), and the other one is in dev_ioctl().
I noticed this patch:
commit 54f19b4a679149130f78413c421a5780e90a9d0a
Author: Jiri Olsa <jolsa@kernel.org>
Date: Wed Sep 21 16:43:15 2016 +0200
tty/serial/8250: Touch NMI watchdog in wait_for_xmitr
It means in early time, watchdog timeout can be triggered here,
And this patch was to fix it by restarting NMI watchdog timeout
with calling touch_nmi_watchdog().
But this patch missed that it's still holding the rtnl_lock(), other
threads may timeout on watchdog when trying to acquire
rtnl_lock().
>> serial8250_console_putchar+0x1f/0x60
>> drivers/tty/serial/8250/8250_port.c:3170
>> uart_console_write+0xac/0xe0 drivers/tty/serial/serial_core.c:1858
>> serial8250_console_write+0x647/0xa20
>> drivers/tty/serial/8250/8250_port.c:3236
>> univ8250_console_write+0x5f/0x70 drivers/tty/serial/8250/8250_core.c:590
>> call_console_drivers kernel/printk/printk.c:1574 [inline]
>> console_unlock+0x788/0xd70 kernel/printk/printk.c:2233
>> vprintk_emit+0x4ad/0x590 kernel/printk/printk.c:1757
>> vprintk_default+0x28/0x30 kernel/printk/printk.c:1796
>> vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
>> printk+0xaa/0xca kernel/printk/printk.c:1829
>> nla_parse+0x374/0x3d0 lib/nlattr.c:257
>> nlmsg_parse include/net/netlink.h:398 [inline]
>> nl80211_dump_wiphy_parse.isra.37.constprop.83+0x138/0x5c0
>> net/wireless/nl80211.c:1920
>> nl80211_dump_interface+0x596/0x820 net/wireless/nl80211.c:2660
>> genl_lock_dumpit+0x68/0x90 net/netlink/genetlink.c:480
>> netlink_dump+0x48c/0xce0 net/netlink/af_netlink.c:2186
>> __netlink_dump_start+0x4f0/0x6d0 net/netlink/af_netlink.c:2283
>> genl_family_rcv_msg+0xd27/0xfc0 net/netlink/genetlink.c:548
>> genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:624
>> netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2405
>> genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
>> netlink_unicast_kernel net/netlink/af_netlink.c:1272 [inline]
>> netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1298
>> netlink_sendmsg+0xa4a/0xe70 net/netlink/af_netlink.c:1861
>> sock_sendmsg_nosec net/socket.c:636 [inline]
>> sock_sendmsg+0xca/0x110 net/socket.c:646
>> sock_write_iter+0x320/0x5e0 net/socket.c:915
>> call_write_iter include/linux/fs.h:1776 [inline]
>> new_sync_write fs/read_write.c:469 [inline]
>> __vfs_write+0x68a/0x970 fs/read_write.c:482
>> vfs_write+0x18f/0x510 fs/read_write.c:544
>> SYSC_write fs/read_write.c:589 [inline]
>> SyS_write+0xef/0x220 fs/read_write.c:581
>> entry_SYSCALL_64_fastpath+0x1f/0x96
>> RIP: 0033:0x4529d9
>> RSP: 002b:00007f6d52e3ec58 EFLAGS: 00000212 ORIG_RAX: 0000000000000001
>> RAX: ffffffffffffffda RBX: 00007f6d52e3f700 RCX: 00000000004529d9
>> RDX: 0000000000000024 RSI: 0000000020454000 RDI: 0000000000000016
>> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000000000
>> R13: 0000000000a6f7ff R14: 00007f6d52e3f9c0 R15: 0000000000000000
>> Code: 24 d9 00 00 00 49 8d 7c 24 40 48 b8 00 00 00 00 00 fc ff df 48 89 fa
>> 48 c1 ea 03 d3 e3 80 3c 02 00 75 17 41 03 5c 24 40 89 da ec <5b> 0f b6 c0 41
>> 5c 5d c3 e8 38 b0 18 ff eb c2 e8 91 b0 18 ff eb
>>
>>
>> ---
>> This bug is generated by a dumb bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for details.
>> Direct all questions to syzkaller@googlegroups.com.
>> Please credit me with: Reported-by: syzbot <syzkaller@googlegroups.com>
>>
>> syzbot will keep track of this bug report.
>> Once a fix for this bug is merged into any tree, reply to this email with:
>> #syz fix: exact-commit-title
>> To mark this as a duplicate of another syzbot report, please reply with:
>> #syz dup: exact-subject-of-another-report
>> If it's a one-off invalid bug report, please reply with:
>> #syz invalid
>> Note: if the crash happens again, it will cause creation of a new bug
>> report.
>> Note: all commands must start from beginning of the line in the email body.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "syzkaller-bugs" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to syzkaller-bugs+unsubscribe@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/syzkaller-bugs/001a1143fd00a8cc790560b0b552%40google.com.
>> For more options, visit https://groups.google.com/d/optout.
>
>
> This looks like +rtnetlink issue.
^ permalink raw reply
* Re: [RFC PATCH net-next] tools/bpf: fix build with binutils >= 2.28
From: Quentin Monnet @ 2017-12-19 16:22 UTC (permalink / raw)
To: Roman Gushchin
Cc: netdev, linux-kernel, kernel-team, Jakub Kicinski,
Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <20171219161009.GA30572@castle.DHCP.thefacebook.com>
2017-12-19 16:10 UTC+0000 ~ Roman Gushchin <guro@fb.com>
> On Tue, Dec 19, 2017 at 03:57:02PM +0000, Quentin Monnet wrote:
>> Hi Roman, thanks for working on this!
>>
>> 2017-12-19 14:38 UTC+0000 ~ Roman Gushchin <guro@fb.com>
>>> Bpftool build is broken with binutils version 2.28 and later.
>>> The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
>>> in the binutils repo, which changed the disassembler() function
>>> signature.
>>>
>>> Fix this by checking binutils version and use an appropriate
>>> disassembler() signature.
>>>
>>> Signed-off-by: Roman Gushchin <guro@fb.com>
>>> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>> ---
>>> tools/bpf/Makefile | 6 ++++++
>>> tools/bpf/bpf_jit_disasm.c | 7 +++++++
>>> tools/bpf/bpftool/Makefile | 6 ++++++
>>> tools/bpf/bpftool/jit_disasm.c | 5 +++++
>>> 4 files changed, 24 insertions(+)
>>>
>>> diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
>>> index 07a6697466ef..3fd32fd0daa1 100644
>>> --- a/tools/bpf/Makefile
>>> +++ b/tools/bpf/Makefile
>>> @@ -6,8 +6,14 @@ LEX = flex
>>> YACC = bison
>>> MAKE = make
>>>
>>> +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
>>> +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
>>> +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
>>> +
>>> CFLAGS += -Wall -O2
>>> CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
>>> +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
>>> +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
>>>
>>> %.yacc.c: %.y
>>> $(YACC) -o $@ -d $<
>>> diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
>>> index 75bf526a0168..3ef7c8bdc0f3 100644
>>> --- a/tools/bpf/bpf_jit_disasm.c
>>> +++ b/tools/bpf/bpf_jit_disasm.c
>>> @@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int opcodes)
>>>
>>> disassemble_init_for_target(&info);
>>>
>>> +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
>>> + disassemble = disassembler(bfd_get_arch(bfdf),
>>> + bfd_big_endian(bfdf),
>>> + bfd_get_mach(bfdf),
>>> + bfdf);
>>> +#else
>>> disassemble = disassembler(bfdf);
>>> +#endif
>>> assert(disassemble);
>>>
>>> do {
>>> diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
>>> index 3f17ad317512..94ad51bf14b5 100644
>>> --- a/tools/bpf/bpftool/Makefile
>>> +++ b/tools/bpf/bpftool/Makefile
>>> @@ -40,6 +40,12 @@ CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
>>> CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
>>> LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
>>>
>>> +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
>>
>> This does not seem to be portable. I tried that on Ubuntu and `readelf
>> -v` returns "GNU readelf (GNU Binutils for Ubuntu) 2.26.1", and
>> BINUTILS_VER catches "Binutils".
>>
>>> +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
>>> +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
>>> +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
>>> +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
>>> +
>>> INSTALL ?= install
>>> RM ?= rm -f
>>>
>>> diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
>>> index 1551d3918d4c..eaa7127e9eeb 100644
>>> --- a/tools/bpf/bpftool/jit_disasm.c
>>> +++ b/tools/bpf/bpftool/jit_disasm.c
>>> @@ -107,7 +107,12 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
>>>
>>> disassemble_init_for_target(&info);
>>>
>>> +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
>>> + disassemble = disassembler(bfd_get_arch(bfdf), bfd_big_endian(bfdf),
>>> + bfd_get_mach(bfdf), bfdf);
>>> +#else
>>> disassemble = disassembler(bfdf);
>>> +#endif
>>> assert(disassemble);
>>>
>>> if (json_output)
>>
>> I discussed this issue with Jakub recently, and one suggestion he had
>> was to look in tools/build/feature to add a new "feature", by trying to
>> compile short programs, for making the distinction between binutils
>> versions. It probably requires more work, but could be more robust than
>> parsing the version from the command line?
>
> Hm, might be an option. Parsing readelf output is pretty ugly, here I agree.
> In general it feels more like a binutils issue, so we have to workaround it
> in either way.
>
> Is Jakub or someone else working on it?
>
> Thanks!
>
Jakub isn't. On our side, I noticed last week that there was this change
in binutils, and started to have a look at how these "features" work.
But I have nothing that works so far, so feel free to tackle this.
Quentin
^ permalink raw reply
* Re: [PATCH] iproute: "list/flush/save default" selected all of the routes
From: Stephen Hemminger @ 2017-12-19 16:27 UTC (permalink / raw)
To: Alexander Zubkov; +Cc: zubkov318, netdev
In-Reply-To: <1513508940-29430-1-git-send-email-green@msu.ru>
On Sun, 17 Dec 2017 12:09:00 +0100
Alexander Zubkov <green@msu.ru> wrote:
> When running "ip route list default" and not specifying address family,
> one will get all of the routes instead of just default only. The same
> is for "exact default" and "match default".
>
> It behaves in such a way because default route with unspecified family
> has the same all-zeroes value like no prefix specified at all. Thus
> following code blindly ignores the fact, that prefix was actually
> specified.
>
> This patch adds the flag PREFIXLEN_SPECIFIED to the default route too.
> And then checks its value when filtering routes.
>
> Signed-off-by: Alexander Zubkov <green@msu.ru>
Applied, thanks Alexander
^ permalink raw reply
* Re: [PATCH v3] iproute: list/flush/save filter also by metric
From: Stephen Hemminger @ 2017-12-19 16:22 UTC (permalink / raw)
To: Alexander Zubkov; +Cc: zubkov318, netdev
In-Reply-To: <1513512131-31902-1-git-send-email-green@msu.ru>
On Sun, 17 Dec 2017 13:02:11 +0100
Alexander Zubkov <green@msu.ru> wrote:
> Metric is one of the "unique key" fields of the route in Linux. But
> still one can not use its value in filter while running ip list.
> Because of this writing checks in scripts for example is incovenient.
>
> Signed-off-by: Alexander Zubkov <green@msu.ru>
Applied, thanks Alexander
^ permalink raw reply
* Re: [PATCH v4 net-next 6/6] ila: Route notify
From: Roopa Prabhu @ 2017-12-19 16:28 UTC (permalink / raw)
To: Tom Herbert; +Cc: David Miller, netdev, rohit
In-Reply-To: <20171215182800.10248-7-tom@quantonium.net>
On Fri, Dec 15, 2017 at 10:28 AM, Tom Herbert <tom@quantonium.net> wrote:
> Implement RTM notifications for ILA routers. This adds support to
> ILA LWT to send a netlink RTM message when a router is uses.
>
> THe ILA notify mechanism can be used in two contexts:
>
> - On an ILA forwarding cache a route prefix can be configured to
> do an ILA notification. This method is used when address
> resolution needs to be done on an address.
> - One an ILA router an ILA host route entry may include a
> noitification. The purpose of this is to get a notification
> to a userspace daemon to send and ILA redirect
>
> Signed-off-by: Tom Herbert <tom@quantonium.net>
> ---
> include/uapi/linux/ila.h | 2 +
> include/uapi/linux/rtnetlink.h | 8 +-
> net/ipv6/ila/ila_lwt.c | 268 ++++++++++++++++++++++++++++-------------
> 3 files changed, 193 insertions(+), 85 deletions(-)
>
> diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
> index db45d3e49a12..5675f3e71fac 100644
> --- a/include/uapi/linux/ila.h
> +++ b/include/uapi/linux/ila.h
> @@ -19,6 +19,8 @@ enum {
> ILA_ATTR_CSUM_MODE, /* u8 */
> ILA_ATTR_IDENT_TYPE, /* u8 */
> ILA_ATTR_HOOK_TYPE, /* u8 */
> + ILA_ATTR_NOTIFY_DST, /* flag */
> + ILA_ATTR_NOTIFY_SRC, /* flag */
>
> __ILA_ATTR_MAX,
> };
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index d8b5f80c2ea6..8d358a300d8a 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -13,7 +13,8 @@
> */
> #define RTNL_FAMILY_IPMR 128
> #define RTNL_FAMILY_IP6MR 129
> -#define RTNL_FAMILY_MAX 129
> +#define RTNL_FAMILY_ILA 130
I don't see this being used, unless I missed it.
> +#define RTNL_FAMILY_MAX 130
>
> /****
> * Routing/neighbour discovery messages.
> @@ -150,6 +151,9 @@ enum {
> RTM_NEWCACHEREPORT = 96,
> #define RTM_NEWCACHEREPORT RTM_NEWCACHEREPORT
>
> + RTM_ADDR_RESOLVE = 98,
> +#define RTM_ADDR_RESOLVE RTM_ADDR_RESOLVE
Its nice that you are trying to add a generic resolver notify
format..., looks good...a few minor comments below for consistency:
your patch adds address resolve notifications with:
- generic notification msg type RTM_ADDR_RESOLVE with route msg format
- On the ILA specific RTNLGRP_ILA_NOTIFY multicast group
Other way to possibly format this for "generic" rtnl netlink resolver is:
- notification msg type RTM_NEWROUTE with route msg format (This is
for consistency: route msg format is always used with RTM_*ROUTE msg
types)
- route entry msg type RTNL_FAMILY_ILA (ie struct rtmsg-> rtm_family
to RTNL_FAMILY_ILA)
- generic address resolution netlink multicast group RTNLGRP_ADDR_RESOLVE_NOTIFY
OR keep everything "specific" to ILA using the ILA genl channel msg
format and family
Besides that, since you are using the route msg format, you could
potentially s/ADDR_RESOLVE/ROUTE_RESOLVE/g everywhere above.
> +
> __RTM_MAX,
> #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
> };
> @@ -676,6 +680,8 @@ enum rtnetlink_groups {
> #define RTNLGRP_IPV4_MROUTE_R RTNLGRP_IPV4_MROUTE_R
> RTNLGRP_IPV6_MROUTE_R,
> #define RTNLGRP_IPV6_MROUTE_R RTNLGRP_IPV6_MROUTE_R
> + RTNLGRP_ILA_NOTIFY,
> +#define RTNLGRP_ILA_NOTIFY RTNLGRP_ILA_NOTIFY
> __RTNLGRP_MAX
> };
> #define RTNLGRP_MAX (__RTNLGRP_MAX - 1)
> diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
> index 9f1e46a1468e..303c91e3bf76 100644
> --- a/net/ipv6/ila/ila_lwt.c
> +++ b/net/ipv6/ila/ila_lwt.c
> @@ -19,10 +19,15 @@
> struct ila_lwt {
> struct ila_params p;
> struct dst_cache dst_cache;
> + u8 hook_type;
> u32 connected : 1;
> - u32 lwt_output : 1;
> + u32 xlat : 1;
> + u32 notify : 2;
> };
>
> +#define ILA_NOTIFY_DST 1
> +#define ILA_NOTIFY_SRC 2
> +
> static inline struct ila_lwt *ila_lwt_lwtunnel(
> struct lwtunnel_state *lwt)
> {
> @@ -35,6 +40,67 @@ static inline struct ila_params *ila_params_lwtunnel(
> return &ila_lwt_lwtunnel(lwt)->p;
> }
>
> +static size_t ila_rslv_msgsize(void)
> +{
> + size_t len =
> + NLMSG_ALIGN(sizeof(struct rtmsg))
> + + nla_total_size(16) /* RTA_DST */
> + + nla_total_size(16) /* RTA_SRC */
> + ;
> +
> + return len;
> +}
> +
> +void ila_notify(struct net *net, struct sk_buff *skb, struct ila_lwt *lwt)
> +{
> + struct ipv6hdr *ip6h = ipv6_hdr(skb);
> + int flags = NLM_F_MULTI;
> + struct sk_buff *nlskb;
> + struct nlmsghdr *nlh;
> + struct rtmsg *rtm;
> + int err = 0;
> +
> + /* Send ILA notification to user */
> + nlskb = nlmsg_new(ila_rslv_msgsize(), GFP_KERNEL);
> + if (!nlskb)
> + return;
> +
> + nlh = nlmsg_put(nlskb, 0, 0, RTM_ADDR_RESOLVE, sizeof(*rtm), flags);
> + if (!nlh) {
> + err = -EMSGSIZE;
> + goto errout;
> + }
> +
> + rtm = nlmsg_data(nlh);
> + rtm->rtm_family = AF_INET6;
> + rtm->rtm_dst_len = 128;
> + rtm->rtm_src_len = 0;
> + rtm->rtm_tos = 0;
> + rtm->rtm_table = RT6_TABLE_UNSPEC;
> + rtm->rtm_type = RTN_UNICAST;
> + rtm->rtm_scope = RT_SCOPE_UNIVERSE;
> +
> + if (((lwt->notify & ILA_NOTIFY_DST) &&
> + nla_put_in6_addr(nlskb, RTA_DST, &ip6h->daddr)) ||
> + ((lwt->notify & ILA_NOTIFY_SRC) &&
> + nla_put_in6_addr(nlskb, RTA_SRC, &ip6h->saddr))) {
> + nlmsg_cancel(nlskb, nlh);
> + err = -EMSGSIZE;
> + goto errout;
> + }
> +
> + nlmsg_end(nlskb, nlh);
> +
> + rtnl_notify(nlskb, net, 0, RTNLGRP_ILA_NOTIFY, NULL, GFP_ATOMIC);
> +
> + return;
> +
> +errout:
> + kfree_skb(nlskb);
> + WARN_ON(err == -EMSGSIZE);
> + rtnl_set_sk_err(net, RTNLGRP_ILA_NOTIFY, err);
> +}
> +
> static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> {
> struct dst_entry *orig_dst = skb_dst(skb);
> @@ -46,11 +112,14 @@ static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> if (skb->protocol != htons(ETH_P_IPV6))
> goto drop;
>
> - if (ilwt->lwt_output)
> + if (ilwt->xlat)
> ila_update_ipv6_locator(skb,
> ila_params_lwtunnel(orig_dst->lwtstate),
> true);
>
> + if (ilwt->notify)
> + ila_notify(net, skb, ilwt);
> +
> if (rt->rt6i_flags & (RTF_GATEWAY | RTF_CACHE)) {
> /* Already have a next hop address in route, no need for
> * dest cache route.
> @@ -106,11 +175,14 @@ static int ila_input(struct sk_buff *skb)
> if (skb->protocol != htons(ETH_P_IPV6))
> goto drop;
>
> - if (!ilwt->lwt_output)
> + if (ilwt->xlat)
> ila_update_ipv6_locator(skb,
> ila_params_lwtunnel(dst->lwtstate),
> false);
>
> + if (ilwt->notify)
> + ila_notify(dev_net(dst->dev), skb, ilwt);
> +
> return dst->lwtstate->orig_input(skb);
>
> drop:
> @@ -123,6 +195,8 @@ static const struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
> [ILA_ATTR_CSUM_MODE] = { .type = NLA_U8, },
> [ILA_ATTR_IDENT_TYPE] = { .type = NLA_U8, },
> [ILA_ATTR_HOOK_TYPE] = { .type = NLA_U8, },
> + [ILA_ATTR_NOTIFY_DST] = { .type = NLA_FLAG },
> + [ILA_ATTR_NOTIFY_SRC] = { .type = NLA_FLAG },
> };
>
> static int ila_build_state(struct net *net, struct nlattr *nla,
> @@ -130,64 +204,73 @@ static int ila_build_state(struct net *net, struct nlattr *nla,
> struct lwtunnel_state **ts,
> struct netlink_ext_ack *extack)
> {
> - struct ila_lwt *ilwt;
> - struct ila_params *p;
> - struct nlattr *tb[ILA_ATTR_MAX + 1];
> - struct lwtunnel_state *newts;
> const struct fib6_config *cfg6 = cfg;
> - struct ila_addr *iaddr;
> + struct ila_addr *iaddr = (struct ila_addr *)&cfg6->fc_dst;
> u8 ident_type = ILA_ATYPE_USE_FORMAT;
> u8 hook_type = ILA_HOOK_ROUTE_OUTPUT;
> + struct nlattr *tb[ILA_ATTR_MAX + 1];
> u8 csum_mode = ILA_CSUM_NO_ACTION;
> - bool lwt_output = true;
> + struct lwtunnel_state *newts;
> + struct ila_lwt *ilwt;
> + struct ila_params *p;
> u8 eff_ident_type;
> - int ret;
> + int err;
>
> if (family != AF_INET6)
> return -EINVAL;
>
> - ret = nla_parse_nested(tb, ILA_ATTR_MAX, nla, ila_nl_policy, extack);
> - if (ret < 0)
> - return ret;
> + err = nla_parse_nested(tb, ILA_ATTR_MAX, nla, ila_nl_policy, extack);
> + if (err < 0)
> + return err;
>
> - if (!tb[ILA_ATTR_LOCATOR])
> - return -EINVAL;
> + if (tb[ILA_ATTR_LOCATOR]) {
> + /* Doing ILA translation */
>
> - iaddr = (struct ila_addr *)&cfg6->fc_dst;
> + if (tb[ILA_ATTR_IDENT_TYPE])
> + ident_type = nla_get_u8(tb[ILA_ATTR_IDENT_TYPE]);
>
> - if (tb[ILA_ATTR_IDENT_TYPE])
> - ident_type = nla_get_u8(tb[ILA_ATTR_IDENT_TYPE]);
> + if (ident_type == ILA_ATYPE_USE_FORMAT) {
> + /* Infer identifier type from type field in formatted
> + * identifier.
> + */
>
> - if (ident_type == ILA_ATYPE_USE_FORMAT) {
> - /* Infer identifier type from type field in formatted
> - * identifier.
> - */
> + if (cfg6->fc_dst_len < 8 *
> + sizeof(struct ila_locator) + 3) {
> + /* Need to have full locator and at least type
> + * field included in destination
> + */
> + return -EINVAL;
> + }
> +
> + eff_ident_type = iaddr->ident.type;
> + } else {
> + eff_ident_type = ident_type;
> + }
>
> - if (cfg6->fc_dst_len < 8 * sizeof(struct ila_locator) + 3) {
> - /* Need to have full locator and at least type field
> - * included in destination
> - */
> + switch (eff_ident_type) {
> + case ILA_ATYPE_IID:
> + /* Don't allow ILA for IID type */
> + return -EINVAL;
> + case ILA_ATYPE_LUID:
> + break;
> + case ILA_ATYPE_VIRT_V4:
> + case ILA_ATYPE_VIRT_UNI_V6:
> + case ILA_ATYPE_VIRT_MULTI_V6:
> + case ILA_ATYPE_NONLOCAL_ADDR:
> + /* These ILA formats are not supported yet. */
> + default:
> return -EINVAL;
> }
>
> - eff_ident_type = iaddr->ident.type;
> - } else {
> - eff_ident_type = ident_type;
> - }
> + csum_mode = nla_get_u8(tb[ILA_ATTR_CSUM_MODE]);
>
> - switch (eff_ident_type) {
> - case ILA_ATYPE_IID:
> - /* Don't allow ILA for IID type */
> - return -EINVAL;
> - case ILA_ATYPE_LUID:
> - break;
> - case ILA_ATYPE_VIRT_V4:
> - case ILA_ATYPE_VIRT_UNI_V6:
> - case ILA_ATYPE_VIRT_MULTI_V6:
> - case ILA_ATYPE_NONLOCAL_ADDR:
> - /* These ILA formats are not supported yet. */
> - default:
> - return -EINVAL;
> + if (csum_mode == ILA_CSUM_NEUTRAL_MAP &&
> + ila_csum_neutral_set(iaddr->ident)) {
> + /* Don't allow translation if checksum neutral bit is
> + * configured and it's set in the SIR address.
> + */
> + return -EINVAL;
> + }
> }
>
> if (tb[ILA_ATTR_HOOK_TYPE])
> @@ -195,58 +278,62 @@ static int ila_build_state(struct net *net, struct nlattr *nla,
>
> switch (hook_type) {
> case ILA_HOOK_ROUTE_OUTPUT:
> - lwt_output = true;
> - break;
> case ILA_HOOK_ROUTE_INPUT:
> - lwt_output = false;
> break;
> default:
> return -EINVAL;
> }
>
> - if (tb[ILA_ATTR_CSUM_MODE])
> - csum_mode = nla_get_u8(tb[ILA_ATTR_CSUM_MODE]);
> -
> - if (csum_mode == ILA_CSUM_NEUTRAL_MAP &&
> - ila_csum_neutral_set(iaddr->ident)) {
> - /* Don't allow translation if checksum neutral bit is
> - * configured and it's set in the SIR address.
> - */
> - return -EINVAL;
> - }
> -
> newts = lwtunnel_state_alloc(sizeof(*ilwt));
> if (!newts)
> return -ENOMEM;
>
> ilwt = ila_lwt_lwtunnel(newts);
> - ret = dst_cache_init(&ilwt->dst_cache, GFP_ATOMIC);
> - if (ret) {
> +
> + err = dst_cache_init(&ilwt->dst_cache, GFP_ATOMIC);
> + if (err) {
> kfree(newts);
> - return ret;
> + return err;
> }
>
> - ilwt->lwt_output = !!lwt_output;
> + newts->type = LWTUNNEL_ENCAP_ILA;
>
> - p = ila_params_lwtunnel(newts);
> + switch (hook_type) {
> + case ILA_HOOK_ROUTE_OUTPUT:
> + newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT;
> + break;
> + case ILA_HOOK_ROUTE_INPUT:
> + newts->flags |= LWTUNNEL_STATE_INPUT_REDIRECT;
> + break;
> + }
>
> - p->csum_mode = csum_mode;
> - p->ident_type = ident_type;
> - p->locator.v64 = (__force __be64)nla_get_u64(tb[ILA_ATTR_LOCATOR]);
> + ilwt->hook_type = hook_type;
>
> - /* Precompute checksum difference for translation since we
> - * know both the old locator and the new one.
> - */
> - p->locator_match = iaddr->loc;
> + if (tb[ILA_ATTR_NOTIFY_DST])
> + ilwt->notify |= ILA_NOTIFY_DST;
>
> - ila_init_saved_csum(p);
> + if (tb[ILA_ATTR_NOTIFY_SRC])
> + ilwt->notify |= ILA_NOTIFY_SRC;
>
> - newts->type = LWTUNNEL_ENCAP_ILA;
> - newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
> - LWTUNNEL_STATE_INPUT_REDIRECT;
> + p = ila_params_lwtunnel(newts);
>
> - if (cfg6->fc_dst_len == 8 * sizeof(struct in6_addr))
> - ilwt->connected = 1;
> + if (tb[ILA_ATTR_LOCATOR]) {
> + ilwt->xlat = true;
> + p->csum_mode = csum_mode;
> + p->ident_type = ident_type;
> + p->locator.v64 = (__force __be64)nla_get_u64(
> + tb[ILA_ATTR_LOCATOR]);
> +
> + /* Precompute checksum difference for translation since we
> + * know both the old locator and the new one.
> + */
> + p->locator_match = iaddr->loc;
> +
> + ila_init_saved_csum(p);
> +
> + if (cfg6->fc_dst_len == 8 * sizeof(struct in6_addr))
> + ilwt->connected = 1;
> + }
>
> *ts = newts;
>
> @@ -264,21 +351,32 @@ static int ila_fill_encap_info(struct sk_buff *skb,
> struct ila_params *p = ila_params_lwtunnel(lwtstate);
> struct ila_lwt *ilwt = ila_lwt_lwtunnel(lwtstate);
>
> - if (nla_put_u64_64bit(skb, ILA_ATTR_LOCATOR, (__force u64)p->locator.v64,
> - ILA_ATTR_PAD))
> + if (ilwt->xlat) {
> + if (nla_put_u64_64bit(skb, ILA_ATTR_LOCATOR,
> + (__force u64)p->locator.v64,
> + ILA_ATTR_PAD))
> goto nla_put_failure;
>
> - if (nla_put_u8(skb, ILA_ATTR_CSUM_MODE, (__force u8)p->csum_mode))
> - goto nla_put_failure;
> + if (nla_put_u8(skb, ILA_ATTR_CSUM_MODE,
> + (__force u8)p->csum_mode))
> + goto nla_put_failure;
>
> - if (nla_put_u8(skb, ILA_ATTR_IDENT_TYPE, (__force u8)p->ident_type))
> - goto nla_put_failure;
> + if (nla_put_u8(skb, ILA_ATTR_IDENT_TYPE,
> + (__force u8)p->ident_type))
> + goto nla_put_failure;
> + }
>
> - if (nla_put_u8(skb, ILA_ATTR_HOOK_TYPE,
> - ilwt->lwt_output ? ILA_HOOK_ROUTE_OUTPUT :
> - ILA_HOOK_ROUTE_INPUT))
> + if (nla_put_u8(skb, ILA_ATTR_HOOK_TYPE, ilwt->hook_type))
> goto nla_put_failure;
>
> + if (ilwt->notify & ILA_NOTIFY_DST)
> + if (nla_put_flag(skb, ILA_ATTR_NOTIFY_DST))
> + goto nla_put_failure;
> +
> + if (ilwt->notify & ILA_NOTIFY_SRC)
> + if (nla_put_flag(skb, ILA_ATTR_NOTIFY_SRC))
> + goto nla_put_failure;
> +
> return 0;
>
> nla_put_failure:
> @@ -291,6 +389,8 @@ static int ila_encap_nlsize(struct lwtunnel_state *lwtstate)
> nla_total_size(sizeof(u8)) + /* ILA_ATTR_CSUM_MODE */
> nla_total_size(sizeof(u8)) + /* ILA_ATTR_IDENT_TYPE */
> nla_total_size(sizeof(u8)) + /* ILA_ATTR_HOOK_TYPE */
> + nla_total_size(0) + /* ILA_ATTR_NOTIFY_DST */
> + nla_total_size(0) + /* ILA_ATTR_NOTIFY_SRC */
> 0;
> }
>
> --
> 2.11.0
>
^ permalink raw reply
* Re: [PATCH net] ipv4: Fix use-after-free when flushing FIB tables
From: David Miller @ 2017-12-19 16:32 UTC (permalink / raw)
To: idosch; +Cc: netdev, alexander.h.duyck, fengguang.wu, dsahern, mlxsw
In-Reply-To: <20171218081320.29442-1-idosch@mellanox.com>
From: Ido Schimmel <idosch@mellanox.com>
Date: Mon, 18 Dec 2017 10:13:20 +0200
> Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the
> local table uses the same trie allocated for the main table when custom
> rules are not in use.
>
> When a net namespace is dismantled, the main table is flushed and freed
> (via an RCU callback) before the local table. In case the callback is
> invoked before the local table is iterated, a use-after-free can occur.
>
> Fix this by iterating over the FIB tables in reverse order, so that the
> main table is always freed after the local table.
>
> Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
This is really too clever of a fix I think :-)
I would prefer if we fixed things more explicitly.
In struct fib_table you can add a "data_ref" integer. Any pointer
reference created to fib_table->__data increases this counter. It is
always done inside of RTNL locking, so should be doable without
atomics or extra locking.
For a non-aliased fib_table we go:
if (!--fib_table->data_ref)
kfree(fib_table);
And for aliased ones we do something like:
if (fib_table->tb_data != fib_table->__data) {
void *data = fib_table->fb_data;
struct fib_table *alias;
alias = container_of(data, struct fib_table, __data[0]);
if (!--alias->data_ref)
kfree(alias);
kfree(fib_table);
}
Something like that.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox