* Re: [PATCH 0/5] ATM-nicstar: Fine-tuning for three function implementations
From: David Miller @ 2016-09-11 4:46 UTC (permalink / raw)
To: elfring
Cc: linux-atm-general, netdev, 3chas3, linux-kernel, kernel-janitors,
julia.lawall, pbonzini
In-Reply-To: <fe480e1f-285a-bb98-3550-b4fa354f8174@users.sourceforge.net>
From: SF Markus Elfring <elfring@users.sourceforge.net>
Date: Sat, 10 Sep 2016 09:15:37 +0200
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Sat, 10 Sep 2016 09:10:01 +0200
>
> A few update suggestions were taken into account
> from static source code analysis.
Series applied.
* Re: [PATCH net-next] tcp: better use ooo_last_skb in tcp_data_queue_ofo()
From: David Miller @ 2016-09-11 4:43 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, willemb, wygivan, ycheng, ncardwell, ilpo.jarvinen
In-Reply-To: <1473456165.18970.66.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 09 Sep 2016 14:22:45 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> Willem noticed that we could avoid an rbtree lookup if the
> attempt to coalesce incoming skb to the last skb failed
> for some reason.
>
> Since most ooo additions are at the tail, this is definitely
> worth adding a test and fast path.
>
> Suggested-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks.
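
The fast path described in the quoted commit message can be sketched in
user space. This is only an illustration under stated assumptions: a
sorted array stands in for the kernel's rbtree, the names (ooo_queue,
ooo_insert, slow_lookups) are hypothetical, and overlap handling is
left out. The point is the tail check that runs before any search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OOO_MAX 64

struct seg {
	uint32_t seq;      /* first sequence number of the segment */
	uint32_t end_seq;  /* one past the last sequence number */
};

struct ooo_queue {
	struct seg segs[OOO_MAX];  /* sorted by seq; stands in for the rbtree */
	size_t len;
	size_t slow_lookups;       /* counts searches, for illustration only */
};

/* Wrap-safe "a is before b" test on 32-bit sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Returns 0 on success, -1 if the queue is full. */
static int ooo_insert(struct ooo_queue *q, struct seg s)
{
	size_t lo = 0, hi, i;

	assert(q != NULL);
	if (q->len == OOO_MAX)
		return -1;

	/* Fast path: the segment starts at or after the end of the tail. */
	if (q->len == 0 || !seq_before(s.seq, q->segs[q->len - 1].end_seq)) {
		q->segs[q->len++] = s;
		return 0;
	}

	/* Slow path: binary search for the insertion point. */
	q->slow_lookups++;
	hi = q->len;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (seq_before(q->segs[mid].seq, s.seq))
			lo = mid + 1;
		else
			hi = mid;
	}
	for (i = q->len; i > lo; i--)
		q->segs[i] = q->segs[i - 1];
	q->segs[lo] = s;
	q->len++;
	return 0;
}
```

With in-order tail arrivals the search counter stays at zero; only a
segment that lands in the middle of the queue pays for the lookup.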
* Re: [PATCH] openvswitch: use alias for genetlink family names
From: David Miller @ 2016-09-11 4:43 UTC (permalink / raw)
To: cascardo-H+wXaHxf7aLQT0dZR+AlfA
Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1473453750-18854-1-git-send-email-cascardo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
From: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Date: Fri, 9 Sep 2016 17:42:30 -0300
> When userspace tries to create datapaths and the module is not loaded,
> it will simply fail. With this patch, the module will be automatically
> loaded.
>
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Applied to net-next, thanks.
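
For readers unfamiliar with how an alias enables autoloading, here is a
small user-space sketch of the alias string involved. It assumes the
alias follows the "net-pf-<pf>-proto-<proto>-family-<name>" pattern
used for generic netlink families, and that "ovs_datapath" is the
family name in question; the helper name genl_family_alias is
hypothetical and not part of any kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_PF_NETLINK      16  /* numeric value of PF_NETLINK on Linux */
#define SKETCH_NETLINK_GENERIC 16  /* numeric value of NETLINK_GENERIC */

/*
 * Format the module alias that a lookup of an unknown generic netlink
 * family would request. Returns the snprintf() result.
 */
static int genl_family_alias(char *buf, size_t len, const char *family)
{
	assert(buf != NULL && family != NULL);
	return snprintf(buf, len, "net-pf-%d-proto-%d-family-%s",
			SKETCH_PF_NETLINK, SKETCH_NETLINK_GENERIC, family);
}
```

A module that declares a matching alias can then be found by modprobe
when user space asks for the family by name.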
* Re: [PATCH net-next 3/5] liquidio CN23XX: Mailbox support
From: David Miller @ 2016-09-11 4:42 UTC (permalink / raw)
To: rvatsavayi
Cc: netdev, derek.chickles, satananda.burla, felix.manlunas,
raghu.vatsavayi
In-Reply-To: <1473451707-26397-4-git-send-email-rvatsavayi@caviumnetworks.com>
From: Raghu Vatsavayi <rvatsavayi@caviumnetworks.com>
Date: Fri, 9 Sep 2016 13:08:25 -0700
> +int octeon_mbox_read(struct octeon_mbox *mbox)
> +{
> + int ret = 0;
> + union octeon_mbox_message msg;
> +
Please always order local variable declarations from longest
to shortest line.
Please audit your entire submission for this problem.
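
As an illustration of the requested ordering, here is a self-contained
sketch with the longest declaration line first. The union and function
names are hypothetical stand-ins, not the driver's real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's mailbox message union. */
union mbox_message_sketch {
	uint64_t u64;
	struct {
		uint16_t type;
		uint16_t len;
		uint32_t param;
	} s;
};

/* Returns -1 for a message whose length field is zero, 0 otherwise. */
static int mbox_read_sketch(uint64_t raw)
{
	union mbox_message_sketch msg;  /* longest declaration line first */
	int ret = 0;                    /* shortest declaration line last */

	assert(sizeof(msg) == sizeof(uint64_t));
	msg.u64 = raw;
	if (msg.s.len == 0)
		ret = -1;
	return ret;
}
```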
* Re: [PATCH net-next] Revert "hv_netvsc: make inline functions static"
From: David Miller @ 2016-09-11 4:23 UTC (permalink / raw)
To: sthemmin, sthemmin; +Cc: kys, netdev
In-Reply-To: <1473450324-25918-1-git-send-email-sthemmin@exchange.microsoft.com>
From: sthemmin@exchange.microsoft.com
Date: Fri, 9 Sep 2016 12:45:24 -0700
> From: Stephen Hemminger <sthemmin@microsoft.com>
>
> These functions are used by other code in the misc-next tree.
>
> This reverts commit 30d1de08c87ddde6f73936c3350e7e153988fe02.
>
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Applied.
* Re: [PATCH net-next 00/11] Mellanox 100G mlx5 seamless error recovery
From: David Miller @ 2016-09-11 4:22 UTC (permalink / raw)
To: saeedm; +Cc: netdev, tariqt, mohamad, ogerlitz
In-Reply-To: <1473431727-20150-1-git-send-email-saeedm@mellanox.com>
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Fri, 9 Sep 2016 17:35:16 +0300
> This series from Mohamad improves the driver load/unload flows
> to seamlessly handle pci errors and device internal errors recovery
> reset flows.
Series applied, thanks.
* Re: [PATCH 0/8] net: use IS_ENABLED() instead of checking for built-in or module
From: David Miller @ 2016-09-11 4:19 UTC (permalink / raw)
To: javier
Cc: linux-kernel, kuznet, nikolay, jchapman, aduyck, vyasevich, fw,
acme, yoshfuji, daniel, stephen, tom, linux-sctp, a, jhs, herbert,
mugunthanvnm, steffen.klassert, edumazet, felipe.balbi, netdev,
jiri, nhorman, jarod, kaber, pabeni, jmorris
In-Reply-To: <1473425000-9782-1-git-send-email-javier@osg.samsung.com>
From: Javier Martinez Canillas <javier@osg.samsung.com>
Date: Fri, 9 Sep 2016 08:43:12 -0400
> This trivial series replaces the open coding to check for a Kconfig symbol
> being built-in or module, with the IS_ENABLED() macro that does exactly that.
>
> Using the macro makes the code more readable by helping abstract away some
> of the Kconfig built-in and module enable details.
Series applied, thanks.
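
A user-space sketch of what such a macro abstracts away. The SK_*
macros are modelled on the preprocessor trick in the kernel's
include/linux/kconfig.h but are renamed here, and the CONFIG_FOO,
CONFIG_BAR and CONFIG_BAZ symbols are hypothetical.

```c
#include <assert.h>

/* Evaluates to 1 if the symbol is defined to 1, and to 0 otherwise. */
#define SK_ARG_PLACEHOLDER_1 0,
#define sk_take_second_arg(ignored, val, ...) val
#define sk_is_defined(x)      sk_is_defined_(x)
#define sk_is_defined_(val)   sk_is_defined__(SK_ARG_PLACEHOLDER_##val)
#define sk_is_defined__(junk) sk_take_second_arg(junk 1, 0, 0)

#define SK_IS_BUILTIN(opt) sk_is_defined(opt)
#define SK_IS_MODULE(opt)  sk_is_defined(opt##_MODULE)
#define SK_IS_ENABLED(opt) (SK_IS_BUILTIN(opt) || SK_IS_MODULE(opt))

/* Hypothetical configuration: FOO built in, BAR modular, BAZ disabled. */
#define CONFIG_FOO 1
#define CONFIG_BAR_MODULE 1

/* Bit 0: FOO enabled, bit 1: BAR enabled, bit 2: BAZ enabled. */
static int enabled_summary(void)
{
	int mask = 0;

	/* Open-coded form that the series replaces. */
#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)
	mask |= 1;
#endif
	/* Equivalent check through the macro, usable in #if ... */
#if SK_IS_ENABLED(CONFIG_BAR)
	mask |= 2;
#endif
	/* ... and in ordinary C expressions. */
	if (SK_IS_ENABLED(CONFIG_BAZ))
		mask |= 4;

	assert(SK_IS_ENABLED(CONFIG_FOO));
	return mask;
}
```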
* Re: [PATCH v2 0/3] Optimisation of fs_enet ethernet driver
From: David Miller @ 2016-09-11 4:17 UTC (permalink / raw)
To: christophe.leroy
Cc: pantelis.antoniou, vbordug, linux-kernel, linuxppc-dev, netdev
In-Reply-To: <cover.1473422536.git.christophe.leroy@c-s.fr>
From: Christophe Leroy <christophe.leroy@c-s.fr>
Date: Fri, 9 Sep 2016 14:26:18 +0200 (CEST)
> This set optimises the freescale fs_enet ethernet driver:
> 1/ Merge of RX and TX NAPI functions in order to limit the amount of
> interrupts
> 2/ Do not unmap DMA when packet length is below copybreak, otherwise there
> is no benefit in copying the skb instead of allocating a new one
> 3/ Make copybreak value configurable as the optimised value is not the
> same on all targets
Series applied, thanks.
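
Point 2/ of the quoted cover letter can be illustrated with a
user-space sketch of a copybreak receive path. Everything here is an
assumption made for illustration: heap buffers stand in for DMA-mapped
buffers, the "remaps" counter stands in for the unmap/map cost, and the
names rx_slot and rx_one are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct rx_slot {
	unsigned char *buf;  /* buffer currently "mapped" in the ring */
	size_t buf_size;
	unsigned remaps;     /* times the slot needed a fresh mapping */
};

/*
 * Returns a buffer owned by the caller that holds the packet, or NULL
 * if an allocation failed (the packet is then dropped).
 */
static unsigned char *rx_one(struct rx_slot *slot, size_t len,
			     size_t copybreak)
{
	unsigned char *pkt;

	assert(len <= slot->buf_size);

	if (len < copybreak) {
		/* Small packet: copy it out and keep the ring buffer mapped. */
		pkt = malloc(len ? len : 1);
		if (pkt)
			memcpy(pkt, slot->buf, len);
		return pkt;
	}

	/* Large packet: hand the buffer up and map a new one into the ring. */
	pkt = slot->buf;
	slot->buf = malloc(slot->buf_size);
	if (!slot->buf) {
		slot->buf = pkt;  /* keep the ring populated, drop the packet */
		return NULL;
	}
	slot->remaps++;
	return pkt;
}
```

The copy only pays off if the small-packet branch leaves the mapping
alone, which is the reasoning given in point 2/.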
* Re: [PATCH v2] net: ethernet: renesas: sh_eth: add POST registers for rz
From: David Miller @ 2016-09-11 4:12 UTC (permalink / raw)
To: chris.brandt
Cc: sergei.shtylyov, horms+renesas, geert+renesas, daniel, netdev,
linux-renesas-soc
In-Reply-To: <20160907185709.24150-1-chris.brandt@renesas.com>
From: Chris Brandt <chris.brandt@renesas.com>
Date: Wed, 7 Sep 2016 14:57:09 -0400
> Due to a mistake in the hardware manual, the FWSLC and POST1-4 registers
> were not documented and left out of the driver for RZ/A making the CAM
> feature non-operational.
> Additionally, when the offset values for POST1-4 are left blank, the driver
> attempts to set them using an offset of 0xFFFF which can cause a memory
> corruption or panic.
>
> This patch fixes the panic and properly enables CAM.
>
> Reported-by: Daniel Palmer <daniel@0x0f.com>
> Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Applied, thanks.
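
The failure mode in the quoted description (writing through an offset
of 0xFFFF) suggests guarding register accesses with a sentinel check.
The sketch below is a user-space illustration only: the sentinel name,
the register enum and the simulated register window are hypothetical
and are not taken from the sh_eth driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REG_OFFSET_INVALID 0xFFFFu  /* hypothetical "register absent" marker */
#define MMIO_WORDS 64

enum { REG_FWSLC, REG_POST1, REG_POST2, REG_MAX };

struct regmap_sketch {
	uint16_t offset[REG_MAX];   /* per-SoC byte offsets */
	uint32_t mmio[MMIO_WORDS];  /* simulated register window */
};

static void regmap_init(struct regmap_sketch *m)
{
	/* Mark every register absent until a SoC table fills it in. */
	memset(m->offset, 0xFF, sizeof(m->offset));
	memset(m->mmio, 0, sizeof(m->mmio));
}

/* Returns 0 on success, -1 if the register is absent or out of range. */
static int reg_write(struct regmap_sketch *m, int reg, uint32_t val)
{
	uint16_t off;

	assert(reg >= 0 && reg < REG_MAX);
	off = m->offset[reg];
	if (off == REG_OFFSET_INVALID || off / 4 >= MMIO_WORDS)
		return -1;
	m->mmio[off / 4] = val;
	return 0;
}
```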
* Re: [PATCH net-next V7 0/4] net/sched: ip tunnel metadata set/release/classify by using TC
From: David Miller @ 2016-09-11 4:06 UTC (permalink / raw)
To: hadarh
Cc: netdev, jiri, jbenc, jhs, shmulik.ladkani, tom, edumazet,
xiyou.wangcong, amirva, ogerlitz
In-Reply-To: <1473341028-29368-1-git-send-email-hadarh@mellanox.com>
From: Hadar Hen Zion <hadarh@mellanox.com>
Date: Thu, 8 Sep 2016 16:23:44 +0300
> This patchset introduces ip tunnel manipulation support using the TC subsystem.
Series applied, thanks.
* Re: [PATCH v6 2/8] thunderbolt: Updating the register definitions
From: Andreas Noever @ 2016-09-11 0:02 UTC (permalink / raw)
To: Amir Levy
Cc: Greg KH, Bjorn Helgaas, Jonathan Corbet,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, netdev,
linux-doc, thunderbolt-linux, Westerberg, Mika, Winkler, Tomas
In-Reply-To: <1470054233-4309-3-git-send-email-amir.jer.levy@intel.com>
On Mon, Aug 1, 2016 at 2:23 PM, Amir Levy <amir.jer.levy@intel.com> wrote:
> Adding more Thunderbolt(TM) register definitions
> and some helper macros.
Thinking about this again I would prefer it if you would put your
definitions into a separate file under icm/ (even if there is some
duplication). The style (bitfields vs. genmask) is different between
the drivers and for a reader it is difficult to find out what is
actually supposed to be used by the two drivers (ring_desc vs
tbt_buf_desc or the ring RING_INT_EN/DISABLE macros in the header file
vs. ring_interrupt_active in nhi.c).
This would also completely separate the two drivers.
Andreas
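
To make the "genmask" style mentioned above concrete, here is a
user-space sketch of packing and reading a descriptor attribute word
with shift/mask pairs and an explicit little-endian load. The field
positions follow the quoted DESC_ATTR_* definitions; SK_BIT, SK_GENMASK
and the helper functions are local stand-ins written for this example,
not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define SK_BIT(n)        (1u << (n))
#define SK_GENMASK(h, l) ((~0u << (l)) & (~0u >> (31 - (h))))

#define ATTR_LEN_SHIFT 0
#define ATTR_LEN_MASK  SK_GENMASK(11, ATTR_LEN_SHIFT)
#define ATTR_EOF_SHIFT 12
#define ATTR_EOF_MASK  SK_GENMASK(15, ATTR_EOF_SHIFT)
#define ATTR_SOF_SHIFT 16
#define ATTR_SOF_MASK  SK_GENMASK(19, ATTR_SOF_SHIFT)
#define ATTR_DESC_DONE SK_BIT(21)

/* Build an attribute word in CPU byte order. */
static uint32_t attr_pack(uint32_t len, uint32_t eof, uint32_t sof)
{
	assert(len <= (ATTR_LEN_MASK >> ATTR_LEN_SHIFT));
	return ((len << ATTR_LEN_SHIFT) & ATTR_LEN_MASK) |
	       ((eof << ATTR_EOF_SHIFT) & ATTR_EOF_MASK) |
	       ((sof << ATTR_SOF_SHIFT) & ATTR_SOF_MASK);
}

/* Extract the length field from an attribute word. */
static uint32_t attr_len(uint32_t attr)
{
	return (attr & ATTR_LEN_MASK) >> ATTR_LEN_SHIFT;
}

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Unlike C bitfields, whose layout is left to the compiler, the masks pin
each field to fixed bit positions, and the byte-wise load makes the
byte order explicit.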
> Signed-off-by: Amir Levy <amir.jer.levy@intel.com>
> ---
> drivers/thunderbolt/nhi_regs.h | 109 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 109 insertions(+)
>
> diff --git a/drivers/thunderbolt/nhi_regs.h b/drivers/thunderbolt/nhi_regs.h
> index 75cf069..b8e961f 100644
> --- a/drivers/thunderbolt/nhi_regs.h
> +++ b/drivers/thunderbolt/nhi_regs.h
> @@ -9,6 +9,11 @@
>
> #include <linux/types.h>
>
> +#define NHI_MMIO_BAR 0
> +
> +#define TBT_RING_MIN_NUM_BUFFERS 2
> +#define TBT_RING_MAX_FRAME_SIZE (4 * 1024)
> +
> enum ring_flags {
> RING_FLAG_ISOCH_ENABLE = 1 << 27, /* TX only? */
> RING_FLAG_E2E_FLOW_CONTROL = 1 << 28,
> @@ -39,6 +44,33 @@ struct ring_desc {
> u32 time; /* write zero */
> } __packed;
>
> +/**
> + * struct tbt_buf_desc - TX/RX ring buffer descriptor.
> + * This is the same as struct ring_desc, but without the use of bitfields and
> + * with explicit endianness.
> + */
> +struct tbt_buf_desc {
> + __le64 phys;
> + __le32 attributes;
> + __le32 time;
> +};
> +
> +#define DESC_ATTR_LEN_SHIFT 0
> +#define DESC_ATTR_LEN_MASK GENMASK(11, DESC_ATTR_LEN_SHIFT)
> +#define DESC_ATTR_EOF_SHIFT 12
> +#define DESC_ATTR_EOF_MASK GENMASK(15, DESC_ATTR_EOF_SHIFT)
> +#define DESC_ATTR_SOF_SHIFT 16
> +#define DESC_ATTR_SOF_MASK GENMASK(19, DESC_ATTR_SOF_SHIFT)
> +#define DESC_ATTR_TX_ISOCH_DMA_EN BIT(20) /* TX */
> +#define DESC_ATTR_RX_CRC_ERR BIT(20) /* RX after use */
> +#define DESC_ATTR_DESC_DONE BIT(21)
> +#define DESC_ATTR_REQ_STS BIT(22) /* TX and RX before use */
> +#define DESC_ATTR_RX_BUF_OVRN_ERR BIT(22) /* RX after use */
> +#define DESC_ATTR_INT_EN BIT(23)
> +#define DESC_ATTR_OFFSET_SHIFT 24
> +#define DESC_ATTR_OFFSET_MASK GENMASK(31, DESC_ATTR_OFFSET_SHIFT)
> +
> +
> /* NHI registers in bar 0 */
>
> /*
> @@ -60,6 +92,30 @@ struct ring_desc {
> */
> #define REG_RX_RING_BASE 0x08000
>
> +#define REG_RING_STEP 16
> +#define REG_RING_PHYS_LO_OFFSET 0
> +#define REG_RING_PHYS_HI_OFFSET 4
> +#define REG_RING_CONS_PROD_OFFSET 8 /* cons - RO, prod - RW */
> +#define REG_RING_CONS_SHIFT 0
> +#define REG_RING_CONS_MASK GENMASK(15, REG_RING_CONS_SHIFT)
> +#define REG_RING_PROD_SHIFT 16
> +#define REG_RING_PROD_MASK GENMASK(31, REG_RING_PROD_SHIFT)
> +#define REG_RING_SIZE_OFFSET 12
> +#define REG_RING_SIZE_SHIFT 0
> +#define REG_RING_SIZE_MASK GENMASK(15, REG_RING_SIZE_SHIFT)
> +#define REG_RING_BUF_SIZE_SHIFT 16
> +#define REG_RING_BUF_SIZE_MASK GENMASK(27, REG_RING_BUF_SIZE_SHIFT)
> +
> +#define TBT_RING_CONS_PROD_REG(iobase, ringbase, ringnumber) \
> + ((iobase) + (ringbase) + \
> + ((ringnumber) * REG_RING_STEP) + \
> + REG_RING_CONS_PROD_OFFSET)
> +
> +#define TBT_REG_RING_PROD_EXTRACT(val) (((val) & REG_RING_PROD_MASK) >> \
> + REG_RING_PROD_SHIFT)
> +
> +#define TBT_REG_RING_CONS_EXTRACT(val) (((val) & REG_RING_CONS_MASK) >> \
> + REG_RING_CONS_SHIFT)
> /*
> * 32 bytes per entry, one entry for every hop (REG_HOP_COUNT)
> * 00: enum_ring_flags
> @@ -77,6 +133,19 @@ struct ring_desc {
> * ..: unknown
> */
> #define REG_RX_OPTIONS_BASE 0x29800
> +#define REG_RX_OPTS_TX_E2E_HOP_ID_SHIFT 12
> +#define REG_RX_OPTS_TX_E2E_HOP_ID_MASK \
> + GENMASK(22, REG_RX_OPTS_TX_E2E_HOP_ID_SHIFT)
> +#define REG_RX_OPTS_MASK_OFFSET 4
> +#define REG_RX_OPTS_MASK_EOF_SHIFT 0
> +#define REG_RX_OPTS_MASK_EOF_MASK GENMASK(15, REG_RX_OPTS_MASK_EOF_SHIFT)
> +#define REG_RX_OPTS_MASK_SOF_SHIFT 16
> +#define REG_RX_OPTS_MASK_SOF_MASK GENMASK(31, REG_RX_OPTS_MASK_SOF_SHIFT)
> +
> +#define REG_OPTS_STEP 32
> +#define REG_OPTS_E2E_EN BIT(28)
> +#define REG_OPTS_RAW BIT(30)
> +#define REG_OPTS_VALID BIT(31)
>
> /*
> * three bitfields: tx, rx, rx overflow
> @@ -86,6 +155,7 @@ struct ring_desc {
> */
> #define REG_RING_NOTIFY_BASE 0x37800
> #define RING_NOTIFY_REG_COUNT(nhi) ((31 + 3 * nhi->hop_count) / 32)
> +#define REG_RING_NOTIFY_STEP 4
>
> /*
> * two bitfields: rx, tx
> @@ -94,8 +164,47 @@ struct ring_desc {
> */
> #define REG_RING_INTERRUPT_BASE 0x38200
> #define RING_INTERRUPT_REG_COUNT(nhi) ((31 + 2 * nhi->hop_count) / 32)
> +#define REG_RING_INT_TX_PROCESSED(ring_num) BIT(ring_num)
> +#define REG_RING_INT_RX_PROCESSED(ring_num, num_paths) BIT((ring_num) + \
> + (num_paths))
> +#define RING_INT_DISABLE(base, val) iowrite32( \
> + ioread32((base) + REG_RING_INTERRUPT_BASE) & ~(val), \
> + (base) + REG_RING_INTERRUPT_BASE)
> +#define RING_INT_ENABLE(base, val) iowrite32( \
> + ioread32((base) + REG_RING_INTERRUPT_BASE) | (val), \
> + (base) + REG_RING_INTERRUPT_BASE)
> +#define RING_INT_DISABLE_TX(base, ring_num) \
> + RING_INT_DISABLE(base, REG_RING_INT_TX_PROCESSED(ring_num))
> +#define RING_INT_DISABLE_RX(base, ring_num, num_paths) \
> + RING_INT_DISABLE(base, REG_RING_INT_RX_PROCESSED(ring_num, num_paths))
> +#define RING_INT_ENABLE_TX(base, ring_num) \
> + RING_INT_ENABLE(base, REG_RING_INT_TX_PROCESSED(ring_num))
> +#define RING_INT_ENABLE_RX(base, ring_num, num_paths) \
> + RING_INT_ENABLE(base, REG_RING_INT_RX_PROCESSED(ring_num, num_paths))
> +#define RING_INT_DISABLE_TX_RX(base, ring_num, num_paths) \
> + RING_INT_DISABLE(base, REG_RING_INT_TX_PROCESSED(ring_num) | \
> + REG_RING_INT_RX_PROCESSED(ring_num, num_paths))
> +
> +#define REG_RING_INTERRUPT_STEP 4
> +
> +#define REG_INT_THROTTLING_RATE 0x38c00
> +#define REG_INT_THROTTLING_RATE_STEP 4
> +#define NUM_INT_VECTORS 16
> +
> +#define REG_INT_VEC_ALLOC_BASE 0x38c40
> +#define REG_INT_VEC_ALLOC_STEP 4
> +#define REG_INT_VEC_ALLOC_FIELD_BITS 4
> +#define REG_INT_VEC_ALLOC_FIELD_MASK (BIT(REG_INT_VEC_ALLOC_FIELD_BITS) - 1)
> +#define REG_INT_VEC_ALLOC_PER_REG ((BITS_PER_BYTE * sizeof(u32)) / \
> + REG_INT_VEC_ALLOC_FIELD_BITS)
>
> /* The last 11 bits contain the number of hops supported by the NHI port. */
> #define REG_HOP_COUNT 0x39640
> +#define REG_HOP_COUNT_TOTAL_PATHS_MASK GENMASK(10, 0)
> +
> +#define REG_HOST_INTERFACE_RST 0x39858
> +
> +#define REG_DMA_MISC 0x39864
> +#define REG_DMA_MISC_INT_AUTO_CLEAR BIT(2)
>
> #endif
> --
> 2.7.4
>
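
The ring register arithmetic in the quoted patch can be checked with a
small user-space sketch. The constants are copied from the quoted
definitions; the helper functions are hypothetical, and the interrupt
helper assumes that ring number plus path count stays below 32, since
the quoted RING_INT_* macros only touch the first interrupt register.

```c
#include <assert.h>
#include <stdint.h>

#define REG_RX_RING_BASE          0x08000
#define REG_RING_STEP             16
#define REG_RING_CONS_PROD_OFFSET 8
#define REG_RING_CONS_SHIFT       0
#define REG_RING_CONS_MASK        0x0000FFFFu
#define REG_RING_PROD_SHIFT       16
#define REG_RING_PROD_MASK        0xFFFF0000u

/* Byte offset of the consumer/producer register of a given ring. */
static uint32_t ring_cons_prod_offset(uint32_t ringbase, unsigned ring)
{
	return ringbase + ring * REG_RING_STEP + REG_RING_CONS_PROD_OFFSET;
}

/* Producer index: upper 16 bits of the register value. */
static uint16_t ring_prod(uint32_t val)
{
	return (uint16_t)((val & REG_RING_PROD_MASK) >> REG_RING_PROD_SHIFT);
}

/* Consumer index: lower 16 bits of the register value. */
static uint16_t ring_cons(uint32_t val)
{
	return (uint16_t)((val & REG_RING_CONS_MASK) >> REG_RING_CONS_SHIFT);
}

/* RX interrupt bits follow the TX bits, one bit per path. */
static uint32_t ring_int_rx_bit(unsigned ring, unsigned num_paths)
{
	assert(ring + num_paths < 32);
	return 1u << (ring + num_paths);
}
```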
* Re: iptables, conntrack, and the raw table vs. L3DSR
From: Quentin Barnes @ 2016-09-10 23:49 UTC (permalink / raw)
To: netdev
In-Reply-To: <20160827215538.GA36125@gmail.com>
Being unsuccessful at getting help so far, can anyone think of who or
where else I could ask for help on this issue?
On Sat, Aug 27, 2016 at 4:55 PM, Quentin Barnes <qbarnes@gmail.com> wrote:
> Several years ago, I wrote an iptables module that rewrites packets'
> destination addresses based on the value in the DSCP field to
> implement Layer 3 Direct Server Return (L3DSR). The main code of
> the iptables target module you can find here:
> https://github.com/yahoo/l3dsr/blob/master/linux/kmod-xt/xt_DADDR.c
>
> The iptable-daddr module has been in production since I wrote it
> with some limitations. One of those limitations is that it doesn't
> work well with conntrack modules. I believe that's because the daddr
> rewriting confuses conntrack: changing a packet's daddr changes its
> 4-tuple, which prevents conntrack from tracking the connection.
>
> Someone recently suggested I change the module from the "mangle"
> table to "raw", so it can be put in the prerouting chain ahead of
> conntrack. That would let conntrack see the packet after its daddr
> update. This approach seems to work fine in a test case letting
> L3DSR and conntrack apparently work in concert, but has me concerned
> that there might be unforeseen negative side-effects from using the
> raw table for doing mangling work.
>
> Can anyone think of any issues with having a mangle target module be
> invoked from the raw table?
>
> Or as an alternative if necessary, is there a possible/rational way
> to leave the module in the mangle table and then inform conntrack
> about the packet's daddr alteration?
>
> Quentin
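
A user-space sketch of the rewrite described in the quoted message, to
show why the ordering relative to conntrack matters. It is not the
xt_DADDR code: the table, the struct and the function names are
hypothetical, and a real module would also have to fix up the IP
checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct flow4 {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Hypothetical DSCP -> destination table; 0 means "leave unchanged". */
static uint32_t dscp_to_vip[64];

/* DSCP is the upper six bits of the IPv4 TOS byte. */
static uint8_t tos_to_dscp(uint8_t tos)
{
	return tos >> 2;
}

/* Returns 1 if the destination address was rewritten, 0 otherwise. */
static int l3dsr_rewrite(struct flow4 *f, uint8_t tos)
{
	uint32_t vip;

	assert(f != NULL);
	vip = dscp_to_vip[tos_to_dscp(tos)];
	if (!vip || vip == f->daddr)
		return 0;
	f->daddr = vip;
	return 1;
}

static int flow_equal(const struct flow4 *a, const struct flow4 *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}
```

A tracker that recorded the tuple before the rewrite would no longer
match the packet afterwards, which is consistent with the suggestion to
run the rewrite ahead of conntrack.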
* Re: [PATCH v6 1/8] thunderbolt: Macro rename
From: Andreas Noever @ 2016-09-10 23:20 UTC (permalink / raw)
To: Amir Levy
Cc: Greg KH, Bjorn Helgaas, corbet, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org, netdev, linux-doc, thunderbolt-linux,
Westerberg, Mika, Winkler, Tomas
In-Reply-To: <1470054233-4309-2-git-send-email-amir.jer.levy@intel.com>
On Mon, Aug 1, 2016 at 2:23 PM, Amir Levy <amir.jer.levy@intel.com> wrote:
> This first patch updates the registers file to
> reflect that it isn't only for Cactus Ridge.
> No functional change intended.
>
> Signed-off-by: Amir Levy <amir.jer.levy@intel.com>
> ---
> drivers/thunderbolt/nhi_regs.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/thunderbolt/nhi_regs.h b/drivers/thunderbolt/nhi_regs.h
> index 86b996c..75cf069 100644
> --- a/drivers/thunderbolt/nhi_regs.h
> +++ b/drivers/thunderbolt/nhi_regs.h
> @@ -1,11 +1,11 @@
> /*
> - * Thunderbolt Cactus Ridge driver - NHI registers
> + * Thunderbolt driver - NHI registers
> *
> * Copyright (c) 2014 Andreas Noever <andreas.noever@gmail.com>
> */
>
> -#ifndef DSL3510_REGS_H_
> -#define DSL3510_REGS_H_
> +#ifndef NHI_REGS_H_
> +#define NHI_REGS_H_
>
> #include <linux/types.h>
>
> --
> 2.7.4
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
* Re: [PATCH v6 0/8] thunderbolt: Introducing Thunderbolt(TM) networking
From: Andreas Noever @ 2016-09-10 23:19 UTC (permalink / raw)
To: Greg KH
Cc: Amir Levy, Bjorn Helgaas, corbet, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org, netdev, linux-doc, thunderbolt-linux,
Westerberg, Mika, Winkler, Tomas
In-Reply-To: <20160831112813.GB17607@kroah.com>
On Wed, Aug 31, 2016 at 1:28 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Mon, Aug 01, 2016 at 03:23:45PM +0300, Amir Levy wrote:
>> This is version 6 of Thunderbolt(TM) driver for non-Apple hardware.
>>
>> Changes since v5:
>> - Removed the padding of short packets in receive
>> - Replaced RW semaphore with mutex
>> - Cleanup
>>
>> These patches were pushed to GitHub where they can be reviewed more
>> comfortably with green/red highlighting:
>> https://github.com/01org/thunderbolt-software-kernel-tree
>>
>> Daemon code:
>> https://github.com/01org/thunderbolt-software-daemon
>>
>> For reference, here's a link to version 5:
>> [v5]: https://lkml.org/lkml/2016/7/28/85
>
> Without acks from the thunderbolt maintainer, or any network driver
> developers, I'm not going to take this series. Please work to get that
> review.
Sorry for the late response. I was away for the last two weeks and
will be away again for the next week.
This driver is independent from mine. It uses an interface provided by
the firmware which is not present on Apple hardware and with which I
am not familiar (also it does networking, not pci with which I am also
not familiar). So I cannot comment on the driver itself. I don't mind
a second driver, if that is what you are asking.
Andreas
* Re: [PATCH v2] net: ip, diag -- Add diag interface for raw sockets
From: David Ahern @ 2016-09-10 22:28 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: NETDEV, LKML, David S. Miller, Eric Dumazet, Alexey Kuznetsov,
James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrey Vagin,
Stephen Hemminger
In-Reply-To: <20160910220528.GB2001@uranus.lan>
On 9/10/16 4:05 PM, Cyrill Gorcunov wrote:
> On Sat, Sep 10, 2016 at 10:31:35AM -0600, David Ahern wrote:
>>
>> Would you mind adding the destroy capability as well? The udp version
>> should be close to what is needed for raw sockets. See udp_diag_destroy
>> and udp_abort.
>
> Should be something like below. I haven't tested it yet, so it is for review only.
> I will test it on Monday.
doesn't compile:
- raw_abort needs to be in a header for ipv6, and
- inet_sk_diag_fill args have changed due to a recent commit
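
The destroy capability discussed in this thread depends on holding a
reference to the socket while acting on it. Below is a user-space
sketch of that pattern using C11 atomics. It is an analogy only: struct
obj and its helpers are hypothetical, and in this sketch the abort
helper takes and drops the reference itself, which is not how the
quoted patch splits the work between lookup and abort.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_int refcnt;
	int err;
};

/* Take a reference only if the object is still alive (count > 0). */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure the CAS reloads 'old', so the loop retries. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	return old == 1;
}

/* Record an error on the object while holding a reference to it. */
static int obj_abort(struct obj *o, int err)
{
	if (!obj_get_unless_zero(o))
		return -1;  /* object is already being torn down */
	o->err = err;
	obj_put(o);
	return 0;
}
```

Every successful get needs exactly one matching put on every exit path,
including error paths.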
* [PATCH v2] net: ip, diag -- Add diag interface for raw sockets
From: Cyrill Gorcunov @ 2016-09-10 22:05 UTC (permalink / raw)
To: David Ahern
Cc: NETDEV, LKML, David S. Miller, Eric Dumazet, Alexey Kuznetsov,
James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrey Vagin,
Stephen Hemminger
In-Reply-To: <de68cd54-849e-93d6-b291-83e8be010f9a@cumulusnetworks.com>
On Sat, Sep 10, 2016 at 10:31:35AM -0600, David Ahern wrote:
>
> Would you mind adding the destroy capability as well? The udp version
> should be close to what is needed for raw sockets. See udp_diag_destroy
> and udp_abort.
Should be something like below. I haven't tested it yet, so it is for review only.
I will test it on Monday.
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: [PATCH v2] net: ip, diag -- Add diag interface for raw sockets
In criu we actively use the diag interface to collect the sockets
present in the system when dumping applications. While it works as
expected for unix, tcp, udp[lite], packet and netlink sockets,
raw sockets have no diag interface. Thus add one.
v2:
- add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
- implement @destroy for diag requests (by dsa@)
CC: David S. Miller <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Ahern <dsa@cumulusnetworks.com>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Andrey Vagin <avagin@openvz.org>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
Please take a look once time permits. Hopefully I didn't
miss anything obvious. Tested with "ss -n -A raw" on a modified
iproute2 instance and with c/r of a trivial application that has
raw sockets open. A patch for the ss tool is at https://goo.gl/VFQ93L
for reference; I will send it out afterwards.
include/net/raw.h | 5 +
include/net/rawv6.h | 5 +
net/ipv4/Kconfig | 8 +
net/ipv4/Makefile | 1
net/ipv4/raw.c | 22 ++++-
net/ipv4/raw_diag.c | 219 ++++++++++++++++++++++++++++++++++++++++++++++++++++
net/ipv6/raw.c | 7 +
7 files changed, 263 insertions(+), 4 deletions(-)
Index: linux-ml.git/include/net/raw.h
===================================================================
--- linux-ml.git.orig/include/net/raw.h
+++ linux-ml.git/include/net/raw.h
@@ -23,6 +23,11 @@
extern struct proto raw_prot;
+extern struct raw_hashinfo raw_v4_hashinfo;
+struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
+ unsigned short num, __be32 raddr,
+ __be32 laddr, int dif);
+
void raw_icmp_error(struct sk_buff *, int, u32);
int raw_local_deliver(struct sk_buff *, int);
Index: linux-ml.git/include/net/rawv6.h
===================================================================
--- linux-ml.git.orig/include/net/rawv6.h
+++ linux-ml.git/include/net/rawv6.h
@@ -3,6 +3,11 @@
#include <net/protocol.h>
+extern struct raw_hashinfo raw_v6_hashinfo;
+struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
+ unsigned short num, const struct in6_addr *loc_addr,
+ const struct in6_addr *rmt_addr, int dif);
+
void raw6_icmp_error(struct sk_buff *, int nexthdr,
u8 type, u8 code, int inner_offset, __be32);
bool raw6_local_deliver(struct sk_buff *, int);
Index: linux-ml.git/net/ipv4/Kconfig
===================================================================
--- linux-ml.git.orig/net/ipv4/Kconfig
+++ linux-ml.git/net/ipv4/Kconfig
@@ -430,6 +430,14 @@ config INET_UDP_DIAG
Support for UDP socket monitoring interface used by the ss tool.
If unsure, say Y.
+config INET_RAW_DIAG
+ tristate "RAW: socket monitoring interface"
+ depends on INET_DIAG && (IPV6 || IPV6=n)
+ default n
+ ---help---
+ Support for RAW socket monitoring interface used by the ss tool.
+ If unsure, say Y.
+
config INET_DIAG_DESTROY
bool "INET: allow privileged process to administratively close sockets"
depends on INET_DIAG
Index: linux-ml.git/net/ipv4/Makefile
===================================================================
--- linux-ml.git.orig/net/ipv4/Makefile
+++ linux-ml.git/net/ipv4/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_NETFILTER) += netfilter.o n
obj-$(CONFIG_INET_DIAG) += inet_diag.o
obj-$(CONFIG_INET_TCP_DIAG) += tcp_diag.o
obj-$(CONFIG_INET_UDP_DIAG) += udp_diag.o
+obj-$(CONFIG_INET_RAW_DIAG) += raw_diag.o
obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o
obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o
Index: linux-ml.git/net/ipv4/raw.c
===================================================================
--- linux-ml.git.orig/net/ipv4/raw.c
+++ linux-ml.git/net/ipv4/raw.c
@@ -89,9 +89,10 @@ struct raw_frag_vec {
int hlen;
};
-static struct raw_hashinfo raw_v4_hashinfo = {
+struct raw_hashinfo raw_v4_hashinfo = {
.lock = __RW_LOCK_UNLOCKED(raw_v4_hashinfo.lock),
};
+EXPORT_SYMBOL_GPL(raw_v4_hashinfo);
int raw_hash_sk(struct sock *sk)
{
@@ -120,7 +121,7 @@ void raw_unhash_sk(struct sock *sk)
}
EXPORT_SYMBOL_GPL(raw_unhash_sk);
-static struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
+struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
unsigned short num, __be32 raddr, __be32 laddr, int dif)
{
sk_for_each_from(sk) {
@@ -136,6 +137,7 @@ static struct sock *__raw_v4_lookup(stru
found:
return sk;
}
+EXPORT_SYMBOL_GPL(__raw_v4_lookup);
/*
* 0 - deliver
@@ -918,6 +920,21 @@ static int compat_raw_ioctl(struct sock
}
#endif
+int raw_abort(struct sock *sk, int err)
+{
+ lock_sock(sk);
+
+ sk->sk_err = err;
+ sk->sk_error_report(sk);
+ udp_disconnect(sk, 0);
+
+ release_sock(sk);
+
+ sock_put(sk);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(raw_abort);
+
struct proto raw_prot = {
.name = "RAW",
.owner = THIS_MODULE,
@@ -943,6 +960,7 @@ struct proto raw_prot = {
.compat_getsockopt = compat_raw_getsockopt,
.compat_ioctl = compat_raw_ioctl,
#endif
+ .diag_destroy = raw_abort,
};
#ifdef CONFIG_PROC_FS
Index: linux-ml.git/net/ipv4/raw_diag.c
===================================================================
--- /dev/null
+++ linux-ml.git/net/ipv4/raw_diag.c
@@ -0,0 +1,219 @@
+#include <linux/module.h>
+
+#include <linux/inet_diag.h>
+#include <linux/sock_diag.h>
+
+#include <net/raw.h>
+#include <net/rawv6.h>
+
+#ifdef pr_fmt
+# undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+static struct raw_hashinfo *
+raw_get_hashinfo(const struct inet_diag_req_v2 *r)
+{
+ if (r->sdiag_family == AF_INET) {
+ return &raw_v4_hashinfo;
+#if IS_ENABLED(CONFIG_IPV6)
+ } else if (r->sdiag_family == AF_INET6) {
+ return &raw_v6_hashinfo;
+#endif
+ } else {
+ pr_warn_once("Unexpected inet family %d\n",
+ r->sdiag_family);
+ WARN_ON_ONCE(1);
+ return ERR_PTR(-EINVAL);
+ }
+}
+
+static struct sock *raw_lookup(struct net *net, struct sock *from,
+ const struct inet_diag_req_v2 *r)
+{
+ struct sock *sk = NULL;
+
+ if (r->sdiag_family == AF_INET)
+ sk = __raw_v4_lookup(net, from, r->sdiag_protocol,
+ r->id.idiag_dst[0],
+ r->id.idiag_src[0],
+ r->id.idiag_if);
+#if IS_ENABLED(CONFIG_IPV6)
+ else
+ sk = __raw_v6_lookup(net, from, r->sdiag_protocol,
+ (const struct in6_addr *)r->id.idiag_src,
+ (const struct in6_addr *)r->id.idiag_dst,
+ r->id.idiag_if);
+#endif
+ return sk;
+}
+
+static struct sock *raw_sock_get(struct net *net, const struct inet_diag_req_v2 *r)
+{
+ struct raw_hashinfo *hashinfo = raw_get_hashinfo(r);
+ struct sock *sk = NULL, *s;
+ int slot;
+
+ if (IS_ERR(hashinfo))
+ return ERR_CAST(hashinfo);
+
+ read_lock(&hashinfo->lock);
+ for (slot = 0; !sk && slot < RAW_HTABLE_SIZE; slot++) {
+ sk_for_each(s, &hashinfo->ht[slot]) {
+ sk = raw_lookup(net, s, r);
+ if (sk)
+ break;
+ }
+ }
+ if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+ sk = NULL;
+ read_unlock(&hashinfo->lock);
+
+ return sk ? sk : ERR_PTR(-ENOENT);
+}
+
+static int raw_diag_dump_one(struct sk_buff *in_skb,
+ const struct nlmsghdr *nlh,
+ const struct inet_diag_req_v2 *r)
+{
+ struct net *net = sock_net(in_skb->sk);
+ struct sk_buff *rep;
+ struct sock *sk;
+ int err;
+
+ sk = raw_sock_get(net, r);
+ if (IS_ERR(sk))
+ return PTR_ERR(sk);
+
+ rep = nlmsg_new(sizeof(struct inet_diag_msg) +
+ sizeof(struct inet_diag_meminfo) + 64,
+ GFP_KERNEL);
+ if (!rep) {
+ sock_put(sk);
+ return -ENOMEM;
+ }
+
+ err = inet_sk_diag_fill(sk, NULL, rep, r,
+ sk_user_ns(NETLINK_CB(in_skb).sk),
+ NETLINK_CB(in_skb).portid,
+ nlh->nlmsg_seq, 0, nlh);
+ sock_put(sk);
+
+ if (err < 0) {
+ kfree_skb(rep);
+ return err;
+ }
+
+ err = netlink_unicast(net->diag_nlsk, rep,
+ NETLINK_CB(in_skb).portid,
+ MSG_DONTWAIT);
+ if (err > 0)
+ err = 0;
+ return err;
+}
+
+static int sk_diag_dump(struct sock *sk, struct sk_buff *skb,
+ struct netlink_callback *cb,
+ const struct inet_diag_req_v2 *r,
+ struct nlattr *bc)
+{
+ if (!inet_diag_bc_sk(bc, sk))
+ return 0;
+
+ return inet_sk_diag_fill(sk, NULL, skb, r,
+ sk_user_ns(NETLINK_CB(cb->skb).sk),
+ NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, NLM_F_MULTI, cb->nlh);
+}
+
+static void raw_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
+ const struct inet_diag_req_v2 *r, struct nlattr *bc)
+{
+ struct raw_hashinfo *hashinfo = raw_get_hashinfo(r);
+ struct net *net = sock_net(skb->sk);
+ int num, s_num, slot, s_slot;
+ struct sock *sk = NULL;
+
+ if (IS_ERR(hashinfo))
+ return;
+
+ s_slot = cb->args[0];
+ num = s_num = cb->args[1];
+
+ read_lock(&hashinfo->lock);
+ for (slot = s_slot; slot < RAW_HTABLE_SIZE; s_num = 0, slot++) {
+ num = 0;
+
+ sk_for_each(sk, &hashinfo->ht[slot]) {
+ struct inet_sock *inet = inet_sk(sk);
+
+ if (!net_eq(sock_net(sk), net))
+ continue;
+ if (num < s_num)
+ goto next;
+ if (sk->sk_family != r->sdiag_family)
+ goto next;
+ if (r->id.idiag_sport != inet->inet_sport &&
+ r->id.idiag_sport)
+ goto next;
+ if (r->id.idiag_dport != inet->inet_dport &&
+ r->id.idiag_dport)
+ goto next;
+ if (sk_diag_dump(sk, skb, cb, r, bc) < 0)
+ goto out_unlock;
+next:
+ num++;
+ }
+ }
+
+out_unlock:
+ read_unlock(&hashinfo->lock);
+
+ cb->args[0] = slot;
+ cb->args[1] = num;
+}
+
+static void raw_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
+ void *info)
+{
+ r->idiag_rqueue = sk_rmem_alloc_get(sk);
+ r->idiag_wqueue = sk_wmem_alloc_get(sk);
+}
+
+static int raw_diag_destroy(struct sk_buff *in_skb,
+ const struct inet_diag_req_v2 *r)
+{
+ struct net *net = sock_net(in_skb->sk);
+ struct sock *sk;
+
+ sk = raw_sock_get(net, r);
+ if (IS_ERR(sk))
+ return PTR_ERR(sk);
+ return sock_diag_destroy(sk, ECONNABORTED);
+}
+
+static const struct inet_diag_handler raw_diag_handler = {
+ .dump = raw_diag_dump,
+ .dump_one = raw_diag_dump_one,
+ .idiag_get_info = raw_diag_get_info,
+ .idiag_type = IPPROTO_RAW,
+ .idiag_info_size = 0,
+ .destroy = raw_diag_destroy,
+};
+
+static int __init raw_diag_init(void)
+{
+ return inet_diag_register(&raw_diag_handler);
+}
+
+static void __exit raw_diag_exit(void)
+{
+ inet_diag_unregister(&raw_diag_handler);
+}
+
+module_init(raw_diag_init);
+module_exit(raw_diag_exit);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_NETLINK, NETLINK_SOCK_DIAG, 2-255 /* AF_INET - IPPROTO_RAW */);
+MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_NETLINK, NETLINK_SOCK_DIAG, 10-255 /* AF_INET6 - IPPROTO_RAW */);
Index: linux-ml.git/net/ipv6/raw.c
===================================================================
--- linux-ml.git.orig/net/ipv6/raw.c
+++ linux-ml.git/net/ipv6/raw.c
@@ -65,11 +65,12 @@
#define ICMPV6_HDRLEN 4 /* ICMPv6 header, RFC 4443 Section 2.1 */
-static struct raw_hashinfo raw_v6_hashinfo = {
+struct raw_hashinfo raw_v6_hashinfo = {
.lock = __RW_LOCK_UNLOCKED(raw_v6_hashinfo.lock),
};
+EXPORT_SYMBOL_GPL(raw_v6_hashinfo);
-static struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
+struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
unsigned short num, const struct in6_addr *loc_addr,
const struct in6_addr *rmt_addr, int dif)
{
@@ -102,6 +103,7 @@ static struct sock *__raw_v6_lookup(stru
found:
return sk;
}
+EXPORT_SYMBOL_GPL(__raw_v6_lookup);
/*
* 0 - deliver
@@ -1252,6 +1254,7 @@ struct proto rawv6_prot = {
.compat_getsockopt = compat_rawv6_getsockopt,
.compat_ioctl = compat_rawv6_ioctl,
#endif
+ .diag_destroy = raw_abort,
};
#ifdef CONFIG_PROC_FS
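
The dump routine in this patch keeps a (slot, num) cursor in cb->args
so that a dump can resume after the reply buffer fills up. Here is a
user-space sketch of that resumable iteration over a fixed two-level
table. The table layout, the cursor struct and the "room" limit are
hypothetical simplifications of the netlink dump machinery.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS   4
#define PER_SLOT 3

struct cursor {
	int slot;  /* plays the role of cb->args[0] */
	int num;   /* plays the role of cb->args[1] */
};

/*
 * Copy up to 'room' non-zero entries into 'out', starting where the
 * cursor left off. Returns the number of entries written.
 */
static int dump_some(int table[NSLOTS][PER_SLOT], struct cursor *c,
		     int *out, int room)
{
	int slot, num = c->num, s_num = c->num, n = 0;

	assert(room >= 0);
	for (slot = c->slot; slot < NSLOTS; s_num = 0, slot++) {
		int i;

		num = 0;
		for (i = 0; i < PER_SLOT; i++) {
			if (!table[slot][i])
				continue;      /* empty position, not counted */
			if (num < s_num)
				goto next;     /* already dumped last time */
			if (n == room)
				goto out;      /* buffer full, resume here */
			out[n++] = table[slot][i];
next:
			num++;
		}
	}
out:
	c->slot = slot;
	c->num = num;
	return n;
}
```

The skip count applies only to the slot the previous call stopped in;
it is reset to zero for every later slot.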
* [PATCH net-next] net: dsa: bcm_sf2: Get VLAN_PORT_MASK from b53_device
From: Florian Fainelli @ 2016-09-10 19:39 UTC (permalink / raw)
To: netdev; +Cc: davem, vivien.didelot, andrew, Florian Fainelli
While migrating the bcm_sf2 driver to use b53_common, we left a small
piece untouched where we kept our local copy of the per-port
port_vlan_ctl bitmask value. This value is now maintained by b53_device
so we need to use it instead of our local (and now stale) copy of it.
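The read-modify-write touched by this fix can be modelled in userspace roughly as follows. This is only a sketch: model_port_vlan_ctl() and MODEL_PORT_VLAN_CTRL_MASK are hypothetical names, and the mask width is assumed for illustration rather than taken from the register header. The point is that whatever is passed as vlan_ctl_mask is OR-ed straight into the register, so a stale copy silently programs the wrong port membership.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field width, for illustration only. */
#define MODEL_PORT_VLAN_CTRL_MASK 0x1ffu

/* Hypothetical model of the update done in bcm_sf2_port_setup():
 * clear the port-membership field, add the port itself, then add
 * the per-port mask maintained elsewhere.
 */
static uint32_t model_port_vlan_ctl(uint32_t reg, unsigned int port,
				    uint16_t vlan_ctl_mask)
{
	reg &= ~MODEL_PORT_VLAN_CTRL_MASK;
	reg |= 1u << port;
	reg |= vlan_ctl_mask;
	return reg;
}
```

For example, assert(model_port_vlan_ctl(0, 0, 0) == 1) holds, since only the port's own bit is set when the supplied mask is empty.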
Fixes: f458995b9ad8 ("net: dsa: bcm_sf2: Utilize core B53 driver when possible")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/dsa/bcm_sf2.c | 2 +-
drivers/net/dsa/bcm_sf2.h | 2 --
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 51f1fc0dddc5..5bf4f3452676 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -256,7 +256,7 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int port,
reg = core_readl(priv, CORE_PORT_VLAN_CTL_PORT(port));
reg &= ~PORT_VLAN_CTRL_MASK;
reg |= (1 << port);
- reg |= priv->port_sts[port].vlan_ctl_mask;
+ reg |= priv->dev->ports[port].vlan_ctl_mask;
core_writel(priv, reg, CORE_PORT_VLAN_CTL_PORT(port));
bcm_sf2_imp_vlan_setup(ds, cpu_port);
diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h
index 46c4ea796574..44692673e1d5 100644
--- a/drivers/net/dsa/bcm_sf2.h
+++ b/drivers/net/dsa/bcm_sf2.h
@@ -50,8 +50,6 @@ struct bcm_sf2_port_status {
unsigned int link;
struct ethtool_eee eee;
-
- u16 vlan_ctl_mask;
};
struct bcm_sf2_priv {
--
2.7.4
* Re: [PATCH] net: ip, diag -- Add diag interface for raw sockets
From: Cyrill Gorcunov @ 2016-09-10 19:31 UTC (permalink / raw)
To: David Ahern
Cc: NETDEV, LKML, David S. Miller, Eric Dumazet, Alexey Kuznetsov,
James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrey Vagin,
Stephen Hemminger
In-Reply-To: <de68cd54-849e-93d6-b291-83e8be010f9a@cumulusnetworks.com>
On Sat, Sep 10, 2016 at 10:31:35AM -0600, David Ahern wrote:
> On 9/9/16 12:26 PM, Cyrill Gorcunov wrote:
> > In criu we are actively using diag interface to collect sockets
> > present in the system when dumping applications. And while for
> > unix, tcp, udp[lite], packet, netlink it works as expected,
> > the raw sockets do not have one. Thus add it.
> >
> > CC: David S. Miller <davem@davemloft.net>
> > CC: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > CC: James Morris <jmorris@namei.org>
> > CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > CC: Patrick McHardy <kaber@trash.net>
> > CC: Andrey Vagin <avagin@openvz.org>
> > CC: Stephen Hemminger <stephen@networkplumber.org>
> > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> > ---
>
> Would you mind adding the destroy capability as well? The udp version should be close to
> what is needed for raw sockets. See udp_diag_destroy and udp_abort.
Sure, will do.
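For reference, the shape of the UDP abort helper mentioned above can be modelled in userspace roughly like this. It is a sketch under stated assumptions, not kernel code: struct model_sock and the model_* functions are hypothetical stand-ins, and the locking done by the kernel helper is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified socket; the kernel's struct sock is far larger. */
struct model_sock {
	int err;	/* models sk->sk_err */
	int connected;	/* models an established peer */
	int reports;	/* counts error notifications delivered */
};

static void model_error_report(struct model_sock *sk)
{
	sk->reports++;
}

static void model_disconnect(struct model_sock *sk)
{
	sk->connected = 0;
}

/* Same ordering as the UDP helper: record the error, notify waiters,
 * then disconnect. The kernel version brackets these steps with
 * lock_sock()/release_sock().
 */
static int model_abort(struct model_sock *sk, int err)
{
	sk->err = err;
	model_error_report(sk);
	model_disconnect(sk);
	return 0;
}
```

A raw-socket variant would presumably keep the same three steps and differ only in which disconnect routine it calls.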
* [PATCH net-next 11/11] net: flow: Remove FLOWI_FLAG_L3MDEV_SRC flag
From: David Ahern @ 2016-09-10 19:10 UTC (permalink / raw)
To: netdev; +Cc: shm, David Ahern
In-Reply-To: <1473534602-23602-1-git-send-email-dsa@cumulusnetworks.com>
No longer used
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
drivers/net/vrf.c | 5 ++---
include/net/flow.h | 3 +--
2 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index ccce59fbb2b3..55674b0e65b7 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -165,7 +165,7 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
.flowlabel = ip6_flowinfo(iph),
.flowi6_mark = skb->mark,
.flowi6_proto = iph->nexthdr,
- .flowi6_flags = FLOWI_FLAG_L3MDEV_SRC | FLOWI_FLAG_SKIP_NH_OIF,
+ .flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF,
};
int ret = NET_XMIT_DROP;
struct dst_entry *dst;
@@ -265,8 +265,7 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
.flowi4_oif = vrf_dev->ifindex,
.flowi4_iif = LOOPBACK_IFINDEX,
.flowi4_tos = RT_TOS(ip4h->tos),
- .flowi4_flags = FLOWI_FLAG_ANYSRC | FLOWI_FLAG_L3MDEV_SRC |
- FLOWI_FLAG_SKIP_NH_OIF,
+ .flowi4_flags = FLOWI_FLAG_ANYSRC | FLOWI_FLAG_SKIP_NH_OIF,
.daddr = ip4h->daddr,
};
struct net *net = dev_net(vrf_dev);
diff --git a/include/net/flow.h b/include/net/flow.h
index d47ef4bb5423..035aa7716967 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -34,8 +34,7 @@ struct flowi_common {
__u8 flowic_flags;
#define FLOWI_FLAG_ANYSRC 0x01
#define FLOWI_FLAG_KNOWN_NH 0x02
-#define FLOWI_FLAG_L3MDEV_SRC 0x04
-#define FLOWI_FLAG_SKIP_NH_OIF 0x08
+#define FLOWI_FLAG_SKIP_NH_OIF 0x04
__u32 flowic_secid;
struct flowi_tunnel flowic_tun_key;
};
--
2.1.4
* [PATCH net-next 10/11] net: l3mdev: remove get_rtable method
From: David Ahern @ 2016-09-10 19:10 UTC (permalink / raw)
To: netdev; +Cc: shm, David Ahern
In-Reply-To: <1473534602-23602-1-git-send-email-dsa@cumulusnetworks.com>
No longer used
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
drivers/net/vrf.c | 21 ---------------------
include/net/l3mdev.h | 21 ---------------------
2 files changed, 42 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 3a34f547c578..ccce59fbb2b3 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -843,26 +843,6 @@ static u32 vrf_fib_table(const struct net_device *dev)
return vrf->tb_id;
}
-static struct rtable *vrf_get_rtable(const struct net_device *dev,
- const struct flowi4 *fl4)
-{
- struct rtable *rth = NULL;
-
- if (!(fl4->flowi4_flags & FLOWI_FLAG_L3MDEV_SRC)) {
- struct net_vrf *vrf = netdev_priv(dev);
-
- rcu_read_lock();
-
- rth = rcu_dereference(vrf->rth);
- if (likely(rth))
- dst_hold(&rth->dst);
-
- rcu_read_unlock();
- }
-
- return rth;
-}
-
static int vrf_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
return 0;
@@ -1088,7 +1068,6 @@ static struct dst_entry *vrf_link_scope_lookup(const struct net_device *dev,
static const struct l3mdev_ops vrf_l3mdev_ops = {
.l3mdev_fib_table = vrf_fib_table,
- .l3mdev_get_rtable = vrf_get_rtable,
.l3mdev_l3_rcv = vrf_l3_rcv,
.l3mdev_l3_out = vrf_l3_out,
#if IS_ENABLED(CONFIG_IPV6)
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index a586035c97cb..3832099289c5 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -23,8 +23,6 @@
*
* @l3mdev_l3_out: Hook in L3 output path
*
- * @l3mdev_get_rtable: Get cached IPv4 rtable (dst_entry) for device
- *
* @l3mdev_link_scope_lookup: IPv6 lookup for linklocal and mcast destinations
*/
@@ -36,10 +34,6 @@ struct l3mdev_ops {
struct sock *sk, struct sk_buff *skb,
u16 proto);
- /* IPv4 ops */
- struct rtable * (*l3mdev_get_rtable)(const struct net_device *dev,
- const struct flowi4 *fl4);
-
/* IPv6 ops */
struct dst_entry * (*l3mdev_link_scope_lookup)(const struct net_device *dev,
struct flowi6 *fl6);
@@ -120,15 +114,6 @@ static inline u32 l3mdev_fib_table(const struct net_device *dev)
return tb_id;
}
-static inline struct rtable *l3mdev_get_rtable(const struct net_device *dev,
- const struct flowi4 *fl4)
-{
- if (netif_is_l3_master(dev) && dev->l3mdev_ops->l3mdev_get_rtable)
- return dev->l3mdev_ops->l3mdev_get_rtable(dev, fl4);
-
- return NULL;
-}
-
static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
{
struct net_device *dev;
@@ -241,12 +226,6 @@ static inline u32 l3mdev_fib_table_by_index(struct net *net, int ifindex)
return 0;
}
-static inline struct rtable *l3mdev_get_rtable(const struct net_device *dev,
- const struct flowi4 *fl4)
-{
- return NULL;
-}
-
static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
{
return false;
--
2.1.4
* [PATCH net-next 08/11] net: ipv6: Remove l3mdev_get_saddr6
From: David Ahern @ 2016-09-10 19:09 UTC (permalink / raw)
To: netdev; +Cc: shm, David Ahern
In-Reply-To: <1473534602-23602-1-git-send-email-dsa@cumulusnetworks.com>
No longer needed
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
drivers/net/vrf.c | 41 -----------------------------------------
include/net/l3mdev.h | 11 -----------
net/ipv6/ip6_output.c | 9 +--------
net/l3mdev/l3mdev.c | 24 ------------------------
4 files changed, 1 insertion(+), 84 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 9ad2a169485f..3a34f547c578 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1084,46 +1084,6 @@ static struct dst_entry *vrf_link_scope_lookup(const struct net_device *dev,
return dst;
}
-
-/* called under rcu_read_lock */
-static int vrf_get_saddr6(struct net_device *dev, const struct sock *sk,
- struct flowi6 *fl6)
-{
- struct net *net = dev_net(dev);
- struct dst_entry *dst;
- struct rt6_info *rt;
- int err;
-
- if (rt6_need_strict(&fl6->daddr)) {
- rt = vrf_ip6_route_lookup(net, dev, fl6, fl6->flowi6_oif,
- RT6_LOOKUP_F_IFACE);
- if (unlikely(!rt))
- return 0;
-
- dst = &rt->dst;
- } else {
- __u8 flags = fl6->flowi6_flags;
-
- fl6->flowi6_flags |= FLOWI_FLAG_L3MDEV_SRC;
- fl6->flowi6_flags |= FLOWI_FLAG_SKIP_NH_OIF;
-
- dst = ip6_route_output(net, sk, fl6);
- rt = (struct rt6_info *)dst;
-
- fl6->flowi6_flags = flags;
- }
-
- err = dst->error;
- if (!err) {
- err = ip6_route_get_saddr(net, rt, &fl6->daddr,
- sk ? inet6_sk(sk)->srcprefs : 0,
- &fl6->saddr);
- }
-
- dst_release(dst);
-
- return err;
-}
#endif
static const struct l3mdev_ops vrf_l3mdev_ops = {
@@ -1133,7 +1093,6 @@ static const struct l3mdev_ops vrf_l3mdev_ops = {
.l3mdev_l3_out = vrf_l3_out,
#if IS_ENABLED(CONFIG_IPV6)
.l3mdev_link_scope_lookup = vrf_link_scope_lookup,
- .l3mdev_get_saddr6 = vrf_get_saddr6,
#endif
};
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 1129e1d8cd6e..a5e506eb51de 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -43,9 +43,6 @@ struct l3mdev_ops {
/* IPv6 ops */
struct dst_entry * (*l3mdev_link_scope_lookup)(const struct net_device *dev,
struct flowi6 *fl6);
- int (*l3mdev_get_saddr6)(struct net_device *dev,
- const struct sock *sk,
- struct flowi6 *fl6);
};
#ifdef CONFIG_NET_L3_MASTER_DEV
@@ -172,8 +169,6 @@ static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
}
struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6);
-int l3mdev_get_saddr6(struct net *net, const struct sock *sk,
- struct flowi6 *fl6);
static inline
struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
@@ -292,12 +287,6 @@ struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6)
return NULL;
}
-static inline int l3mdev_get_saddr6(struct net *net, const struct sock *sk,
- struct flowi6 *fl6)
-{
- return 0;
-}
-
static inline
struct sk_buff *l3mdev_ip_rcv(struct sk_buff *skb)
{
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 1cb41b365048..6001e781164e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -926,13 +926,6 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
int err;
int flags = 0;
- if (ipv6_addr_any(&fl6->saddr) && fl6->flowi6_oif &&
- (!*dst || !(*dst)->error)) {
- err = l3mdev_get_saddr6(net, sk, fl6);
- if (err)
- goto out_err;
- }
-
/* The correct way to handle this would be to do
* ip6_route_get_saddr, and then ip6_route_output; however,
* the route-specific preferred source forces the
@@ -1024,7 +1017,7 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
out_err_release:
dst_release(*dst);
*dst = NULL;
-out_err:
+
if (err == -ENETUNREACH)
IP6_INC_STATS(net, NULL, IPSTATS_MIB_OUTNOROUTES);
return err;
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index be40df60703c..8da86ceca33d 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -130,30 +130,6 @@ struct dst_entry *l3mdev_link_scope_lookup(struct net *net,
}
EXPORT_SYMBOL_GPL(l3mdev_link_scope_lookup);
-int l3mdev_get_saddr6(struct net *net, const struct sock *sk,
- struct flowi6 *fl6)
-{
- struct net_device *dev;
- int rc = 0;
-
- if (fl6->flowi6_oif) {
- rcu_read_lock();
-
- dev = dev_get_by_index_rcu(net, fl6->flowi6_oif);
- if (dev && netif_is_l3_slave(dev))
- dev = netdev_master_upper_dev_get_rcu(dev);
-
- if (dev && netif_is_l3_master(dev) &&
- dev->l3mdev_ops->l3mdev_get_saddr6)
- rc = dev->l3mdev_ops->l3mdev_get_saddr6(dev, sk, fl6);
-
- rcu_read_unlock();
- }
-
- return rc;
-}
-EXPORT_SYMBOL_GPL(l3mdev_get_saddr6);
-
/**
* l3mdev_fib_rule_match - Determine if flowi references an
* L3 master device
--
2.1.4
* [PATCH net-next 09/11] net: l3mdev: Remove l3mdev_fib_oif
From: David Ahern @ 2016-09-10 19:10 UTC (permalink / raw)
To: netdev; +Cc: shm, David Ahern
In-Reply-To: <1473534602-23602-1-git-send-email-dsa@cumulusnetworks.com>
No longer used
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
include/net/l3mdev.h | 29 -----------------------------
1 file changed, 29 deletions(-)
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index a5e506eb51de..a586035c97cb 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -107,26 +107,6 @@ struct net_device *l3mdev_master_dev_rcu(const struct net_device *_dev)
return master;
}
-/* get index of an interface to use for FIB lookups. For devices
- * enslaved to an L3 master device FIB lookups are based on the
- * master index
- */
-static inline int l3mdev_fib_oif_rcu(struct net_device *dev)
-{
- return l3mdev_master_ifindex_rcu(dev) ? : dev->ifindex;
-}
-
-static inline int l3mdev_fib_oif(struct net_device *dev)
-{
- int oif;
-
- rcu_read_lock();
- oif = l3mdev_fib_oif_rcu(dev);
- rcu_read_unlock();
-
- return oif;
-}
-
u32 l3mdev_fib_table_rcu(const struct net_device *dev);
u32 l3mdev_fib_table_by_index(struct net *net, int ifindex);
static inline u32 l3mdev_fib_table(const struct net_device *dev)
@@ -248,15 +228,6 @@ struct net_device *l3mdev_master_dev_rcu(const struct net_device *dev)
return NULL;
}
-static inline int l3mdev_fib_oif_rcu(struct net_device *dev)
-{
- return dev ? dev->ifindex : 0;
-}
-static inline int l3mdev_fib_oif(struct net_device *dev)
-{
- return dev ? dev->ifindex : 0;
-}
-
static inline u32 l3mdev_fib_table_rcu(const struct net_device *dev)
{
return 0;
--
2.1.4
* [PATCH net-next 07/11] net: ipv4: Remove l3mdev_get_saddr
From: David Ahern @ 2016-09-10 19:09 UTC (permalink / raw)
To: netdev; +Cc: shm, David Ahern
In-Reply-To: <1473534602-23602-1-git-send-email-dsa@cumulusnetworks.com>
No longer needed
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
drivers/net/vrf.c | 38 --------------------------------------
include/net/l3mdev.h | 12 ------------
include/net/route.h | 10 ----------
net/ipv4/raw.c | 6 ------
net/ipv4/udp.c | 6 ------
net/l3mdev/l3mdev.c | 31 -------------------------------
6 files changed, 103 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index f5372edf6edc..9ad2a169485f 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -863,43 +863,6 @@ static struct rtable *vrf_get_rtable(const struct net_device *dev,
return rth;
}
-/* called under rcu_read_lock */
-static int vrf_get_saddr(struct net_device *dev, struct flowi4 *fl4)
-{
- struct fib_result res = { .tclassid = 0 };
- struct net *net = dev_net(dev);
- u32 orig_tos = fl4->flowi4_tos;
- u8 flags = fl4->flowi4_flags;
- u8 scope = fl4->flowi4_scope;
- u8 tos = RT_FL_TOS(fl4);
- int rc;
-
- if (unlikely(!fl4->daddr))
- return 0;
-
- fl4->flowi4_flags |= FLOWI_FLAG_SKIP_NH_OIF;
- fl4->flowi4_iif = LOOPBACK_IFINDEX;
- /* make sure oif is set to VRF device for lookup */
- fl4->flowi4_oif = dev->ifindex;
- fl4->flowi4_tos = tos & IPTOS_RT_MASK;
- fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
- RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
-
- rc = fib_lookup(net, fl4, &res, 0);
- if (!rc) {
- if (res.type == RTN_LOCAL)
- fl4->saddr = res.fi->fib_prefsrc ? : fl4->daddr;
- else
- fib_select_path(net, &res, fl4, -1);
- }
-
- fl4->flowi4_flags = flags;
- fl4->flowi4_tos = orig_tos;
- fl4->flowi4_scope = scope;
-
- return rc;
-}
-
static int vrf_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
return 0;
@@ -1166,7 +1129,6 @@ static int vrf_get_saddr6(struct net_device *dev, const struct sock *sk,
static const struct l3mdev_ops vrf_l3mdev_ops = {
.l3mdev_fib_table = vrf_fib_table,
.l3mdev_get_rtable = vrf_get_rtable,
- .l3mdev_get_saddr = vrf_get_saddr,
.l3mdev_l3_rcv = vrf_l3_rcv,
.l3mdev_l3_out = vrf_l3_out,
#if IS_ENABLED(CONFIG_IPV6)
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 51aab20a4d0a..1129e1d8cd6e 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -25,8 +25,6 @@
*
* @l3mdev_get_rtable: Get cached IPv4 rtable (dst_entry) for device
*
- * @l3mdev_get_saddr: Get source address for a flow
- *
* @l3mdev_link_scope_lookup: IPv6 lookup for linklocal and mcast destinations
*/
@@ -41,8 +39,6 @@ struct l3mdev_ops {
/* IPv4 ops */
struct rtable * (*l3mdev_get_rtable)(const struct net_device *dev,
const struct flowi4 *fl4);
- int (*l3mdev_get_saddr)(struct net_device *dev,
- struct flowi4 *fl4);
/* IPv6 ops */
struct dst_entry * (*l3mdev_link_scope_lookup)(const struct net_device *dev,
@@ -175,8 +171,6 @@ static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
return rc;
}
-int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4);
-
struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6);
int l3mdev_get_saddr6(struct net *net, const struct sock *sk,
struct flowi6 *fl6);
@@ -292,12 +286,6 @@ static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
return false;
}
-static inline int l3mdev_get_saddr(struct net *net, int ifindex,
- struct flowi4 *fl4)
-{
- return 0;
-}
-
static inline
struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6)
{
diff --git a/include/net/route.h b/include/net/route.h
index ad777d79af94..0429d47cad25 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -29,7 +29,6 @@
#include <net/flow.h>
#include <net/inet_sock.h>
#include <net/ip_fib.h>
-#include <net/l3mdev.h>
#include <linux/in_route.h>
#include <linux/rtnetlink.h>
#include <linux/rcupdate.h>
@@ -285,15 +284,6 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
ip_route_connect_init(fl4, dst, src, tos, oif, protocol,
sport, dport, sk);
- if (!src && oif) {
- int rc;
-
- rc = l3mdev_get_saddr(net, oif, fl4);
- if (rc < 0)
- return ERR_PTR(rc);
-
- src = fl4->saddr;
- }
if (!dst || !src) {
rt = __ip_route_output_key(net, fl4);
if (IS_ERR(rt))
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 438f50c1a676..90a85c955872 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -606,12 +606,6 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
(inet->hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
daddr, saddr, 0, 0);
- if (!saddr && ipc.oif) {
- err = l3mdev_get_saddr(net, ipc.oif, &fl4);
- if (err < 0)
- goto done;
- }
-
if (!inet->hdrincl) {
rfv.msg = msg;
rfv.hlen = 0;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 058c31286ce1..7d96dc2d3d08 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1021,12 +1021,6 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
flow_flags,
faddr, saddr, dport, inet->inet_sport);
- if (!saddr && ipc.oif) {
- err = l3mdev_get_saddr(net, ipc.oif, fl4);
- if (err < 0)
- goto out;
- }
-
security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
rt = ip_route_output_flow(net, fl4, sk);
if (IS_ERR(rt)) {
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index ac9d928d0a9e..be40df60703c 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -130,37 +130,6 @@ struct dst_entry *l3mdev_link_scope_lookup(struct net *net,
}
EXPORT_SYMBOL_GPL(l3mdev_link_scope_lookup);
-/**
- * l3mdev_get_saddr - get source address for a flow based on an interface
- * enslaved to an L3 master device
- * @net: network namespace for device index lookup
- * @ifindex: Interface index
- * @fl4: IPv4 flow struct
- */
-
-int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4)
-{
- struct net_device *dev;
- int rc = 0;
-
- if (ifindex) {
- rcu_read_lock();
-
- dev = dev_get_by_index_rcu(net, ifindex);
- if (dev && netif_is_l3_slave(dev))
- dev = netdev_master_upper_dev_get_rcu(dev);
-
- if (dev && netif_is_l3_master(dev) &&
- dev->l3mdev_ops->l3mdev_get_saddr)
- rc = dev->l3mdev_ops->l3mdev_get_saddr(dev, fl4);
-
- rcu_read_unlock();
- }
-
- return rc;
-}
-EXPORT_SYMBOL_GPL(l3mdev_get_saddr);
-
int l3mdev_get_saddr6(struct net *net, const struct sock *sk,
struct flowi6 *fl6)
{
--
2.1.4
* [PATCH net-next 06/11] net: l3mdev: remove redundant calls
From: David Ahern @ 2016-09-10 19:09 UTC (permalink / raw)
To: netdev; +Cc: shm, David Ahern
In-Reply-To: <1473534602-23602-1-git-send-email-dsa@cumulusnetworks.com>
A previous patch added the l3mdev flow update, making these hooks
redundant. Remove them.
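As a small illustration of the simplified fallback used for the reply paths, the output interface selection reduces to "use the bound device if set, otherwise the ingress interface". The helper name below is hypothetical, and the sketch spells the conditional out in portable C; the kernel hunks use the GNU shorthand `a ? : b`, which behaves the same but evaluates `a` only once.

```c
#include <assert.h>

/* Hypothetical helper: prefer the socket's bound device index, else
 * fall back to the index of the interface the packet arrived on.
 * Equivalent to the GNU C form `bound_dev_if ? : skb_iif`.
 */
static int model_pick_oif(int bound_dev_if, int skb_iif)
{
	return bound_dev_if ? bound_dev_if : skb_iif;
}
```

With this, model_pick_oif(0, 7) yields 7 and model_pick_oif(3, 7) yields 3; no check for an L3 master device is needed any more.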
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
net/ipv4/ip_output.c | 3 +--
net/ipv4/route.c | 12 ++----------
net/ipv4/xfrm4_policy.c | 2 +-
net/ipv6/ip6_output.c | 2 --
net/ipv6/ndisc.c | 11 ++---------
net/ipv6/route.c | 7 +------
net/ipv6/tcp_ipv6.c | 8 ++------
net/ipv6/xfrm6_policy.c | 2 +-
8 files changed, 10 insertions(+), 37 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 4f37cbaa57b2..b43f094b2f47 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1582,8 +1582,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
}
oif = arg->bound_dev_if;
- if (!oif && netif_index_is_l3_master(net, skb->skb_iif))
- oif = skb->skb_iif;
+ oif = oif ? : skb->skb_iif;
flowi4_init_output(&fl4, oif,
IP4_REPLY_MARK(net, skb->mark),
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index ad83f85fb240..b52496fd5107 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1831,7 +1831,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
* Now we are ready to route packet.
*/
fl4.flowi4_oif = 0;
- fl4.flowi4_iif = l3mdev_fib_oif_rcu(dev);
+ fl4.flowi4_iif = dev->ifindex;
fl4.flowi4_mark = skb->mark;
fl4.flowi4_tos = tos;
fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
@@ -2150,7 +2150,6 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
unsigned int flags = 0;
struct fib_result res;
struct rtable *rth;
- int master_idx;
int orig_oif;
int err = -ENETUNREACH;
@@ -2160,9 +2159,6 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
orig_oif = fl4->flowi4_oif;
- master_idx = l3mdev_master_ifindex_by_index(net, fl4->flowi4_oif);
- if (master_idx)
- fl4->flowi4_oif = master_idx;
fl4->flowi4_iif = LOOPBACK_IFINDEX;
fl4->flowi4_tos = tos & IPTOS_RT_MASK;
fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
@@ -2263,8 +2259,7 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
if (err) {
res.fi = NULL;
res.table = NULL;
- if (fl4->flowi4_oif &&
- !netif_index_is_l3_master(net, fl4->flowi4_oif)) {
+ if (fl4->flowi4_oif) {
/* Apparently, routing tables are wrong. Assume,
that the destination is on link.
@@ -2577,9 +2572,6 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
fl4.flowi4_oif = tb[RTA_OIF] ? nla_get_u32(tb[RTA_OIF]) : 0;
fl4.flowi4_mark = mark;
- if (netif_index_is_l3_master(net, fl4.flowi4_oif))
- fl4.flowi4_flags = FLOWI_FLAG_L3MDEV_SRC | FLOWI_FLAG_SKIP_NH_OIF;
-
if (iif) {
struct net_device *dev;
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index b644a23c3db0..3155ed73d3b3 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -112,7 +112,7 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse)
int oif = 0;
if (skb_dst(skb))
- oif = l3mdev_fib_oif(skb_dst(skb)->dev);
+ oif = skb_dst(skb)->dev->ifindex;
memset(fl4, 0, sizeof(struct flowi4));
fl4->flowi4_mark = skb->mark;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6ea6caace3a8..1cb41b365048 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1070,8 +1070,6 @@ struct dst_entry *ip6_dst_lookup_flow(const struct sock *sk, struct flowi6 *fl6,
return ERR_PTR(err);
if (final_dst)
fl6->daddr = *final_dst;
- if (!fl6->flowi6_oif)
- fl6->flowi6_oif = l3mdev_fib_oif(dst->dev);
return xfrm_lookup_route(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 0);
}
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index fe65cdc28a45..d8e671457d10 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -67,7 +67,6 @@
#include <net/flow.h>
#include <net/ip6_checksum.h>
#include <net/inet_common.h>
-#include <net/l3mdev.h>
#include <linux/proc_fs.h>
#include <linux/netfilter.h>
@@ -457,11 +456,9 @@ static void ndisc_send_skb(struct sk_buff *skb,
if (!dst) {
struct flowi6 fl6;
- int oif = l3mdev_fib_oif(skb->dev);
+ int oif = skb->dev->ifindex;
icmpv6_flow_init(sk, &fl6, type, saddr, daddr, oif);
- if (oif != skb->dev->ifindex)
- fl6.flowi6_flags |= FLOWI_FLAG_L3MDEV_SRC;
dst = icmp6_dst_alloc(skb->dev, &fl6);
if (IS_ERR(dst)) {
kfree_skb(skb);
@@ -1538,7 +1535,6 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
int rd_len;
u8 ha_buf[MAX_ADDR_LEN], *ha = NULL,
ops_data_buf[NDISC_OPS_REDIRECT_DATA_SPACE], *ops_data = NULL;
- int oif = l3mdev_fib_oif(dev);
bool ret;
if (ipv6_get_lladdr(dev, &saddr_buf, IFA_F_TENTATIVE)) {
@@ -1555,10 +1551,7 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
}
icmpv6_flow_init(sk, &fl6, NDISC_REDIRECT,
- &saddr_buf, &ipv6_hdr(skb)->saddr, oif);
-
- if (oif != skb->dev->ifindex)
- fl6.flowi6_flags |= FLOWI_FLAG_L3MDEV_SRC;
+ &saddr_buf, &ipv6_hdr(skb)->saddr, dev->ifindex);
dst = ip6_route_output(net, NULL, &fl6);
if (dst->error) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 87e0a01ce744..ad4a7ff301fc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1164,7 +1164,7 @@ void ip6_route_input(struct sk_buff *skb)
int flags = RT6_LOOKUP_F_HAS_SADDR;
struct ip_tunnel_info *tun_info;
struct flowi6 fl6 = {
- .flowi6_iif = l3mdev_fib_oif(skb->dev),
+ .flowi6_iif = skb->dev->ifindex,
.daddr = iph->daddr,
.saddr = iph->saddr,
.flowlabel = ip6_flowinfo(iph),
@@ -3349,11 +3349,6 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
} else {
fl6.flowi6_oif = oif;
- if (netif_index_is_l3_master(net, oif)) {
- fl6.flowi6_flags = FLOWI_FLAG_L3MDEV_SRC |
- FLOWI_FLAG_SKIP_NH_OIF;
- }
-
rt = (struct rt6_info *)ip6_route_output(net, NULL, &fl6);
}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 04529a3d42cb..54cf7197c7ab 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -818,12 +818,8 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
fl6.flowi6_proto = IPPROTO_TCP;
if (rt6_need_strict(&fl6.daddr) && !oif)
fl6.flowi6_oif = tcp_v6_iif(skb);
- else {
- if (!oif && netif_index_is_l3_master(net, skb->skb_iif))
- oif = skb->skb_iif;
-
- fl6.flowi6_oif = oif;
- }
+ else
+ fl6.flowi6_oif = oif ? : skb->skb_iif;
fl6.flowi6_mark = IP6_REPLY_MARK(net, skb->mark);
fl6.fl6_dport = t1->dest;
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 6cc97003e4a9..b7b7e863a2bb 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -134,7 +134,7 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
nexthdr = nh[nhoff];
if (skb_dst(skb))
- oif = l3mdev_fib_oif(skb_dst(skb)->dev);
+ oif = skb_dst(skb)->dev->ifindex;
memset(fl6, 0, sizeof(struct flowi6));
fl6->flowi6_mark = skb->mark;
--
2.1.4
* [PATCH net-next 05/11] net: vrf: Flip IPv6 output path from FIB lookup hook to out hook
From: David Ahern @ 2016-09-10 19:09 UTC (permalink / raw)
To: netdev; +Cc: shm, David Ahern
In-Reply-To: <1473534602-23602-1-git-send-email-dsa@cumulusnetworks.com>
Flip the IPv6 output path to use the l3mdev tx out hook. The VRF dst
is not returned on the first FIB lookup. Instead, the dst on the
skb is switched at the beginning of the IPv6 output processing to
send the packet to the VRF driver on xmit.
Link scope addresses (linklocal and multicast) need special handling:
specifically, the oif in the flow struct cannot be changed because we
want the lookup tied to the enslaved interface; i.e., the source address
and the returned route MUST point to the interface scope passed in.
Convert the existing vrf_get_rt6_dst to handle only link scope addresses.
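The decision made at the start of IPv6 output can be modelled in userspace roughly as below. This is a sketch, not the driver code: all model_* types and functions are hypothetical, reference counting is reduced to a plain integer, and the link-scope test only approximates rt6_need_strict() by checking for multicast (ff00::/8) and link-local (fe80::/10) destinations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel objects. */
struct model_dst { int refcnt; };
struct model_skb { uint8_t daddr[16]; struct model_dst *dst; };
struct model_vrf { struct model_dst *rt6; /* cached VRF dst, may be NULL */ };

/* Approximation: multicast or link-local unicast destination. */
static bool model_link_scope(const uint8_t a[16])
{
	return a[0] == 0xff || (a[0] == 0xfe && (a[1] & 0xc0) == 0x80);
}

/* Returns the skb, with its dst swapped to the VRF dst for global
 * destinations, or NULL when the VRF has no dst to hand out.
 */
static struct model_skb *model_vrf_ip6_out(struct model_vrf *vrf,
					   struct model_skb *skb)
{
	if (model_link_scope(skb->daddr))
		return skb;		/* don't divert link scope packets */

	if (!vrf->rt6)
		return NULL;		/* modelled as a transmit error */

	vrf->rt6->refcnt++;		/* models dst_hold() */
	if (skb->dst)
		skb->dst->refcnt--;	/* models skb_dst_drop() */
	skb->dst = vrf->rt6;		/* models skb_dst_set() */

	return skb;
}
```

In this model a packet to 2001:db8::/32 leaves with the VRF dst attached, while a packet to fe80::/10 keeps whatever dst the first lookup produced, which is the behaviour the link scope lookup depends on.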
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
drivers/net/vrf.c | 124 ++++++++++++++++++++++++++++++++++-----------------
include/net/l3mdev.h | 8 ++--
net/ipv6/route.c | 11 +++--
net/l3mdev/l3mdev.c | 15 +++----
4 files changed, 100 insertions(+), 58 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 08540b96ec18..f5372edf6edc 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -137,6 +137,20 @@ static int vrf_local_xmit(struct sk_buff *skb, struct net_device *dev,
}
#if IS_ENABLED(CONFIG_IPV6)
+static int vrf_ip6_local_out(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+{
+ int err;
+
+ err = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net,
+ sk, skb, NULL, skb_dst(skb)->dev, dst_output);
+
+ if (likely(err == 1))
+ err = dst_output(net, sk, skb);
+
+ return err;
+}
+
static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
struct net_device *dev)
{
@@ -207,7 +221,7 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
/* strip the ethernet header added for pass through VRF device */
__skb_pull(skb, skb_network_offset(skb));
- ret = ip6_local_out(net, skb->sk, skb);
+ ret = vrf_ip6_local_out(net, skb->sk, skb);
if (unlikely(net_xmit_eval(ret)))
dev->stats.tx_errors++;
else
@@ -391,6 +405,43 @@ static int vrf_output6(struct net *net, struct sock *sk, struct sk_buff *skb)
!(IP6CB(skb)->flags & IP6SKB_REROUTED));
}
+/* set dst on skb to send packet to us via dev_xmit path. Allows
+ * packet to go through device based features such as qdisc, netfilter
+ * hooks and packet sockets with skb->dev set to vrf device.
+ */
+static struct sk_buff *vrf_ip6_out(struct net_device *vrf_dev,
+ struct sock *sk,
+ struct sk_buff *skb)
+{
+ struct net_vrf *vrf = netdev_priv(vrf_dev);
+ struct dst_entry *dst = NULL;
+ struct rt6_info *rt6;
+
+ /* don't divert link scope packets */
+ if (rt6_need_strict(&ipv6_hdr(skb)->daddr))
+ return skb;
+
+ rcu_read_lock();
+
+ rt6 = rcu_dereference(vrf->rt6);
+ if (likely(rt6)) {
+ dst = &rt6->dst;
+ dst_hold(dst);
+ }
+
+ rcu_read_unlock();
+
+ if (unlikely(!dst)) {
+ vrf_tx_error(vrf_dev, skb);
+ return NULL;
+ }
+
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+
+ return skb;
+}
+
/* holding rtnl */
static void vrf_rt6_release(struct net_device *dev, struct net_vrf *vrf)
{
@@ -477,6 +528,13 @@ static int vrf_rt6_create(struct net_device *dev)
return rc;
}
#else
+static struct sk_buff *vrf_ip6_out(struct net_device *vrf_dev,
+ struct sock *sk,
+ struct sk_buff *skb)
+{
+ return skb;
+}
+
static void vrf_rt6_release(struct net_device *dev, struct net_vrf *vrf)
{
}
@@ -587,6 +645,8 @@ static struct sk_buff *vrf_l3_out(struct net_device *vrf_dev,
switch (proto) {
case AF_INET:
return vrf_ip_out(vrf_dev, sk, skb);
+ case AF_INET6:
+ return vrf_ip6_out(vrf_dev, sk, skb);
}
return skb;
@@ -1031,53 +1091,33 @@ static struct sk_buff *vrf_l3_rcv(struct net_device *vrf_dev,
}
#if IS_ENABLED(CONFIG_IPV6)
-static struct dst_entry *vrf_get_rt6_dst(const struct net_device *dev,
- struct flowi6 *fl6)
+/* send to link-local or multicast address via interface enslaved to
+ * VRF device. Force lookup to VRF table without changing flow struct
+ */
+static struct dst_entry *vrf_link_scope_lookup(const struct net_device *dev,
+ struct flowi6 *fl6)
{
- bool need_strict = rt6_need_strict(&fl6->daddr);
- struct net_vrf *vrf = netdev_priv(dev);
struct net *net = dev_net(dev);
+ int flags = RT6_LOOKUP_F_IFACE;
struct dst_entry *dst = NULL;
struct rt6_info *rt;
- /* send to link-local or multicast address */
- if (need_strict) {
- int flags = RT6_LOOKUP_F_IFACE;
-
- /* VRF device does not have a link-local address and
- * sending packets to link-local or mcast addresses over
- * a VRF device does not make sense
- */
- if (fl6->flowi6_oif == dev->ifindex) {
- struct dst_entry *dst = &net->ipv6.ip6_null_entry->dst;
-
- dst_hold(dst);
- return dst;
- }
-
- if (!ipv6_addr_any(&fl6->saddr))
- flags |= RT6_LOOKUP_F_HAS_SADDR;
-
- rt = vrf_ip6_route_lookup(net, dev, fl6, fl6->flowi6_oif, flags);
- if (rt)
- dst = &rt->dst;
-
- } else if (!(fl6->flowi6_flags & FLOWI_FLAG_L3MDEV_SRC)) {
-
- rcu_read_lock();
-
- rt = rcu_dereference(vrf->rt6);
- if (likely(rt)) {
- dst = &rt->dst;
- dst_hold(dst);
- }
-
- rcu_read_unlock();
+ /* VRF device does not have a link-local address and
+ * sending packets to link-local or mcast addresses over
+ * a VRF device does not make sense
+ */
+ if (fl6->flowi6_oif == dev->ifindex) {
+ dst = &net->ipv6.ip6_null_entry->dst;
+ dst_hold(dst);
+ return dst;
}
- /* make sure oif is set to VRF device for lookup */
- if (!need_strict)
- fl6->flowi6_oif = dev->ifindex;
+ if (!ipv6_addr_any(&fl6->saddr))
+ flags |= RT6_LOOKUP_F_HAS_SADDR;
+
+ rt = vrf_ip6_route_lookup(net, dev, fl6, fl6->flowi6_oif, flags);
+ if (rt)
+ dst = &rt->dst;
return dst;
}
@@ -1130,7 +1170,7 @@ static const struct l3mdev_ops vrf_l3mdev_ops = {
.l3mdev_l3_rcv = vrf_l3_rcv,
.l3mdev_l3_out = vrf_l3_out,
#if IS_ENABLED(CONFIG_IPV6)
- .l3mdev_get_rt6_dst = vrf_get_rt6_dst,
+ .l3mdev_link_scope_lookup = vrf_link_scope_lookup,
.l3mdev_get_saddr6 = vrf_get_saddr6,
#endif
};
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 3ee110518584..51aab20a4d0a 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -27,7 +27,7 @@
*
* @l3mdev_get_saddr: Get source address for a flow
*
- * @l3mdev_get_rt6_dst: Get cached IPv6 rt6_info (dst_entry) for device
+ * @l3mdev_link_scope_lookup: IPv6 lookup for linklocal and mcast destinations
*/
struct l3mdev_ops {
@@ -45,7 +45,7 @@ struct l3mdev_ops {
struct flowi4 *fl4);
/* IPv6 ops */
- struct dst_entry * (*l3mdev_get_rt6_dst)(const struct net_device *dev,
+ struct dst_entry * (*l3mdev_link_scope_lookup)(const struct net_device *dev,
struct flowi6 *fl6);
int (*l3mdev_get_saddr6)(struct net_device *dev,
const struct sock *sk,
@@ -177,7 +177,7 @@ static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4);
-struct dst_entry *l3mdev_get_rt6_dst(struct net *net, struct flowi6 *fl6);
+struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6);
int l3mdev_get_saddr6(struct net *net, const struct sock *sk,
struct flowi6 *fl6);
@@ -299,7 +299,7 @@ static inline int l3mdev_get_saddr(struct net *net, int ifindex,
}
static inline
-struct dst_entry *l3mdev_get_rt6_dst(struct net *net, struct flowi6 *fl6)
+struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6)
{
return NULL;
}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 2c681113c055..87e0a01ce744 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1188,12 +1188,15 @@ static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table
struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk,
struct flowi6 *fl6, int flags)
{
- struct dst_entry *dst;
bool any_src;
- dst = l3mdev_get_rt6_dst(net, fl6);
- if (dst)
- return dst;
+ if (rt6_need_strict(&fl6->daddr)) {
+ struct dst_entry *dst;
+
+ dst = l3mdev_link_scope_lookup(net, fl6);
+ if (dst)
+ return dst;
+ }
fl6->flowi6_iif = LOOPBACK_IFINDEX;
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index 43610e5acc4e..ac9d928d0a9e 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -100,15 +100,14 @@ u32 l3mdev_fib_table_by_index(struct net *net, int ifindex)
EXPORT_SYMBOL_GPL(l3mdev_fib_table_by_index);
/**
- * l3mdev_get_rt6_dst - IPv6 route lookup based on flow. Returns
- * cached route for L3 master device if relevant
- * to flow
+ * l3mdev_link_scope_lookup - IPv6 route lookup based on flow for link
+ * local and multicast addresses
* @net: network namespace for device index lookup
* @fl6: IPv6 flow struct for lookup
*/
-struct dst_entry *l3mdev_get_rt6_dst(struct net *net,
- struct flowi6 *fl6)
+struct dst_entry *l3mdev_link_scope_lookup(struct net *net,
+ struct flowi6 *fl6)
{
struct dst_entry *dst = NULL;
struct net_device *dev;
@@ -121,15 +120,15 @@ struct dst_entry *l3mdev_get_rt6_dst(struct net *net,
dev = netdev_master_upper_dev_get_rcu(dev);
if (dev && netif_is_l3_master(dev) &&
- dev->l3mdev_ops->l3mdev_get_rt6_dst)
- dst = dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6);
+ dev->l3mdev_ops->l3mdev_link_scope_lookup)
+ dst = dev->l3mdev_ops->l3mdev_link_scope_lookup(dev, fl6);
rcu_read_unlock();
}
return dst;
}
-EXPORT_SYMBOL_GPL(l3mdev_get_rt6_dst);
+EXPORT_SYMBOL_GPL(l3mdev_link_scope_lookup);
/**
* l3mdev_get_saddr - get source address for a flow based on an interface
--
2.1.4