* Re: [PATCH v3 07/14] net: smc911x: remove the dmaengine compat need
From: David Miller @ 2018-06-20 23:39 UTC (permalink / raw)
To: robert.jarzmik; +Cc: daniel, linux-kernel, netdev
In-Reply-To: <87muvppebs.fsf@belgarion.home>
From: Robert Jarzmik <robert.jarzmik@free.fr>
Date: Wed, 20 Jun 2018 19:17:43 +0200
> Could you (or somebody from netdev) review it and either ack it (and I'll take
> it through the pxa tree), or take it for v4.19 please ?
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [PATCH v3 08/14] net: smc91x: remove the dmaengine compat need
From: David Miller @ 2018-06-20 23:40 UTC (permalink / raw)
To: robert.jarzmik; +Cc: daniel, linux-kernel, netdev
In-Reply-To: <87lgb9pebm.fsf@belgarion.home>
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
From: Saeed Mahameed @ 2018-06-20 23:41 UTC (permalink / raw)
To: eric.dumazet@gmail.com, kafai@fb.com, Tariq Toukan
Cc: netdev@vger.kernel.org, edumazet@google.com
In-Reply-To: <1bd6da9b-fa46-25e7-8921-cb56eb91e71b@gmail.com>
On Tue, 2018-06-19 at 17:25 -0700, Eric Dumazet wrote:
>
> On 06/19/2018 11:05 AM, Saeed Mahameed wrote:
>
> > this is only true for XDP setup, for non XDP max stride_size can only
> > be around ~3k and only for mtu > ~6k
> >
> > For XDP setup you suggested:
> > - priv->frag_info[0].frag_size = eff_mtu;
> > + priv->frag_info[0].frag_size = PAGE_SIZE;
> >
> > currently the condition is:
> >
> > release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> >
> > so my solution and yours have the same problem you described above.
> >
> > The problem is not with the initial values or with the stride/frag size
> > math; it is just that in XDP we shouldn't reuse at ALL. I agree with you
> > that we need to optimize, and maybe for PAGE_SIZE > 8k we need to allow
> > the XDP setup to reuse. But for now there is a data corruption to handle.
>
>
> Sure, we all agree there is a bug to fix.
>
> The way you are fixing it is kind of illogical.
>
> The NIC can use a frag if its _size_ is big enough to receive the frame.
>
> The _stride_ is an abstraction created by the driver to report an
> estimation of the _truesize_, or memory consumption, so that linux can
> better track overall memory usage.
>
> For example, if MTU=1500, the size of the fragment is 1536 bytes, but
> since we can put only 2 fragments per 4KB page (on x86), we declare the
> _stride_ to be 2048 bytes.
>
> Declaring that a final blob of a page, being 1600 bytes, is not able to
> receive a frame because _stride_ is 2048 is illogical and wastes resources.
>
>
I see. I wanted to use _stride_ as a guarantee for how much a page frag can
grow; for example, in mlx5 we need the whole stride to build_skb around
the frag, since we always need the trailer. It is different here, though,
and we can avoid the resource waste.
So how about this (as suggested by Martin):
Currently mlx4_en_complete_rx_desc assumes that priv->rx_headroom
is always 0 in non-XDP setup, hence:
frags->page_offset += sz_align;
where it really should be:
frags->page_offset += sz_align + priv->rx_headroom;
we can use it as a hint to not reuse as below:
what do you think ?
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9f54ccbddea7..f14c7a574cc8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -474,10 +474,10 @@ static int mlx4_en_complete_rx_desc(struct
mlx4_en_priv *priv,
{
const struct mlx4_en_frag_info *frag_info = priv->frag_info;
unsigned int truesize = 0;
+ bool release = true;
int nr, frag_size;
struct page *page;
dma_addr_t dma;
- bool release;
/* Collect used fragments while replacing them in the HW
descriptors */
for (nr = 0;; frags++) {
@@ -500,7 +500,7 @@ static int mlx4_en_complete_rx_desc(struct
mlx4_en_priv *priv,
release = page_count(page) != 1 ||
page_is_pfmemalloc(page) ||
page_to_nid(page) != numa_mem_id();
- } else {
+ } else if (!priv->rx_headroom) {
u32 sz_align = ALIGN(frag_size,
SMP_CACHE_BYTES);
frags->page_offset += sz_align;
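To make the size-versus-stride argument in this thread concrete, here is a minimal standalone sketch in plain userspace C. It is not driver code: the helper names (align_up, page_exhausted) and the DEMO_ constants are assumptions made for illustration, and only the shape of the check (offset plus length compared against the page size) is taken from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE  4096u  /* assumed 4KB page, as in the x86 example */
#define DEMO_CACHE_LINE 64u    /* assumed stand-in for SMP_CACHE_BYTES */

/* Round x up to the next multiple of a; a must be a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * True when a region of `need` bytes starting at `offset` would run
 * past the end of the page, i.e. the page cannot be reused for it.
 */
static bool page_exhausted(uint32_t offset, uint32_t need)
{
	return offset + need > DEMO_PAGE_SIZE;
}
```

With 1600 bytes left in the page, page_exhausted() reports false when asked about the 1536-byte frag size and true when asked about the 2048-byte stride, which is the waste Eric describes.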
* Re: net-next compilation failures
From: David Miller @ 2018-06-20 23:43 UTC (permalink / raw)
To: Manish.Chopra; +Cc: netdev
In-Reply-To: <CY1PR0701MB1163F3D32FBB527D5C7054E189770@CY1PR0701MB1163.namprd07.prod.outlook.com>
From: "Chopra, Manish" <Manish.Chopra@cavium.com>
Date: Wed, 20 Jun 2018 21:00:40 +0000
> I am trying to compile the net-next kernel and I hit the compilation
> errors below for some reason. Attached is the kernel .config
> used. Any idea where these failures could be stemming from?
The net-next tree isn't open and therefore you shouldn't be doing
work against it.
While it is closed, I don't try hard to keep net-next up to date with
upstream, so it can lack recent upstream fixes for build problems or
other bugs.
* [PATCH net] bpf: enforce correct alignment for instructions
From: Eric Dumazet @ 2018-06-21 0:24 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, Eric Dumazet, Eric Dumazet, Daniel Borkmann,
Martin KaFai Lau, Alexei Starovoitov
After commit 9facc336876f ("bpf: reject any prog that failed read-only lock")
offsetof(struct bpf_binary_header, image) became 3 instead of 4,
breaking powerpc BPF badly, since instructions need to be word aligned.
Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
---
include/linux/filter.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index b615df57b7d5b2ccb468c411c3a2aae103cd2aea..20f2659dd829256d7fef206087ab3262e1e291f5 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -472,7 +472,9 @@ struct sock_fprog_kern {
struct bpf_binary_header {
u16 pages;
u16 locked:1;
- u8 image[];
+
+ /* Some arches need word alignment for their instructions */
+ u8 image[] __aligned(4);
};
struct bpf_prog {
--
2.18.0.rc1.244.gcf134e6275-goog
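The layout effect described in this patch can be checked outside the kernel. The sketch below is an assumption-laden userspace reduction: the struct names are made up, uint16_t/uint8_t stand in for u16/u8, and the GCC/Clang attribute spelling replaces the kernel's __aligned(4) macro. It relies on compiler extensions (a uint16_t bit-field, an aligned flexible array member), so it is only meant for GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the fix: the byte array directly follows a 1-bit field. */
struct demo_hdr_before {
	uint16_t pages;
	uint16_t locked:1;
	uint8_t image[];
};

/* Layout after the fix: the array is forced onto a 4-byte boundary. */
struct demo_hdr_after {
	uint16_t pages;
	uint16_t locked:1;
	uint8_t image[] __attribute__((aligned(4)));
};

/* Offset of the image in the unaligned layout (compiler/ABI dependent). */
static size_t demo_image_offset_before(void)
{
	return offsetof(struct demo_hdr_before, image);
}

/* Offset of the image in the aligned layout; always a multiple of 4. */
static size_t demo_image_offset_after(void)
{
	size_t off = offsetof(struct demo_hdr_after, image);

	assert(off % 4 == 0);
	return off;
}
```

Printing both offsets on the target toolchain is a quick way to see whether the bit-field pulled the image off a word boundary, as the commit message reports.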
* Re: [PATCH] bpfilter: fix user mode helper cross compilation
From: David Miller @ 2018-06-21 0:19 UTC (permalink / raw)
To: mcroce; +Cc: netdev
In-Reply-To: <20180620140434.18139-1-mcroce@redhat.com>
From: Matteo Croce <mcroce@redhat.com>
Date: Wed, 20 Jun 2018 16:04:34 +0200
> Use $(OBJDUMP) instead of literal 'objdump' to avoid
> using host toolchain when cross compiling.
>
> Fixes: 421780fd4983 ("bpfilter: fix build error")
> Signed-off-by: Matteo Croce <mcroce@redhat.com>
Applied.
* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
From: Eric Dumazet @ 2018-06-21 0:28 UTC (permalink / raw)
To: Saeed Mahameed, eric.dumazet@gmail.com, kafai@fb.com,
Tariq Toukan
Cc: netdev@vger.kernel.org, edumazet@google.com
In-Reply-To: <65f1a941a3013250e2a768a31f5e521dc21f73e8.camel@mellanox.com>
On 06/20/2018 04:41 PM, Saeed Mahameed wrote:
>
> I see. I wanted to use _stride_ as a guarantee for how much a page frag can
> grow; for example, in mlx5 we need the whole stride to build_skb around
> the frag, since we always need the trailer. It is different here, though,
> and we can avoid the resource waste.
>
> So how about this (as suggested by Martin):
> Currently mlx4_en_complete_rx_desc assumes that priv->rx_headroom
> is always 0 in non-XDP setup, hence:
>
> frags->page_offset += sz_align;
>
> where it really should be:
> frags->page_offset += sz_align + priv->rx_headroom;
>
> we can use it as a hint to not reuse as below:
> what do you think ?
>
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 9f54ccbddea7..f14c7a574cc8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -474,10 +474,10 @@ static int mlx4_en_complete_rx_desc(struct
> mlx4_en_priv *priv,
> {
> const struct mlx4_en_frag_info *frag_info = priv->frag_info;
> unsigned int truesize = 0;
> + bool release = true;
> int nr, frag_size;
> struct page *page;
> dma_addr_t dma;
> - bool release;
>
> /* Collect used fragments while replacing them in the HW
> descriptors */
> for (nr = 0;; frags++) {
> @@ -500,7 +500,7 @@ static int mlx4_en_complete_rx_desc(struct
> mlx4_en_priv *priv,
> release = page_count(page) != 1 ||
> page_is_pfmemalloc(page) ||
> page_to_nid(page) != numa_mem_id();
> - } else {
> + } else if (!priv->rx_headroom) {
> u32 sz_align = ALIGN(frag_size,
> SMP_CACHE_BYTES);
>
> frags->page_offset += sz_align;
>
I guess that would work. Please double-check that priv->rx_headroom won't need another cache line.
Thanks!
* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: Shannon Nelson @ 2018-06-21 0:32 UTC (permalink / raw)
To: Anders Roxell; +Cc: Networking, David Miller
In-Reply-To: <CADYN=9Jmgk91RBiEyEbSdMr5_3JQgk058CO6HLRhDW6NFaF6qQ@mail.gmail.com>
On 6/20/2018 4:18 PM, Anders Roxell wrote:
> On Thu, 21 Jun 2018 at 00:26, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>
>> On 6/20/2018 12:09 PM, Anders Roxell wrote:
>>> On Wed, 20 Jun 2018 at 07:42, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>>>
>>>> A couple of bad behaviors in the ipsec selftest were pointed out
>>>> by Anders Roxell <anders.roxell@linaro.org> and are addressed here.
>>>>
>>>> Shannon Nelson (2):
>>>> selftests: rtnetlink: hide complaint from terminated monitor
>>>> selftests: rtnetlink: use a local IP address for IPsec tests
>>>>
>>>> tools/testing/selftests/net/rtnetlink.sh | 11 +++++++----
>>>> 1 file changed, 7 insertions(+), 4 deletions(-)
>>>>
>>>> --
>>>> 2.7.4
>>>>
>>>
>>> Hi Shannon,
>>>
>>> With these patches and my config patch applied,
>>> I still get this error when I run the ipsec test:
>>>
>>> FAIL: can't add fou port 7777, skipping test
>>> RTNETLINK answers: Operation not supported
>>> FAIL: can't add macsec interface, skipping test
>>> RTNETLINK answers: Protocol not supported
>>> RTNETLINK answers: No such process
>>> RTNETLINK answers: No such process
>>> FAIL: ipsec
>>
>> One of the odd things I noticed about this script is that there really
>> aren't any diagnosis messages, just PASS or FAIL. I followed this
>> custom when I added the ipsec tests, but I think this is something that
>> should change so we can get some idea of what breaks.
>>
>> I'm curious about the "RTNETLINK answers" messages and where they might
>> be coming from, especially "RTNETLINK answers: Protocol not supported".
>
> I added: "set -x" in the beginning of the rtnetlink.sh script.
> + ip x s add proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07 mode transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' 0x3132333435363738393031323334353664636261 128 sel src 10.66.17.140/24 dst 10.66.17.141/24
> RTNETLINK answers: Protocol not supported
Okay, so ip didn't like this command...
>> What are the XFRM and AES settings in your kernel config - what is the
>> output from
>> egrep -i "xfrm|_aes" .config
>
> CONFIG_XFRM=y
> CONFIG_XFRM_ALGO=y
> CONFIG_XFRM_USER=y
> CONFIG_INET_XFRM_MODE_TUNNEL=y
> CONFIG_INET6_XFRM_MODE_TRANSPORT=y
> CONFIG_INET6_XFRM_MODE_TUNNEL=y
> CONFIG_INET6_XFRM_MODE_BEET=y
> CONFIG_CRYPTO_AES=y
And this is probably why - there seem to be a few config variables
missing, including CONFIG_INET_XFRM_MODE_TRANSPORT, which might be why
the ip command fails above.
Here's what I have in my config:
CONFIG_XFRM=y
CONFIG_XFRM_OFFLOAD=y
CONFIG_XFRM_ALGO=m
CONFIG_XFRM_USER=m
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_TI is not set
CONFIG_CRYPTO_AES_X86_64=m
CONFIG_CRYPTO_AES_NI_INTEL=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
Can I talk you into adding CONFIG_INET_XFRM_MODE_TRANSPORT to your
config and trying again?
sln
* Re: [PATCH] selftests: net: add config fragments
From: David Miller @ 2018-06-21 0:47 UTC (permalink / raw)
To: anders.roxell
Cc: shuah, fw, shannon.nelson, netdev, linux-kselftest, linux-kernel
In-Reply-To: <20180619164111.30785-1-anders.roxell@linaro.org>
From: Anders Roxell <anders.roxell@linaro.org>
Date: Tue, 19 Jun 2018 18:41:11 +0200
> Add fragments to pass bridge and vlan tests.
>
> Fixes: 33b01b7b4f19 ("selftests: add rtnetlink test script")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Applied, thank you.
* Re: [PATCH net][RESEND] strparser: Don't schedule in workqueue in paused state
From: David Miller @ 2018-06-21 0:54 UTC (permalink / raw)
To: vakul.garg
Cc: doronrk, tom, john.fastabend, davejwatson, netdev, ebiggers,
linux-kernel
In-Reply-To: <20180620215949.32334-1-vakul.garg@nxp.com>
From: Vakul Garg <vakul.garg@nxp.com>
Date: Thu, 21 Jun 2018 03:29:49 +0530
> In function strp_data_ready(), it is useless to call queue_work if
> the state of strparser is already paused. The state checking should
> be done before calling queue_work. The change reduces the context
> switches and improves the ktls-rx throughput by approx 20% (measured
> on cortex-a53 based platform).
>
> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Applied, thank you.
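The idea in the commit message above, checking the parser state before scheduling any work, can be sketched as follows. This is a simplified userspace model, not the strparser code: struct demo_strp, demo_queue_work() and demo_data_ready() are hypothetical names, and the real data-ready path has more branches than this sketch shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced parser state. */
struct demo_strp {
	bool paused;
	bool stopped;
	unsigned int queued; /* how many times work was scheduled */
};

/* Stand-in for scheduling deferred work; here it only counts calls. */
static void demo_queue_work(struct demo_strp *strp)
{
	strp->queued++;
}

/* Data-ready callback: bail out early instead of queueing useless work. */
static void demo_data_ready(struct demo_strp *strp)
{
	assert(strp != NULL);

	if (strp->stopped || strp->paused)
		return;

	demo_queue_work(strp);
}
```

Each skipped demo_queue_work() call corresponds to a worker wakeup that would have found nothing to do, which is where the reported context-switch saving would come from.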
* Re: [PATCH] r8169: Fix netpoll oops
From: David Miller @ 2018-06-21 0:56 UTC (permalink / raw)
To: ville.syrjala; +Cc: netdev, nic_swsd, hkallweit1
In-Reply-To: <20180620120153.11676-1-ville.syrjala@linux.intel.com>
From: Ville Syrjala <ville.syrjala@linux.intel.com>
Date: Wed, 20 Jun 2018 15:01:53 +0300
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> Pass the correct thing to rtl8169_interrupt() from netpoll.
>
> Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
> Cc: netdev@vger.kernel.org
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Fixes: ebcd5daa7ffd ("r8169: change interrupt handler argument type")
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Applied.
* Re: [PATCH net] ipvlan: call dev_change_flags when reset ipvlan mode
From: Hangbin Liu @ 2018-06-21 1:18 UTC (permalink / raw)
To: Cong Wang
Cc: David Miller, Linux Kernel Network Developers, Stefano Brivio,
Paolo Abeni, Mahesh Bandewar
In-Reply-To: <CAM_iQpVdCJ2wjujco0+xZ_Jjeyu-qZ_g+8cVHv9LmUARhJmMoQ@mail.gmail.com>
On Wed, Jun 20, 2018 at 10:45:39AM -0700, Cong Wang wrote:
> On Tue, Jun 19, 2018 at 10:31 PM, David Miller <davem@davemloft.net> wrote:
> > From: Hangbin Liu <liuhangbin@gmail.com>
> > Date: Wed, 20 Jun 2018 11:22:54 +0800
> >
> >> The only case where dev_change_flags() returns an error is when we change the IFF_UP flag.
> >> Since we only set/reset IFF_NOARP, do you think we still need to check the
> >> return value?
> >
> > It is bad to try and take shortcuts on error handling using assumptions
> > like that.
> >
> > If dev_change_flags() is adjusted to return error codes in more
> > situations, nobody is going to remember to undo your "optimization"
> > here.
> >
> > Please check for errors, thank you.
>
> Yeah. Also since the notifier is triggered in this case:
>
> if (dev->flags & IFF_UP &&
> (changes & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI | IFF_VOLATILE))) {
> struct netdev_notifier_change_info change_info = {
> .info = {
> .dev = dev,
> },
> .flags_changed = changes,
> };
>
> call_netdevice_notifiers_info(NETDEV_CHANGE, &change_info.info);
> }
>
> the return value of call_netdevice_notifiers_info() isn't captured
> either, but it should be.
Thanks for the explanation. I will fix it.
Regards
Hangbin
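The review request in this thread, propagating the error from the flag change rather than assuming it cannot fail, can be illustrated with a small userspace model. Everything here is hypothetical: struct demo_dev, demo_change_flags() (a stand-in for dev_change_flags()) and demo_set_mode() are illustrative names, and the failure is injected through a test hook.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_IFF_NOARP 0x80u /* illustrative flag bit */

struct demo_dev {
	unsigned int flags;
	int mode;
	int fail_next_change; /* test hook: make the next flag change fail */
};

/* Hypothetical stand-in for a flag-change call that may fail. */
static int demo_change_flags(struct demo_dev *dev, unsigned int flags)
{
	if (dev->fail_next_change) {
		dev->fail_next_change = 0;
		return -EBUSY;
	}
	dev->flags = flags;
	return 0;
}

/* Switch mode; commit the new mode only if the flag change succeeded. */
static int demo_set_mode(struct demo_dev *dev, int new_mode, int want_noarp)
{
	unsigned int flags;
	int err;

	assert(dev != NULL);

	flags = dev->flags;
	if (want_noarp)
		flags |= DEMO_IFF_NOARP;
	else
		flags &= ~DEMO_IFF_NOARP;

	err = demo_change_flags(dev, flags);
	if (err)
		return err; /* mode and flags stay as they were */

	dev->mode = new_mode;
	return 0;
}
```

Ordering the flag change before the mode assignment means a failure needs no rollback in this sketch; a real driver may need to undo earlier steps instead.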
* [PATCH] cfg80211: use IDA to allocate wiphy indeces
From: Brian Norris @ 2018-06-21 1:29 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-kernel, linux-wireless, netdev, Brian Norris
It's annoying to see the phy index increase arbitrarily, just because a
device got removed and re-probed (e.g., during a device reset, or due to
probe testing). We can use the in-kernel index allocator for this,
instead of just an increasing counter.
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
net/wireless/core.c | 31 ++++++++++++++-----------------
1 file changed, 14 insertions(+), 17 deletions(-)
diff --git a/net/wireless/core.c b/net/wireless/core.c
index c0fd8a85e7f7..80c108c3ca38 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -8,6 +8,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/idr.h>
#include <linux/if.h>
#include <linux/module.h>
#include <linux/err.h>
@@ -380,11 +381,11 @@ static void cfg80211_propagate_cac_done_wk(struct work_struct *work)
/* exported functions */
+static DEFINE_IDA(wiphy_ida);
+
struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
const char *requested_name)
{
- static atomic_t wiphy_counter = ATOMIC_INIT(0);
-
struct cfg80211_registered_device *rdev;
int alloc_size;
@@ -413,18 +414,12 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
rdev->ops = ops;
- rdev->wiphy_idx = atomic_inc_return(&wiphy_counter);
-
+ rdev->wiphy_idx = ida_simple_get(&wiphy_ida, 0, 0, GFP_KERNEL);
if (unlikely(rdev->wiphy_idx < 0)) {
- /* ugh, wrapped! */
- atomic_dec(&wiphy_counter);
kfree(rdev);
return NULL;
}
- /* atomic_inc_return makes it start at 1, make it start at 0 */
- rdev->wiphy_idx--;
-
/* give it a proper name */
if (requested_name && requested_name[0]) {
int rv;
@@ -452,10 +447,8 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
* value, and use a different name if this one exists?
*/
rv = dev_set_name(&rdev->wiphy.dev, PHY_NAME "%d", rdev->wiphy_idx);
- if (rv < 0) {
- kfree(rdev);
- return NULL;
- }
+ if (rv < 0)
+ goto err;
}
INIT_LIST_HEAD(&rdev->wiphy.wdev_list);
@@ -497,10 +490,8 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
&rdev->wiphy.dev, RFKILL_TYPE_WLAN,
&rdev->rfkill_ops, rdev);
- if (!rdev->rfkill) {
- kfree(rdev);
- return NULL;
- }
+ if (!rdev->rfkill)
+ goto err;
INIT_WORK(&rdev->rfkill_sync, cfg80211_rfkill_sync_work);
INIT_WORK(&rdev->conn_work, cfg80211_conn_work);
@@ -525,6 +516,11 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
rdev->wiphy.max_sched_scan_plan_interval = U32_MAX;
return &rdev->wiphy;
+
+err:
+ ida_simple_remove(&wiphy_ida, rdev->wiphy_idx);
+ kfree(rdev);
+ return NULL;
}
EXPORT_SYMBOL(wiphy_new_nm);
@@ -972,6 +968,7 @@ void cfg80211_dev_free(struct cfg80211_registered_device *rdev)
}
list_for_each_entry_safe(scan, tmp, &rdev->bss_list, list)
cfg80211_put_bss(&rdev->wiphy, &scan->pub);
+ ida_simple_remove(&wiphy_ida, rdev->wiphy_idx);
kfree(rdev);
}
--
2.18.0.rc1.244.gcf134e6275-goog
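The behavioural difference this patch is after, reusing a freed index instead of counting upward forever, can be shown with a toy allocator. This is not the kernel IDA: struct demo_ida and its two helpers are hypothetical, limited to 64 IDs, and have no locking. The only property carried over is that the lowest free ID is handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_IDS 64

/* Toy ID allocator: one bit per ID, set when the ID is in use. */
struct demo_ida {
	uint64_t used;
};

/* Return the lowest free ID and mark it used, or -1 if none is free. */
static int demo_ida_get(struct demo_ida *ida)
{
	assert(ida != NULL);

	for (int id = 0; id < DEMO_MAX_IDS; id++) {
		uint64_t bit = (uint64_t)1 << id;

		if (!(ida->used & bit)) {
			ida->used |= bit;
			return id;
		}
	}
	return -1;
}

/* Release an ID so a later demo_ida_get() can hand it out again. */
static void demo_ida_remove(struct demo_ida *ida, int id)
{
	assert(ida != NULL && id >= 0 && id < DEMO_MAX_IDS);
	ida->used &= ~((uint64_t)1 << id);
}
```

After allocating 0, 1 and 2 and releasing 1, the next allocation returns 1 again, whereas an ever-increasing counter would have returned 3.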
* Re: [PATCH net] net: mscc: fix the injection header
From: David Miller @ 2018-06-21 1:30 UTC (permalink / raw)
To: antoine.tenart
Cc: f.fainelli, andrew, netdev, linux-kernel, thomas.petazzoni,
alexandre.belloni, quentin.schulz, allan.nielsen
In-Reply-To: <20180620085046.2377-1-antoine.tenart@bootlin.com>
From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Wed, 20 Jun 2018 10:50:46 +0200
> When injecting frames in the Ocelot switch driver, an injection header
> (IFH) should be used to configure various parameters related to a given
> frame, such as the port from which the frame should depart or its
> vlan id. Other parameters in the switch configuration can lead to an
> injected frame being sent without an IFH, but this leads to various issues
> as the per-frame parameters are then not used. This is especially true
> when using multiple ports for injection.
>
> The IFH was injected with the wrong endianness which led to the switch
> not taking it into account as the IFH_INJ_BYPASS bit was then unset.
> (The bit tells the switch to use the IFH over its internal
> configuration). This patch fixes it.
>
> In addition to the endianness fix, the IFH is also fixed. As it was
> (unwillingly) unused, some of its fields were not configured the right
> way.
>
> Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Applied, thank you.
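For readers unfamiliar with the endianness issue mentioned in the commit message, the following userspace sketch shows what writing a 32-bit header word in a fixed byte order means. It assumes big-endian is the order the hardware expects, which the patch text itself does not state; demo_put_be32() is a hypothetical helper and none of this is the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Serialize a 32-bit value most-significant byte first (big-endian),
 * independent of the byte order of the CPU running this code.
 */
static void demo_put_be32(uint32_t v, uint8_t out[4])
{
	assert(out != NULL);

	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}
```

If a header word is written in the wrong order, a flag that lives in one byte ends up in another, which matches the symptom described above of the bypass bit appearing unset.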
* Re: [PATCH v2 0/2] net: davinci_emac: fix suspend/resume (both a regression and a common clk problem)
From: Florian Fainelli @ 2018-06-21 2:44 UTC (permalink / raw)
To: Bartosz Golaszewski, Grygorii Strashko, David S . Miller,
Dan Carpenter, Ivan Khoronzhuk, Rob Herring, Lukas Wunner,
Kevin Hilman, David Lechner, Sekhar Nori, Andrew Lunn
Cc: linux-omap, netdev, linux-kernel, Bartosz Golaszewski
In-Reply-To: <20180620080356.11900-1-brgl@bgdev.pl>
On 06/20/2018 01:03 AM, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
>
> Earlier I sent the first patch as a solution to a regression introduced
> during the v4.16 merge window, but after testing David's common clock
> series on top of 4.18-rc1 + this patch it turned out that the problem
> persisted.
>
> This is a follow-up containing the regression fix and an additional
> patch that makes suspend/resume work with David's changes.
>
> v1 -> v2:
> - dropped patch 2/3
> - in patch 2/2: check the device's parent's compatible
Much better:
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
>
> Bartosz Golaszewski (2):
> net: ethernet: fix suspend/resume in davinci_emac
> net: davinci_emac: match the mdio device against its compatible if
> possible
>
> drivers/net/ethernet/ti/davinci_emac.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
--
Florian
* [PATCH v2 bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: dsahern @ 2018-06-21 3:00 UTC (permalink / raw)
To: netdev, borkmann, ast; +Cc: davem, kafai, David Ahern
From: David Ahern <dsahern@gmail.com>
For ACLs implemented using either FIB rules or FIB entries, the BPF
program needs the FIB lookup status to be able to drop the packet.
Since the bpf_fib_lookup API has not reached a released kernel yet,
change the return code to contain an encoding of the FIB lookup
result and return the nexthop device index in the params struct.
In addition, inform the BPF program of any post FIB lookup reason as
to why the packet needs to go up the stack.
The fib result for unicast routes must have an egress device, so remove
the check that it is non-NULL.
Signed-off-by: David Ahern <dsahern@gmail.com>
---
v2
- drop BPF_FIB_LKUP_RET_NO_NHDEV; check in dev in fib result not needed
- enhance documentation of BPF_FIB_LKUP_RET_ codes
include/uapi/linux/bpf.h | 28 ++++++++++++++----
net/core/filter.c | 72 ++++++++++++++++++++++++++++++----------------
samples/bpf/xdp_fwd_kern.c | 8 +++---
3 files changed, 74 insertions(+), 34 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 59b19b6a40d7..b7db3261c62d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1857,7 +1857,8 @@ union bpf_attr {
* is resolved), the nexthop address is returned in ipv4_dst
* or ipv6_dst based on family, smac is set to mac address of
* egress device, dmac is set to nexthop mac address, rt_metric
- * is set to metric from route (IPv4/IPv6 only).
+ * is set to metric from route (IPv4/IPv6 only), and ifindex
+ * is set to the device index of the nexthop from the FIB lookup.
*
* *plen* argument is the size of the passed in struct.
* *flags* argument can be a combination of one or more of the
@@ -1873,9 +1874,10 @@ union bpf_attr {
* *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** tc cls_act programs.
* Return
- * Egress device index on success, 0 if packet needs to continue
- * up the stack for further processing or a negative error in case
- * of failure.
+ * * < 0 if any input argument is invalid
+ * * 0 on success (packet is forwarded, nexthop neighbor exists)
+ * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
+ * * packet is not forwarded or needs assist from full stack
*
* int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
* Description
@@ -2612,6 +2614,18 @@ struct bpf_raw_tracepoint_args {
#define BPF_FIB_LOOKUP_DIRECT BIT(0)
#define BPF_FIB_LOOKUP_OUTPUT BIT(1)
+enum {
+ BPF_FIB_LKUP_RET_SUCCESS, /* lookup successful */
+ BPF_FIB_LKUP_RET_BLACKHOLE, /* dest is blackholed; can be dropped */
+ BPF_FIB_LKUP_RET_UNREACHABLE, /* dest is unreachable; can be dropped */
+ BPF_FIB_LKUP_RET_PROHIBIT, /* dest not allowed; can be dropped */
+ BPF_FIB_LKUP_RET_NOT_FWDED, /* packet is not forwarded */
+ BPF_FIB_LKUP_RET_FWD_DISABLED, /* fwding is not enabled on ingress */
+ BPF_FIB_LKUP_RET_UNSUPP_LWT, /* fwd requires encapsulation */
+ BPF_FIB_LKUP_RET_NO_NEIGH, /* no neighbor entry for nh */
+ BPF_FIB_LKUP_RET_FRAG_NEEDED, /* fragmentation required to fwd */
+};
+
struct bpf_fib_lookup {
/* input: network family for lookup (AF_INET, AF_INET6)
* output: network family of egress nexthop
@@ -2625,7 +2639,11 @@ struct bpf_fib_lookup {
/* total length of packet from network header - used for MTU check */
__u16 tot_len;
- __u32 ifindex; /* L3 device index for lookup */
+
+ /* input: L3 device index for lookup
+ * output: device index from FIB lookup
+ */
+ __u32 ifindex;
union {
/* inputs to lookup */
diff --git a/net/core/filter.c b/net/core/filter.c
index e7f12e9f598c..f8dd8aa89de4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4073,8 +4073,9 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
memcpy(params->smac, dev->dev_addr, ETH_ALEN);
params->h_vlan_TCI = 0;
params->h_vlan_proto = 0;
+ params->ifindex = dev->ifindex;
- return dev->ifindex;
+ return 0;
}
#endif
@@ -4098,7 +4099,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
/* verify forwarding is enabled on this interface */
in_dev = __in_dev_get_rcu(dev);
if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
- return 0;
+ return BPF_FIB_LKUP_RET_FWD_DISABLED;
if (flags & BPF_FIB_LOOKUP_OUTPUT) {
fl4.flowi4_iif = 1;
@@ -4123,7 +4124,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
tb = fib_get_table(net, tbid);
if (unlikely(!tb))
- return 0;
+ return BPF_FIB_LKUP_RET_NOT_FWDED;
err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
} else {
@@ -4135,8 +4136,20 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
}
- if (err || res.type != RTN_UNICAST)
- return 0;
+ if (err) {
+ /* map fib lookup errors to RTN_ type */
+ if (err == -EINVAL)
+ return BPF_FIB_LKUP_RET_BLACKHOLE;
+ if (err == -EHOSTUNREACH)
+ return BPF_FIB_LKUP_RET_UNREACHABLE;
+ if (err == -EACCES)
+ return BPF_FIB_LKUP_RET_PROHIBIT;
+
+ return BPF_FIB_LKUP_RET_NOT_FWDED;
+ }
+
+ if (res.type != RTN_UNICAST)
+ return BPF_FIB_LKUP_RET_NOT_FWDED;
if (res.fi->fib_nhs > 1)
fib_select_path(net, &res, &fl4, NULL);
@@ -4144,19 +4157,16 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
if (check_mtu) {
mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
if (params->tot_len > mtu)
- return 0;
+ return BPF_FIB_LKUP_RET_FRAG_NEEDED;
}
nh = &res.fi->fib_nh[res.nh_sel];
/* do not handle lwt encaps right now */
if (nh->nh_lwtstate)
- return 0;
+ return BPF_FIB_LKUP_RET_UNSUPP_LWT;
dev = nh->nh_dev;
- if (unlikely(!dev))
- return 0;
-
if (nh->nh_gw)
params->ipv4_dst = nh->nh_gw;
@@ -4166,10 +4176,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
* rcu_read_lock_bh is not needed here
*/
neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
- if (neigh)
- return bpf_fib_set_fwd_params(params, neigh, dev);
+ if (!neigh)
+ return BPF_FIB_LKUP_RET_NO_NEIGH;
- return 0;
+ return bpf_fib_set_fwd_params(params, neigh, dev);
}
#endif
@@ -4190,7 +4200,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
/* link local addresses are never forwarded */
if (rt6_need_strict(dst) || rt6_need_strict(src))
- return 0;
+ return BPF_FIB_LKUP_RET_NOT_FWDED;
dev = dev_get_by_index_rcu(net, params->ifindex);
if (unlikely(!dev))
@@ -4198,7 +4208,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
idev = __in6_dev_get_safely(dev);
if (unlikely(!idev || !net->ipv6.devconf_all->forwarding))
- return 0;
+ return BPF_FIB_LKUP_RET_FWD_DISABLED;
if (flags & BPF_FIB_LOOKUP_OUTPUT) {
fl6.flowi6_iif = 1;
@@ -4225,7 +4235,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
tb = ipv6_stub->fib6_get_table(net, tbid);
if (unlikely(!tb))
- return 0;
+ return BPF_FIB_LKUP_RET_NOT_FWDED;
f6i = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, strict);
} else {
@@ -4238,11 +4248,23 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
}
if (unlikely(IS_ERR_OR_NULL(f6i) || f6i == net->ipv6.fib6_null_entry))
- return 0;
+ return BPF_FIB_LKUP_RET_NOT_FWDED;
+
+ if (unlikely(f6i->fib6_flags & RTF_REJECT)) {
+ switch (f6i->fib6_type) {
+ case RTN_BLACKHOLE:
+ return BPF_FIB_LKUP_RET_BLACKHOLE;
+ case RTN_UNREACHABLE:
+ return BPF_FIB_LKUP_RET_UNREACHABLE;
+ case RTN_PROHIBIT:
+ return BPF_FIB_LKUP_RET_PROHIBIT;
+ default:
+ return BPF_FIB_LKUP_RET_NOT_FWDED;
+ }
+ }
- if (unlikely(f6i->fib6_flags & RTF_REJECT ||
- f6i->fib6_type != RTN_UNICAST))
- return 0;
+ if (f6i->fib6_type != RTN_UNICAST)
+ return BPF_FIB_LKUP_RET_NOT_FWDED;
if (f6i->fib6_nsiblings && fl6.flowi6_oif == 0)
f6i = ipv6_stub->fib6_multipath_select(net, f6i, &fl6,
@@ -4252,11 +4274,11 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
if (check_mtu) {
mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
if (params->tot_len > mtu)
- return 0;
+ return BPF_FIB_LKUP_RET_FRAG_NEEDED;
}
if (f6i->fib6_nh.nh_lwtstate)
- return 0;
+ return BPF_FIB_LKUP_RET_UNSUPP_LWT;
if (f6i->fib6_flags & RTF_GATEWAY)
*dst = f6i->fib6_nh.nh_gw;
@@ -4270,10 +4292,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
*/
neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
ndisc_hashfn, dst, dev);
- if (neigh)
- return bpf_fib_set_fwd_params(params, neigh, dev);
+ if (!neigh)
+ return BPF_FIB_LKUP_RET_NO_NEIGH;
- return 0;
+ return bpf_fib_set_fwd_params(params, neigh, dev);
}
#endif
diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index 6673cdb9f55c..a7e94e7ff87d 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -48,9 +48,9 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
struct ethhdr *eth = data;
struct ipv6hdr *ip6h;
struct iphdr *iph;
- int out_index;
u16 h_proto;
u64 nh_off;
+ int rc;
nh_off = sizeof(*eth);
if (data + nh_off > data_end)
@@ -101,7 +101,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
fib_params.ifindex = ctx->ingress_ifindex;
- out_index = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
+ rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
/* verify egress index has xdp support
* TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
@@ -109,7 +109,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
* NOTE: without verification that egress index supports XDP
* forwarding packets are dropped.
*/
- if (out_index > 0) {
+ if (rc == 0) {
if (h_proto == htons(ETH_P_IP))
ip_decrease_ttl(iph);
else if (h_proto == htons(ETH_P_IPV6))
@@ -117,7 +117,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
- return bpf_redirect_map(&tx_port, out_index, 0);
+ return bpf_redirect_map(&tx_port, fib_params.ifindex, 0);
}
return XDP_PASS;
--
2.11.0
^ permalink raw reply related
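The caller-side change in the sample above can be sketched in plain userspace C. This is only an illustration: the SKETCH_* names are hypothetical stand-ins for the BPF_FIB_LKUP_RET_* codes named in the patch, and apart from "0 means success" (which the sample relies on via rc == 0) the numeric values are placeholders, not the kernel's UAPI values.

```c
#include <assert.h>

/* Hypothetical stand-ins for the return codes named in the patch.
 * Only SUCCESS == 0 is taken from the patch; the rest are placeholders. */
enum sketch_fib_rc {
	SKETCH_FIB_SUCCESS = 0,
	SKETCH_FIB_NOT_FWDED,	/* route type is not unicast */
	SKETCH_FIB_FRAG_NEEDED,	/* tot_len exceeds the route MTU */
	SKETCH_FIB_UNSUPP_LWT,	/* nexthop uses lightweight tunnel state */
	SKETCH_FIB_NO_NEIGH,	/* no neighbour entry for the nexthop */
};

static_assert(SKETCH_FIB_SUCCESS == 0, "callers test rc == 0 for success");

enum sketch_action { SKETCH_PASS, SKETCH_REDIRECT };

/* Mirror of the sample's logic: redirect only on success, and let every
 * other result (including negative values) fall through to the stack. */
static inline enum sketch_action sketch_fwd_decision(int rc)
{
	return rc == SKETCH_FIB_SUCCESS ? SKETCH_REDIRECT : SKETCH_PASS;
}

/* Map a result to a short reason string, e.g. for debug output. */
static inline const char *sketch_fib_reason(int rc)
{
	switch (rc) {
	case SKETCH_FIB_SUCCESS:	return "forwarded";
	case SKETCH_FIB_NOT_FWDED:	return "route is not unicast";
	case SKETCH_FIB_FRAG_NEEDED:	return "packet exceeds route MTU";
	case SKETCH_FIB_UNSUPP_LWT:	return "lwtunnel nexthop unsupported";
	case SKETCH_FIB_NO_NEIGH:	return "no neighbour entry";
	default:			return "other";
	}
}
```

With distinct codes, a caller can still treat everything non-zero as "pass to the stack", as the sample does, while a more elaborate program could act on individual reasons.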
* [net:master 6/6] drivers/net/ethernet/mscc/ocelot.c:377:17: sparse: incorrect type in argument 2 (different base types)
From: kbuild test robot @ 2018-06-21 3:43 UTC (permalink / raw)
To: Antoine Tenart; +Cc: kbuild-all, netdev, Alexandre Belloni
tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
head: 08d02364b12faa54d76dbfea2090321fd27996f2
commit: 08d02364b12faa54d76dbfea2090321fd27996f2 [6/6] net: mscc: fix the injection header
reproduce:
# apt-get install sparse
git checkout 08d02364b12faa54d76dbfea2090321fd27996f2
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
sparse warnings: (new ones prefixed by >>)
>> drivers/net/ethernet/mscc/ocelot.c:377:17: sparse: incorrect type in argument 2 (different base types) @@ expected unsigned int [unsigned] [usertype] val @@ got restricted __be32 [usertype] <noident> @@
drivers/net/ethernet/mscc/ocelot.c:377:17: expected unsigned int [unsigned] [usertype] val
drivers/net/ethernet/mscc/ocelot.c:377:17: got restricted __be32 [usertype] <noident>
include/linux/device.h:678:13: sparse: undefined identifier '__builtin_mul_overflow'
include/linux/device.h:678:13: sparse: call with no type!
vim +377 drivers/net/ethernet/mscc/ocelot.c
353
354 static int ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
355 {
356 struct ocelot_port *port = netdev_priv(dev);
357 struct ocelot *ocelot = port->ocelot;
358 u32 val, ifh[IFH_LEN];
359 struct frame_info info = {};
360 u8 grp = 0; /* Send everything on CPU group 0 */
361 unsigned int i, count, last;
362
363 val = ocelot_read(ocelot, QS_INJ_STATUS);
364 if (!(val & QS_INJ_STATUS_FIFO_RDY(BIT(grp))) ||
365 (val & QS_INJ_STATUS_WMARK_REACHED(BIT(grp))))
366 return NETDEV_TX_BUSY;
367
368 ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
369 QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
370
371 info.port = BIT(port->chip_port);
372 info.tag_type = IFH_TAG_TYPE_C;
373 info.vid = skb_vlan_tag_get(skb);
374 ocelot_gen_ifh(ifh, &info);
375
376 for (i = 0; i < IFH_LEN; i++)
> 377 ocelot_write_rix(ocelot, cpu_to_be32(ifh[i]), QS_INJ_WR, grp);
378
379 count = (skb->len + 3) / 4;
380 last = skb->len % 4;
381 for (i = 0; i < count; i++) {
382 ocelot_write_rix(ocelot, ((u32 *)skb->data)[i], QS_INJ_WR, grp);
383 }
384
385 /* Add padding */
386 while (i < (OCELOT_BUFFER_CELL_SZ / 4)) {
387 ocelot_write_rix(ocelot, 0, QS_INJ_WR, grp);
388 i++;
389 }
390
391 /* Indicate EOF and valid bytes in last word */
392 ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
393 QS_INJ_CTRL_VLD_BYTES(skb->len < OCELOT_BUFFER_CELL_SZ ? 0 : last) |
394 QS_INJ_CTRL_EOF,
395 QS_INJ_CTRL, grp);
396
397 /* Add dummy CRC */
398 ocelot_write_rix(ocelot, 0, QS_INJ_WR, grp);
399 skb_tx_timestamp(skb);
400
401 dev->stats.tx_packets++;
402 dev->stats.tx_bytes += skb->len;
403 dev_kfree_skb_any(skb);
404
405 return NETDEV_TX_OK;
406 }
407
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply
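The warning above is about passing a __be32 value to a parameter declared as a plain unsigned int. A minimal userspace sketch of that pattern follows. All SKETCH_*/sketch_* names are hypothetical; the annotations only take effect when sparse defines __CHECKER__, and the explicit force cast is one common way to mark an intentional raw-bits hand-off, not necessarily the fix that was adopted for ocelot.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Under sparse, mark the type as endian-restricted; otherwise no-ops. */
#ifdef __CHECKER__
#define SKETCH_BITWISE	__attribute__((bitwise))
#define SKETCH_FORCE	__attribute__((force))
#else
#define SKETCH_BITWISE
#define SKETCH_FORCE
#endif

typedef uint32_t SKETCH_BITWISE sketch_be32;

static_assert(sizeof(sketch_be32) == 4, "big-endian word must be 32 bits");

/* Build the big-endian representation byte by byte, so the result is the
 * same regardless of host endianness. */
static inline sketch_be32 sketch_cpu_to_be32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8),  (uint8_t)v,
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return (SKETCH_FORCE sketch_be32)out;
}

/* Stand-in for a register write helper that takes a plain 32-bit value. */
static uint32_t sketch_last_written;

static inline void sketch_write_reg(uint32_t val)
{
	sketch_last_written = val;
}

/* The force cast tells the checker that handing big-endian bits to a
 * plain-integer parameter is deliberate. */
static inline void sketch_write_be(uint32_t host_val)
{
	sketch_write_reg((SKETCH_FORCE uint32_t)sketch_cpu_to_be32(host_val));
}
```

Without the cast, a checker that tracks endianness would flag the call in sketch_write_be() in the same way as line 377 above.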
* Re: [PATCH net] bpf: enforce correct alignment for instructions
From: David Miller @ 2018-06-21 3:46 UTC (permalink / raw)
To: edumazet; +Cc: netdev, eric.dumazet, daniel, kafai, ast
In-Reply-To: <20180621002409.63136-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 20 Jun 2018 17:24:09 -0700
> After commit 9facc336876f ("bpf: reject any prog that failed read-only lock")
> offsetof(struct bpf_binary_header, image) became 3 instead of 4,
> breaking powerpc BPF badly, since instructions need to be word aligned.
>
> Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
I'll apply this directly, thanks Eric.
^ permalink raw reply
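The effect described in the commit message, a trailing byte array that starts at offset 3 instead of 4, can be reproduced with a small userspace sketch. The struct layouts below are hypothetical (only the member name image comes from the message), and the aligned attribute is the GCC/Clang spelling; this shows one way to force word alignment, not necessarily the exact change in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: 2 + 1 bytes of fields, so the flexible array
 * member starts at offset 3, which is not word aligned. */
struct sketch_hdr_unaligned {
	uint16_t pages;
	uint8_t  flags;
	uint8_t  image[];
};

/* Same fields, but the array is explicitly aligned to 4 bytes, so the
 * compiler inserts one byte of padding before it. */
struct sketch_hdr_aligned {
	uint16_t pages;
	uint8_t  flags;
	uint8_t  image[] __attribute__((aligned(4)));
};

/* Fail the build if a later edit breaks the alignment again. */
static_assert(offsetof(struct sketch_hdr_aligned, image) % 4 == 0,
	      "image must be word aligned");

/* Helper: is a byte offset a multiple of the 4-byte word size? */
static inline int sketch_is_word_aligned(size_t off)
{
	return (off & 3) == 0;
}
```

A compile-time check like the static_assert above would turn this kind of regression into a build failure rather than a runtime problem on architectures that need word-aligned instructions.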
* Re: [PATCH net] sctp: fix erroneous inc of snmp SctpFragUsrMsgs
From: David Miller @ 2018-06-21 3:49 UTC (permalink / raw)
To: marcelo.leitner; +Cc: netdev, linux-sctp, nhorman, vyasevich
In-Reply-To: <d89c1e422158d21710ce938aa093a20960bd55e9.1529509634.git.marcelo.leitner@gmail.com>
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Wed, 20 Jun 2018 12:47:52 -0300
> Currently it is incrementing SctpFragUsrMsgs when the user message size
> is exactly the same as the maximum fragment size, which is wrong.
>
> The fix is to increment it only when user message is bigger than the
> maximum fragment size.
>
> Fixes: bfd2e4b8734d ("sctp: refactor sctp_datamsg_from_user")
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Applied.
^ permalink raw reply
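The boundary condition in the commit message comes down to a strict comparison. A minimal sketch, with hypothetical names (this is not the code from sctp_datamsg_from_user):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A message counts as fragmented only if it does not fit in a single
 * fragment. A message of exactly max_frag bytes still fits in one. */
static inline bool sketch_msg_is_fragmented(size_t msg_len, size_t max_frag)
{
	assert(max_frag > 0);
	return msg_len > max_frag;	/* strictly greater, not >= */
}

/* Hypothetical counter update mirroring the described fix: bump the
 * "fragmented user messages" statistic only for real fragmentation. */
static inline void sketch_account_msg(unsigned long *frag_usr_msgs,
				      size_t msg_len, size_t max_frag)
{
	if (sketch_msg_is_fragmented(msg_len, max_frag))
		(*frag_usr_msgs)++;
}
```

Using >= in the comparison would reproduce the described bug, counting a message that is exactly one fragment long.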
* Re: [PATCH net] bpf: enforce correct alignment for instructions
From: Eric Dumazet @ 2018-06-21 4:08 UTC (permalink / raw)
To: David Miller, edumazet; +Cc: netdev, eric.dumazet, daniel, kafai, ast
In-Reply-To: <20180621.124639.2276956745930721354.davem@davemloft.net>
On 06/20/2018 08:46 PM, David Miller wrote:
> From: Eric Dumazet <edumazet@google.com>
> Date: Wed, 20 Jun 2018 17:24:09 -0700
>
>> After commit 9facc336876f ("bpf: reject any prog that failed read-only lock")
>> offsetof(struct bpf_binary_header, image) became 3 instead of 4,
>> breaking powerpc BPF badly, since instructions need to be word aligned.
>>
>> Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> I'll apply this directly, thanks Eric.
>
Thanks David :)
^ permalink raw reply
* [PATCH net] ipvlan: fix IFLA_MTU ignored on NEWLINK
From: Xin Long @ 2018-06-21 4:56 UTC (permalink / raw)
To: network dev; +Cc: davem, Jarod Wilson, Mahesh Bandewar
Commit 296d48568042 ("ipvlan: inherit MTU from master device") adjusted
the mtu from the master device when creating an ipvlan device, but it
also overrides the mtu value set in rtnl_create_link. As a result, the
IFLA_MTU param does not take effect.
So this patch skips the mtu adjustment if the IFLA_MTU param is set when
creating an ipvlan device.
Fixes: 296d48568042 ("ipvlan: inherit MTU from master device")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
drivers/net/ipvlan/ipvlan_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index d02f0a7..23c1d660 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -594,7 +594,8 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
ipvlan->phy_dev = phy_dev;
ipvlan->dev = dev;
ipvlan->sfeatures = IPVLAN_FEATURES;
- ipvlan_adjust_mtu(ipvlan, phy_dev);
+ if (!tb[IFLA_MTU])
+ ipvlan_adjust_mtu(ipvlan, phy_dev);
INIT_LIST_HEAD(&ipvlan->addrs);
spin_lock_init(&ipvlan->addrs_lock);
--
2.1.0
^ permalink raw reply related
* [PATCH net] vhost_net: validate sock before trying to put its fd
From: Jason Wang @ 2018-06-21 5:11 UTC (permalink / raw)
To: mst, kvm, virtualization, netdev, linux-kernel; +Cc: dan.carpenter
Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when
we meet errors during ubuf allocation, the code does not check for
NULL before calling sockfd_put(), which leads to a NULL dereference.
Fix this by checking the sock pointer first.
Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 986058a..b97a994 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1208,7 +1208,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
if (ubufs)
vhost_net_ubuf_put_wait_and_free(ubufs);
err_ubufs:
- sockfd_put(sock);
+ if (sock)
+ sockfd_put(sock);
err_vq:
mutex_unlock(&vq->mutex);
err:
--
2.7.4
^ permalink raw reply related
* Re: Route fallback issue
From: Grant Taylor @ 2018-06-21 5:13 UTC (permalink / raw)
To: Julian Anastasov, Akshat Kakkar
Cc: netdev, cronolog+lartc, lartc, Erik Auerswald
In-Reply-To: <alpine.LFD.2.20.1806202139330.1969@ja.home.ssi.bg>
On 06/20/2018 01:00 PM, Julian Anastasov wrote:
> You can also try alternative routes.
"Alternative routes"? I can't say as I've heard that description as a
specific technique / feature / capability before.
Is that its official name?
Where can I find out more about it?
> But as the kernel supports only default alternative routes, you can put
> them in their own table:
I don't know that that is the case any more.
I was able to issue the following commands without a problem:
# ip route append 192.0.2.128/26 via 192.0.2.62
# ip route append 192.0.2.128/26 via 192.0.2.126
I created two network namespaces and had a pair of vEths between them
(192.0.2.0/26 and 192.0.2.64/26). I added a dummy network to each NetNS
(192.0.2.128/26 and 192.0.2.192/26).
I ran the following commands while a persistent ping was running from
one NetNS to the IP on the other's dummy0 interface:
# ip link set ns2b up && ip route append 192.0.2.192/26 via 192.0.2.126 && ip link set ns2a down
(pause and watch things)
# ip link set ns2a up && ip route append 192.0.2.192/26 via 192.0.2.62 && ip link set ns2b down
(pause and watch things)
I could iterate between the two above commands and pings continued to work.
So, I think that it's now possible to use "alternate routes" (new to me)
on specific prefixes in addition to the default. Thus there is no
longer any need for a separate table and the associated IP rule.
I'm running kernel version 4.9.76.
I did go ahead and set net.ipv4.conf.ns2b.ignore_routes_with_linkdown to 1.
for i in /proc/sys/net/ipv4/conf/*/ignore_routes_with_linkdown; do echo 1 > $i; done
Doing that reduced the number of dropped pings per flip from 60 ~ 90 to
0 ~ 5 (at 1 ping / second). (Rarely, maybe 1 out of 20 flips, it would
take upwards of 10 pings / seconds.)
> # Alternative routes use same metric!!!
> ip route append default via 192.168.1.254 dev eno1 table 100
> ip route append default via 192.168.2.254 dev eno2 table 100
> ip rule add prio 100 to 172.16.0.0/12 table 100
I did have to "append" the route. I couldn't just "add" the route.
When I tried to "add" the second route, I got an error about the route
already existing. Using "append" instead of "add" with everything else
the same worked just fine.
Note: I did go ahead and remove the single route that was added via
"add" and used "append" for both.
> Of course, you will get better results if an user space tool puts only
> alive routes in service after doing health checks of all near gateways.
I've got to say, with as well as this is working, I don't feel any need
for a user space monitoring daemon. I agree that I've felt the need for
such in the past before I learned about "alternative routes".
I still want to learn more about "alternative routes".
Here's a diagram of the test network if someone wants to try to
reproduce my findings:
+-------------+ +-------------+
| NS1 | | NS2 |
| ns2a +-----vEth-A-----+ ns1a |
| | | |
+ dummy0 | | dummy0 +
| | | |
| ns2b +-----vEth-B-----+ ns1b |
| | | |
+-------------+ +-------------+
(vEths get the name of the NS that they face.)
NS1:ns2a 192.0.2.1 /26
NS1:ns2b 192.0.2.65 /26
NS1:dummy0 192.0.2.129 /26
NS2:ns1a 192.0.2.62 /26
NS2:ns1b 192.0.2.126 /26
NS2:dummy0 192.0.2.254 /26
--
Grant. . . .
unix || die
^ permalink raw reply
* Re: [PATCH bpf 1/2] tools: bpftool: remove duplicated error message on prog load
From: Song Liu @ 2018-06-21 5:36 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Alexei Starovoitov, Daniel Borkmann, Networking, oss-drivers
In-Reply-To: <20180620184246.18672-2-jakub.kicinski@netronome.com>
On Wed, Jun 20, 2018 at 11:42 AM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> do_pin_fd() will already print out an error message if something
> goes wrong. Printing another error is unnecessary and will break
> JSON output, since error messages are full objects:
>
> $ bpftool -jp prog load tracex1_kern.o /sys/fs/bpf/a
> {
> "error": "can't pin the object (/sys/fs/bpf/a): File exists"
> },{
> "error": "failed to pin program"
> }
>
> Fixes: 49a086c201a9 ("bpftool: implement prog load command")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
> ---
> tools/bpf/bpftool/prog.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 05f42a46d6ed..12b694fe0404 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -694,10 +694,8 @@ static int do_load(int argc, char **argv)
> return -1;
> }
>
> - if (do_pin_fd(prog_fd, argv[1])) {
> - p_err("failed to pin program");
> + if (do_pin_fd(prog_fd, argv[1]))
> return -1;
> - }
>
> if (json_output)
> jsonw_null(json_wtr);
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH bpf 2/2] tools: bpftool: remember to close the libbpf object after prog load
From: Song Liu @ 2018-06-21 5:37 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Alexei Starovoitov, Daniel Borkmann, Networking, oss-drivers
In-Reply-To: <20180620184246.18672-3-jakub.kicinski@netronome.com>
On Wed, Jun 20, 2018 at 11:42 AM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> Remembering to close all descriptors and free memory may not seem
> important in a user space tool like bpftool, but if we were to run
> in batch mode the consumed resources start to add up quickly. Make
> sure program load closes the libbpf object (which unloads and frees
> it).
>
> Fixes: 49a086c201a9 ("bpftool: implement prog load command")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
> ---
> tools/bpf/bpftool/prog.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 12b694fe0404..959aa53ab678 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -695,12 +695,18 @@ static int do_load(int argc, char **argv)
> }
>
> if (do_pin_fd(prog_fd, argv[1]))
> - return -1;
> + goto err_close_obj;
>
> if (json_output)
> jsonw_null(json_wtr);
>
> + bpf_object__close(obj);
> +
> return 0;
> +
> +err_close_obj:
> + bpf_object__close(obj);
> + return -1;
> }
>
> static int do_help(int argc, char **argv)
> --
> 2.17.1
>
^ permalink raw reply