Netdev List


* Re: [PATCH v3 07/14] net: smc911x: remove the dmaengine compat need
From: David Miller @ 2018-06-20 23:39 UTC (permalink / raw)
  To: robert.jarzmik; +Cc: daniel, linux-kernel, netdev
In-Reply-To: <87muvppebs.fsf@belgarion.home>

From: Robert Jarzmik <robert.jarzmik@free.fr>
Date: Wed, 20 Jun 2018 19:17:43 +0200

> Could you (or somebody from netdev) review it and either ack it (and I'll take
> it through the pxa tree), or take it for v4.19 please ?

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH v3 08/14] net: smc91x: remove the dmaengine compat need
From: David Miller @ 2018-06-20 23:40 UTC (permalink / raw)
  To: robert.jarzmik; +Cc: daniel, linux-kernel, netdev
In-Reply-To: <87lgb9pebm.fsf@belgarion.home>


Acked-by: David S. Miller <davem@davemloft.net>


* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
From: Saeed Mahameed @ 2018-06-20 23:41 UTC (permalink / raw)
  To: eric.dumazet@gmail.com, kafai@fb.com, Tariq Toukan
  Cc: netdev@vger.kernel.org, edumazet@google.com
In-Reply-To: <1bd6da9b-fa46-25e7-8921-cb56eb91e71b@gmail.com>

On Tue, 2018-06-19 at 17:25 -0700, Eric Dumazet wrote:
> 
> On 06/19/2018 11:05 AM, Saeed Mahameed wrote:
> 
> > this is only true for XDP setup, for non XDP max stride_size can
> > only
> > be around ~3k and only for mtu > ~6k
> > 
> > For XDP setup you suggested:
> > -               priv->frag_info[0].frag_size = eff_mtu;
> > +               priv->frag_info[0].frag_size = PAGE_SIZE;
> > 
> > currently the condition is:
> > 
> > release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> > 
> > so my solution and yours have the same problem you described above.
> > 
> > the problem is not with the initial values or with stride/frag size
> > math, it is just that in XDP we shouldn't reuse at ALL. I agree with
> > you
> > that we need to optimize and maybe for PAGE_SIZE > 8k we need to
> > allow
> > XDP setup to reuse. but for now there is a data corruption to
> > handle.
> 
> 
> Sure, we all agree there is a bug to fix.
> 
> The way you are fixing it is kind of illogical.
> 
> The NIC can use a frag if its _size_ is big enough to receive the
> frame.
> 
> The _stride_  is an abstraction created by the driver to report an
> estimation of the _truesize_,
> or memory consumption, so that linux can better track overall memory
> usage.
> 
> For example, if MTU=1500, the size of the fragment is 1536 bytes, but
> since we can put only
> 2 fragments per 4KB page (on x86), we declare the _stride_ to be 2048
> bytes.
> 
> Declaring that a final blob of a page, being 1600 bytes, is not able to
> receive a frame because
> _stride_ is 2048 is illogical and wastes resources.
> 
> 

I see, I wanted to use _stride_ as a guarantee for how much a page frag can
grow, for example in mlx5 we need the whole stride to build_skb  around
the frag, since we always need the trailer, but it is different in here
and we can avoid resource waste.

so how about this: (As suggested by Martin).
currently as mlx4_en_complete_rx_desc assumes that priv->rx_headroom
is always 0 in non-XDP setup, hence:

frags->page_offset += sz_align;

where it really should be:
frags->page_offset += sz_align + priv->rx_headroom;

we can use it as a hint to not reuse as below:
what do you think ?


diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9f54ccbddea7..f14c7a574cc8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -474,10 +474,10 @@ static int mlx4_en_complete_rx_desc(struct
mlx4_en_priv *priv,
 {
        const struct mlx4_en_frag_info *frag_info = priv->frag_info;
        unsigned int truesize = 0;
+       bool release = true;
        int nr, frag_size;
        struct page *page;
        dma_addr_t dma;
-       bool release;

        /* Collect used fragments while replacing them in the HW
descriptors */
        for (nr = 0;; frags++) {
@@ -500,7 +500,7 @@ static int mlx4_en_complete_rx_desc(struct
mlx4_en_priv *priv,
                        release = page_count(page) != 1 ||
                                  page_is_pfmemalloc(page) ||
                                  page_to_nid(page) != numa_mem_id();
-               } else {
+               } else if (!priv->rx_headroom) {
                        u32 sz_align = ALIGN(frag_size,
SMP_CACHE_BYTES);

                        frags->page_offset += sz_align;


* Re: net-next compilation failures
From: David Miller @ 2018-06-20 23:43 UTC (permalink / raw)
  To: Manish.Chopra; +Cc: netdev
In-Reply-To: <CY1PR0701MB1163F3D32FBB527D5C7054E189770@CY1PR0701MB1163.namprd07.prod.outlook.com>

From: "Chopra, Manish" <Manish.Chopra@cavium.com>
Date: Wed, 20 Jun 2018 21:00:40 +0000

> I am trying to compile net-next kernel and I face these below
> compilation errors for some reason. Attached the kernel .config
> used.  Any idea for what reason these failures could be stemming ?

The net-next tree isn't open and therefore you shouldn't be doing
work against it.

When it is closed, I don't try hard to keep net-next up to date with
upstream and thus the recent upstream fixes for build problems or
other bugs.


* [PATCH net] bpf: enforce correct alignment for instructions
From: Eric Dumazet @ 2018-06-21  0:24 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Daniel Borkmann,
	Martin KaFai Lau, Alexei Starovoitov

After commit 9facc336876f ("bpf: reject any prog that failed read-only lock")
offsetof(struct bpf_binary_header, image) became 3 instead of 4,
breaking powerpc BPF badly, since instructions need to be word aligned.

Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/filter.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index b615df57b7d5b2ccb468c411c3a2aae103cd2aea..20f2659dd829256d7fef206087ab3262e1e291f5 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -472,7 +472,9 @@ struct sock_fprog_kern {
 struct bpf_binary_header {
 	u16 pages;
 	u16 locked:1;
-	u8 image[];
+
+	/* Some arches need word alignment for their instructions */
+	u8 image[] __aligned(4);
 };
 
 struct bpf_prog {
-- 
2.18.0.rc1.244.gcf134e6275-goog
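
The layout change described in this patch can be checked in user space
with offsetof(). A small sketch, assuming GCC or Clang: the kernel's
__aligned(4) is spelled __attribute__((aligned(4))) here, uint16_t and
uint8_t stand in for u16 and u8, and the sketch_ names are hypothetical
and not taken from the kernel tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the fix: image[] may start right after the bitfield. */
struct sketch_hdr_before {
	uint16_t pages;
	uint16_t locked:1;
	uint8_t image[];
};

/* Layout after the fix: image[] is forced onto a 4-byte boundary. */
struct sketch_hdr_after {
	uint16_t pages;
	uint16_t locked:1;
	uint8_t image[] __attribute__((aligned(4)));
};

/* Offset of image[] without the alignment attribute (ABI dependent). */
static inline size_t sketch_image_offset_before(void)
{
	return offsetof(struct sketch_hdr_before, image);
}

/* Offset of image[] with the alignment attribute. */
static inline size_t sketch_image_offset_after(void)
{
	return offsetof(struct sketch_hdr_after, image);
}
```

Bitfield layout is ABI dependent, so the first helper is only expected
to report the offset 3 mentioned in the commit message on ABIs that pack
the bitfield into a single byte. The second helper reports a multiple of
4 regardless, which is what word-aligned instruction sets need.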


* Re: [PATCH] bpfilter: fix user mode helper cross compilation
From: David Miller @ 2018-06-21  0:19 UTC (permalink / raw)
  To: mcroce; +Cc: netdev
In-Reply-To: <20180620140434.18139-1-mcroce@redhat.com>

From: Matteo Croce <mcroce@redhat.com>
Date: Wed, 20 Jun 2018 16:04:34 +0200

> Use $(OBJDUMP) instead of literal 'objdump' to avoid
> using host toolchain when cross compiling.
> 
> Fixes: 421780fd4983 ("bpfilter: fix build error")
> Signed-off-by: Matteo Croce <mcroce@redhat.com>

Applied.


* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
From: Eric Dumazet @ 2018-06-21  0:28 UTC (permalink / raw)
  To: Saeed Mahameed, eric.dumazet@gmail.com, kafai@fb.com,
	Tariq Toukan
  Cc: netdev@vger.kernel.org, edumazet@google.com
In-Reply-To: <65f1a941a3013250e2a768a31f5e521dc21f73e8.camel@mellanox.com>



On 06/20/2018 04:41 PM, Saeed Mahameed wrote:
> 
> I see, I wanted to use _stride_ as a guarantee for how much a page frag can
> grow, for example in mlx5 we need the whole stride to build_skb  around
> the frag, since we always need the trailer, but it is different in here
> and we can avoid resource waste.
> 
> so how about this: (As suggested by Martin).
> currently as mlx4_en_complete_rx_desc assumes that priv->rx_headroom
> is always 0 in non-XDP setup, hence:
> 
> frags->page_offset += sz_align;
> 
> where it really should be:
> frags->page_offset += sz_align + priv->rx_headroom;
> 
> we can use it as a hint to not reuse as below:
> what do you think ?
> 
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 9f54ccbddea7..f14c7a574cc8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -474,10 +474,10 @@ static int mlx4_en_complete_rx_desc(struct
> mlx4_en_priv *priv,
>  {
>         const struct mlx4_en_frag_info *frag_info = priv->frag_info;
>         unsigned int truesize = 0;
> +       bool release = true;
>         int nr, frag_size;
>         struct page *page;
>         dma_addr_t dma;
> -       bool release;
> 
>         /* Collect used fragments while replacing them in the HW
> descriptors */
>         for (nr = 0;; frags++) {
> @@ -500,7 +500,7 @@ static int mlx4_en_complete_rx_desc(struct
> mlx4_en_priv *priv,
>                         release = page_count(page) != 1 ||
>                                   page_is_pfmemalloc(page) ||
>                                   page_to_nid(page) != numa_mem_id();
> -               } else {
> +               } else if (!priv->rx_headroom) {
>                         u32 sz_align = ALIGN(frag_size,
> SMP_CACHE_BYTES);
> 
>                         frags->page_offset += sz_align;
> 

I guess that would work, please double check priv->rx_headroom won't need another cache line,
thanks !
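
For readers following the thread, here is a minimal sketch of the
arithmetic under discussion, for a 4KB page and MTU=1500 (frag size
1536, stride 2048). It is not driver code: the sketch_ names and
constants are hypothetical stand-ins, and only the offset update and
the release condition quoted earlier in the thread are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE   4096u /* assumed 4KB page, as on x86 */
#define SKETCH_CACHE_BYTES 64u   /* assumed cache line size */

/* Round x up to a multiple of a (a must be a power of two). */
static inline uint32_t sketch_align(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Advance the page offset past one received fragment, then apply the
 * size-based release test: release when another frag_size bytes would
 * cross the end of the page.
 */
static inline bool sketch_advance(uint32_t *page_offset, uint32_t frag_size)
{
	*page_offset += sketch_align(frag_size, SKETCH_CACHE_BYTES);
	return *page_offset + frag_size > SKETCH_PAGE_SIZE;
}

/* Does a region of len bytes starting at page_offset fit in the page? */
static inline bool sketch_fits(uint32_t page_offset, uint32_t len)
{
	return page_offset + len <= SKETCH_PAGE_SIZE;
}
```

Starting from offset 0 with frag_size 1536, the first sketch_advance()
call allows reuse and the second reports release, which matches the two
fragments per 4KB page mentioned above. A 1600-byte tail (offset 2496)
passes sketch_fits() for a 1536-byte frame but fails it for a 2048-byte
stride, which is the waste being pointed out.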


* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: Shannon Nelson @ 2018-06-21  0:32 UTC (permalink / raw)
  To: Anders Roxell; +Cc: Networking, David Miller
In-Reply-To: <CADYN=9Jmgk91RBiEyEbSdMr5_3JQgk058CO6HLRhDW6NFaF6qQ@mail.gmail.com>

On 6/20/2018 4:18 PM, Anders Roxell wrote:
> On Thu, 21 Jun 2018 at 00:26, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>
>> On 6/20/2018 12:09 PM, Anders Roxell wrote:
>>> On Wed, 20 Jun 2018 at 07:42, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>>>
>>>> A couple of bad behaviors in the ipsec selftest were pointed out
>>>> by Anders Roxell <anders.roxell@linaro.org> and are addressed here.
>>>>
>>>> Shannon Nelson (2):
>>>>     selftests: rtnetlink: hide complaint from terminated monitor
>>>>     selftests: rtnetlink: use a local IP address for IPsec tests
>>>>
>>>>    tools/testing/selftests/net/rtnetlink.sh | 11 +++++++----
>>>>    1 file changed, 7 insertions(+), 4 deletions(-)
>>>>
>>>> --
>>>> 2.7.4
>>>>
>>>
>>> Hi Shannon,
>>>
>>> With these patches applied and my config patch.
>>>
>>> I still get this error when I run the ipsec test:
>>>
>>> FAIL: can't add fou port 7777, skipping test
>>> RTNETLINK answers: Operation not supported
>>> FAIL: can't add macsec interface, skipping test
>>> RTNETLINK answers: Protocol not supported
>>> RTNETLINK answers: No such process
>>> RTNETLINK answers: No such process
>>> FAIL: ipsec
>>
>> One of the odd things I noticed about this script is that there really
>> aren't any diagnosis messages, just PASS or FAIL.  I followed this
>> custom when I added the ipsec tests, but I think this is something that
>> should change so we can get some idea of what breaks.
>>
>> I'm curious about the "RTNETLINK answers" messages and where they might
>> be coming from, especially "RTNETLINK answers: Protocol not supported".
> 
> I added: "set -x" in the beginning of the rtnetlink.sh script.
> + ip x s add proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07 mode transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' 0x3132333435363738393031323334353664636261 128 sel src 10.66.17.140/24 dst 10.66.17.141/24
> RTNETLINK answers: Protocol not supported

Okay, so ip didn't like this command...

>> What are the XFRM and AES settings in your kernel config - what is the
>> output from
>>          egrep -i "xfrm|_aes" .config
> 
> CONFIG_XFRM=y
> CONFIG_XFRM_ALGO=y
> CONFIG_XFRM_USER=y
> CONFIG_INET_XFRM_MODE_TUNNEL=y
> CONFIG_INET6_XFRM_MODE_TRANSPORT=y
> CONFIG_INET6_XFRM_MODE_TUNNEL=y
> CONFIG_INET6_XFRM_MODE_BEET=y
> CONFIG_CRYPTO_AES=y

And this is probably why - there seem to be a few config variables 
missing, including CONFIG_INET_XFRM_MODE_TRANSPORT, which might be why 
the ip command fails above.

Here's what I have in my config:
CONFIG_XFRM=y
CONFIG_XFRM_OFFLOAD=y
CONFIG_XFRM_ALGO=m
CONFIG_XFRM_USER=m
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_TI is not set
CONFIG_CRYPTO_AES_X86_64=m
CONFIG_CRYPTO_AES_NI_INTEL=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m

Can I talk you into adding CONFIG_INET_XFRM_MODE_TRANSPORT to your 
config and trying again?

sln


* Re: [PATCH] selftests: net: add config fragments
From: David Miller @ 2018-06-21  0:47 UTC (permalink / raw)
  To: anders.roxell
  Cc: shuah, fw, shannon.nelson, netdev, linux-kselftest, linux-kernel
In-Reply-To: <20180619164111.30785-1-anders.roxell@linaro.org>

From: Anders Roxell <anders.roxell@linaro.org>
Date: Tue, 19 Jun 2018 18:41:11 +0200

> Add fragments to pass bridge and vlan tests.
> 
> Fixes: 33b01b7b4f19 ("selftests: add rtnetlink test script")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>

Applied, thank you.


* Re: [PATCH net][RESEND] strparser: Don't schedule in workqueue in paused state
From: David Miller @ 2018-06-21  0:54 UTC (permalink / raw)
  To: vakul.garg
  Cc: doronrk, tom, john.fastabend, davejwatson, netdev, ebiggers,
	linux-kernel
In-Reply-To: <20180620215949.32334-1-vakul.garg@nxp.com>

From: Vakul Garg <vakul.garg@nxp.com>
Date: Thu, 21 Jun 2018 03:29:49 +0530

> In function strp_data_ready(), it is useless to call queue_work if
> the state of strparser is already paused. The state checking should
> be done before calling queue_work. The change reduces the context
> switches and improves the ktls-rx throughput by approx 20% (measured
> on cortex-a53 based platform).
> 
> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>

Applied, thank you.


* Re: [PATCH] r8169: Fix netpoll oops
From: David Miller @ 2018-06-21  0:56 UTC (permalink / raw)
  To: ville.syrjala; +Cc: netdev, nic_swsd, hkallweit1
In-Reply-To: <20180620120153.11676-1-ville.syrjala@linux.intel.com>

From: Ville Syrjala <ville.syrjala@linux.intel.com>
Date: Wed, 20 Jun 2018 15:01:53 +0300

> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Pass the correct thing to rtl8169_interrupt() from netpoll.
> 
> Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
> Cc: netdev@vger.kernel.org
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Fixes: ebcd5daa7ffd ("r8169: change interrupt handler argument type")
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Applied.


* Re: [PATCH net] ipvlan: call dev_change_flags when reset ipvlan mode
From: Hangbin Liu @ 2018-06-21  1:18 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Linux Kernel Network Developers, Stefano Brivio,
	Paolo Abeni, Mahesh Bandewar
In-Reply-To: <CAM_iQpVdCJ2wjujco0+xZ_Jjeyu-qZ_g+8cVHv9LmUARhJmMoQ@mail.gmail.com>

On Wed, Jun 20, 2018 at 10:45:39AM -0700, Cong Wang wrote:
> On Tue, Jun 19, 2018 at 10:31 PM, David Miller <davem@davemloft.net> wrote:
> > From: Hangbin Liu <liuhangbin@gmail.com>
> > Date: Wed, 20 Jun 2018 11:22:54 +0800
> >
> >> The only case where dev_change_flags() returns an error is when we change the IFF_UP flag.
> >> Since we only set/reset IFF_NOARP, do you think we still need to check the
> >> return value?
> >
> > It is bad to try and take shortcuts on error handling using assumptions
> > like that.
> >
> > If dev_change_flags() is adjusted to return error codes in more
> > situations, nobody is going to remember to undo your "optimization"
> > here.
> >
> > Please check for errors, thank you.
> 
> Yeah. Also since the notifier is triggered in this case:
> 
>         if (dev->flags & IFF_UP &&
>             (changes & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI | IFF_VOLATILE))) {
>                 struct netdev_notifier_change_info change_info = {
>                         .info = {
>                                 .dev = dev,
>                         },
>                         .flags_changed = changes,
>                 };
> 
>                 call_netdevice_notifiers_info(NETDEV_CHANGE, &change_info.info);
>         }
> 
> the return value of call_netdevice_notifiers_info() isn't captured
> either, but it should be.

Thanks for the explanation. I will fix it.

Regards
Hangbin


* [PATCH] cfg80211: use IDA to allocate wiphy indices
From: Brian Norris @ 2018-06-21  1:29 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-kernel, linux-wireless, netdev, Brian Norris

It's annoying to see the phy index increase arbitrarily, just because a
device got removed and re-probed (e.g., during a device reset, or due to
probe testing). We can use the in-kernel index allocator for this,
instead of just an increasing counter.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---
 net/wireless/core.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/net/wireless/core.c b/net/wireless/core.c
index c0fd8a85e7f7..80c108c3ca38 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -8,6 +8,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/idr.h>
 #include <linux/if.h>
 #include <linux/module.h>
 #include <linux/err.h>
@@ -380,11 +381,11 @@ static void cfg80211_propagate_cac_done_wk(struct work_struct *work)
 
 /* exported functions */
 
+static DEFINE_IDA(wiphy_ida);
+
 struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 			   const char *requested_name)
 {
-	static atomic_t wiphy_counter = ATOMIC_INIT(0);
-
 	struct cfg80211_registered_device *rdev;
 	int alloc_size;
 
@@ -413,18 +414,12 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 
 	rdev->ops = ops;
 
-	rdev->wiphy_idx = atomic_inc_return(&wiphy_counter);
-
+	rdev->wiphy_idx = ida_simple_get(&wiphy_ida, 0, 0, GFP_KERNEL);
 	if (unlikely(rdev->wiphy_idx < 0)) {
-		/* ugh, wrapped! */
-		atomic_dec(&wiphy_counter);
 		kfree(rdev);
 		return NULL;
 	}
 
-	/* atomic_inc_return makes it start at 1, make it start at 0 */
-	rdev->wiphy_idx--;
-
 	/* give it a proper name */
 	if (requested_name && requested_name[0]) {
 		int rv;
@@ -452,10 +447,8 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 		 * value, and use a different name if this one exists?
 		 */
 		rv = dev_set_name(&rdev->wiphy.dev, PHY_NAME "%d", rdev->wiphy_idx);
-		if (rv < 0) {
-			kfree(rdev);
-			return NULL;
-		}
+		if (rv < 0)
+			goto err;
 	}
 
 	INIT_LIST_HEAD(&rdev->wiphy.wdev_list);
@@ -497,10 +490,8 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 				   &rdev->wiphy.dev, RFKILL_TYPE_WLAN,
 				   &rdev->rfkill_ops, rdev);
 
-	if (!rdev->rfkill) {
-		kfree(rdev);
-		return NULL;
-	}
+	if (!rdev->rfkill)
+		goto err;
 
 	INIT_WORK(&rdev->rfkill_sync, cfg80211_rfkill_sync_work);
 	INIT_WORK(&rdev->conn_work, cfg80211_conn_work);
@@ -525,6 +516,11 @@ struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 	rdev->wiphy.max_sched_scan_plan_interval = U32_MAX;
 
 	return &rdev->wiphy;
+
+err:
+	ida_simple_remove(&wiphy_ida, rdev->wiphy_idx);
+	kfree(rdev);
+	return NULL;
 }
 EXPORT_SYMBOL(wiphy_new_nm);
 
@@ -972,6 +968,7 @@ void cfg80211_dev_free(struct cfg80211_registered_device *rdev)
 	}
 	list_for_each_entry_safe(scan, tmp, &rdev->bss_list, list)
 		cfg80211_put_bss(&rdev->wiphy, &scan->pub);
+	ida_simple_remove(&wiphy_ida, rdev->wiphy_idx);
 	kfree(rdev);
 }
 
-- 
2.18.0.rc1.244.gcf134e6275-goog
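
To illustrate why an ID allocator keeps the phy index from growing
across remove and re-probe cycles, here is a user-space sketch of a
lowest-free-ID allocator. It is not the kernel IDA implementation: the
sketch_ida type, its 64-entry bitmap, and both helpers are hypothetical
and only mimic the get/remove pattern used in the patch.

```c
#include <stdint.h>

#define SKETCH_MAX_IDS 64

/* Bit i set means id i is currently allocated. */
struct sketch_ida {
	uint64_t used;
};

/* Return the lowest free id, or -1 when every id is taken. */
static inline int sketch_ida_get(struct sketch_ida *ida)
{
	for (int id = 0; id < SKETCH_MAX_IDS; id++) {
		if (!(ida->used & (UINT64_C(1) << id))) {
			ida->used |= UINT64_C(1) << id;
			return id;
		}
	}
	return -1;
}

/* Give an id back so a later get can hand it out again. */
static inline void sketch_ida_remove(struct sketch_ida *ida, int id)
{
	if (id >= 0 && id < SKETCH_MAX_IDS)
		ida->used &= ~(UINT64_C(1) << id);
}
```

With a zero-initialized struct sketch_ida, two gets return 0 and 1.
After sketch_ida_remove() on id 0, the next get returns 0 again, whereas
an ever-increasing counter would have moved on to 2.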


* Re: [PATCH net] net: mscc: fix the injection header
From: David Miller @ 2018-06-21  1:30 UTC (permalink / raw)
  To: antoine.tenart
  Cc: f.fainelli, andrew, netdev, linux-kernel, thomas.petazzoni,
	alexandre.belloni, quentin.schulz, allan.nielsen
In-Reply-To: <20180620085046.2377-1-antoine.tenart@bootlin.com>

From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Wed, 20 Jun 2018 10:50:46 +0200

> When injecting frames in the Ocelot switch driver an injection header
> (IFH) should be used to configure various parameters related to a given
> frame, such as the port onto which the frame should be departed or its
> vlan id. Other parameters in the switch configuration can lead to an
> injected frame being sent without an IFH but this led to various issues
> as the per-frame parameters are then not used. This is especially true
> when using multiple ports for injection.
> 
> The IFH was injected with the wrong endianness which led to the switch
> not taking it into account as the IFH_INJ_BYPASS bit was then unset.
> (The bit tells the switch to use the IFH over its internal
> configuration). This patch fixes it.
> 
> In addition to the endianness fix, the IFH is also fixed. As it was
> (unwillingly) unused, some of its fields were not configured the right
> way.
> 
> Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>

Applied, thank you.


* Re: [PATCH v2 0/2] net: davinci_emac: fix suspend/resume (both a regression and a common clk problem)
From: Florian Fainelli @ 2018-06-21  2:44 UTC (permalink / raw)
  To: Bartosz Golaszewski, Grygorii Strashko, David S . Miller,
	Dan Carpenter, Ivan Khoronzhuk, Rob Herring, Lukas Wunner,
	Kevin Hilman, David Lechner, Sekhar Nori, Andrew Lunn
  Cc: linux-omap, netdev, linux-kernel, Bartosz Golaszewski
In-Reply-To: <20180620080356.11900-1-brgl@bgdev.pl>



On 06/20/2018 01:03 AM, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> 
> Earlier I sent the first patch as a solution to a regression introduced
> during the v4.16 merge window, but after testing David's common clock
> series on top of 4.18-rc1 + this patch it turned out that the problem
> persisted.
> 
> This is a follow-up containing the regression fix and an additional
> patch that makes suspend/resume work with David's changes.
> 
> v1 -> v2:
> - dropped patch 2/3
> - in patch 2/2: check the device's parent's compatible

Much better:

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

> 
> Bartosz Golaszewski (2):
>   net: ethernet: fix suspend/resume in davinci_emac
>   net: davinci_emac: match the mdio device against its compatible if
>     possible
> 
>  drivers/net/ethernet/ti/davinci_emac.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 

-- 
Florian


* [PATCH v2 bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: dsahern @ 2018-06-21  3:00 UTC (permalink / raw)
  To: netdev, borkmann, ast; +Cc: davem, kafai, David Ahern

From: David Ahern <dsahern@gmail.com>

For ACLs implemented using either FIB rules or FIB entries, the BPF
program needs the FIB lookup status to be able to drop the packet.
Since the bpf_fib_lookup API has not reached a released kernel yet,
change the return code to contain an encoding of the FIB lookup
result and return the nexthop device index in the params struct.

In addition, inform the BPF program of any post FIB lookup reason as
to why the packet needs to go up the stack.

The fib result for unicast routes must have an egress device, so remove
the check that it is non-NULL.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
v2
- drop BPF_FIB_LKUP_RET_NO_NHDEV; check in dev in fib result not needed
- enhance documentation of BPF_FIB_LKUP_RET_ codes
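
A rough sketch of how a caller might act on the new return convention
(negative for invalid input, 0 for success, positive for a
BPF_FIB_LKUP_RET_ code). This is plain user-space C, not a BPF program:
the SKETCH_ enums mirror the order of the codes added by this patch, and
the verdict names, as well as passing invalid-input errors up the stack,
are assumptions made for illustration.

```c
/* Mirrors the order of the BPF_FIB_LKUP_RET_ codes in this patch. */
enum sketch_fib_ret {
	SKETCH_RET_SUCCESS,
	SKETCH_RET_BLACKHOLE,
	SKETCH_RET_UNREACHABLE,
	SKETCH_RET_PROHIBIT,
	SKETCH_RET_NOT_FWDED,
	SKETCH_RET_FWD_DISABLED,
	SKETCH_RET_UNSUPP_LWT,
	SKETCH_RET_NO_NEIGH,
	SKETCH_RET_FRAG_NEEDED,
};

/* Hypothetical verdicts a forwarding program could return. */
enum sketch_verdict {
	SKETCH_DROP,
	SKETCH_PASS,    /* hand the packet to the full stack */
	SKETCH_FORWARD, /* redirect to the ifindex from the lookup */
};

/* Map a lookup return value to a verdict. */
static inline enum sketch_verdict sketch_fib_verdict(int rc)
{
	if (rc < 0)
		return SKETCH_PASS; /* assumption: let the stack decide */

	switch (rc) {
	case SKETCH_RET_SUCCESS:
		return SKETCH_FORWARD;
	case SKETCH_RET_BLACKHOLE:
	case SKETCH_RET_UNREACHABLE:
	case SKETCH_RET_PROHIBIT:
		return SKETCH_DROP; /* codes documented as "can be dropped" */
	default:
		return SKETCH_PASS; /* needs assist from the full stack */
	}
}
```

The three droppable codes are the ones an ACL built on FIB rules or FIB
entries would care about. Every other positive code falls through to the
stack, for example so that neighbor resolution or fragmentation can take
place there.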

 include/uapi/linux/bpf.h   | 28 ++++++++++++++----
 net/core/filter.c          | 72 ++++++++++++++++++++++++++++++----------------
 samples/bpf/xdp_fwd_kern.c |  8 +++---
 3 files changed, 74 insertions(+), 34 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 59b19b6a40d7..b7db3261c62d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1857,7 +1857,8 @@ union bpf_attr {
  *		is resolved), the nexthop address is returned in ipv4_dst
  *		or ipv6_dst based on family, smac is set to mac address of
  *		egress device, dmac is set to nexthop mac address, rt_metric
- *		is set to metric from route (IPv4/IPv6 only).
+ *		is set to metric from route (IPv4/IPv6 only), and ifindex
+ *		is set to the device index of the nexthop from the FIB lookup.
  *
  *             *plen* argument is the size of the passed in struct.
  *             *flags* argument can be a combination of one or more of the
@@ -1873,9 +1874,10 @@ union bpf_attr {
  *             *ctx* is either **struct xdp_md** for XDP programs or
  *             **struct sk_buff** tc cls_act programs.
  *     Return
- *             Egress device index on success, 0 if packet needs to continue
- *             up the stack for further processing or a negative error in case
- *             of failure.
+ *		* < 0 if any input argument is invalid
+ *		*   0 on success (packet is forwarded, nexthop neighbor exists)
+ *		* > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
+ *		*     packet is not forwarded or needs assist from full stack
  *
  * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
  *	Description
@@ -2612,6 +2614,18 @@ struct bpf_raw_tracepoint_args {
 #define BPF_FIB_LOOKUP_DIRECT  BIT(0)
 #define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
 
+enum {
+	BPF_FIB_LKUP_RET_SUCCESS,      /* lookup successful */
+	BPF_FIB_LKUP_RET_BLACKHOLE,    /* dest is blackholed; can be dropped */
+	BPF_FIB_LKUP_RET_UNREACHABLE,  /* dest is unreachable; can be dropped */
+	BPF_FIB_LKUP_RET_PROHIBIT,     /* dest not allowed; can be dropped */
+	BPF_FIB_LKUP_RET_NOT_FWDED,    /* packet is not forwarded */
+	BPF_FIB_LKUP_RET_FWD_DISABLED, /* fwding is not enabled on ingress */
+	BPF_FIB_LKUP_RET_UNSUPP_LWT,   /* fwd requires encapsulation */
+	BPF_FIB_LKUP_RET_NO_NEIGH,     /* no neighbor entry for nh */
+	BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
+};
+
 struct bpf_fib_lookup {
 	/* input:  network family for lookup (AF_INET, AF_INET6)
 	 * output: network family of egress nexthop
@@ -2625,7 +2639,11 @@ struct bpf_fib_lookup {
 
 	/* total length of packet from network header - used for MTU check */
 	__u16	tot_len;
-	__u32	ifindex;  /* L3 device index for lookup */
+
+	/* input: L3 device index for lookup
+	 * output: device index from FIB lookup
+	 */
+	__u32	ifindex;
 
 	union {
 		/* inputs to lookup */
diff --git a/net/core/filter.c b/net/core/filter.c
index e7f12e9f598c..f8dd8aa89de4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4073,8 +4073,9 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
 	memcpy(params->smac, dev->dev_addr, ETH_ALEN);
 	params->h_vlan_TCI = 0;
 	params->h_vlan_proto = 0;
+	params->ifindex = dev->ifindex;
 
-	return dev->ifindex;
+	return 0;
 }
 #endif
 
@@ -4098,7 +4099,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	/* verify forwarding is enabled on this interface */
 	in_dev = __in_dev_get_rcu(dev);
 	if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
-		return 0;
+		return BPF_FIB_LKUP_RET_FWD_DISABLED;
 
 	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
 		fl4.flowi4_iif = 1;
@@ -4123,7 +4124,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 		tb = fib_get_table(net, tbid);
 		if (unlikely(!tb))
-			return 0;
+			return BPF_FIB_LKUP_RET_NOT_FWDED;
 
 		err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
 	} else {
@@ -4135,8 +4136,20 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
 	}
 
-	if (err || res.type != RTN_UNICAST)
-		return 0;
+	if (err) {
+		/* map fib lookup errors to RTN_ type */
+		if (err == -EINVAL)
+			return BPF_FIB_LKUP_RET_BLACKHOLE;
+		if (err == -EHOSTUNREACH)
+			return BPF_FIB_LKUP_RET_UNREACHABLE;
+		if (err == -EACCES)
+			return BPF_FIB_LKUP_RET_PROHIBIT;
+
+		return BPF_FIB_LKUP_RET_NOT_FWDED;
+	}
+
+	if (res.type != RTN_UNICAST)
+		return BPF_FIB_LKUP_RET_NOT_FWDED;
 
 	if (res.fi->fib_nhs > 1)
 		fib_select_path(net, &res, &fl4, NULL);
@@ -4144,19 +4157,16 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (check_mtu) {
 		mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
 		if (params->tot_len > mtu)
-			return 0;
+			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
 	}
 
 	nh = &res.fi->fib_nh[res.nh_sel];
 
 	/* do not handle lwt encaps right now */
 	if (nh->nh_lwtstate)
-		return 0;
+		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
 
 	dev = nh->nh_dev;
-	if (unlikely(!dev))
-		return 0;
-
 	if (nh->nh_gw)
 		params->ipv4_dst = nh->nh_gw;
 
@@ -4166,10 +4176,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	 * rcu_read_lock_bh is not needed here
 	 */
 	neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
-	if (neigh)
-		return bpf_fib_set_fwd_params(params, neigh, dev);
+	if (!neigh)
+		return BPF_FIB_LKUP_RET_NO_NEIGH;
 
-	return 0;
+	return bpf_fib_set_fwd_params(params, neigh, dev);
 }
 #endif
 
@@ -4190,7 +4200,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 	/* link local addresses are never forwarded */
 	if (rt6_need_strict(dst) || rt6_need_strict(src))
-		return 0;
+		return BPF_FIB_LKUP_RET_NOT_FWDED;
 
 	dev = dev_get_by_index_rcu(net, params->ifindex);
 	if (unlikely(!dev))
@@ -4198,7 +4208,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 	idev = __in6_dev_get_safely(dev);
 	if (unlikely(!idev || !net->ipv6.devconf_all->forwarding))
-		return 0;
+		return BPF_FIB_LKUP_RET_FWD_DISABLED;
 
 	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
 		fl6.flowi6_iif = 1;
@@ -4225,7 +4235,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 		tb = ipv6_stub->fib6_get_table(net, tbid);
 		if (unlikely(!tb))
-			return 0;
+			return BPF_FIB_LKUP_RET_NOT_FWDED;
 
 		f6i = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, strict);
 	} else {
@@ -4238,11 +4248,23 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	}
 
 	if (unlikely(IS_ERR_OR_NULL(f6i) || f6i == net->ipv6.fib6_null_entry))
-		return 0;
+		return BPF_FIB_LKUP_RET_NOT_FWDED;
+
+	if (unlikely(f6i->fib6_flags & RTF_REJECT)) {
+		switch (f6i->fib6_type) {
+		case RTN_BLACKHOLE:
+			return BPF_FIB_LKUP_RET_BLACKHOLE;
+		case RTN_UNREACHABLE:
+			return BPF_FIB_LKUP_RET_UNREACHABLE;
+		case RTN_PROHIBIT:
+			return BPF_FIB_LKUP_RET_PROHIBIT;
+		default:
+			return BPF_FIB_LKUP_RET_NOT_FWDED;
+		}
+	}
 
-	if (unlikely(f6i->fib6_flags & RTF_REJECT ||
-	    f6i->fib6_type != RTN_UNICAST))
-		return 0;
+	if (f6i->fib6_type != RTN_UNICAST)
+		return BPF_FIB_LKUP_RET_NOT_FWDED;
 
 	if (f6i->fib6_nsiblings && fl6.flowi6_oif == 0)
 		f6i = ipv6_stub->fib6_multipath_select(net, f6i, &fl6,
@@ -4252,11 +4274,11 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (check_mtu) {
 		mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
 		if (params->tot_len > mtu)
-			return 0;
+			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
 	}
 
 	if (f6i->fib6_nh.nh_lwtstate)
-		return 0;
+		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
 
 	if (f6i->fib6_flags & RTF_GATEWAY)
 		*dst = f6i->fib6_nh.nh_gw;
@@ -4270,10 +4292,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	 */
 	neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
 				      ndisc_hashfn, dst, dev);
-	if (neigh)
-		return bpf_fib_set_fwd_params(params, neigh, dev);
+	if (!neigh)
+		return BPF_FIB_LKUP_RET_NO_NEIGH;
 
-	return 0;
+	return bpf_fib_set_fwd_params(params, neigh, dev);
 }
 #endif
 
diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index 6673cdb9f55c..a7e94e7ff87d 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -48,9 +48,9 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 	struct ethhdr *eth = data;
 	struct ipv6hdr *ip6h;
 	struct iphdr *iph;
-	int out_index;
 	u16 h_proto;
 	u64 nh_off;
+	int rc;
 
 	nh_off = sizeof(*eth);
 	if (data + nh_off > data_end)
@@ -101,7 +101,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 
 	fib_params.ifindex = ctx->ingress_ifindex;
 
-	out_index = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
+	rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
 
 	/* verify egress index has xdp support
 	 * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
@@ -109,7 +109,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 	 * NOTE: without verification that egress index supports XDP
 	 *       forwarding packets are dropped.
 	 */
-	if (out_index > 0) {
+	if (rc == 0) {
 		if (h_proto == htons(ETH_P_IP))
 			ip_decrease_ttl(iph);
 		else if (h_proto == htons(ETH_P_IPV6))
@@ -117,7 +117,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 
 		memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
 		memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
-		return bpf_redirect_map(&tx_port, out_index, 0);
+		return bpf_redirect_map(&tx_port, fib_params.ifindex, 0);
 	}
 
 	return XDP_PASS;
-- 
2.11.0
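
With this change a return of 0 from bpf_fib_lookup() means "forward, and
fib_params.ifindex holds the egress device", while any positive value
names the reason the packet was not forwarded. The sample above only
tests rc == 0. A caller that wants to act on the individual reasons
could do something along these lines. This is a sketch only: the
numeric enum values, the BPF_FIB_LKUP_RET_SUCCESS name and the
fwd_action type are assumptions made for illustration (take the real
values from the patched uapi header), and the drop/pass policy is one
possible choice, not what the sample program does.

```c
/* Return codes named in the patch. The numeric values are ASSUMED here
 * for illustration; use the patched include/uapi/linux/bpf.h instead.
 */
enum {
	BPF_FIB_LKUP_RET_SUCCESS,      /* assumed name for rc == 0 */
	BPF_FIB_LKUP_RET_BLACKHOLE,
	BPF_FIB_LKUP_RET_UNREACHABLE,
	BPF_FIB_LKUP_RET_PROHIBIT,
	BPF_FIB_LKUP_RET_NOT_FWDED,
	BPF_FIB_LKUP_RET_FWD_DISABLED,
	BPF_FIB_LKUP_RET_UNSUPP_LWT,
	BPF_FIB_LKUP_RET_NO_NEIGH,
	BPF_FIB_LKUP_RET_FRAG_NEEDED,
};

/* Hypothetical verdicts, standing in for XDP_REDIRECT/XDP_DROP/XDP_PASS. */
enum fwd_action { FWD_REDIRECT, FWD_DROP, FWD_PASS };

/* Map a lookup result to a verdict: redirect on success, drop when the
 * route itself rejects the packet, and hand everything else to the
 * regular stack so it can resolve neighbours, fragment, and so on.
 */
static inline enum fwd_action fib_ret_to_action(int rc)
{
	switch (rc) {
	case BPF_FIB_LKUP_RET_SUCCESS:
		return FWD_REDIRECT;
	case BPF_FIB_LKUP_RET_BLACKHOLE:
	case BPF_FIB_LKUP_RET_UNREACHABLE:
	case BPF_FIB_LKUP_RET_PROHIBIT:
		return FWD_DROP;
	default:
		return FWD_PASS;
	}
}
```

Falling back to the stack for the default case keeps the behaviour of
the sample (XDP_PASS) for every reason that is not an explicit reject
route.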

^ permalink raw reply related

* [net:master 6/6] drivers/net/ethernet/mscc/ocelot.c:377:17: sparse: incorrect type in argument 2 (different base types)
From: kbuild test robot @ 2018-06-21  3:43 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: kbuild-all, netdev, Alexandre Belloni

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
head:   08d02364b12faa54d76dbfea2090321fd27996f2
commit: 08d02364b12faa54d76dbfea2090321fd27996f2 [6/6] net: mscc: fix the injection header
reproduce:
        # apt-get install sparse
        git checkout 08d02364b12faa54d76dbfea2090321fd27996f2
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/mscc/ocelot.c:377:17: sparse: incorrect type in argument 2 (different base types) @@    expected unsigned int [unsigned] [usertype] val @@    got restricted __be32 [usertype] <noident> @@
   drivers/net/ethernet/mscc/ocelot.c:377:17:    expected unsigned int [unsigned] [usertype] val
   drivers/net/ethernet/mscc/ocelot.c:377:17:    got restricted __be32 [usertype] <noident>
   include/linux/device.h:678:13: sparse: undefined identifier '__builtin_mul_overflow'
   include/linux/device.h:678:13: sparse: call with no type!

vim +377 drivers/net/ethernet/mscc/ocelot.c

   353	
   354	static int ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
   355	{
   356		struct ocelot_port *port = netdev_priv(dev);
   357		struct ocelot *ocelot = port->ocelot;
   358		u32 val, ifh[IFH_LEN];
   359		struct frame_info info = {};
   360		u8 grp = 0; /* Send everything on CPU group 0 */
   361		unsigned int i, count, last;
   362	
   363		val = ocelot_read(ocelot, QS_INJ_STATUS);
   364		if (!(val & QS_INJ_STATUS_FIFO_RDY(BIT(grp))) ||
   365		    (val & QS_INJ_STATUS_WMARK_REACHED(BIT(grp))))
   366			return NETDEV_TX_BUSY;
   367	
   368		ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
   369				 QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
   370	
   371		info.port = BIT(port->chip_port);
   372		info.tag_type = IFH_TAG_TYPE_C;
   373		info.vid = skb_vlan_tag_get(skb);
   374		ocelot_gen_ifh(ifh, &info);
   375	
   376		for (i = 0; i < IFH_LEN; i++)
 > 377			ocelot_write_rix(ocelot, cpu_to_be32(ifh[i]), QS_INJ_WR, grp);
   378	
   379		count = (skb->len + 3) / 4;
   380		last = skb->len % 4;
   381		for (i = 0; i < count; i++) {
   382			ocelot_write_rix(ocelot, ((u32 *)skb->data)[i], QS_INJ_WR, grp);
   383		}
   384	
   385		/* Add padding */
   386		while (i < (OCELOT_BUFFER_CELL_SZ / 4)) {
   387			ocelot_write_rix(ocelot, 0, QS_INJ_WR, grp);
   388			i++;
   389		}
   390	
   391		/* Indicate EOF and valid bytes in last word */
   392		ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
   393				 QS_INJ_CTRL_VLD_BYTES(skb->len < OCELOT_BUFFER_CELL_SZ ? 0 : last) |
   394				 QS_INJ_CTRL_EOF,
   395				 QS_INJ_CTRL, grp);
   396	
   397		/* Add dummy CRC */
   398		ocelot_write_rix(ocelot, 0, QS_INJ_WR, grp);
   399		skb_tx_timestamp(skb);
   400	
   401		dev->stats.tx_packets++;
   402		dev->stats.tx_bytes += skb->len;
   403		dev_kfree_skb_any(skb);
   404	
   405		return NETDEV_TX_OK;
   406	}
   407	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH net] bpf: enforce correct alignment for instructions
From: David Miller @ 2018-06-21  3:46 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, daniel, kafai, ast
In-Reply-To: <20180621002409.63136-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 20 Jun 2018 17:24:09 -0700

> After commit 9facc336876f ("bpf: reject any prog that failed read-only lock")
> offsetof(struct bpf_binary_header, image) became 3 instead of 4,
> breaking powerpc BPF badly, since instructions need to be word aligned.
> 
> Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

I'll apply this directly, thanks Eric.
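
For readers following along, the layout problem can be reproduced in
userspace. The sketch below is an assumption-laden model, not a copy of
the kernel header: the field names and types are guessed from the commit
message, and the aligned(4) attribute (GCC/Clang syntax) is shown as one
way to restore word alignment of the trailing array.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a header where a 16-bit field and a 1-bit flag precede the
 * instruction image. The compiler may place image[] right after the
 * byte holding the flag, i.e. at an odd offset.
 */
struct hdr_unaligned {
	uint16_t pages;
	uint16_t locked:1;
	uint8_t image[];
};

/* Same fields, but the image is explicitly forced to 4-byte alignment. */
struct hdr_aligned {
	uint16_t pages;
	uint16_t locked:1;
	uint8_t image[] __attribute__((aligned(4)));
};

static inline size_t image_offset_unaligned(void)
{
	return offsetof(struct hdr_unaligned, image);
}

static inline size_t image_offset_aligned(void)
{
	return offsetof(struct hdr_aligned, image);
}
```

The unaligned offset depends on the ABI's bitfield layout, so it is not
guaranteed to be any particular value, but the aligned variant is always
a multiple of 4.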

^ permalink raw reply

* Re: [PATCH net] sctp: fix erroneous inc of snmp SctpFragUsrMsgs
From: David Miller @ 2018-06-21  3:49 UTC (permalink / raw)
  To: marcelo.leitner; +Cc: netdev, linux-sctp, nhorman, vyasevich
In-Reply-To: <d89c1e422158d21710ce938aa093a20960bd55e9.1529509634.git.marcelo.leitner@gmail.com>

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Wed, 20 Jun 2018 12:47:52 -0300

> Currently it is incrementing SctpFragUsrMsgs when the user message size
> is exactly the same as the maximum fragment size, which is wrong.
> 
> The fix is to increment it only when user message is bigger than the
> maximum fragment size.
> 
> Fixes: bfd2e4b8734d ("sctp: refactor sctp_datamsg_from_user")
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Applied.
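
The boundary condition described in the commit message comes down to a
strict comparison. A minimal sketch, where needs_fragmentation() is a
hypothetical helper and not a function from the SCTP code:

```c
#include <stdbool.h>
#include <stddef.h>

/* A user message counts as fragmented only if it does not fit in one
 * fragment. A message of exactly max_frag bytes fits, so the comparison
 * must be strict (>) rather than (>=).
 */
static inline bool needs_fragmentation(size_t msg_len, size_t max_frag)
{
	return msg_len > max_frag;
}
```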

^ permalink raw reply

* Re: [PATCH net] bpf: enforce correct alignment for instructions
From: Eric Dumazet @ 2018-06-21  4:08 UTC (permalink / raw)
  To: David Miller, edumazet; +Cc: netdev, eric.dumazet, daniel, kafai, ast
In-Reply-To: <20180621.124639.2276956745930721354.davem@davemloft.net>



On 06/20/2018 08:46 PM, David Miller wrote:
> From: Eric Dumazet <edumazet@google.com>
> Date: Wed, 20 Jun 2018 17:24:09 -0700
> 
>> After commit 9facc336876f ("bpf: reject any prog that failed read-only lock")
>> offsetof(struct bpf_binary_header, image) became 3 instead of 4,
>> breaking powerpc BPF badly, since instructions need to be word aligned.
>>
>> Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock")
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> I'll apply this directly, thanks Eric.
> 

Thanks David :)

^ permalink raw reply

* [PATCH net] ipvlan: fix IFLA_MTU ignored on NEWLINK
From: Xin Long @ 2018-06-21  4:56 UTC (permalink / raw)
  To: network dev; +Cc: davem, Jarod Wilson, Mahesh Bandewar

Commit 296d48568042 ("ipvlan: inherit MTU from master device") adjusted
the mtu from the master device when creating an ipvlan device, but it
also overrides the mtu value set in rtnl_create_link. As a result, the
IFLA_MTU param does not take effect.

So this patch skips the mtu adjustment if the IFLA_MTU param is set when
creating an ipvlan device.

Fixes: 296d48568042 ("ipvlan: inherit MTU from master device")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 drivers/net/ipvlan/ipvlan_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index d02f0a7..23c1d660 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -594,7 +594,8 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
 	ipvlan->phy_dev = phy_dev;
 	ipvlan->dev = dev;
 	ipvlan->sfeatures = IPVLAN_FEATURES;
-	ipvlan_adjust_mtu(ipvlan, phy_dev);
+	if (!tb[IFLA_MTU])
+		ipvlan_adjust_mtu(ipvlan, phy_dev);
 	INIT_LIST_HEAD(&ipvlan->addrs);
 	spin_lock_init(&ipvlan->addrs_lock);
 
-- 
2.1.0

^ permalink raw reply related

* [PATCH net] vhost_net: validate sock before trying to put its fd
From: Jason Wang @ 2018-06-21  5:11 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel; +Cc: dan.carpenter

Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when
we hit an error during ubuf allocation, the code does not check for
NULL before calling sockfd_put(), which leads to a NULL pointer
dereference. Fix this by checking the sock pointer first.

Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 986058a..b97a994 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1208,7 +1208,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 	if (ubufs)
 		vhost_net_ubuf_put_wait_and_free(ubufs);
 err_ubufs:
-	sockfd_put(sock);
+	if (sock)
+		sockfd_put(sock);
 err_vq:
 	mutex_unlock(&vq->mutex);
 err:
-- 
2.7.4

^ permalink raw reply related

* Re: Route fallback issue
From: Grant Taylor @ 2018-06-21  5:13 UTC (permalink / raw)
  To: Julian Anastasov, Akshat Kakkar
  Cc: netdev, cronolog+lartc, lartc, Erik Auerswald
In-Reply-To: <alpine.LFD.2.20.1806202139330.1969@ja.home.ssi.bg>

On 06/20/2018 01:00 PM, Julian Anastasov wrote:
> You can also try alternative routes.

"Alternative routes"?  I can't say as I've heard that description as a 
specific technique / feature / capability before.

Is that its official name?

Where can I find out more about it?

> But as the kernel supports only default alternative routes, you can put 
> them in their own table:

I don't know that that is the case any more.

I was able to issue the following commands without a problem:

# ip route append 192.0.2.128/26 via 192.0.2.62
# ip route append 192.0.2.128/26 via 192.0.2.126

I created two network namespaces and had a pair of vEths between them 
(192.0.2.0/26 and 192.0.2.64/26).  I added a dummy network to each NetNS 
(192.0.2.128/26 and 192.0.2.192/26).

I ran the following commands while a persistent ping was running from 
one NetNS to the IP on the other's dummy0 interface:

# ip link set ns2b up && ip route append 192.0.2.192/26 via 192.0.2.126 && ip link set ns2a down
(pause and watch things)
# ip link set ns2a up && ip route append 192.0.2.192/26 via 192.0.2.62 && ip link set ns2b down
(pause and watch things)

I could iterate between the two above commands and pings continued to work.

So, I think that it's now possible to use "alternate routes" (new to me) 
on specific prefixes in addition to the default.  Thus there is no 
longer any need for a separate table and the associated IP rule.

I'm running kernel version 4.9.76.

I did go ahead and set net.ipv4.conf.ns2b.ignore_routes_with_linkdown to 1.

# for i in /proc/sys/net/ipv4/conf/*/ignore_routes_with_linkdown; do echo 1 > $i; done

Doing that cut the number of lost pings per flip from 60 ~ 90 to 0 ~ 5, 
at one ping per second.  (Rarely, maybe 1 flip out of 20, it would take 
upwards of 10 pings, i.e. about 10 seconds.)

> # Alternative routes use same metric!!!
> ip route append default via 192.168.1.254 dev eno1 table 100
> ip route append default via 192.168.2.254 dev eno2 table 100
> ip rule add prio 100 to 172.16.0.0/12 table 100

I did have to "append" the route.  I couldn't just "add" the route. 
When I tried to "add" the second route, I got an error about the route 
already existing.  Using "append" instead of "add" with everything else 
the same worked just fine.

Note:  I did go ahead and remove the single route that was added via 
"add" and used "append" for both.

> Of course, you will get better results if an user space tool puts only 
> alive routes in service after doing health checks of all near gateways.

I've got to say, with as well as this is working, I don't feel any need 
for a user space monitoring daemon.  I agree that I've felt the need for 
such in the past before I learned about "alternative routes".

I still want to learn more about "alternative routes".

Here's a diagram of the test network if someone wants to try to 
reproduce my findings:

+-------------+                +-------------+
| NS1         |                |         NS2 |
|        ns2a +-----vEth-A-----+ ns1a        |
|             |                |             |
+ dummy0      |                |      dummy0 +
|             |                |             |
|        ns2b +-----vEth-B-----+ ns1b        |
|             |                |             |
+-------------+                +-------------+

(vEths get the name of the NS that they face.)

NS1:ns2a     192.0.2.1     /26
NS1:ns2b     192.0.2.65    /26
NS1:dummy0   192.0.2.129   /26
NS2:ns1a     192.0.2.62    /26
NS2:ns1b     192.0.2.126   /26
NS2:dummy0   192.0.2.254   /26

-- 
Grant. . . .
unix || die

^ permalink raw reply

* Re: [PATCH bpf 1/2] tools: bpftool: remove duplicated error message on prog load
From: Song Liu @ 2018-06-21  5:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, Networking, oss-drivers
In-Reply-To: <20180620184246.18672-2-jakub.kicinski@netronome.com>

On Wed, Jun 20, 2018 at 11:42 AM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> do_pin_fd() will already print out an error message if something
> goes wrong.  Printing another error is unnecessary and will break
> JSON output, since error messages are full objects:
>
> $ bpftool -jp prog load tracex1_kern.o /sys/fs/bpf/a
> {
>     "error": "can't pin the object (/sys/fs/bpf/a): File exists"
> },{
>     "error": "failed to pin program"
> }
>
> Fixes: 49a086c201a9 ("bpftool: implement prog load command")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  tools/bpf/bpftool/prog.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 05f42a46d6ed..12b694fe0404 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -694,10 +694,8 @@ static int do_load(int argc, char **argv)
>                 return -1;
>         }
>
> -       if (do_pin_fd(prog_fd, argv[1])) {
> -               p_err("failed to pin program");
> +       if (do_pin_fd(prog_fd, argv[1]))
>                 return -1;
> -       }
>
>         if (json_output)
>                 jsonw_null(json_wtr);
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH bpf 2/2] tools: bpftool: remember to close the libbpf object after prog load
From: Song Liu @ 2018-06-21  5:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, Networking, oss-drivers
In-Reply-To: <20180620184246.18672-3-jakub.kicinski@netronome.com>

On Wed, Jun 20, 2018 at 11:42 AM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> Remembering to close all descriptors and free memory may not seem
> important in a user space tool like bpftool, but if we were to run
> in batch mode the consumed resources start to add up quickly.  Make
> sure program load closes the libbpf object (which unloads and frees
> it).
>
> Fixes: 49a086c201a9 ("bpftool: implement prog load command")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  tools/bpf/bpftool/prog.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 12b694fe0404..959aa53ab678 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -695,12 +695,18 @@ static int do_load(int argc, char **argv)
>         }
>
>         if (do_pin_fd(prog_fd, argv[1]))
> -               return -1;
> +               goto err_close_obj;
>
>         if (json_output)
>                 jsonw_null(json_wtr);
>
> +       bpf_object__close(obj);
> +
>         return 0;
> +
> +err_close_obj:
> +       bpf_object__close(obj);
> +       return -1;
>  }
>
>  static int do_help(int argc, char **argv)
> --
> 2.17.1
>

^ permalink raw reply
