Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] net: phy: let genphy_c45_read_link manage the devices to check
From: Andrew Lunn @ 2019-02-07 19:57 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, Russell King,
	netdev@vger.kernel.org
In-Reply-To: <e784e845-5e25-9916-343a-0d60fbf7fc9b@gmail.com>

On Thu, Feb 07, 2019 at 08:05:55PM +0100, Heiner Kallweit wrote:
> Let genphy_c45_read_link manage the devices to check, this removes
> overhead from callers. Devices VEND1 and VEND2 will never be checked,
> for now adopt the logic of the Marvell driver to also exclude PHY XS.
> 
> At the moment we have very few clause 45 PHY drivers, so we are
> lacking experience whether other drivers will have to exclude further
> devices, or may need to check PHY XS. If we should figure out that
> list of devices to check needs to be configurable, I think best will
> be to add a device list member to struct phy_driver.

Hi Heiner

For the Aquantia PHY you probably need to exclude MDIO_MMD_C22EXT.

There is no register 1D:1 listed in the datasheet.

PHY XS is the interface towards the MAC. You cannot configure the MAC
to use the correct interface mode until the PHY has link to its peer
and the link mode has been determined. So you need the PHY to signal
link up independent of the MAC-PHY link, to kick off the configuration
of the MAC. When using phylink, the MAC can indicate it has a link to
the PHY using phylink_mac_change().

      Andrew

^ permalink raw reply

* Re: [PATCH v2 bpf-next 2/2] btf: expose API to work with raw btf data
From: Andrii Nakryiko @ 2019-02-07 20:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Song Liu, Yonghong Song, Alexei Starovoitov,
	Martin Lau, netdev, Daniel Borkmann
In-Reply-To: <CAEf4Bzbj8=YBYDs52TA8r4HAKS64rZMg9X0WGTdcJBLauL+F3A@mail.gmail.com>

On Thu, Feb 7, 2019 at 11:21 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Feb 5, 2019 at 10:25 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Feb 05, 2019 at 09:46:14PM -0800, Andrii Nakryiko wrote:
> > > On Tue, Feb 5, 2019 at 7:07 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 05, 2019 at 04:29:49PM -0800, Andrii Nakryiko wrote:
> > > > > This patch exposes two new APIs btf__get_raw_data_size() and
> > > > > btf__get_raw_data() that allows to get a copy of raw BTF data out of
> > > > > struct btf. This is useful for external programs that need to manipulate
> > > > > raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info
> > > > > and then writing it back to file.
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > > Acked-by: Song Liu <songliubraving@fb.com>
> > > > > ---
> > > > >  tools/lib/bpf/btf.c      | 10 ++++++++++
> > > > >  tools/lib/bpf/btf.h      |  2 ++
> > > > >  tools/lib/bpf/libbpf.map |  2 ++
> > > > >  3 files changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > > > index 1c2ba7182400..34bfb3641aac 100644
> > > > > --- a/tools/lib/bpf/btf.c
> > > > > +++ b/tools/lib/bpf/btf.c
> > > > > @@ -437,6 +437,16 @@ int btf__fd(const struct btf *btf)
> > > > >       return btf->fd;
> > > > >  }
> > > > >
> > > > > +__u32 btf__get_raw_data_size(const struct btf *btf)
> > > > > +{
> > > > > +     return btf->data_size;
> > > > > +}
> > > > > +
> > > > > +void btf__get_raw_data(const struct btf *btf, char *data)
> > > > > +{
> > > > > +     memcpy(data, btf->data, btf->data_size);
> > > > > +}
> > > >
> > > > I cannot think of any other way to use this api,
> > > > but to call btf__get_raw_data_size() first,
> > > > then malloc that much memory and then call btf__get_raw_data()
> > > > to store btf into it.
> > > >
> > > > If so, may be api should be single call that allocates, copies,
> > > > and returns pointer to new mem and its size?
> > > > Probably less error prone?
> > > >
> > >
> > > I don't have strong preference, but providing pointer to allocated memory
> > > seems more flexible and allows more clever/optimal use of memory from caller
> > > side. E.g., instead of doing two mallocs, you can imagine doing something
> > > like:
> > >
> > > int max_size = max(btf__get_raw_data_size(btf),
> > >                    btf_ext__get_raw_data_size(btf_ext));
> > > char *m = malloc(max_size);
> > > btf__get_raw_data(btf, m);
> > > dump_btf_section_to_file(m, some_file);
> > > btf_ext__get_raw_data(btf_ext, m);
> > > dump_btf_ext_section_to_file(m, some_file);
> > > free(m);
> > >
> > > Also, pointer to memory could be mmap()'ed file, for instance. In general,
> > > for a library it might be a good thing to not be prescriptive as to how one
> > > gets that piece of memory.
> >
> > Plausible, but I'd like to see pahole patches to be convinced ;)
> >
>
> Here's a summary of proposed ways to expose raw data through new api,
> with pros/cons.
>
> 1. Originally proposed two functions. `int btf__get_raw_data_size()`
> to get size, `void btf__get_raw_data(void* buf)` to write raw data to
> a provided buf.
>
> Pros:
>   - allows maximal flexibility for users of this API. They can manage
> memory as it's convenient for them (e.g., reusing same buffer for
> multiple btf and btf_ext raw data).
>   - allows using mmap()'ed memory, as allocation and memory ownership
> is delegated to user
>
> Cons:
>   - has potential of buffer overflows, if user doesn't provide big enough buffer
>
>
> 2. Alexei's proposal to combine getting size in single function that
> internally allocates new memory buffer, copies data and returns it to
> users to use and later free.
>
> Pros:
>   - one less API function
>   - more straightforward usage, it's hard to misuse it (except for
> memory leaking, if memory is not freed)
>
> Cons:
>   - always allocated for each call
>   - least flexible approach, doesn't allow caller to manage memory,
> prevents any kind of direct write to mmap()'ed file
>
> 3. Daniel proposed realloc-like approach, where caller optionally
> provides memory buffer, but we always call realloc() internally to
> ensure we have long enough buffer.
>
> Pros:
>   - allows callers to provide their memory buffer (similar to approach
> #1, but see cons below)
>   - prevents user error with providing too small buffer (similar to approach #2)
>
> Cons:
>   - realloc expects that memory was allocated by previous malloc()
> call, so caller can't allocate bigger chunk of memory and provide
> pointer inside that area (behavior is undefined in that case). This
> requirement is not immediately obvious, so this approach feels more
> error prone than either of approach #1 and #2
>   - still doesn't allow mmap()'ed usage, again due to realloc()'s requirements
>

There is actually approach #4 - just return const void* to an internal
memory buffer. This is trivial for struct btf, will require just
slight changes for struct btf_ext, but it puts all the control in
user's hands without imposing any unnecessary allocations. This
approach seems to provide best of all approaches with no downsides.

>
> Approach #3 looks most subtly-error-prone, as it's too easy to just
> provide pointer that's not at the beginning of malloc()'ed memory, but
> this might not be detected immediately, and could potentially lead to
> silent memory corruption.
>
> I'd still go with approach #1 as it provides least restrictive API,
> even though approach #2 will provide marginally better usability for
> common cases.
>
> Alexei, Daniel, which approach you'd prefer in the end after
> considering all pros and cons?

^ permalink raw reply

* [PATCH v2 net-next] net: fixed-phy: Add fixed_phy_register_with_gpiod() API
From: Moritz Fischer @ 2019-02-07 20:14 UTC (permalink / raw)
  To: netdev; +Cc: andrew, f.fainelli, hkallweit1, davem, linux-kernel,
	Moritz Fischer

Add fixed_phy_register_with_gpiod() API. It lets users create a
fixed_phy instance that uses a GPIO descriptor which was obtained
externally e.g. through platform data.
This enables platform devices (non-DT based) to use GPIOs for link
status.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
---

Changes from v1:
- Added Florian's Reviewed-by: tag

Changes from RFC:
- Implemented Andrew's/Florians suggestion to add new API
  instead of changing old API

---
 drivers/net/phy/fixed_phy.c | 32 +++++++++++++++++++++++++-------
 include/linux/phy_fixed.h   | 15 +++++++++++++++
 2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index d810f914aaa4..b0d1368c3400 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -229,12 +229,12 @@ static struct gpio_desc *fixed_phy_get_gpiod(struct device_node *np)
 }
 #endif
 
-struct phy_device *fixed_phy_register(unsigned int irq,
-				      struct fixed_phy_status *status,
-				      struct device_node *np)
+static struct phy_device *__fixed_phy_register(unsigned int irq,
+					       struct fixed_phy_status *status,
+					       struct device_node *np,
+					       struct gpio_desc *gpiod)
 {
 	struct fixed_mdio_bus *fmb = &platform_fmb;
-	struct gpio_desc *gpiod = NULL;
 	struct phy_device *phy;
 	int phy_addr;
 	int ret;
@@ -243,9 +243,11 @@ struct phy_device *fixed_phy_register(unsigned int irq,
 		return ERR_PTR(-EPROBE_DEFER);
 
 	/* Check if we have a GPIO associated with this fixed phy */
-	gpiod = fixed_phy_get_gpiod(np);
-	if (IS_ERR(gpiod))
-		return ERR_CAST(gpiod);
+	if (!gpiod) {
+		gpiod = fixed_phy_get_gpiod(np);
+		if (IS_ERR(gpiod))
+			return ERR_CAST(gpiod);
+	}
 
 	/* Get the next available PHY address, up to PHY_MAX_ADDR */
 	phy_addr = ida_simple_get(&phy_fixed_ida, 0, PHY_MAX_ADDR, GFP_KERNEL);
@@ -308,8 +310,24 @@ struct phy_device *fixed_phy_register(unsigned int irq,
 
 	return phy;
 }
+
+struct phy_device *fixed_phy_register(unsigned int irq,
+				      struct fixed_phy_status *status,
+				      struct device_node *np)
+{
+	return __fixed_phy_register(irq, status, np, NULL);
+}
 EXPORT_SYMBOL_GPL(fixed_phy_register);
 
+struct phy_device *
+fixed_phy_register_with_gpiod(unsigned int irq,
+			      struct fixed_phy_status *status,
+			      struct gpio_desc *gpiod)
+{
+	return __fixed_phy_register(irq, status, NULL, gpiod);
+}
+EXPORT_SYMBOL_GPL(fixed_phy_register_with_gpiod);
+
 void fixed_phy_unregister(struct phy_device *phy)
 {
 	phy_device_remove(phy);
diff --git a/include/linux/phy_fixed.h b/include/linux/phy_fixed.h
index c78fc203db43..1e5d86ebdaeb 100644
--- a/include/linux/phy_fixed.h
+++ b/include/linux/phy_fixed.h
@@ -19,6 +19,12 @@ extern int fixed_phy_add(unsigned int irq, int phy_id,
 extern struct phy_device *fixed_phy_register(unsigned int irq,
 					     struct fixed_phy_status *status,
 					     struct device_node *np);
+
+extern struct phy_device *
+fixed_phy_register_with_gpiod(unsigned int irq,
+			      struct fixed_phy_status *status,
+			      struct gpio_desc *gpiod);
+
 extern void fixed_phy_unregister(struct phy_device *phydev);
 extern int fixed_phy_set_link_update(struct phy_device *phydev,
 			int (*link_update)(struct net_device *,
@@ -35,6 +41,15 @@ static inline struct phy_device *fixed_phy_register(unsigned int irq,
 {
 	return ERR_PTR(-ENODEV);
 }
+
+static inline struct phy_device *
+fixed_phy_register_with_gpiod(unsigned int irq,
+			      struct fixed_phy_status *status,
+			      struct gpio_desc *gpiod)
+{
+	return ERR_PTR(-ENODEV);
+}
+
 static inline void fixed_phy_unregister(struct phy_device *phydev)
 {
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next] net: phy: let genphy_c45_read_link manage the devices to check
From: Heiner Kallweit @ 2019-02-07 20:21 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, Russell King,
	netdev@vger.kernel.org
In-Reply-To: <20190207195720.GP32483@lunn.ch>

On 07.02.2019 20:57, Andrew Lunn wrote:
> On Thu, Feb 07, 2019 at 08:05:55PM +0100, Heiner Kallweit wrote:
>> Let genphy_c45_read_link manage the devices to check, this removes
>> overhead from callers. Devices VEND1 and VEND2 will never be checked,
>> for now adopt the logic of the Marvell driver to also exclude PHY XS.
>>
>> At the moment we have very few clause 45 PHY drivers, so we are
>> lacking experience whether other drivers will have to exclude further
>> devices, or may need to check PHY XS. If we should figure out that
>> list of devices to check needs to be configurable, I think best will
>> be to add a device list member to struct phy_driver.
> 
> Hi Heiner
> 
> For the Aquantia PHY you probably need to exclude MDIO_MMD_C22EXT.
> 
> There is no register 1D:1 listed in the datasheet.
> 
Thanks, Andrew. Right, the Aquantia PHY doesn't seem to have the C22EXT
MMD. Because the Aquantia PHY has no device 29 in its package the code
should work.
I also checked the 802.3 spec and it says that registers 29.0 to
29.4 are reserved. At another location the 802.3 spec states that read
access to non-existing registers should return a zero value.
Therefore current code (when reading a 0 from 29.1) may come to the
false conclusion that device 29 reports link down.
So it seems we have to exclude device C22EXT in general. I'll add that
and submit a v2.

> PHY XS is the interface towards the MAC. You cannot configure the MAC
> to use the correct interface mode until the PHY has link to its peer
> and the link mode has been determined. So you need the PHY to signal
> link up independent of the MAC-PHY link, to kick off the configuration
> of the MAC. When using phylink, the MAC can indicate it has a link to
> the PHY using phylink_mac_change().
> 
OK, so excluding PHY XS is right.

>       Andrew
>  
> 
Heiner

^ permalink raw reply

* [PATCH net] vxlan: test dev->flags & IFF_UP before calling netif_rx()
From: Eric Dumazet @ 2019-02-07 20:27 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Petr Machata, Ido Schimmel,
	Roopa Prabhu, Stefano Brivio

netif_rx() must be called under a strict contract.

At device dismantle phase, core networking clears IFF_UP
and flush_all_backlogs() is called after rcu grace period
to make sure no incoming packet might be in a cpu backlog
and still referencing the device.

Most drivers call netif_rx() from their interrupt handler,
and since the interrupts are disabled at device dismantle,
netif_rx() does not have to check dev->flags & IFF_UP

Virtual drivers do not have this guarantee, and must
therefore make the check themselves.

Otherwise we risk use-after-free and/or crashes.

Note this patch also fixes a small issue that came
with commit ce6502a8f957 ("vxlan: fix a use after free
in vxlan_encap_bypass"), since the dev->stats.rx_dropped
change was done on the wrong device.

Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
Fixes: ce6502a8f957 ("vxlan: fix a use after free in vxlan_encap_bypass")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Machata <petrm@mellanox.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Stefano Brivio <sbrivio@redhat.com>
---
 drivers/net/vxlan.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 5209ee9aac47846367d7f469a7e69d08c030087e..2aae11feff0c22c1b073a6a119fc6c311cbe1691 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2219,7 +2219,7 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
 	struct pcpu_sw_netstats *tx_stats, *rx_stats;
 	union vxlan_addr loopback;
 	union vxlan_addr *remote_ip = &dst_vxlan->default_dst.remote_ip;
-	struct net_device *dev = skb->dev;
+	struct net_device *dev;
 	int len = skb->len;
 
 	tx_stats = this_cpu_ptr(src_vxlan->dev->tstats);
@@ -2239,9 +2239,15 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
 #endif
 	}
 
+	rcu_read_lock();
+	dev = skb->dev;
+	if (unlikely(!(dev->flags & IFF_UP))) {
+		kfree_skb(skb);
+		goto drop;
+	}
+
 	if (dst_vxlan->cfg.flags & VXLAN_F_LEARN)
-		vxlan_snoop(skb->dev, &loopback, eth_hdr(skb)->h_source, 0,
-			    vni);
+		vxlan_snoop(dev, &loopback, eth_hdr(skb)->h_source, 0, vni);
 
 	u64_stats_update_begin(&tx_stats->syncp);
 	tx_stats->tx_packets++;
@@ -2254,8 +2260,10 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
 		rx_stats->rx_bytes += len;
 		u64_stats_update_end(&rx_stats->syncp);
 	} else {
+drop:
 		dev->stats.rx_dropped++;
 	}
+	rcu_read_unlock();
 }
 
 static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
-- 
2.20.1.791.gb4d0f1c61a-goog


^ permalink raw reply related

* Re: [PATCH net] vxlan: test dev->flags & IFF_UP before calling netif_rx()
From: Eric Dumazet @ 2019-02-07 20:34 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Petr Machata, Ido Schimmel, Roopa Prabhu,
	Stefano Brivio
In-Reply-To: <20190207202738.155940-1-edumazet@google.com>

On Thu, Feb 7, 2019 at 12:27 PM Eric Dumazet <edumazet@google.com> wrote:
>
> netif_rx() must be called under a strict contract.
>
> At device dismantle phase, core networking clears IFF_UP
> and flush_all_backlogs() is called after rcu grace period
> to make sure no incoming packet might be in a cpu backlog
> and still referencing the device.
>
> Most drivers call netif_rx() from their interrupt handler,
> and since the interrupts are disabled at device dismantle,
> netif_rx() does not have to check dev->flags & IFF_UP
>
> Virtual drivers do not have this guarantee, and must
> therefore make the check themselves.

Since other netif_rx() callers seem to have the same issue, I wonder
if we simply could add a generic check.

Most high performance drivers no longer call netif_rx() so an extra check
in it wont be a problem.

^ permalink raw reply

* [PATCH net] af_key: unconditionally clone on broadcast
From: Sean Tranchetti @ 2019-02-07 20:33 UTC (permalink / raw)
  To: eric.dumazet, davem, netdev; +Cc: Sean Tranchetti

Attempting to avoid cloning the skb when broadcasting by inflating
the refcount with sock_hold/sock_put while under RCU lock is dangerous
and violates RCU principles. It leads to subtle race conditions when
attempting to free the SKB, as we may reference sockets that have
already been freed by the stack.

Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c4b
[006b6b6b6b6b6c4b] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] PREEMPT SMP
task: fffffff78f65b380 task.stack: ffffff8049a88000
pc : sock_rfree+0x38/0x6c
lr : skb_release_head_state+0x6c/0xcc
Process repro (pid: 7117, stack limit = 0xffffff8049a88000)
Call trace:
	sock_rfree+0x38/0x6c
	skb_release_head_state+0x6c/0xcc
	skb_release_all+0x1c/0x38
	__kfree_skb+0x1c/0x30
	kfree_skb+0xd0/0xf4
	pfkey_broadcast+0x14c/0x18c
	pfkey_sendmsg+0x1d8/0x408
	sock_sendmsg+0x44/0x60
	___sys_sendmsg+0x1d0/0x2a8
	__sys_sendmsg+0x64/0xb4
	SyS_sendmsg+0x34/0x4c
	el0_svc_naked+0x34/0x38
Kernel panic - not syncing: Fatal exception

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
---
Realized I never actually sent this patch out after testing the changes
Eric recommended. Whoops. Better late then never, I suppose...

 net/key/af_key.c | 40 +++++++++++++++-------------------------
 1 file changed, 15 insertions(+), 25 deletions(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 655c787..e800891 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -196,30 +196,22 @@ static int pfkey_release(struct socket *sock)
 	return 0;
 }
 
-static int pfkey_broadcast_one(struct sk_buff *skb, struct sk_buff **skb2,
-			       gfp_t allocation, struct sock *sk)
+static int pfkey_broadcast_one(struct sk_buff *skb, gfp_t allocation,
+			       struct sock *sk)
 {
 	int err = -ENOBUFS;
 
-	sock_hold(sk);
-	if (*skb2 == NULL) {
-		if (refcount_read(&skb->users) != 1) {
-			*skb2 = skb_clone(skb, allocation);
-		} else {
-			*skb2 = skb;
-			refcount_inc(&skb->users);
-		}
-	}
-	if (*skb2 != NULL) {
-		if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf) {
-			skb_set_owner_r(*skb2, sk);
-			skb_queue_tail(&sk->sk_receive_queue, *skb2);
-			sk->sk_data_ready(sk);
-			*skb2 = NULL;
-			err = 0;
-		}
+	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
+		return err;
+
+	skb = skb_clone(skb, allocation);
+
+	if (skb) {
+		skb_set_owner_r(skb, sk);
+		skb_queue_tail(&sk->sk_receive_queue, skb);
+		sk->sk_data_ready(sk);
+		err = 0;
 	}
-	sock_put(sk);
 	return err;
 }
 
@@ -234,7 +226,6 @@ static int pfkey_broadcast(struct sk_buff *skb, gfp_t allocation,
 {
 	struct netns_pfkey *net_pfkey = net_generic(net, pfkey_net_id);
 	struct sock *sk;
-	struct sk_buff *skb2 = NULL;
 	int err = -ESRCH;
 
 	/* XXX Do we need something like netlink_overrun?  I think
@@ -253,7 +244,7 @@ static int pfkey_broadcast(struct sk_buff *skb, gfp_t allocation,
 		 * socket.
 		 */
 		if (pfk->promisc)
-			pfkey_broadcast_one(skb, &skb2, GFP_ATOMIC, sk);
+			pfkey_broadcast_one(skb, GFP_ATOMIC, sk);
 
 		/* the exact target will be processed later */
 		if (sk == one_sk)
@@ -268,7 +259,7 @@ static int pfkey_broadcast(struct sk_buff *skb, gfp_t allocation,
 				continue;
 		}
 
-		err2 = pfkey_broadcast_one(skb, &skb2, GFP_ATOMIC, sk);
+		err2 = pfkey_broadcast_one(skb, GFP_ATOMIC, sk);
 
 		/* Error is cleared after successful sending to at least one
 		 * registered KM */
@@ -278,9 +269,8 @@ static int pfkey_broadcast(struct sk_buff *skb, gfp_t allocation,
 	rcu_read_unlock();
 
 	if (one_sk != NULL)
-		err = pfkey_broadcast_one(skb, &skb2, allocation, one_sk);
+		err = pfkey_broadcast_one(skb, allocation, one_sk);
 
-	kfree_skb(skb2);
 	kfree_skb(skb);
 	return err;
 }
-- 
1.9.1


^ permalink raw reply related

* [PATCH v2 net-next] net: phy: let genphy_c45_read_link manage the devices to check
From: Heiner Kallweit @ 2019-02-07 20:41 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller, Russell King
  Cc: netdev@vger.kernel.org

Let genphy_c45_read_link manage the devices to check, this removes
overhead from callers. Add C22EXT to the list of excluded devices
because it doesn't implement the status register. According to the
802.3 clause 45 spec registers 29.0 - 29.4 are reserved.

At the moment we have very few clause 45 PHY drivers, so we are
lacking experience whether other drivers will have to exclude further
devices, or may need to check PHY XS. If we should figure out that
list of devices to check needs to be configurable, I think best will
be to add a device list member to struct phy_driver.

v2:
- adjusted commit message
- exclude also device C22EXT from link checking

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/marvell10g.c | 10 +---------
 drivers/net/phy/phy-c45.c    | 18 ++++++++++--------
 include/linux/phy.h          |  2 +-
 include/uapi/linux/mdio.h    |  2 ++
 4 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c
index 296a537cd..96a79c6c7 100644
--- a/drivers/net/phy/marvell10g.c
+++ b/drivers/net/phy/marvell10g.c
@@ -428,16 +428,8 @@ static int mv3310_read_10gbr_status(struct phy_device *phydev)
 
 static int mv3310_read_status(struct phy_device *phydev)
 {
-	u32 mmd_mask = phydev->c45_ids.devices_in_package;
 	int val;
 
-	/* The vendor devads do not report link status.  Avoid the PHYXS
-	 * instance as there are three, and its status depends on the MAC
-	 * being appropriately configured for the negotiated speed.
-	 */
-	mmd_mask &= ~(BIT(MDIO_MMD_VEND1) | BIT(MDIO_MMD_VEND2) |
-		      BIT(MDIO_MMD_PHYXS));
-
 	phydev->speed = SPEED_UNKNOWN;
 	phydev->duplex = DUPLEX_UNKNOWN;
 	linkmode_zero(phydev->lp_advertising);
@@ -453,7 +445,7 @@ static int mv3310_read_status(struct phy_device *phydev)
 	if (val & MDIO_STAT1_LSTATUS)
 		return mv3310_read_10gbr_status(phydev);
 
-	val = genphy_c45_read_link(phydev, mmd_mask);
+	val = genphy_c45_read_link(phydev);
 	if (val < 0)
 		return val;
 
diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
index b09a5e134..eff9e5a4d 100644
--- a/drivers/net/phy/phy-c45.c
+++ b/drivers/net/phy/phy-c45.c
@@ -118,17 +118,24 @@ EXPORT_SYMBOL_GPL(genphy_c45_aneg_done);
 /**
  * genphy_c45_read_link - read the overall link status from the MMDs
  * @phydev: target phy_device struct
- * @mmd_mask: MMDs to read status from
  *
  * Read the link status from the specified MMDs, and if they all indicate
  * that the link is up, set phydev->link to 1.  If an error is encountered,
  * a negative errno will be returned, otherwise zero.
  */
-int genphy_c45_read_link(struct phy_device *phydev, u32 mmd_mask)
+int genphy_c45_read_link(struct phy_device *phydev)
 {
+	u32 mmd_mask = phydev->c45_ids.devices_in_package;
 	int val, devad;
 	bool link = true;
 
+	/* The vendor devads and C22EXT do not report link status. Avoid the
+	 * PHYXS instance as its status may depend on the MAC being
+	 * appropriately configured for the negotiated speed.
+	 */
+	mmd_mask &= ~(MDIO_DEVS_VEND1 | MDIO_DEVS_VEND2 | MDIO_DEVS_C22EXT |
+		      MDIO_DEVS_PHYXS);
+
 	while (mmd_mask && link) {
 		devad = __ffs(mmd_mask);
 		mmd_mask &= ~BIT(devad);
@@ -274,16 +281,11 @@ EXPORT_SYMBOL_GPL(gen10g_config_aneg);
 
 int gen10g_read_status(struct phy_device *phydev)
 {
-	u32 mmd_mask = phydev->c45_ids.devices_in_package;
-
 	/* For now just lie and say it's 10G all the time */
 	phydev->speed = SPEED_10000;
 	phydev->duplex = DUPLEX_FULL;
 
-	/* Avoid reading the vendor MMDs */
-	mmd_mask &= ~(BIT(MDIO_MMD_VEND1) | BIT(MDIO_MMD_VEND2));
-
-	return genphy_c45_read_link(phydev, mmd_mask);
+	return genphy_c45_read_link(phydev);
 }
 EXPORT_SYMBOL_GPL(gen10g_read_status);
 
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 237dd0358..f41bf651f 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1094,7 +1094,7 @@ int genphy_write_mmd_unsupported(struct phy_device *phdev, int devnum,
 /* Clause 45 PHY */
 int genphy_c45_restart_aneg(struct phy_device *phydev);
 int genphy_c45_aneg_done(struct phy_device *phydev);
-int genphy_c45_read_link(struct phy_device *phydev, u32 mmd_mask);
+int genphy_c45_read_link(struct phy_device *phydev);
 int genphy_c45_read_lpa(struct phy_device *phydev);
 int genphy_c45_read_pma(struct phy_device *phydev);
 int genphy_c45_pma_setup_forced(struct phy_device *phydev);
diff --git a/include/uapi/linux/mdio.h b/include/uapi/linux/mdio.h
index d435b00d6..2e6e309f0 100644
--- a/include/uapi/linux/mdio.h
+++ b/include/uapi/linux/mdio.h
@@ -123,6 +123,8 @@
 #define MDIO_DEVS_TC			MDIO_DEVS_PRESENT(MDIO_MMD_TC)
 #define MDIO_DEVS_AN			MDIO_DEVS_PRESENT(MDIO_MMD_AN)
 #define MDIO_DEVS_C22EXT		MDIO_DEVS_PRESENT(MDIO_MMD_C22EXT)
+#define MDIO_DEVS_VEND1			MDIO_DEVS_PRESENT(MDIO_MMD_VEND1)
+#define MDIO_DEVS_VEND2			MDIO_DEVS_PRESENT(MDIO_MMD_VEND2)
 
 /* Control register 2. */
 #define MDIO_PMA_CTRL2_TYPE		0x000f	/* PMA/PMD type selection */
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next] net: phy: let genphy_c45_read_link manage the devices to check
From: Andrew Lunn @ 2019-02-07 20:50 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, Russell King,
	netdev@vger.kernel.org
In-Reply-To: <25962cfd-f759-10ab-e7a5-9816852412c1@gmail.com>

> Thanks, Andrew. Right, the Aquantia PHY doesn't seem to have the C22EXT
> MMD. Because the Aquantia PHY has no device 29 in its package the code
> should work.

It lists device 29 in its devices in package. So the current code does
look there.

> So it seems we have to exclude device C22EXT in general. I'll add that
> and submit a v2.

Yes.

	Andrew

^ permalink raw reply

* Re: [PATCH v2 net-next] net: phy: let genphy_c45_read_link manage the devices to check
From: Andrew Lunn @ 2019-02-07 20:59 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, Russell King,
	netdev@vger.kernel.org
In-Reply-To: <7c00df20-b3e1-7ea9-2adb-004e6291d52c@gmail.com>

On Thu, Feb 07, 2019 at 09:41:46PM +0100, Heiner Kallweit wrote:
> Let genphy_c45_read_link manage the devices to check, this removes
> overhead from callers. Add C22EXT to the list of excluded devices
> because it doesn't implement the status register. According to the
> 802.3 clause 45 spec registers 29.0 - 29.4 are reserved.
> 
> At the moment we have very few clause 45 PHY drivers, so we are
> lacking experience whether other drivers will have to exclude further
> devices, or may need to check PHY XS. If we should figure out that
> list of devices to check needs to be configurable, I think best will
> be to add a device list member to struct phy_driver.
> 
> v2:
> - adjusted commit message
> - exclude also device C22EXT from link checking
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 0/2] qed*: SmartAN query support
From: David Miller @ 2019-02-07 21:22 UTC (permalink / raw)
  To: skalluru; +Cc: netdev, aelior, mkalderon
In-Reply-To: <20190207142012.4521-1-skalluru@marvell.com>

From: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Date: Thu, 7 Feb 2019 06:20:10 -0800

> SmartAN feature detects the peer/cable capabilities and establishes the
> link in the best possible configuration.
> The patch series adds support for querying the capability. Please consider
> applying it net-next.

Series applied, thanks.

^ permalink raw reply

* Re: [RFC, PATCH] net: page_pool: Don't use page->private to store dma_addr_t
From: David Miller @ 2019-02-07 21:25 UTC (permalink / raw)
  To: ilias.apalodimas; +Cc: willy, brouer, tariqt, toke, netdev, mgorman, linux-mm
In-Reply-To: <20190207152034.GA3295@apalos>

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: Thu, 7 Feb 2019 17:20:34 +0200

> Well updating struct page is the final goal, hence the comment. I am mostly
> looking for opinions here since we are trying to store dma addresses which are
> irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
> bit controversial isn't it ? If we can add that, then yes the code would look
> better

I fundamentally disagree.

One of the core operations performed on a page is mapping it so that a device
and use it.

Why have ancillary data structure support for this all over the place, rather
than in the common spot which is the page.

A page really is not just a 'mm' structure, it is a system structure.

^ permalink raw reply

* Re: [PATCH net-next 0/2] qed*: SmartAN query support
From: David Miller @ 2019-02-07 21:32 UTC (permalink / raw)
  To: skalluru; +Cc: netdev, aelior, mkalderon
In-Reply-To: <20190207.132235.1892910927125881925.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 07 Feb 2019 13:22:35 -0800 (PST)

> From: Sudarsana Reddy Kalluru <skalluru@marvell.com>
> Date: Thu, 7 Feb 2019 06:20:10 -0800
> 
>> SmartAN feature detects the peer/cable capabilities and establishes the
>> link in the best possible configuration.
>> The patch series adds support for querying the capability. Please consider
>> applying it net-next.
> 
> Series applied, thanks.

Nevermind, I reverted before pushing out as I see Jakub has some questions.

^ permalink raw reply

* Re: [RFC, PATCH] net: page_pool: Don't use page->private to store dma_addr_t
From: Matthew Wilcox @ 2019-02-07 21:34 UTC (permalink / raw)
  To: David Miller
  Cc: ilias.apalodimas, brouer, tariqt, toke, netdev, mgorman, linux-mm
In-Reply-To: <20190207.132519.1698007650891404763.davem@davemloft.net>

On Thu, Feb 07, 2019 at 01:25:19PM -0800, David Miller wrote:
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Date: Thu, 7 Feb 2019 17:20:34 +0200
> 
> > Well updating struct page is the final goal, hence the comment. I am mostly
> > looking for opinions here since we are trying to store dma addresses which are
> > irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
> > bit controversial isn't it ? If we can add that, then yes the code would look
> > better
> 
> I fundamentally disagree.
> 
> One of the core operations performed on a page is mapping it so that a device
> and use it.
> 
> Why have ancillary data structure support for this all over the place, rather
> than in the common spot which is the page.
> 
> A page really is not just a 'mm' structure, it is a system structure.

+1

The fundamental point of computing is to do I/O.

^ permalink raw reply

* Re: [RFC, PATCH] net: page_pool: Don't use page->private to store dma_addr_t
From: Ilias Apalodimas @ 2019-02-07 21:37 UTC (permalink / raw)
  To: David Miller; +Cc: willy, brouer, tariqt, toke, netdev, mgorman, linux-mm
In-Reply-To: <20190207.132519.1698007650891404763.davem@davemloft.net>

Hi David,

On Thu, Feb 07, 2019 at 01:25I:19PM -0800, David Miller wrote:
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Date: Thu, 7 Feb 2019 17:20:34 +0200
> 
> > Well updating struct page is the final goal, hence the comment. I am mostly
> > looking for opinions here since we are trying to store dma addresses which are
> > irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
> > bit controversial isn't it ? If we can add that, then yes the code would look
> > better
> 
> I fundamentally disagree.
> 
> One of the core operations performed on a page is mapping it so that a device
> and use it.
> 
> Why have ancillary data structure support for this all over the place, rather
> than in the common spot which is the page.

You are right on that. Moreover the intention of this change is to facilitate
the page recycling patches we proposed with Jesper. In that context we do need
the dma mapping information in a common spot since we'll need to access it from
drivers, networking code etc. The struct page *is* the best place for that.

> 
> A page really is not just a 'mm' structure, it is a system structure.

Well if you put it that way i completely agree (also it makes our life a *lot*
easier :))


Thanks
/Ilias

^ permalink raw reply

* Re: [ISSUE][4.20.6] mlx5 and checksum failures
From: Marcelo Ricardo Leitner @ 2019-02-07 21:40 UTC (permalink / raw)
  To: Ian Kumlien
  Cc: Cong Wang, David Miller, Saeed Mahameed,
	Linux Kernel Network Developers
In-Reply-To: <CAA85sZsbmJ=c69h6yzd0LAzd198=mppjqjPZu_EXhQP+soe4Yw@mail.gmail.com>

On Wed, Feb 06, 2019 at 11:41:28PM +0100, Ian Kumlien wrote:
> On Wed, Feb 6, 2019 at 11:38 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Wed, Feb 6, 2019 at 2:15 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > Could we please schedule this for 4.19 and 4.20 - it's kinda breaking things
> >
> > It doesn't break anything, packets are _not_ dropped, only that the
> > warning itself is noisy.
> 
> Not my experience, to me it slows the machine down and looses packets,
> I don't however know
> if this is the only culprit
> 
> You can actually see it on ping where it start out with 0.0xyx and
> ends up at ~10ms

Serial console is a/the killer in these situations.

^ permalink raw reply

* Re: [RFC, PATCH] net: page_pool: Don't use page->private to store dma_addr_t
From: Ilias Apalodimas @ 2019-02-07 21:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Miller, brouer, tariqt, toke, netdev, mgorman, linux-mm
In-Reply-To: <20190207213400.GA21860@bombadil.infradead.org>

Hi Matthew,

On Thu, Feb 07, 2019 at 01:34:00PM -0800, Matthew Wilcox wrote:
> On Thu, Feb 07, 2019 at 01:25:19PM -0800, David Miller wrote:
> > From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> > Date: Thu, 7 Feb 2019 17:20:34 +0200
> > 
> > > Well updating struct page is the final goal, hence the comment. I am mostly
> > > looking for opinions here since we are trying to store dma addresses which are
> > > irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
> > > bit controversial isn't it ? If we can add that, then yes the code would look
> > > better
> > 
> > I fundamentally disagree.
> > 
> > One of the core operations performed on a page is mapping it so that a device
> > and use it.
> > 
> > Why have ancillary data structure support for this all over the place, rather
> > than in the common spot which is the page.
> > 
> > A page really is not just a 'mm' structure, it is a system structure.
> 
> +1
> 
> The fundamental point of computing is to do I/O.
Ok, great that should sort it out then.
I'll use your proposal and base the patch on that. 

Thanks for taking the time with this

/Ilias

^ permalink raw reply

* Re: [PATCH v2] rhashtable: make walk safe from softirq context
From: Herbert Xu @ 2019-02-07 21:48 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless, netdev, Jouni Malinen, Thomas Graf
In-Reply-To: <fee1bdca8bb21a472a75c1a43fa61aad7fe4bff5.camel@sipsolutions.net>

On Thu, Feb 07, 2019 at 02:50:54PM +0100, Johannes Berg wrote:
> 
> > This interface wasn't designed for use in softirq contexts.
> 
> Well, it clearly was used there. You even gave it a gfp_t argument in
> rhashtable_walk_init(), so you can't really claim it wasn't designed for
> this. I see now that it's ignored, but still?

I see.  This was added behind my back so I wasn't aware of it.

> > Could you please show me who is doing this so I can review that
> > to see whether it's a legitimate use of this API?
> 
> I'm sure you'll say it's not legitimate, but it still exists ;-)
> 
> mesh_plink_broken() gets called from the TX status path, via
> ieee80211s_update_metric().

Let me take a look.

Thanks!
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH net-next] net: phy: let genphy_c45_read_link manage the devices to check
From: Heiner Kallweit @ 2019-02-07 21:54 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, Russell King,
	netdev@vger.kernel.org
In-Reply-To: <20190207205054.GA5223@lunn.ch>

On 07.02.2019 21:50, Andrew Lunn wrote:
>> Thanks, Andrew. Right, the Aquantia PHY doesn't seem to have the C22EXT
>> MMD. Because the Aquantia PHY has no device 29 in its package the code
>> should work.
> 
> It lists device 29 in its devices in package. So the current code does
> look there.
> 
I just looked for a description of a device 29. Strange that the device
list states it's there and then it's not there.

When checking that I was scratching my head because of the following code
in genphy_c45_read_link:

devad = __ffs(mmd_mask);
mmd_mask &= ~BIT(devad);

AFAIK __ffs() returns the posix bit number, means it returns 1 for bit 0.
Then this code piece seems to be wrong because I think the intention is
to clear the bit we just found. Instead we clear the next bit.

And device 0 isn't really a device but a flag "Clause 22 registers present".
So far we may have been lucky because to supported 10G PHY has this flag set.
But if this flag is set and we try to access a register 0.1 then we may be
in trouble again. Therefore I think we need to exclude also device 0.
Or do I miss something?

>> So it seems we have to exclude device C22EXT in general. I'll add that
>> and submit a v2.
> 
> Yes.
> 
> 	Andrew
> 
Heiner

^ permalink raw reply

* Re: [PATCH v2] rhashtable: make walk safe from softirq context
From: Johannes Berg @ 2019-02-07 21:56 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-wireless, netdev, Jouni Malinen, Thomas Graf
In-Reply-To: <20190207214834.awrpkxunnvfdorbe@gondor.apana.org.au>

On Fri, 2019-02-08 at 05:48 +0800, Herbert Xu wrote:
> On Thu, Feb 07, 2019 at 02:50:54PM +0100, Johannes Berg wrote:
> > 
> > > This interface wasn't designed for use in softirq contexts.
> > 
> > Well, it clearly was used there. You even gave it a gfp_t argument in
> > rhashtable_walk_init(), so you can't really claim it wasn't designed for
> > this. I see now that it's ignored, but still?
> 
> I see.  This was added behind my back so I wasn't aware of it.

It's not used and actually I was wrong anyway: this would have also
allowed doing the walk while holding a spinlock or with softirqs
disabled, rather than from IRQ/softirq context.

In any case, it's clearly working to iterate from this context, and
doing a spinlock_bh vs. a spinlock in the rhashtable code isn't really
that big a deal? Not sure I really understand your objection.

johannes

^ permalink raw reply

* Re: [ISSUE][4.20.6] mlx5 and checksum failures
From: Ian Kumlien @ 2019-02-07 22:01 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Cong Wang, David Miller, Saeed Mahameed,
	Linux Kernel Network Developers
In-Reply-To: <CALzJLG8wPpiS+aU674B=jRK3ugbu8EfxY5Wrcs1_RwsjDtLhUg@mail.gmail.com>

On Thu, Feb 7, 2019 at 7:43 PM Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
>
> On Thu, Feb 7, 2019 at 2:17 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >
> > On Thu, Feb 7, 2019 at 2:01 AM Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
> > > On Wed, Feb 6, 2019 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > It changes directly after the first hw checksum failure, I don't know why =/
> > >
> > > weird, Maybe a real check-summing issue/corruption on the PCI ?!
> >
> > Actually, it seems to have been introduced in 4.20.6 - 4.20.5 works just fine
> >
>
> Great, the difference is only 120 patches.
> that is bisect-able, it will only take 5 iterations to find the
> offending commit.

I just wish it wasn't a server that takes, what feels like 5 minutes to boot...

All of these seas of sensors 2d and 3d... =P

But, yep, that's the plan

> > Just FYI, my dmesg testcase:
> > time ssh <server> "dmesg && exit
> > real    3m5.845s
> > user    0m0.035s
> > sys     0m0.041s
> >
> > > can you try turning off checksum offloads
> > > ethtool -K ethX  rx off
> >
> > same test:
> > real    0m3.408s
> > user    0m0.022s
> > sys     0m0.032s
> >
> > So yes, something in 4.20.6 goes wrong on the receiving part :/

^ permalink raw reply

* Re: [PATCH v2] rhashtable: make walk safe from softirq context
From: Herbert Xu @ 2019-02-07 22:02 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless, netdev, Jouni Malinen, Thomas Graf
In-Reply-To: <20190207214834.awrpkxunnvfdorbe@gondor.apana.org.au>

On Fri, Feb 08, 2019 at 05:48:34AM +0800, Herbert Xu wrote:
>
> > > Could you please show me who is doing this so I can review that
> > > to see whether it's a legitimate use of this API?
> > 
> > I'm sure you'll say it's not legitimate, but it still exists ;-)
> > 
> > mesh_plink_broken() gets called from the TX status path, via
> > ieee80211s_update_metric().
> 
> Let me take a look.

OK I think it is wrong but you already knew that :)

First of all our current walk interface does not guarantee that you
will see all elements because a parallel remove can cause you to miss
elements.  Although Neil Brown was trying to fix that it is still
not in the tree yet.

But more fundamentally, iterating over an unbounded list in a sotirq
context is *broken*!

Either you don't really need a hash table at all because your list
is short enough to start with, or you need to add more hash tables
(or other data structures) to efficiently look things up using these
alternative keys so that you don't end up hogging the softirq.

So this needs to be fixed in mesh.

Once that is gone hopefully we can remove the long obsolete init
API that carries the GFP flag.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH net-next] net: phy: let genphy_c45_read_link manage the devices to check
From: Heiner Kallweit @ 2019-02-07 22:06 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David Miller, Russell King,
	netdev@vger.kernel.org
In-Reply-To: <974f6bca-c938-aac9-28b4-291cca0734db@gmail.com>

On 07.02.2019 22:54, Heiner Kallweit wrote:
> On 07.02.2019 21:50, Andrew Lunn wrote:
>>> Thanks, Andrew. Right, the Aquantia PHY doesn't seem to have the C22EXT
>>> MMD. Because the Aquantia PHY has no device 29 in its package the code
>>> should work.
>>
>> It lists device 29 in its devices in package. So the current code does
>> look there.
>>
> I just looked for a description of a device 29. Strange that the device
> list states it's there and then it's not there.
> 
> When checking that I was scratching my head because of the following code
> in genphy_c45_read_link:
> 
> devad = __ffs(mmd_mask);
> mmd_mask &= ~BIT(devad);
> 
> AFAIK __ffs() returns the posix bit number, means it returns 1 for bit 0.
> Then this code piece seems to be wrong because I think the intention is
> to clear the bit we just found. Instead we clear the next bit.
> 
Nope, it doesn't seem to be the posix version .. Just checked the code of __ffs().

> And device 0 isn't really a device but a flag "Clause 22 registers present".
> So far we may have been lucky because to supported 10G PHY has this flag set.
> But if this flag is set and we try to access a register 0.1 then we may be
> in trouble again. Therefore I think we need to exclude also device 0.
> Or do I miss something?
> 
>>> So it seems we have to exclude device C22EXT in general. I'll add that
>>> and submit a v2.
>>
>> Yes.
>>
>> 	Andrew
>>
> Heiner
> 


^ permalink raw reply

* Re: [PATCH net-next] net: phy: let genphy_c45_read_link manage the devices to check
From: Russell King - ARM Linux admin @ 2019-02-07 22:19 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Andrew Lunn, Florian Fainelli, David Miller,
	netdev@vger.kernel.org
In-Reply-To: <974f6bca-c938-aac9-28b4-291cca0734db@gmail.com>

On Thu, Feb 07, 2019 at 10:54:51PM +0100, Heiner Kallweit wrote:
> On 07.02.2019 21:50, Andrew Lunn wrote:
> >> Thanks, Andrew. Right, the Aquantia PHY doesn't seem to have the C22EXT
> >> MMD. Because the Aquantia PHY has no device 29 in its package the code
> >> should work.
> > 
> > It lists device 29 in its devices in package. So the current code does
> > look there.
> > 
> I just looked for a description of a device 29. Strange that the device
> list states it's there and then it's not there.
> 
> When checking that I was scratching my head because of the following code
> in genphy_c45_read_link:
> 
> devad = __ffs(mmd_mask);
> mmd_mask &= ~BIT(devad);
> 
> AFAIK __ffs() returns the posix bit number, means it returns 1 for bit 0.

I think you're wrong.  Look at include/asm-generic/bitops/__ffs.h.  If
'word' is 0x00000001, then the function returns zero.  If 'word' is
0x00000002, it returns 1, etc.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH bpf-next 1/6] bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper
From: Daniel Borkmann @ 2019-02-07 22:21 UTC (permalink / raw)
  To: Martin Lau
  Cc: netdev@vger.kernel.org, Alexei Starovoitov, Kernel Team,
	Lawrence Brakmo
In-Reply-To: <20190207072716.qat44fx2t6wqvu4l@kafai-mbp.dhcp.thefacebook.com>

On 02/07/2019 08:27 AM, Martin Lau wrote:
[...]
> Following up the discussion in the iovisor conf call.
> 
> One of discussion was about:
> other than tw, can __sk_buff->sk always return a
> fullsock (PTR_TO_SOCKET_OR_NULL).  In request_sock case,
> it is doable because it can trace back to the listener sock.
> 
> However, that will go back to the sock_common accessing question.
> In particular, how to access the sock_common's fields of the
> request_sock itself?  Those fields in the request_sock are different
> from its listener sock.  e.g. the skc_daddr and skc_dport.
> 
> Also, if the sock_common fields of tw is needed, it will become weird
> because likely a new "struct bpf_tw_sock" is needed which is OK
> but all sock_common fields need to be copied from bpf_sock
> to bpf_tw_sock.
> 
> I think reading a sk from a ctx should return the
> most basic type PTR_TO_SOCK_COMMON_OR_NULL (unless the running
> ctx can guarantee that it always has a fullsock).
> Currently, it is __sk_buff->sk.  Later, sock_ops->sk...etc.
> One single 'struct bpf_sock' and limit fullsock field access
> at verification time.  The bpf_prog then moves down the chain
> based on what it needs.  It could be fullsock, tcp_sock...etc.
> 
> I think that will be the most flexible way to write bpf_prog
> while also avoid having duplicate fields in different
> bpf struct in uapi.

Ok, thanks for following up and sorry for late reply, lets go with
sock_common then. What's the plan to moving forward with accessing
full sk in case of req sk? Separate helper or backed into the newly
added bpf_sk_fullsock() one? Presumably latter?

Thanks,
Daniel

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox