Netdev List
 help / color / mirror / Atom feed
* Re: [v3,3/4] tools: bpftool: add bash-completion for net attach/detach
From: Daniel T. Lee @ 2019-08-08 19:28 UTC (permalink / raw)
  To: Quentin Monnet; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <5cd88036-8682-8d26-b795-caf96e1e883f@netronome.com>

On Fri, Aug 9, 2019 at 1:48 AM Quentin Monnet
<quentin.monnet@netronome.com> wrote:
>
> 2019-08-07 11:25 UTC+0900 ~ Daniel T. Lee <danieltimlee@gmail.com>
> > This commit adds bash-completion for new "net attach/detach"
> > subcommand for attaching XDP program on interface.
> >
> > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > ---
> >  tools/bpf/bpftool/bash-completion/bpftool | 64 +++++++++++++++++++----
> >  1 file changed, 55 insertions(+), 9 deletions(-)
> >
> > diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
> > index c8f42e1fcbc9..1d81cb09d478 100644
> > --- a/tools/bpf/bpftool/bash-completion/bpftool
> > +++ b/tools/bpf/bpftool/bash-completion/bpftool
> > @@ -201,6 +201,10 @@ _bpftool()
> >              _bpftool_get_prog_tags
> >              return 0
> >              ;;
> > +        dev)
> > +            _sysfs_get_netdevs
> > +            return 0
> > +            ;;
>
> Makes sense to have this for "dev", thanks! But it seems you missed one
> place where it was used, for "bpftool feature probe" (We have "[[ $prev
> == "dev" ]] && _sysfs_get_netdevs && return 0"). Could you also remove
> that one please?
>
> Other than this looks good, thanks:
>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

My bad. Thanks for letting me know.
I'll update it with the next version of patch.

Thank you for your review.
I really appreciate it.

^ permalink raw reply

* Re: [PATCH net] net: phy: rtl8211f: do a double read to get real time link status
From: Heiner Kallweit @ 2019-08-08 19:26 UTC (permalink / raw)
  To: Yonglong Liu, davem, andrew
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang,
	shiju.jose
In-Reply-To: <c15f820b-cc80-9a93-4c48-1b60bc14f73a@huawei.com>

On 08.08.2019 08:21, Yonglong Liu wrote:
> 
> 
> On 2019/8/8 14:11, Heiner Kallweit wrote:
>> On 08.08.2019 03:15, Yonglong Liu wrote:
>>>
>>>
>>> On 2019/8/8 0:47, Heiner Kallweit wrote:
>>>> On 07.08.2019 15:16, Yonglong Liu wrote:
>>>>> [   27.232781] hns3 0000:bd:00.3 eth7: net open
>>>>> [   27.237303] 8021q: adding VLAN 0 to HW filter on device eth7
>>>>> [   27.242972] IPv6: ADDRCONF(NETDEV_CHANGE): eth7: link becomes ready
>>>>> [   27.244449] hns3 0000:bd:00.3: invalid speed (-1)
>>>>> [   27.253904] hns3 0000:bd:00.3 eth7: failed to adjust link.
>>>>> [   27.259379] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: PHY state change UP -> RUNNING
>>>>> [   27.924903] hns3 0000:bd:00.3 eth7: link up
>>>>> [   28.280479] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: PHY state change RUNNING -> NOLINK
>>>>> [   29.208452] hns3 0000:bd:00.3 eth7: link down
>>>>> [   32.376745] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: PHY state change NOLINK -> RUNNING
>>>>> [   33.208448] hns3 0000:bd:00.3 eth7: link up
>>>>> [   35.253821] hns3 0000:bd:00.3 eth7: net stop
>>>>> [   35.258270] hns3 0000:bd:00.3 eth7: link down
>>>>>
>>>>> When using rtl8211f in polling mode, may get a invalid speed,
>>>>> because of reading a fake link up and autoneg complete status
>>>>> immediately after starting autoneg:
>>>>>
>>>>>         ifconfig-1176  [007] ....    27.232763: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
>>>>>   kworker/u257:1-670   [015] ....    27.232805: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x04 val:0x01e1
>>>>>   kworker/u257:1-670   [015] ....    27.232815: mdio_access: mii-0000:bd:00.3 write phy:0x07 reg:0x04 val:0x05e1
>>>>>   kworker/u257:1-670   [015] ....    27.232869: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
>>>>>   kworker/u257:1-670   [015] ....    27.232904: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x09 val:0x0200
>>>>>   kworker/u257:1-670   [015] ....    27.232940: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
>>>>>   kworker/u257:1-670   [015] ....    27.232949: mdio_access: mii-0000:bd:00.3 write phy:0x07 reg:0x00 val:0x1240
>>>>>   kworker/u257:1-670   [015] ....    27.233003: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
>>>>>   kworker/u257:1-670   [015] ....    27.233039: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x0a val:0x3002
>>>>>   kworker/u257:1-670   [015] ....    27.233074: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x09 val:0x0200
>>>>>   kworker/u257:1-670   [015] ....    27.233110: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x05 val:0x0000
>>>>>   kworker/u257:1-670   [000] ....    28.280475: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x7989
>>>>>   kworker/u257:1-670   [000] ....    29.304471: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x7989
>>>>>
>>>>> According to the datasheet of rtl8211f, to get the real time
>>>>> link status, need to read MII_BMSR twice.
>>>>>
>>>>> This patch add a read_status hook for rtl8211f, and do a fake
>>>>> phy_read before genphy_read_status(), so that can get real link
>>>>> status in genphy_read_status().
>>>>>
>>>>> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
>>>>> ---
>>>>>  drivers/net/phy/realtek.c | 13 +++++++++++++
>>>>>  1 file changed, 13 insertions(+)
>>>>>
>>>> Is this an accidental resubmit? Because we discussed this in
>>>> https://marc.info/?t=156413509900003&r=1&w=2 and a fix has
>>>> been applied already.
>>>>
>>>> Heiner
>>>>
>>>> .
>>>>
>>>
>>> In https://marc.info/?t=156413509900003&r=1&w=2 , the invalid speed
>>> recurrence rate is almost 100%, and I had test the solution about
>>> 5 times and it works. But yesterday it happen again suddenly, and than
>>> I fount that the recurrence rate reduce to 10%. This time we get 0x79ad
>>> after autoneg started which is not 0x798d from last discussion.
>>>
>>>
>>>
>> OK, I'll have a look.
>> However the approach is wrong. The double read is related to the latching
>> of link-down events. This is done by all PHY's and not specific to RT8211F.
>> Also it's not related to the problem. I assume any sufficient delay would
>> do instead of the read.
>>
>> .
>>
> 
> So you will send a new patch to fix this problem? I am waiting for it,
> and can do a full test this time.
> 
> 
Can you try the following? This delay should give thy PHY enough time
to clear both bits before the following read is done.

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index ef7aa738e..32f327a44 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -568,6 +568,11 @@ int phy_start_aneg(struct phy_device *phydev)
 	if (err < 0)
 		goto out_unlock;
 
+	/* The PHY may not yet have cleared aneg-completed and link-up bit
+	 * w/o this delay when the following read is done.
+	 */
+	usleep_range(1000, 2000);
+
 	if (phy_is_started(phydev))
 		err = phy_check_link_status(phydev);
 out_unlock:
-- 
2.22.0



^ permalink raw reply related

* Re: [v3,1/4] tools: bpftool: add net attach command to attach XDP on interface
From: Daniel T. Lee @ 2019-08-08 19:22 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190808104948.7e72dea0@cakuba.netronome.com>

On Fri, Aug 9, 2019 at 2:50 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Thu, 8 Aug 2019 07:15:22 +0900, Daniel T. Lee wrote:
> > > > +             return -EINVAL;
> > > > +     }
> > > > +
> > > > +     NEXT_ARG();
> > >
> > > nit: the new line should be before NEXT_ARG(), IOV NEXT_ARG() belongs
> > > to the code which consumed the argument
> > >
> >
> > I'm not sure I'm following.
> > Are you saying that, at here the newline shouldn't be necessary?
>
> I mean this is better:
>
>         if (!is_prefix(*argv, "bla-bla"))
>                 return -EINVAL;
>         NEXT_ARG();
>
>         if (!is_prefix(*argv, "bla-bla"))
>                 return -EINVAL;
>         NEXT_ARG();
>
> Than this:
>
>         if (!is_prefix(*argv, "bla-bla"))
>                 return -EINVAL;
>
>         NEXT_ARG();
>         if (!is_prefix(*argv, "bla-bla"))
>                 return -EINVAL;
>
>         NEXT_ARG();
>
> Because the NEXT_ARG() "belongs" to the code that "consumed" the option.
>
> So instead of this:
>
>      attach_type = parse_attach_type(*argv);
>      if (attach_type == max_net_attach_type) {
>              p_err("invalid net attach/detach type");
>              return -EINVAL;
>      }
>
>      NEXT_ARG();
>      progfd = prog_parse_fd(&argc, &argv);
>      if (progfd < 0)
>              return -EINVAL;
>
> This seems more logical to me:
>
>      attach_type = parse_attach_type(*argv);
>      if (attach_type == max_net_attach_type) {
>              p_err("invalid net attach/detach type");
>              return -EINVAL;
>      }
>      NEXT_ARG();
>
>      progfd = prog_parse_fd(&argc, &argv);
>      if (progfd < 0)
>              return -EINVAL;

Oh. I see.
I'll update NEXT_ARG line stick to the code which "consumes" the option.

Thanks for the review! :)

^ permalink raw reply

* Re: [PATCH net v3] net/tls: prevent skb_orphan() from leaking TLS plain text with offload
From: Boris Pismenny @ 2019-08-08 19:18 UTC (permalink / raw)
  To: Jakub Kicinski, davem@davemloft.net
  Cc: netdev@vger.kernel.org, davejwatson@fb.com, Aviad Yehezkel,
	john.fastabend@gmail.com, daniel@iogearbox.net,
	willemb@google.com, edumazet@google.com,
	alexei.starovoitov@gmail.com, oss-drivers@netronome.com
In-Reply-To: <20190808000359.20785-1-jakub.kicinski@netronome.com>



On 08/08/2019 3:03, Jakub Kicinski wrote:
> sk_validate_xmit_skb() and drivers depend on the sk member of
> struct sk_buff to identify segments requiring encryption.
> Any operation which removes or does not preserve the original TLS
> socket such as skb_orphan() or skb_clone() will cause clear text
> leaks.
> 
> Make the TCP socket underlying an offloaded TLS connection
> mark all skbs as decrypted, if TLS TX is in offload mode.
> Then in sk_validate_xmit_skb() catch skbs which have no socket
> (or a socket with no validation) and decrypted flag set.
> 
> Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
> sk->sk_validate_xmit_skb are slightly interchangeable right now,
> they all imply TLS offload. The new checks are guarded by
> CONFIG_TLS_DEVICE because that's the option guarding the
> sk_buff->decrypted member.
> 
> Second, smaller issue with orphaning is that it breaks
> the guarantee that packets will be delivered to device
> queues in-order. All TLS offload drivers depend on that
> scheduling property. This means skb_orphan_partial()'s
> trick of preserving partial socket references will cause
> issues in the drivers. We need a full orphan, and as a
> result netem delay/throttling will cause all TLS offload
> skbs to be dropped.
> 
> Reusing the sk_buff->decrypted flag also protects from
> leaking clear text when incoming, decrypted skb is redirected
> (e.g. by TC).
> 
> See commit 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect
> through ULP") for justification why the internal flag is safe.
> The only location which could leak the flag in is tcp_bpf_sendmsg(),
> which is taken care of by clearing the previously unused bit.
> 
> v2:
>  - remove superfluous decrypted mark copy (Willem);
>  - remove the stale doc entry (Boris);
>  - rely entirely on EOR marking to prevent coalescing (Boris);
>  - use an internal sendpages flag instead of marking the socket
>    (Boris).
> v3 (Willem):
>  - reorganize the can_skb_orphan_partial() condition;
>  - fix the flag leak-in through tcp_bpf_sendmsg.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

LGTM

Reviewed-by: Boris Pismenny <borisp@mellanox.com>

^ permalink raw reply

* Re: [bpf-next v3 PATCH 3/3] samples/bpf: xdp_fwd explain bpf_fib_lookup return codes
From: Toke Høiland-Jørgensen @ 2019-08-08 19:17 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, ys114321, Jesper Dangaard Brouer
In-Reply-To: <156528106777.22124.12162740342925045912.stgit@firesoul>

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> Make it clear that this XDP program depend on the network
> stack to do the ARP resolution.  This is connected with the
> BPF_FIB_LKUP_RET_NO_NEIGH return code from bpf_fib_lookup().
>
> Another common mistake (seen via XDP-tutorial) is that users
> don't realize that sysctl net.ipv{4,6}.conf.all.forwarding
> setting is honored by bpf_fib_lookup.
>
> Reported-by: Anton Protopopov <a.s.protopopov@gmail.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Reviewed-by: David Ahern <dsahern@gmail.com>
> Acked-by: Yonghong Song <yhs@fb.com>

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>

^ permalink raw reply

* Re: [bonding][patch] Regarding a bonding lacp issue
From: Jay Vosburgh @ 2019-08-08 19:17 UTC (permalink / raw)
  To: Felix; +Cc: vfalico, andy, netdev, linux-kernel
In-Reply-To: <8799b243-36da-4baf-8c67-aeb5f978c34f.fei.feng@linux.alibaba.com>

Felix <fei.feng@linux.alibaba.com> wrote:

>Dear Mainteners,
>
>Recently I hit a packet drop issue in bonding driver on Linux 4.9. Please
>see details below. Please take a look to see if my understanding is
>correct. Many thanks.
>
>What is the problem?
>The bonding driver starts to send packets even if the Partner(Switch)'s
>Collecting bit is not enabled yet. Partner would drop all packets until
>its Collecting bit is enabled.
>
>What is the root cuase?
>According to LACP spec, the Actor need to check Partner's Sync and
>Collecting bits before enable its Distributing bit and Distributing
>function. Please see the PIC below.

	The diagram you reference is found in 802.1AX-2014 figure 6-21,
which shows the state diagram for an independent control implementation,
i.e., collecting and distributing are managed independently.

	However, Linux bonding implements coupled control, which is
shown in figure 6-22.  Here, there is no Partner.Collecting requirement
on the state transition from ATTACHED to COLLECTING_DISTRIBUTING.

	To quote 802.1AX-2014 6.4.15:

	As independent control is not possible, the coupled control
	state machine does not wait for the Partner to signal that
	collection has started before enabling both collection and
	distribution.

	Now, that said, I agree that what you're seeing is likely
explained by this behavior, and your fix should resolve the immediate
problem (that bonding sends packets before the peer has enabled
COLLECTING).

	However, your fix does put bonding out of compliance with the
standard, as it does not really implement COLLECTING and DISTRIBUTING as
discrete states.  In particular, if the peer in your case were to later
clear Partner.Collecting, bonding will not react to this as a figure
6-21 independent control implementation would (which isn't a change from
current behavior, but currently this isn't expected).

	So, in my opinion a patch like this should have a comment
attached noting that we are deliberately not in compliance with the
standard in this specific situation.  The proper fix is to implement
figure 6-21 separate state.

	Lastly, are you able to test and generate a patch against
current upstream, instead of 4.9?

	-J

>How to fix?
>Please see the diff as following. And the patch is attached.
>
>--- ../origin/linux-4.9.188/drivers/net/bonding/bond_3ad.c 2019-08-07
>00:29:42.000000000 +0800
>+++ drivers/net/bonding/bond_3ad.c 2019-08-08 23:13:29.015640197 +0800
>@@ -937,6 +937,7 @@
>     */
>    if ((port->sm_vars & AD_PORT_SELECTED) &&
>        (port->partner_oper.port_state & AD_STATE_SYNCHRONIZATION) &&
>+       (port->partner_oper.port_state & AD_STATE_COLLECTING) &&
>        !__check_agg_selection_timer(port)) {
>     if (port->aggregator->is_active)
>      port->sm_mux_state =
>
>------
>Thanks,
>Felix

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply

* Re: [bpf-next v3 PATCH 2/3] samples/bpf: make xdp_fwd more practically usable via devmap lookup
From: Toke Høiland-Jørgensen @ 2019-08-08 19:17 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, ys114321, Jesper Dangaard Brouer
In-Reply-To: <156528106270.22124.2563148023961869582.stgit@firesoul>

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> This address the TODO in samples/bpf/xdp_fwd_kern.c, which points out
> that the chosen egress index should be checked for existence in the
> devmap. This can now be done via taking advantage of Toke's work in
> commit 0cdbb4b09a06 ("devmap: Allow map lookups from eBPF").
>
> This change makes xdp_fwd more practically usable, as this allows for
> a mixed environment, where IP-forwarding fallback to network stack, if
> the egress device isn't configured to use XDP.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Reviewed-by: David Ahern <dsahern@gmail.com>

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>

^ permalink raw reply

* Re: [bpf-next v3 PATCH 1/3] samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports
From: Toke Høiland-Jørgensen @ 2019-08-08 19:16 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, ys114321, Jesper Dangaard Brouer
In-Reply-To: <156528105763.22124.16079929264188823516.stgit@firesoul>

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> The devmap name 'tx_port' came from a copy-paste from xdp_redirect_map
> which only have a single TX port. Change name to xdp_tx_ports
> to make it more descriptive.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Reviewed-by: David Ahern <dsahern@gmail.com>
> Acked-by: Yonghong Song <yhs@fb.com>


Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>

^ permalink raw reply

* [PATCH net-next 2/3] net: phy: add phy_modify_paged_changed
From: Heiner Kallweit @ 2019-08-08 19:04 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <ddbf28b9-f32e-7399-10a6-27b79ca0aaf9@gmail.com>

Add helper function phy_modify_paged_changed, behavior is the same
as for phy_modify_changed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy-core.c | 29 ++++++++++++++++++++++++-----
 include/linux/phy.h        |  2 ++
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 16667fbac..9ae3abb2d 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -783,24 +783,43 @@ int phy_write_paged(struct phy_device *phydev, int page, u32 regnum, u16 val)
 EXPORT_SYMBOL(phy_write_paged);
 
 /**
- * phy_modify_paged() - Convenience function for modifying a paged register
+ * phy_modify_paged_changed() - Function for modifying a paged register
  * @phydev: a pointer to a &struct phy_device
  * @page: the page for the phy
  * @regnum: register number
  * @mask: bit mask of bits to clear
  * @set: bit mask of bits to set
  *
- * Same rules as for phy_read() and phy_write().
+ * Returns negative errno, 0 if there was no change, and 1 in case of change
  */
-int phy_modify_paged(struct phy_device *phydev, int page, u32 regnum,
-		     u16 mask, u16 set)
+int phy_modify_paged_changed(struct phy_device *phydev, int page, u32 regnum,
+			     u16 mask, u16 set)
 {
 	int ret = 0, oldpage;
 
 	oldpage = phy_select_page(phydev, page);
 	if (oldpage >= 0)
-		ret = __phy_modify(phydev, regnum, mask, set);
+		ret = __phy_modify_changed(phydev, regnum, mask, set);
 
 	return phy_restore_page(phydev, oldpage, ret);
 }
+EXPORT_SYMBOL(phy_modify_paged_changed);
+
+/**
+ * phy_modify_paged() - Convenience function for modifying a paged register
+ * @phydev: a pointer to a &struct phy_device
+ * @page: the page for the phy
+ * @regnum: register number
+ * @mask: bit mask of bits to clear
+ * @set: bit mask of bits to set
+ *
+ * Same rules as for phy_read() and phy_write().
+ */
+int phy_modify_paged(struct phy_device *phydev, int page, u32 regnum,
+		     u16 mask, u16 set)
+{
+	int ret = phy_modify_paged_changed(phydev, page, regnum, mask, set);
+
+	return ret < 0 ? ret : 0;
+}
 EXPORT_SYMBOL(phy_modify_paged);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7117825ee..781f4810c 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -984,6 +984,8 @@ int phy_select_page(struct phy_device *phydev, int page);
 int phy_restore_page(struct phy_device *phydev, int oldpage, int ret);
 int phy_read_paged(struct phy_device *phydev, int page, u32 regnum);
 int phy_write_paged(struct phy_device *phydev, int page, u32 regnum, u16 val);
+int phy_modify_paged_changed(struct phy_device *phydev, int page, u32 regnum,
+			     u16 mask, u16 set);
 int phy_modify_paged(struct phy_device *phydev, int page, u32 regnum,
 		     u16 mask, u16 set);
 
-- 
2.22.0



^ permalink raw reply related

* [PATCH net-next 1/3] net: phy: prepare phylib to deal with PHY's extending Clause 22
From: Heiner Kallweit @ 2019-08-08 19:03 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <ddbf28b9-f32e-7399-10a6-27b79ca0aaf9@gmail.com>

The integrated PHY in 2.5Gbps chip RTL8125 is the first (known to me)
PHY that uses standard Clause 22 for all modes up to 1Gbps and adds
2.5Gbps control using vendor-specific registers. To use phylib for
the standard part little extensions are needed:
- Move most of genphy_config_aneg to a new function
  __genphy_config_aneg that takes a parameter whether restarting
  auto-negotiation is needed (depending on whether content of
  vendor-specific advertisement register changed).
- Don't clear phydev->lp_advertising in genphy_read_status so that
  we can set non-C22 mode flags before.

Basically both changes mimic the behavior of the equivalent Clause 45
functions.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 35 +++++++++++++++--------------------
 include/linux/phy.h          |  8 +++++++-
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 7ddd91df9..bd7e7db8c 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1571,11 +1571,9 @@ static int genphy_config_advert(struct phy_device *phydev)
 	/* Only allow advertising what this PHY supports */
 	linkmode_and(phydev->advertising, phydev->advertising,
 		     phydev->supported);
-	if (!ethtool_convert_link_mode_to_legacy_u32(&advertise,
-						     phydev->advertising))
-		phydev_warn(phydev, "PHY advertising (%*pb) more modes than genphy supports, some modes not advertised.\n",
-			    __ETHTOOL_LINK_MODE_MASK_NBITS,
-			    phydev->advertising);
+
+	ethtool_convert_link_mode_to_legacy_u32(&advertise,
+						phydev->advertising);
 
 	/* Setup standard advertisement */
 	err = phy_modify_changed(phydev, MII_ADVERTISE,
@@ -1681,18 +1679,20 @@ int genphy_restart_aneg(struct phy_device *phydev)
 EXPORT_SYMBOL(genphy_restart_aneg);
 
 /**
- * genphy_config_aneg - restart auto-negotiation or write BMCR
+ * __genphy_config_aneg - restart auto-negotiation or write BMCR
  * @phydev: target phy_device struct
+ * @changed: whether autoneg is requested
  *
  * Description: If auto-negotiation is enabled, we configure the
  *   advertising, and then restart auto-negotiation.  If it is not
  *   enabled, then we write the BMCR.
  */
-int genphy_config_aneg(struct phy_device *phydev)
+int __genphy_config_aneg(struct phy_device *phydev, bool changed)
 {
-	int err, changed;
+	int err;
 
-	changed = genphy_config_eee_advert(phydev);
+	if (genphy_config_eee_advert(phydev))
+		changed = true;
 
 	if (AUTONEG_ENABLE != phydev->autoneg)
 		return genphy_setup_forced(phydev);
@@ -1700,10 +1700,10 @@ int genphy_config_aneg(struct phy_device *phydev)
 	err = genphy_config_advert(phydev);
 	if (err < 0) /* error */
 		return err;
+	else if (err)
+		changed = true;
 
-	changed |= err;
-
-	if (changed == 0) {
+	if (!changed) {
 		/* Advertisement hasn't changed, but maybe aneg was never on to
 		 * begin with?  Or maybe phy was isolated?
 		 */
@@ -1713,18 +1713,15 @@ int genphy_config_aneg(struct phy_device *phydev)
 			return ctl;
 
 		if (!(ctl & BMCR_ANENABLE) || (ctl & BMCR_ISOLATE))
-			changed = 1; /* do restart aneg */
+			changed = true; /* do restart aneg */
 	}
 
 	/* Only restart aneg if we are advertising something different
 	 * than we were before.
 	 */
-	if (changed > 0)
-		return genphy_restart_aneg(phydev);
-
-	return 0;
+	return changed ? genphy_restart_aneg(phydev) : 0;
 }
-EXPORT_SYMBOL(genphy_config_aneg);
+EXPORT_SYMBOL(__genphy_config_aneg);
 
 /**
  * genphy_aneg_done - return auto-negotiation status
@@ -1811,8 +1808,6 @@ int genphy_read_status(struct phy_device *phydev)
 	phydev->pause = 0;
 	phydev->asym_pause = 0;
 
-	linkmode_zero(phydev->lp_advertising);
-
 	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
 		if (phydev->is_gigabit_capable) {
 			lpagb = phy_read(phydev, MII_STAT1000);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 462b90b73..7117825ee 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1069,7 +1069,7 @@ int genphy_read_abilities(struct phy_device *phydev);
 int genphy_setup_forced(struct phy_device *phydev);
 int genphy_restart_aneg(struct phy_device *phydev);
 int genphy_config_eee_advert(struct phy_device *phydev);
-int genphy_config_aneg(struct phy_device *phydev);
+int __genphy_config_aneg(struct phy_device *phydev, bool changed);
 int genphy_aneg_done(struct phy_device *phydev);
 int genphy_update_link(struct phy_device *phydev);
 int genphy_read_status(struct phy_device *phydev);
@@ -1077,6 +1077,12 @@ int genphy_suspend(struct phy_device *phydev);
 int genphy_resume(struct phy_device *phydev);
 int genphy_loopback(struct phy_device *phydev, bool enable);
 int genphy_soft_reset(struct phy_device *phydev);
+
+static inline int genphy_config_aneg(struct phy_device *phydev)
+{
+	return __genphy_config_aneg(phydev, false);
+}
+
 static inline int genphy_no_soft_reset(struct phy_device *phydev)
 {
 	return 0;
-- 
2.22.0



^ permalink raw reply related

* [PATCH net-next 3/3] net: phy: realtek: add support for the 2.5Gbps PHY in RTL8125
From: Heiner Kallweit @ 2019-08-08 19:05 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <ddbf28b9-f32e-7399-10a6-27b79ca0aaf9@gmail.com>

This adds support for the integrated 2.5Gbps PHY in Realtek RTL8125.
Advertisement of 2.5Gbps mode is done via a vendor-specific register.
Same applies to reading NBase-T link partner advertisement.
Unfortunately this 2.5Gbps PHY shares the PHY ID with the integrated
1Gbps PHY's in other Realtek network chips and so far no method is
known to differentiate them. This means we can't autodetect 2.5Gbps
support and the network driver has to add 2.5Gbps to the supported
and advertised modes.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/realtek.c | 48 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index a669945eb..35c981476 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -39,6 +39,11 @@
 #define RTL8366RB_POWER_SAVE			0x15
 #define RTL8366RB_POWER_SAVE_ON			BIT(12)
 
+#define RTL_ADV_2500FULL			BIT(7)
+#define RTL_LPADV_10000FULL			BIT(11)
+#define RTL_LPADV_5000FULL			BIT(6)
+#define RTL_LPADV_2500FULL			BIT(5)
+
 MODULE_DESCRIPTION("Realtek PHY driver");
 MODULE_AUTHOR("Johnson Leung");
 MODULE_LICENSE("GPL");
@@ -256,6 +261,47 @@ static int rtl8366rb_config_init(struct phy_device *phydev)
 	return ret;
 }
 
+static int rtlgen_config_aneg(struct phy_device *phydev)
+{
+	int ret = 0;
+
+	if (linkmode_test_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
+	    phydev->supported) && phydev->autoneg == AUTONEG_ENABLE) {
+		u16 adv2500 = 0;
+
+		if (linkmode_test_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
+				      phydev->advertising))
+			adv2500 = RTL_ADV_2500FULL;
+
+		ret = phy_modify_paged_changed(phydev, 0xa5d, 0x12,
+					       RTL_ADV_2500FULL, adv2500);
+		if (ret < 0)
+			return ret;
+	}
+
+	return __genphy_config_aneg(phydev, ret);
+}
+
+static int rtlgen_read_status(struct phy_device *phydev)
+{
+	if (linkmode_test_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
+	    phydev->supported) && phydev->autoneg == AUTONEG_ENABLE) {
+		int lpadv = phy_read_paged(phydev, 0xa5d, 0x13);
+
+		if (lpadv < 0)
+			return lpadv;
+
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_10000baseT_Full_BIT,
+			phydev->lp_advertising, lpadv & RTL_LPADV_10000FULL);
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_5000baseT_Full_BIT,
+			phydev->lp_advertising, lpadv & RTL_LPADV_5000FULL);
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
+			phydev->lp_advertising, lpadv & RTL_LPADV_2500FULL);
+	}
+
+	return genphy_read_status(phydev);
+}
+
 static struct phy_driver realtek_drvs[] = {
 	{
 		PHY_ID_MATCH_EXACT(0x00008201),
@@ -328,6 +374,8 @@ static struct phy_driver realtek_drvs[] = {
 	}, {
 		PHY_ID_MATCH_EXACT(0x001cc800),
 		.name		= "Generic Realtek PHY",
+		.config_aneg	= rtlgen_config_aneg,
+		.read_status	= rtlgen_read_status,
 		.suspend	= genphy_suspend,
 		.resume		= genphy_resume,
 		.read_page	= rtl821x_read_page,
-- 
2.22.0



^ permalink raw reply related

* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Tao Ren @ 2019-08-08 19:02 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Samuel Mendoza-Jonas, David S . Miller,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	openbmc@lists.ozlabs.org, William Kennington, Joel Stanley
In-Reply-To: <20190808133209.GB32706@lunn.ch>

Hi Andrew,

On 8/8/19 6:32 AM, Andrew Lunn wrote:
>> Let me prepare patch v2 using device tree. I'm not sure if standard
>> "mac-address" fits this situation because all we need is an offset
>> (integer) and BMC MAC is calculated by adding the offset to NIC's
>> MAC address. Anyways, let me work out v2 patch we can discuss more
>> then.
> 
> Hi Tao
> 
> I don't know BMC terminology. By NICs MAC address, you are referring
> to the hosts MAC address? The MAC address the big CPU is using for its
> interface?  Where does this NIC get its MAC address from? If the BMCs
> bootloader has access to it, it can set the mac-address property in
> the device tree.

Sorry for the confusion and let me clarify more:

The NIC here refers to the Network controller which provide network connectivity for both BMC (via NC-SI) and Host (for example, via PCIe).

On Facebook Yamp BMC, BMC sends NCSI_OEM_GET_MAC command (as an ethernet packet) to the Network Controller while bringing up eth0, and the (Broadcom) Network Controller replies with the Base MAC Address reserved for the platform. As for Yamp, Base-MAC and Base-MAC+1 are used by Host (big CPU) and Base-MAC+2 are assigned to BMC. In my opinion, Base MAC and MAC address assignments are controlled by Network Controller, which is transparent to both BMC and Host.

I'm not sure if I understand your suggestion correctly: do you mean we should move the logic (GET_MAC from Network Controller, adding offset and configuring BMC MAC) from kernel to boot loader?

Sam posted several ncsi patches for u-boot recently. Sam, do we support the work (implemented in this patch) in uboot? Or are we planning to do so?


Thanks,

Tao

^ permalink raw reply

* [PATCH net-next 0/3] net: phy: realtek: add support for integrated 2.5Gbps PHY in RTL8125
From: Heiner Kallweit @ 2019-08-08 19:02 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

This series adds support for the integrated 2.5Gbps PHY in RTL8125.
First two patches add necessary functionality to phylib.

Heiner Kallweit (3):
  net: phy: prepare phylib to deal with PHY's extending Clause 22
  net: phy: add phy_modify_paged_changed
  net: phy: realtek: add support for the 2.5Gbps PHY in RTL8125

 drivers/net/phy/phy-core.c   | 29 ++++++++++++++++++----
 drivers/net/phy/phy_device.c | 35 +++++++++++---------------
 drivers/net/phy/realtek.c    | 48 ++++++++++++++++++++++++++++++++++++
 include/linux/phy.h          | 10 +++++++-
 4 files changed, 96 insertions(+), 26 deletions(-)

-- 
2.22.0


^ permalink raw reply

* Re: [PATCH net-next v1 1/8] netfilter: inlined four headers files into another one.
From: Jozsef Kadlecsik @ 2019-08-08 18:49 UTC (permalink / raw)
  To: Jeremy Sowden
  Cc: Pablo Neira Ayuso, Netfilter Devel, Net Dev, Masahiro Yamada
In-Reply-To: <20190807141705.4864-2-jeremy@azazel.net>

Hi Jeremy,

On Wed, 7 Aug 2019, Jeremy Sowden wrote:

> linux/netfilter/ipset/ip_set.h included four other header files:
> 
>   include/linux/netfilter/ipset/ip_set_comment.h
>   include/linux/netfilter/ipset/ip_set_counter.h
>   include/linux/netfilter/ipset/ip_set_skbinfo.h
>   include/linux/netfilter/ipset/ip_set_timeout.h
> 
> Of these the first three were not included anywhere else.  The last,
> ip_set_timeout.h, was included in a couple of other places, but defined
> inline functions which call other inline functions defined in ip_set.h,
> so ip_set.h had to be included before it.
> 
> Inlined all four into ip_set.h, and updated the other files that
> included ip_set_timeout.h.
> 
> Signed-off-by: Jeremy Sowden <jeremy@azazel.net>

Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>

Also, for the ipset part of patch 2/8, thank!

Best regards,
Jozsef

> ---
>  include/linux/netfilter/ipset/ip_set.h        | 238 +++++++++++++++++-
>  .../linux/netfilter/ipset/ip_set_comment.h    |  73 ------
>  .../linux/netfilter/ipset/ip_set_counter.h    |  84 -------
>  .../linux/netfilter/ipset/ip_set_skbinfo.h    |  42 ----
>  .../linux/netfilter/ipset/ip_set_timeout.h    |  77 ------
>  net/netfilter/ipset/ip_set_hash_gen.h         |   2 +-
>  net/netfilter/xt_set.c                        |   1 -
>  7 files changed, 235 insertions(+), 282 deletions(-)
>  delete mode 100644 include/linux/netfilter/ipset/ip_set_comment.h
>  delete mode 100644 include/linux/netfilter/ipset/ip_set_counter.h
>  delete mode 100644 include/linux/netfilter/ipset/ip_set_skbinfo.h
>  delete mode 100644 include/linux/netfilter/ipset/ip_set_timeout.h
> 
> diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
> index 12ad9b1853b4..9bc255a8461b 100644
> --- a/include/linux/netfilter/ipset/ip_set.h
> +++ b/include/linux/netfilter/ipset/ip_set.h
> @@ -452,10 +452,240 @@ bitmap_bytes(u32 a, u32 b)
>  	return 4 * ((((b - a + 8) / 8) + 3) / 4);
>  }
>  
> -#include <linux/netfilter/ipset/ip_set_timeout.h>
> -#include <linux/netfilter/ipset/ip_set_comment.h>
> -#include <linux/netfilter/ipset/ip_set_counter.h>
> -#include <linux/netfilter/ipset/ip_set_skbinfo.h>
> +/* How often should the gc be run by default */
> +#define IPSET_GC_TIME			(3 * 60)
> +
> +/* Timeout period depending on the timeout value of the given set */
> +#define IPSET_GC_PERIOD(timeout) \
> +	((timeout/3) ? min_t(u32, (timeout)/3, IPSET_GC_TIME) : 1)
> +
> +/* Entry is set with no timeout value */
> +#define IPSET_ELEM_PERMANENT	0
> +
> +/* Set is defined with timeout support: timeout value may be 0 */
> +#define IPSET_NO_TIMEOUT	UINT_MAX
> +
> +/* Max timeout value, see msecs_to_jiffies() in jiffies.h */
> +#define IPSET_MAX_TIMEOUT	(UINT_MAX >> 1)/MSEC_PER_SEC
> +
> +#define ip_set_adt_opt_timeout(opt, set)	\
> +((opt)->ext.timeout != IPSET_NO_TIMEOUT ? (opt)->ext.timeout : (set)->timeout)
> +
> +static inline unsigned int
> +ip_set_timeout_uget(struct nlattr *tb)
> +{
> +	unsigned int timeout = ip_set_get_h32(tb);
> +
> +	/* Normalize to fit into jiffies */
> +	if (timeout > IPSET_MAX_TIMEOUT)
> +		timeout = IPSET_MAX_TIMEOUT;
> +
> +	return timeout;
> +}
> +
> +static inline bool
> +ip_set_timeout_expired(const unsigned long *t)
> +{
> +	return *t != IPSET_ELEM_PERMANENT && time_is_before_jiffies(*t);
> +}
> +
> +static inline void
> +ip_set_timeout_set(unsigned long *timeout, u32 value)
> +{
> +	unsigned long t;
> +
> +	if (!value) {
> +		*timeout = IPSET_ELEM_PERMANENT;
> +		return;
> +	}
> +
> +	t = msecs_to_jiffies(value * MSEC_PER_SEC) + jiffies;
> +	if (t == IPSET_ELEM_PERMANENT)
> +		/* Bingo! :-) */
> +		t--;
> +	*timeout = t;
> +}
> +
> +static inline u32
> +ip_set_timeout_get(const unsigned long *timeout)
> +{
> +	u32 t;
> +
> +	if (*timeout == IPSET_ELEM_PERMANENT)
> +		return 0;
> +
> +	t = jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC;
> +	/* Zero value in userspace means no timeout */
> +	return t == 0 ? 1 : t;
> +}
> +
> +static inline char*
> +ip_set_comment_uget(struct nlattr *tb)
> +{
> +	return nla_data(tb);
> +}
> +
> +/* Called from uadd only, protected by the set spinlock.
> + * The kadt functions don't use the comment extensions in any way.
> + */
> +static inline void
> +ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
> +		    const struct ip_set_ext *ext)
> +{
> +	struct ip_set_comment_rcu *c = rcu_dereference_protected(comment->c, 1);
> +	size_t len = ext->comment ? strlen(ext->comment) : 0;
> +
> +	if (unlikely(c)) {
> +		set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
> +		kfree_rcu(c, rcu);
> +		rcu_assign_pointer(comment->c, NULL);
> +	}
> +	if (!len)
> +		return;
> +	if (unlikely(len > IPSET_MAX_COMMENT_SIZE))
> +		len = IPSET_MAX_COMMENT_SIZE;
> +	c = kmalloc(sizeof(*c) + len + 1, GFP_ATOMIC);
> +	if (unlikely(!c))
> +		return;
> +	strlcpy(c->str, ext->comment, len + 1);
> +	set->ext_size += sizeof(*c) + strlen(c->str) + 1;
> +	rcu_assign_pointer(comment->c, c);
> +}
> +
> +/* Used only when dumping a set, protected by rcu_read_lock() */
> +static inline int
> +ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment)
> +{
> +	struct ip_set_comment_rcu *c = rcu_dereference(comment->c);
> +
> +	if (!c)
> +		return 0;
> +	return nla_put_string(skb, IPSET_ATTR_COMMENT, c->str);
> +}
> +
> +/* Called from uadd/udel, flush or the garbage collectors protected
> + * by the set spinlock.
> + * Called when the set is destroyed and when there can't be any user
> + * of the set data anymore.
> + */
> +static inline void
> +ip_set_comment_free(struct ip_set *set, struct ip_set_comment *comment)
> +{
> +	struct ip_set_comment_rcu *c;
> +
> +	c = rcu_dereference_protected(comment->c, 1);
> +	if (unlikely(!c))
> +		return;
> +	set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
> +	kfree_rcu(c, rcu);
> +	rcu_assign_pointer(comment->c, NULL);
> +}
> +
> +static inline void
> +ip_set_add_bytes(u64 bytes, struct ip_set_counter *counter)
> +{
> +	atomic64_add((long long)bytes, &(counter)->bytes);
> +}
> +
> +static inline void
> +ip_set_add_packets(u64 packets, struct ip_set_counter *counter)
> +{
> +	atomic64_add((long long)packets, &(counter)->packets);
> +}
> +
> +static inline u64
> +ip_set_get_bytes(const struct ip_set_counter *counter)
> +{
> +	return (u64)atomic64_read(&(counter)->bytes);
> +}
> +
> +static inline u64
> +ip_set_get_packets(const struct ip_set_counter *counter)
> +{
> +	return (u64)atomic64_read(&(counter)->packets);
> +}
> +
> +static inline bool
> +ip_set_match_counter(u64 counter, u64 match, u8 op)
> +{
> +	switch (op) {
> +	case IPSET_COUNTER_NONE:
> +		return true;
> +	case IPSET_COUNTER_EQ:
> +		return counter == match;
> +	case IPSET_COUNTER_NE:
> +		return counter != match;
> +	case IPSET_COUNTER_LT:
> +		return counter < match;
> +	case IPSET_COUNTER_GT:
> +		return counter > match;
> +	}
> +	return false;
> +}
> +
> +static inline void
> +ip_set_update_counter(struct ip_set_counter *counter,
> +		      const struct ip_set_ext *ext, u32 flags)
> +{
> +	if (ext->packets != ULLONG_MAX &&
> +	    !(flags & IPSET_FLAG_SKIP_COUNTER_UPDATE)) {
> +		ip_set_add_bytes(ext->bytes, counter);
> +		ip_set_add_packets(ext->packets, counter);
> +	}
> +}
> +
> +static inline bool
> +ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
> +{
> +	return nla_put_net64(skb, IPSET_ATTR_BYTES,
> +			     cpu_to_be64(ip_set_get_bytes(counter)),
> +			     IPSET_ATTR_PAD) ||
> +	       nla_put_net64(skb, IPSET_ATTR_PACKETS,
> +			     cpu_to_be64(ip_set_get_packets(counter)),
> +			     IPSET_ATTR_PAD);
> +}
> +
> +static inline void
> +ip_set_init_counter(struct ip_set_counter *counter,
> +		    const struct ip_set_ext *ext)
> +{
> +	if (ext->bytes != ULLONG_MAX)
> +		atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
> +	if (ext->packets != ULLONG_MAX)
> +		atomic64_set(&(counter)->packets, (long long)(ext->packets));
> +}
> +
> +static inline void
> +ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
> +		   const struct ip_set_ext *ext,
> +		   struct ip_set_ext *mext, u32 flags)
> +{
> +	mext->skbinfo = *skbinfo;
> +}
> +
> +static inline bool
> +ip_set_put_skbinfo(struct sk_buff *skb, const struct ip_set_skbinfo *skbinfo)
> +{
> +	/* Send nonzero parameters only */
> +	return ((skbinfo->skbmark || skbinfo->skbmarkmask) &&
> +		nla_put_net64(skb, IPSET_ATTR_SKBMARK,
> +			      cpu_to_be64((u64)skbinfo->skbmark << 32 |
> +					  skbinfo->skbmarkmask),
> +			      IPSET_ATTR_PAD)) ||
> +	       (skbinfo->skbprio &&
> +		nla_put_net32(skb, IPSET_ATTR_SKBPRIO,
> +			      cpu_to_be32(skbinfo->skbprio))) ||
> +	       (skbinfo->skbqueue &&
> +		nla_put_net16(skb, IPSET_ATTR_SKBQUEUE,
> +			      cpu_to_be16(skbinfo->skbqueue)));
> +}
> +
> +static inline void
> +ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
> +		    const struct ip_set_ext *ext)
> +{
> +	*skbinfo = ext->skbinfo;
> +}
>  
>  #define IP_SET_INIT_KEXT(skb, opt, set)			\
>  	{ .bytes = (skb)->len, .packets = 1,		\
> diff --git a/include/linux/netfilter/ipset/ip_set_comment.h b/include/linux/netfilter/ipset/ip_set_comment.h
> deleted file mode 100644
> index 0b894d81bbf2..000000000000
> --- a/include/linux/netfilter/ipset/ip_set_comment.h
> +++ /dev/null
> @@ -1,73 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -#ifndef _IP_SET_COMMENT_H
> -#define _IP_SET_COMMENT_H
> -
> -/* Copyright (C) 2013 Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
> - */
> -
> -#ifdef __KERNEL__
> -
> -static inline char*
> -ip_set_comment_uget(struct nlattr *tb)
> -{
> -	return nla_data(tb);
> -}
> -
> -/* Called from uadd only, protected by the set spinlock.
> - * The kadt functions don't use the comment extensions in any way.
> - */
> -static inline void
> -ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
> -		    const struct ip_set_ext *ext)
> -{
> -	struct ip_set_comment_rcu *c = rcu_dereference_protected(comment->c, 1);
> -	size_t len = ext->comment ? strlen(ext->comment) : 0;
> -
> -	if (unlikely(c)) {
> -		set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
> -		kfree_rcu(c, rcu);
> -		rcu_assign_pointer(comment->c, NULL);
> -	}
> -	if (!len)
> -		return;
> -	if (unlikely(len > IPSET_MAX_COMMENT_SIZE))
> -		len = IPSET_MAX_COMMENT_SIZE;
> -	c = kmalloc(sizeof(*c) + len + 1, GFP_ATOMIC);
> -	if (unlikely(!c))
> -		return;
> -	strlcpy(c->str, ext->comment, len + 1);
> -	set->ext_size += sizeof(*c) + strlen(c->str) + 1;
> -	rcu_assign_pointer(comment->c, c);
> -}
> -
> -/* Used only when dumping a set, protected by rcu_read_lock() */
> -static inline int
> -ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment)
> -{
> -	struct ip_set_comment_rcu *c = rcu_dereference(comment->c);
> -
> -	if (!c)
> -		return 0;
> -	return nla_put_string(skb, IPSET_ATTR_COMMENT, c->str);
> -}
> -
> -/* Called from uadd/udel, flush or the garbage collectors protected
> - * by the set spinlock.
> - * Called when the set is destroyed and when there can't be any user
> - * of the set data anymore.
> - */
> -static inline void
> -ip_set_comment_free(struct ip_set *set, struct ip_set_comment *comment)
> -{
> -	struct ip_set_comment_rcu *c;
> -
> -	c = rcu_dereference_protected(comment->c, 1);
> -	if (unlikely(!c))
> -		return;
> -	set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
> -	kfree_rcu(c, rcu);
> -	rcu_assign_pointer(comment->c, NULL);
> -}
> -
> -#endif
> -#endif
> diff --git a/include/linux/netfilter/ipset/ip_set_counter.h b/include/linux/netfilter/ipset/ip_set_counter.h
> deleted file mode 100644
> index 3400958c07be..000000000000
> --- a/include/linux/netfilter/ipset/ip_set_counter.h
> +++ /dev/null
> @@ -1,84 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -#ifndef _IP_SET_COUNTER_H
> -#define _IP_SET_COUNTER_H
> -
> -/* Copyright (C) 2015 Jozsef Kadlecsik <kadlec@netfilter.org> */
> -
> -#ifdef __KERNEL__
> -
> -static inline void
> -ip_set_add_bytes(u64 bytes, struct ip_set_counter *counter)
> -{
> -	atomic64_add((long long)bytes, &(counter)->bytes);
> -}
> -
> -static inline void
> -ip_set_add_packets(u64 packets, struct ip_set_counter *counter)
> -{
> -	atomic64_add((long long)packets, &(counter)->packets);
> -}
> -
> -static inline u64
> -ip_set_get_bytes(const struct ip_set_counter *counter)
> -{
> -	return (u64)atomic64_read(&(counter)->bytes);
> -}
> -
> -static inline u64
> -ip_set_get_packets(const struct ip_set_counter *counter)
> -{
> -	return (u64)atomic64_read(&(counter)->packets);
> -}
> -
> -static inline bool
> -ip_set_match_counter(u64 counter, u64 match, u8 op)
> -{
> -	switch (op) {
> -	case IPSET_COUNTER_NONE:
> -		return true;
> -	case IPSET_COUNTER_EQ:
> -		return counter == match;
> -	case IPSET_COUNTER_NE:
> -		return counter != match;
> -	case IPSET_COUNTER_LT:
> -		return counter < match;
> -	case IPSET_COUNTER_GT:
> -		return counter > match;
> -	}
> -	return false;
> -}
> -
> -static inline void
> -ip_set_update_counter(struct ip_set_counter *counter,
> -		      const struct ip_set_ext *ext, u32 flags)
> -{
> -	if (ext->packets != ULLONG_MAX &&
> -	    !(flags & IPSET_FLAG_SKIP_COUNTER_UPDATE)) {
> -		ip_set_add_bytes(ext->bytes, counter);
> -		ip_set_add_packets(ext->packets, counter);
> -	}
> -}
> -
> -static inline bool
> -ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
> -{
> -	return nla_put_net64(skb, IPSET_ATTR_BYTES,
> -			     cpu_to_be64(ip_set_get_bytes(counter)),
> -			     IPSET_ATTR_PAD) ||
> -	       nla_put_net64(skb, IPSET_ATTR_PACKETS,
> -			     cpu_to_be64(ip_set_get_packets(counter)),
> -			     IPSET_ATTR_PAD);
> -}
> -
> -static inline void
> -ip_set_init_counter(struct ip_set_counter *counter,
> -		    const struct ip_set_ext *ext)
> -{
> -	if (ext->bytes != ULLONG_MAX)
> -		atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
> -	if (ext->packets != ULLONG_MAX)
> -		atomic64_set(&(counter)->packets, (long long)(ext->packets));
> -}
> -
> -#endif /* __KERNEL__ */
> -#endif /* _IP_SET_COUNTER_H */
> diff --git a/include/linux/netfilter/ipset/ip_set_skbinfo.h b/include/linux/netfilter/ipset/ip_set_skbinfo.h
> deleted file mode 100644
> index 3a2df02dbd55..000000000000
> --- a/include/linux/netfilter/ipset/ip_set_skbinfo.h
> +++ /dev/null
> @@ -1,42 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -#ifndef _IP_SET_SKBINFO_H
> -#define _IP_SET_SKBINFO_H
> -
> -/* Copyright (C) 2015 Jozsef Kadlecsik <kadlec@netfilter.org> */
> -
> -#ifdef __KERNEL__
> -
> -static inline void
> -ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
> -		   const struct ip_set_ext *ext,
> -		   struct ip_set_ext *mext, u32 flags)
> -{
> -	mext->skbinfo = *skbinfo;
> -}
> -
> -static inline bool
> -ip_set_put_skbinfo(struct sk_buff *skb, const struct ip_set_skbinfo *skbinfo)
> -{
> -	/* Send nonzero parameters only */
> -	return ((skbinfo->skbmark || skbinfo->skbmarkmask) &&
> -		nla_put_net64(skb, IPSET_ATTR_SKBMARK,
> -			      cpu_to_be64((u64)skbinfo->skbmark << 32 |
> -					  skbinfo->skbmarkmask),
> -			      IPSET_ATTR_PAD)) ||
> -	       (skbinfo->skbprio &&
> -		nla_put_net32(skb, IPSET_ATTR_SKBPRIO,
> -			      cpu_to_be32(skbinfo->skbprio))) ||
> -	       (skbinfo->skbqueue &&
> -		nla_put_net16(skb, IPSET_ATTR_SKBQUEUE,
> -			      cpu_to_be16(skbinfo->skbqueue)));
> -}
> -
> -static inline void
> -ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
> -		    const struct ip_set_ext *ext)
> -{
> -	*skbinfo = ext->skbinfo;
> -}
> -
> -#endif /* __KERNEL__ */
> -#endif /* _IP_SET_SKBINFO_H */
> diff --git a/include/linux/netfilter/ipset/ip_set_timeout.h b/include/linux/netfilter/ipset/ip_set_timeout.h
> deleted file mode 100644
> index 2be60e379ecf..000000000000
> --- a/include/linux/netfilter/ipset/ip_set_timeout.h
> +++ /dev/null
> @@ -1,77 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -#ifndef _IP_SET_TIMEOUT_H
> -#define _IP_SET_TIMEOUT_H
> -
> -/* Copyright (C) 2003-2013 Jozsef Kadlecsik <kadlec@netfilter.org> */
> -
> -#ifdef __KERNEL__
> -
> -/* How often should the gc be run by default */
> -#define IPSET_GC_TIME			(3 * 60)
> -
> -/* Timeout period depending on the timeout value of the given set */
> -#define IPSET_GC_PERIOD(timeout) \
> -	((timeout/3) ? min_t(u32, (timeout)/3, IPSET_GC_TIME) : 1)
> -
> -/* Entry is set with no timeout value */
> -#define IPSET_ELEM_PERMANENT	0
> -
> -/* Set is defined with timeout support: timeout value may be 0 */
> -#define IPSET_NO_TIMEOUT	UINT_MAX
> -
> -/* Max timeout value, see msecs_to_jiffies() in jiffies.h */
> -#define IPSET_MAX_TIMEOUT	(UINT_MAX >> 1)/MSEC_PER_SEC
> -
> -#define ip_set_adt_opt_timeout(opt, set)	\
> -((opt)->ext.timeout != IPSET_NO_TIMEOUT ? (opt)->ext.timeout : (set)->timeout)
> -
> -static inline unsigned int
> -ip_set_timeout_uget(struct nlattr *tb)
> -{
> -	unsigned int timeout = ip_set_get_h32(tb);
> -
> -	/* Normalize to fit into jiffies */
> -	if (timeout > IPSET_MAX_TIMEOUT)
> -		timeout = IPSET_MAX_TIMEOUT;
> -
> -	return timeout;
> -}
> -
> -static inline bool
> -ip_set_timeout_expired(const unsigned long *t)
> -{
> -	return *t != IPSET_ELEM_PERMANENT && time_is_before_jiffies(*t);
> -}
> -
> -static inline void
> -ip_set_timeout_set(unsigned long *timeout, u32 value)
> -{
> -	unsigned long t;
> -
> -	if (!value) {
> -		*timeout = IPSET_ELEM_PERMANENT;
> -		return;
> -	}
> -
> -	t = msecs_to_jiffies(value * MSEC_PER_SEC) + jiffies;
> -	if (t == IPSET_ELEM_PERMANENT)
> -		/* Bingo! :-) */
> -		t--;
> -	*timeout = t;
> -}
> -
> -static inline u32
> -ip_set_timeout_get(const unsigned long *timeout)
> -{
> -	u32 t;
> -
> -	if (*timeout == IPSET_ELEM_PERMANENT)
> -		return 0;
> -
> -	t = jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC;
> -	/* Zero value in userspace means no timeout */
> -	return t == 0 ? 1 : t;
> -}
> -
> -#endif	/* __KERNEL__ */
> -#endif /* _IP_SET_TIMEOUT_H */
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index 0feb77fa9edc..2e541cb3b37d 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -7,7 +7,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/jhash.h>
>  #include <linux/types.h>
> -#include <linux/netfilter/ipset/ip_set_timeout.h>
> +#include <linux/netfilter/ipset/ip_set.h>
>  
>  #define __ipset_dereference_protected(p, c)	rcu_dereference_protected(p, c)
>  #define ipset_dereference_protected(p, set) \
> diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c
> index ecbfa291fb70..731bc2cafae4 100644
> --- a/net/netfilter/xt_set.c
> +++ b/net/netfilter/xt_set.c
> @@ -14,7 +14,6 @@
>  
>  #include <linux/netfilter/x_tables.h>
>  #include <linux/netfilter/ipset/ip_set.h>
> -#include <linux/netfilter/ipset/ip_set_timeout.h>
>  #include <uapi/linux/netfilter/xt_set.h>
>  
>  MODULE_LICENSE("GPL");
> -- 
> 2.20.1
> 
> 

-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: [PATCH bpf 2/2] tools: bpftool: add error message on pin failure
From: Andrii Nakryiko @ 2019-08-08 18:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf, oss-drivers,
	Andy Lutomirski, Quentin Monnet
In-Reply-To: <20190807001923.19483-3-jakub.kicinski@netronome.com>

On Tue, Aug 6, 2019 at 5:21 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> No error message is currently printed if the pin syscall
> itself fails. It got lost in the loadall refactoring.
>
> Fixes: 77380998d91d ("bpftool: add loadall command")
> Reported-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---

Acked-by: Andrii Nakryiko <andriin@fb.com>

> CC: luto@kernel.org, sdf@google.com
>
>  tools/bpf/bpftool/common.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index c52a6ffb8949..6a71324be628 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -204,7 +204,11 @@ int do_pin_fd(int fd, const char *name)
>         if (err)
>                 return err;
>
> -       return bpf_obj_pin(fd, name);
> +       err = bpf_obj_pin(fd, name);
> +       if (err)
> +               p_err("can't pin the object (%s): %s", name, strerror(errno));
> +
> +       return err;
>  }
>
>  int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
> --
> 2.21.0
>

^ permalink raw reply

* Re: [PATCH] liquidio: Use pcie_flr() instead of reimplementing it
From: Bjorn Helgaas @ 2019-08-08 18:48 UTC (permalink / raw)
  To: Denis Efremov
  Cc: Bjorn Helgaas, Derek Chickles, Satanand Burla, Felix Manlunas,
	netdev, linux-pci, linux-kernel
In-Reply-To: <20190808045753.5474-1-efremov@linux.com>

On Thu, Aug 08, 2019 at 07:57:53AM +0300, Denis Efremov wrote:
> octeon_mbox_process_cmd() directly writes the PCI_EXP_DEVCTL_BCR_FLR
> bit, which bypasses timing requirements imposed by the PCIe spec.
> This patch fixes the function to use the pcie_flr() interface instead.
> 
> Signed-off-by: Denis Efremov <efremov@linux.com>

Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>

Thanks for doing this, Denis.  When possible it's better to use a PCI
core interface than to fiddle with PCI config space directly from a
driver.

> ---
>  drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> index 021d99cd1665..614d07be7181 100644
> --- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> +++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
> @@ -260,9 +260,7 @@ static int octeon_mbox_process_cmd(struct octeon_mbox *mbox,
>  		dev_info(&oct->pci_dev->dev,
>  			 "got a request for FLR from VF that owns DPI ring %u\n",
>  			 mbox->q_no);
> -		pcie_capability_set_word(
> -			oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no],
> -			PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
> +		pcie_flr(oct->sriov_info.dpiring_to_vfpcidev_lut[mbox->q_no]);
>  		break;
>  
>  	case OCTEON_PF_CHANGED_VF_MACADDR:
> -- 
> 2.21.0
> 

^ permalink raw reply

* Re: [PATCH 1/1] bpf: introduce new helper udp_flow_src_port
From: Andrii Nakryiko @ 2019-08-08 18:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Y Song, Alexei Starovoitov, Farid Zakaria, Daniel Borkmann,
	netdev, bpf
In-Reply-To: <20190805171036.5a5bf790@cakuba.netronome.com>

On Mon, Aug 5, 2019 at 5:11 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Sat, 3 Aug 2019 23:52:16 -0700, Y Song wrote:
> > >  include/uapi/linux/bpf.h                      | 21 +++++++--
> > >  net/core/filter.c                             | 20 ++++++++
> > >  tools/include/uapi/linux/bpf.h                | 21 +++++++--
> > >  tools/testing/selftests/bpf/bpf_helpers.h     |  2 +
> > >  .../bpf/prog_tests/udp_flow_src_port.c        | 28 +++++++++++
> > >  .../bpf/progs/test_udp_flow_src_port_kern.c   | 47 +++++++++++++++++++
> > >  6 files changed, 131 insertions(+), 8 deletions(-)
> > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/udp_flow_src_port.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_udp_flow_src_port_kern.c
> >
> > First, for each review, backport and sync with libbpf repo, in the future,
> > could you break the patch to two patches?
> >    1. kernel changes (net/core/filter.c, include/uapi/linux/bpf.h)
> >    2. tools/include/uapi/linux/bpf.h
> >    3. tools/testing/ changes
>
> A lot of people get caught off by this, could explain why this is
> necessary?

We are using script [0] to sync libbpf sources from linux repo to
Github. It does a lot of things to make this happen, given that Github
structure is not a simple copy/move into subdirectory. Instead it does
a bunch of cherry-picking and tree rewrites, so when there are patches
that touched both libbpf sources (including those tools/include/...
files) and some sources that we don't sync (e.g., just include/...),
then script/git gets confused which breaks the flow and requires more
manual work. Which is why we are asking to split those changes. Hope
this helps to clarify.

  [0] https://github.com/libbpf/libbpf/blob/master/scripts/sync-kernel.sh

>
> git can deal with this scenario without missing a step, format-patch
> takes paths:
>
> $ git show --oneline -s
> 1002f3e955d7 (HEAD) bpf: introduce new helper udp_flow_src_port
>
> $ git format-patch HEAD~ -- tools/include/uapi/linux/bpf.h
> 0001-bpf-introduce-new-helper-udp_flow_src_port.patch
>
> $ grep -B1 changed 0001-bpf-introduce-new-helper-udp_flow_src_port.patch
>  tools/include/uapi/linux/bpf.h | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> $ cd ../libbpf
> $ git am -p2 ../linux/0001-bpf-introduce-new-helper-udp_flow_src_port.patch
> Applying: bpf: introduce new helper udp_flow_src_port
> error: patch failed: include/uapi/linux/bpf.h:2853
> error: include/uapi/linux/bpf.h: patch does not apply
> ...
>
> Well, the patch doesn't apply to libbpf right now, but git finds the
> right paths and all that.
>
> IMO it'd be good to not have this artificial process obstacle and all
> the "sync headers" commits in the tree.

It might be the case that script can be written in some different way
to bypass this limitation, but someone has to dedicate time to write
it and test it. Feel free to contribute.

^ permalink raw reply

* Re: [PATCH net-next 5/5] r8152: change rx_frag_head_sz and rx_max_agg_num dynamically
From: Jakub Kicinski @ 2019-08-08 18:43 UTC (permalink / raw)
  To: Hayes Wang
  Cc: Maciej Fijalkowski, netdev@vger.kernel.org, nic_swsd,
	linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2F18D0F3F@RTITMBSVM03.realtek.com.tw>

On Thu, 8 Aug 2019 12:16:50 +0000, Hayes Wang wrote:
> Maciej Fijalkowski [mailto:maciejromanfijalkowski@gmail.com]
> > Sent: Thursday, August 08, 2019 7:50 PM  
> > > Excuse me again.
> > > I find the kernel supports the copybreak of Ethtool.
> > > However, I couldn't find a command of Ethtool to use it.  
> > 
> > Ummm there's set_tunable ops. Amazon's ena driver is making use of it from
> > what
> > I see. Look at ena_set_tunable() in
> > drivers/net/ethernet/amazon/ena/ena_ethtool.c.  
> 
> The kernel could support it. And I has finished it.
> However, when I want to test it by ethtool, I couldn't find suitable command.
> I couldn't find relative feature in the source code of ethtool, either.

It's possible it's not implemented in the user space tool 🤔

Looks like it got posted here:

https://www.spinics.net/lists/netdev/msg299877.html

But perhaps never finished? 

It should be fairly straightforward to implement by looking at how
phy-tunables are handled.

^ permalink raw reply

* RE: [PATCH v3 1/1] ixgbe: sync the first fragment unconditionally
From: Bowers, AndrewX @ 2019-08-08 18:42 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <20190808040312.21719-1-firo.yang@suse.com>

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Firo Yang
> Sent: Wednesday, August 7, 2019 9:04 PM
> To: netdev@vger.kernel.org
> Cc: maciejromanfijalkowski@gmail.com; Firo Yang <firo.yang@suse.com>;
> linux-kernel@vger.kernel.org; intel-wired-lan@lists.osuosl.org;
> jian.w.wen@oracle.com; alexander.h.duyck@linux.intel.com;
> davem@davemloft.net
> Subject: [Intel-wired-lan] [PATCH v3 1/1] ixgbe: sync the first fragment
> unconditionally
> 
> In Xen environment, if Xen-swiotlb is enabled, ixgbe driver could possibly
> allocate a page, DMA memory buffer, for the first fragment which is not
> suitable for Xen-swiotlb to do DMA operations.
> Xen-swiotlb have to internally allocate another page for doing DMA
> operations. This mechanism requires syncing the data from the internal page
> to the page which ixgbe sends to upper network stack. However, since
> commit f3213d932173 ("ixgbe: Update driver to make use of DMA attributes
> in Rx path"), the unmap operation is performed with
> DMA_ATTR_SKIP_CPU_SYNC. As a result, the sync is not performed.
> Since the sync isn't performed, the upper network stack could receive a
> incomplete network packet. By incomplete, it means the linear data on the
> first fragment(between skb->head and skb->end) is invalid. So we have to
> copy the data from the internal xen-swiotlb page to the page which ixgbe
> sends to upper network stack through the sync operation.
> 
> More details from Alexander Duyck:
> Specifically since we are mapping the frame with
> DMA_ATTR_SKIP_CPU_SYNC we have to unmap with that as well. As a result
> a sync is not performed on an unmap and must be done manually as we
> skipped it for the first frag. As such we need to always sync before possibly
> performing a page unmap operation.
> 
> Fixes: f3213d932173 ("ixgbe: Update driver to make use of DMA attributes in
> Rx path")
> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Signed-off-by: Firo Yang <firo.yang@suse.com>
> ---
> Changes from v2:
>  * Added details on the problem caused by skipping the sync.
>  * Added more explanation from Alexander Duyck.
> 
> Changes from v1:
>  * Imporved the patch description.
>  * Added Reviewed-by: and Fixes: as suggested by Alexander Duyck.
> 
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply

* Re: [PATCH net-next] taprio: remove unused variable 'entry_list_policy'
From: David Miller @ 2019-08-08 18:38 UTC (permalink / raw)
  To: yuehaibing
  Cc: jhs, xiyou.wangcong, jiri, vinicius.gomes, linux-kernel, netdev
In-Reply-To: <20190808142623.69188-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Thu, 8 Aug 2019 22:26:23 +0800

> net/sched/sch_taprio.c:680:32: warning:
>  entry_list_policy defined but not used [-Wunused-const-variable=]
> 
> It is not used since commit a3d43c0d56f1 ("taprio: Add
> support adding an admin schedule")
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

This is probably unintentional and a bug, we should be using that
policy value to validate that the sched list is indeed a nested
attribute.

I'm not applying this without at least a better and clear commit
message explaining why we shouldn't be using this policy any more.

^ permalink raw reply

* Re: [PATCH 0/4] pull request for net-next: batman-adv 2019-08-08
From: David Miller @ 2019-08-08 18:29 UTC (permalink / raw)
  To: sw; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20190808130619.4481-1-sw@simonwunderlich.de>

From: Simon Wunderlich <sw@simonwunderlich.de>
Date: Thu,  8 Aug 2019 15:06:15 +0200

> here is a small feature/cleanup pull request of batman-adv to go into net-next.
> 
> Please pull or let me know of any problem!

Pulled, thanks.

That lockdep annotation in the 4th patch really helped with the review.

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] bpf: support cloning sk storage on accept()
From: Martin Lau @ 2019-08-08 18:27 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org,
	davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net
In-Reply-To: <20190808152830.GC2820@mini-arch>

On Thu, Aug 08, 2019 at 08:28:30AM -0700, Stanislav Fomichev wrote:
> On 08/08, Martin Lau wrote:
> > On Wed, Aug 07, 2019 at 08:47:18AM -0700, Stanislav Fomichev wrote:
> > > Add new helper bpf_sk_storage_clone which optionally clones sk storage
> > > and call it from bpf_sk_storage_clone. Reuse the gap in
> > > bpf_sk_storage_elem to store clone/non-clone flag.
> > > 
> > > Cc: Martin KaFai Lau <kafai@fb.com>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  include/net/bpf_sk_storage.h |  10 ++++
> > >  include/uapi/linux/bpf.h     |   1 +
> > >  net/core/bpf_sk_storage.c    | 102 +++++++++++++++++++++++++++++++++--
> > >  net/core/sock.c              |   9 ++--
> > >  4 files changed, 115 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
> > > index b9dcb02e756b..8e4f831d2e52 100644
> > > --- a/include/net/bpf_sk_storage.h
> > > +++ b/include/net/bpf_sk_storage.h
> > > @@ -10,4 +10,14 @@ void bpf_sk_storage_free(struct sock *sk);
> > >  extern const struct bpf_func_proto bpf_sk_storage_get_proto;
> > >  extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
> > >  
> > > +#ifdef CONFIG_BPF_SYSCALL
> > > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
> > > +#else
> > > +static inline int bpf_sk_storage_clone(const struct sock *sk,
> > > +				       struct sock *newsk)
> > > +{
> > > +	return 0;
> > > +}
> > > +#endif
> > > +
> > >  #endif /* _BPF_SK_STORAGE_H */
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 4393bd4b2419..00459ca4c8cf 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -2931,6 +2931,7 @@ enum bpf_func_id {
> > >  
> > >  /* BPF_FUNC_sk_storage_get flags */
> > >  #define BPF_SK_STORAGE_GET_F_CREATE	(1ULL << 0)
> > > +#define BPF_SK_STORAGE_GET_F_CLONE	(1ULL << 1)
> > It is only used in bpf_sk_storage_get().
> > What if the elem is created from bpf_fd_sk_storage_update_elem()
> > i.e. from the syscall API ?
> > 
> > What may be the use case for a map to have both CLONE and non-CLONE
> > elements?  If it is not the case, would it be better to add
> > BPF_F_CLONE to bpf_attr->map_flags?
> I didn't think about putting it on the map itself since the API
> is on a per-element, but it does make sense. I can't come up
> with a use-case for a per-element selective clone/non-clone.
> Thanks, will move to the map itself.
> 
> > >  
> > >  /* Mode for BPF_FUNC_skb_adjust_room helper. */
> > >  enum bpf_adj_room_mode {
> > > diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> > > index 94c7f77ecb6b..b6dea67965bc 100644
> > > --- a/net/core/bpf_sk_storage.c
> > > +++ b/net/core/bpf_sk_storage.c
> > > @@ -12,6 +12,9 @@
> > >  
> > >  static atomic_t cache_idx;
> > >  
> > > +#define BPF_SK_STORAGE_GET_F_MASK	(BPF_SK_STORAGE_GET_F_CREATE | \
> > > +					 BPF_SK_STORAGE_GET_F_CLONE)
> > > +
> > >  struct bucket {
> > >  	struct hlist_head list;
> > >  	raw_spinlock_t lock;
> > > @@ -66,7 +69,8 @@ struct bpf_sk_storage_elem {
> > >  	struct hlist_node snode;	/* Linked to bpf_sk_storage */
> > >  	struct bpf_sk_storage __rcu *sk_storage;
> > >  	struct rcu_head rcu;
> > > -	/* 8 bytes hole */
> > > +	u8 clone:1;
> > > +	/* 7 bytes hole */
> > >  	/* The data is stored in aother cacheline to minimize
> > >  	 * the number of cachelines access during a cache hit.
> > >  	 */
> > > @@ -509,7 +513,7 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
> > >  	return 0;
> > >  }
> > >  
> > > -/* Called by __sk_destruct() */
> > > +/* Called by __sk_destruct() & bpf_sk_storage_clone() */
> > >  void bpf_sk_storage_free(struct sock *sk)
> > >  {
> > >  	struct bpf_sk_storage_elem *selem;
> > > @@ -739,19 +743,106 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
> > >  	return err;
> > >  }
> > >  
> > > +static struct bpf_sk_storage_elem *
> > > +bpf_sk_storage_clone_elem(struct sock *newsk,
> > > +			  struct bpf_sk_storage_map *smap,
> > > +			  struct bpf_sk_storage_elem *selem)
> > > +{
> > > +	struct bpf_sk_storage_elem *copy_selem;
> > > +
> > > +	copy_selem = selem_alloc(smap, newsk, NULL, true);
> > > +	if (!copy_selem)
> > > +		return ERR_PTR(-ENOMEM);
> > nit.
> > may be just return NULL as selem_alloc() does.
> Sounds good.
> 
> > > +
> > > +	if (map_value_has_spin_lock(&smap->map))
> > > +		copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
> > > +				      SDATA(selem)->data, true);
> > > +	else
> > > +		copy_map_value(&smap->map, SDATA(copy_selem)->data,
> > > +			       SDATA(selem)->data);
> > > +
> > > +	return copy_selem;
> > > +}
> > > +
> > > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
> > > +{
> > > +	struct bpf_sk_storage *new_sk_storage = NULL;
> > > +	struct bpf_sk_storage *sk_storage;
> > > +	struct bpf_sk_storage_elem *selem;
> > > +	int ret;
> > > +
> > > +	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > > +
> > > +	rcu_read_lock();
> > > +	sk_storage = rcu_dereference(sk->sk_bpf_storage);
> > > +
> > > +	if (!sk_storage || hlist_empty(&sk_storage->list))
> > > +		goto out;
> > > +
> > > +	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> > > +		struct bpf_sk_storage_map *smap;
> > > +		struct bpf_sk_storage_elem *copy_selem;
> > > +
> > > +		if (!selem->clone)
> > > +			continue;
> > > +
> > > +		smap = rcu_dereference(SDATA(selem)->smap);
> > > +		if (!smap)
> > smap should not be NULL.
> I see; you never set it back to NULL and we are guaranteed that the
> map is still around due to rcu. Removed.
> 
> > > +			continue;
> > > +
> > > +		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
> > > +		if (IS_ERR(copy_selem)) {
> > > +			ret = PTR_ERR(copy_selem);
> > > +			goto err;
> > > +		}
> > > +
> > > +		if (!new_sk_storage) {
> > > +			ret = sk_storage_alloc(newsk, smap, copy_selem);
> > > +			if (ret) {
> > > +				kfree(copy_selem);
> > > +				atomic_sub(smap->elem_size,
> > > +					   &newsk->sk_omem_alloc);
> > > +				goto err;
> > > +			}
> > > +
> > > +			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
> > > +			continue;
> > > +		}
> > > +
> > > +		raw_spin_lock_bh(&new_sk_storage->lock);
> > > +		selem_link_map(smap, copy_selem);
> > Unlike the existing selem-update use-cases in bpf_sk_storage.c,
> > the smap->map.refcnt has not been held here.  Reading the smap
> > is fine.  However, adding a new selem to a deleting smap is an issue.
> > Hence, I think bpf_map_inc_not_zero() should be done first.
> In this case, I should probably do it after smap = rcu_deref()?
Right.

and bpf_map_put should be called when done.  Becasue of bpf_map_put,
it may be a good idea to add a comment to the first synchronize_rcu()
in bpf_sk_storage_map_free() since this new bpf_sk_storage_clone()
also depends on it now,
which makes it different from other bpf maps.

> 
> > > +		__selem_link_sk(new_sk_storage, copy_selem);
> > > +		raw_spin_unlock_bh(&new_sk_storage->lock);
> > > +	}
> > > +
> > > +out:
> > > +	rcu_read_unlock();
> > > +	return 0;
> > > +
> > > +err:
> > > +	rcu_read_unlock();
> > > +
> > > +	bpf_sk_storage_free(newsk);
> > > +	return ret;
> > > +}
> > > +
> > >  BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
> > >  	   void *, value, u64, flags)
> > >  {
> > >  	struct bpf_sk_storage_data *sdata;
> > >  
> > > -	if (flags > BPF_SK_STORAGE_GET_F_CREATE)
> > > +	if (flags & ~BPF_SK_STORAGE_GET_F_MASK)
> > > +		return (unsigned long)NULL;
> > > +
> > > +	if ((flags & BPF_SK_STORAGE_GET_F_CLONE) &&
> > > +	    !(flags & BPF_SK_STORAGE_GET_F_CREATE))
> > >  		return (unsigned long)NULL;
> > >  
> > >  	sdata = sk_storage_lookup(sk, map, true);
> > >  	if (sdata)
> > >  		return (unsigned long)sdata->data;
> > >  
> > > -	if (flags == BPF_SK_STORAGE_GET_F_CREATE &&
> > > +	if ((flags & BPF_SK_STORAGE_GET_F_CREATE) &&
> > >  	    /* Cannot add new elem to a going away sk.
> > >  	     * Otherwise, the new elem may become a leak
> > >  	     * (and also other memory issues during map
> > > @@ -762,6 +853,9 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
> > >  		/* sk must be a fullsock (guaranteed by verifier),
> > >  		 * so sock_gen_put() is unnecessary.
> > >  		 */
> > > +		if (!IS_ERR(sdata))
> > > +			SELEM(sdata)->clone =
> > > +				!!(flags & BPF_SK_STORAGE_GET_F_CLONE);
> > >  		sock_put(sk);
> > >  		return IS_ERR(sdata) ?
> > >  			(unsigned long)NULL : (unsigned long)sdata->data;
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index d57b0cc995a0..f5e801a9cea4 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
> > >  			goto out;
> > >  		}
> > >  		RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
> > > -#ifdef CONFIG_BPF_SYSCALL
> > > -		RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > > -#endif
> > > +
> > > +		if (bpf_sk_storage_clone(sk, newsk)) {
> > > +			sk_free_unlock_clone(newsk);
> > > +			newsk = NULL;
> > > +			goto out;
> > > +		}
> > >  
> > >  		newsk->sk_err	   = 0;
> > >  		newsk->sk_err_soft = 0;
> > > -- 
> > > 2.22.0.770.g0f2c4a37fd-goog
> > > 

^ permalink raw reply

* Re: [PATCH 0/2] pull request for net: batman-adv 2019-08-08
From: David Miller @ 2019-08-08 18:26 UTC (permalink / raw)
  To: sw; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20190808130208.2124-1-sw@simonwunderlich.de>

From: Simon Wunderlich <sw@simonwunderlich.de>
Date: Thu,  8 Aug 2019 15:02:06 +0200

> here are some bugfixes which we would like to have integrated into net.
> 
> Please pull or let me know of any problem!

Pulled.

^ permalink raw reply

* Re: [PATCH v2 00/15] net: phy: adin: add support for Analog Devices PHYs
From: David Miller @ 2019-08-08 18:24 UTC (permalink / raw)
  To: alexandru.ardelean
  Cc: netdev, devicetree, linux-kernel, robh+dt, mark.rutland,
	f.fainelli, hkallweit1, andrew
In-Reply-To: <20190808123026.17382-1-alexandru.ardelean@analog.com>

From: Alexandru Ardelean <alexandru.ardelean@analog.com>
Date: Thu, 8 Aug 2019 15:30:11 +0300

> This changeset adds support for Analog Devices Industrial Ethernet PHYs.
> Particularly the PHYs this driver adds support for:
>  * ADIN1200 - Robust, Industrial, Low Power 10/100 Ethernet PHY
>  * ADIN1300 - Robust, Industrial, Low Latency 10/100/1000 Gigabit
>    Ethernet PHY
> 
> The 2 chips are pin & register compatible with one another. The main
> difference being that ADIN1200 doesn't operate in gigabit mode.
> 
> The chips can be operated by the Generic PHY driver as well via the
> standard IEEE PHY registers (0x0000 - 0x000F) which are supported by the
> kernel as well. This assumes that configuration of the PHY has been done
> completely in HW, according to spec, i.e. no extra SW configuration
> required.
> 
> This changeset also implements the ability to configure the chips via SW
> registers.
> 
> Datasheets:
>   https://www.analog.com/media/en/technical-documentation/data-sheets/ADIN1300.pdf
>   https://www.analog.com/media/en/technical-documentation/data-sheets/ADIN1200.pdf
> 
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>

I think, at a minimum, the c22 vs. c45 issues need to be discussed more
and even if no code changes occur there is definitely some adjustments
and clairifications that need to occur on this issue in the commit
messages and/or documentation.

^ permalink raw reply

* Re: [Linux-decnet-user] [PATCH] Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK
From: David Miller @ 2019-08-08 18:21 UTC (permalink / raw)
  To: emserrat
  Cc: clabbe, corbet, linux-doc, netdev, linux-decnet-user,
	linux-kernel, tgraf
In-Reply-To: <DM5PR22MB03797234267E8B37EA3080BBC4D70@DM5PR22MB0379.namprd22.prod.outlook.com>

From: Eduardo Marcelo Serrat <emserrat@hotmail.com>
Date: Thu, 8 Aug 2019 11:44:14 +0000

> Sorry for using the list for this purpose but we are looking for
> senior engineers with knowledge in OpenVMS/ Tru64 Unix, Solaris,
> HP-UX and of course Linux and familiar with virtualization
> technologies, specially cross platform emulators. We need to fill
> support engineer roles. If anybody interested for positions in the
> US / Europe please send me an email.

Please do not ever use the vger.kernel.org mailing lists for this kind
of solicitation.

It is completely inappropriate.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox