Netdev List
* Re: [patch net-next v2 2/2] selftests: netdevsim: add devlink params tests
From: Jakub Kicinski @ 2019-08-15 17:12 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, mlxsw
In-Reply-To: <20190815085214.GC2273@nanopsycho>

On Thu, 15 Aug 2019 10:52:14 +0200, Jiri Pirko wrote:
> Thu, Aug 15, 2019 at 10:45:45AM CEST, jiri@resnulli.us wrote:
> >Thu, Aug 15, 2019 at 03:09:00AM CEST, jakub.kicinski@netronome.com wrote:  
> >>On Wed, 14 Aug 2019 17:26:04 +0200, Jiri Pirko wrote:  
> >>> From: Jiri Pirko <jiri@mellanox.com>
> >>> 
> >>> Test recently added netdevsim devlink param implementation.
> >>> 
> >>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> >>> ---
> >>> v1->v2:
> >>> -using cmd_jq helper  
> >>
> >>Still failing here :(
> >>
> >># ./devlink.sh 
> >>TEST: fw flash test                                                 [ OK ]
> >>TEST: params test                                                   [FAIL]
> >>	Failed to get test1 param value
> >>TEST: regions test                                                  [ OK ]
> >>
> >># jq --version
> >>jq-1.5-1-a5b5cbe
> >># echo '{ "a" : false }' | jq -e -r '.[]'
> >>false
> >># echo $?
> >>1  
> >
> >Odd, could you please try:
> >$ jq --version
> >jq-1.5
> >$ echo '{"param":{"netdevsim/netdevsim11":[{"name":"test1","type":"driver-specific","values":[{"cmode":"driverinit","value":"false"}]}]}}' | jq -e -r '.[][][].values[] | select(.cmode == "driverinit").value'
> >false
> >$ echo $?
> >0  
> 
> Ah, it is not the jq version, it is the iproute2 version:
> 8257e6c49cca9847e01262f6e749c6e88e2ddb72
> 
> I'll think about how to fix this.

Ah, wow, you're right! Old iproute2 works fine here, too!

> >>
> >>On another machine:
> >>
> >>$ echo '{ "a" : false }' | jq -e -r '.[]'
> >>false
> >>$ echo $?
> >>1
> >>
> >>Did you mean to drop the -e ?  



* Re: [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Toke Høiland-Jørgensen @ 2019-08-15 17:11 UTC (permalink / raw)
  To: Samudrala, Sridhar, magnus.karlsson, bjorn.topel, netdev, bpf,
	intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <b9423054-247e-8b57-ea59-42368f60ea1e@intel.com>

"Samudrala, Sridhar" <sridhar.samudrala@intel.com> writes:

> On 8/15/2019 4:12 AM, Toke Høiland-Jørgensen wrote:
>> Sridhar Samudrala <sridhar.samudrala@intel.com> writes:
>> 
>>> This patch series introduces XDP_SKIP_BPF flag that can be specified
>>> during the bind() call of an AF_XDP socket to skip calling the BPF
>>> program in the receive path and pass the buffer directly to the socket.
>>>
>>> When a single AF_XDP socket is associated with a queue and a HW
>>> filter is used to redirect the packets and the app is interested in
>>> receiving all the packets on that queue, we don't need an additional
>>> BPF program to do further filtering or lookup/redirect to a socket.
>>>
>>> Here are some performance numbers collected on
>>>    - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
>>>    - Intel 40Gb Ethernet NIC (i40e)
>>>
>>> All tests use 2 cores and the results are in Mpps.
>>>
>>> turbo on (default)
>>> ---------------------------------------------	
>>>                        no-skip-bpf    skip-bpf
>>> ---------------------------------------------	
>>> rxdrop zerocopy           21.9         38.5
>>> l2fwd  zerocopy           17.0         20.5
>>> rxdrop copy               11.1         13.3
>>> l2fwd  copy                1.9          2.0
>>>
>>> no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>>> ---------------------------------------------	
>>>                        no-skip-bpf    skip-bpf
>>> ---------------------------------------------	
>>> rxdrop zerocopy           15.4         29.0
>>> l2fwd  zerocopy           11.8         18.2
>>> rxdrop copy                8.2         10.5
>>> l2fwd  copy                1.7          1.7
>>> ---------------------------------------------
>> 
>> You're getting this performance boost by adding more code in the fast
>> path for every XDP program; so what's the performance impact of that for
>> cases where we do run an eBPF program?
>
> The no-skip-bpf results are pretty close to what I see before the
> patches are applied. As umem is cached in rx_ring for zerocopy, the
> overhead is much smaller compared to the copy scenario, where I am
> currently calling xdp_get_umem_from_qid().

I meant more for other XDP programs; what is the performance impact of
XDP_DROP, for instance?

>> Also, this is basically a special-casing of a particular deployment
>> scenario. Without a way to control RX queue assignment and traffic
>> steering, you're basically hard-coding a particular app's takeover of
>> the network interface; I'm not sure that is such a good idea...
>
> Yes. This is mainly targeted at applications that create 1 AF_XDP
> socket per RX queue and can use a HW filter (via ethtool or TC flower)
> to redirect the packets to a queue or a group of queues.

Yeah, and I'd prefer the handling of this to be unified somehow...

-Toke


* Re: [PATCH bpf] tools: bpftool: close prog FD before exit on showing a single program
From: Andrii Nakryiko @ 2019-08-15 17:09 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Networking, oss-drivers
In-Reply-To: <20190815142223.2203-1-quentin.monnet@netronome.com>

On Thu, Aug 15, 2019 at 7:24 AM Quentin Monnet
<quentin.monnet@netronome.com> wrote:
>
> When showing metadata about a single program by invoking
> "bpftool prog show PROG", the file descriptor referring to the program
> is not closed before returning from the function. Let's close it.
>
> Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool")
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  tools/bpf/bpftool/prog.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 66f04a4846a5..43fdbbfe41bb 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -363,7 +363,9 @@ static int do_show(int argc, char **argv)
>                 if (fd < 0)
>                         return -1;
>
> -               return show_prog(fd);
> +               err = show_prog(fd);
> +               close(fd);
> +               return err;

There is a similar problem a few lines above, in the special case of
argc == 2, which you didn't fix.
Would it be better to make show_prog(fd) close the provided fd instead,
or is it used in some other context where the FD should live longer (I
haven't checked, sorry)?
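
To make the two options concrete, here is a minimal user-space sketch.
It is not the bpftool code: show_borrowed(), show_path() and
show_owned() are hypothetical names used only for illustration, and
show_borrowed() merely stands in for show_prog().

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for show_prog(): uses the fd but never closes it. */
static int show_borrowed(int fd)
{
	return fcntl(fd, F_GETFD) < 0 ? -1 : 0;
}

/* Option 1: the caller owns the fd and closes it on every path. */
static int show_path(const char *path)
{
	int fd, err;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	err = show_borrowed(fd);
	close(fd);
	return err;
}

/* Option 2: the callee takes ownership and closes the fd itself. */
static int show_owned(int fd)
{
	int err = show_borrowed(fd);

	close(fd);
	return err;
}
```

Option 2 only works if no caller needs the fd after the call, which is
exactly the part I haven't checked.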

>         }
>
>         if (argc)
> --
> 2.17.1
>


* Re: [PATCH -next] btf: fix return value check in btf_vmlinux_init()
From: Andrii Nakryiko @ 2019-08-15 17:06 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, Networking, bpf, kernel-janitors
In-Reply-To: <20190815142432.101401-1-weiyongjun1@huawei.com>

On Thu, Aug 15, 2019 at 7:21 AM Wei Yongjun <weiyongjun1@huawei.com> wrote:
>
> In case of error, the function kobject_create_and_add() returns NULL
> pointer not ERR_PTR(). The IS_ERR() test in the return value check
> should be replaced with NULL test.
>
> Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---


Argh... Thanks for the fix! With the comment below addressed:

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  kernel/bpf/sysfs_btf.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
> index 4659349fc795..be5557deb958 100644
> --- a/kernel/bpf/sysfs_btf.c
> +++ b/kernel/bpf/sysfs_btf.c
> @@ -30,16 +30,13 @@ static struct kobject *btf_kobj;
>
>  static int __init btf_vmlinux_init(void)
>  {
> -       int err;
> -
>         if (!_binary__btf_vmlinux_bin_start)
>                 return 0;
>
>         btf_kobj = kobject_create_and_add("btf", kernel_kobj);
> -       if (IS_ERR(btf_kobj)) {
> -               err = PTR_ERR(btf_kobj);
> +       if (!btf_kobj) {
>                 btf_kobj = NULL;

This is now not necessary, please drop (and don't forget to remove {}
for this single-line if afterwards).
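
For anyone following along, a small user-space sketch of why the old
IS_ERR() check could never fire on a NULL return. The helpers below are
simplified re-implementations written for this mail, not the kernel's
<linux/err.h>, and check_created() is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top MAX_ERRNO bytes of the address space. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Shape of the fixed check: a NULL-returning allocator needs a NULL test. */
static int check_created(const void *obj)
{
	if (!obj)
		return -ENOMEM;
	return 0;
}
```

NULL is address 0, far below the error-pointer range, so is_err(NULL)
is false and the failure would have gone unnoticed.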

> -               return err;
> +               return -ENOMEM;
>         }
>
>         bin_attr_btf_vmlinux.size = _binary__btf_vmlinux_bin_end -
>
>
>


* Re: [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Samudrala, Sridhar @ 2019-08-15 16:46 UTC (permalink / raw)
  To: Björn Töpel, magnus.karlsson, netdev, bpf,
	intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <bebfb097-5357-91d8-ebc7-2f8ede392ad7@intel.com>

On 8/15/2019 5:51 AM, Björn Töpel wrote:
> On 2019-08-15 05:46, Sridhar Samudrala wrote:
>> This patch series introduces XDP_SKIP_BPF flag that can be specified
>> during the bind() call of an AF_XDP socket to skip calling the BPF
>> program in the receive path and pass the buffer directly to the socket.
>>
>> When a single AF_XDP socket is associated with a queue and a HW
>> filter is used to redirect the packets and the app is interested in
>> receiving all the packets on that queue, we don't need an additional
>> BPF program to do further filtering or lookup/redirect to a socket.
>>
>> Here are some performance numbers collected on
>>    - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
>>    - Intel 40Gb Ethernet NIC (i40e)
>>
>> All tests use 2 cores and the results are in Mpps.
>>
>> turbo on (default)
>> ---------------------------------------------
>>                        no-skip-bpf    skip-bpf
>> ---------------------------------------------
>> rxdrop zerocopy           21.9         38.5
>> l2fwd  zerocopy           17.0         20.5
>> rxdrop copy               11.1         13.3
>> l2fwd  copy                1.9          2.0
>>
>> no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>> ---------------------------------------------
>>                        no-skip-bpf    skip-bpf
>> ---------------------------------------------
>> rxdrop zerocopy           15.4         29.0
>> l2fwd  zerocopy           11.8         18.2
>> rxdrop copy                8.2         10.5
>> l2fwd  copy                1.7          1.7
>> ---------------------------------------------
>>
> 
> This work is somewhat similar to the XDP_ATTACH work [1]. Avoiding the
> retpoline in the XDP program call is a nice performance boost! I like
> the numbers! :-) I also like the idea of adding a flag that just does
> what most AF_XDP Rx users want -- just getting all packets of a
> certain queue into the XDP sockets.
> 
> In addition to Toke's mail, I have some more concerns with the series:
> 
> * AFAIU the SKIP_BPF only works for zero-copy enabled sockets. IMO, it
>    should work for all modes (including XDP_SKB).

This patch enables SKIP_BPF for AF_XDP sockets where an XDP program is
attached at driver level (both zerocopy and copy modes).
I tried a quick hack to see the perf benefit with generic XDP mode, but
I didn't see any significant improvement in performance in that
scenario, so I didn't include that mode.

> 
> * In order to work, a user still needs an XDP program running. That's
>    clunky. I'd like the behavior that if no XDP program is attached,
>    and the option is set, the packets for that queue end up in the
>    socket. If there's an XDP program attached, the program has
>    precedence.

I think this would require more changes in the drivers to take the XDP
datapath even when there is no XDP program loaded.

> 
> * It requires changes in all drivers. Not nice, and scales badly. Try
>    making it generic (xdp_do_redirect/xdp_flush), so it Just Works for
>    all XDP capable drivers.

I tried to make this as generic as possible and keep the driver changes
minimal, but could not find a way to avoid driver changes altogether.
xdp_do_redirect() gets called after the call to bpf_prog_run_xdp() in
the drivers.

> 
> Thanks for working on this!
> 
> 
> Björn
> 
> [1] 
> https://lore.kernel.org/netdev/20181207114431.18038-1-bjorn.topel@gmail.com/ 
> 
> 
> 
>> Sridhar Samudrala (5):
>>    xsk: Convert bool 'zc' field in struct xdp_umem to a u32 bitmap
>>    xsk: Introduce XDP_SKIP_BPF bind option
>>    i40e: Enable XDP_SKIP_BPF option for AF_XDP sockets
>>    ixgbe: Enable XDP_SKIP_BPF option for AF_XDP sockets
>>    xdpsock_user: Add skip_bpf option
>>
>>   drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 +++++++++-
>>   drivers/net/ethernet/intel/i40e/i40e_xsk.c    |  6 +++
>>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 20 ++++++++-
>>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 16 ++++++-
>>   include/net/xdp_sock.h                        | 21 ++++++++-
>>   include/uapi/linux/if_xdp.h                   |  1 +
>>   include/uapi/linux/xdp_diag.h                 |  1 +
>>   net/xdp/xdp_umem.c                            |  9 ++--
>>   net/xdp/xsk.c                                 | 43 ++++++++++++++++---
>>   net/xdp/xsk_diag.c                            |  5 ++-
>>   samples/bpf/xdpsock_user.c                    |  8 ++++
>>   11 files changed, 135 insertions(+), 17 deletions(-)
>>


* Re: [PATCH] arm64: do_csum: implement accelerated scalar version
From: Will Deacon @ 2019-08-15 16:46 UTC (permalink / raw)
  To: Zhangshaokun
  Cc: Robin Murphy, Ard Biesheuvel, linux-arm-kernel, netdev,
	ilias.apalodimas, huanglingyan (A), steve.capper
In-Reply-To: <440eb674-0e59-a97e-4a90-0026e2327069@hisilicon.com>

On Thu, May 16, 2019 at 11:14:35AM +0800, Zhangshaokun wrote:
> On 2019/5/15 17:47, Will Deacon wrote:
> > On Mon, Apr 15, 2019 at 07:18:22PM +0100, Robin Murphy wrote:
> >> On 12/04/2019 10:52, Will Deacon wrote:
> >>> I'm waiting for Robin to come back with numbers for a C implementation.
> >>>
> >>> Robin -- did you get anywhere with that?
> >>
> >> Still not what I would call finished, but where I've got so far (besides an
> >> increasingly elaborate test rig) is as below - it still wants some unrolling
> >> in the middle to really fly (and actual testing on BE), but the worst-case
> >> performance already equals or just beats this asm version on Cortex-A53 with
> >> GCC 7 (by virtue of being alignment-insensitive and branchless except for
> >> the loop). Unfortunately, the advantage of C code being instrumentable does
> >> also come around to bite me...
> > 
> > Is there any interest from anybody in spinning a proper patch out of this?
> > Shaokun?
> 
> HiSilicon's Kunpeng920 (Hi1620) benefits from the do_csum optimization. If Ard
> and Robin are OK with it, Lingyan or I can try to do it.
> Of course, if anyone else posts the patch, we are happy to test it.
> Either way is fine.

I don't mind who posts it, but Robin is super busy with SMMU stuff at the
moment so it probably makes more sense for you or Lingyan to do it.

Will


* Re: [PATCH net-next] net/rds: Add RDS6_INFO_SOCKETS and RDS6_INFO_RECV_MESSAGES options
From: santosh.shilimkar @ 2019-08-15 16:36 UTC (permalink / raw)
  To: Ka-Cheong Poon, netdev; +Cc: davem, rds-devel
In-Reply-To: <1565861803-31268-1-git-send-email-ka-cheong.poon@oracle.com>

On 8/15/19 2:36 AM, Ka-Cheong Poon wrote:
> Add support of the socket options RDS6_INFO_SOCKETS and
> RDS6_INFO_RECV_MESSAGES which update the RDS_INFO_SOCKETS and
> RDS_INFO_RECV_MESSAGES options respectively.  The old options work
> for IPv4 sockets only.
> 
> Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
> ---
Thanks Ka-Cheong for getting this one out on list.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>


* Re: [PATCH net-next 1/1] Added BASE-T1 PHY support to PHY Subsystem
From: Heiner Kallweit @ 2019-08-15 16:34 UTC (permalink / raw)
  To: Andrew Lunn, Christian Herber
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florian Fainelli
In-Reply-To: <20190815155613.GE15291@lunn.ch>

On 15.08.2019 17:56, Andrew Lunn wrote:
> On Thu, Aug 15, 2019 at 03:32:29PM +0000, Christian Herber wrote:
>> BASE-T1 is a category of Ethernet PHYs.
>> They use a single copper pair for transmission.
>> This patch adds basic support for this category of PHYs.
>> It covers the discovery of abilities and basic configuration.
>> It includes setting fixed speed and enabling auto-negotiation.
>> BASE-T1 devices should always be Clause-45 managed.
>> Therefore, this patch extends phy-c45.c.
>> While for some functions like auto-negotiation different registers are
>> used, the layout of these registers is the same for the used fields.
>> Thus, much of the logic of basic Clause-45 devices can be reused.
>>
>> Signed-off-by: Christian Herber <christian.herber@nxp.com>
>> ---
>>  drivers/net/phy/phy-c45.c    | 113 +++++++++++++++++++++++++++++++----
>>  drivers/net/phy/phy-core.c   |   4 +-
>>  include/uapi/linux/ethtool.h |   2 +
>>  include/uapi/linux/mdio.h    |  21 +++++++
>>  4 files changed, 129 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
>> index b9d4145781ca..9ff0b8c785de 100644
>> --- a/drivers/net/phy/phy-c45.c
>> +++ b/drivers/net/phy/phy-c45.c
>> @@ -8,13 +8,23 @@
>>  #include <linux/mii.h>
>>  #include <linux/phy.h>
>>  
>> +#define IS_100BASET1(phy) (linkmode_test_bit( \
>> +			   ETHTOOL_LINK_MODE_100baseT1_Full_BIT, \
>> +			   (phy)->supported))
>> +#define IS_1000BASET1(phy) (linkmode_test_bit( \
>> +			    ETHTOOL_LINK_MODE_1000baseT1_Full_BIT, \
>> +			    (phy)->supported))
> 
> Hi Christian
> 
> We already have the flag phydev->is_gigabit_capable. Maybe add a flag
> phydev->is_t1_capable
> 
>> +
>> +static u32 get_aneg_ctrl(struct phy_device *phydev);
>> +static u32 get_aneg_stat(struct phy_device *phydev);
> 
> No forward declarations please. Put the code in the right order so
> they are not needed.
> 
> Thanks
> 
>      Andrew
> 

For whatever reason I don't have the original mail in my netdev inbox (yet).

+	if (IS_100BASET1(phydev) || IS_1000BASET1(phydev))
+		ctrl = MDIO_AN_BT1_CTRL;

Code like this could be problematic once a PHY supports one of the T1 modes
AND normal modes. Then normal modes would be unusable.

I think this scenario isn't completely hypothetical. See the Aquantia
AQCS109 that supports normal modes and (proprietary) 1000Base-T2.

Maybe we need separate versions of the generic functions for T1.
Then it would be up to the PHY driver to decide when to use which
version.
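
A rough sketch of what I mean, with made-up names throughout: the
sketch_* types and functions and the placeholder register numbers are
illustrative only and taken neither from phylib nor from the patch.

```c
#include <stdint.h>

/* Placeholder register numbers for illustration only. */
enum {
	SKETCH_AN_CTRL     = 0,
	SKETCH_AN_BT1_CTRL = 1,
};

struct sketch_phy;

/* Each PHY driver picks the variant that matches its hardware. */
struct sketch_phy_driver {
	uint32_t (*aneg_ctrl_reg)(const struct sketch_phy *phy);
};

struct sketch_phy {
	const struct sketch_phy_driver *drv;
};

/* Generic Clause-45 variant. */
static uint32_t sketch_c45_aneg_ctrl_reg(const struct sketch_phy *phy)
{
	(void)phy;
	return SKETCH_AN_CTRL;
}

/* Separate T1 variant, never selected implicitly by link-mode bits. */
static uint32_t sketch_c45_t1_aneg_ctrl_reg(const struct sketch_phy *phy)
{
	(void)phy;
	return SKETCH_AN_BT1_CTRL;
}

/* Core code dispatches through the driver's choice. */
static uint32_t sketch_aneg_ctrl_reg(const struct sketch_phy *phy)
{
	return phy->drv->aneg_ctrl_reg(phy);
}
```

This way a PHY that supports both T1 and normal modes is not forced
onto the T1 registers just because a T1 bit is set in "supported".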

Heiner


* Re: [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Samudrala, Sridhar @ 2019-08-15 16:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, magnus.karlsson, bjorn.topel,
	netdev, bpf, intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <87ftm2adi2.fsf@toke.dk>


On 8/15/2019 4:12 AM, Toke Høiland-Jørgensen wrote:
> Sridhar Samudrala <sridhar.samudrala@intel.com> writes:
> 
>> This patch series introduces XDP_SKIP_BPF flag that can be specified
>> during the bind() call of an AF_XDP socket to skip calling the BPF
>> program in the receive path and pass the buffer directly to the socket.
>>
>> When a single AF_XDP socket is associated with a queue and a HW
>> filter is used to redirect the packets and the app is interested in
>> receiving all the packets on that queue, we don't need an additional
>> BPF program to do further filtering or lookup/redirect to a socket.
>>
>> Here are some performance numbers collected on
>>    - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
>>    - Intel 40Gb Ethernet NIC (i40e)
>>
>> All tests use 2 cores and the results are in Mpps.
>>
>> turbo on (default)
>> ---------------------------------------------	
>>                        no-skip-bpf    skip-bpf
>> ---------------------------------------------	
>> rxdrop zerocopy           21.9         38.5
>> l2fwd  zerocopy           17.0         20.5
>> rxdrop copy               11.1         13.3
>> l2fwd  copy                1.9          2.0
>>
>> no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>> ---------------------------------------------	
>>                        no-skip-bpf    skip-bpf
>> ---------------------------------------------	
>> rxdrop zerocopy           15.4         29.0
>> l2fwd  zerocopy           11.8         18.2
>> rxdrop copy                8.2         10.5
>> l2fwd  copy                1.7          1.7
>> ---------------------------------------------
> 
> You're getting this performance boost by adding more code in the fast
> path for every XDP program; so what's the performance impact of that for
> cases where we do run an eBPF program?

The no-skip-bpf results are pretty close to what I see before the
patches are applied. As umem is cached in rx_ring for zerocopy, the
overhead is much smaller compared to the copy scenario, where I am
currently calling xdp_get_umem_from_qid().

> 
> Also, this is basically a special-casing of a particular deployment
> scenario. Without a way to control RX queue assignment and traffic
> steering, you're basically hard-coding a particular app's takeover of
> the network interface; I'm not sure that is such a good idea...

Yes. This is mainly targeted at applications that create 1 AF_XDP socket
per RX queue and can use a HW filter (via ethtool or TC flower) to 
redirect the packets to a queue or a group of queues.

> 
> -Toke
> 


* Re: [PATCH net-next v2 4/4] rds: check for excessive looping in rds_send_xmit
From: santosh.shilimkar @ 2019-08-15 16:18 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <d91e3273-48bb-13bf-af65-40472890f975@oracle.com>

On 8/15/19 7:43 AM, Gerd Rausch wrote:
> From: Andy Grover <andy.grover@oracle.com>
> Date: Thu, 13 Jan 2011 11:40:31 -0800
> 
> Original commit from 2011 updated to include a change by
> Yuval Shaia <yuval.shaia@oracle.com>
> that adds a new statistic counter "send_stuck_rm"
> to capture the messages looping excessively
> in the send path.
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>


* Re: [PATCH net-next v2 3/4] net/rds: Add a few missing rds_stat_names entries
From: santosh.shilimkar @ 2019-08-15 16:17 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <2d604055-a49e-637f-a1e6-afefa8482316@oracle.com>

On 8/15/19 7:42 AM, Gerd Rausch wrote:
> From: Gerd Rausch <gerd.rausch@oracle.com>
> Date: Thu, 11 Jul 2019 12:15:50 -0700
> 
> In a previous commit, fields were added to "struct rds_statistics"
> but array "rds_stat_names" was not updated accordingly.
> 
> Please note the inconsistent naming of the string representations
> that is done in the name of compatibility
> with the Oracle internal code-base.
> 
> s_recv_bytes_added_to_socket     -> "recv_bytes_added_to_sock"
> s_recv_bytes_removed_from_socket -> "recv_bytes_freed_fromsock"
> 
> Fixes: 192a798f5299 ("RDS: add stat for socket recv memory usage")
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>


* Re: [PATCH net-next v2 2/4] RDS: don't use GFP_ATOMIC for sk_alloc in rds_create
From: santosh.shilimkar @ 2019-08-15 16:17 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <31c65073-0a9a-28b5-eb73-4ec784b0393e@oracle.com>

On 8/15/19 7:42 AM, Gerd Rausch wrote:
> From: Chris Mason <chris.mason@oracle.com>
> Date: Fri, 3 Feb 2012 11:08:51 -0500
> 
> Signed-off-by: Chris Mason <chris.mason@oracle.com>
> Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
> ---
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>


* Re: [PATCH net-next v2 1/4] RDS: limit the number of times we loop in rds_send_xmit
From: santosh.shilimkar @ 2019-08-15 16:16 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <90b76f24-d799-5362-df53-19102d781e3e@oracle.com>

On 8/15/19 7:42 AM, Gerd Rausch wrote:
> From: Chris Mason <chris.mason@oracle.com>
> Date: Fri, 3 Feb 2012 11:07:54 -0500
> 
> This will kick the RDS worker thread if we have been looping
> too long.
> 
> Original commit from 2012 updated to include a change by
> Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> that triggers "must_wake" if "rds_ib_recv_refill_one" fails.
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
I thought I had acked the v1 series.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
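
For readers new to the idea, a generic sketch of bounding a send loop
and asking for a requeue once the budget is used up. This is not the
rds_send_xmit() code: struct xmit_result and xmit_bounded() are
hypothetical names, and the budget is an arbitrary parameter.

```c
#include <stddef.h>

struct xmit_result {
	size_t sent;		/* items processed in this pass */
	int need_requeue;	/* non-zero: hand the rest to a worker */
};

/* Process at most 'limit' of 'pending' items, then ask for a requeue. */
static struct xmit_result xmit_bounded(size_t pending, size_t limit)
{
	struct xmit_result res = { 0, 0 };

	while (res.sent < pending) {
		if (res.sent >= limit) {
			/* Looped long enough; let someone else continue. */
			res.need_requeue = 1;
			break;
		}
		res.sent++;
	}
	return res;
}
```

The caller would schedule the worker whenever need_requeue is set,
instead of spinning in the send path.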


* Re: [PATCH net 1/2] igb: Enable media autosense for the i350.
From: Jeff Kirsher @ 2019-08-15 16:01 UTC (permalink / raw)
  To: Manfred Rudigier, davem; +Cc: carolyn.wyborny, todd.fujinaka, netdev
In-Reply-To: <f50fd188-fe43-4bd7-aaa4-4c1c8cb022c3@EXC04-ATKLA.omicron.at>


On Wed, 2019-08-14 at 13:59 +0200, Manfred Rudigier wrote:
> This patch enables the hardware feature "Media Auto Sense" also on
> the
> i350. It works in the same way as on the 82850 devices. Hardware
> designs
> using dual PHYs (fiber/copper) can enable this feature by setting the
> MAS
> enable bits in the NVM_COMPAT register (0x03) in the EEPROM.
> 
> Signed-off-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
> ---
>  drivers/net/ethernet/intel/igb/e1000_82575.c | 2 +-
>  drivers/net/ethernet/intel/igb/igb_main.c    | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

I will send this 2-patch series to the intel-wired-lan@lists.osuosl.org
list so that we can get these patches into review and test for
upstream inclusion.



* [PATCH net-next 3/3] net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diag
From: Davide Caratti @ 2019-08-15 16:00 UTC (permalink / raw)
  To: Boris Pismenny, Jakub Kicinski, John Fastabend, Dave Watson,
	Aviad Yehezkel, David S. Miller, netdev
In-Reply-To: <cover.1565882584.git.dcaratti@redhat.com>

When an application configures kernel TLS on top of a TCP socket, it's
now possible for inet_diag_handler() to collect information regarding the
protocol version, the cipher type and TX / RX configuration, in case
INET_DIAG_INFO is requested.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 include/net/tls.h              | 19 ++++++++++++
 include/uapi/linux/inet_diag.h |  1 +
 include/uapi/linux/tls.h       | 15 +++++++++
 net/tls/tls_main.c             | 56 ++++++++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+)

diff --git a/include/net/tls.h b/include/net/tls.h
index 4997742475cd..990f1d9182a3 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -431,6 +431,25 @@ static inline bool is_tx_ready(struct tls_sw_context_tx *ctx)
 	return READ_ONCE(rec->tx_ready);
 }
 
+static inline u16 tls_user_config(struct tls_context *ctx, bool tx)
+{
+	u16 config = tx ? ctx->tx_conf : ctx->rx_conf;
+
+	switch (config) {
+	case TLS_BASE:
+		return TLS_CONF_BASE;
+	case TLS_SW:
+		return TLS_CONF_SW;
+#ifdef CONFIG_TLS_DEVICE
+	case TLS_HW:
+		return TLS_CONF_HW;
+#endif
+	case TLS_HW_RECORD:
+		return TLS_CONF_HW_RECORD;
+	}
+	return 0;
+}
+
 struct sk_buff *
 tls_validate_xmit_skb(struct sock *sk, struct net_device *dev,
 		      struct sk_buff *skb);
diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index e2c6273274f3..a1ff345b3f33 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -162,6 +162,7 @@ enum {
 enum {
 	INET_ULP_INFO_UNSPEC,
 	INET_ULP_INFO_NAME,
+	INET_ULP_INFO_TLS,
 	__INET_ULP_INFO_MAX,
 };
 #define INET_ULP_INFO_MAX (__INET_ULP_INFO_MAX - 1)
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index 5b9c26753e46..bcd2869ed472 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -109,4 +109,19 @@ struct tls12_crypto_info_aes_ccm_128 {
 	unsigned char rec_seq[TLS_CIPHER_AES_CCM_128_REC_SEQ_SIZE];
 };
 
+enum {
+	TLS_INFO_UNSPEC,
+	TLS_INFO_VERSION,
+	TLS_INFO_CIPHER,
+	TLS_INFO_TXCONF,
+	TLS_INFO_RXCONF,
+	__TLS_INFO_MAX,
+};
+#define TLS_INFO_MAX (__TLS_INFO_MAX - 1)
+
+#define TLS_CONF_BASE 1
+#define TLS_CONF_SW 2
+#define TLS_CONF_HW 3
+#define TLS_CONF_HW_RECORD 4
+
 #endif /* _UAPI_LINUX_TLS_H */
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 04829bef514c..957d937c72d2 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -39,6 +39,7 @@
 #include <linux/netdevice.h>
 #include <linux/sched/signal.h>
 #include <linux/inetdevice.h>
+#include <linux/inet_diag.h>
 
 #include <net/tls.h>
 
@@ -838,6 +839,59 @@ static void tls_update(struct sock *sk, struct proto *p)
 	}
 }
 
+static int tls_get_info(const struct sock *sk, struct sk_buff *skb)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	u16 version, cipher_type;
+	struct nlattr *start;
+	int err;
+
+	start = nla_nest_start_noflag(skb, INET_ULP_INFO_TLS);
+	if (!start)
+		return -EMSGSIZE;
+
+	version = ctx->prot_info.version;
+	if (version) {
+		err = nla_put_u16(skb, TLS_INFO_VERSION, version);
+		if (err)
+			goto nla_failure;
+	}
+	cipher_type = ctx->prot_info.cipher_type;
+	if (cipher_type) {
+		err = nla_put_u16(skb, TLS_INFO_CIPHER, cipher_type);
+		if (err)
+			goto nla_failure;
+	}
+	err = nla_put_u16(skb, TLS_INFO_TXCONF, tls_user_config(ctx, true));
+	if (err)
+		goto nla_failure;
+
+	err = nla_put_u16(skb, TLS_INFO_RXCONF, tls_user_config(ctx, false));
+	if (err)
+		goto nla_failure;
+
+	nla_nest_end(skb, start);
+	return 0;
+
+nla_failure:
+	nla_nest_cancel(skb, start);
+	return err;
+}
+
+static size_t tls_get_info_size(const struct sock *sk)
+{
+	size_t size = 0;
+
+	size += nla_total_size(0) +		/* INET_ULP_INFO_TLS */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_VERSION */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_CIPHER */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_RXCONF */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_TXCONF */
+		0;
+
+	return size;
+}
+
 void tls_register_device(struct tls_device *device)
 {
 	spin_lock_bh(&device_spinlock);
@@ -859,6 +913,8 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = {
 	.owner			= THIS_MODULE,
 	.init			= tls_init,
 	.update			= tls_update,
+	.get_info		= tls_get_info,
+	.get_info_size		= tls_get_info_size,
 };
 
 static int __init tls_register(void)
-- 
2.20.1



* [PATCH net-next 2/3] tcp: ulp: add functions to dump ulp-specific information
From: Davide Caratti @ 2019-08-15 16:00 UTC (permalink / raw)
  To: Boris Pismenny, Jakub Kicinski, John Fastabend, Dave Watson,
	Aviad Yehezkel, David S. Miller, netdev
In-Reply-To: <cover.1565882584.git.dcaratti@redhat.com>

Currently, only getsockopt(TCP_ULP) can be invoked to know if a ULP is on
top of a TCP socket. Extend idiag_get_aux() and idiag_get_aux_size(),
introduced by commit b37e88407c1d ("inet_diag: allow protocols to provide
additional data"), to report the ULP name and other information that can
be made available by the ULP through optional functions.

Users having CAP_NET_ADMIN privileges will then be able to retrieve this
information through inet_diag_handler, if they specify INET_DIAG_INFO in
the request.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 include/net/tcp.h              |  3 ++
 include/uapi/linux/inet_diag.h |  8 +++++
 net/ipv4/tcp_diag.c            | 56 +++++++++++++++++++++++++++++++++-
 3 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 77fe87f7a992..c9a3f9688223 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2122,6 +2122,9 @@ struct tcp_ulp_ops {
 	void (*update)(struct sock *sk, struct proto *p);
 	/* cleanup ulp */
 	void (*release)(struct sock *sk);
+	/* diagnostic */
+	int (*get_info)(const struct sock *sk, struct sk_buff *skb);
+	size_t (*get_info_size)(const struct sock *sk);
 
 	char		name[TCP_ULP_NAME_MAX];
 	struct module	*owner;
diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index e8baca85bac6..e2c6273274f3 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -153,11 +153,19 @@ enum {
 	INET_DIAG_BBRINFO,	/* request as INET_DIAG_VEGASINFO */
 	INET_DIAG_CLASS_ID,	/* request as INET_DIAG_TCLASS */
 	INET_DIAG_MD5SIG,
+	INET_DIAG_ULP_INFO,
 	__INET_DIAG_MAX,
 };
 
 #define INET_DIAG_MAX (__INET_DIAG_MAX - 1)
 
+enum {
+	INET_ULP_INFO_UNSPEC,
+	INET_ULP_INFO_NAME,
+	__INET_ULP_INFO_MAX,
+};
+#define INET_ULP_INFO_MAX (__INET_ULP_INFO_MAX - 1)
+
 /* INET_DIAG_MEM */
 
 struct inet_diag_meminfo {
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index a3a386236d93..1cec262ac8eb 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -81,13 +81,42 @@ static int tcp_diag_put_md5sig(struct sk_buff *skb,
 }
 #endif
 
+static int tcp_diag_put_ulp(struct sk_buff *skb, struct sock *sk,
+			    const struct tcp_ulp_ops *ulp_ops)
+{
+	struct nlattr *nest;
+	int err;
+
+	nest = nla_nest_start_noflag(skb, INET_DIAG_ULP_INFO);
+	if (!nest)
+		return -EMSGSIZE;
+
+	err = nla_put_string(skb, INET_ULP_INFO_NAME, ulp_ops->name);
+	if (err)
+		goto nla_failure;
+
+	if (ulp_ops->get_info)
+		err = ulp_ops->get_info(sk, skb);
+	if (err)
+		goto nla_failure;
+
+	nla_nest_end(skb, nest);
+	return 0;
+
+nla_failure:
+	nla_nest_cancel(skb, nest);
+	return err;
+}
+
 static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
 			    struct sk_buff *skb)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
+	int err = 0;
+
 #ifdef CONFIG_TCP_MD5SIG
 	if (net_admin) {
 		struct tcp_md5sig_info *md5sig;
-		int err = 0;
 
 		rcu_read_lock();
 		md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
@@ -99,11 +128,23 @@ static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
 	}
 #endif
 
+	if (net_admin) {
+		const struct tcp_ulp_ops *ulp_ops;
+
+		rcu_read_lock();
+		ulp_ops = icsk->icsk_ulp_ops;
+		if (ulp_ops)
+			err = tcp_diag_put_ulp(skb, sk, ulp_ops);
+		rcu_read_unlock();
+		if (err)
+			return err;
+	}
 	return 0;
 }
 
 static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
 	size_t size = 0;
 
 #ifdef CONFIG_TCP_MD5SIG
@@ -124,6 +165,19 @@ static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
 	}
 #endif
 
+	if (net_admin) {
+		const struct tcp_ulp_ops *ulp_ops;
+
+		rcu_read_lock();
+		ulp_ops = icsk->icsk_ulp_ops;
+		if (ulp_ops) {
+			size += nla_total_size(0) +
+				nla_total_size(TCP_ULP_NAME_MAX);
+			if (ulp_ops->get_info_size)
+				size += ulp_ops->get_info_size(sk);
+		}
+		rcu_read_unlock();
+	}
 	return size;
 }
 
-- 
2.20.1
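
The size accounting in tcp_diag_get_aux_size() above can be checked by hand.
What follows is a minimal user-space sketch, not kernel code: the sketch_*
names are hypothetical, and the constants (4-byte attribute header, 4-byte
alignment, a 16-byte ULP name limit) are assumptions meant to mirror
NLA_HDRLEN, NLA_ALIGNTO and TCP_ULP_NAME_MAX.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the netlink attribute layout rules in the kernel:
 * a 4-byte attribute header and 4-byte alignment. */
#define SKETCH_NLA_ALIGNTO  4u
#define SKETCH_NLA_HDRLEN   4u
/* Assumed to match TCP_ULP_NAME_MAX in include/net/tcp.h. */
#define SKETCH_ULP_NAME_MAX 16u

/* Round len up to the next multiple of the attribute alignment. */
static size_t sketch_nla_align(size_t len)
{
	return (len + SKETCH_NLA_ALIGNTO - 1) & ~(size_t)(SKETCH_NLA_ALIGNTO - 1);
}

/* Header plus payload, padded: the room one attribute needs. */
static size_t sketch_nla_total_size(size_t payload)
{
	return sketch_nla_align(SKETCH_NLA_HDRLEN + payload);
}

/* Worst-case room for the INET_DIAG_ULP_INFO nest: the empty nest
 * header, the name string, plus whatever the ULP itself asks for. */
static size_t sketch_ulp_aux_size(size_t ulp_extra)
{
	return sketch_nla_total_size(0) +
	       sketch_nla_total_size(SKETCH_ULP_NAME_MAX) +
	       ulp_extra;
}
```

Under these assumptions a ULP without get_info_size() needs 24 bytes. With
the tls_get_info_size() from patch 3/3 (one empty nest plus four u16
attributes, i.e. 4 + 4 * 8 = 36 bytes), sketch_ulp_aux_size(36) gives 60.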


^ permalink raw reply related

* [PATCH net-next 1/3] net/tls: use RCU protection on icsk->icsk_ulp_data
From: Davide Caratti @ 2019-08-15 16:00 UTC (permalink / raw)
  To: Boris Pismenny, Jakub Kicinski, John Fastabend, Dave Watson,
	Aviad Yehezkel, David S. Miller, netdev
In-Reply-To: <cover.1565882584.git.dcaratti@redhat.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>

We need to make sure context does not get freed while diag
code is interrogating it. Free struct tls_context with
kfree_rcu().

We add the __rcu annotation directly in icsk, and cast it
away in the datapath accessor. Presumably all ULPs will
do a similar thing.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/inet_connection_sock.h |  2 +-
 include/net/tls.h                  |  9 +++++++--
 net/core/sock_map.c                |  2 +-
 net/tls/tls_device.c               |  2 +-
 net/tls/tls_main.c                 | 31 +++++++++++++++++++++++-------
 5 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c57d53e7e02c..895546058a20 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -97,7 +97,7 @@ struct inet_connection_sock {
 	const struct tcp_congestion_ops *icsk_ca_ops;
 	const struct inet_connection_sock_af_ops *icsk_af_ops;
 	const struct tcp_ulp_ops  *icsk_ulp_ops;
-	void			  *icsk_ulp_data;
+	void __rcu		  *icsk_ulp_data;
 	void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq);
 	struct hlist_node         icsk_listen_portaddr_node;
 	unsigned int		  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
diff --git a/include/net/tls.h b/include/net/tls.h
index 41b2d41bb1b8..4997742475cd 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -41,6 +41,7 @@
 #include <linux/tcp.h>
 #include <linux/skmsg.h>
 #include <linux/netdevice.h>
+#include <linux/rcupdate.h>
 
 #include <net/tcp.h>
 #include <net/strparser.h>
@@ -290,6 +291,7 @@ struct tls_context {
 
 	struct list_head list;
 	refcount_t refcount;
+	struct rcu_head rcu;
 };
 
 enum tls_offload_ctx_dir {
@@ -348,7 +350,7 @@ struct tls_offload_context_rx {
 #define TLS_OFFLOAD_CONTEXT_SIZE_RX					\
 	(sizeof(struct tls_offload_context_rx) + TLS_DRIVER_STATE_SIZE_RX)
 
-void tls_ctx_free(struct tls_context *ctx);
+void tls_ctx_free(struct sock *sk, struct tls_context *ctx);
 int wait_on_pending_writer(struct sock *sk, long *timeo);
 int tls_sk_query(struct sock *sk, int optname, char __user *optval,
 		int __user *optlen);
@@ -467,7 +469,10 @@ static inline struct tls_context *tls_get_ctx(const struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
-	return icsk->icsk_ulp_data;
+	/* Use RCU on icsk_ulp_data only for sock diag code,
+	 * TLS data path doesn't need rcu_dereference().
+	 */
+	return (__force void *)icsk->icsk_ulp_data;
 }
 
 static inline void tls_advance_record_sn(struct sock *sk,
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 1330a7442e5b..01998860afaa 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -345,7 +345,7 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
 		return -EINVAL;
 	if (unlikely(idx >= map->max_entries))
 		return -E2BIG;
-	if (unlikely(icsk->icsk_ulp_data))
+	if (unlikely(rcu_access_pointer(icsk->icsk_ulp_data)))
 		return -EINVAL;
 
 	link = sk_psock_init_link();
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index d184230665eb..436df5b4bb60 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -61,7 +61,7 @@ static void tls_device_free_ctx(struct tls_context *ctx)
 	if (ctx->rx_conf == TLS_HW)
 		kfree(tls_offload_ctx_rx(ctx));
 
-	tls_ctx_free(ctx);
+	tls_ctx_free(NULL, ctx);
 }
 
 static void tls_device_gc_task(struct work_struct *work)
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 9cbbae606ced..04829bef514c 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -251,14 +251,31 @@ static void tls_write_space(struct sock *sk)
 	ctx->sk_write_space(sk);
 }
 
-void tls_ctx_free(struct tls_context *ctx)
+/**
+ * tls_ctx_free() - free TLS ULP context
+ * @sk:  socket to which @ctx is attached
+ * @ctx: TLS context structure
+ *
+ * Free TLS context. If @sk is %NULL caller guarantees that the socket
+ * to which @ctx was attached has no outstanding references.
+ */
+void tls_ctx_free(struct sock *sk, struct tls_context *ctx)
 {
+	struct inet_connection_sock *icsk;
+
 	if (!ctx)
 		return;
 
 	memzero_explicit(&ctx->crypto_send, sizeof(ctx->crypto_send));
 	memzero_explicit(&ctx->crypto_recv, sizeof(ctx->crypto_recv));
-	kfree(ctx);
+
+	if (sk) {
+		icsk = inet_csk(sk);
+		rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
+		kfree_rcu(ctx, rcu);
+	} else {
+		kfree(ctx);
+	}
 }
 
 static void tls_sk_proto_cleanup(struct sock *sk,
@@ -306,7 +323,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 
 	write_lock_bh(&sk->sk_callback_lock);
 	if (free_ctx)
-		icsk->icsk_ulp_data = NULL;
+		rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
 	sk->sk_prot = ctx->sk_proto;
 	write_unlock_bh(&sk->sk_callback_lock);
 	release_sock(sk);
@@ -319,7 +336,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
 	ctx->sk_proto_close(sk, timeout);
 
 	if (free_ctx)
-		tls_ctx_free(ctx);
+		tls_ctx_free(sk, ctx);
 }
 
 static int do_tls_getsockopt_tx(struct sock *sk, char __user *optval,
@@ -608,7 +625,7 @@ static struct tls_context *create_ctx(struct sock *sk)
 	if (!ctx)
 		return NULL;
 
-	icsk->icsk_ulp_data = ctx;
+	rcu_assign_pointer(icsk->icsk_ulp_data, ctx);
 	ctx->setsockopt = sk->sk_prot->setsockopt;
 	ctx->getsockopt = sk->sk_prot->getsockopt;
 	ctx->sk_proto_close = sk->sk_prot->close;
@@ -649,8 +666,8 @@ static void tls_hw_sk_destruct(struct sock *sk)
 
 	ctx->sk_destruct(sk);
 	/* Free ctx */
-	tls_ctx_free(ctx);
-	icsk->icsk_ulp_data = NULL;
+	tls_ctx_free(sk, ctx);
+	rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
 }
 
 static int tls_hw_prot(struct sock *sk)
-- 
2.20.1
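
The publish/read half of the pattern in this patch can be illustrated in
user space. This is only an analogy, assuming C11 atomics as a stand-in:
a release store plays the role of rcu_assign_pointer() and an acquire load
plays the role of rcu_dereference() (acquire is stronger than what the
kernel primitive needs). The grace period behind kfree_rcu() is
deliberately not modelled, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tls_context. */
struct sketch_ctx {
	int version;
};

/* Hypothetical stand-in for the icsk_ulp_data slot in the socket. */
struct sketch_sock {
	_Atomic(struct sketch_ctx *) ulp_data;
};

/* Writer: ctx must be fully initialised before it is published, so a
 * reader that sees the pointer also sees the fields behind it. */
static void sketch_publish(struct sketch_sock *sk, struct sketch_ctx *ctx)
{
	atomic_store_explicit(&sk->ulp_data, ctx, memory_order_release);
}

/* Reader (the diag side): load the pointer once and tolerate NULL,
 * which means no ULP context is attached. Returns -1 in that case. */
static int sketch_read_version(struct sketch_sock *sk)
{
	struct sketch_ctx *ctx =
		atomic_load_explicit(&sk->ulp_data, memory_order_acquire);

	return ctx ? ctx->version : -1;
}
```

In the kernel the reader additionally runs inside rcu_read_lock() /
rcu_read_unlock(), which is what keeps the context alive until kfree_rcu()
is allowed to release it.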


^ permalink raw reply related

* [PATCH net-next 0/3] net: tls: add socket diag
From: Davide Caratti @ 2019-08-15 16:00 UTC (permalink / raw)
  To: Boris Pismenny, Jakub Kicinski, John Fastabend, Dave Watson,
	Aviad Yehezkel, David S. Miller, netdev

The current kernel does not provide any diagnostic tool, except
getsockopt(TCP_ULP), to know more about TCP sockets that have an upper
layer protocol (ULP) on top of them. This series extends the set of
information exported by INET_DIAG_INFO, to include data that are specific
to the ULP (and that might be meaningful for debug/testing purposes).

patch 1/3 ensures that the control plane reads/updates ULP specific data
using RCU.

patch 2/3 extends INET_DIAG_INFO and allows knowing the ULP name for
each TCP socket that has done setsockopt(TCP_ULP) successfully.

patch 3/3 extends kTLS to let programs like 'ss' know the protocol
version and the cipher in use.

Changes since RFC:
- some coding style fixes, thanks to Jakub Kicinski
- add X_UNSPEC as lowest value of uAPI enums, thanks to Jakub Kicinski
- fix assignment of struct nlattr *start, thanks to Jakub Kicinski
- let tls dump RXCONF and TXCONF, suggested by Jakub Kicinski
- don't dump anything if TLS version or cipher are 0 (but still return a
  constant size in get_aux_size()), thanks to Boris Pismenny
- constify first argument of get_info() and get_info_size()
- use RCU to access ulp_ops, like it's done for ca_ops
- add patch 1/3, from Jakub Kicinski

Davide Caratti (2):
  tcp: ulp: add functions to dump ulp-specific information
  net: tls: export protocol version, cipher, tx_conf/rx_conf to socket
    diag

Jakub Kicinski (1):
  net/tls: use RCU protection on icsk->icsk_ulp_data

 include/net/inet_connection_sock.h |  2 +-
 include/net/tcp.h                  |  3 ++
 include/net/tls.h                  | 28 +++++++++-
 include/uapi/linux/inet_diag.h     |  9 ++++
 include/uapi/linux/tls.h           | 15 ++++++
 net/core/sock_map.c                |  2 +-
 net/ipv4/tcp_diag.c                | 56 ++++++++++++++++++-
 net/tls/tls_device.c               |  2 +-
 net/tls/tls_main.c                 | 87 +++++++++++++++++++++++++++---
 9 files changed, 191 insertions(+), 13 deletions(-)

-- 
2.20.1
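
For readers wondering what a tool like 'ss' would do with the new nest,
here is a rough user-space sketch of walking the attributes inside an
INET_DIAG_ULP_INFO payload to find the ULP name. It is an assumption-laden
illustration, not code from iproute2: struct sketch_nlattr is assumed to
mirror struct nlattr, SKETCH_ULP_INFO_NAME is assumed to equal
INET_ULP_INFO_NAME (1) from patch 2/3, and flag bits in nla_type are
ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror struct nlattr from the netlink uAPI. */
struct sketch_nlattr {
	uint16_t nla_len;	/* header + payload, padding excluded */
	uint16_t nla_type;
};

#define SKETCH_NLA_HDRLEN    4u
#define SKETCH_ULP_INFO_NAME 1u	/* assumed value of INET_ULP_INFO_NAME */

/* Attributes start on 4-byte boundaries. */
static size_t sketch_align4(size_t len)
{
	return (len + 3u) & ~(size_t)3u;
}

/* Return the payload of the first attribute of the given type, or NULL
 * if it is absent or the buffer is malformed. */
static const void *sketch_find_attr(const unsigned char *buf, size_t len,
				    uint16_t type, size_t *payload_len)
{
	size_t off = 0;

	assert(buf != NULL && payload_len != NULL);

	while (len - off >= SKETCH_NLA_HDRLEN) {
		struct sketch_nlattr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.nla_len < SKETCH_NLA_HDRLEN || hdr.nla_len > len - off)
			return NULL;	/* truncated or corrupt attribute */
		if (hdr.nla_type == type) {
			*payload_len = hdr.nla_len - SKETCH_NLA_HDRLEN;
			return buf + off + SKETCH_NLA_HDRLEN;
		}
		off += sketch_align4(hdr.nla_len);
		if (off > len)
			return NULL;	/* padding runs past the buffer */
	}
	return NULL;
}
```

A caller would pass the payload of the INET_DIAG_ULP_INFO attribute and
SKETCH_ULP_INFO_NAME, then treat the result as a NUL-terminated string.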


^ permalink raw reply

* Re: [PATCH net-next 2/2] r8169: use the generic EEE management functions
From: Heiner Kallweit @ 2019-08-15 16:02 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <24146e48-c498-d13a-8c12-76519455d0d4@gmail.com>

On 15.08.2019 17:43, Florian Fainelli wrote:
> 
> 
> On 8/15/2019 6:02 AM, Heiner Kallweit wrote:
>> On 15.08.2019 14:35, Andrew Lunn wrote:
>>> On Thu, Aug 15, 2019 at 11:47:33AM +0200, Heiner Kallweit wrote:
>>>> Now that the Realtek PHY driver maps the vendor-specific EEE registers
>>>> to the standard MMD registers, we can remove all special handling and
>>>> use the generic functions phy_ethtool_get/set_eee.
>>>
>>> Hi Heiner
>>>
>> Hi Andrew,
>>
> >>> I think you should also add a call to phy_init_eee()?
> >>>
> >> I think it's not strictly needed. And a few things regarding
> >> phy_init_eee are not fully clear to me:
> >>
> >> - When is it supposed to be called? Before each call to
> >>   phy_ethtool_set_eee? Or once in the driver's init path?
>>
>> - The name is a little bit misleading as it's mainly a
>>   validity check. An actual "init" is done only if
>>   parameter clk_stop_enable is set.
>>
>> - It returns -EPROTONOSUPPORT if at least one link partner
>>   doesn't advertise EEE for current speed/duplex. To me this
>>   seems to be too restrictive. Example:
>>   We're at 1Gbps/full and link partner advertises EEE for
>>   100Mbps only. Then phy_init_eee returns -EPROTONOSUPPORT.
>>   This keeps me from controlling 100Mbps EEE advertisement.  
> 
> That function needs a complete rework: it does not do what its name
> implies, and it assumes that you have already locally advertised EEE
> for it to work properly. That does not make any sense, since the whole
> purpose is to see whether EEE can/will be active with the link partner
> (that's how I read it at least).
> 
> Whether the clock stop enable can be turned on or off is also a bit
> suspicious, because a MAC driver does not know whether the PHY
> supports doing that. I had started something in that area years ago:
> 
> https://github.com/ffainelli/linux/commits/phy-eee-tx-clk
> 
Not related to this patch, but to EEE support in general:

I have had the idea in the back of my mind to create linkmodes for all
EEE modes. They could be used with the normal supported, advertising,
and lp_advertising bitmaps. That means:
- extend genphy_read_abilities to read supported EEE modes
- extend genphy_read_status to read lp-advertised EEE modes
- let genphy_config_aneg handle EEE advertising
- ..
This should help to make EEE mode handling consistent with link mode
handling.
Also still missing is support for the new C45 registers for 2.5Gbps and
5Gbps EEE (3.21, 7.62, 7.63).
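
To make the bitmap idea concrete, here is a small sketch of how "which
EEE modes are common" could be kept separate from "is EEE usable at the
current speed". It is plain user-space C with hypothetical sketch_* names;
the two bit values are assumed to mirror MDIO_EEE_100TX and MDIO_EEE_1000T
from include/uapi/linux/mdio.h and are redefined so the sketch stands
alone.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror MDIO_EEE_100TX / MDIO_EEE_1000T. */
#define SKETCH_EEE_100TX 0x0002u
#define SKETCH_EEE_1000T 0x0004u

/* Hypothetical helper: EEE modes that both link partners advertise. */
static uint16_t sketch_eee_common(uint16_t adv, uint16_t lp_adv)
{
	return (uint16_t)(adv & lp_adv);
}

/* Hypothetical helper: is EEE usable at the mode the link resolved to?
 * cur_mode is the single EEE bit matching the current speed. */
static int sketch_eee_active(uint16_t adv, uint16_t lp_adv, uint16_t cur_mode)
{
	return (sketch_eee_common(adv, lp_adv) & cur_mode) != 0;
}
```

In the scenario discussed earlier in the thread (link at 1Gbps/full, link
partner advertising EEE for 100Mbps only), sketch_eee_active() reports
that EEE is not active at the current speed, while sketch_eee_common()
still shows the 100Mbps mode, so the advertisement remains inspectable
instead of everything collapsing into one error code.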

An open question for me is how best to deal with the scenario where
older PHYs use the C22 MMD registers for proprietary purposes. Writing
to the MMD address register may then cause the PHY to misbehave (that
was the case for RTL8211B), and MMD reads may return random data.
Maybe we need a flag to explicitly state whether MMD is supported
or not.

Heiner

^ permalink raw reply

* Re: [PATCH net-next 1/1] Added BASE-T1 PHY support to PHY Subsystem
From: Andrew Lunn @ 2019-08-15 15:56 UTC (permalink / raw)
  To: Christian Herber
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190815153209.21529-2-christian.herber@nxp.com>

On Thu, Aug 15, 2019 at 03:32:29PM +0000, Christian Herber wrote:
> BASE-T1 is a category of Ethernet PHYs.
> They use a single copper pair for transmission.
> This patch adds basic support for this category of PHYs.
> It covers the discovery of abilities and basic configuration.
> It includes setting fixed speed and enabling auto-negotiation.
> BASE-T1 devices should always be Clause-45 managed.
> Therefore, this patch extends phy-c45.c.
> While for some functions like auto-negotiation different registers are
> used, the layout of these registers is the same for the used fields.
> Thus, much of the logic of basic Clause-45 devices can be reused.
> 
> Signed-off-by: Christian Herber <christian.herber@nxp.com>
> ---
>  drivers/net/phy/phy-c45.c    | 113 +++++++++++++++++++++++++++++++----
>  drivers/net/phy/phy-core.c   |   4 +-
>  include/uapi/linux/ethtool.h |   2 +
>  include/uapi/linux/mdio.h    |  21 +++++++
>  4 files changed, 129 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
> index b9d4145781ca..9ff0b8c785de 100644
> --- a/drivers/net/phy/phy-c45.c
> +++ b/drivers/net/phy/phy-c45.c
> @@ -8,13 +8,23 @@
>  #include <linux/mii.h>
>  #include <linux/phy.h>
>  
> +#define IS_100BASET1(phy) (linkmode_test_bit( \
> +			   ETHTOOL_LINK_MODE_100baseT1_Full_BIT, \
> +			   (phy)->supported))
> +#define IS_1000BASET1(phy) (linkmode_test_bit( \
> +			    ETHTOOL_LINK_MODE_1000baseT1_Full_BIT, \
> +			    (phy)->supported))

Hi Christian

We already have the flag phydev->is_gigabit_capable. Maybe add a flag
phydev->is_t1_capable?

> +
> +static u32 get_aneg_ctrl(struct phy_device *phydev);
> +static u32 get_aneg_stat(struct phy_device *phydev);

No forward declarations please. Put the code in the right order so
they are not needed.

Thanks

     Andrew

^ permalink raw reply

* Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
From: William Tu @ 2019-08-15 15:46 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Linux Kernel Network Developers, bpf
In-Reply-To: <20190813120558.6151-1-toshiaki.makita1@gmail.com>

On Tue, Aug 13, 2019 at 5:07 AM Toshiaki Makita
<toshiaki.makita1@gmail.com> wrote:
>
> This is a rough PoC for an idea to offload TC flower to XDP.
>
>
> * Motivation
>
> The purpose is to speed up software TC flower by using XDP.
>
> I chose TC flower because my current interest is in OVS. OVS uses TC to
> offload flow tables to hardware, so if TC can offload flows to XDP, OVS
> also can be offloaded to XDP.
>
> When TC flower filter is offloaded to XDP, the received packets are
> handled by XDP first, and if their protocol or something is not
> supported by the eBPF program, the program returns XDP_PASS and packets
> are passed to upper layer TC.
>
> The packet processing flow will be like this when this mechanism,
> xdp_flow, is used with OVS.
>
>  +-------------+
>  | openvswitch |
>  |    kmod     |
>  +-------------+
>         ^
>         | if not match in filters (flow key or action not supported by TC)
>  +-------------+
>  |  TC flower  |
>  +-------------+
>         ^
>         | if not match in flow tables (flow key or action not supported by XDP)
>  +-------------+
>  |  XDP prog   |
>  +-------------+
>         ^
>         | incoming packets
>
I like this idea, some comments about the OVS AF_XDP work.

Another option is for OVS AF_XDP to serve as the slow path of TC flower
HW offload.
For example:

 Userspace OVS datapath (the one used by OVS-DPDK)
         ^
         |
  +-------------------+
  | OVS AF_XDP netdev |
  +-------------------+
         ^
         | if not supported or not match in flow tables
  +-------------------+
  |   TC HW flower    |
  +-------------------+
         ^
         | incoming packets

So in this case it's either TC HW flower offload, or the userspace PMD OVS.
Both cases should be pretty fast.

I think xdp_flow can also be used by the OVS AF_XDP netdev, sitting between
TC HW flower and the OVS AF_XDP netdev.
Before the XDP program sends a packet to the AF_XDP socket, xdp_flow
can execute first and, if there is no match, send the packet on to AF_XDP.
So in your patch set, implement something like
  bpf_redirect_map(&xsks_map, index, 0);
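
The ordering I have in mind can be modelled in plain C. This is only a
user-space sketch of the decision, not an XDP program: the flow table is
a hypothetical array instead of an eBPF map, the key is a stand-in for a
real flow key, and all sketch_* names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_verdict {
	SKETCH_HANDLED_BY_FLOW,	/* xdp_flow matched and ran its actions */
	SKETCH_REDIRECT_TO_XSK,	/* miss: hand the packet to AF_XDP */
};

/* Hypothetical flow entry; a real one would carry a full flow key. */
struct sketch_flow {
	uint32_t key;
};

/* Linear scan standing in for the eBPF flow table lookup. */
static int sketch_flow_match(const struct sketch_flow *tbl, size_t n,
			     uint32_t key)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].key == key)
			return 1;
	return 0;
}

/* Per-packet decision: try the flow table first, fall back to AF_XDP. */
static enum sketch_verdict sketch_rx(const struct sketch_flow *tbl, size_t n,
				     uint32_t pkt_key)
{
	if (sketch_flow_match(tbl, n, pkt_key))
		return SKETCH_HANDLED_BY_FLOW;

	/* In a real XDP program, this is the point where
	 * bpf_redirect_map(&xsks_map, index, 0) would be returned. */
	return SKETCH_REDIRECT_TO_XSK;
}
```

With this ordering, only packets that miss in the xdp_flow tables reach
the userspace datapath.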

Another thing is that each layer does its own packet parsing.
From your graph, packets are first parsed in the XDP program, then in
TC flower, then in the openvswitch kmod.
I wonder if we can reuse some of the parsing results.

Regards,
William

> This is useful especially when the device does not support HW-offload.
> Such interfaces include virtual interfaces like veth.
>
>
> * How to use
>
> It only supports ingress (clsact) flower filter at this point.
> Enable the feature via ethtool before adding ingress/clsact qdisc.
>
>  $ ethtool -K eth0 tc-offload-xdp on
>
> Then add qdisc/filters as normal.
>
>  $ tc qdisc add dev eth0 clsact
>  $ tc filter add dev eth0 ingress protocol ip flower skip_sw ...
>
> Alternatively, when using OVS, adding qdisc and filters will be
> automatically done by setting hw-offload.
>
>  $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>  $ systemctl stop openvswitch
>  $ tc qdisc del dev eth0 ingress # or reboot
>  $ ethtool -K eth0 tc-offload-xdp on
>  $ systemctl start openvswitch
>
>
> * Performance
>
> I measured drop rate at veth interface with redirect action from physical
> interface (i40e 25G NIC, XXV 710) to veth. The CPU is Xeon Silver 4114
> (2.20 GHz).
>                                                                  XDP_DROP
>                     +------+                        +-------+    +-------+
>  pktgen -- wire --> | eth0 | -- TC/OVS redirect --> | veth0 |----| veth1 |
>                     +------+   (offloaded to XDP)   +-------+    +-------+
>
> The setup for redirect is done by OVS like this.
>
>  $ ovs-vsctl add-br ovsbr0
>  $ ovs-vsctl add-port ovsbr0 eth0
>  $ ovs-vsctl add-port ovsbr0 veth0
>  $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>  $ systemctl stop openvswitch
>  $ tc qdisc del dev eth0 ingress
>  $ tc qdisc del dev veth0 ingress
>  $ ethtool -K eth0 tc-offload-xdp on
>  $ ethtool -K veth0 tc-offload-xdp on
>  $ systemctl start openvswitch
>
> Tested single core/single flow with 3 configurations.
> - xdp_flow: hw-offload=true, tc-offload-xdp on
> - TC:       hw-offload=true, tc-offload-xdp off (software TC)
> - ovs kmod: hw-offload=false
>
>  xdp_flow  TC        ovs kmod
>  --------  --------  --------
>  4.0 Mpps  1.1 Mpps  1.1 Mpps
>
> So xdp_flow drop rate is roughly 4x faster than software TC or ovs kmod.
>
> OTOH the time to add a flow increases with xdp_flow.
>
> ping latency of first packet when veth1 does XDP_PASS instead of DROP:
>
>  xdp_flow  TC        ovs kmod
>  --------  --------  --------
>  25ms      12ms      0.6ms
>
> xdp_flow does a lot of work to emulate TC behavior including UMH
> transaction and multiple bpf map update from UMH which I think increases
> the latency.
>
>
> * Implementation
>
> xdp_flow makes use of UMH to load an eBPF program for XDP, similar to
> bpfilter. The difference is that xdp_flow does not generate the eBPF
> program dynamically but a prebuilt program is embedded in UMH. This is
> mainly because flow insertion is considerably frequent. If we generate
> and load an eBPF program on each insertion of a flow, the latency of the
> first packet of ping in above test will increase, which I want to avoid.
>
>                          +----------------------+
>                          |    xdp_flow_umh      | load eBPF prog for XDP
>                          | (eBPF prog embedded) | update maps for flow tables
>                          +----------------------+
>                                    ^ |
>                            request | v eBPF prog id
>  +-----------+  offload  +-----------------------+
>  | TC flower | --------> |    xdp_flow kmod      | attach the prog to XDP
>  +-----------+           | (flow offload driver) |
>                          +-----------------------+
>
> - When ingress/clsact qdisc is created, i.e. a device is bound to a flow
>   block, xdp_flow kmod requests xdp_flow_umh to load eBPF prog.
>   xdp_flow_umh returns prog id and xdp_flow kmod attaches the prog to XDP
>   (the reason for attaching XDP from kmod is that rtnl_lock is held here).
>
> - When flower filter is added, xdp_flow kmod requests xdp_flow_umh to
>   update maps for flow tables.
>
>
> * Patches
>
> - patch 1
>  Basic framework for xdp_flow kmod and UMH.
>
> - patch 2
>  Add prebuilt eBPF program embedded in UMH.
>
> - patch 3, 4
>  Attach the prog to XDP in kmod after using the prog id returned from
>  UMH.
>
> - patch 5, 6
>  Add maps for flow tables and flow table manipulation logic in UMH.
>
> - patch 7
>  Implement flow lookup and basic actions in eBPF prog.
>
> - patch 8
>  Implement flow manipulation logic, serialize flow key and actions from
>  TC flower and make requests to UMH in kmod.
>
> - patch 9
>  Add tc-offload-xdp netdev feature and hooks to call xdp_flow kmod in
>  TC flower offload code.
>
> - patch 10, 11
>  Add example actions, redirect and vlan_push.
>
> - patch 12
>  Add testcase for xdp_flow.
>
> - patch 13, 14
>  These are unrelated patches. They just improve XDP program's
>  performance. They are included to demonstrate to what extent xdp_flow
>  performance can increase. Without them, drop rate goes down from 4Mpps
>  to 3Mpps.
>
>
> * About OVS AF_XDP netdev
>
> Recently OVS has added AF_XDP netdev type support. This also makes use
> of XDP, but in some ways different from this patch set.
>
> - AF_XDP work originally started in order to bring BPF's flexibility to
>   OVS, which enables us to upgrade datapath without updating kernel.
>   AF_XDP solution uses userland datapath so it achieved its goal.
>   xdp_flow will not replace OVS datapath completely, but offload it
>   partially just for speed up.
>
> - OVS AF_XDP requires PMD for the best performance so consumes 100% CPU.
>
> - OVS AF_XDP needs packet copy when forwarding packets.
>
> - xdp_flow can be used not only for OVS. It works for direct use of TC
>   flower. nftables also can be offloaded by the same mechanism in the
>   future.
>
>
> * About alternative userland (ovs-vswitchd etc.) implementation
>
> Maybe a similar logic can be implemented in ovs-vswitchd offload
> mechanism, instead of adding code to kernel. I just thought offloading
> TC is more generic and allows wider usage with direct TC command.
>
> For example, considering that OVS inserts a flow to kernel only when
> flow miss happens in kernel, we can in advance add offloaded flows via
> tc filter to avoid flow insertion latency for certain sensitive flows.
> TC flower usage without using OVS is also possible.
>
> Also as written above nftables can be offloaded to XDP with this
> mechanism as well.
>
>
> * Note
>
> This patch set is based on top of commit a664a834579a ("tools: bpftool:
> fix reading from /proc/config.gz").
>
> Any feedback is welcome.
> Thanks!
>
> Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
>
> Toshiaki Makita (14):
>   xdp_flow: Add skeleton of XDP based TC offload driver
>   xdp_flow: Add skeleton bpf program for XDP
>   bpf: Add API to get program from id
>   xdp_flow: Attach bpf prog to XDP in kernel after UMH loaded program
>   xdp_flow: Prepare flow tables in bpf
>   xdp_flow: Add flow entry insertion/deletion logic in UMH
>   xdp_flow: Add flow handling and basic actions in bpf prog
>   xdp_flow: Implement flow replacement/deletion logic in xdp_flow kmod
>   xdp_flow: Add netdev feature for enabling TC flower offload to XDP
>   xdp_flow: Implement redirect action
>   xdp_flow: Implement vlan_push action
>   bpf, selftest: Add test for xdp_flow
>   i40e: prefetch xdp->data before running XDP prog
>   bpf, hashtab: Compare keys in long
>
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c  |    1 +
>  include/linux/bpf.h                          |    6 +
>  include/linux/netdev_features.h              |    2 +
>  include/linux/netdevice.h                    |    4 +
>  include/net/flow_offload_xdp.h               |   33 +
>  include/net/pkt_cls.h                        |    5 +
>  include/net/sch_generic.h                    |    1 +
>  kernel/bpf/hashtab.c                         |   27 +-
>  kernel/bpf/syscall.c                         |   26 +-
>  net/Kconfig                                  |    1 +
>  net/Makefile                                 |    1 +
>  net/core/dev.c                               |   13 +-
>  net/core/ethtool.c                           |    1 +
>  net/sched/cls_api.c                          |   67 +-
>  net/xdp_flow/.gitignore                      |    1 +
>  net/xdp_flow/Kconfig                         |   16 +
>  net/xdp_flow/Makefile                        |  112 +++
>  net/xdp_flow/msgfmt.h                        |  102 +++
>  net/xdp_flow/umh_bpf.h                       |   34 +
>  net/xdp_flow/xdp_flow_core.c                 |  126 ++++
>  net/xdp_flow/xdp_flow_kern_bpf.c             |  358 +++++++++
>  net/xdp_flow/xdp_flow_kern_bpf_blob.S        |    7 +
>  net/xdp_flow/xdp_flow_kern_mod.c             |  645 ++++++++++++++++
>  net/xdp_flow/xdp_flow_umh.c                  | 1034 ++++++++++++++++++++++++++
>  net/xdp_flow/xdp_flow_umh_blob.S             |    7 +
>  tools/testing/selftests/bpf/Makefile         |    1 +
>  tools/testing/selftests/bpf/test_xdp_flow.sh |  103 +++
>  27 files changed, 2716 insertions(+), 18 deletions(-)
>  create mode 100644 include/net/flow_offload_xdp.h
>  create mode 100644 net/xdp_flow/.gitignore
>  create mode 100644 net/xdp_flow/Kconfig
>  create mode 100644 net/xdp_flow/Makefile
>  create mode 100644 net/xdp_flow/msgfmt.h
>  create mode 100644 net/xdp_flow/umh_bpf.h
>  create mode 100644 net/xdp_flow/xdp_flow_core.c
>  create mode 100644 net/xdp_flow/xdp_flow_kern_bpf.c
>  create mode 100644 net/xdp_flow/xdp_flow_kern_bpf_blob.S
>  create mode 100644 net/xdp_flow/xdp_flow_kern_mod.c
>  create mode 100644 net/xdp_flow/xdp_flow_umh.c
>  create mode 100644 net/xdp_flow/xdp_flow_umh_blob.S
>  create mode 100755 tools/testing/selftests/bpf/test_xdp_flow.sh
>
> --
> 1.8.3.1
>

^ permalink raw reply

* Re: [PATCH net-next v2 2/2] r8169: use the generic EEE management functions
From: Florian Fainelli @ 2019-08-15 15:45 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <14f2831d-5f89-1345-5674-b25f7d95255f@gmail.com>



On 8/15/2019 5:14 AM, Heiner Kallweit wrote:
> Now that the Realtek PHY driver maps the vendor-specific EEE registers
> to the standard MMD registers, we can remove all special handling and
> use the generic functions phy_ethtool_get/set_eee.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 0/1] Add BASE-T1 PHY support
From: Andrew Lunn @ 2019-08-15 15:43 UTC (permalink / raw)
  To: Christian Herber
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florian Fainelli, Heiner Kallweit
In-Reply-To: <20190815153209.21529-1-christian.herber@nxp.com>

On Thu, Aug 15, 2019 at 03:32:27PM +0000, Christian Herber wrote:
> This patch adds basic support for BASE-T1 PHYs in the framework.
> BASE-T1 PHYs' main areas of application are automotive and industrial.
> BASE-T1 is standardized in IEEE 802.3, namely
> - IEEE 802.3bw: 100BASE-T1
> - IEEE 802.3bp: 1000BASE-T1
> - IEEE 802.3cg: 10BASE-T1L and 10BASE-T1S

Hi Christian

Please make sure you Cc: the PHY subsystem maintainers.

       Andrew

^ permalink raw reply

* Re: [PATCH net-next 2/2] r8169: use the generic EEE management functions
From: Florian Fainelli @ 2019-08-15 15:43 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <bfd67eb3-0da7-b8a5-928a-a66802185b68@gmail.com>



On 8/15/2019 6:02 AM, Heiner Kallweit wrote:
> On 15.08.2019 14:35, Andrew Lunn wrote:
>> On Thu, Aug 15, 2019 at 11:47:33AM +0200, Heiner Kallweit wrote:
>>> Now that the Realtek PHY driver maps the vendor-specific EEE registers
>>> to the standard MMD registers, we can remove all special handling and
>>> use the generic functions phy_ethtool_get/set_eee.
>>
>> Hi Heiner
>>
> Hi Andrew,
> 
>> I think you should also add a call to phy_init_eee()?
>>
> I think it's not strictly needed. And a few things regarding
> phy_init_eee are not fully clear to me:
> 
> - When is it supposed to be called? Before each call to
>   phy_ethtool_set_eee? Or once in the driver's init path?
> 
> - The name is a little bit misleading as it's mainly a
>   validity check. An actual "init" is done only if
>   parameter clk_stop_enable is set.
> 
> - It returns -EPROTONOSUPPORT if at least one link partner
>   doesn't advertise EEE for current speed/duplex. To me this
>   seems to be too restrictive. Example:
>   We're at 1Gbps/full and link partner advertises EEE for
>   100Mbps only. Then phy_init_eee returns -EPROTONOSUPPORT.
>   This keeps me from controlling 100Mbps EEE advertisement.  

That function needs a complete rework: it does not do what its name
implies, and it assumes that you have already locally advertised EEE
for it to work properly. That does not make any sense, since the whole
purpose is to see whether EEE can/will be active with the link partner
(that's how I read it at least).

Whether the clock stop enable can be turned on or off is also a bit
suspicious, because a MAC driver does not know whether the PHY
supports doing that. I had started something in that area years ago:

https://github.com/ffainelli/linux/commits/phy-eee-tx-clk
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] net: phy: realtek: add support for EEE registers on integrated PHY's
From: Florian Fainelli @ 2019-08-15 15:39 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <b9d96a3b-8301-fb4f-c7f5-911c964c15cf@gmail.com>



On 8/15/2019 5:12 AM, Heiner Kallweit wrote:
> EEE-related registers on newer integrated PHY's have the standard
> layout, but are accessible not via MMD but via vendor-specific
> registers. Emulating the standard MMD registers allows using the
> generic functions for EEE control.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

