Netdev List
 help / color / mirror / Atom feed
* Re: [v5,1/5] soc: qcom: smem_state: Fix include for ERR_PTR()
From: Kalle Valo @ 2016-11-16 18:49 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Eugene Krasnikov, Kalle Valo, Andy Gross, wcn36xx, linux-wireless,
	netdev, linux-kernel, linux-arm-msm
In-Reply-To: <1479190014-11297-1-git-send-email-bjorn.andersson@linaro.org>

Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
> The correct include file for getting errno constants and ERR_PTR() is
> linux/err.h, rather than linux/errno.h, so fix the include.
> 
> Fixes: e8b123e60084 ("soc: qcom: smem_state: Add stubs for disabled smem_state")
> Acked-by: Andy Gross <andy.gross@linaro.org>
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

For some reason this fails to compile now. Can you take a look, please?

ERROR: "qcom_wcnss_open_channel" [drivers/net/wireless/ath/wcn36xx/wcn36xx.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

5 patches set to Changes Requested.

9429045 [v5,1/5] soc: qcom: smem_state: Fix include for ERR_PTR()
9429047 [v5,2/5] wcn36xx: Transition driver to SMD client
9429037 [v5,3/5] wcn36xx: Implement firmware assisted scan
9429043 [v5,4/5] wcn36xx: Implement print_reg indication
9429023 [v5,5/5] wcn36xx: Don't use the destroyed hal_mutex

-- 
https://patchwork.kernel.org/patch/9429045/

Documentation about submitting wireless patches and checking status
from patchwork:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH net-next 0/5] net: Implenent ethtool::nway_reset for a few drivers
From: David Miller @ 2016-11-16 18:44 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, tremyfr
In-Reply-To: <20161115191949.15361-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 15 Nov 2016 11:19:44 -0800

> This patch series depends on "net: phy: Centralize auto-negotation restart"
> since it provides phy_ethtool_nway_reset as a helper function.
> 
> The drivers here already support PHYLIB, so there really is no reason why
> restarting auto-negotiation would not be possible with these.

Series applied, thanks Florian.

^ permalink raw reply

* Re: [PATCH net][v2] bpf: fix range arithmetic for bpf map access
From: Jann Horn @ 2016-11-16 18:41 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, netdev
In-Reply-To: <de657ab3-9d6c-a700-a593-53d5a4dcde75@fb.com>

On Tue, Nov 15, 2016 at 3:20 PM, Josef Bacik <jbacik@fb.com> wrote:
> On 11/15/2016 08:47 AM, Jann Horn wrote:
>> In states_equal():
>> if (rold->type == NOT_INIT ||
>>    (rold->type == UNKNOWN_VALUE && rcur->type != NOT_INIT))
>> <------------
>> continue;
>>
>> I think this is broken in code like the following:
>>
>> int value;
>> if (condition) {
>>   value = 1; // visited first by verifier
>> } else {
>>   value = 1000000; // visited second by verifier
>> }
>> int dummy = 1; // states seem to converge here, but actually don't
>> map[value] = 1234;
>>
>> `value` would be an UNKNOWN_VALUE for both paths, right? So
>> states_equal() would decide that the states converge after the
>> conditionally executed code?
>>
>
> Value would be CONST_IMM for both paths, and wouldn't match so they wouldn't
> converge.  I think I understood your question right, let me know if I'm
> addressing the wrong part of it.

Okay, true, but what if you load the values from a map and bounds-check them
instead of hardcoding them? Then they will be of type UNKNOWN_VALUE, right?
Like this:

int value = map[0];
if (condition) {
  value &= 0x1; // visited first by verifier
} else {
  // nothing; visited second by verifier
}
int dummy = 1; // states seem to converge here, but actually don't
map[value] = 1234;

And then `rold->type == UNKNOWN_VALUE && rcur->type != NOT_INIT` will be
true in the `dummy = 1` line, and the states converge. Am I missing something?

^ permalink raw reply

* Re: [PATCH net-next 0/5] net: busy-poll: allow preemption and other optimizations
From: David Miller @ 2016-11-16 18:41 UTC (permalink / raw)
  To: edumazet
  Cc: netdev, willemb, abelay, zach.brown, tariqt, Yuval.Mintz,
	ariel.elior, eric.dumazet
In-Reply-To: <1479233715-9905-1-git-send-email-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 15 Nov 2016 10:15:10 -0800

> It is time to have preemption points in sk_busy_loop() and improve
> its scalability.
> 
> Also napi_complete() and friends can tell drivers when it is safe to
> not re-enable device interrupts, saving some overhead under
> high busy polling.
> 
> mlx4 and bnx2x are changed accordingly, to show how this busy polling
> status can be exploited by drivers.
> 
> Next steps will implement Zach Brown suggestion, where NAPI polling
> would be enabled all the time for some chosen queues.
> This is needed for efficient epoll() support anyway.

Series applied, thanks a lot Eric.

^ permalink raw reply

* Re: [PATCH v2 0/5] net: thunderx: Miscellaneous fixes
From: David Miller @ 2016-11-16 18:28 UTC (permalink / raw)
  To: sunil.kovvuri; +Cc: netdev, linux-kernel, linux-arm-kernel, sgoutham
In-Reply-To: <1479211562-14170-1-git-send-email-sunil.kovvuri@gmail.com>

From: sunil.kovvuri@gmail.com
Date: Tue, 15 Nov 2016 17:35:56 +0530

> This patchset includes fixes for incorrect LMAC credits,
> unreliable driver statistics, memory leak upon interface
> down e.t.c
> 
> Changes from v1:
> - As suggested replaced bit shifting with BIT() macro
>   in the patch 'Fix configuration of L3/L4 length checking'.

Series applied, thanks.

^ permalink raw reply

* [Patch net] net: check dead netns for peernet2id_alloc()
From: Cong Wang @ 2016-11-16 18:27 UTC (permalink / raw)
  To: netdev; +Cc: avagin, Cong Wang, Nicolas Dichtel

Andrei reports we still allocate netns ID from idr after we destroy
it in cleanup_net().

cleanup_net():
  ...
  idr_destroy(&net->netns_ids);
  ...
  list_for_each_entry_reverse(ops, &pernet_list, list)
    ops_exit_list(ops, &net_exit_list);
      -> rollback_registered_many()
        -> rtmsg_ifinfo_build_skb()
         -> rtnl_fill_ifinfo()
           -> peernet2id_alloc()

After that point we should not even access net->netns_ids, we
should check the death of the current netns as early as we can in
peernet2id_alloc().

For net-next we can consider to avoid sending rtmsg totally,
it is a good optimization for netns teardown path.

Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
Reported-by: Andrei Vagin <avagin@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/core/net_namespace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f61c0e0..7001da9 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -219,6 +219,8 @@ int peernet2id_alloc(struct net *net, struct net *peer)
 	bool alloc;
 	int id;
 
+	if (atomic_read(&net->count) == 0)
+		return NETNSA_NSID_NOT_ASSIGNED;
 	spin_lock_irqsave(&net->nsid_lock, flags);
 	alloc = atomic_read(&peer->count) == 0 ? false : true;
 	id = __peernet2id_alloc(net, peer, &alloc);
-- 
2.1.0

^ permalink raw reply related

* mlx4 BUG_ON in probe path
From: Bjorn Helgaas @ 2016-11-16 18:25 UTC (permalink / raw)
  To: Yishai Hadas; +Cc: netdev, linux-rdma, Johannes Thumshirn, linux-kernel

Hi Yishai,

Johannes has been working on an mlx4 initialization problem on an
IBM x3850 X6.  The underlying problem is a PCI core issue -- we're
setting RCB in the Mellanox device, which means it thinks it can
generate 128-byte Completions, even though the Root Port above it
can't handle them.  That issue is
https://bugzilla.kernel.org/show_bug.cgi?id=187781

The machine crashed when this happened, apparently not because of any
error reported via AER, but because mlx4 contains a BUG_ON, probably
the one in mlx4_enter_error_state().

That one happens if pci_channel_offline() returns false.  Is this
telling us about a problem in PCI error handling, or is it just a case
where mlx4 isn't as smart as it could be?

Ideally, if mlx4 can't initialize the device, it should just return an
error from the probe function instead of crashing the whole machine.

Here's the crash (the entire dmesg log is in the bugzilla above):

  mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared)
  mlx4_core 0000:41:00.0: device is going to be reset
  mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting
  mlx4_core 0000:41:00.0: Fail to reset HCA
  ------------[ cut here ]------------
  kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193!
  invalid opcode: 0000 [#1] SMP 
  Modules linked in: sr_mod(E) cdrom(E) uas(E) usb_storage(E) mlx4_core(E+) cdc_ether(E) usbnet(E) mii(E) joydev(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) drbg(E) ansi_cprng(E) aesni_intel(E) iTCO_wdt(E) aes_x86_64(E) igb(E) ipmi_devintf(E) iTCO_vendor_support(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) ptp(E) cryptd(E) pps_core(E) sb_edac(E) pcspkr(E) lpc_ich(E) ipmi_ssif(E) ioatdma(E) edac_core(E) shpchp(E) mfd_core(E) dca(E) wmi(E) ipmi_si(E) ipmi_msghandler(E) fjes(E) button(E) processor(E) acpi_pad(E) hid_generic(E) usbhid(E) ext4(E) crc16(E) jbd2(E) mbcache(E) sd_mod(E) mgag200(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) xhci_pci(E) sysfillrect(E) ehci_pci(E) sysimgbl
 t(E)
   fb_sys_fops(E) xhci_hcd(E) ehci_hcd(E) ttm(E) usbcore(E) drm(E) usb_common(E) megaraid_sas(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
  Supported: Yes
  CPU: 27 PID: 2867 Comm: modprobe Tainted: G            E      4.4.21-default #6
  Hardware name: IBM x3850 X6 -[3837Z7P]-/00FN772, BIOS -[A8E120CUS-1.30]- 08/22/2016
  task: ffff881fb2ff9280 ti: ffff881fbd3c4000 task.ti: ffff881fbd3c4000
  RIP: 0010:[<ffffffffa0446740>]  [<ffffffffa0446740>] mlx4_enter_error_state+0x240/0x320 [mlx4_core]
  RSP: 0018:ffff881fbd3c79a0  EFLAGS: 00010246
  RAX: ffff8820b2486e00 RBX: ffff883fbe240000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff881fbf63b000
  RBP: ffff8820b2486e60 R08: 0000000000000029 R09: ffff88803feda50f
  R10: 00000000000d1b50 R11: 0000000000000000 R12: 0000000000000000
  R13: 0000000000000000 R14: ffff883fbe240460 R15: 00000000fffffffb
  FS:  00007f7c55203700(0000) GS:ffff883fbf900000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f1813c88000 CR3: 0000003fbe637000 CR4: 00000000001406e0
  Stack:
   15b30000c0000100 ffff883fbe240000 0000000000000fff 0000000000000000
   ffffffffa0447d54 000000000000ffff ffffffff00000000 000000000000ea60
   0000000000000000 000000000000ea60 ffffc90031dba680 ffff883fbe240000
  Call Trace:
   [<ffffffffa0447d54>] __mlx4_cmd+0x594/0x8a0 [mlx4_core]
   [<ffffffffa045191b>] mlx4_map_cmd+0x2ab/0x3c0 [mlx4_core]
   [<ffffffffa045a855>] mlx4_load_one+0x515/0x1220 [mlx4_core]
   [<ffffffffa045bb69>] mlx4_init_one+0x4e9/0x6a0 [mlx4_core]
   [<ffffffff8135626f>] local_pci_probe+0x3f/0xa0
   [<ffffffff81357694>] pci_device_probe+0xd4/0x120
   [<ffffffff8144d0b7>] driver_probe_device+0x1f7/0x420
   [<ffffffff8144d35b>] __driver_attach+0x7b/0x80
   [<ffffffff8144afc8>] bus_for_each_dev+0x58/0x90
   [<ffffffff8144c519>] bus_add_driver+0x1c9/0x280
   [<ffffffff8144dccb>] driver_register+0x5b/0xd0
   [<ffffffffa03f911a>] mlx4_init+0x11a/0x1000 [mlx4_core]
   [<ffffffff81002138>] do_one_initcall+0xc8/0x1f0
   [<ffffffff81182a08>] do_init_module+0x5a/0x1d7
   [<ffffffff81103726>] load_module+0x1366/0x1c50
   [<ffffffff811041c0>] SYSC_finit_module+0x70/0xa0
   [<ffffffff815e14ae>] entry_SYSCALL_64_fastpath+0x12/0x71

^ permalink raw reply

* Re: [net PATCH 0/2] ipv4: Fix memory leaks and reference issues in fib
From: David Miller @ 2016-11-16 18:25 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: netdev
In-Reply-To: <20161115104306.13711.67911.stgit@ahduyck-blue-test.jf.intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Tue, 15 Nov 2016 05:46:01 -0500

> This series fixes one major issue and one minor issue in the fib tables.
> 
> The major issue is that we had lost the functionality that was flushing the
> local table entries from main after we had unmerged the two tries.  In
> order to regain the functionality I have performed a partial revert and
> then moved the functionality for flushing the external entries from main
> into fib_unmerge.
> 
> The minor issue was a memory leak that could occur in the event that we
> weren't able to add an alias to the local trie resulting in the fib alias
> being leaked.

Series applied and queued up for -stable, thanks Alexander.

^ permalink raw reply

* Re: [PATCH net][v2] bpf: fix range arithmetic for bpf map access
From: David Miller @ 2016-11-16 18:22 UTC (permalink / raw)
  To: jbacik; +Cc: jannh, ast, daniel, netdev
In-Reply-To: <1479156336-6211-1-git-send-email-jbacik@fb.com>

From: Josef Bacik <jbacik@fb.com>
Date: Mon, 14 Nov 2016 15:45:36 -0500

> I made some invalid assumptions with BPF_AND and BPF_MOD that could result in
> invalid accesses to bpf map entries.  Fix this up by doing a few things
> 
> 1) Kill BPF_MOD support.  This doesn't actually get used by the compiler in real
> life and just adds extra complexity.
> 
> 2) Fix the logic for BPF_AND, don't allow AND of negative numbers and set the
> minimum value to 0 for positive AND's.
> 
> 3) Don't do operations on the ranges if they are set to the limits, as they are
> by definition undefined, and allowing arithmetic operations on those values
> could make them appear valid when they really aren't.
> 
> This fixes the testcase provided by Jann as well as a few other theoretical
> problems.
> 
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Josef Bacik <jbacik@fb.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: dsa: fix fixed-link-phy device leaks
From: Florian Fainelli @ 2016-11-16 18:14 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Andrew Lunn, Vivien Didelot, David S. Miller, netdev,
	linux-kernel
In-Reply-To: <20161116171145.GE4864@localhost>

On 11/16/2016 09:11 AM, Johan Hovold wrote:
> On Wed, Nov 16, 2016 at 09:06:26AM -0800, Florian Fainelli wrote:
>> On 11/16/2016 06:47 AM, Johan Hovold wrote:
>>> Make sure to drop the reference taken by of_phy_find_device() when
>>> registering and deregistering the fixed-link PHY-device.
>>>
>>> Note that we need to put both references held at deregistration.
>>>
>>> Fixes: 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port
>>> speeds/duplex")
>>> Signed-off-by: Johan Hovold <johan@kernel.org>
>>> ---
>>>
>>> Hi,
>>>
>>> This is one has been compile tested only, but fixes a couple of leaks
>>> similar to one that was found in the cpsw driver for which I just posted
>>> a patch.
>>>
>>> It turns out all drivers but DSA fail to deregister the fixed-link PHYs
>>> registered by of_phy_register_fixed_link(). Due to the way this
>>> interface was designed, deregistering such a PHY is a bit cumbersome and
>>> looks like it would benefit from a common helper.
>>>
>>> However, perhaps the interface should instead be changed so that the PHY
>>> device is returned so that drivers do not need to use
>>> of_phy_find_device() when they need to access properties of the fixed
>>> link (e.g. as in dsu_cpu_dsa_setup below).
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Johan
>>>
>>>
>>>  net/dsa/dsa.c | 8 +++++++-
>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
>>> index a6902c1e2f28..798a6a776a5f 100644
>>> --- a/net/dsa/dsa.c
>>> +++ b/net/dsa/dsa.c
>>> @@ -233,6 +233,8 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
>>>  		genphy_read_status(phydev);
>>>  		if (ds->ops->adjust_link)
>>>  			ds->ops->adjust_link(ds, port, phydev);
>>> +
>>> +		phy_device_free(phydev);	/* of_phy_find_device */
>>>  	}
>>>  
>>>  	return 0;
>>> @@ -509,8 +511,12 @@ void dsa_cpu_dsa_destroy(struct device_node *port_dn)
>>>  	if (of_phy_is_fixed_link(port_dn)) {
>>>  		phydev = of_phy_find_device(port_dn);
>>>  		if (phydev) {
>>> -			phy_device_free(phydev);
>>>  			fixed_phy_unregister(phydev);
>>> +			/* Put references taken by of_phy_find_device() and
>>> +			 * of_phy_register_fixed_link().
>>> +			 */
>>> +			phy_device_free(phydev);
>>> +			phy_device_free(phydev);
>>
>> Double free, this looks bogus here. Actually would not this mean a
>> triple free since you already free in dsa_cpu_dsa_setup() which is
>> paired with dsa_cpu_dsa_destroy()?
> 
> The naming of phy_device_free() is unfortunate when it's really a put():
> 
> void phy_device_free(struct phy_device *phydev)
> {
> 	put_device(&phydev->mdio.dev);
> }

Indeed, should have looked a little harder.

> 
> which may need to be called multiple times, specifically after a call to
> of_phy_find_device() which takes another reference.
> 
> With this patch the refcounts are properly balanced.

The intent of your patch is good, but it still feels like having to
double imbalance the refcount is symptomatic of a larger issue here, it
does not seem like having several refcounts are necessary, so we may
really want to rework the API.
-- 
Florian

^ permalink raw reply

* Re: linux-next: net->netns_ids is used after calling idr_destroy for it
From: Cong Wang @ 2016-11-16 18:04 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: Nicolas Dichtel, Linux Kernel Network Developers
In-Reply-To: <CANaxB-ypW4cFxNTAaS-n6ehNHxGyxe9x0G8SJap6PBp8UJX9gA@mail.gmail.com>

On Tue, Nov 15, 2016 at 2:45 PM, Andrei Vagin <avagin@gmail.com> wrote:
> On Tue, Nov 15, 2016 at 2:21 PM, Andrei Vagin <avagin@gmail.com> wrote:
>> I would like to agree with you here, but looks like sockets with
>> NETLINK_F_LISTEN_ALL_NSID are able to catch these messages.
>
> Actually I found that I was wrong.
>
> do_one_broadcast() sends a notification only if a device network
> namespace has an id in a socket netns. But cleanup_net() removes all
> id-s to a target namespace, so just ignore my last comment.

The point is all sockets in that netns are already gone at that point
because of refcount.

cleanup_net() also destroys net->netns_ids, so it should not be even
accessed after that point.

^ permalink raw reply

* Re: [PATCH net-next 0/5] net: busy-poll: allow preemption and other optimizations
From: John Fastabend @ 2016-11-16 18:03 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Willem de Bruijn, Adam Belay, Zach Brown, Tariq Toukan,
	Yuval Mintz, Ariel Elior, Eric Dumazet
In-Reply-To: <1479233715-9905-1-git-send-email-edumazet@google.com>

On 16-11-15 10:15 AM, Eric Dumazet wrote:
> It is time to have preemption points in sk_busy_loop() and improve
> its scalability.
> 
> Also napi_complete() and friends can tell drivers when it is safe to
> not re-enable device interrupts, saving some overhead under
> high busy polling.
> 
> mlx4 and bnx2x are changed accordingly, to show how this busy polling
> status can be exploited by drivers.
> 
> Next steps will implement Zach Brown suggestion, where NAPI polling
> would be enabled all the time for some chosen queues.
> This is needed for efficient epoll() support anyway.

Would you expect to make this a per queue option of the hardware
configured via ethtool/netlink/sysfs and like where users steer traffic
to particular queues using existing ntuple filters or 'tc' or infer it
from the socket layer?

So configuration would be (a) enable busy-polling on queues x,y,z and
then (b) use ntuple/RSS/etc to steer relative traffic to queues. In
this case traffic doesn't need to be bound to a socket in any way.
Seems like a useful generalization.

Thanks,
John

> 
> Eric Dumazet (5):
>   net: busy-poll: allow preemption in sk_busy_loop()
>   net: busy-poll: remove need_resched() from sk_can_busy_loop()
>   net: busy-poll: return busypolling status to drivers
>   net/mlx4_en: use napi_complete_done() return value
>   bnx2x: switch to napi_complete_done()
> 
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |  15 ++--
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c      |   4 +-
>  include/linux/netdevice.h                       |  17 +++-
>  include/net/busy_poll.h                         |   5 +-
>  net/core/dev.c                                  | 110 +++++++++++++++++++-----
>  5 files changed, 113 insertions(+), 38 deletions(-)
> 

^ permalink raw reply

* Re: Netperf UDP issue with connected sockets
From: Rick Jones @ 2016-11-16 17:46 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev@vger.kernel.org, Eric Dumazet
In-Reply-To: <20161116131609.4e5726b4@redhat.com>

On 11/16/2016 04:16 AM, Jesper Dangaard Brouer wrote:
> [1] Subj: High perf top ip_idents_reserve doing netperf UDP_STREAM
>  - https://www.spinics.net/lists/netdev/msg294752.html
>
> Not fixed in version 2.7.0.
>  - ftp://ftp.netperf.org/netperf/netperf-2.7.0.tar.gz
>
> Used extra netperf configure compile options:
>  ./configure  --enable-histogram --enable-demo
>
> It seems like some fix attempts exists in the SVN repository::
>
>  svn checkout http://www.netperf.org/svn/netperf2/trunk/ netperf2-svn
>  svn log -r709
>  # A quick stab at getting remote connect going for UDP_STREAM
>  svn diff -r708:709
>
> Testing with SVN version, still show __ip_select_ident() in top#1.

Indeed, there was a fix for getting the remote side connect()ed. 
Looking at what I have for the top of trunk I do though see a connect() 
call being made at the local end:

socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
getsockopt(4, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
getsockopt(4, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(4, {sa_family=AF_INET, sin_port=htons(0), 
sin_addr=inet_addr("0.0.0.0")}, 16) = 0
setsockopt(4, SOL_SOCKET, SO_DONTROUTE, [1], 4) = 0
setsockopt(4, SOL_IP, IP_RECVERR, [1], 4) = 0
brk(0xe53000)                           = 0xe53000
getsockname(4, {sa_family=AF_INET, sin_port=htons(59758), 
sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
sendto(3, 
"\0\0\0a\377\377\377\377\377\377\377\377\377\377\377\377\0\0\0\10\0\0\0\0\0\0\0\321\377\377\377\377"..., 
656, 0, NULL, 0) = 656
select(1024, [3], NULL, NULL, {120, 0}) = 1 (in [3], left {119, 995630})
recvfrom(3, 
"\0\0\0b\0\0\0\0\0\3@\0\0\3@\0\0\0\0\2\0\3@\0\377\377\377\377\0\0\0\321"..., 
656, 0, NULL, NULL) = 656
write(1, "need to connect is 1\n", 21)  = 21
rt_sigaction(SIGALRM, {0x402ea6, [ALRM], SA_RESTORER|SA_INTERRUPT, 
0x7f2824eb2cb0}, NULL, 8) = 0
rt_sigaction(SIGINT, {0x402ea6, [INT], SA_RESTORER|SA_INTERRUPT, 
0x7f2824eb2cb0}, NULL, 8) = 0
alarm(1)                                = 0
connect(4, {sa_family=AF_INET, sin_port=htons(34832), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 1024, 0, NULL, 0) = 
1024
sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 1024, 0, NULL, 0) = 
1024
sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 1024, 0, NULL, 0) = 
1024

the only difference there with top of trunk is that "need to connect" 
write/printf I just put in the code to be a nice marker in the system 
call trace.

It is a wild guess, but does setting SO_DONTROUTE affect whether or not 
a connect() would have the desired effect?  That is there to protect 
people from themselves (long story about people using UDP_STREAM to 
stress improperly air-gapped systems during link up/down testing....) 
It can be disabled with a test-specific -R 1 option, so your netperf 
command would become:

netperf -H 198.18.50.1 -t UDP_STREAM -l 120 -- -m 1472 -n -N -R 1

>
> (p.s. is netperf ever going to be converted from SVN to git?)
>

Well....  my git-fu could use some work (gentle, offlinetaps with a 
clueful tutorial bat would be welcome), and at least in the past, going 
to git was held back because there were a bunch of netperf users on 
Windows and there wasn't (at the time) support for git under Windows.

But I am not against the idea in principle.

happy benchmarking,

rick jones

PS - rick.jones@hp.com no longer works.  rick.jones2@hpe.com should be 
used instead.

^ permalink raw reply

* [PATCHv2 perf/core 2/2] tools lib bpf: Sync with samples/bpf/libbpf
From: Joe Stringer @ 2016-11-16 17:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, wangnan0, ast, daniel, acme
In-Reply-To: <20161116174324.29675-1-joe@ovn.org>

Extend the tools/ version of libbpf to include all of the functionality
provided in the samples/bpf version.

Signed-off-by: Joe Stringer <joe@ovn.org>
---
v2: Don't shift non-bpf changes across.
    Various type cleanups, removal of extraneous declarations
---
 tools/lib/bpf/bpf.c    | 107 ++++++++++++++++++++------
 tools/lib/bpf/bpf.h    | 202 +++++++++++++++++++++++++++++++++++++++++++++++--
 tools/lib/bpf/libbpf.c |   3 +-
 3 files changed, 279 insertions(+), 33 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 4212ed62235b..5e061851ac00 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -20,10 +20,17 @@
  */
 
 #include <stdlib.h>
-#include <memory.h>
+#include <stdio.h>
 #include <unistd.h>
 #include <asm/unistd.h>
+#include <string.h>
+#include <linux/netlink.h>
 #include <linux/bpf.h>
+#include <errno.h>
+#include <net/ethernet.h>
+#include <net/if.h>
+#include <linux/if_packet.h>
+#include <arpa/inet.h>
 #include "bpf.h"
 
 /*
@@ -53,24 +60,71 @@ static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
 	return syscall(__NR_bpf, cmd, attr, size);
 }
 
-int bpf_create_map(enum bpf_map_type map_type, int key_size,
-		   int value_size, int max_entries)
+int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
+		   int max_entries, int map_flags)
 {
-	union bpf_attr attr;
+	union bpf_attr attr = {
+		.map_type = map_type,
+		.key_size = key_size,
+		.value_size = value_size,
+		.max_entries = max_entries,
+		.map_flags = map_flags,
+	};
 
-	memset(&attr, '\0', sizeof(attr));
+	return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+}
 
-	attr.map_type = map_type;
-	attr.key_size = key_size;
-	attr.value_size = value_size;
-	attr.max_entries = max_entries;
+int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags)
+{
+	union bpf_attr attr = {
+		.map_fd = fd,
+		.key = ptr_to_u64(key),
+		.value = ptr_to_u64(value),
+		.flags = flags,
+	};
 
-	return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+	return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
+}
+
+int bpf_lookup_elem(int fd, void *key, void *value)
+{
+	union bpf_attr attr = {
+		.map_fd = fd,
+		.key = ptr_to_u64(key),
+		.value = ptr_to_u64(value),
+	};
+
+	return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
+}
+
+int bpf_delete_elem(int fd, void *key)
+{
+	union bpf_attr attr = {
+		.map_fd = fd,
+		.key = ptr_to_u64(key),
+	};
+
+	return sys_bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
 }
 
-int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
-		     size_t insns_cnt, char *license,
-		     u32 kern_version, char *log_buf, size_t log_buf_sz)
+int bpf_get_next_key(int fd, void *key, void *next_key)
+{
+	union bpf_attr attr = {
+		.map_fd = fd,
+		.key = ptr_to_u64(key),
+		.next_key = ptr_to_u64(next_key),
+	};
+
+	return sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
+}
+
+#define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))
+
+
+int bpf_load_program(enum bpf_prog_type type,
+		     const struct bpf_insn *insns, int insns_cnt,
+		     const char *license, int kern_version,
+		     char *log_buf, size_t log_buf_sz)
 {
 	int fd;
 	union bpf_attr attr;
@@ -78,8 +132,8 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
 	bzero(&attr, sizeof(attr));
 	attr.prog_type = type;
 	attr.insn_cnt = (__u32)insns_cnt;
-	attr.insns = ptr_to_u64(insns);
-	attr.license = ptr_to_u64(license);
+	attr.insns = ptr_to_u64((void *)insns);
+	attr.license = ptr_to_u64((void *)license);
 	attr.log_buf = ptr_to_u64(NULL);
 	attr.log_size = 0;
 	attr.log_level = 0;
@@ -97,16 +151,21 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
 	return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
 }
 
-int bpf_map_update_elem(int fd, void *key, void *value,
-			u64 flags)
+int bpf_obj_pin(int fd, const char *pathname)
 {
-	union bpf_attr attr;
+	union bpf_attr attr = {
+		.pathname	= ptr_to_u64((void *)pathname),
+		.bpf_fd		= fd,
+	};
 
-	bzero(&attr, sizeof(attr));
-	attr.map_fd = fd;
-	attr.key = ptr_to_u64(key);
-	attr.value = ptr_to_u64(value);
-	attr.flags = flags;
+	return sys_bpf(BPF_OBJ_PIN, &attr, sizeof(attr));
+}
 
-	return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
+int bpf_obj_get(const char *pathname)
+{
+	union bpf_attr attr = {
+		.pathname	= ptr_to_u64((void *)pathname),
+	};
+
+	return sys_bpf(BPF_OBJ_GET, &attr, sizeof(attr));
 }
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index e8ba54087497..4dba36995771 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -23,16 +23,202 @@
 
 #include <linux/bpf.h>
 
+struct bpf_insn;
+
 int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
-		   int max_entries);
+		   int max_entries, int map_flags);
+int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags);
+int bpf_lookup_elem(int fd, void *key, void *value);
+int bpf_delete_elem(int fd, void *key);
+int bpf_get_next_key(int fd, void *key, void *next_key);
+
+int bpf_load_program(enum bpf_prog_type prog_type,
+		     const struct bpf_insn *insns, int insn_len,
+		     const char *license, int kern_version,
+		     char *log_buf, size_t log_buf_sz);
+
+int bpf_obj_pin(int fd, const char *pathname);
+int bpf_obj_get(const char *pathname);
 
-/* Recommend log buffer size */
 #define BPF_LOG_BUF_SIZE 65536
-int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
-		     size_t insns_cnt, char *license,
-		     u32 kern_version, char *log_buf,
-		     size_t log_buf_sz);
 
-int bpf_map_update_elem(int fd, void *key, void *value,
-			u64 flags);
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define BPF_ALU64_REG(OP, DST, SRC)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU64_IMM(OP, DST, IMM)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_OP(OP) | BPF_K,	\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV64_IMM(DST, IMM)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
+#define BPF_LD_IMM64(DST, IMM)					\
+	BPF_LD_IMM64_RAW(DST, 0, IMM)
+
+#define BPF_LD_IMM64_RAW(DST, SRC, IMM)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_LD | BPF_DW | BPF_IMM,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = (__u32) (IMM) }),			\
+	((struct bpf_insn) {					\
+		.code  = 0, /* zero is reserved opcode */	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = ((__u64) (IMM)) >> 32 })
+
+#ifndef BPF_PSEUDO_MAP_FD
+# define BPF_PSEUDO_MAP_FD	1
+#endif
+
+/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
+#define BPF_LD_MAP_FD(DST, MAP_FD)				\
+	BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
+
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct bpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct bpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
+
+#define BPF_ST_MEM(SIZE, DST, OFF, IMM)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
+
+#define BPF_JMP_REG(OP, DST, SRC, OFF)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct bpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct bpf_insn) {					\
+		.code  = BPF_JMP | BPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
 #endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b699aea9a025..951e11023554 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -796,7 +796,8 @@ bpf_object__create_maps(struct bpf_object *obj)
 		*pfd = bpf_create_map(def->type,
 				      def->key_size,
 				      def->value_size,
-				      def->max_entries);
+				      def->max_entries,
+				      0);
 		if (*pfd < 0) {
 			size_t j;
 			int err = *pfd;
-- 
2.9.3

^ permalink raw reply related

* [PATCHv2 perf/core 1/2] tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
From: Joe Stringer @ 2016-11-16 17:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, wangnan0, ast, daniel, acme
In-Reply-To: <20161116174324.29675-1-joe@ovn.org>

The tools version of this header is out of date; update it to the latest
version from the kernel headers.

Signed-off-by: Joe Stringer <joe@ovn.org>
---
v2: No change.
---
 tools/include/uapi/linux/bpf.h | 51 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9e5fc168c8a3..f09c70b97eca 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -95,6 +95,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SCHED_ACT,
 	BPF_PROG_TYPE_TRACEPOINT,
 	BPF_PROG_TYPE_XDP,
+	BPF_PROG_TYPE_PERF_EVENT,
 };
 
 #define BPF_PSEUDO_MAP_FD	1
@@ -375,6 +376,56 @@ enum bpf_func_id {
 	 */
 	BPF_FUNC_probe_write_user,
 
+	/**
+	 * bpf_current_task_under_cgroup(map, index) - Check cgroup2 membership of current task
+	 * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
+	 * @index: index of the cgroup in the bpf_map
+	 * Return:
+	 *   == 0 current failed the cgroup2 descendant test
+	 *   == 1 current succeeded the cgroup2 descendant test
+	 *    < 0 error
+	 */
+	BPF_FUNC_current_task_under_cgroup,
+
+	/**
+	 * bpf_skb_change_tail(skb, len, flags)
+	 * The helper will resize the skb to the given new size,
+	 * to be used f.e. with control messages.
+	 * @skb: pointer to skb
+	 * @len: new skb length
+	 * @flags: reserved
+	 * Return: 0 on success or negative error
+	 */
+	BPF_FUNC_skb_change_tail,
+
+	/**
+	 * bpf_skb_pull_data(skb, len)
+	 * The helper will pull in non-linear data in case the
+	 * skb is non-linear and not all of len are part of the
+	 * linear section. Only needed for read/write with direct
+	 * packet access.
+	 * @skb: pointer to skb
+	 * @len: len to make read/writeable
+	 * Return: 0 on success or negative error
+	 */
+	BPF_FUNC_skb_pull_data,
+
+	/**
+	 * bpf_csum_update(skb, csum)
+	 * Adds csum into skb->csum in case of CHECKSUM_COMPLETE.
+	 * @skb: pointer to skb
+	 * @csum: csum to add
+	 * Return: csum on success or negative error
+	 */
+	BPF_FUNC_csum_update,
+
+	/**
+	 * bpf_set_hash_invalid(skb)
+	 * Invalidate current skb>hash.
+	 * @skb: pointer to skb
+	 */
+	BPF_FUNC_set_hash_invalid,
+
 	__BPF_FUNC_MAX_ID,
 };
 
-- 
2.9.3

^ permalink raw reply related

* [PATCHv2 perf/core 0/2] libbpf: Synchronize implementations
From: Joe Stringer @ 2016-11-16 17:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, wangnan0, ast, daniel, acme

Update tools/lib/bpf to provide more functionality and improve interoperation
with other tools that generate and use eBPF code:
* The kernel uapi headers are a bit newer than the version in the tools/
  directory; synchronize those.
* samples/bpf/libbpf* has a bit more functionality than tools/lib/bpf, so
  extend tools/lib/bpf/bpf* with these functions to bring them into parity.

I've got a separate series to update samples/bpf/* to rely on these libraries,
but there's a conflict with davem's tree at the moment so I suppose that the
way forward is to get these patches through first, then take the samples
through net-next at a later time.

---
v2: Don't shift non-bpf code into libbpf.
    Drop the patch to synchronize ELF definitions with tc.
v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
    First post.

Joe Stringer (2):
  tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
  tools lib bpf: Sync with samples/bpf/libbpf

 tools/include/uapi/linux/bpf.h |  51 +++++++++++
 tools/lib/bpf/bpf.c            | 107 +++++++++++++++++-----
 tools/lib/bpf/bpf.h            | 202 +++++++++++++++++++++++++++++++++++++++--
 tools/lib/bpf/libbpf.c         |   3 +-
 4 files changed, 330 insertions(+), 33 deletions(-)

-- 
2.9.3

^ permalink raw reply

* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Hannes Frederic Sowa @ 2016-11-16 17:35 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, nogahf, arkadis,
	ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
	f.fainelli, alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <20161116151829.h6kapbjes7xxxcyi@splinter.mtl.com>

On 16.11.2016 16:18, Ido Schimmel wrote:
> Hi Hannes,
> 
> On Wed, Nov 16, 2016 at 03:51:01PM +0100, Hannes Frederic Sowa wrote:
>> On 16.11.2016 15:09, Jiri Pirko wrote:
>>> From: Ido Schimmel <idosch@mellanox.com>
>>>
>>> Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
>>> introduced a new notification chain to notify listeners (f.e., switchdev
>>> drivers) about addition and deletion of routes.
>>>
>>> However, upon registration to the chain the FIB tables can already be
>>> populated, which means potential listeners will have an incomplete view
>>> of the tables.
>>>
>>> Solve that by adding an API to request a FIB dump. The dump itself it
>>> done using RCU in order not to starve consumers that need RTNL to make
>>> progress.
>>>
>>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>
>> Have you looked at potential inconsistencies resulting of RCU walking
>> the table and having concurrent inserts?
> 
> Yes. I did try to think about situations in which this approach will
> fail, but I could only find problems with concurrent removals, which I
> addressed in 5/8. In case of concurrent insertions, even if you missed
> the node, you would still get the ENTRY_ADD event to your listener.

Theoretically a node could still be installed while the deletion event
fired before registering the notifier. E.g. a synchronize_net before
dumping could help here?

I don't know how you prepare the data structures for inserting in into
the hardware, but if ordering matters, the notifier for a delete event
can be called before the dump installed the fib entry? E.g. you also
don't check what events are already queued in the delayed work queue so
far. I hope I understand the patches correctly?

Thanks,
Hannes

^ permalink raw reply

* Re: [PATCH ipsec-next 6/8] xfrm: policy: only use rcu in xfrm_sk_policy_lookup
From: Florian Westphal @ 2016-11-16 17:30 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: Florian Westphal, netdev, Steffen Klassert
In-Reply-To: <0e235471-cb63-2376-c42d-5872e78e4c98@6wind.com>

Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> Le 11/08/2016 à 15:17, Florian Westphal a écrit :
> > Don't acquire the readlock anymore and rely on rcu alone.
> > 
> > In case writer on other CPU changed policy at the wrong moment (after we
> > obtained sk policy pointer but before we could obtain the reference)
> > just repeat the lookup.
> > 
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> Since this patch, our IKEv1 Transport tests (using charon) fail to establish the
> connection. If I revert it, the IKE negociation is ok again.
> charon logs are enclosed.
> 
> I didn't had time to investigate now, but any idea is welcomed ;-)

I'm an idiot.  Thanks for figuring out the faulty commit!

This should fix it (if we succeed grabbing the refcount, then if (err && !xfrm_pol_hold_rcu
evaluates to false and we hit last else branch and set pol to ERR_PTR(0)...

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index fd6986634e6f..5bf7e1bfeac7 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1268,12 +1268,14 @@ static struct xfrm_policy *xfrm_sk_policy_lookup(const struct sock *sk, int dir,
                        err = security_xfrm_policy_lookup(pol->security,
                                                      fl->flowi_secid,
                                                      policy_to_flow_dir(dir));
-                       if (!err && !xfrm_pol_hold_rcu(pol))
-                               goto again;
-                       else if (err == -ESRCH)
+                       if (!err) {
+                               if (!xfrm_pol_hold_rcu(pol))
+                                       goto again;
+                       } else if (err == -ESRCH) {
                                pol = NULL;
-                       else
+                       } else {
                                pol = ERR_PTR(err);
+                       }
                } else

^ permalink raw reply related

* Re: [PATCH net v2] net: nsid cannot be allocated for a dead netns
From: Cong Wang @ 2016-11-16 17:29 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: Andrey Wagin, David Miller, Linux Kernel Network Developers
In-Reply-To: <1479289775-1715-1-git-send-email-nicolas.dichtel@6wind.com>

On Wed, Nov 16, 2016 at 1:49 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index f61c0e02a413..63f65387f4e1 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -159,6 +159,9 @@ static int alloc_netid(struct net *net, struct net *peer, int reqid)
>                 max = reqid + 1;
>         }
>
> +       if (!atomic_read(&net->count) || !atomic_read(&peer->count))
> +               return -EINVAL;
> +
>         return idr_alloc(&net->netns_ids, peer, min, max, GFP_ATOMIC);
>  }


There is already a check in peernet2id_alloc(), so why not just the following?

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f61c0e0..7001da9 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -219,6 +219,8 @@ int peernet2id_alloc(struct net *net, struct net *peer)
        bool alloc;
        int id;

+       if (atomic_read(&net->count) == 0)
+               return NETNSA_NSID_NOT_ASSIGNED;
        spin_lock_irqsave(&net->nsid_lock, flags);
        alloc = atomic_read(&peer->count) == 0 ? false : true;
        id = __peernet2id_alloc(net, peer, &alloc);

^ permalink raw reply related

* Re: [PATCH] net: dsa: fix fixed-link-phy device leaks
From: Johan Hovold @ 2016-11-16 17:11 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Johan Hovold, Andrew Lunn, Vivien Didelot, David S. Miller,
	netdev, linux-kernel
In-Reply-To: <42af57ed-df84-f628-659d-c982d30da7d5@gmail.com>

On Wed, Nov 16, 2016 at 09:06:26AM -0800, Florian Fainelli wrote:
> On 11/16/2016 06:47 AM, Johan Hovold wrote:
> > Make sure to drop the reference taken by of_phy_find_device() when
> > registering and deregistering the fixed-link PHY-device.
> > 
> > Note that we need to put both references held at deregistration.
> > 
> > Fixes: 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port
> > speeds/duplex")
> > Signed-off-by: Johan Hovold <johan@kernel.org>
> > ---
> > 
> > Hi,
> > 
> > This is one has been compile tested only, but fixes a couple of leaks
> > similar to one that was found in the cpsw driver for which I just posted
> > a patch.
> > 
> > It turns out all drivers but DSA fail to deregister the fixed-link PHYs
> > registered by of_phy_register_fixed_link(). Due to the way this
> > interface was designed, deregistering such a PHY is a bit cumbersome and
> > looks like it would benefit from a common helper.
> > 
> > However, perhaps the interface should instead be changed so that the PHY
> > device is returned so that drivers do not need to use
> > of_phy_find_device() when they need to access properties of the fixed
> > link (e.g. as in dsu_cpu_dsa_setup below).
> > 
> > Thoughts?
> > 
> > Thanks,
> > Johan
> > 
> > 
> >  net/dsa/dsa.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> > index a6902c1e2f28..798a6a776a5f 100644
> > --- a/net/dsa/dsa.c
> > +++ b/net/dsa/dsa.c
> > @@ -233,6 +233,8 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
> >  		genphy_read_status(phydev);
> >  		if (ds->ops->adjust_link)
> >  			ds->ops->adjust_link(ds, port, phydev);
> > +
> > +		phy_device_free(phydev);	/* of_phy_find_device */
> >  	}
> >  
> >  	return 0;
> > @@ -509,8 +511,12 @@ void dsa_cpu_dsa_destroy(struct device_node *port_dn)
> >  	if (of_phy_is_fixed_link(port_dn)) {
> >  		phydev = of_phy_find_device(port_dn);
> >  		if (phydev) {
> > -			phy_device_free(phydev);
> >  			fixed_phy_unregister(phydev);
> > +			/* Put references taken by of_phy_find_device() and
> > +			 * of_phy_register_fixed_link().
> > +			 */
> > +			phy_device_free(phydev);
> > +			phy_device_free(phydev);
> 
> Double free, this looks bogus here. Actually would not this mean a
> triple free since you already free in dsa_cpu_dsa_setup() which is
> paired with dsa_cpu_dsa_destroy()?

The naming of phy_device_free() is unfortunate when it's really a put():

void phy_device_free(struct phy_device *phydev)
{
	put_device(&phydev->mdio.dev);
}

which may need to be called multiple times, specifically after a call to
of_phy_find_device() which takes another reference.

With this patch the refcounts are properly balanced.

Johan

^ permalink raw reply

* [PATCH net-next] udp: enable busy polling for all sockets
From: Eric Dumazet @ 2016-11-16 17:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Willem de Bruijn

From: Eric Dumazet <edumazet@google.com>

UDP busy polling is restricted to connected UDP sockets.

This is because sk_busy_loop() only takes care of one NAPI context.

There are cases where it could be extended.

1) Some hosts receive traffic on a single NIC, with one RX queue.

2) Some applications use SO_REUSEPORT and associated BPF filter
   to split the incoming traffic on one UDP socket per RX
queue/thread/cpu

3) Some UDP sockets are used to send/receive traffic for one flow, but
they do not bother with connect()


This patch records the napi_id of first received skb, giving more
reach to busy polling.

Tested:

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read

lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done

Before patch :
   27867   28870   37324   41060   41215
   36764   36838   44455   41282   43843
After patch :
   73920   73213   70147   74845   71697
   68315   68028   75219   70082   73707

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
---
 include/net/busy_poll.h |   28 +++++++++++++++++++---------
 net/ipv4/udp.c          |    2 ++
 net/ipv6/udp.c          |    2 ++
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 2fbeb1313c0f4f78ac82ddf6c18d1016a901f99a..34c57a0ec040a58f09b12e3878f534433dc30015 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -81,11 +81,6 @@ static inline void skb_mark_napi_id(struct sk_buff *skb,
 	skb->napi_id = napi->napi_id;
 }
 
-/* used in the protocol hanlder to propagate the napi_id to the socket */
-static inline void sk_mark_napi_id(struct sock *sk, struct sk_buff *skb)
-{
-	sk->sk_napi_id = skb->napi_id;
-}
 
 #else /* CONFIG_NET_RX_BUSY_POLL */
 static inline unsigned long net_busy_loop_on(void)
@@ -108,10 +103,6 @@ static inline void skb_mark_napi_id(struct sk_buff *skb,
 {
 }
 
-static inline void sk_mark_napi_id(struct sock *sk, struct sk_buff *skb)
-{
-}
-
 static inline bool busy_loop_timeout(unsigned long end_time)
 {
 	return true;
@@ -123,4 +114,23 @@ static inline bool sk_busy_loop(struct sock *sk, int nonblock)
 }
 
 #endif /* CONFIG_NET_RX_BUSY_POLL */
+
+/* used in the protocol hanlder to propagate the napi_id to the socket */
+static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	sk->sk_napi_id = skb->napi_id;
+#endif
+}
+
+/* variant used for unconnected sockets */
+static inline void sk_mark_napi_id_once(struct sock *sk,
+					const struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	if (!sk->sk_napi_id)
+		sk->sk_napi_id = skb->napi_id;
+#endif
+}
+
 #endif /* _LINUX_NET_BUSY_POLL_H */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9ae7c63a8b131f56a3c05e00a8f7d106a0362c54..e1fc0116e8d59d8185670c6e55d1219bde55610d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1569,6 +1569,8 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		sock_rps_save_rxhash(sk, skb);
 		sk_mark_napi_id(sk, skb);
 		sk_incoming_cpu_update(sk);
+	} else {
+		sk_mark_napi_id_once(sk, skb);
 	}
 
 	rc = __udp_enqueue_schedule_skb(sk, skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 86a8cacd333b64f40acc7350a45213cb422a9c1a..4f99417d9b401f2a65c7828e7d6b86d1d6161794 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -519,6 +519,8 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		sock_rps_save_rxhash(sk, skb);
 		sk_mark_napi_id(sk, skb);
 		sk_incoming_cpu_update(sk);
+	} else {
+		sk_mark_napi_id_once(sk, skb);
 	}
 
 	rc = __udp_enqueue_schedule_skb(sk, skb);

^ permalink raw reply related

* Re: [PATCH v2] net/phy/vitesse: Configure RGMII skew on VSC8601, if needed
From: Alex @ 2016-11-16 17:10 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: f.fainelli, netdev, linux-kernel, davem, Gokhan Cosgul
In-Reply-To: <20161116165437.GK23231@lunn.ch>



On 11/16/2016 08:54 AM, Andrew Lunn wrote:
> On Wed, Nov 16, 2016 at 08:44:30AM -0800, Alex wrote:
>>
>>
>> On 11/16/2016 05:50 AM, Andrew Lunn wrote:
>>> On Wed, Nov 16, 2016 at 01:02:33AM -0800, Alexandru Gagniuc wrote:
>>>> With RGMII, we need a 1.5 to 2ns skew between clock and data lines. The
>>>> VSC8601 can handle this internally. While the VSC8601 can set more
>>>> fine-grained delays, the standard skew settings work out of the box.
>>>> The same heuristic is used to determine when this skew should be enabled
>>>> as in vsc824x_config_init().
>>>>
>>>> +/* This adds a skew for both TX and RX clocks, so the skew should only be
>>>> + * applied to "rgmii-id" interfaces. It may not work as expected
>>>> + * on "rgmii-txid", "rgmii-rxid" or "rgmii" interfaces. */
>>>
>>> Hi Alexandru
>>>
>>> You should be able to make "rgmii" work as expected. If that is the
>>> phy mode, disable the skew.
>>
>> And that's exactly the implemented behavior. See
>> vsc8601_config_init() below.
>
> I don't think so. vsc8601_config_init() will not cause the skew to be
> cleared if the phy-mode is "rgmii" and something else like the
> bootloader could of set the skew. So saying that "rgmii" might not
> work as expected is true. But with a minor change, you can make it
> work as expected.

That's not within the scope of this change. The scope is to make 
rgmii-id work. Any additional changes would be untested.

Alex

^ permalink raw reply

* Re: [PATCH] net: dsa: fix fixed-link-phy device leaks
From: Florian Fainelli @ 2016-11-16 17:06 UTC (permalink / raw)
  To: Johan Hovold, Andrew Lunn
  Cc: Vivien Didelot, David S. Miller, netdev, linux-kernel
In-Reply-To: <1479307664-32428-1-git-send-email-johan@kernel.org>

On 11/16/2016 06:47 AM, Johan Hovold wrote:
> Make sure to drop the reference taken by of_phy_find_device() when
> registering and deregistering the fixed-link PHY-device.
> 
> Note that we need to put both references held at deregistration.
> 
> Fixes: 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port
> speeds/duplex")
> Signed-off-by: Johan Hovold <johan@kernel.org>
> ---
> 
> Hi,
> 
> This is one has been compile tested only, but fixes a couple of leaks
> similar to one that was found in the cpsw driver for which I just posted
> a patch.
> 
> It turns out all drivers but DSA fail to deregister the fixed-link PHYs
> registered by of_phy_register_fixed_link(). Due to the way this
> interface was designed, deregistering such a PHY is a bit cumbersome and
> looks like it would benefit from a common helper.
> 
> However, perhaps the interface should instead be changed so that the PHY
> device is returned so that drivers do not need to use
> of_phy_find_device() when they need to access properties of the fixed
> link (e.g. as in dsu_cpu_dsa_setup below).
> 
> Thoughts?
> 
> Thanks,
> Johan
> 
> 
>  net/dsa/dsa.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index a6902c1e2f28..798a6a776a5f 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -233,6 +233,8 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
>  		genphy_read_status(phydev);
>  		if (ds->ops->adjust_link)
>  			ds->ops->adjust_link(ds, port, phydev);
> +
> +		phy_device_free(phydev);	/* of_phy_find_device */
>  	}
>  
>  	return 0;
> @@ -509,8 +511,12 @@ void dsa_cpu_dsa_destroy(struct device_node *port_dn)
>  	if (of_phy_is_fixed_link(port_dn)) {
>  		phydev = of_phy_find_device(port_dn);
>  		if (phydev) {
> -			phy_device_free(phydev);
>  			fixed_phy_unregister(phydev);
> +			/* Put references taken by of_phy_find_device() and
> +			 * of_phy_register_fixed_link().
> +			 */
> +			phy_device_free(phydev);
> +			phy_device_free(phydev);

Double free, this looks bogus here. Actually would not this mean a
triple free since you already free in dsa_cpu_dsa_setup() which is
paired with dsa_cpu_dsa_destroy()?
-- 
Florian

^ permalink raw reply

* Re: [PATCH net 1/3] net: phy: realtek: add eee advertisement disable options
From: Anand Moon @ 2016-11-16 17:06 UTC (permalink / raw)
  To: Jerome Brunet
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree, Florian Fainelli,
	Alexandre TORGUE, Neil Armstrong, Martin Blumenstingl,
	Kevin Hilman, Linux Kernel, Andre Roth,
	linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Carlo Caione,
	Giuseppe Cavallaro, linux-arm-kernel
In-Reply-To: <1479220154-25851-2-git-send-email-jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>

 Hi Jerome.

On 15 November 2016 at 19:59, Jerome Brunet <jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org> wrote:
> On some platforms, energy efficient ethernet with rtl8211 devices is
> causing issue, like throughput drop or broken link.
>
> This was reported on the OdroidC2 (DWMAC + RTL8211F). While the issue root
> cause is not fully understood yet, disabling EEE advertisement prevent auto
> negotiation from enabling EEE.
>
> This patch provides options to disable 1000T and 100TX EEE advertisement
> individually for the realtek phys supporting this feature.
>
> Reported-by: Martin Blumenstingl <martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
> Cc: Giuseppe Cavallaro <peppe.cavallaro-qxv4g6HH51o@public.gmane.org>
> Cc: Alexandre TORGUE <alexandre.torgue-qxv4g6HH51o@public.gmane.org>
> Signed-off-by: Jerome Brunet <jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Neil Armstrong <narmstrong-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
> Tested-by: Andre Roth <neolynx-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/net/phy/realtek.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 64 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
> index aadd6e9f54ad..77235fd5faaf 100644
> --- a/drivers/net/phy/realtek.c
> +++ b/drivers/net/phy/realtek.c
> @@ -15,6 +15,12 @@
>   */
>  #include <linux/phy.h>
>  #include <linux/module.h>
> +#include <linux/of.h>
> +
> +struct rtl8211x_phy_priv {
> +       bool eee_1000t_disable;
> +       bool eee_100tx_disable;
> +};
>
>  #define RTL821x_PHYSR          0x11
>  #define RTL821x_PHYSR_DUPLEX   0x2000
> @@ -93,12 +99,44 @@ static int rtl8211f_config_intr(struct phy_device *phydev)
>         return err;
>  }
>
> +static void rtl8211x_clear_eee_adv(struct phy_device *phydev)
> +{
> +       struct rtl8211x_phy_priv *priv = phydev->priv;
> +       u16 val;
> +
> +       if (priv->eee_1000t_disable || priv->eee_100tx_disable) {
> +               val = phy_read_mmd_indirect(phydev, MDIO_AN_EEE_ADV,
> +                                           MDIO_MMD_AN);
> +
> +               if (priv->eee_1000t_disable)
> +                       val &= ~MDIO_AN_EEE_ADV_1000T;
> +               if (priv->eee_100tx_disable)
> +                       val &= ~MDIO_AN_EEE_ADV_100TX;
> +
> +               phy_write_mmd_indirect(phydev, MDIO_AN_EEE_ADV,
> +                                      MDIO_MMD_AN, val);
> +       }
> +}
> +
> +static int rtl8211x_config_init(struct phy_device *phydev)
> +{
> +       int ret;
> +
> +       ret = genphy_config_init(phydev);
> +       if (ret < 0)
> +               return ret;
> +
> +       rtl8211x_clear_eee_adv(phydev);
> +
> +       return 0;
> +}
> +
>  static int rtl8211f_config_init(struct phy_device *phydev)
>  {
>         int ret;
>         u16 reg;
>
> -       ret = genphy_config_init(phydev);
> +       ret = rtl8211x_config_init(phydev);
>         if (ret < 0)
>                 return ret;
>
> @@ -115,6 +153,26 @@ static int rtl8211f_config_init(struct phy_device *phydev)
>         return 0;
>  }
>
> +static int rtl8211x_phy_probe(struct phy_device *phydev)
> +{
> +       struct device *dev = &phydev->mdio.dev;
> +       struct device_node *of_node = dev->of_node;
> +       struct rtl8211x_phy_priv *priv;
> +
> +       priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> +       if (!priv)
> +               return -ENOMEM;
> +
> +       priv->eee_1000t_disable =
> +               of_property_read_bool(of_node, "realtek,disable-eee-1000t");
> +       priv->eee_100tx_disable =
> +               of_property_read_bool(of_node, "realtek,disable-eee-100tx");
> +
> +       phydev->priv = priv;
> +
> +       return 0;
> +}
> +
>  static struct phy_driver realtek_drvs[] = {
>         {
>                 .phy_id         = 0x00008201,
> @@ -140,7 +198,9 @@ static struct phy_driver realtek_drvs[] = {
>                 .phy_id_mask    = 0x001fffff,
>                 .features       = PHY_GBIT_FEATURES,
>                 .flags          = PHY_HAS_INTERRUPT,
> +               .probe          = &rtl8211x_phy_probe,
>                 .config_aneg    = genphy_config_aneg,
> +               .config_init    = &rtl8211x_config_init,
>                 .read_status    = genphy_read_status,
>                 .ack_interrupt  = rtl821x_ack_interrupt,
>                 .config_intr    = rtl8211e_config_intr,
> @@ -152,7 +212,9 @@ static struct phy_driver realtek_drvs[] = {
>                 .phy_id_mask    = 0x001fffff,
>                 .features       = PHY_GBIT_FEATURES,
>                 .flags          = PHY_HAS_INTERRUPT,
> +               .probe          = &rtl8211x_phy_probe,
>                 .config_aneg    = &genphy_config_aneg,
> +               .config_init    = &rtl8211x_config_init,
>                 .read_status    = &genphy_read_status,
>                 .ack_interrupt  = &rtl821x_ack_interrupt,
>                 .config_intr    = &rtl8211e_config_intr,
> @@ -164,6 +226,7 @@ static struct phy_driver realtek_drvs[] = {
>                 .phy_id_mask    = 0x001fffff,
>                 .features       = PHY_GBIT_FEATURES,
>                 .flags          = PHY_HAS_INTERRUPT,
> +               .probe          = &rtl8211x_phy_probe,
>                 .config_aneg    = &genphy_config_aneg,
>                 .config_init    = &rtl8211f_config_init,
>                 .read_status    = &genphy_read_status,
> --
> 2.7.4
>

How about adding callback functionality for .soft_reset to handle BMCR
where we update the Auto-Negotiation for the phy,
as per the datasheet of the rtl8211f.

-Best Regard
Anand Moon

>
> _______________________________________________
> linux-amlogic mailing list
> linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> http://lists.infradead.org/mailman/listinfo/linux-amlogic
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net 1/3] net: phy: realtek: add eee advertisement disable options
From: Florian Fainelli @ 2016-11-16 17:01 UTC (permalink / raw)
  To: Jerome Brunet, Andrew Lunn
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Carlo Caione, Kevin Hilman, Giuseppe Cavallaro, Alexandre TORGUE,
	Martin Blumenstingl, Andre Roth, Neil Armstrong,
	linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1479310731.17538.53.camel-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>

On 11/16/2016 07:38 AM, Jerome Brunet wrote:
> On Wed, 2016-11-16 at 16:06 +0100, Andrew Lunn wrote:
>> On Wed, Nov 16, 2016 at 03:51:30PM +0100, Jerome Brunet wrote:
>>>
>>> On Wed, 2016-11-16 at 14:23 +0100, Andrew Lunn wrote:
>>>>
>>>>>
>>>>>
>>>>> There two kind of PHYs supporting eee, the one advertising eee
>>>>> by
>>>>> default (like realtek) and the one not advertising it (like
>>>>> micrel).
>>>
>>> This is just the default register value.
>>>
>>>>
>>>>
>>>> I don't know too much about EEE. So maybe a dumb question. Does
>>>> the
>>>> MAC need to be involved? Or is it just the PHY?
>>>>
>>>> If the MAC needs to be involved, the PHY should not be
>>>> advertising
>>>> EEE
>>>> unless the MAC asks for it by calling phy_init_eee(). If this is
>>>> true,
>>>> maybe we need to change the realtek driver, and others in that
>>>> class.
>>>
>>> As far I understand, the advertised capabilities are exchanged
>>> during
>>> the auto-negotiation.
>>>
>>> At this stage, if the advertisement is disabled (regarless of the
>>> actual support) on either side of the link, there will be no low
>>> power
>>> idle state on the Tx nor the Rx path.
>>>
>>> If the advertisement is enabled on both side but we don't call
>>> phy_init_eee, I suppose Tx won't enter LPI, but Rx could.
>>
>> What i was trying to find out is, if the MAC needs to support EEE as
>> well as the PHY, what happens when the MAC does not support EEE, but
>> the PHYs do negotiate EEE? Does it break?
> 
> Interesting question. In a regular case, I suppose it should be fine.
> As you would have LPI only on the Rx path this should be transparent to
> the MAC. That's my understanding. Maybe people knowing EEE better than
> me could confirm (or not) ? Peppe? Alexandre?

EEE is a MAC and PHY feature, and both need to agree on what is enabled,
especially in the transmit path because the way packets may be
transmitted with or without EEE can be done differently at the HW level
(faster/slower return to idle, different clock source).

> 
> I just checked with the OdroidC2, I disabled eee support by forcing
> "dma_cap.eee = 0" in stmmac_get_hw_features. As expected, no tx_LPI
> interrupts but plenty of rx_LPI interrupts.
> 
> What was not expected is test failing like before.
> So in our case, having LPI on the Rx path is fine for receiving data,
> but not for sending.

OK, which really sounds like a potential interoperability problem, or
just the Realtek PHY with EEE enabled acting funky (irrespective of
being attached to stmmac).
-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox