Netdev List


* Re: regression with napi/softirq ?
From: Sudip Mukherjee @ 2019-07-18 12:55 UTC
  To: Eric Dumazet
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), David S. Miller, netdev,
	linux-kernel
In-Reply-To: <8124bbe5-eaa8-2106-2695-4788ec0f6544@gmail.com>

On Thu, Jul 18, 2019 at 12:42 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 7/18/19 1:18 PM, Sudip Mukherjee wrote:
> > Hi Eric,
> >
> > On Thu, Jul 18, 2019 at 7:58 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >>
> >>
> >> On 7/17/19 11:52 PM, Thomas Gleixner wrote:
> >>> Sudip,
> >>>
> >>> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> >>>> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>>>> You can hack ksoftirqd_running() to return always false to avoid this, but
> >>>>> that might cause application starvation and a huge packet buffer backlog
> >>>>> when the amount of incoming packets makes the CPU do nothing else than
> >>>>> softirq processing.
> >>>>
> >>>> I tried that now, it is better but still not as good as v3.8
> >>>> Now I am getting 375.9usec as the maximum time between raising the softirq
> >>>> and it starting to execute and packet drops still there.
> >>>>
> >>>> And just a thought, do you think there should be a CONFIG_ option for
> >>>> this feature of ksoftirqd_running() so that it can be disabled if needed
> >>>> by users like us?
> >>>
> >>> If at all then a sysctl to allow runtime control.
> >>>
> >
> > <snip>
> >
> >>
> >> ksoftirqd might be spuriously scheduled from tx path, when
> >> __qdisc_run() also reacts to need_resched().
> >>
> >> By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
> >> in response to incoming packet), we force __do_softirq() to perform
> >> another loop, but before doing an other round, it will also check need_resched()
> >> and eventually call wakeup_softirqd()
> >>
> >> I wonder if following patch makes any difference.
> >>
> >> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> >> index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
> >> --- a/net/sched/sch_generic.c
> >> +++ b/net/sched/sch_generic.c
> >> @@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
> >>         int packets;
> >>
> >>         while (qdisc_restart(q, &packets)) {
> >> +               if (qdisc_is_empty(q))
> >> +                       break;
> >
> > unfortunately its v4.14.55 and qdisc_is_empty() is not yet introduced.
> > And I can not backport 28cff537ef2e ("net: sched: add empty status
> > flag for NOLOCK qdisc")
> > also as TCQ_F_NOLOCK is there. :(
> >
>
> On old kernels, you can simply use
>
> static inline bool qdisc_is_empty(struct Qdisc *q)
> {
>         return !qdisc_qlen(q);
> }
>

Thanks Eric. But this change does not improve the delay between
softirq_raise and softirq_entry.
Moving to a later kernel (Linus' master branch?), as Thomas suggested
in the other mail, might be difficult at the moment. I can definitely
move to v4.14.133 if that helps. Thomas?


-- 
Regards
Sudip


* Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive
From: Michael S. Tsirkin @ 2019-07-18 13:04 UTC
  To: ? jiang
  Cc: jasowang@redhat.com, davem@davemloft.net, ast@kernel.org,
	daniel@iogearbox.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com, kafai@fb.com,
	songliubraving@fb.com, yhs@fb.com,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, xdp-newbies@vger.kernel.org,
	bpf@vger.kernel.org, jiangran.jr@alibaba-inc.com
In-Reply-To: <BYAPR14MB32056583C4963342F5D817C4A6C80@BYAPR14MB3205.namprd14.prod.outlook.com>

On Thu, Jul 18, 2019 at 12:55:50PM +0000, ? jiang wrote:
> This change makes ring buffer reclaim threshold num_free configurable
> for better performance, while it's hard coded as 1/2 * queue now.
> According to our test with qemu + dpdk, packet dropping happens when
> the guest is not able to provide free buffer in avail ring timely.
> Smaller value of num_free does decrease the number of packet dropping
> during our test as it makes virtio_net reclaim buffer earlier.
> 
> At least, we should leave the value changeable to user while the
> default value as 1/2 * queue is kept.
> 
> Signed-off-by: jiangkidd <jiangkidd@hotmail.com>

That would be one reason, but I suspect it's not the
true one. If you need more buffer due to jitter
then just increase the queue size. Would be cleaner.


However are you sure this is the reason for
packet drops? Do you see them dropped by dpdk
due to lack of space in the ring? As opposed to
by guest?


> ---
>  drivers/net/virtio_net.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0d4115c9e20b..bc190dec6084 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -26,6 +26,9 @@
>  static int napi_weight = NAPI_POLL_WEIGHT;
>  module_param(napi_weight, int, 0444);
>  
> +static int min_numfree;
> +module_param(min_numfree, int, 0444);
> +
>  static bool csum = true, gso = true, napi_tx;
>  module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
> @@ -1315,6 +1318,9 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  	void *buf;
>  	int i;
>  
> +	if (!min_numfree)
> +		min_numfree = virtqueue_get_vring_size(rq->vq) / 2;
> +
>  	if (!vi->big_packets || vi->mergeable_rx_bufs) {
>  		void *ctx;
>  
> @@ -1331,7 +1337,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  		}
>  	}
>  
> -	if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
> +	if (rq->vq->num_free > min_numfree) {
>  		if (!try_fill_recv(vi, rq, GFP_ATOMIC))
>  			schedule_delayed_work(&vi->refill, 0);
>  	}
> -- 
> 2.11.0


* Re: [PATCH] cxgb4: Prefer pcie_capability_read_word()
From: Bjorn Helgaas @ 2019-07-18 13:50 UTC
  To: Frederick Lawler; +Cc: vishal, netdev, linux-kernel, Bjorn Helgaas
In-Reply-To: <20190718020745.8867-1-fred@fredlawl.com>

On Wed, Jul 17, 2019 at 9:08 PM Frederick Lawler <fred@fredlawl.com> wrote:
>
> Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
> added accessors for the PCI Express Capability so that drivers didn't
> need to be aware of differences between v1 and v2 of the PCI
> Express Capability.
>
> Replace pci_read_config_word() and pci_write_config_word() calls with
> pcie_capability_read_word() and pcie_capability_write_word().
>
> Signed-off-by: Frederick Lawler <fred@fredlawl.com>

Nice job on all these patches!  These all help avoid errors and
identify possibilities for refactoring.

If there were a cover letter for the series, I would have replied to
that, but for all of them:

Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>

If you post the series again for any reason, you can add that.
Otherwise, whoever applies them can add my reviewed-by.

> ---
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 6 ++----
>  drivers/net/ethernet/chelsio/cxgb4/t4_hw.c      | 9 +++------
>  2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index 715e4edcf4a2..98ff71434673 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -5441,7 +5441,6 @@ static int cxgb4_iov_configure(struct pci_dev *pdev, int num_vfs)
>                 char name[IFNAMSIZ];
>                 u32 devcap2;
>                 u16 flags;
> -               int pos;
>
>                 /* If we want to instantiate Virtual Functions, then our
>                  * parent bridge's PCI-E needs to support Alternative Routing
> @@ -5449,9 +5448,8 @@ static int cxgb4_iov_configure(struct pci_dev *pdev, int num_vfs)
>                  * and above.
>                  */
>                 pbridge = pdev->bus->self;
> -               pos = pci_find_capability(pbridge, PCI_CAP_ID_EXP);
> -               pci_read_config_word(pbridge, pos + PCI_EXP_FLAGS, &flags);
> -               pci_read_config_dword(pbridge, pos + PCI_EXP_DEVCAP2, &devcap2);
> +               pcie_capability_read_word(pbridge, PCI_EXP_FLAGS, &flags);
> +               pcie_capability_read_dword(pbridge, PCI_EXP_DEVCAP2, &devcap2);
>
>                 if ((flags & PCI_EXP_FLAGS_VERS) < 2 ||
>                     !(devcap2 & PCI_EXP_DEVCAP2_ARI)) {
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> index f9b70be59792..346d7b59c50b 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> @@ -7267,7 +7267,6 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
>         } else {
>                 unsigned int pack_align;
>                 unsigned int ingpad, ingpack;
> -               unsigned int pcie_cap;
>
>                 /* T5 introduced the separation of the Free List Padding and
>                  * Packing Boundaries.  Thus, we can select a smaller Padding
> @@ -7292,8 +7291,7 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
>                  * multiple of the Maximum Payload Size.
>                  */
>                 pack_align = fl_align;
> -               pcie_cap = pci_find_capability(adap->pdev, PCI_CAP_ID_EXP);
> -               if (pcie_cap) {
> +               if (pci_is_pcie(adap->pdev)) {
>                         unsigned int mps, mps_log;
>                         u16 devctl;
>
> @@ -7301,9 +7299,8 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
>                          * [bits 7:5] encodes sizes as powers of 2 starting at
>                          * 128 bytes.
>                          */
> -                       pci_read_config_word(adap->pdev,
> -                                            pcie_cap + PCI_EXP_DEVCTL,
> -                                            &devctl);
> +                       pcie_capability_read_word(adap->pdev, PCI_EXP_DEVCTL,
> +                                                 &devctl);
>                         mps_log = ((devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5) + 7;
>                         mps = 1 << mps_log;
>                         if (mps > pack_align)
> --
> 2.17.1
>
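
The Max Payload Size decoding kept by the t4_hw.c hunk quoted above can be sketched in userspace as follows. This is only an illustration, not driver code: decode_mps() is a hypothetical helper name, and the mask 0x00e0 is assumed to mirror PCI_EXP_DEVCTL_PAYLOAD (bits 7:5 of the Device Control register).

```c
#include <stdint.h>

/* Assumed to mirror PCI_EXP_DEVCTL_PAYLOAD: bits 7:5 of Device Control. */
#define DEVCTL_PAYLOAD_MASK 0x00e0u

/*
 * Hypothetical helper: decode the Max Payload Size in bytes.
 * The 3-bit field encodes powers of 2 starting at 128 bytes (2^7),
 * so field value n maps to 2^(n + 7).
 */
static unsigned int decode_mps(uint16_t devctl)
{
	unsigned int mps_log = ((devctl & DEVCTL_PAYLOAD_MASK) >> 5) + 7;

	return 1u << mps_log;
}
```

With this encoding, a field value of 0 yields 128 bytes and a field value of 1 yields 256 bytes; bits outside 7:5 do not affect the result.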


* Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive
From: Jason Wang @ 2019-07-18 14:01 UTC
  To: Michael S. Tsirkin, ? jiang
  Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, kafai@fb.com, songliubraving@fb.com,
	yhs@fb.com, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	xdp-newbies@vger.kernel.org, bpf@vger.kernel.org,
	jiangran.jr@alibaba-inc.com
In-Reply-To: <20190718085836-mutt-send-email-mst@kernel.org>


On 2019/7/18 9:04 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 12:55:50PM +0000, ? jiang wrote:
>> This change makes ring buffer reclaim threshold num_free configurable
>> for better performance, while it's hard coded as 1/2 * queue now.
>> According to our test with qemu + dpdk, packet dropping happens when
>> the guest is not able to provide free buffer in avail ring timely.
>> Smaller value of num_free does decrease the number of packet dropping
>> during our test as it makes virtio_net reclaim buffer earlier.
>>
>> At least, we should leave the value changeable to user while the
>> default value as 1/2 * queue is kept.
>>
>> Signed-off-by: jiangkidd<jiangkidd@hotmail.com>
> That would be one reason, but I suspect it's not the
> true one. If you need more buffer due to jitter
> then just increase the queue size. Would be cleaner.
>
>
> However are you sure this is the reason for
> packet drops? Do you see them dropped by dpdk
> due to lack of space in the ring? As opposed to
> by guest?
>
>

Besides those, this patch depends on the user to choose a suitable
threshold, which is not good. You need either a good value with
demonstrated numbers or something smarter.

Thanks



* Re: [PATCH] Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
From: David Ahern @ 2019-07-18 14:02 UTC
  To: Peter Kosyh; +Cc: davem, Shrijeet Mukherjee, netdev, linux-kernel
In-Reply-To: <20190718094114.13718-1-p.kosyh@gmail.com>

Your subject line needs a proper Subject: a one-line summary of the
change starting with 'vrf:'. See examples from 'git log drivers/net/vrf.c'.


On 7/18/19 3:41 AM, Peter Kosyh wrote:
> vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
> using ip/ipv6 addresses, but don't make sure the header is available in
> skb->data[] (skb_headlen() is less than header size).
> 
> The situation may occur while forwarding from MPLS layer to vrf, for
> example.

so the use case is a label pop with the nexthop as the VRF device?

> 
> So, this patch adds pskb_may_pull() calls in is_ip_tx_frame(), just before
> call to vrf_process_... functions.
> 
> Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
> ---
>  drivers/net/vrf.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 54edf8956a25..d552f29a58d1 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -292,13 +292,16 @@ static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
>  {
>  	switch (skb->protocol) {
>  	case htons(ETH_P_IP):
> +		if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct iphdr)))
> +			break;

that check goes in vrf_process_v4_outbound.

>  		return vrf_process_v4_outbound(skb, dev);
>  	case htons(ETH_P_IPV6):
> +		if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct ipv6hdr)))
> +			break;

that check goes in vrf_process_v6_outbound

leave this higher level sorter untouched.

>  		return vrf_process_v6_outbound(skb, dev);
> -	default:
> -		vrf_tx_error(dev, skb);
> -		return NET_XMIT_DROP;
>  	}
> +	vrf_tx_error(dev, skb);
> +	return NET_XMIT_DROP;
>  }
>  
>  static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev)
> 



* pull-request: wireless-drivers 2019-07-18
From: Kalle Valo @ 2019-07-18 14:03 UTC
  To: David Miller; +Cc: linux-wireless, netdev, linux-kernel

Hi Dave,

here are the first fixes which have accumulated during the merge window.
This pull request is to the net tree for 5.3. Please let me know if there
are any problems.

Kalle

The following changes since commit 76104862cccaeaa84fdd23e39f2610a96296291c:

  sky2: Disable MSI on P5W DH Deluxe (2019-07-14 13:45:54 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2019-07-18

for you to fetch changes up to 41a531ffa4c5aeb062f892227c00fabb3b4a9c91:

  rt2x00usb: fix rx queue hang (2019-07-15 20:52:18 +0300)

----------------------------------------------------------------
wireless-drivers fixes for 5.3

First set of fixes for 5.3.

iwlwifi

* add new cards for 9000 and 20000 series and qu c-step devices

ath10k

* work around an uninitialised variable warning

rt2x00

* fix rx queue hang on USB

----------------------------------------------------------------
Arnd Bergmann (1):
      ath10k: work around uninitialized vht_pfr variable

Ihab Zhaika (1):
      iwlwifi: add new cards for 9000 and 20000 series

Luca Coelho (1):
      iwlwifi: pcie: add support for qu c-step devices

Soeren Moch (1):
      rt2x00usb: fix rx queue hang

 drivers/net/wireless/ath/ath10k/mac.c           |  2 +
 drivers/net/wireless/intel/iwlwifi/cfg/22000.c  | 53 +++++++++++++++++++++++++
 drivers/net/wireless/intel/iwlwifi/iwl-config.h |  7 ++++
 drivers/net/wireless/intel/iwlwifi/iwl-csr.h    |  2 +
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c   | 23 +++++++++++
 drivers/net/wireless/ralink/rt2x00/rt2x00usb.c  | 12 +++---
 6 files changed, 93 insertions(+), 6 deletions(-)


* Re: [net-next 1/2] ipvs: batch __ip_vs_cleanup
From: Haishuang Yan @ 2019-07-18 14:16 UTC
  To: Julian Anastasov
  Cc: David S. Miller, Pablo Neira Ayuso, Simon Horman, netdev,
	lvs-devel, linux-kernel, netfilter-devel
In-Reply-To: <alpine.LFD.2.21.1907152333300.5700@ja.home.ssi.bg>


> On Jul 16, 2019, at 4:39 AM, Julian Anastasov <ja@ssi.bg> wrote:
> 
> 
> 	Hello,
> 
> On Sat, 13 Jul 2019, Haishuang Yan wrote:
> 
>> It's better to batch __ip_vs_cleanup to speedup ipvs
>> connections dismantle.
>> 
>> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
>> ---
>> include/net/ip_vs.h             |  2 +-
>> net/netfilter/ipvs/ip_vs_core.c | 29 +++++++++++++++++------------
>> net/netfilter/ipvs/ip_vs_ctl.c  | 13 ++++++++++---
>> 3 files changed, 28 insertions(+), 16 deletions(-)
>> 
>> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
>> index 3759167..93e7a25 100644
>> --- a/include/net/ip_vs.h
>> +++ b/include/net/ip_vs.h
>> @@ -1324,7 +1324,7 @@ static inline void ip_vs_control_del(struct ip_vs_conn *cp)
>> void ip_vs_control_net_cleanup(struct netns_ipvs *ipvs);
>> void ip_vs_estimator_net_cleanup(struct netns_ipvs *ipvs);
>> void ip_vs_sync_net_cleanup(struct netns_ipvs *ipvs);
>> -void ip_vs_service_net_cleanup(struct netns_ipvs *ipvs);
>> +void ip_vs_service_nets_cleanup(struct list_head *net_list);
>> 
>> /* IPVS application functions
>>  * (from ip_vs_app.c)
>> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
>> index 46f06f9..b4d79b7 100644
>> --- a/net/netfilter/ipvs/ip_vs_core.c
>> +++ b/net/netfilter/ipvs/ip_vs_core.c
>> @@ -2402,18 +2402,23 @@ static int __net_init __ip_vs_init(struct net *net)
>> 	return -ENOMEM;
>> }
>> 
>> -static void __net_exit __ip_vs_cleanup(struct net *net)
>> +static void __net_exit __ip_vs_cleanup_batch(struct list_head *net_list)
>> {
>> -	struct netns_ipvs *ipvs = net_ipvs(net);
>> -
>> -	ip_vs_service_net_cleanup(ipvs);	/* ip_vs_flush() with locks */
>> -	ip_vs_conn_net_cleanup(ipvs);
>> -	ip_vs_app_net_cleanup(ipvs);
>> -	ip_vs_protocol_net_cleanup(ipvs);
>> -	ip_vs_control_net_cleanup(ipvs);
>> -	ip_vs_estimator_net_cleanup(ipvs);
>> -	IP_VS_DBG(2, "ipvs netns %d released\n", ipvs->gen);
>> -	net->ipvs = NULL;
>> +	struct netns_ipvs *ipvs;
>> +	struct net *net;
>> +	LIST_HEAD(list);
>> +
>> +	ip_vs_service_nets_cleanup(net_list);	/* ip_vs_flush() with locks */
>> +	list_for_each_entry(net, net_list, exit_list) {
> 
> 	How much faster is to replace list_for_each_entry in
> ops_exit_list() with this one. IPVS can waste time in calls
> such as kthread_stop() and del_timer_sync() but I'm not sure
> we can solve it easily. What gain do you see in benchmarks?

Hi, 

As the following benchmark results show, there is a small performance improvement:

$  cat add_del_unshare.sh
#!/bin/bash

for i in `seq 1 100`
    do
     (for j in `seq 1 40` ; do  unshare -n ipvsadm -A -t 172.16.$i.$j:80 >/dev/null ; done) &
    done
wait; grep net_namespace /proc/slabinfo

Before patch:
$  time sh add_del_unshare.sh
net_namespace       4020   4020   4736    6    8 : tunables    0    0    0 : slabdata    670    670      0

real    0m8.086s
user    0m2.025s
sys     0m36.956s

After patch:
$  time sh add_del_unshare.sh
net_namespace       4020   4020   4736    6    8 : tunables    0    0    0 : slabdata    670    670      0

real    0m7.623s
user    0m2.003s
sys     0m32.935s


> 
>> +		ipvs = net_ipvs(net);
>> +		ip_vs_conn_net_cleanup(ipvs);
>> +		ip_vs_app_net_cleanup(ipvs);
>> +		ip_vs_protocol_net_cleanup(ipvs);
>> +		ip_vs_control_net_cleanup(ipvs);
>> +		ip_vs_estimator_net_cleanup(ipvs);
>> +		IP_VS_DBG(2, "ipvs netns %d released\n", ipvs->gen);
>> +		net->ipvs = NULL;
>> +	}
>> }
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
> 





* [PATCH bpf] tools/bpf: fix bpftool build with OUTPUT set
From: Ilya Leoshkevich @ 2019-07-18 14:20 UTC
  To: bpf, netdev, lmb; +Cc: gor, heiko.carstens, Ilya Leoshkevich
In-Reply-To: <CACAyw9-CWRHVH3TJ=Tke2x8YiLsH47sLCijdp=V+5M836R9aAA@mail.gmail.com>

Hi Lorenz,

I've been using the following patch for quite some time now.
Please let me know if it works for you.

Best regards,
Ilya

---

When OUTPUT is set, bpftool and libbpf put their objects into the same
directory, and since some of them have the same names, they
collide.

Fix by invoking libbpf build in a manner similar to $(call descend) -
descend itself cannot be used, since libbpf is a sibling, and not a
child, of bpftool.

Also, don't link bpftool with libbpf.a twice.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/bpf/bpftool/Makefile | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index a7afea4dec47..2cbc3c166f44 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -15,23 +15,18 @@ else
 endif
 
 BPF_DIR = $(srctree)/tools/lib/bpf/
-
-ifneq ($(OUTPUT),)
-  BPF_PATH = $(OUTPUT)
-else
-  BPF_PATH = $(BPF_DIR)
-endif
-
-LIBBPF = $(BPF_PATH)libbpf.a
+BPF_PATH = $(objtree)/tools/lib/bpf
+LIBBPF = $(BPF_PATH)/libbpf.a
 
 BPFTOOL_VERSION := $(shell make --no-print-directory -sC ../../.. kernelversion)
 
 $(LIBBPF): FORCE
-	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a
+	$(Q)mkdir -p $(BPF_PATH)
+	$(Q)$(MAKE) $(COMMAND_O) subdir=tools/lib/bpf -C $(BPF_DIR) $(LIBBPF)
 
 $(LIBBPF)-clean:
 	$(call QUIET_CLEAN, libbpf)
-	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) clean >/dev/null
+	$(Q)$(MAKE) $(COMMAND_O) subdir=tools/lib/bpf -C $(BPF_DIR) clean >/dev/null
 
 prefix ?= /usr/local
 bash_compdir ?= /usr/share/bash-completion/completions
@@ -112,7 +107,7 @@ $(OUTPUT)disasm.o: $(srctree)/kernel/bpf/disasm.c
 	$(QUIET_CC)$(COMPILE.c) -MMD -o $@ $<
 
 $(OUTPUT)bpftool: $(OBJS) $(LIBBPF)
-	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^ $(LIBS)
+	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJS) $(LIBS)
 
 $(OUTPUT)%.o: %.c
 	$(QUIET_CC)$(COMPILE.c) -MMD -o $@ $<
-- 
2.21.0



* Re: [PATCH net] ipv6: Unlink sibling route in case of failure
From: David Ahern @ 2019-07-18 14:21 UTC
  To: Ido Schimmel, netdev; +Cc: davem, alexpe, mlxsw, Ido Schimmel
In-Reply-To: <20190717203933.3073-1-idosch@idosch.org>

On 7/17/19 2:39 PM, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> When a route needs to be appended to an existing multipath route,
> fib6_add_rt2node() first appends it to the siblings list and increments
> the number of sibling routes on each sibling.
> 
> Later, the function notifies the route via call_fib6_entry_notifiers().
> In case the notification is vetoed, the route is not unlinked from the
> siblings list, which can result in a use-after-free.
> 
> Fix this by unlinking the route from the siblings list before returning
> an error.
> 
> Audited the rest of the call sites from which the FIB notification chain
> is called and could not find more problems.
> 
> Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
> ---
> Dave, this will not apply cleanly to stable trees due to recent changes
> in net-next. I can prepare another patch for stable if needed.
> ---
>  net/ipv6/ip6_fib.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 

Thanks for the fix, Ido. I can help with the ports as well.

Reviewed-by: David Ahern <dsahern@gmail.com>



* [PATCH] net: fec: generate warning when using deprecated phy reset
From: Sven Van Asbroeck @ 2019-07-18 14:34 UTC
  To: Fugang Duan; +Cc: David S . Miller, netdev, linux-kernel

Allowing the fec to reset its PHY via the phy-reset-gpios
devicetree property is deprecated. To improve developer
awareness, generate a warning whenever the deprecated
property is used.

Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 38f10f7dcbc3..00e1b5e4ef71 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3244,6 +3244,12 @@ static int fec_reset_phy(struct platform_device *pdev)
 	else if (!gpio_is_valid(phy_reset))
 		return 0;
 
+	/* Recommended way to provide a PHY reset:
+	 * - create a phy devicetree node, and link it to its fec (phy-handle)
+	 * - add your reset gpio to the phy devicetree node
+	 */
+	dev_warn(&pdev->dev, "devicetree: phy-reset-gpios is deprecated\n");
+
 	err = of_property_read_u32(np, "phy-reset-post-delay", &phy_post_delay);
 	/* valid reset duration should be less than 1s */
 	if (!err && phy_post_delay > 1000)
-- 
2.17.1



* [PATCH] MAINTAINERS: update netsec driver
From: Ilias Apalodimas @ 2019-07-18 14:38 UTC
  To: netdev, jaswinder.singh, davem; +Cc: ard.biesheuvel, Ilias Apalodimas

Add myself to maintainers since I provided the XDP and page_pool
implementation.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 211ea3a199bd..64f659d8346c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14789,6 +14789,7 @@ F:	Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
 
 SOCIONEXT (SNI) NETSEC NETWORK DRIVER
 M:	Jassi Brar <jaswinder.singh@linaro.org>
+M:	Ilias Apalodimas <ilias.apalodimas@linaro.org>
 L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/ethernet/socionext/netsec.c
-- 
2.20.1



* Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive
From: Michael S. Tsirkin @ 2019-07-18 14:42 UTC
  To: Jason Wang
  Cc: ? jiang, davem@davemloft.net, ast@kernel.org,
	daniel@iogearbox.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com, kafai@fb.com,
	songliubraving@fb.com, yhs@fb.com,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, xdp-newbies@vger.kernel.org,
	bpf@vger.kernel.org, jiangran.jr@alibaba-inc.com
In-Reply-To: <bdd30ef5-4f69-8218-eed0-38c6daac42db@redhat.com>

On Thu, Jul 18, 2019 at 10:01:05PM +0800, Jason Wang wrote:
> 
> On 2019/7/18 9:04 PM, Michael S. Tsirkin wrote:
> > On Thu, Jul 18, 2019 at 12:55:50PM +0000, ? jiang wrote:
> > > This change makes ring buffer reclaim threshold num_free configurable
> > > for better performance, while it's hard coded as 1/2 * queue now.
> > > According to our test with qemu + dpdk, packet dropping happens when
> > > the guest is not able to provide free buffer in avail ring timely.
> > > Smaller value of num_free does decrease the number of packet dropping
> > > during our test as it makes virtio_net reclaim buffer earlier.
> > > 
> > > At least, we should leave the value changeable to user while the
> > > default value as 1/2 * queue is kept.
> > > 
> > > Signed-off-by: jiangkidd<jiangkidd@hotmail.com>
> > That would be one reason, but I suspect it's not the
> > true one. If you need more buffer due to jitter
> > then just increase the queue size. Would be cleaner.
> > 
> > 
> > However are you sure this is the reason for
> > packet drops? Do you see them dropped by dpdk
> > due to lack of space in the ring? As opposed to
> > by guest?
> > 
> > 
> 
> Besides those, this patch depends on the user to choose a suitable threshold
> which is not good. You need either a good value with demonstrated numbers or
> something smarter.
> 
> Thanks

I do however think that we have a problem right now: try_fill_recv can
take up a long time during which net stack does not run at all. Imagine
a 1K queue - we are talking 512 packets. That's excessive.  napi poll
weight solves a similar problem, so it might make sense to cap this at
napi_poll_weight.

Which will allow tweaking it through a module parameter as a
side effect :) Maybe just do NAPI_POLL_WEIGHT.

Need to be careful though: queues can also be small and I don't think we
want to exceed queue size / 2, or maybe queue size - napi_poll_weight.
Definitely must not exceed the full queue size.

-- 
MST
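
One possible reading of the capping idea above, as a minimal userspace sketch rather than a tested driver change: the refill threshold is bounded both by a poll weight and by half the ring size. refill_threshold() and should_refill() are hypothetical names, and NAPI_POLL_WEIGHT is redefined locally as 64 on the assumption that this matches the kernel default.

```c
/* Assumed to match the kernel's default NAPI_POLL_WEIGHT; redefined here
 * only so the sketch builds outside the kernel tree. */
#define NAPI_POLL_WEIGHT 64

/*
 * Hypothetical helper: cap the threshold at poll_weight, but never let it
 * exceed half the ring, so that small queues keep the current behaviour.
 */
static unsigned int refill_threshold(unsigned int ring_size,
				     unsigned int poll_weight)
{
	unsigned int half = ring_size / 2;

	return poll_weight < half ? poll_weight : half;
}

/* Hypothetical helper: mirrors the "num_free > threshold" test in
 * virtnet_receive(), with the capped threshold substituted. */
static int should_refill(unsigned int num_free, unsigned int ring_size,
			 unsigned int poll_weight)
{
	return num_free > refill_threshold(ring_size, poll_weight);
}
```

For a 1K ring this would trigger a refill once more than NAPI_POLL_WEIGHT buffers are free instead of waiting for 512; for a 16-entry ring the threshold stays at 8. Whether this helps is something only measurements can show.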


* Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive
From: Michael S. Tsirkin @ 2019-07-18 14:43 UTC
  To: Jason Wang
  Cc: ? jiang, davem@davemloft.net, ast@kernel.org,
	daniel@iogearbox.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com, kafai@fb.com,
	songliubraving@fb.com, yhs@fb.com,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, xdp-newbies@vger.kernel.org,
	bpf@vger.kernel.org, jiangran.jr@alibaba-inc.com
In-Reply-To: <20190718103641-mutt-send-email-mst@kernel.org>

On Thu, Jul 18, 2019 at 10:42:47AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 10:01:05PM +0800, Jason Wang wrote:
> > 
> > On 2019/7/18 9:04 PM, Michael S. Tsirkin wrote:
> > > On Thu, Jul 18, 2019 at 12:55:50PM +0000, ? jiang wrote:
> > > > This change makes ring buffer reclaim threshold num_free configurable
> > > > for better performance, while it's hard coded as 1/2 * queue now.
> > > > According to our test with qemu + dpdk, packet dropping happens when
> > > > the guest is not able to provide free buffer in avail ring timely.
> > > > Smaller value of num_free does decrease the number of packet dropping
> > > > during our test as it makes virtio_net reclaim buffer earlier.
> > > > 
> > > > At least, we should leave the value changeable to user while the
> > > > default value as 1/2 * queue is kept.
> > > > 
> > > > Signed-off-by: jiangkidd<jiangkidd@hotmail.com>
> > > That would be one reason, but I suspect it's not the
> > > true one. If you need more buffer due to jitter
> > > then just increase the queue size. Would be cleaner.
> > > 
> > > 
> > > However are you sure this is the reason for
> > > packet drops? Do you see them dropped by dpdk
> > > due to lack of space in the ring? As opposed to
> > > by guest?
> > > 
> > > 
> > 
> > Besides those, this patch depends on the user to choose a suitable threshold
> > which is not good. You need either a good value with demonstrated numbers or
> > something smarter.
> > 
> > Thanks
> 
> I do however think that we have a problem right now: try_fill_recv can
> take up a long time during which net stack does not run at all. Imagine
> a 1K queue - we are talking 512 packets. That's excessive.  napi poll
> weight solves a similar problem, so it might make sense to cap this at
> napi_poll_weight.
> 
> Which will allow tweaking it through a module parameter as a
> side effect :) Maybe just do NAPI_POLL_WEIGHT.

Or maybe NAPI_POLL_WEIGHT/2 like we do at half the queue ;). Please
experiment, measure performance and let the list know

> Need to be careful though: queues can also be small and I don't think we
> want to exceed queue size / 2, or maybe queue size - napi_poll_weight.
> Definitely must not exceed the full queue size.
> 
> -- 
> MST


* Re: [PATCH] MAINTAINERS: update netsec driver
From: Jassi Brar @ 2019-07-18 14:52 UTC
  To: Ilias Apalodimas
  Cc: <netdev@vger.kernel.org>, David S. Miller, Ard Biesheuvel
In-Reply-To: <1563460710-28454-1-git-send-email-ilias.apalodimas@linaro.org>

On Thu, 18 Jul 2019 at 09:38, Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>
> Add myself to maintainers since I provided the XDP and page_pool
> implementation.
>
Yes, please.

Acked-by: Jassi Brar <jaswinder.singh@linaro.org>

> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

^ permalink raw reply

* Re: [PATCH net-next iproute2 v2 0/3] net/sched: Introduce tc connection tracking
From: Paul Blakey @ 2019-07-18 15:00 UTC (permalink / raw)
  To: Jiri Pirko, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	Marcelo Ricardo Leitner, netdev@vger.kernel.org, David Miller,
	Aaron Conole, Zhike Wang
  Cc: Rony Efraim, nst-kernel@redhat.com, John Hurley, Simon Horman,
	Justin Pettit
In-Reply-To: <1562832867-32347-1-git-send-email-paulb@mellanox.com>

Hey guys,

any more comments?

thanks,

Paul.



^ permalink raw reply

* [PATCH bpf v2] bpf: fix narrower loads on s390
From: Ilya Leoshkevich @ 2019-07-18 15:01 UTC (permalink / raw)
  To: bpf, netdev, ys114321; +Cc: gor, heiko.carstens, Ilya Leoshkevich

The very first check in test_pkt_md_access is failing on s390, which
happens because loading a part of a struct __sk_buff field produces
an incorrect result.

The preprocessed code of the check is:

{
	__u8 tmp = *((volatile __u8 *)&skb->len +
		((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
	if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
};

clang generates the following code for it:

      0:	71 21 00 03 00 00 00 00	r2 = *(u8 *)(r1 + 3)
      1:	61 31 00 00 00 00 00 00	r3 = *(u32 *)(r1 + 0)
      2:	57 30 00 00 00 00 00 ff	r3 &= 255
      3:	5d 23 00 1d 00 00 00 00	if r2 != r3 goto +29 <LBB0_10>

Finally, verifier transforms it to:

  0: (61) r2 = *(u32 *)(r1 +104)
  1: (bc) w2 = w2
  2: (74) w2 >>= 24
  3: (bc) w2 = w2
  4: (54) w2 &= 255
  5: (bc) w2 = w2

The problem is that when verifier emits the code to replace a partial
load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of
struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a
bitwise AND, it assumes that the machine is little endian and
incorrectly decides to use a shift.

Adjust shift count calculation to account for endianness.

Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 include/linux/filter.h | 13 +++++++++++++
 kernel/bpf/verifier.c  |  4 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ff65d22cf336..4fe88e43f0fe 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -24,6 +24,8 @@
 
 #include <net/sch_generic.h>
 
+#include <asm/byteorder.h>
+
 #include <uapi/linux/filter.h>
 #include <uapi/linux/bpf.h>
 
@@ -1216,4 +1218,15 @@ struct bpf_sockopt_kern {
 	s32		retval;
 };
 
+static inline u8 bpf_narrower_load_shift(u32 size_default, u32 size, u32 off)
+{
+	u8 load_off = off & (size_default - 1);
+
+#ifdef __LITTLE_ENDIAN
+	return load_off * 8;
+#else
+	return (size_default - (load_off + size)) * 8;
+#endif
+}
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5900cbb966b1..48edc9c9a879 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8616,8 +8616,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		}
 
 		if (is_narrower_load && size < target_size) {
-			u8 shift = (off & (size_default - 1)) * 8;
-
+			u8 shift = bpf_narrower_load_shift(size_default, size,
+							   off);
 			if (ctx_field_size <= 4) {
 				if (shift)
 					insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
-- 
2.21.0
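
For anyone who wants to check the arithmetic without an s390 box, here
is a minimal userspace sketch of the shift calculation. It mirrors
bpf_narrower_load_shift() from the patch above, with one assumption
made purely for illustration: endianness is passed as a runtime
argument (little_endian), whereas the kernel helper selects the branch
at build time with #ifdef __LITTLE_ENDIAN. The function names below are
not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace mirror of the helper in the patch. size_default is the
 * full field width in bytes, size the narrow load width, off the
 * offset of the narrow load. Returns the right-shift in bits needed
 * after doing the full-width load.
 */
uint8_t narrower_load_shift(uint32_t size_default, uint32_t size,
			    uint32_t off, int little_endian)
{
	uint8_t load_off = off & (size_default - 1);

	if (little_endian)
		return load_off * 8;
	return (size_default - (load_off + size)) * 8;
}

/* The failing case from test_pkt_md_access: *(u8 *)(r1 + 3) on a u32. */
void narrower_load_shift_example(void)
{
	/* Little endian: byte 3 is the most significant, shift by 24. */
	assert(narrower_load_shift(4, 1, 3, 1) == 24);
	/* Big endian: byte 3 is the least significant, no shift. */
	assert(narrower_load_shift(4, 1, 3, 0) == 0);
}
```

This matches the verifier dump in the commit message: the ">>= 24" is
right for little endian, while on big endian the same access needs no
shift at all.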


^ permalink raw reply related

* Re: regression with napi/softirq ?
From: Eric Dumazet @ 2019-07-18 15:08 UTC (permalink / raw)
  To: Sudip Mukherjee, Eric Dumazet
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), David S. Miller, netdev,
	linux-kernel
In-Reply-To: <CADVatmPQRf9A9z1LbHe5cd+bFLrPGG12YxPh2-yXAj_C9s8ZeA@mail.gmail.com>



On 7/18/19 2:55 PM, Sudip Mukherjee wrote:

> Thanks Eric. But there is no improvement in delay between
> softirq_raise and softirq_entry with this change.
> But moving to a later kernel (linus master branch? ) like Thomas has
> said in the other mail might be difficult atm. I can definitely
> move to v4.14.133 if that helps. Thomas ?

If you are tracking max latency then I guess you have to tweak SOFTIRQ_NOW_MASK
to include NET_RX_SOFTIRQ

The patch I gave earlier would only lower the probability of events, not completely get rid of them.



diff --git a/kernel/softirq.c b/kernel/softirq.c
index 0427a86743a46b7e1891f7b6c1ff585a8a1695f5..302046dd8d7e6740e466c422954f22565fe19e69 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -81,7 +81,7 @@ static void wakeup_softirqd(void)
  * right now. Let ksoftirqd handle this at its own rate, to get fairness,
  * unless we're doing some of the synchronous softirqs.
  */
-#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
+#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ) | (1 << NET_RX_SOFTIRQ))
 static bool ksoftirqd_running(unsigned long pending)
 {
        struct task_struct *tsk = __this_cpu_read(ksoftirqd);
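
To make the effect of the mask change concrete, here is a simplified
userspace model of the check, based on my reading of
ksoftirqd_running() in kernel/softirq.c. The softirq numbers follow
include/linux/interrupt.h; MASK_BEFORE, MASK_AFTER and
deferred_to_ksoftirqd() are names made up for this sketch, and the
per-CPU task state is reduced to a boolean argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Softirq numbers as in include/linux/interrupt.h. */
enum {
	HI_SOFTIRQ = 0,
	TIMER_SOFTIRQ,
	NET_TX_SOFTIRQ,
	NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ,
	IRQ_POLL_SOFTIRQ,
	TASKLET_SOFTIRQ,
};

#define MASK_BEFORE ((1UL << HI_SOFTIRQ) | (1UL << TASKLET_SOFTIRQ))
#define MASK_AFTER  (MASK_BEFORE | (1UL << NET_RX_SOFTIRQ))

/*
 * Model of the decision: true means pending softirqs are left for
 * ksoftirqd, false means they are handled right away. Anything in
 * now_mask is never deferred.
 */
bool deferred_to_ksoftirqd(unsigned long pending, unsigned long now_mask,
			   bool ksoftirqd_is_running)
{
	if (pending & now_mask)
		return false;
	return ksoftirqd_is_running;
}

/* NET_RX pending while ksoftirqd is already running on this CPU. */
void now_mask_example(void)
{
	unsigned long pending = 1UL << NET_RX_SOFTIRQ;

	assert(deferred_to_ksoftirqd(pending, MASK_BEFORE, true));
	assert(!deferred_to_ksoftirqd(pending, MASK_AFTER, true));
}
```

With the extended mask, a pending NET_RX is no longer held back while
ksoftirqd is running, which is the latency being tracked here. The
fairness trade-off noted in the comment above the define still applies.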




^ permalink raw reply related

* Re: [PATCH v6 0/5] net: macb: cover letter
From: Andrew Lunn @ 2019-07-18 15:13 UTC (permalink / raw)
  To: Parshuram Thombare
  Cc: nicolas.ferre, davem, f.fainelli, linux, netdev, hkallweit1,
	linux-kernel, rafalc, piotrs, aniljoy, arthurm, stevenh, mparab
In-Reply-To: <1562769391-31803-1-git-send-email-pthombar@cadence.com>

On Wed, Jul 10, 2019 at 03:36:31PM +0100, Parshuram Thombare wrote:
> Hello !
> 
> This is the 6th version of the patch set containing the following patches
> for Cadence ethernet controller driver.

Hi Parshuram

One thing which was never clear is how you are testing the features
you are adding. Could you please describe your test setup and how each
new feature is tested using that hardware? I'm particularly interested
in which C45 device you are using. I also expect Russell would like to
know more about the SFP modules you are using. Do you have any which
require 1000BaseX, 2500BaseX, or provide copper 1G?

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH bpf v2] bpf: fix narrower loads on s390
From: Y Song @ 2019-07-18 15:24 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, netdev, gor, heiko.carstens
In-Reply-To: <20190718150103.84837-1-iii@linux.ibm.com>

On Thu, Jul 18, 2019 at 8:01 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> The very first check in test_pkt_md_access is failing on s390, which
> happens because loading a part of a struct __sk_buff field produces
> an incorrect result.
>
> The preprocessed code of the check is:
>
> {
>         __u8 tmp = *((volatile __u8 *)&skb->len +
>                 ((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
>         if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
> };
>
> clang generates the following code for it:
>
>       0:        71 21 00 03 00 00 00 00 r2 = *(u8 *)(r1 + 3)
>       1:        61 31 00 00 00 00 00 00 r3 = *(u32 *)(r1 + 0)
>       2:        57 30 00 00 00 00 00 ff r3 &= 255
>       3:        5d 23 00 1d 00 00 00 00 if r2 != r3 goto +29 <LBB0_10>
>
> Finally, verifier transforms it to:
>
>   0: (61) r2 = *(u32 *)(r1 +104)
>   1: (bc) w2 = w2
>   2: (74) w2 >>= 24
>   3: (bc) w2 = w2
>   4: (54) w2 &= 255
>   5: (bc) w2 = w2
>
> The problem is that when verifier emits the code to replace a partial
> load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of
> struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a
> bitwise AND, it assumes that the machine is little endian and
> incorrectly decides to use a shift.
>
> Adjust shift count calculation to account for endianness.
>
> Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>

Acked-by: Yonghong Song <yhs@fb.com>

> ---
>  include/linux/filter.h | 13 +++++++++++++
>  kernel/bpf/verifier.c  |  4 ++--
>  2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index ff65d22cf336..4fe88e43f0fe 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -24,6 +24,8 @@
>
>  #include <net/sch_generic.h>
>
> +#include <asm/byteorder.h>
> +
>  #include <uapi/linux/filter.h>
>  #include <uapi/linux/bpf.h>
>
> @@ -1216,4 +1218,15 @@ struct bpf_sockopt_kern {
>         s32             retval;
>  };
>
> +static inline u8 bpf_narrower_load_shift(u32 size_default, u32 size, u32 off)
> +{
> +       u8 load_off = off & (size_default - 1);
> +
> +#ifdef __LITTLE_ENDIAN
> +       return load_off * 8;
> +#else
> +       return (size_default - (load_off + size)) * 8;
> +#endif
> +}
> +
>  #endif /* __LINUX_FILTER_H__ */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 5900cbb966b1..48edc9c9a879 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8616,8 +8616,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>                 }
>
>                 if (is_narrower_load && size < target_size) {
> -                       u8 shift = (off & (size_default - 1)) * 8;
> -
> +                       u8 shift = bpf_narrower_load_shift(size_default, size,
> +                                                          off);
>                         if (ctx_field_size <= 4) {
>                                 if (shift)
>                                         insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
> --
> 2.21.0
>

^ permalink raw reply

* Re: [PATCH] MAINTAINERS: update netsec driver
From: Ard Biesheuvel @ 2019-07-18 16:06 UTC (permalink / raw)
  To: Jassi Brar
  Cc: Ilias Apalodimas, <netdev@vger.kernel.org>, David S. Miller
In-Reply-To: <CAJe_ZhdFBUWQwf+OcDX_0_wYTpTqHJvqJi2QE3CP+8rwXCLjMw@mail.gmail.com>

On Thu, 18 Jul 2019 at 16:52, Jassi Brar <jaswinder.singh@linaro.org> wrote:
>
> On Thu, 18 Jul 2019 at 09:38, Ilias Apalodimas
> <ilias.apalodimas@linaro.org> wrote:
> >
> > Add myself to maintainers since I provided the XDP and page_pool
> > implementation
> >
> Yes, please.
>
> Acked-by: Jassi Brar <jaswinder.singh@linaro.org>
>

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

^ permalink raw reply

* [PATCH] openvswitch: Fix a possible memory leak on dst_cache
From: Haishuang Yan @ 2019-07-18 16:07 UTC (permalink / raw)
  To: Pravin B Shelar, David S. Miller; +Cc: netdev, linux-kernel, Haishuang Yan

dst_cache should be destroyed when adding flow actions fails.

Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/openvswitch/flow_netlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d7559c6..1fd1cdd 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2608,6 +2608,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 			 sizeof(*ovs_tun), log);
 	if (IS_ERR(a)) {
 		dst_release((struct dst_entry *)tun_dst);
+		dst_cache_destroy(&tun_dst->u.tun_info.dst_cache);
 		return PTR_ERR(a);
 	}
 
-- 
1.8.3.1




^ permalink raw reply related

* Re: [RFC PATCH 4/5] PTP: Add flag for non-periodic output
From: Richard Cochran @ 2019-07-18 16:41 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <87ftn3iuqp.fsf@linux.intel.com>

On Thu, Jul 18, 2019 at 11:59:10AM +0300, Felipe Balbi wrote:
> no problem, anything in particular in mind? Just create new versions of
> all the IOCTLs so we can actually use the reserved fields in the future?

Yes, please!

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH] net: dsa: sja1105: Fix missing unlock on error in sk_buff()
From: Vivien Didelot @ 2019-07-18 16:42 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Andrew Lunn, Florian Fainelli, Vladimir Oltean, Wei Yongjun,
	netdev, kernel-janitors
In-Reply-To: <20190717062956.127446-1-weiyongjun1@huawei.com>

On Wed, 17 Jul 2019 06:29:56 +0000, Wei Yongjun <weiyongjun1@huawei.com> wrote:
> Add the missing unlock before return from function sk_buff()
> in the error handling case.
> 
> Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>

^ permalink raw reply

* Re: [PATCH] net: fec: generate warning when using deprecated phy reset
From: Lucas Stach @ 2019-07-18 16:47 UTC (permalink / raw)
  To: Sven Van Asbroeck, Fugang Duan; +Cc: David S . Miller, netdev, linux-kernel
In-Reply-To: <20190718143428.2392-1-TheSven73@gmail.com>

Am Donnerstag, den 18.07.2019, 10:34 -0400 schrieb Sven Van Asbroeck:
> Allowing the fec to reset its PHY via the phy-reset-gpios
> devicetree property is deprecated. To improve developer
> awareness, generate a warning whenever the deprecated
> property is used.

Not really a fan of this. This will cause existing DTs, which are
provided by the firmware in an ideal world and may not change at the
same rate as the kernel, to generate a warning with new kernels. Not
really helpful from the user experience point of view.

Regards,
Lucas

> Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 38f10f7dcbc3..00e1b5e4ef71 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3244,6 +3244,12 @@ static int fec_reset_phy(struct platform_device *pdev)
>  	else if (!gpio_is_valid(phy_reset))
>  		return 0;
>  
> +	/* Recommended way to provide a PHY reset:
> +	 * - create a phy devicetree node, and link it to its fec (phy-handle)
> +	 * - add your reset gpio to the phy devicetree node
> +	 */
> +	dev_warn(&pdev->dev, "devicetree: phy-reset-gpios is deprecated\n");
> +
>  	err = of_property_read_u32(np, "phy-reset-post-delay", &phy_post_delay);
>  	/* valid reset duration should be less than 1s */
>  	if (!err && phy_post_delay > 1000)

^ permalink raw reply

* Re: KASAN: use-after-free Read in nr_insert_socket
From: Cong Wang @ 2019-07-18 16:48 UTC (permalink / raw)
  To: syzbot
  Cc: David Miller, linux-hams, LKML, Linux Kernel Network Developers,
	Ralf Baechle, syzkaller-bugs
In-Reply-To: <00000000000035f65d058df39aed@google.com>

On Thu, Jul 18, 2019 at 5:18 AM syzbot
<syzbot+9399c158fcc09b21d0d2@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    a5b64700 fix: taprio: Change type of txtime-delay paramete..
> git tree:       net
> console output: https://syzkaller.appspot.com/x/log.txt?x=1588b458600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=87305c3ca9c25c70
> dashboard link: https://syzkaller.appspot.com/bug?extid=9399c158fcc09b21d0d2
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=105a61a4600000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=153ef948600000
>
> The bug was bisected to:
>
> commit c8c8218ec5af5d2598381883acbefbf604e56b5e
> Author: Cong Wang <xiyou.wangcong@gmail.com>
> Date:   Thu Jun 27 21:30:58 2019 +0000
>
>      netrom: fix a memory leak in nr_rx_frame()
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=159ef948600000
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=179ef948600000
> console output: https://syzkaller.appspot.com/x/log.txt?x=139ef948600000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+9399c158fcc09b21d0d2@syzkaller.appspotmail.com
> Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")
>
> ==================================================================
> BUG: KASAN: use-after-free in atomic_read
> /./include/asm-generic/atomic-instrumented.h:26 [inline]
> BUG: KASAN: use-after-free in refcount_inc_not_zero_checked+0x81/0x200
> /lib/refcount.c:123
> Read of size 4 at addr ffff8880a5d3f380 by task swapper/1/0
>
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.0+ #89
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>   <IRQ>
>   __dump_stack /lib/dump_stack.c:77 [inline]
>   dump_stack+0x172/0x1f0 /lib/dump_stack.c:113
>   print_address_description.cold+0xd4/0x306 /mm/kasan/report.c:351
>   __kasan_report.cold+0x1b/0x36 /mm/kasan/report.c:482
>   kasan_report+0x12/0x20 /mm/kasan/common.c:612
>   check_memory_region_inline /mm/kasan/generic.c:185 [inline]
>   check_memory_region+0x134/0x1a0 /mm/kasan/generic.c:192
>   __kasan_check_read+0x11/0x20 /mm/kasan/common.c:92
>   atomic_read /./include/asm-generic/atomic-instrumented.h:26 [inline]
>   refcount_inc_not_zero_checked+0x81/0x200 /lib/refcount.c:123
>   refcount_inc_checked+0x17/0x70 /lib/refcount.c:156
>   sock_hold /./include/net/sock.h:649 [inline]
>   sk_add_node /./include/net/sock.h:701 [inline]
>   nr_insert_socket+0x2d/0xe0 /net/netrom/af_netrom.c:137


Looks like nr_insert_socket() doesn't hold a refcnt before inserting
it into a global list.

Let me think about how to fix this.

Thanks.

^ permalink raw reply
