Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v2] tg3: Be drop monitor friendly
From: David Miller @ 2017-08-25  1:28 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, edumazet, siva.kallam, prashant, mchan, linux-kernel
In-Reply-To: <20170825004711.19793-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 24 Aug 2017 17:47:11 -0700

> tg3_tx() does the normal packet TX completion,
> tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a
> new SKB that is suitable to workaround HW bugs, and finally
> tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for
> these 3 locations to be SKB drop monitor friendly.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.

^ permalink raw reply

* RE: Question about ip_defrag
From: liujian (CE) @ 2017-08-25  1:33 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	yoshfuji@linux-ipv6.org, elena.reshetova@intel.com,
	edumazet@google.com, netdev@vger.kernel.org, Wangkefeng (Kevin),
	weiyongjun (A)
In-Reply-To: <20170824205926.2c45e3a1@redhat.com>


> -----Original Message-----
> From: Jesper Dangaard Brouer [mailto:brouer@redhat.com]
> Sent: Friday, August 25, 2017 2:59 AM
> To: liujian (CE)
> Cc: davem@davemloft.net; kuznet@ms2.inr.ac.ru; yoshfuji@linux-ipv6.org;
> elena.reshetova@intel.com; edumazet@google.com; netdev@vger.kernel.org;
> brouer@redhat.com
> Subject: Re: Question about ip_defrag
> 
> 
> On Thu, 24 Aug 2017 16:04:41 +0000 "liujian (CE)" <liujian56@huawei.com>
> wrote:
> 
> > >What kernel version have you seen this issue with?
> >
> > 3.10,with some backport.
> >
> >  >As far as I remember, this issue have been fixed before...
> >
> > which one patch? I didnot find out the patch:(
> 
> AFAIK it was some bugs in the percpu_counter code.  If you need to backport
> look at the git commits:
> 
>  git log lib/percpu_counter.c include/linux/percpu_counter.h
> 
> Are you maintaining your own 3.10 kernel?
> 
> I know that for RHEL7 (also kernel 3.10) we backported the percpu_counter
> fixes...
>
Could you tell me which one patch?  we have backported most of the two files's change. 
Thank you ~


> --Jesper
> 
> 
> > 发件人: Jesper Dangaard Brouer
> > 收件人: liujian
> (CE)<liujian56@huawei.com<mailto:liujian56@huawei.com>>
> > 抄送:
> >
> davem@davemloft.net<mailto:davem@davemloft.net>;kuznet@ms2.inr.ac.ru
> <m
> > ailto:kuznet@ms2.inr.ac.ru>;yoshfuji@linux-ipv6.org<mailto:yoshfuji@li
> > nux-ipv6.org>;elena.reshetova@intel.com<mailto:elena.reshetova@intel.c
> >
> om>;edumazet@google.com<mailto:edumazet@google.com>;netdev@vger.k
> ernel
> > .org<mailto:netdev@vger.kernel.org>;brouer@redhat.com<mailto:brouer@r
> e
> > dhat.com>
> > 主题: Re: Question about ip_defrag
> > 时间: 2017-08-24 21:53:17
> >
> >
> > On Thu, 24 Aug 2017 13:15:33 +0000 "liujian (CE)" <liujian56@huawei.com>
> wrote:
> > > Hello,
> > >
> > > With below patch we met one issue.
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/c
> > > ommit/?h=v4.13-rc6&id=6d7b857d541e
> > >
> > > the issue:
> > > Ip_defrag fail caused by frag_mem_limit reached 4M(frags.high_thresh).
> > > At this moment,sum_frag_mem_limit is about 10K.
> > > and my test machine's cpu num is 64.
> > >
> > > Can i only change frag_mem_limit to sum_ frag_mem_limit?
> > >
> > >
> > > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > > index 96e95e8..f09c00b 100644
> > > --- a/net/ipv4/inet_fragment.c
> > > +++ b/net/ipv4/inet_fragment.c
> > > @@ -120,7 +120,7 @@ static void inet_frag_secret_rebuild(struct
> > > inet_frags *f)  static bool inet_fragq_should_evict(const struct
> > > inet_frag_queue *q)  {
> > >         return q->net->low_thresh == 0 ||
> > > -              frag_mem_limit(q->net) >= q->net->low_thresh;
> > > +              sum_frag_mem_limit(q->net) >= q->net->low_thresh;
> > >  }
> > >
> > >  static unsigned int
> > > @@ -355,7 +355,7 @@ static struct inet_frag_queue
> > > *inet_frag_alloc(struct netns_frags *nf,  {
> > >         struct inet_frag_queue *q;
> > >
> > > -       if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {
> > > +       if (!nf->high_thresh || sum_frag_mem_limit(nf) >
> > > + nf->high_thresh) {
> > >                 inet_frag_schedule_worker(f);
> > >                 return NULL;
> > >         }
> > > @@ -396,7 +396,7 @@ struct inet_frag_queue *inet_frag_find(struct
> netns_frags *nf,
> > >         struct inet_frag_queue *q;
> > >         int depth = 0;
> > >
> > > -       if (frag_mem_limit(nf) > nf->low_thresh)
> > > +       if (sum_frag_mem_limit(nf) > nf->low_thresh)
> > >                 inet_frag_schedule_worker(f);
> > >
> > >         hash &= (INETFRAGS_HASHSZ - 1);
> > > --
> > >
> > > Thank you for your time.
> >
> > What kernel version have you seen this issue with?
> >
> > As far as I remember, this issue have been fixed before...
> >
> > --
> > Best regards,
> >   Jesper Dangaard Brouer
> >   MSc.CS, Principal Kernel Engineer at Red Hat
> >   LinkedIn: http://www.linkedin.com/in/brouer
> 
> 
> 
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH net 0/2] r8169: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  1:33 UTC (permalink / raw)
  To: netdev
  Cc: davem, nic_swsd, romieu, edumazet, alexander.h.duyck, sgruszka,
	Florian Fainelli

Hi all,

First patch may be questionable but no other driver appears to be doing that
and while it is defendable to account for left packets as dropped during TX
clean, this appears misleadning. I picked Stanislaw changes which brings us
back to 2010, but this was present from pre-git days as well.

Second patch fixes the two missing calls to dev_consume_skb_any().

Florian Fainelli (2):
  r8169: Do not increment tx_dropped in TX ring cleaning
  r8169: Be drop monitor friendly

 drivers/net/ethernet/realtek/r8169.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

-- 
2.9.3

^ permalink raw reply

* [PATCH net 0/2] r8169: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  1:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, nic_swsd, romieu, edumazet, alexander.h.duyck, sgruszka,
	Florian Fainelli

Hi all,

First patch may be questionable but no other driver appears to be doing that
and while it is defendable to account for left packets as dropped during TX
clean, this appears misleadning. I picked Stanislaw changes which brings us
back to 2010, but this was present from pre-git days as well.

Second patch fixes the two missing calls to dev_consume_skb_any().

Florian Fainelli (2):
  r8169: Do not increment tx_dropped in TX ring cleaning
  r8169: Be drop monitor friendly

 drivers/net/ethernet/realtek/r8169.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

-- 
2.9.3

^ permalink raw reply

* [PATCH net 1/2] r8169: Do not increment tx_dropped in TX ring cleaning
From: Florian Fainelli @ 2017-08-25  1:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, nic_swsd, romieu, edumazet, alexander.h.duyck, sgruszka,
	Florian Fainelli
In-Reply-To: <20170825013444.27326-1-f.fainelli@gmail.com>

rtl8169_tx_clear_range() is responsible for cleaning up the TX ring
during interface shutdown, incrementing tx_dropped for every SKB that we
left at the time in the ring is misleading.

Fixes: cac4b22f3d6a ("r8169: do not account fragments as packets")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index bd07a15d3b7c..8a1bbd2a6a20 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6863,7 +6863,6 @@ static void rtl8169_tx_clear_range(struct rtl8169_private *tp, u32 start,
 			rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 					     tp->TxDescArray + entry);
 			if (skb) {
-				tp->dev->stats.tx_dropped++;
 				dev_kfree_skb_any(skb);
 				tx_skb->skb = NULL;
 			}
-- 
2.9.3

^ permalink raw reply related

* [PATCH net 2/2] r8169: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  1:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, nic_swsd, romieu, edumazet, alexander.h.duyck, sgruszka,
	Florian Fainelli
In-Reply-To: <20170825013444.27326-1-f.fainelli@gmail.com>

rtl_tx() is the TX reclamation process whereas rtl8169_tx_clear_range() does
the TX ring cleaning during shutdown, both of these functions should call
dev_consume_skb_any() to be drop monitor friendly.

Fixes: cac4b22f3d6a ("r8169: do not account fragments as packets")
Fixes: eb781397904e ("r8169: Do not use dev_kfree_skb in xmit path")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 8a1bbd2a6a20..e03fcf914690 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6863,7 +6863,7 @@ static void rtl8169_tx_clear_range(struct rtl8169_private *tp, u32 start,
 			rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 					     tp->TxDescArray + entry);
 			if (skb) {
-				dev_kfree_skb_any(skb);
+				dev_consume_skb_any(skb);
 				tx_skb->skb = NULL;
 			}
 		}
@@ -7318,7 +7318,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 			tp->tx_stats.packets++;
 			tp->tx_stats.bytes += tx_skb->skb->len;
 			u64_stats_update_end(&tp->tx_stats.syncp);
-			dev_kfree_skb_any(tx_skb->skb);
+			dev_consume_skb_any(tx_skb->skb);
 			tx_skb->skb = NULL;
 		}
 		dirty_tx++;
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next] e1000e: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  1:36 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, davem, Florian Fainelli, Jeff Kirsher,
	moderated list:INTEL ETHERNET DRIVERS, open list

e1000_put_txbuf() cleans up the successfully transmitted TX packets,
e1000e_tx_hwtstamp_work() also does the successfully completes the
timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
e1000_remove() cleans up the timestampted packets. None of these
functions should be reporting dropped packets, so make them use
dev_consume_skb*() to be drop monitor friendly.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 327dfe5bedc0..91303544975a 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1085,7 +1085,7 @@ static void e1000_put_txbuf(struct e1000_ring *tx_ring,
 		buffer_info->dma = 0;
 	}
 	if (buffer_info->skb) {
-		dev_kfree_skb_any(buffer_info->skb);
+		dev_consume_skb_any(buffer_info->skb);
 		buffer_info->skb = NULL;
 	}
 	buffer_info->time_stamp = 0;
@@ -1199,7 +1199,7 @@ static void e1000e_tx_hwtstamp_work(struct work_struct *work)
 		wmb(); /* force write prior to skb_tstamp_tx */
 
 		skb_tstamp_tx(skb, &shhwtstamps);
-		dev_kfree_skb_any(skb);
+		dev_consume_skb_any(skb);
 	} else if (time_after(jiffies, adapter->tx_hwtstamp_start
 			      + adapter->tx_timeout_factor * HZ)) {
 		dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
@@ -1716,7 +1716,7 @@ static void e1000_clean_rx_ring(struct e1000_ring *rx_ring)
 		}
 
 		if (buffer_info->skb) {
-			dev_kfree_skb(buffer_info->skb);
+			dev_consume_skb(buffer_info->skb);
 			buffer_info->skb = NULL;
 		}
 
@@ -1734,7 +1734,7 @@ static void e1000_clean_rx_ring(struct e1000_ring *rx_ring)
 
 	/* there also may be some cached data from a chained receive */
 	if (rx_ring->rx_skb_top) {
-		dev_kfree_skb(rx_ring->rx_skb_top);
+		dev_consume_skb(rx_ring->rx_skb_top);
 		rx_ring->rx_skb_top = NULL;
 	}
 
@@ -7411,7 +7411,7 @@ static void e1000_remove(struct pci_dev *pdev)
 	if (adapter->flags & FLAG_HAS_HW_TIMESTAMP) {
 		cancel_work_sync(&adapter->tx_hwtstamp_work);
 		if (adapter->tx_hwtstamp_skb) {
-			dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
+			dev_consume_skb_any(adapter->tx_hwtstamp_skb);
 			adapter->tx_hwtstamp_skb = NULL;
 		}
 	}
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net-next] net: mv643xx_eth: Be drop monitor friendly
From: David Miller @ 2017-08-25  1:52 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, edumazet, andrew, sebastian.hesselbarth, linux-kernel
In-Reply-To: <20170824.182756.386299059058749250.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 24 Aug 2017 18:27:56 -0700 (PDT)

> From: Florian Fainelli <f.fainelli@gmail.com>
> Date: Thu, 24 Aug 2017 16:04:25 -0700
> 
>> txq_reclaim() does the normal transmit queue reclamation and
>> rxq_deinit() does the RX ring cleanup, none of these are packet drops,
>> so use dev_consume_skb() for both locations.
>> 
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Applied.

-ENOCOMPILE?

drivers/net/ethernet/marvell/mv643xx_eth.c: In function ‘txq_reclaim’:
drivers/net/ethernet/marvell/mv643xx_eth.c:1124:5: error: implicit declaration of function ‘dev_consume_skb’; did you mean ‘dev_consume_skb_any’? [-Werror=implicit-function-declaration]
     dev_consume_skb(skb);
     ^~~~~~~~~~~~~~~
     dev_consume_skb_any

Reverted.

^ permalink raw reply

* Re: [PATCH] staging: rtlwifi: Improve debugging by using debugfs
From: Andrew Lunn @ 2017-08-25  1:54 UTC (permalink / raw)
  To: Larry Finger
  Cc: gregkh, netdev, devel, Ping-Ke Shih, Yan-Hsuan Chuang,
	Birming Chiu, Shaofu, Steven Ting
In-Reply-To: <20170824212808.26632-1-Larry.Finger@lwfinger.net>

On Thu, Aug 24, 2017 at 04:28:08PM -0500, Larry Finger wrote:
> The changes in this commit are also being sent to the main rtlwifi
> drivers in wireless-next; however, these changes will also be useful for
> any debugging of r8822be before it gets moved into the main tree.
> 
> Use debugfs to dump register and btcoex status, and also write registers
> and h2c.
> 
> We create topdir in /sys/kernel/debug/rtlwifi/, and use the MAC address
> as subdirectory with several entries to dump mac_reg, bb_reg, rf_reg etc.
> An example is
>     /sys/kernel/debug/rtlwifi/00-11-22-33-44-55-66/mac_0
> 
> This change permits examination of device registers in a dynamic manner,
> a feature not available with the current debug mechanism.

Hi Larry

netdev frowns upon debugfs. You should try to keep this altogether,
making it easy to throw away before the driver is moved out of
staging.

You might want to look at ethtool -d. That will be accepted.

    Andrew

^ permalink raw reply

* [PATCH] pktgen: add a new sample script for 40G and above link testing
From: Robert Hoo @ 2017-08-25  2:24 UTC (permalink / raw)
  To: robert.hu; +Cc: Robert Ho

From: Robert Ho <robert.hu@intel.com>

It's hard to benchmark 40G+ network bandwidth using ordinary
tools like iperf, netperf. I then tried with pktgen multiqueue sample
scripts, but still cannot reach line rate.
I then derived this NUMA awared irq affinity sample script from
multi-queue sample one, successfully benchmarked 40G link. I think this can
also be useful for 100G reference, though I haven't got device to test.

This script simply does:
Detect $DEV's NUMA node belonging.
Bind each thread (processor from that NUMA node) with each $DEV queue's
irq affinity, 1:1 mapping.
How many '-t' threads input determines how many queues will be
utilized.

Tested with Intel XL710 NIC with Cisco 3172 switch.

It would be even slightly better if the irqbalance service is turned
off outside.

Referrences:
https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf

Signed-off-by: Robert Hoo <robert.hu@intel.com>
---
 ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +++++++++++++++++++++
 1 file changed, 132 insertions(+)
 create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh

diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
new file mode 100755
index 0000000..f0ee25c
--- /dev/null
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -0,0 +1,132 @@
+#!/bin/bash
+#
+# Multiqueue: Using pktgen threads for sending on multiple CPUs
+#  * adding devices to kernel threads which are in the same NUMA node
+#  * bound devices queue's irq affinity to the threads, 1:1 mapping
+#  * notice the naming scheme for keeping device names unique
+#  * nameing scheme: dev@thread_number
+#  * flow variation via random UDP source port
+#
+basedir=`dirname $0`
+source ${basedir}/functions.sh
+root_check_run_with_sudo "$@"
+#
+# Required param: -i dev in $DEV
+source ${basedir}/parameters.sh
+
+get_iface_node()
+{
+	echo `cat /sys/class/net/$1/device/numa_node`
+}
+
+get_iface_irqs()
+{
+	local IFACE=$1
+	local queues="${IFACE}-.*TxRx"
+
+	irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
+	[ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
+	[ -z "$irqs" ] && irqs=$(for i in `ls -Ux /sys/class/net/$IFACE/device/msi_irqs` ;\
+		do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 -d : ;\
+	    done)
+	[ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"
+
+	echo $irqs
+}
+
+get_node_cpus()
+{
+	local node=$1
+	local node_cpu_list
+	local node_cpu_range_list=`cut -f1- -d, --output-delimiter=" " \
+			/sys/devices/system/node/node$node/cpulist`
+
+	for cpu_range in $node_cpu_range_list
+	do
+		node_cpu_list="$node_cpu_list "`seq -s " " ${cpu_range//-/ }`
+	done
+
+	echo $node_cpu_list
+}
+
+
+# Base Config
+DELAY="0"        # Zero means max speed
+COUNT="20000000"   # Zero means indefinitely
+[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
+
+# Flow variation random source port between min and max
+UDP_MIN=9
+UDP_MAX=109
+
+node=`get_iface_node $DEV`
+irq_array=(`get_iface_irqs $DEV`)
+cpu_array=(`get_node_cpus $node`)
+
+[ $THREADS -gt ${#irq_array[*]} -o $THREADS -gt ${#cpu_array[*]}  ] && \
+	err 1 "Thread number $THREADS exceeds: min (${#irq_array[*]},${#cpu_array[*]})"
+
+# (example of setting default params in your script)
+if [ -z "$DEST_IP" ]; then
+    [ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
+fi
+[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+
+# General cleanup everything since last run
+pg_ctrl "reset"
+
+# Threads are specified with parameter -t value in $THREADS
+for ((i = 0; i < $THREADS; i++)); do
+    # The device name is extended with @name, using thread number to
+    # make then unique, but any name will do.
+    # Set the queue's irq affinity to this $thread (processor)
+    thread=${cpu_array[$i]}
+    dev=${DEV}@${thread}
+    echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list
+    echo "irq ${irq_array[$i]} is set affinity to `cat /proc/irq/${irq_array[$i]}/smp_affinity_list`"
+
+    # Add remove all other devices and add_device $dev to thread
+    pg_thread $thread "rem_device_all"
+    pg_thread $thread "add_device" $dev
+
+    # select queue and bind the queue and $dev in 1:1 relationship
+    queue_num=$i
+    echo "queue number is $queue_num"
+    pg_set $dev "queue_map_min $queue_num"
+    pg_set $dev "queue_map_max $queue_num"
+
+    # Notice config queue to map to cpu (mirrors smp_processor_id())
+    # It is beneficial to map IRQ /proc/irq/*/smp_affinity 1:1 to CPU number
+    pg_set $dev "flag QUEUE_MAP_CPU"
+
+    # Base config of dev
+    pg_set $dev "count $COUNT"
+    pg_set $dev "clone_skb $CLONE_SKB"
+    pg_set $dev "pkt_size $PKT_SIZE"
+    pg_set $dev "delay $DELAY"
+
+    # Flag example disabling timestamping
+    pg_set $dev "flag NO_TIMESTAMP"
+
+    # Destination
+    pg_set $dev "dst_mac $DST_MAC"
+    pg_set $dev "dst$IP6 $DEST_IP"
+
+    # Setup random UDP port src range
+    pg_set $dev "flag UDPSRC_RND"
+    pg_set $dev "udp_src_min $UDP_MIN"
+    pg_set $dev "udp_src_max $UDP_MAX"
+done
+
+# start_run
+echo "Running... ctrl^C to stop" >&2
+pg_ctrl "start"
+echo "Done" >&2
+
+# Print results
+for ((i = 0; i < $THREADS; i++)); do
+    thread=${cpu_array[$i]}
+    dev=${DEV}@${thread}
+    echo "Device: $dev"
+    cat /proc/net/pktgen/$dev | grep -A2 "Result:"
+done
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Chen-Yu Tsai @ 2017-08-25  2:54 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Corentin Labbe, Maxime Ripard, Chen-Yu Tsai, Andrew Lunn,
	Rob Herring, Mark Rutland, Russell King, Giuseppe Cavallaro,
	Alexandre Torgue, devicetree, linux-arm-kernel, linux-kernel,
	netdev
In-Reply-To: <dfc8e7f3-e374-b30b-b320-017f1c50ab58@gmail.com>

On Fri, Aug 25, 2017 at 3:59 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 08/24/2017 01:21 AM, Corentin Labbe wrote:
>> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
>>> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
>>>> Hi Florian,
>>>>
>>>> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
>>>>>>>> So I think what you are saying is either impossible or engineering-wise
>>>>>>>> a very stupid design, like using an external MAC with a discrete PHY
>>>>>>>> connected to the internal MAC's MDIO bus, while using the internal MAC
>>>>>>>> with the internal PHY.
>>>>>>>>
>>>>>>>> Now can we please decide on something? We're a week and a half from
>>>>>>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>>>>>>>> nodes (internal-mdio & external-mdio).
>>>>>>>
>>>>>>> I really don't see a need for a mdio-mux in the first place, just have
>>>>>>> one MDIO controller (current state) sub-node which describes the
>>>>>>> built-in STMMAC MDIO controller and declare the internal PHY as a child
>>>>>>> node (along with 'phy-is-integrated'). If a different configuration is
>>>>>>> used, then just put the external PHY as a child node there.
>>>>>>>
>>>>>>> If fixed-link is required, the mdio node becomes unused anyway.
>>>>>>>
>>>>>>> Works for everyone?
>>>>>>
>>>>>> If we put an external PHY with reg=1 as a child of internal MDIO,
>>>>>> il will be merged with internal PHY node and get
>>>>>> phy-is-integrated.
>>>>>
>>>>> Then have the .dtsi file contain just the mdio node, but no internal or
>>>>> external PHY and push all the internal and external PHY node definition
>>>>> (in its entirety) to the per-board DTS file, does not that work?
>>>>
>>>> If possible, I'd really like to have the internal PHY in the
>>>> DTSI. It's always there in hardware anyway, and duplicating the PHY,
>>>> with its clock, reset line, and whatever info we might need in the
>>>> future in each and every board DTS that uses it will be very error
>>>> prone and we will have the usual bunch of issues that come up with
>>>> duplication.
>>>
>>> OK, then what if you put the internal PHY in the DTSI, mark it with a
>>> status = "disabled" property, and have the per-board DTS put a status =
>>> "okay" property along with a "phy-is-integrated" boolean property? Would
>>> that work?
>>
>> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external PHY (ethernet-phy@1) is still merged.
>
> Is not there is a mistake in the unit address not matching the "reg"
> property, or am I not looking at the right tree?
>
> &mdio {
>         ext_rgmii_phy: ethernet-phy@1 {
>                 compatible = "ethernet-phy-ieee802.3-c22";
>                 reg = <0>;
>         };
> };
>
> If the PHY is really at MDIO address 0, then it should be
> ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
> merging?

That is wrong. The board described in the example likely has a Realtek
RTL8211E @ address 0x1. Address 0 for this PHY is a broadcast address,
so it still works, but is the wrong representation.

>
>
>> So that adding a 'status = "disabled"' does not bring anything.
>>
>>>
>>> What I really don't think is necessary is:
>>>
>>> - duplicating the "mdio" controller node for external vs. internal PHY,
>>> because this is not accurate, there is just one MDIO controller, but
>>> there may be different kinds of MDIO/PHY devices attached
>>
>> For me, if we want to represent the reality, we need two MDIO:
>> - since two PHY at the same address could co-exists
>> - since they are isolated so not on the same MDIO bus
>
> Is that really true? It might be, but from experience with e.g:
> bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
> bus, which is convenient, except when you have an address conflict.

There's a mux in the hardware: either the internal MDIO+MII lines
from the internal PHY are connected to the MAC, or the external
MDIO+MII lines from the pin controller are connected. I believe
this was already mentioned?

>
>>
>>> - having the STMMAC driver MDIO probing code having to deal with a
>>> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
>>> and requiring more driver-level changes that are error prone
>>
>> My patch for stmmac is really small, only the name of my variable ("need_mdio_mux_ids")
>> have to be changed to something like "register_parent_mdio"
>>
>>
>> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid it only by having two separate MDIO nodes.
>> Furthermore, with only one MDIO, we will face with lots of small patch for adding phy-is-integrated, with two we do not need to change any board DT, all is simply clean.
>> Really having two MDIO seems cleaner.
>
> The only valid thing that you have provided so far is this merging
> problem. Anything else ranging from "we will face with lots of small
> patch for adding phy-is-integrated" to "Really having two MDIO seems
> cleaner." are hard to receive as technical arguments for correctness.
>
> What happens if someone connects an external PHY at the same MDIO
> address than the internal PHY, which one do you get responses from? If
> you shutdown the internal PHY and it stops responding, then this
> probably becomes deterministic, but it still supports the fact there is
> just one MDIO bus controller per MAC.

Depends on whichever set of pins/lines are selected. But yeah, there's
only one MDIO bus controller in the MAC.

ChenYu

> Anyway, do whatever works best for you I guess.
> --
> Florian

^ permalink raw reply

* Re: nf_nat_pptp 4.12.3 kernel lockup/reboot
From: Denys Fedoryshchenko @ 2017-08-25  2:58 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Linux Kernel Network Developers
In-Reply-To: <20170724162039.GC23964@breakpoint.cc>

On 2017-07-24 19:20, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> Denys Fedoryshchenko <nuclearcat@nuclearcat.com> wrote:
>> > Hi,
>> >
>> > I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling
>> > approx 2gbps of pppoe users traffic) and noticed that after while server
>> > rebooting(i have set reboot on panic and etc).
>> > I can't run serial console, and in pstore / netconsole there is nothing.
>> > Best i got is some very short message about softlockup in ipmi, but as
>> > storage very limited there - it is near useless.
>> >
>> > By preliminary testing (can't do it much, as it's production) - it seems
>> > following lines causing issue, they worked in 4.11.8 and no more in 4.12.3.
>> 
>> Wild guess here, does this help?
>> 
>> diff --git a/net/netfilter/nf_conntrack_helper.c 
>> b/net/netfilter/nf_conntrack_helper.c
>> --- a/net/netfilter/nf_conntrack_helper.c
>> +++ b/net/netfilter/nf_conntrack_helper.c
>> @@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, 
>> struct nf_conn *tmpl,
>>                 help = nf_ct_helper_ext_add(ct, helper, flags);
>>                 if (help == NULL)
>>                         return -ENOMEM;
>> +              	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
> 
> sigh, stupid typo, should be no ';' at the end above.
Sorry, is there any plans to push this to 4.12 stable queue?

^ permalink raw reply

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Florian Fainelli @ 2017-08-25  3:05 UTC (permalink / raw)
  To: Chen-Yu Tsai
  Cc: Corentin Labbe, Maxime Ripard, Andrew Lunn, Rob Herring,
	Mark Rutland, Russell King, Giuseppe Cavallaro, Alexandre Torgue,
	devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <CAGb2v64jrDPSff+QL_PQBMaQJ+CcTVbskjuOUd1OqR4smbu+ig-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>



On 08/24/2017 07:54 PM, Chen-Yu Tsai wrote:
> On Fri, Aug 25, 2017 at 3:59 AM, Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On 08/24/2017 01:21 AM, Corentin Labbe wrote:
>>> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
>>>> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
>>>>> Hi Florian,
>>>>>
>>>>> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
>>>>>>>>> So I think what you are saying is either impossible or engineering-wise
>>>>>>>>> a very stupid design, like using an external MAC with a discrete PHY
>>>>>>>>> connected to the internal MAC's MDIO bus, while using the internal MAC
>>>>>>>>> with the internal PHY.
>>>>>>>>>
>>>>>>>>> Now can we please decide on something? We're a week and a half from
>>>>>>>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>>>>>>>>> nodes (internal-mdio & external-mdio).
>>>>>>>>
>>>>>>>> I really don't see a need for a mdio-mux in the first place, just have
>>>>>>>> one MDIO controller (current state) sub-node which describes the
>>>>>>>> built-in STMMAC MDIO controller and declare the internal PHY as a child
>>>>>>>> node (along with 'phy-is-integrated'). If a different configuration is
>>>>>>>> used, then just put the external PHY as a child node there.
>>>>>>>>
>>>>>>>> If fixed-link is required, the mdio node becomes unused anyway.
>>>>>>>>
>>>>>>>> Works for everyone?
>>>>>>>
>>>>>>> If we put an external PHY with reg=1 as a child of internal MDIO,
>>>>>>> il will be merged with internal PHY node and get
>>>>>>> phy-is-integrated.
>>>>>>
>>>>>> Then have the .dtsi file contain just the mdio node, but no internal or
>>>>>> external PHY and push all the internal and external PHY node definition
>>>>>> (in its entirety) to the per-board DTS file, does not that work?
>>>>>
>>>>> If possible, I'd really like to have the internal PHY in the
>>>>> DTSI. It's always there in hardware anyway, and duplicating the PHY,
>>>>> with its clock, reset line, and whatever info we might need in the
>>>>> future in each and every board DTS that uses it will be very error
>>>>> prone and we will have the usual bunch of issues that come up with
>>>>> duplication.
>>>>
>>>> OK, then what if you put the internal PHY in the DTSI, mark it with a
>>>> status = "disabled" property, and have the per-board DTS put a status =
>>>> "okay" property along with a "phy-is-integrated" boolean property? Would
>>>> that work?
>>>
>>> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external PHY (ethernet-phy@1) is still merged.
>>
>> Is not there is a mistake in the unit address not matching the "reg"
>> property, or am I not looking at the right tree?
>>
>> &mdio {
>>         ext_rgmii_phy: ethernet-phy@1 {
>>                 compatible = "ethernet-phy-ieee802.3-c22";
>>                 reg = <0>;
>>         };
>> };
>>
>> If the PHY is really at MDIO address 0, then it should be
>> ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
>> merging?
> 
> That is wrong. The board described in the example likely has a Realtek
> RTL8211E @ address 0x1. Address 0 for this PHY is a broadcast address,
> so it still works, but is the wrong representation.
> 
>>
>>
>>> So that adding a 'status = "disabled"' does not bring anything.
>>>
>>>>
>>>> What I really don't think is necessary is:
>>>>
>>>> - duplicating the "mdio" controller node for external vs. internal PHY,
>>>> because this is not accurate, there is just one MDIO controller, but
>>>> there may be different kinds of MDIO/PHY devices attached
>>>
>>> For me, if we want to represent the reality, we need two MDIO:
>>> - since two PHY at the same address could co-exists
>>> - since they are isolated so not on the same MDIO bus
>>
>> Is that really true? It might be, but from experience with e.g:
>> bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
>> bus, which is convenient, except when you have an address conflict.
> 
> There's a mux in the hardware: either the internal MDIO+MII lines
> from the internal PHY are connected to the MAC, or the external
> MDIO+MII lines from the pin controller are connected. I believe
> this was already mentioned?

There is obviously a mux for the data lines and clock to switch between
internal PHY and external PHYs, that does not mean there is one for MDIO
and MDC lines, which is what is being suggested to be used here, does
the mux also takes care of these lines?

> 
>>
>>>
>>>> - having the STMMAC driver MDIO probing code having to deal with a
>>>> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
>>>> and requiring more driver-level changes that are error prone
>>>
>>> My patch for stmmac is really small, only the name of my variable ("need_mdio_mux_ids")
>>> have to be changed to something like "register_parent_mdio"
>>>
>>>
>>> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid it only by having two separate MDIO nodes.
>>> Furthermore, with only one MDIO, we will face with lots of small patch for adding phy-is-integrated, with two we do not need to change any board DT, all is simply clean.
>>> Really having two MDIO seems cleaner.
>>
>> The only valid thing that you have provided so far is this merging
>> problem. Anything else ranging from "we will face with lots of small
>> patch for adding phy-is-integrated" to "Really having two MDIO seems
>> cleaner." are hard to receive as technical arguments for correctness.
>>
>> What happens if someone connects an external PHY at the same MDIO
>> address than the internal PHY, which one do you get responses from? If
>> you shutdown the internal PHY and it stops responding, then this
>> probably becomes deterministic, but it still supports the fact there is
>> just one MDIO bus controller per MAC.
> 
> Depends on whichever set of pins/lines are selected. But yeah, there's
> only one MDIO bus controller in the MAC.

OK, so one MDIO controller, but what about the MDIO/MDC lines then, are
they also muxed, like the data/clock lines or not?
-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] e1000e: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  3:06 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, davem, Jeff Kirsher,
	moderated list:INTEL ETHERNET DRIVERS, open list
In-Reply-To: <20170825013650.27463-1-f.fainelli@gmail.com>



On 08/24/2017 06:36 PM, Florian Fainelli wrote:
> e1000_put_txbuf() cleans up the successfully transmitted TX packets,
> e1000e_tx_hwtstamp_work() also does the successfully completes the
> timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
> e1000_remove() cleans up the timestampted packets. None of these
> functions should be reporting dropped packets, so make them use
> dev_consume_skb*() to be drop monitor friendly.

This also won't compile, I marked it a Superseded in patchwork since
there is a v2 coming.

-- 
Florian

^ permalink raw reply

* RE: [PATCH v2 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
From: Dexuan Cui @ 2017-08-25  3:19 UTC (permalink / raw)
  To: David Miller
  Cc: jhansen@vmware.com, stefanha@redhat.com, netdev@vger.kernel.org,
	mkubecek@suse.cz, joe@perches.com, olaf@aepfle.de,
	Stephen Hemminger, jasowang@redhat.com, KY Srinivasan,
	Haiyang Zhang, dave.scott@docker.com,
	linux-kernel@vger.kernel.org, apw@canonical.com,
	rolf.neugebauer@docker.com, gregkh@linuxfoundation.org,
	marcelo.cerri@canonical.com, "devel@linuxdriverpr
In-Reply-To: <20170824.181931.584865895464286033.davem@davemloft.net>

> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, August 24, 2017 18:20
> > +#define VMBUS_PKT_TRAILER  (sizeof(u64))
>
> This is not the packet trailer, it's the size of the packet trailer.

Thanks! I'll change it to VMBUS_PKT_TRAILER_SIZE.

> > +   /* Have we sent the zero-length packet (FIN)? */
> > +   unsigned long fin_sent;
>
> Why does this need to be atomic?  Why can't a smaller simpler
It doesn't have to be. It was originally made for a quick workaround.
Thanks! I should do it in the right way now.

> mechanism be used to make sure hvs_shutdown() only performs
> hvs_send_data() call once on the channel?
I'll change "fin_sent" to bool, and avoid test_and_set_bit().
I'll add lock_sock/release_sock()  in hvs_shutdown() like this:

static int hvs_shutdown(struct vsock_sock *vsk, int mode)
{
 ...
       lock_sock(sk);

        hvs = vsk->trans;
        if (hvs->fin_sent)
                goto out;

        send_buf = (struct hvs_send_buf *)&hdr;
        (void)hvs_send_data(hvs->chan, send_buf, 0);

        hvs->fin_sent = true;
out:
        release_sock(sk);
        return 0;
}

> > +static inline bool is_valid_srv_id(const uuid_le *id)
> > +{
> > +   return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) -
> 4);
> > +}
>
> Do not use the inline function attribute in *.c code.  Let the
> compiler decide.

OK. Will remove all the inline usages.

> > +   *((u32 *)&h->vm_srv_id) = vsk->local_addr.svm_port;
> > +   *((u32 *)&h->host_srv_id) = vsk->remote_addr.svm_port;
>
> There has to be a better way to express this.
I may need to define a uinon here. Let me try it.

 > And if this is partially initializing vm_srv_id, at a minimum
> endianness needs to be taken into account.
I may need to use cpu_to_le32(). Let me check it.

^ permalink raw reply

* Re: XDP redirect measurements, gotchas and tracepoints
From: Michael Chan @ 2017-08-25  3:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexander Duyck, Duyck, Alexander H, john.fastabend@gmail.com,
	pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, andy@greyhouse.net,
	borkmann@iogearbox.net
In-Reply-To: <20170823102937.79a9c4ed@redhat.com>

On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> On Tue, 22 Aug 2017 23:59:05 -0700
> Michael Chan <michael.chan@broadcom.com> wrote:
>
>> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>> > On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> >>
>> >> Right, but it's conceivable to add an API to "return" the buffer to
>> >> the input device, right?
>
> Yes, I would really like to see an API like this.
>
>> >
>> > You could, it is just added complexity. "just free the buffer" in
>> > ixgbe usually just amounts to one atomic operation to decrement the
>> > total page count since page recycling is already implemented in the
>> > driver. You still would have to unmap the buffer regardless of if you
>> > were recycling it or not so all you would save is 1.000015259 atomic
>> > operations per packet. The fraction is because once every 64K uses we
>> > have to bulk update the count on the page.
>> >
>>
>> If the buffer is returned to the input device, the input device can
>> keep the DMA mapping.  All it needs to do is to dma_sync it back to
>> the input device when the buffer is returned.
>
> Yes, exactly, return to the input device. I really think we should
> work on a solution where we can keep the DMA mapping around.  We have
> an opportunity here to make ndo_xdp_xmit TX queues use a specialized
> page return call, to achieve this. (I imagine other arch's have a high
> DMA overhead than Intel)
>
> I'm not sure how the API should look.  The ixgbe recycle mechanism and
> splitting the page (into two packets) actually complicates things, and
> tie us into a page-refcnt based model.  We could get around this by
> each driver implementing a page-return-callback, that allow us to
> return the page to the input device?  Then, drivers implementing the
> 1-packet-per-page can simply check/read the page-refcnt, and if it is
> "1" DMA-sync and reuse it in the RX queue.
>

Yeah, based on Alex' description, it's not clear to me whether ixgbe
redirecting to a non-intel NIC or vice versa will actually work.  It
sounds like the output device has to make some assumptions about how
the page was allocated by the input device.  With buffer return API,
each driver can cleanly recycle or free its own buffers properly.

Let me discuss this further with Andy to see if we can come up with a
good scheme.

^ permalink raw reply

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Chen-Yu Tsai @ 2017-08-25  3:41 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Chen-Yu Tsai, Corentin Labbe, Maxime Ripard, Andrew Lunn,
	Rob Herring, Mark Rutland, Russell King, Giuseppe Cavallaro,
	Alexandre Torgue, devicetree, linux-arm-kernel, linux-kernel,
	netdev
In-Reply-To: <77ff97c1-9d0a-8b3f-d0b9-25c951f86fec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Fri, Aug 25, 2017 at 11:05 AM, Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>
> On 08/24/2017 07:54 PM, Chen-Yu Tsai wrote:
>> On Fri, Aug 25, 2017 at 3:59 AM, Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> On 08/24/2017 01:21 AM, Corentin Labbe wrote:
>>>> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
>>>>> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
>>>>>> Hi Florian,
>>>>>>
>>>>>> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
>>>>>>>>>> So I think what you are saying is either impossible or engineering-wise
>>>>>>>>>> a very stupid design, like using an external MAC with a discrete PHY
>>>>>>>>>> connected to the internal MAC's MDIO bus, while using the internal MAC
>>>>>>>>>> with the internal PHY.
>>>>>>>>>>
>>>>>>>>>> Now can we please decide on something? We're a week and a half from
>>>>>>>>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>>>>>>>>>> nodes (internal-mdio & external-mdio).
>>>>>>>>>
>>>>>>>>> I really don't see a need for a mdio-mux in the first place, just have
>>>>>>>>> one MDIO controller (current state) sub-node which describes the
>>>>>>>>> built-in STMMAC MDIO controller and declare the internal PHY as a child
>>>>>>>>> node (along with 'phy-is-integrated'). If a different configuration is
>>>>>>>>> used, then just put the external PHY as a child node there.
>>>>>>>>>
>>>>>>>>> If fixed-link is required, the mdio node becomes unused anyway.
>>>>>>>>>
>>>>>>>>> Works for everyone?
>>>>>>>>
>>>>>>>> If we put an external PHY with reg=1 as a child of internal MDIO,
>>>>>>>> il will be merged with internal PHY node and get
>>>>>>>> phy-is-integrated.
>>>>>>>
>>>>>>> Then have the .dtsi file contain just the mdio node, but no internal or
>>>>>>> external PHY and push all the internal and external PHY node definition
>>>>>>> (in its entirety) to the per-board DTS file, does not that work?
>>>>>>
>>>>>> If possible, I'd really like to have the internal PHY in the
>>>>>> DTSI. It's always there in hardware anyway, and duplicating the PHY,
>>>>>> with its clock, reset line, and whatever info we might need in the
>>>>>> future in each and every board DTS that uses it will be very error
>>>>>> prone and we will have the usual bunch of issues that come up with
>>>>>> duplication.
>>>>>
>>>>> OK, then what if you put the internal PHY in the DTSI, mark it with a
>>>>> status = "disabled" property, and have the per-board DTS put a status =
>>>>> "okay" property along with a "phy-is-integrated" boolean property? Would
>>>>> that work?
>>>>
>>>> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external PHY (ethernet-phy@1) is still merged.
>>>
>>> Is not there is a mistake in the unit address not matching the "reg"
>>> property, or am I not looking at the right tree?
>>>
>>> &mdio {
>>>         ext_rgmii_phy: ethernet-phy@1 {
>>>                 compatible = "ethernet-phy-ieee802.3-c22";
>>>                 reg = <0>;
>>>         };
>>> };
>>>
>>> If the PHY is really at MDIO address 0, then it should be
>>> ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
>>> merging?
>>
>> That is wrong. The board described in the example likely has a Realtek
>> RTL8211E @ address 0x1. Address 0 for this PHY is a broadcast address,
>> so it still works, but is the wrong representation.
>>
>>>
>>>
>>>> So that adding a 'status = "disabled"' does not bring anything.
>>>>
>>>>>
>>>>> What I really don't think is necessary is:
>>>>>
>>>>> - duplicating the "mdio" controller node for external vs. internal PHY,
>>>>> because this is not accurate, there is just one MDIO controller, but
>>>>> there may be different kinds of MDIO/PHY devices attached
>>>>
>>>> For me, if we want to represent the reality, we need two MDIO:
>>>> - since two PHY at the same address could co-exists
>>>> - since they are isolated so not on the same MDIO bus
>>>
>>> Is that really true? It might be, but from experience with e.g:
>>> bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
>>> bus, which is convenient, except when you have an address conflict.
>>
>> There's a mux in the hardware: either the internal MDIO+MII lines
>> from the internal PHY are connected to the MAC, or the external
>> MDIO+MII lines from the pin controller are connected. I believe
>> this was already mentioned?
>
> There is obviously a mux for the data lines and clock to switch between
> internal PHY and external PHYs, that does not mean there is one for MDIO
> and MDC lines, which is what is being suggested to be used here, does
> the mux also takes care of these lines?
>
>>
>>>
>>>>
>>>>> - having the STMMAC driver MDIO probing code having to deal with a
>>>>> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
>>>>> and requiring more driver-level changes that are error prone
>>>>
>>>> My patch for stmmac is really small, only the name of my variable ("need_mdio_mux_ids")
>>>> have to be changed to something like "register_parent_mdio"
>>>>
>>>>
>>>> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid it only by having two separate MDIO nodes.
>>>> Furthermore, with only one MDIO, we will face with lots of small patch for adding phy-is-integrated, with two we do not need to change any board DT, all is simply clean.
>>>> Really having two MDIO seems cleaner.
>>>
>>> The only valid thing that you have provided so far is this merging
>>> problem. Anything else ranging from "we will face with lots of small
>>> patch for adding phy-is-integrated" to "Really having two MDIO seems
>>> cleaner." are hard to receive as technical arguments for correctness.
>>>
>>> What happens if someone connects an external PHY at the same MDIO
>>> address than the internal PHY, which one do you get responses from? If
>>> you shutdown the internal PHY and it stops responding, then this
>>> probably becomes deterministic, but it still supports the fact there is
>>> just one MDIO bus controller per MAC.
>>
>> Depends on whichever set of pins/lines are selected. But yeah, there's
>> only one MDIO bus controller in the MAC.
>
> OK, so one MDIO controller, but what about the MDIO/MDC lines then, are
> they also muxed, like the data/clock lines or not?

Just tested. Yes the MDIO/MDC lines are also muxed and controlled through
the same mux bit.

ChenYu
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next v2] net: mv643xx_eth: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  3:55 UTC (permalink / raw)
  To: netdev; +Cc: davem, andrew, Florian Fainelli, Sebastian Hesselbarth, open list

txq_reclaim() does the normal transmit queue reclamation and
rxq_deinit() does the RX ring cleanup, none of these are packet drops,
so use dev_consume_skb() for both locations.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/marvell/mv643xx_eth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index fb2d533ae4ef..81c1fac00d33 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -1121,7 +1121,7 @@ static int txq_reclaim(struct tx_queue *txq, int budget, int force)
 			struct sk_buff *skb = __skb_dequeue(&txq->tx_skb);
 
 			if (!WARN_ON(!skb))
-				dev_kfree_skb(skb);
+				dev_consume_skb_any(skb);
 		}
 
 		if (cmd_sts & ERROR_SUMMARY) {
@@ -2024,7 +2024,7 @@ static void rxq_deinit(struct rx_queue *rxq)
 
 	for (i = 0; i < rxq->rx_ring_size; i++) {
 		if (rxq->rx_skb[i]) {
-			dev_kfree_skb(rxq->rx_skb[i]);
+			dev_consume_skb_any(rxq->rx_skb[i]);
 			rxq->rx_desc_count--;
 		}
 	}
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next v2] e1000e: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  3:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, Florian Fainelli, Jeff Kirsher,
	moderated list:INTEL ETHERNET DRIVERS, open list

e1000_put_txbuf() cleans up the successfully transmitted TX packets,
e1000e_tx_hwtstamp_work() also does the successfully completes the
timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
e1000_remove() cleans up the timestampted packets. None of these
functions should be reporting dropped packets, so make them use
dev_consume_skb_any() to be drop monitor friendly.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 327dfe5bedc0..a90e459c5b8a 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1085,7 +1085,7 @@ static void e1000_put_txbuf(struct e1000_ring *tx_ring,
 		buffer_info->dma = 0;
 	}
 	if (buffer_info->skb) {
-		dev_kfree_skb_any(buffer_info->skb);
+		dev_consume_skb_any(buffer_info->skb);
 		buffer_info->skb = NULL;
 	}
 	buffer_info->time_stamp = 0;
@@ -1199,7 +1199,7 @@ static void e1000e_tx_hwtstamp_work(struct work_struct *work)
 		wmb(); /* force write prior to skb_tstamp_tx */
 
 		skb_tstamp_tx(skb, &shhwtstamps);
-		dev_kfree_skb_any(skb);
+		dev_consume_skb_any(skb);
 	} else if (time_after(jiffies, adapter->tx_hwtstamp_start
 			      + adapter->tx_timeout_factor * HZ)) {
 		dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
@@ -1716,7 +1716,7 @@ static void e1000_clean_rx_ring(struct e1000_ring *rx_ring)
 		}
 
 		if (buffer_info->skb) {
-			dev_kfree_skb(buffer_info->skb);
+			dev_consume_skb_any(buffer_info->skb);
 			buffer_info->skb = NULL;
 		}
 
@@ -1734,7 +1734,7 @@ static void e1000_clean_rx_ring(struct e1000_ring *rx_ring)
 
 	/* there also may be some cached data from a chained receive */
 	if (rx_ring->rx_skb_top) {
-		dev_kfree_skb(rx_ring->rx_skb_top);
+		dev_consume_skb_any(rx_ring->rx_skb_top);
 		rx_ring->rx_skb_top = NULL;
 	}
 
@@ -7411,7 +7411,7 @@ static void e1000_remove(struct pci_dev *pdev)
 	if (adapter->flags & FLAG_HAS_HW_TIMESTAMP) {
 		cancel_work_sync(&adapter->tx_hwtstamp_work);
 		if (adapter->tx_hwtstamp_skb) {
-			dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
+			dev_consume_skb_any(adapter->tx_hwtstamp_skb);
 			adapter->tx_hwtstamp_skb = NULL;
 		}
 	}
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Florian Fainelli @ 2017-08-25  3:59 UTC (permalink / raw)
  To: Chen-Yu Tsai
  Cc: Corentin Labbe, Maxime Ripard, Andrew Lunn, Rob Herring,
	Mark Rutland, Russell King, Giuseppe Cavallaro, Alexandre Torgue,
	devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <CAGb2v64Bx7vzzK8nKbiQPsYkgu5JZ6_0iCkx=mgpWY=8b_GeXQ@mail.gmail.com>



On 08/24/2017 08:41 PM, Chen-Yu Tsai wrote:
> On Fri, Aug 25, 2017 at 11:05 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>>
>> On 08/24/2017 07:54 PM, Chen-Yu Tsai wrote:
>>> On Fri, Aug 25, 2017 at 3:59 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>> On 08/24/2017 01:21 AM, Corentin Labbe wrote:
>>>>> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
>>>>>> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
>>>>>>> Hi Florian,
>>>>>>>
>>>>>>> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
>>>>>>>>>>> So I think what you are saying is either impossible or engineering-wise
>>>>>>>>>>> a very stupid design, like using an external MAC with a discrete PHY
>>>>>>>>>>> connected to the internal MAC's MDIO bus, while using the internal MAC
>>>>>>>>>>> with the internal PHY.
>>>>>>>>>>>
>>>>>>>>>>> Now can we please decide on something? We're a week and a half from
>>>>>>>>>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>>>>>>>>>>> nodes (internal-mdio & external-mdio).
>>>>>>>>>>
>>>>>>>>>> I really don't see a need for a mdio-mux in the first place, just have
>>>>>>>>>> one MDIO controller (current state) sub-node which describes the
>>>>>>>>>> built-in STMMAC MDIO controller and declare the internal PHY as a child
>>>>>>>>>> node (along with 'phy-is-integrated'). If a different configuration is
>>>>>>>>>> used, then just put the external PHY as a child node there.
>>>>>>>>>>
>>>>>>>>>> If fixed-link is required, the mdio node becomes unused anyway.
>>>>>>>>>>
>>>>>>>>>> Works for everyone?
>>>>>>>>>
>>>>>>>>> If we put an external PHY with reg=1 as a child of internal MDIO,
>>>>>>>>> il will be merged with internal PHY node and get
>>>>>>>>> phy-is-integrated.
>>>>>>>>
>>>>>>>> Then have the .dtsi file contain just the mdio node, but no internal or
>>>>>>>> external PHY and push all the internal and external PHY node definition
>>>>>>>> (in its entirety) to the per-board DTS file, does not that work?
>>>>>>>
>>>>>>> If possible, I'd really like to have the internal PHY in the
>>>>>>> DTSI. It's always there in hardware anyway, and duplicating the PHY,
>>>>>>> with its clock, reset line, and whatever info we might need in the
>>>>>>> future in each and every board DTS that uses it will be very error
>>>>>>> prone and we will have the usual bunch of issues that come up with
>>>>>>> duplication.
>>>>>>
>>>>>> OK, then what if you put the internal PHY in the DTSI, mark it with a
>>>>>> status = "disabled" property, and have the per-board DTS put a status =
>>>>>> "okay" property along with a "phy-is-integrated" boolean property? Would
>>>>>> that work?
>>>>>
>>>>> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external PHY (ethernet-phy@1) is still merged.
>>>>
>>>> Is not there is a mistake in the unit address not matching the "reg"
>>>> property, or am I not looking at the right tree?
>>>>
>>>> &mdio {
>>>>         ext_rgmii_phy: ethernet-phy@1 {
>>>>                 compatible = "ethernet-phy-ieee802.3-c22";
>>>>                 reg = <0>;
>>>>         };
>>>> };
>>>>
>>>> If the PHY is really at MDIO address 0, then it should be
>>>> ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
>>>> merging?
>>>
>>> That is wrong. The board described in the example likely has a Realtek
>>> RTL8211E @ address 0x1. Address 0 for this PHY is a broadcast address,
>>> so it still works, but is the wrong representation.
>>>
>>>>
>>>>
>>>>> So that adding a 'status = "disabled"' does not bring anything.
>>>>>
>>>>>>
>>>>>> What I really don't think is necessary is:
>>>>>>
>>>>>> - duplicating the "mdio" controller node for external vs. internal PHY,
>>>>>> because this is not accurate, there is just one MDIO controller, but
>>>>>> there may be different kinds of MDIO/PHY devices attached
>>>>>
>>>>> For me, if we want to represent the reality, we need two MDIO:
>>>>> - since two PHY at the same address could co-exists
>>>>> - since they are isolated so not on the same MDIO bus
>>>>
>>>> Is that really true? It might be, but from experience with e.g:
>>>> bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
>>>> bus, which is convenient, except when you have an address conflict.
>>>
>>> There's a mux in the hardware: either the internal MDIO+MII lines
>>> from the internal PHY are connected to the MAC, or the external
>>> MDIO+MII lines from the pin controller are connected. I believe
>>> this was already mentioned?
>>
>> There is obviously a mux for the data lines and clock to switch between
>> internal PHY and external PHYs, that does not mean there is one for MDIO
>> and MDC lines, which is what is being suggested to be used here, does
>> the mux also takes care of these lines?
>>
>>>
>>>>
>>>>>
>>>>>> - having the STMMAC driver MDIO probing code having to deal with a
>>>>>> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
>>>>>> and requiring more driver-level changes that are error prone
>>>>>
>>>>> My patch for stmmac is really small, only the name of my variable ("need_mdio_mux_ids")
>>>>> have to be changed to something like "register_parent_mdio"
>>>>>
>>>>>
>>>>> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid it only by having two separate MDIO nodes.
>>>>> Furthermore, with only one MDIO, we will face with lots of small patch for adding phy-is-integrated, with two we do not need to change any board DT, all is simply clean.
>>>>> Really having two MDIO seems cleaner.
>>>>
>>>> The only valid thing that you have provided so far is this merging
>>>> problem. Anything else ranging from "we will face with lots of small
>>>> patch for adding phy-is-integrated" to "Really having two MDIO seems
>>>> cleaner." are hard to receive as technical arguments for correctness.
>>>>
>>>> What happens if someone connects an external PHY at the same MDIO
>>>> address than the internal PHY, which one do you get responses from? If
>>>> you shutdown the internal PHY and it stops responding, then this
>>>> probably becomes deterministic, but it still supports the fact there is
>>>> just one MDIO bus controller per MAC.
>>>
>>> Depends on whichever set of pins/lines are selected. But yeah, there's
>>> only one MDIO bus controller in the MAC.
>>
>> OK, so one MDIO controller, but what about the MDIO/MDC lines then, are
>> they also muxed, like the data/clock lines or not?
> 
> Just tested. Yes the MDIO/MDC lines are also muxed and controlled through
> the same mux bit.

Alright then the mdio-mux seems appropriate, thanks.
-- 
Florian

^ permalink raw reply

* [PATCH net] net_sched: fix a refcount_t issue with noop_qdisc
From: Eric Dumazet @ 2017-08-25  4:12 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Cong Wang, Jiri Pirko, Reshetova Elena, Jamal Hadi Salim

From: Eric Dumazet <edumazet@google.com>

syzkaller reported a refcount_t warning [1]

Issue here is that noop_qdisc refcnt was never really considered as
a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set :

if (qdisc->flags & TCQ_F_BUILTIN ||
    !refcount_dec_and_test(&qdisc->refcnt)))
	return;

Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not
really needed, but harmless until refcount_t came.

To fix this problem, we simply need to not increment noop_qdisc.refcnt,
since we never decrement it.

[1]
refcount_t: increment on 0; use-after-free.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 panic+0x1e4/0x417 kernel/panic.c:180
 __warn+0x1c4/0x1d9 kernel/panic.c:541
 report_bug+0x211/0x2d0 lib/bug.c:183
 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
 do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152
RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282
RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000
RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8
RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0
R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000
 attach_default_qdiscs net/sched/sch_generic.c:792 [inline]
 dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833
 __dev_open+0x227/0x330 net/core/dev.c:1380
 __dev_change_flags+0x695/0x990 net/core/dev.c:6726
 dev_change_flags+0x88/0x140 net/core/dev.c:6792
 dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256
 dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554
 sock_do_ioctl+0x94/0xb0 net/socket.c:968
 sock_ioctl+0x2c2/0x440 net/socket.c:1058
 vfs_ioctl fs/ioctl.c:45 [inline]
 do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
 SYSC_ioctl fs/ioctl.c:700 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691

Fixes: 7b9364050246 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Reshetova, Elena <elena.reshetova@intel.com>
---
 include/net/sch_generic.h |    7 +++++++
 net/sched/sch_api.c       |    6 +++---
 net/sched/sch_generic.c   |    2 +-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 67f815e5d52517390226bc3531b1ea7b5f1020bc..c1109cdbbfa6afb9aff0d6033aef7b615630ffc1 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -101,6 +101,13 @@ struct Qdisc {
 	spinlock_t		busylock ____cacheline_aligned_in_smp;
 };
 
+static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN)
+		return;
+	refcount_inc(&qdisc->refcnt);
+}
+
 static inline bool qdisc_is_running(const struct Qdisc *qdisc)
 {
 	return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index a3fa144b864871088209386fd573bded1886432f..4fb5a3222d0d324167f079f755be14eb028b4a50 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -836,7 +836,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 
 			old = dev_graft_qdisc(dev_queue, new);
 			if (new && i > 0)
-				refcount_inc(&new->refcnt);
+				qdisc_refcount_inc(new);
 
 			if (!ingress)
 				qdisc_destroy(old);
@@ -847,7 +847,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 			notify_and_destroy(net, skb, n, classid,
 					   dev->qdisc, new);
 			if (new && !new->ops->attach)
-				refcount_inc(&new->refcnt);
+				qdisc_refcount_inc(new);
 			dev->qdisc = new ? : &noop_qdisc;
 
 			if (new && new->ops->attach)
@@ -1256,7 +1256,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
 				if (q == p ||
 				    (p && check_loop(q, p, 0)))
 					return -ELOOP;
-				refcount_inc(&q->refcnt);
+				qdisc_refcount_inc(q);
 				goto graft;
 			} else {
 				if (!q)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 57ba406f1437323a4ba172d52f14a8b687b86708..4ba6da5fb2546c35ad48fe1f3632df8ca9957b34 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -785,7 +785,7 @@ static void attach_default_qdiscs(struct net_device *dev)
 	    dev->priv_flags & IFF_NO_QUEUE) {
 		netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
 		dev->qdisc = txq->qdisc_sleeping;
-		refcount_inc(&dev->qdisc->refcnt);
+		qdisc_refcount_inc(dev->qdisc);
 	} else {
 		qdisc = qdisc_create_dflt(txq, &mq_qdisc_ops, TC_H_ROOT);
 		if (qdisc) {

^ permalink raw reply related

* Re: [PATCH net-next v2] net: mv643xx_eth: Be drop monitor friendly
From: David Miller @ 2017-08-25  4:29 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, sebastian.hesselbarth, linux-kernel
In-Reply-To: <20170825035540.26713-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 24 Aug 2017 20:55:40 -0700

> txq_reclaim() does the normal transmit queue reclamation and
> rxq_deinit() does the RX ring cleanup, none of these are packet drops,
> so use dev_consume_skb() for both locations.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next v2] e1000e: Be drop monitor friendly
From: David Miller @ 2017-08-25  4:29 UTC (permalink / raw)
  To: f.fainelli
  Cc: netdev, edumazet, jeffrey.t.kirsher, intel-wired-lan,
	linux-kernel
In-Reply-To: <20170825035824.26935-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 24 Aug 2017 20:58:24 -0700

> e1000_put_txbuf() cleans up the successfully transmitted TX packets,
> e1000e_tx_hwtstamp_work() also does the successfully completes the
> timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
> e1000_remove() cleans up the timestampted packets. None of these
> functions should be reporting dropped packets, so make them use
> dev_consume_skb_any() to be drop monitor friendly.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

I'll let the Intel folks pick this up.

^ permalink raw reply

* Re: [PATCH net] net_sched: fix a refcount_t issue with noop_qdisc
From: David Miller @ 2017-08-25  4:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, xiyou.wangcong, jiri, elena.reshetova, jhs
In-Reply-To: <1503634348.2499.98.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 24 Aug 2017 21:12:28 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> syzkaller reported a refcount_t warning [1]
> 
> Issue here is that noop_qdisc refcnt was never really considered as
> a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set :
> 
> if (qdisc->flags & TCQ_F_BUILTIN ||
>     !refcount_dec_and_test(&qdisc->refcnt)))
> 	return;
> 
> Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not
> really needed, but harmless until refcount_t came.
> 
> To fix this problem, we simply need to not increment noop_qdisc.refcnt,
> since we never decrement it.
> 
> [1]
> refcount_t: increment on 0; use-after-free.
 ...
> Fixes: 7b9364050246 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Reshetova, Elena <elena.reshetova@intel.com>

Applied.

^ permalink raw reply

* [PATCH net-next 0/2] nfp: SR-IOV ndos support
From: Jakub Kicinski @ 2017-08-25  4:31 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski

Hi!

This set adds basic SR-IOV including setting/getting VF MAC addresses,
VLANs, link state and spoofcheck settings.  It is wired up for both
vNICs and representors (note: ip link will not report VF settings on
VF/PF representors because they are not linked to the PF PCI device).

Pablo and team add the basic implementation, Simon and Dirk follow
up with the representor plumbing.

Pablo Cascón (1):
  nfp: add basic SR-IOV ndo functions

Simon Horman (1):
  nfp: add basic SR-IOV ndo functions to representors

 drivers/net/ethernet/netronome/nfp/Makefile        |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_main.h      |   4 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |   6 +
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  |  20 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |   6 +
 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.c | 243 +++++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.h |  86 ++++++++
 drivers/net/ethernet/netronome/nfp/nic/main.c      |  12 +
 9 files changed, 378 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.h

-- 
2.11.0

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox