Netdev List
 help / color / mirror / Atom feed
* Re: XDP redirect measurements, gotchas and tracepoints
From: Michael Chan @ 2017-08-25  3:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexander Duyck, Duyck, Alexander H, john.fastabend@gmail.com,
	pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, andy@greyhouse.net,
	borkmann@iogearbox.net
In-Reply-To: <20170823102937.79a9c4ed@redhat.com>

On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> On Tue, 22 Aug 2017 23:59:05 -0700
> Michael Chan <michael.chan@broadcom.com> wrote:
>
>> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>> > On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> >>
>> >> Right, but it's conceivable to add an API to "return" the buffer to
>> >> the input device, right?
>
> Yes, I would really like to see an API like this.
>
>> >
>> > You could, it is just added complexity. "just free the buffer" in
>> > ixgbe usually just amounts to one atomic operation to decrement the
>> > total page count since page recycling is already implemented in the
>> > driver. You still would have to unmap the buffer regardless of if you
>> > were recycling it or not so all you would save is 1.000015259 atomic
>> > operations per packet. The fraction is because once every 64K uses we
>> > have to bulk update the count on the page.
>> >
>>
>> If the buffer is returned to the input device, the input device can
>> keep the DMA mapping.  All it needs to do is to dma_sync it back to
>> the input device when the buffer is returned.
>
> Yes, exactly, return to the input device. I really think we should
> work on a solution where we can keep the DMA mapping around.  We have
> an opportunity here to make ndo_xdp_xmit TX queues use a specialized
> page return call, to achieve this. (I imagine other arch's have a high
> DMA overhead than Intel)
>
> I'm not sure how the API should look.  The ixgbe recycle mechanism and
> splitting the page (into two packets) actually complicates things, and
> tie us into a page-refcnt based model.  We could get around this by
> each driver implementing a page-return-callback, that allow us to
> return the page to the input device?  Then, drivers implementing the
> 1-packet-per-page can simply check/read the page-refcnt, and if it is
> "1" DMA-sync and reuse it in the RX queue.
>

Yeah, based on Alex' description, it's not clear to me whether ixgbe
redirecting to a non-intel NIC or vice versa will actually work.  It
sounds like the output device has to make some assumptions about how
the page was allocated by the input device.  With buffer return API,
each driver can cleanly recycle or free its own buffers properly.

Let me discuss this further with Andy to see if we can come up with a
good scheme.

^ permalink raw reply

* RE: [PATCH v2 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
From: Dexuan Cui @ 2017-08-25  3:19 UTC (permalink / raw)
  To: David Miller
  Cc: jhansen@vmware.com, stefanha@redhat.com, netdev@vger.kernel.org,
	mkubecek@suse.cz, joe@perches.com, olaf@aepfle.de,
	Stephen Hemminger, jasowang@redhat.com, KY Srinivasan,
	Haiyang Zhang, dave.scott@docker.com,
	linux-kernel@vger.kernel.org, apw@canonical.com,
	rolf.neugebauer@docker.com, gregkh@linuxfoundation.org,
	marcelo.cerri@canonical.com, "devel@linuxdriverpr
In-Reply-To: <20170824.181931.584865895464286033.davem@davemloft.net>

> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, August 24, 2017 18:20
> > +#define VMBUS_PKT_TRAILER  (sizeof(u64))
>
> This is not the packet trailer, it's the size of the packet trailer.

Thanks! I'll change it to VMBUS_PKT_TRAILER_SIZE.

> > +   /* Have we sent the zero-length packet (FIN)? */
> > +   unsigned long fin_sent;
>
> Why does this need to be atomic?  Why can't a smaller simpler
It doesn't have to be. It was originally made for a quick workaround.
Thanks! I should do it in the right way now.

> mechanism be used to make sure hvs_shutdown() only performs
> hvs_send_data() call once on the channel?
I'll change "fin_sent" to bool, and avoid test_and_set_bit().
I'll add lock_sock/release_sock()  in hvs_shutdown() like this:

static int hvs_shutdown(struct vsock_sock *vsk, int mode)
{
 ...
       lock_sock(sk);

        hvs = vsk->trans;
        if (hvs->fin_sent)
                goto out;

        send_buf = (struct hvs_send_buf *)&hdr;
        (void)hvs_send_data(hvs->chan, send_buf, 0);

        hvs->fin_sent = true;
out:
        release_sock(sk);
        return 0;
}

> > +static inline bool is_valid_srv_id(const uuid_le *id)
> > +{
> > +   return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) -
> 4);
> > +}
>
> Do not use the inline function attribute in *.c code.  Let the
> compiler decide.

OK. Will remove all the inline usages.

> > +   *((u32 *)&h->vm_srv_id) = vsk->local_addr.svm_port;
> > +   *((u32 *)&h->host_srv_id) = vsk->remote_addr.svm_port;
>
> There has to be a better way to express this.
I may need to define a uinon here. Let me try it.

 > And if this is partially initializing vm_srv_id, at a minimum
> endianness needs to be taken into account.
I may need to use cpu_to_le32(). Let me check it.

^ permalink raw reply

* Re: [PATCH net-next] e1000e: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  3:06 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, davem, Jeff Kirsher,
	moderated list:INTEL ETHERNET DRIVERS, open list
In-Reply-To: <20170825013650.27463-1-f.fainelli@gmail.com>



On 08/24/2017 06:36 PM, Florian Fainelli wrote:
> e1000_put_txbuf() cleans up the successfully transmitted TX packets,
> e1000e_tx_hwtstamp_work() also does the successfully completes the
> timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
> e1000_remove() cleans up the timestampted packets. None of these
> functions should be reporting dropped packets, so make them use
> dev_consume_skb*() to be drop monitor friendly.

This also won't compile, I marked it a Superseded in patchwork since
there is a v2 coming.

-- 
Florian

^ permalink raw reply

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Florian Fainelli @ 2017-08-25  3:05 UTC (permalink / raw)
  To: Chen-Yu Tsai
  Cc: Corentin Labbe, Maxime Ripard, Andrew Lunn, Rob Herring,
	Mark Rutland, Russell King, Giuseppe Cavallaro, Alexandre Torgue,
	devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <CAGb2v64jrDPSff+QL_PQBMaQJ+CcTVbskjuOUd1OqR4smbu+ig-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>



On 08/24/2017 07:54 PM, Chen-Yu Tsai wrote:
> On Fri, Aug 25, 2017 at 3:59 AM, Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On 08/24/2017 01:21 AM, Corentin Labbe wrote:
>>> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
>>>> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
>>>>> Hi Florian,
>>>>>
>>>>> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
>>>>>>>>> So I think what you are saying is either impossible or engineering-wise
>>>>>>>>> a very stupid design, like using an external MAC with a discrete PHY
>>>>>>>>> connected to the internal MAC's MDIO bus, while using the internal MAC
>>>>>>>>> with the internal PHY.
>>>>>>>>>
>>>>>>>>> Now can we please decide on something? We're a week and a half from
>>>>>>>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>>>>>>>>> nodes (internal-mdio & external-mdio).
>>>>>>>>
>>>>>>>> I really don't see a need for a mdio-mux in the first place, just have
>>>>>>>> one MDIO controller (current state) sub-node which describes the
>>>>>>>> built-in STMMAC MDIO controller and declare the internal PHY as a child
>>>>>>>> node (along with 'phy-is-integrated'). If a different configuration is
>>>>>>>> used, then just put the external PHY as a child node there.
>>>>>>>>
>>>>>>>> If fixed-link is required, the mdio node becomes unused anyway.
>>>>>>>>
>>>>>>>> Works for everyone?
>>>>>>>
>>>>>>> If we put an external PHY with reg=1 as a child of internal MDIO,
>>>>>>> il will be merged with internal PHY node and get
>>>>>>> phy-is-integrated.
>>>>>>
>>>>>> Then have the .dtsi file contain just the mdio node, but no internal or
>>>>>> external PHY and push all the internal and external PHY node definition
>>>>>> (in its entirety) to the per-board DTS file, does not that work?
>>>>>
>>>>> If possible, I'd really like to have the internal PHY in the
>>>>> DTSI. It's always there in hardware anyway, and duplicating the PHY,
>>>>> with its clock, reset line, and whatever info we might need in the
>>>>> future in each and every board DTS that uses it will be very error
>>>>> prone and we will have the usual bunch of issues that come up with
>>>>> duplication.
>>>>
>>>> OK, then what if you put the internal PHY in the DTSI, mark it with a
>>>> status = "disabled" property, and have the per-board DTS put a status =
>>>> "okay" property along with a "phy-is-integrated" boolean property? Would
>>>> that work?
>>>
>>> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external PHY (ethernet-phy@1) is still merged.
>>
>> Is not there is a mistake in the unit address not matching the "reg"
>> property, or am I not looking at the right tree?
>>
>> &mdio {
>>         ext_rgmii_phy: ethernet-phy@1 {
>>                 compatible = "ethernet-phy-ieee802.3-c22";
>>                 reg = <0>;
>>         };
>> };
>>
>> If the PHY is really at MDIO address 0, then it should be
>> ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
>> merging?
> 
> That is wrong. The board described in the example likely has a Realtek
> RTL8211E @ address 0x1. Address 0 for this PHY is a broadcast address,
> so it still works, but is the wrong representation.
> 
>>
>>
>>> So that adding a 'status = "disabled"' does not bring anything.
>>>
>>>>
>>>> What I really don't think is necessary is:
>>>>
>>>> - duplicating the "mdio" controller node for external vs. internal PHY,
>>>> because this is not accurate, there is just one MDIO controller, but
>>>> there may be different kinds of MDIO/PHY devices attached
>>>
>>> For me, if we want to represent the reality, we need two MDIO:
>>> - since two PHY at the same address could co-exists
>>> - since they are isolated so not on the same MDIO bus
>>
>> Is that really true? It might be, but from experience with e.g:
>> bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
>> bus, which is convenient, except when you have an address conflict.
> 
> There's a mux in the hardware: either the internal MDIO+MII lines
> from the internal PHY are connected to the MAC, or the external
> MDIO+MII lines from the pin controller are connected. I believe
> this was already mentioned?

There is obviously a mux for the data lines and clock to switch between
internal PHY and external PHYs, that does not mean there is one for MDIO
and MDC lines, which is what is being suggested to be used here, does
the mux also takes care of these lines?

> 
>>
>>>
>>>> - having the STMMAC driver MDIO probing code having to deal with a
>>>> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
>>>> and requiring more driver-level changes that are error prone
>>>
>>> My patch for stmmac is really small, only the name of my variable ("need_mdio_mux_ids")
>>> have to be changed to something like "register_parent_mdio"
>>>
>>>
>>> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid it only by having two separate MDIO nodes.
>>> Furthermore, with only one MDIO, we will face with lots of small patch for adding phy-is-integrated, with two we do not need to change any board DT, all is simply clean.
>>> Really having two MDIO seems cleaner.
>>
>> The only valid thing that you have provided so far is this merging
>> problem. Anything else ranging from "we will face with lots of small
>> patch for adding phy-is-integrated" to "Really having two MDIO seems
>> cleaner." are hard to receive as technical arguments for correctness.
>>
>> What happens if someone connects an external PHY at the same MDIO
>> address than the internal PHY, which one do you get responses from? If
>> you shutdown the internal PHY and it stops responding, then this
>> probably becomes deterministic, but it still supports the fact there is
>> just one MDIO bus controller per MAC.
> 
> Depends on whichever set of pins/lines are selected. But yeah, there's
> only one MDIO bus controller in the MAC.

OK, so one MDIO controller, but what about the MDIO/MDC lines then, are
they also muxed, like the data/clock lines or not?
-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: nf_nat_pptp 4.12.3 kernel lockup/reboot
From: Denys Fedoryshchenko @ 2017-08-25  2:58 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Linux Kernel Network Developers
In-Reply-To: <20170724162039.GC23964@breakpoint.cc>

On 2017-07-24 19:20, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> Denys Fedoryshchenko <nuclearcat@nuclearcat.com> wrote:
>> > Hi,
>> >
>> > I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling
>> > approx 2gbps of pppoe users traffic) and noticed that after while server
>> > rebooting(i have set reboot on panic and etc).
>> > I can't run serial console, and in pstore / netconsole there is nothing.
>> > Best i got is some very short message about softlockup in ipmi, but as
>> > storage very limited there - it is near useless.
>> >
>> > By preliminary testing (can't do it much, as it's production) - it seems
>> > following lines causing issue, they worked in 4.11.8 and no more in 4.12.3.
>> 
>> Wild guess here, does this help?
>> 
>> diff --git a/net/netfilter/nf_conntrack_helper.c 
>> b/net/netfilter/nf_conntrack_helper.c
>> --- a/net/netfilter/nf_conntrack_helper.c
>> +++ b/net/netfilter/nf_conntrack_helper.c
>> @@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, 
>> struct nf_conn *tmpl,
>>                 help = nf_ct_helper_ext_add(ct, helper, flags);
>>                 if (help == NULL)
>>                         return -ENOMEM;
>> +              	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
> 
> sigh, stupid typo, should be no ';' at the end above.
Sorry, is there any plans to push this to 4.12 stable queue?

^ permalink raw reply

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Chen-Yu Tsai @ 2017-08-25  2:54 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Corentin Labbe, Maxime Ripard, Chen-Yu Tsai, Andrew Lunn,
	Rob Herring, Mark Rutland, Russell King, Giuseppe Cavallaro,
	Alexandre Torgue, devicetree, linux-arm-kernel, linux-kernel,
	netdev
In-Reply-To: <dfc8e7f3-e374-b30b-b320-017f1c50ab58@gmail.com>

On Fri, Aug 25, 2017 at 3:59 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 08/24/2017 01:21 AM, Corentin Labbe wrote:
>> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
>>> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
>>>> Hi Florian,
>>>>
>>>> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
>>>>>>>> So I think what you are saying is either impossible or engineering-wise
>>>>>>>> a very stupid design, like using an external MAC with a discrete PHY
>>>>>>>> connected to the internal MAC's MDIO bus, while using the internal MAC
>>>>>>>> with the internal PHY.
>>>>>>>>
>>>>>>>> Now can we please decide on something? We're a week and a half from
>>>>>>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>>>>>>>> nodes (internal-mdio & external-mdio).
>>>>>>>
>>>>>>> I really don't see a need for a mdio-mux in the first place, just have
>>>>>>> one MDIO controller (current state) sub-node which describes the
>>>>>>> built-in STMMAC MDIO controller and declare the internal PHY as a child
>>>>>>> node (along with 'phy-is-integrated'). If a different configuration is
>>>>>>> used, then just put the external PHY as a child node there.
>>>>>>>
>>>>>>> If fixed-link is required, the mdio node becomes unused anyway.
>>>>>>>
>>>>>>> Works for everyone?
>>>>>>
>>>>>> If we put an external PHY with reg=1 as a child of internal MDIO,
>>>>>> il will be merged with internal PHY node and get
>>>>>> phy-is-integrated.
>>>>>
>>>>> Then have the .dtsi file contain just the mdio node, but no internal or
>>>>> external PHY and push all the internal and external PHY node definition
>>>>> (in its entirety) to the per-board DTS file, does not that work?
>>>>
>>>> If possible, I'd really like to have the internal PHY in the
>>>> DTSI. It's always there in hardware anyway, and duplicating the PHY,
>>>> with its clock, reset line, and whatever info we might need in the
>>>> future in each and every board DTS that uses it will be very error
>>>> prone and we will have the usual bunch of issues that come up with
>>>> duplication.
>>>
>>> OK, then what if you put the internal PHY in the DTSI, mark it with a
>>> status = "disabled" property, and have the per-board DTS put a status =
>>> "okay" property along with a "phy-is-integrated" boolean property? Would
>>> that work?
>>
>> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external PHY (ethernet-phy@1) is still merged.
>
> Is not there is a mistake in the unit address not matching the "reg"
> property, or am I not looking at the right tree?
>
> &mdio {
>         ext_rgmii_phy: ethernet-phy@1 {
>                 compatible = "ethernet-phy-ieee802.3-c22";
>                 reg = <0>;
>         };
> };
>
> If the PHY is really at MDIO address 0, then it should be
> ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
> merging?

That is wrong. The board described in the example likely has a Realtek
RTL8211E @ address 0x1. Address 0 for this PHY is a broadcast address,
so it still works, but is the wrong representation.

>
>
>> So that adding a 'status = "disabled"' does not bring anything.
>>
>>>
>>> What I really don't think is necessary is:
>>>
>>> - duplicating the "mdio" controller node for external vs. internal PHY,
>>> because this is not accurate, there is just one MDIO controller, but
>>> there may be different kinds of MDIO/PHY devices attached
>>
>> For me, if we want to represent the reality, we need two MDIO:
>> - since two PHY at the same address could co-exists
>> - since they are isolated so not on the same MDIO bus
>
> Is that really true? It might be, but from experience with e.g:
> bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
> bus, which is convenient, except when you have an address conflict.

There's a mux in the hardware: either the internal MDIO+MII lines
from the internal PHY are connected to the MAC, or the external
MDIO+MII lines from the pin controller are connected. I believe
this was already mentioned?

>
>>
>>> - having the STMMAC driver MDIO probing code having to deal with a
>>> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
>>> and requiring more driver-level changes that are error prone
>>
>> My patch for stmmac is really small, only the name of my variable ("need_mdio_mux_ids")
>> have to be changed to something like "register_parent_mdio"
>>
>>
>> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid it only by having two separate MDIO nodes.
>> Furthermore, with only one MDIO, we will face with lots of small patch for adding phy-is-integrated, with two we do not need to change any board DT, all is simply clean.
>> Really having two MDIO seems cleaner.
>
> The only valid thing that you have provided so far is this merging
> problem. Anything else ranging from "we will face with lots of small
> patch for adding phy-is-integrated" to "Really having two MDIO seems
> cleaner." are hard to receive as technical arguments for correctness.
>
> What happens if someone connects an external PHY at the same MDIO
> address than the internal PHY, which one do you get responses from? If
> you shutdown the internal PHY and it stops responding, then this
> probably becomes deterministic, but it still supports the fact there is
> just one MDIO bus controller per MAC.

Depends on whichever set of pins/lines are selected. But yeah, there's
only one MDIO bus controller in the MAC.

ChenYu

> Anyway, do whatever works best for you I guess.
> --
> Florian

^ permalink raw reply

* [PATCH] pktgen: add a new sample script for 40G and above link testing
From: Robert Hoo @ 2017-08-25  2:24 UTC (permalink / raw)
  To: robert.hu; +Cc: Robert Ho

From: Robert Ho <robert.hu@intel.com>

It's hard to benchmark 40G+ network bandwidth using ordinary
tools like iperf, netperf. I then tried with pktgen multiqueue sample
scripts, but still cannot reach line rate.
I then derived this NUMA awared irq affinity sample script from
multi-queue sample one, successfully benchmarked 40G link. I think this can
also be useful for 100G reference, though I haven't got device to test.

This script simply does:
Detect $DEV's NUMA node belonging.
Bind each thread (processor from that NUMA node) with each $DEV queue's
irq affinity, 1:1 mapping.
How many '-t' threads input determines how many queues will be
utilized.

Tested with Intel XL710 NIC with Cisco 3172 switch.

It would be even slightly better if the irqbalance service is turned
off outside.

Referrences:
https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf

Signed-off-by: Robert Hoo <robert.hu@intel.com>
---
 ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +++++++++++++++++++++
 1 file changed, 132 insertions(+)
 create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh

diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
new file mode 100755
index 0000000..f0ee25c
--- /dev/null
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -0,0 +1,132 @@
+#!/bin/bash
+#
+# Multiqueue: Using pktgen threads for sending on multiple CPUs
+#  * adding devices to kernel threads which are in the same NUMA node
+#  * bound devices queue's irq affinity to the threads, 1:1 mapping
+#  * notice the naming scheme for keeping device names unique
+#  * nameing scheme: dev@thread_number
+#  * flow variation via random UDP source port
+#
+basedir=`dirname $0`
+source ${basedir}/functions.sh
+root_check_run_with_sudo "$@"
+#
+# Required param: -i dev in $DEV
+source ${basedir}/parameters.sh
+
+get_iface_node()
+{
+	echo `cat /sys/class/net/$1/device/numa_node`
+}
+
+get_iface_irqs()
+{
+	local IFACE=$1
+	local queues="${IFACE}-.*TxRx"
+
+	irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
+	[ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
+	[ -z "$irqs" ] && irqs=$(for i in `ls -Ux /sys/class/net/$IFACE/device/msi_irqs` ;\
+		do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 -d : ;\
+	    done)
+	[ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"
+
+	echo $irqs
+}
+
+get_node_cpus()
+{
+	local node=$1
+	local node_cpu_list
+	local node_cpu_range_list=`cut -f1- -d, --output-delimiter=" " \
+			/sys/devices/system/node/node$node/cpulist`
+
+	for cpu_range in $node_cpu_range_list
+	do
+		node_cpu_list="$node_cpu_list "`seq -s " " ${cpu_range//-/ }`
+	done
+
+	echo $node_cpu_list
+}
+
+
+# Base Config
+DELAY="0"        # Zero means max speed
+COUNT="20000000"   # Zero means indefinitely
+[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
+
+# Flow variation random source port between min and max
+UDP_MIN=9
+UDP_MAX=109
+
+node=`get_iface_node $DEV`
+irq_array=(`get_iface_irqs $DEV`)
+cpu_array=(`get_node_cpus $node`)
+
+[ $THREADS -gt ${#irq_array[*]} -o $THREADS -gt ${#cpu_array[*]}  ] && \
+	err 1 "Thread number $THREADS exceeds: min (${#irq_array[*]},${#cpu_array[*]})"
+
+# (example of setting default params in your script)
+if [ -z "$DEST_IP" ]; then
+    [ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
+fi
+[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+
+# General cleanup everything since last run
+pg_ctrl "reset"
+
+# Threads are specified with parameter -t value in $THREADS
+for ((i = 0; i < $THREADS; i++)); do
+    # The device name is extended with @name, using thread number to
+    # make then unique, but any name will do.
+    # Set the queue's irq affinity to this $thread (processor)
+    thread=${cpu_array[$i]}
+    dev=${DEV}@${thread}
+    echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list
+    echo "irq ${irq_array[$i]} is set affinity to `cat /proc/irq/${irq_array[$i]}/smp_affinity_list`"
+
+    # Add remove all other devices and add_device $dev to thread
+    pg_thread $thread "rem_device_all"
+    pg_thread $thread "add_device" $dev
+
+    # select queue and bind the queue and $dev in 1:1 relationship
+    queue_num=$i
+    echo "queue number is $queue_num"
+    pg_set $dev "queue_map_min $queue_num"
+    pg_set $dev "queue_map_max $queue_num"
+
+    # Notice config queue to map to cpu (mirrors smp_processor_id())
+    # It is beneficial to map IRQ /proc/irq/*/smp_affinity 1:1 to CPU number
+    pg_set $dev "flag QUEUE_MAP_CPU"
+
+    # Base config of dev
+    pg_set $dev "count $COUNT"
+    pg_set $dev "clone_skb $CLONE_SKB"
+    pg_set $dev "pkt_size $PKT_SIZE"
+    pg_set $dev "delay $DELAY"
+
+    # Flag example disabling timestamping
+    pg_set $dev "flag NO_TIMESTAMP"
+
+    # Destination
+    pg_set $dev "dst_mac $DST_MAC"
+    pg_set $dev "dst$IP6 $DEST_IP"
+
+    # Setup random UDP port src range
+    pg_set $dev "flag UDPSRC_RND"
+    pg_set $dev "udp_src_min $UDP_MIN"
+    pg_set $dev "udp_src_max $UDP_MAX"
+done
+
+# start_run
+echo "Running... ctrl^C to stop" >&2
+pg_ctrl "start"
+echo "Done" >&2
+
+# Print results
+for ((i = 0; i < $THREADS; i++)); do
+    thread=${cpu_array[$i]}
+    dev=${DEV}@${thread}
+    echo "Device: $dev"
+    cat /proc/net/pktgen/$dev | grep -A2 "Result:"
+done
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] staging: rtlwifi: Improve debugging by using debugfs
From: Andrew Lunn @ 2017-08-25  1:54 UTC (permalink / raw)
  To: Larry Finger
  Cc: gregkh, netdev, devel, Ping-Ke Shih, Yan-Hsuan Chuang,
	Birming Chiu, Shaofu, Steven Ting
In-Reply-To: <20170824212808.26632-1-Larry.Finger@lwfinger.net>

On Thu, Aug 24, 2017 at 04:28:08PM -0500, Larry Finger wrote:
> The changes in this commit are also being sent to the main rtlwifi
> drivers in wireless-next; however, these changes will also be useful for
> any debugging of r8822be before it gets moved into the main tree.
> 
> Use debugfs to dump register and btcoex status, and also write registers
> and h2c.
> 
> We create topdir in /sys/kernel/debug/rtlwifi/, and use the MAC address
> as subdirectory with several entries to dump mac_reg, bb_reg, rf_reg etc.
> An example is
>     /sys/kernel/debug/rtlwifi/00-11-22-33-44-55-66/mac_0
> 
> This change permits examination of device registers in a dynamic manner,
> a feature not available with the current debug mechanism.

Hi Larry

netdev frowns upon debugfs. You should try to keep this altogether,
making it easy to throw away before the driver is moved out of
staging.

You might want to look at ethtool -d. That will be accepted.

    Andrew

^ permalink raw reply

* Re: [PATCH net-next] net: mv643xx_eth: Be drop monitor friendly
From: David Miller @ 2017-08-25  1:52 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, edumazet, andrew, sebastian.hesselbarth, linux-kernel
In-Reply-To: <20170824.182756.386299059058749250.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 24 Aug 2017 18:27:56 -0700 (PDT)

> From: Florian Fainelli <f.fainelli@gmail.com>
> Date: Thu, 24 Aug 2017 16:04:25 -0700
> 
>> txq_reclaim() does the normal transmit queue reclamation and
>> rxq_deinit() does the RX ring cleanup, none of these are packet drops,
>> so use dev_consume_skb() for both locations.
>> 
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Applied.

-ENOCOMPILE?

drivers/net/ethernet/marvell/mv643xx_eth.c: In function ‘txq_reclaim’:
drivers/net/ethernet/marvell/mv643xx_eth.c:1124:5: error: implicit declaration of function ‘dev_consume_skb’; did you mean ‘dev_consume_skb_any’? [-Werror=implicit-function-declaration]
     dev_consume_skb(skb);
     ^~~~~~~~~~~~~~~
     dev_consume_skb_any

Reverted.

^ permalink raw reply

* [PATCH net-next] e1000e: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  1:36 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, davem, Florian Fainelli, Jeff Kirsher,
	moderated list:INTEL ETHERNET DRIVERS, open list

e1000_put_txbuf() cleans up the successfully transmitted TX packets,
e1000e_tx_hwtstamp_work() also does the successfully completes the
timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
e1000_remove() cleans up the timestampted packets. None of these
functions should be reporting dropped packets, so make them use
dev_consume_skb*() to be drop monitor friendly.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 327dfe5bedc0..91303544975a 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1085,7 +1085,7 @@ static void e1000_put_txbuf(struct e1000_ring *tx_ring,
 		buffer_info->dma = 0;
 	}
 	if (buffer_info->skb) {
-		dev_kfree_skb_any(buffer_info->skb);
+		dev_consume_skb_any(buffer_info->skb);
 		buffer_info->skb = NULL;
 	}
 	buffer_info->time_stamp = 0;
@@ -1199,7 +1199,7 @@ static void e1000e_tx_hwtstamp_work(struct work_struct *work)
 		wmb(); /* force write prior to skb_tstamp_tx */
 
 		skb_tstamp_tx(skb, &shhwtstamps);
-		dev_kfree_skb_any(skb);
+		dev_consume_skb_any(skb);
 	} else if (time_after(jiffies, adapter->tx_hwtstamp_start
 			      + adapter->tx_timeout_factor * HZ)) {
 		dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
@@ -1716,7 +1716,7 @@ static void e1000_clean_rx_ring(struct e1000_ring *rx_ring)
 		}
 
 		if (buffer_info->skb) {
-			dev_kfree_skb(buffer_info->skb);
+			dev_consume_skb(buffer_info->skb);
 			buffer_info->skb = NULL;
 		}
 
@@ -1734,7 +1734,7 @@ static void e1000_clean_rx_ring(struct e1000_ring *rx_ring)
 
 	/* there also may be some cached data from a chained receive */
 	if (rx_ring->rx_skb_top) {
-		dev_kfree_skb(rx_ring->rx_skb_top);
+		dev_consume_skb(rx_ring->rx_skb_top);
 		rx_ring->rx_skb_top = NULL;
 	}
 
@@ -7411,7 +7411,7 @@ static void e1000_remove(struct pci_dev *pdev)
 	if (adapter->flags & FLAG_HAS_HW_TIMESTAMP) {
 		cancel_work_sync(&adapter->tx_hwtstamp_work);
 		if (adapter->tx_hwtstamp_skb) {
-			dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
+			dev_consume_skb_any(adapter->tx_hwtstamp_skb);
 			adapter->tx_hwtstamp_skb = NULL;
 		}
 	}
-- 
2.9.3

^ permalink raw reply related

* [PATCH net 2/2] r8169: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  1:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, nic_swsd, romieu, edumazet, alexander.h.duyck, sgruszka,
	Florian Fainelli
In-Reply-To: <20170825013444.27326-1-f.fainelli@gmail.com>

rtl_tx() is the TX reclamation process whereas rtl8169_tx_clear_range() does
the TX ring cleaning during shutdown, both of these functions should call
dev_consume_skb_any() to be drop monitor friendly.

Fixes: cac4b22f3d6a ("r8169: do not account fragments as packets")
Fixes: eb781397904e ("r8169: Do not use dev_kfree_skb in xmit path")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 8a1bbd2a6a20..e03fcf914690 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6863,7 +6863,7 @@ static void rtl8169_tx_clear_range(struct rtl8169_private *tp, u32 start,
 			rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 					     tp->TxDescArray + entry);
 			if (skb) {
-				dev_kfree_skb_any(skb);
+				dev_consume_skb_any(skb);
 				tx_skb->skb = NULL;
 			}
 		}
@@ -7318,7 +7318,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 			tp->tx_stats.packets++;
 			tp->tx_stats.bytes += tx_skb->skb->len;
 			u64_stats_update_end(&tp->tx_stats.syncp);
-			dev_kfree_skb_any(tx_skb->skb);
+			dev_consume_skb_any(tx_skb->skb);
 			tx_skb->skb = NULL;
 		}
 		dirty_tx++;
-- 
2.9.3

^ permalink raw reply related

* [PATCH net 1/2] r8169: Do not increment tx_dropped in TX ring cleaning
From: Florian Fainelli @ 2017-08-25  1:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, nic_swsd, romieu, edumazet, alexander.h.duyck, sgruszka,
	Florian Fainelli
In-Reply-To: <20170825013444.27326-1-f.fainelli@gmail.com>

rtl8169_tx_clear_range() is responsible for cleaning up the TX ring
during interface shutdown, incrementing tx_dropped for every SKB that we
left at the time in the ring is misleading.

Fixes: cac4b22f3d6a ("r8169: do not account fragments as packets")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index bd07a15d3b7c..8a1bbd2a6a20 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6863,7 +6863,6 @@ static void rtl8169_tx_clear_range(struct rtl8169_private *tp, u32 start,
 			rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 					     tp->TxDescArray + entry);
 			if (skb) {
-				tp->dev->stats.tx_dropped++;
 				dev_kfree_skb_any(skb);
 				tx_skb->skb = NULL;
 			}
-- 
2.9.3

^ permalink raw reply related

* [PATCH net 0/2] r8169: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  1:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, nic_swsd, romieu, edumazet, alexander.h.duyck, sgruszka,
	Florian Fainelli

Hi all,

First patch may be questionable but no other driver appears to be doing that
and while it is defendable to account for left packets as dropped during TX
clean, this appears misleadning. I picked Stanislaw changes which brings us
back to 2010, but this was present from pre-git days as well.

Second patch fixes the two missing calls to dev_consume_skb_any().

Florian Fainelli (2):
  r8169: Do not increment tx_dropped in TX ring cleaning
  r8169: Be drop monitor friendly

 drivers/net/ethernet/realtek/r8169.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

-- 
2.9.3

^ permalink raw reply

* [PATCH net 0/2] r8169: Be drop monitor friendly
From: Florian Fainelli @ 2017-08-25  1:33 UTC (permalink / raw)
  To: netdev
  Cc: davem, nic_swsd, romieu, edumazet, alexander.h.duyck, sgruszka,
	Florian Fainelli

Hi all,

First patch may be questionable but no other driver appears to be doing that
and while it is defendable to account for left packets as dropped during TX
clean, this appears misleadning. I picked Stanislaw changes which brings us
back to 2010, but this was present from pre-git days as well.

Second patch fixes the two missing calls to dev_consume_skb_any().

Florian Fainelli (2):
  r8169: Do not increment tx_dropped in TX ring cleaning
  r8169: Be drop monitor friendly

 drivers/net/ethernet/realtek/r8169.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

-- 
2.9.3

^ permalink raw reply

* RE: Question about ip_defrag
From: liujian (CE) @ 2017-08-25  1:33 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	yoshfuji@linux-ipv6.org, elena.reshetova@intel.com,
	edumazet@google.com, netdev@vger.kernel.org, Wangkefeng (Kevin),
	weiyongjun (A)
In-Reply-To: <20170824205926.2c45e3a1@redhat.com>


> -----Original Message-----
> From: Jesper Dangaard Brouer [mailto:brouer@redhat.com]
> Sent: Friday, August 25, 2017 2:59 AM
> To: liujian (CE)
> Cc: davem@davemloft.net; kuznet@ms2.inr.ac.ru; yoshfuji@linux-ipv6.org;
> elena.reshetova@intel.com; edumazet@google.com; netdev@vger.kernel.org;
> brouer@redhat.com
> Subject: Re: Question about ip_defrag
> 
> 
> On Thu, 24 Aug 2017 16:04:41 +0000 "liujian (CE)" <liujian56@huawei.com>
> wrote:
> 
> > >What kernel version have you seen this issue with?
> >
> > 3.10,with some backport.
> >
> >  >As far as I remember, this issue have been fixed before...
> >
> > which one patch? I didnot find out the patch:(
> 
> AFAIK it was some bugs in the percpu_counter code.  If you need to backport
> look at the git commits:
> 
>  git log lib/percpu_counter.c include/linux/percpu_counter.h
> 
> Are you maintaining your own 3.10 kernel?
> 
> I know that for RHEL7 (also kernel 3.10) we backported the percpu_counter
> fixes...
>
Could you tell me which one patch?  we have backported most of the two files's change. 
Thank you ~


> --Jesper
> 
> 
> > 发件人: Jesper Dangaard Brouer
> > 收件人: liujian
> (CE)<liujian56@huawei.com<mailto:liujian56@huawei.com>>
> > 抄送:
> >
> davem@davemloft.net<mailto:davem@davemloft.net>;kuznet@ms2.inr.ac.ru
> <m
> > ailto:kuznet@ms2.inr.ac.ru>;yoshfuji@linux-ipv6.org<mailto:yoshfuji@li
> > nux-ipv6.org>;elena.reshetova@intel.com<mailto:elena.reshetova@intel.c
> >
> om>;edumazet@google.com<mailto:edumazet@google.com>;netdev@vger.k
> ernel
> > .org<mailto:netdev@vger.kernel.org>;brouer@redhat.com<mailto:brouer@r
> e
> > dhat.com>
> > 主题: Re: Question about ip_defrag
> > 时间: 2017-08-24 21:53:17
> >
> >
> > On Thu, 24 Aug 2017 13:15:33 +0000 "liujian (CE)" <liujian56@huawei.com>
> wrote:
> > > Hello,
> > >
> > > With below patch we met one issue.
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/c
> > > ommit/?h=v4.13-rc6&id=6d7b857d541e
> > >
> > > the issue:
> > > Ip_defrag fail caused by frag_mem_limit reached 4M(frags.high_thresh).
> > > At this moment,sum_frag_mem_limit is about 10K.
> > > and my test machine's cpu num is 64.
> > >
> > > Can i only change frag_mem_limit to sum_ frag_mem_limit?
> > >
> > >
> > > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > > index 96e95e8..f09c00b 100644
> > > --- a/net/ipv4/inet_fragment.c
> > > +++ b/net/ipv4/inet_fragment.c
> > > @@ -120,7 +120,7 @@ static void inet_frag_secret_rebuild(struct
> > > inet_frags *f)  static bool inet_fragq_should_evict(const struct
> > > inet_frag_queue *q)  {
> > >         return q->net->low_thresh == 0 ||
> > > -              frag_mem_limit(q->net) >= q->net->low_thresh;
> > > +              sum_frag_mem_limit(q->net) >= q->net->low_thresh;
> > >  }
> > >
> > >  static unsigned int
> > > @@ -355,7 +355,7 @@ static struct inet_frag_queue
> > > *inet_frag_alloc(struct netns_frags *nf,  {
> > >         struct inet_frag_queue *q;
> > >
> > > -       if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {
> > > +       if (!nf->high_thresh || sum_frag_mem_limit(nf) >
> > > + nf->high_thresh) {
> > >                 inet_frag_schedule_worker(f);
> > >                 return NULL;
> > >         }
> > > @@ -396,7 +396,7 @@ struct inet_frag_queue *inet_frag_find(struct
> netns_frags *nf,
> > >         struct inet_frag_queue *q;
> > >         int depth = 0;
> > >
> > > -       if (frag_mem_limit(nf) > nf->low_thresh)
> > > +       if (sum_frag_mem_limit(nf) > nf->low_thresh)
> > >                 inet_frag_schedule_worker(f);
> > >
> > >         hash &= (INETFRAGS_HASHSZ - 1);
> > > --
> > >
> > > Thank you for your time.
> >
> > What kernel version have you seen this issue with?
> >
> > As far as I remember, this issue have been fixed before...
> >
> > --
> > Best regards,
> >   Jesper Dangaard Brouer
> >   MSc.CS, Principal Kernel Engineer at Red Hat
> >   LinkedIn: http://www.linkedin.com/in/brouer
> 
> 
> 
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net-next v2] tg3: Be drop monitor friendly
From: David Miller @ 2017-08-25  1:28 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, edumazet, siva.kallam, prashant, mchan, linux-kernel
In-Reply-To: <20170825004711.19793-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 24 Aug 2017 17:47:11 -0700

> tg3_tx() does the normal packet TX completion,
> tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a
> new SKB that is suitable to workaround HW bugs, and finally
> tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for
> these 3 locations to be SKB drop monitor friendly.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: mv643xx_eth: Be drop monitor friendly
From: David Miller @ 2017-08-25  1:27 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, edumazet, andrew, sebastian.hesselbarth, linux-kernel
In-Reply-To: <20170824230425.15948-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 24 Aug 2017 16:04:25 -0700

> txq_reclaim() does the normal transmit queue reclamation and
> rxq_deinit() does the RX ring cleanup, none of these are packet drops,
> so use dev_consume_skb() for both locations.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] net: systemport: Free DMA coherent descriptors on errors
From: David Miller @ 2017-08-25  1:25 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, edumazet
In-Reply-To: <20170824230113.15283-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 24 Aug 2017 16:01:13 -0700

> In case bcm_sysport_init_tx_ring() is not able to allocate ring->cbs, we
> would return with an error, and call bcm_sysport_fini_tx_ring() and it
> would see that ring->cbs is NULL and do nothing. This would leak the
> coherent DMA descriptor area, so we need to free it on error before
> returning.
> 
> Reported-by: Eric Dumazet <edumazet@gmail.com>
> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] net: bcmgenet: Be drop monitor friendly
From: David Miller @ 2017-08-25  1:24 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, opendmb, edumazet, pgynther, jaedon.shin
In-Reply-To: <20170824225629.27850-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 24 Aug 2017 15:56:29 -0700

> There are 3 spots where we call dev_kfree_skb() but we are actually
> just doing a normal SKB consumption: __bcmgenet_tx_reclaim() for normal
> TX reclamation, bcmgenet_alloc_rx_buffers() during the initial RX ring
> setup and bcmgenet_free_rx_buffers() during RX ring cleanup.
> 
> Fixes: d6707bec5986 ("net: bcmgenet: rewrite bcmgenet_rx_refill()")
> Fixes: f48bed16a756 ("net: bcmgenet: Free skb after last Tx frag")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.

> David, I tagged these two commits, but this can actually go back
> to when the driver was first merged. Let me know if you would want
> me to provide -stable backports against other trees.

Ok, I'll queue these up for -stable.

^ permalink raw reply

* Re: [PATCH net] bpf: fix bpf_setsockopts return value
From: David Miller @ 2017-08-25  1:24 UTC (permalink / raw)
  To: ycheng; +Cc: brakmo, ncardwell, cgallek, netdev
In-Reply-To: <20170824224821.31672-1-ycheng@google.com>

From: Yuchung Cheng <ycheng@google.com>
Date: Thu, 24 Aug 2017 15:48:21 -0700

> This patch fixes a bug causing any sock operations to always return EINVAL.
> 
> Fixes: a5192c52377e ("bpf: fix to bpf_setsockops").
> Reported-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Craig Gallek <kraig@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] net: systemport: Be drop monitor friendly
From: David Miller @ 2017-08-25  1:23 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, edumazet
In-Reply-To: <20170824222041.8377-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 24 Aug 2017 15:20:41 -0700

> Utilize dev_consume_skb_any(cb->skb) in bcm_sysport_free_cb() which is
> used when a TX packet is completed, as well as when the RX ring is
> cleaned on shutdown. None of these two cases are packet drops, so be
> drop monitor friendly.
> 
> Suggested-by: Eric Dumazet <edumazet@gmail.com>
> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use
From: David Miller @ 2017-08-25  1:21 UTC (permalink / raw)
  To: jkbs; +Cc: netdev, hannes, nikolay, tom
In-Reply-To: <20170823075831.27031-1-jkbs@redhat.com>

From: Jakub Sitnicki <jkbs@redhat.com>
Date: Wed, 23 Aug 2017 09:58:27 +0200

> This patch set is another take at making Path MTU Discovery work when
> server nodes are behind a router employing multipath routing in a
> load-balance or anycast setup (that is, when not every end-node can be
> reached by every path). The problem has been well described in RFC 7690
> [1], but in short - in such setups ICMPv6 PTB errors are not guaranteed
> to be routed back to the server node that sent a reply that exceeds path
> MTU.
 ...

Ok, looks not to bad.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v2 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
From: David Miller @ 2017-08-25  1:19 UTC (permalink / raw)
  To: decui
  Cc: jhansen, stefanha, netdev, mkubecek, joe, olaf, sthemmin,
	jasowang, kys, haiyangz, dave.scott, linux-kernel, apw,
	rolf.neugebauer, gregkh, marcelo.cerri, devel, vkuznets,
	georgezhang, dan.carpenter, acking, dtor, grantr, cavery
In-Reply-To: <KL1P15301MB0008D0FB2F8F5B90E7737C3CBF850@KL1P15301MB0008.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com>
Date: Wed, 23 Aug 2017 04:52:14 +0000

> +#define VMBUS_PKT_TRAILER	(sizeof(u64))

This is not the packet trailer, it's the size of the packet trailer.
Please make this macro name match more accurately what it is.

> +	/* Have we sent the zero-length packet (FIN)? */
> +	unsigned long fin_sent;

Why does this need to be atomic?  Why can't a smaller simpler
mechanism be used to make sure hvs_shutdown() only performs
hvs_send_data() call once on the channel?

> +static inline bool is_valid_srv_id(const uuid_le *id)
> +{
> +	return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) - 4);
> +}

Do not use the inline function attribute in *.c code.  Let the
compiler decide.

> +static inline unsigned int get_port_by_srv_id(const uuid_le *svr_id)

Likewise.

> +static inline void hvs_addr_init(struct sockaddr_vm *addr,

Likewise.

> +static inline void hvs_remote_addr_init(struct sockaddr_vm *remote,
> +					struct sockaddr_vm *local)

Likewise.

And so on and so forth, please audit this for your entire patch.

> +	*((u32 *)&h->vm_srv_id) = vsk->local_addr.svm_port;
> +	*((u32 *)&h->host_srv_id) = vsk->remote_addr.svm_port;

There has to be a better way to express this.

And if this is partially initializing vm_srv_id, at a minimum
endianness needs to be taken into account.

^ permalink raw reply

* Re: [net-next:master 1328/1341] drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:77:3: error: 'devlink_dpipe_header_ethernet' undeclared here (not in a function)
From: David Miller @ 2017-08-25  1:11 UTC (permalink / raw)
  To: fengguang.wu; +Cc: arkadis, kbuild-all, netdev, jiri
In-Reply-To: <201708250859.0ROh3FDk%fengguang.wu@intel.com>

From: kbuild test robot <fengguang.wu@intel.com>
Date: Fri, 25 Aug 2017 08:16:01 +0800

> All error/warnings (new ones prefixed by >>):
> 
>>> drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:77:3: error: 'devlink_dpipe_header_ethernet' undeclared here (not in a function)
>      &devlink_dpipe_header_ethernet,
>       ^
>>> drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:78:3: error: 'devlink_dpipe_header_ipv4' undeclared here (not in a function)
>      &devlink_dpipe_header_ipv4,
>       ^
>    drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 'mlxsw_sp_dpipe_erif_table_init':
>    drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:305:9: error: too few arguments to function 'devlink_dpipe_table_register'
>      return devlink_dpipe_table_register(devlink,

Jiri and co., I think you'll need some ifdef'ery to deal with this
properly.

^ permalink raw reply

* Re: [net-next:master 1324/1341] drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:323:9: error: too few arguments to function 'devlink_dpipe_table_register'
From: David Miller @ 2017-08-25  1:11 UTC (permalink / raw)
  To: fengguang.wu; +Cc: arkadis, kbuild-all, netdev, jiri
In-Reply-To: <201708250827.irRihDoq%fengguang.wu@intel.com>

From: kbuild test robot <fengguang.wu@intel.com>
Date: Fri, 25 Aug 2017 08:03:28 +0800

> All errors (new ones prefixed by >>):
> 
>    drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 'mlxsw_sp_dpipe_erif_table_init':
>>> drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:323:9: error: too few arguments to function 'devlink_dpipe_table_register'
>      return devlink_dpipe_table_register(devlink,
>             ^
>    In file included from drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:36:0:
>    include/net/devlink.h:401:1: note: declared here
>     devlink_dpipe_table_register(struct devlink *devlink,
>     ^
>    drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:327:1: warning: control reaches end of non-void function [-Wreturn-type]
>     }
>     ^

I'll push the following fix into net-next for this:

====================
>From 790c6056686cc4dd5b149b330bbd5ae208d4d721 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Thu, 24 Aug 2017 18:10:46 -0700
Subject: [PATCH] devlink: Fix devlink_dpipe_table_register() stub signature.

One too many arguments compared to the non-stub version.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: ffd3cdccf214 ("devlink: Add support for dynamic table size")
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/devlink.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 047a0c54f652..aaf7178127a2 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -402,8 +402,7 @@ static inline int
 devlink_dpipe_table_register(struct devlink *devlink,
 			     const char *table_name,
 			     struct devlink_dpipe_table_ops *table_ops,
-			     void *priv, u64 size,
-			     bool counter_control_extern)
+			     void *priv, bool counter_control_extern)
 {
 	return 0;
 }
-- 
2.13.5

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox