Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 08/14] gtp: Support encpasulating over IPv6
From: Harald Welte @ 2017-09-21  0:04 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Tom Herbert, Linux Kernel Network Developers,
	Pablo Neira Ayuso, Rohit Seth
In-Reply-To: <CALx6S36zz1CFq75S5-=UK8wtuK-pxaQdWU7tVuRGhtb_Y9QkDg@mail.gmail.com>

Hi Tom,

On Wed, Sep 20, 2017 at 01:40:54PM -0700, Tom Herbert wrote:
> On Wed, Sep 20, 2017 at 12:45 PM, David Miller <davem@davemloft.net> wrote:
> > There is a socket associated with the tunnel to do the encapsulation
> > and it has an address family, right?
> 
> If fd's are set from userspace for the sockets then we could derive
> the address family from them. I'll change that. Although, looking at
> now I am wondering why were passing fds into GTP instead of just
> having the kernel create the UDP port like is done for other encaps.

because the userspace process has to take care of those bits of GTP-U
that the kernel doesn't, such as responding to GTP ECHO requests with
GTP echo responses.  Only the "GTP Message type G-PDU" is handled in the
kernel, as only those frames contain user plane.  See table 1 of Section
7.1 of 3GPP TS 29.060.

If you create the socket in the kernel, how would you hand the socket to
the userspace process later on?

IMHO, it feels more natural to simply create it in userspace (like you
would do in the non-kernel-accelerated case) and then simply handle the
G-PDU messages in the kernel while doing the rest in userspace.

But if there's another method that feels more usual to the kernel
community, I'm not against any changes - but given kernel policies, we'd
have to keep userspace compatbility, right?

Regards,
	Harald
-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply

* Re: [PATCH net-next 08/14] gtp: Support encpasulating over IPv6
From: Tom Herbert @ 2017-09-21  0:16 UTC (permalink / raw)
  To: Harald Welte
  Cc: Tom Herbert, David Miller, Linux Kernel Network Developers,
	Pablo Neira Ayuso, Rohit Seth
In-Reply-To: <20170921000437.rg2h6vyxsrbc3bel@nataraja>

On Wed, Sep 20, 2017 at 5:04 PM, Harald Welte <laforge@gnumonks.org> wrote:
> Hi Tom,
>
> On Wed, Sep 20, 2017 at 01:40:54PM -0700, Tom Herbert wrote:
>> On Wed, Sep 20, 2017 at 12:45 PM, David Miller <davem@davemloft.net> wrote:
>> > There is a socket associated with the tunnel to do the encapsulation
>> > and it has an address family, right?
>>
>> If fd's are set from userspace for the sockets then we could derive
>> the address family from them. I'll change that. Although, looking at
>> now I am wondering why were passing fds into GTP instead of just
>> having the kernel create the UDP port like is done for other encaps.
>
> because the userspace process has to take care of those bits of GTP-U
> that the kernel doesn't, such as responding to GTP ECHO requests with
> GTP echo responses.  Only the "GTP Message type G-PDU" is handled in the
> kernel, as only those frames contain user plane.  See table 1 of Section
> 7.1 of 3GPP TS 29.060.
>
Okay, thanks for the explanation.

Tom

^ permalink raw reply

* Re: [v2,1/3] can: m_can: Make hclk optional
From: Franklin S Cooper Jr @ 2017-09-21  0:31 UTC (permalink / raw)
  To: Sekhar Nori, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-can-u79uwXL29TY76Z2rM5mHXA, wg-5Yr1BZd7O62+XT7JhA+gdA,
	mkl-bIcnvbaLZ9MEGnE8C9+IrQ, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	quentin.schulz-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8
  Cc: Kristo, Tero, Tony Lindgren, Linux OMAP List
In-Reply-To: <6f574628-51cd-0db8-e5aa-e7ae4a9cf79a-l0cyMroinI0@public.gmane.org>



On 08/24/2017 03:00 AM, Sekhar Nori wrote:
> + some OMAP folks and Linux OMAP list
> 
> On Tuesday 25 July 2017 04:21 AM, Franklin Cooper wrote:
>> Hclk is the MCAN's interface clock. However, for OMAP based devices such as
>> DRA7 SoC family the interface clock is handled by hwmod. Therefore, this
>> interface clock is managed by hwmod driver via pm_runtime_get and
>> pm_runtime_put calls. Therefore, this interface clock isn't defined in DT
>> and thus the driver shouldn't fail if this clock isn't found.
> 
> I agree that hclk is defined as interface clock for M_CAN IP on DRA76x.
> 
> However, there may be a need for the driver to know the value of hclk to
> properly configure the RAM watchdog register which has a counter
> counting down using hclk. Looks like the driver does not use the RAM
> watchdog today. But if there is a need to configure it in future, it
> might be a problem.

Honestly the RAM watchdog seems like a fundamental design problem.
This RAM watchdog seems to be used in case a request to access the
message ram is made but it hangs for what ever reason. Its even more
complicated since the Message RAM is external to the MCAN IP so its
implementation or how its handled probably differs from device to
device. From example say you do have this error it isn't clear how you
would recover from it. A logically answer would be to reset the entire
IP but that also assumes that Message RAM will be reset along with the
ip which likely depends on each SoC.

But if a readl/writel command hangs will the kernel eventually throw an
error on its on or will the driver just hang? If it does hang can a
driver in the ISR do something to properly terminate the driver or even
recover from it?
> 
> Is there a restriction in OMAP architecture against passing the
> interface clock also in the 'clocks' property in DT. I have not tried it
> myself, but wonder if you hit an issue that led to this patch.

No but not passing the interface clock is typical.
> 
>>
>> Signed-off-by: Franklin S Cooper Jr <fcooper-l0cyMroinI0@public.gmane.org>
>> ---
>> Version 2 changes:
>> Used NULL instead of 0 for unused hclk handle
>>
>>  drivers/net/can/m_can/m_can.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
>> index f4947a7..ea48e59 100644
>> --- a/drivers/net/can/m_can/m_can.c
>> +++ b/drivers/net/can/m_can/m_can.c
>> @@ -1568,8 +1568,13 @@ static int m_can_plat_probe(struct platform_device *pdev)
>>  	hclk = devm_clk_get(&pdev->dev, "hclk");
>>  	cclk = devm_clk_get(&pdev->dev, "cclk");
>>  
>> -	if (IS_ERR(hclk) || IS_ERR(cclk)) {
>> -		dev_err(&pdev->dev, "no clock found\n");
>> +	if (IS_ERR(hclk)) {
>> +		dev_warn(&pdev->dev, "hclk could not be found\n");
>> +		hclk = NULL;
> 
> What is the purpose of NULL setting the clock. I think this is taking it
> into a very implementation defined territory and the result could be
> different on different architectures. See Russell's explanation here:
> https://lkml.org/lkml/2016/11/10/799
> 
> Thanks,
> Sekhar
> 
>> +	}
>> +
>> +	if (IS_ERR(cclk)) {
>> +		dev_err(&pdev->dev, "cclk could not be found\n");
>>  		ret = -ENODEV;
>>  		goto failed_ret;
>>  	}
>>
> 
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [v2,3/3] can: m_can: Add PM Runtime
From: Franklin S Cooper Jr @ 2017-09-21  0:36 UTC (permalink / raw)
  To: Sekhar Nori, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-can-u79uwXL29TY76Z2rM5mHXA, wg-5Yr1BZd7O62+XT7JhA+gdA,
	mkl-bIcnvbaLZ9MEGnE8C9+IrQ, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	quentin.schulz-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8
  Cc: Linux OMAP List, Alexandre Belloni, Wenyou Yang,
	Mario Hüttel
In-Reply-To: <8037f5f9-d51f-99c3-f2e8-62c90e56a6fa-l0cyMroinI0@public.gmane.org>


On 08/24/2017 03:30 AM, Sekhar Nori wrote:
> + OMAP mailing list
> 
> On Tuesday 25 July 2017 04:21 AM, Franklin Cooper wrote:
>> Add support for PM Runtime which is the new way to handle managing clocks.
>> However, to avoid breaking SoCs not using PM_RUNTIME leave the old clk
>> management approach in place.
> 
> Since hclk is the interface clock, I think at a minimum there needs to
> be an assumption that pm_runtime_get_sync() will enable that clock and
> make the module ready for access.
> 
> The only in-kernel user of this driver seems to be an atmel SoC. I am
> ccing folks who added support for M_CAN on that SoC to see if hclk
> management can be left to pm_runtime*() on their SoC.
> 
> If they are okay, I think its a good to atleast drop explicit hclk
> enable disable in the driver. That way, we avoid double enable/disable
> of interface clock (hclk).

Wenyou,

Do you foresee this being a problem on your board? If not I can send a
v3 removing these clk_enable and clk_disable calls and it would be great
if you can test it out and provide your tested by.

> 
>>
>> PM_RUNTIME is required by OMAP based devices to handle clock management.
>> Therefore, this allows future Texas Instruments SoCs that have the MCAN IP
>> to work with this driver.
>>
>> Signed-off-by: Franklin S Cooper Jr <fcooper-l0cyMroinI0@public.gmane.org>
> 
> Thanks,
> Sekhar
> 
>> ---
>>  drivers/net/can/m_can/m_can.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
>> index ea48e59..93e23f5 100644
>> --- a/drivers/net/can/m_can/m_can.c
>> +++ b/drivers/net/can/m_can/m_can.c
>> @@ -23,6 +23,7 @@
>>  #include <linux/of.h>
>>  #include <linux/of_device.h>
>>  #include <linux/platform_device.h>
>> +#include <linux/pm_runtime.h>
>>  #include <linux/iopoll.h>
>>  #include <linux/can/dev.h>
>>  
>> @@ -633,11 +634,15 @@ static int m_can_clk_start(struct m_can_priv *priv)
>>  	if (err)
>>  		clk_disable_unprepare(priv->hclk);
>>  
>> +	pm_runtime_get_sync(priv->device);
>> +
>>  	return err;
>>  }
>>  
>>  static void m_can_clk_stop(struct m_can_priv *priv)
>>  {
>> +	pm_runtime_put_sync(priv->device);
>> +
>>  	clk_disable_unprepare(priv->cclk);
>>  	clk_disable_unprepare(priv->hclk);
>>  }
>> @@ -1582,6 +1587,8 @@ static int m_can_plat_probe(struct platform_device *pdev)
>>  	/* Enable clocks. Necessary to read Core Release in order to determine
>>  	 * M_CAN version
>>  	 */
>> +	pm_runtime_enable(&pdev->dev);
>> +
>>  	ret = clk_prepare_enable(hclk);
>>  	if (ret)
>>  		goto disable_hclk_ret;
>> @@ -1626,6 +1633,8 @@ static int m_can_plat_probe(struct platform_device *pdev)
>>  	 */
>>  	tx_fifo_size = mram_config_vals[7];
>>  
>> +	pm_runtime_get_sync(&pdev->dev);
>> +
>>  	/* allocate the m_can device */
>>  	dev = alloc_m_can_dev(pdev, addr, tx_fifo_size);
>>  	if (!dev) {
>> @@ -1670,6 +1679,7 @@ static int m_can_plat_probe(struct platform_device *pdev)
>>  disable_hclk_ret:
>>  	clk_disable_unprepare(hclk);
>>  failed_ret:
>> +	pm_runtime_put_sync(&pdev->dev);
>>  	return ret;
>>  }
>>  
>> @@ -1726,6 +1736,9 @@ static int m_can_plat_remove(struct platform_device *pdev)
>>  	struct net_device *dev = platform_get_drvdata(pdev);
>>  
>>  	unregister_m_can_dev(dev);
>> +
>> +	pm_runtime_disable(&pdev->dev);
>> +
>>  	platform_set_drvdata(pdev, NULL);
>>  
>>  	free_m_can_dev(dev);
>>
> 
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC PATCH] can: m_can: Support higher speed CAN-FD bitrates
From: Franklin S Cooper Jr @ 2017-09-21  0:48 UTC (permalink / raw)
  To: Mario Hüttel, Yang, Wenyou, Sekhar Nori, wg, mkl, socketcan,
	quentin.schulz, edumazet, linux-can, netdev, linux-kernel
  Cc: Wenyou Yang, Dong Aisheng
In-Reply-To: <e3a7d9a1-d75c-a39e-f122-73fced02a1dc@gmx.net>



On 09/20/2017 04:37 PM, Mario Hüttel wrote:
> 
> 
> On 09/20/2017 10:19 PM, Franklin S Cooper Jr wrote:
>> Hi Wenyou,
>>
>> On 09/17/2017 10:47 PM, Yang, Wenyou wrote:
>>>
>>> On 2017/9/14 13:06, Sekhar Nori wrote:
>>>> On Thursday 14 September 2017 03:28 AM, Franklin S Cooper Jr wrote:
>>>>> On 08/18/2017 02:39 PM, Franklin S Cooper Jr wrote:
>>>>>> During test transmitting using CAN-FD at high bitrates (4 Mbps) only
>>>>>> resulted in errors. Scoping the signals I noticed that only a single
>>>>>> bit
>>>>>> was being transmitted and with a bit more investigation realized the
>>>>>> actual
>>>>>> MCAN IP would go back to initialization mode automatically.
>>>>>>
>>>>>> It appears this issue is due to the MCAN needing to use the Transmitter
>>>>>> Delay Compensation Mode as defined in the MCAN User's Guide. When this
>>>>>> mode is used the User's Guide indicates that the Transmitter Delay
>>>>>> Compensation Offset register should be set. The document mentions
>>>>>> that this
>>>>>> register should be set to (1/dbitrate)/2*(Func Clk Freq).
>>>>>>
>>>>>> Additional CAN-CIA's "Bit Time Requirements for CAN FD" document
>>>>>> indicates
>>>>>> that this TDC mode is only needed for data bit rates above 2.5 Mbps.
>>>>>> Therefore, only enable this mode and only set TDCO when the data bit
>>>>>> rate
>>>>>> is above 2.5 Mbps.
>>>>>>
>>>>>> Signed-off-by: Franklin S Cooper Jr <fcooper@ti.com>
>>>>>> ---
>>>>>> I'm pretty surprised that this hasn't been implemented already since
>>>>>> the primary purpose of CAN-FD is to go beyond 1 Mbps and the MCAN IP
>>>>>> supports up to 10 Mbps.
>>>>>>
>>>>>> So it will be nice to get comments from users of this driver to
>>>>>> understand
>>>>>> if they have been able to use CAN-FD beyond 2.5 Mbps without this
>>>>>> patch.
>>>>>> If they haven't what did they do to get around it if they needed higher
>>>>>> speeds.
>>>>>>
>>>>>> Meanwhile I plan on testing this using a more "realistic" CAN bus to
>>>>>> insure
>>>>>> everything still works at 5 Mbps which is the max speed of my CAN
>>>>>> transceiver.
>>>>> ping. Anyone has any thoughts on this?
>>>> I added Dong who authored the m_can driver and Wenyou who added the only
>>>> in-kernel user of the driver for any help.
>>> I tested it on SAMA5D2 Xplained board both with and without this patch, 
>>> both work with the 4M bps data bit rate.
>> Thank you for testing this out. Its interesting that you have been able
>> to use higher speeds without this patch. What is the CAN transceiver
>> being used on the SAMA5D2 Xplained board? I tried looking at the
>> schematic but it seems the CAN signals are used on an extension board
>> which I can't find the schematic for. Also do you mind sharing your test
>> setup? Were you doing a short point to point test?
>>
>> Thank You,
>> Franklin
> Hello Franklin,
> 
> your patch definitely makes sense.
> 
> I forgot the TDC in my patches because it was not present in the
> previous driver versions and because I didn't encounter any
> problems when testing it myself.
> 
> The error is highly dependent on the hardware (transceiver) setup.
> So it is definitely possible that some people don't encounter errors
> without your patch.

So the Transmission Delay Compensation feature Value register is suppose
to take into consideration the transceiver delay automatically and add
the value of TDCO on top of that. So why would TDCO be dependent on the
transceiver? I've heard conflicting things regarding TDC so any
clarification on what actually impacts it would be appreciated.

Also part of the issue I'm having is how can we properly configure TDCO?
Configuring TDCO is essentially figuring out what Secondary Sample Point
to use. However, it is unclear what value to set SSP to and which use
cases a given SSP will work or doesn't work. I've seen various
recommendations from Bosch on choosing SSP but ultimately it seems they
suggestion "real world testing" to come up with a proper value. Not
setting TDCO causes problems for my device and improperly setting TDCO
causes problems for my device. So its likely any value I use could end
up breaking something for someone else.

Currently I leaning to a DT property that can be used for setting SSP.
Perhaps use a generic default value and allow individuals to override it
via DT?
> 
> Could you clarify what you meant with
>> Scoping the signals I noticed that only a single bit was being transmitted 
> 
> Do you mean one data bit (high bit rate)  or did the core already fail
> in the arbitration phase?

A single bit during the arbitration phase. Essentially the Start of
Frame bit is sent and then nothing else and the IP resets.
> 
> There is also another aspect that can lead to errors:
> 
> If the CAN clock 'cclk' is above the frequency of the interface/logic
> clock 'hclk', the clock domain crossing of the CAN messages can't
> work properly. However, I will throw this topic as an extra e-mail into
> the round.
> 
> Mario
> 
>  
> 

^ permalink raw reply

* Re: [PATCH iproute2] tc: fq: support low_rate_threshold attribute
From: Stephen Hemminger @ 2017-09-21  0:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Neal Cardwell, Soheil Hassas Yeganeh
In-Reply-To: <1504905179.15310.101.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, 08 Sep 2017 14:12:59 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> TCA_FQ_LOW_RATE_THRESHOLD sch_fq attribute was added in linux-4.9
> 
> Tested:
> 
> lpaa5:/tmp# tc -qd add dev eth1 root fq
> lpaa5:/tmp# tc -s qd sh dev eth1
> qdisc fq 8003: root refcnt 5 limit 10000p flow_limit 1000p buckets 4096 \
>  orphan_mask 4095 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3648 \
>  initial_quantum 18240 low_rate_threshold 550Kbit refill_delay 40.0ms
>  Sent 62139 bytes 395 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   116 flows (114 inactive, 0 throttled)
>   1 gc, 0 highprio, 0 throttled
>     
> lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
> 99th Percentile Latency Microseconds
> 7081
>     
> lpaa5:/tmp# tc qd replace dev eth1 root fq low_rate_threshold 10Mbit
> lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
> 99th Percentile Latency Microseconds
> 858
> 
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied

^ permalink raw reply

* Re: [iproute PATCH v2] ipaddress: Fix segfault in 'addr showdump'
From: Stephen Hemminger @ 2017-09-21  0:54 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Hangbin Liu, netdev, Julien Fortin
In-Reply-To: <20170913092034.7002-1-phil@nwl.cc>

On Wed, 13 Sep 2017 11:20:34 +0200
Phil Sutter <phil@nwl.cc> wrote:

> Obviously, 'addr showdump' feature wasn't adjusted to json output
> support. As a consequence, calls to print_string() in print_addrinfo()
> tried to dereference a NULL FILE pointer.
> 
> Cc: Julien Fortin <julien@cumulusnetworks.com>
> Fixes: d0e720111aad2 ("ip: ipaddress.c: add support for json output")
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> --
> Changes since v1:
> Align json output with that of 'ip -j addr show':
> - Interface index label is 'ifindex', not 'index' and it doesn't belong
>   to 'addr_info' array.
> - Create one 'addr_info' array per dumped address, not one for all.
> ---
>  ip/ipaddress.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 

Looks good, applied.

^ permalink raw reply

* Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
From: Tom Herbert @ 2017-09-21  0:55 UTC (permalink / raw)
  To: Harald Welte
  Cc: Andreas Schultz, Tom Herbert, David S. Miller,
	Linux Kernel Network Developers, Pablo Neira Ayuso, Rohit Seth
In-Reply-To: <20170921001313.n2uo56oxmphcvd5g@nataraja>

On Wed, Sep 20, 2017 at 5:13 PM, Harald Welte <laforge@gnumonks.org> wrote:
> Hi Tom,
>
> On Wed, Sep 20, 2017 at 09:24:07AM -0700, Tom Herbert wrote:
>> On Wed, Sep 20, 2017 at 9:07 AM, Andreas Schultz <aschultz@tpip.net> wrote:
>> > GTP isn't special, I just don't like to have testing only features in there
>> > when the same goal can be reached without having to add extra stuff. Adding
>> > code that is not going to be useful in real production setups (or in this
>> > case would even break production setups when enabled accidentally) makes the
>> > implementation more complex than it needs to be.
>>
>> Well, you could make the same argument that allowing GTP to configured
>> as standalone interface is a problem since GTP is only allowed to be
>> with used with GTP-C. But, then we have something in the kernel that
>> the community is expected to support, but requires jumping through a
>> whole bunch of hoops just to run a simple netperf.
>
> "A whole bunch of hoops" without your new interface would consist of
> running a single command-line program that is supplied with libgtpnl.
> This is not a complete 3GPP network, but a simple libmnl-based helper
> library with no other depenencies.
>
You have the point of view of someone who has a lot of experience
dealing with this protocol. Try to imagine if you were some random
kernel network programmer with no experience in the area. If they
happen to find a one-off bug and want to do the right thing by running
a test, you want to make that as easy as possible. From that
perspective, building protocol specific libraries and finding the
right cmd line to run is significant hoops (I can attest to this).
There are other examples in the kernel of systems bigger than GTP that
require a whole lot of effort just to run a simple test; you'll notice
for those it's rare that best developers ever bother to look at them
unless they're making a global change that affects the code. We don't
want GTP to take be like that!

> I'm not neccessarily against introducing features like the 'standalone
> interface configuration'.  However, we must make sure that any
> significant new feature contributions like IPv6 are tested in a
> "realistic setup" and not just using those 'interfaces added for easy
> development'.  Also, I would argue those 'interfaces added for easy
> deveopment/benchmarking' should probably be clearly marked as such to
> avoid raising the impression that this is what leads to a
> standard-conforming / production-type setup.
>
Given the obvious complexity of running a real GTP stack, I don't
think we have to worry about this. In order to test a "realistic
setup" a whole bunch of other support is needed. So the forward
looking question now is how to get to be able to run a "realistic
setup"?

Tom

^ permalink raw reply

* Re: [PATCH iproute2] Add information about COLORFGBG to ip.8 man page
From: Stephen Hemminger @ 2017-09-21  0:56 UTC (permalink / raw)
  To: Roland Hopferwieser; +Cc: netdev
In-Reply-To: <873bbae3-db89-b572-748b-cfa62922f883@ica.jku.at>

On Fri, 15 Sep 2017 22:04:16 +0200
Roland Hopferwieser <rhopfer@ica.jku.at> wrote:

> diff --git a/man/man8/ip.8 b/man/man8/ip.8
> index ae018fdf..2a27a56e 100644
> --- a/man/man8/ip.8
> +++ b/man/man8/ip.8
> @@ -187,7 +187,8 @@ executes specified command over all objects, it 
> depends if command supports this
> 
>   .TP
>   .BR "\-c" , " -color"
> -Use color output.
> +Use color output. The color palette is affected by the COLORFGBG 
> environment variable, which typically has the form "fg;bg".
> +If "bg" is set to 0-6 or 8, the dark color palette is used.
> 
>   .TP
>   .BR "\-t" , " \-timestamp"

Your patch was damaged by the mailer you used.
Please fix and resubmit.

$ patch -p1 < ~/Downloads/iproute2-Add-information-about-COLORFGBG-to-ip.8-man-page.patch 
patching file man/man8/ip.8
patch: **** malformed patch at line 23: depends if command supports this

^ permalink raw reply

* Re: [iproute2 PATCH] tc/mirred: Clean up white-space noise
From: Stephen Hemminger @ 2017-09-21  0:59 UTC (permalink / raw)
  To: Amritha Nambiar; +Cc: netdev, alexander.h.duyck
In-Reply-To: <150529711361.57333.10151730051655064515.stgit@anamdev.jf.intel.com>

On Wed, 13 Sep 2017 03:05:13 -0700
Amritha Nambiar <amritha.nambiar@intel.com> wrote:

> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
> ---
>  include/linux/tc_act/tc_mirred.h |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/tc_act/tc_mirred.h b/include/linux/tc_act/tc_mirred.h
> index 3d7a2b3..69038c2 100644
> --- a/include/linux/tc_act/tc_mirred.h
> +++ b/include/linux/tc_act/tc_mirred.h

All files in include/linux are perodically refreshed from the uapi directory
in the kernel source.

Please re-submit this as a patch to net-next in kernel against include/uapi/tc_act/tc_mirred.h
and then iproute2 will pick it up in next pass.

^ permalink raw reply

* Re: [PATCH iproute2] tc: fix typo in tc-tcindex man page
From: Stephen Hemminger @ 2017-09-21  1:01 UTC (permalink / raw)
  To: Davide Caratti; +Cc: netdev
In-Reply-To: <de07579ec6e41c65103ada6733072874af95773e.1505397960.git.dcaratti@redhat.com>

On Thu, 14 Sep 2017 17:00:46 +0200
Davide Caratti <dcaratti@redhat.com> wrote:

> fix mis-typed 'pass_on' keyword.
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied, thanks Davide

^ permalink raw reply

* Re: [PATCH 1/1] ip: ip_print: if no json obj is initialized default fp is stdout
From: Stephen Hemminger @ 2017-09-21  1:04 UTC (permalink / raw)
  To: Julien Fortin; +Cc: netdev, roopa, nikolay, dsa
In-Reply-To: <20170920221914.92740-1-julien@cumulusnetworks.com>

On Wed, 20 Sep 2017 15:19:14 -0700
Julien Fortin <julien@cumulusnetworks.com> wrote:

> From: Julien Fortin <julien@cumulusnetworks.com>
> 
> The ip monitor didn't call `new_json_obj` (even for in non json context),
> so the static FILE* _fp variable wasn't initialized, thus raising a
> SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
> paths won't have to call `new_json_obj`.
> 
> $ ip -t mon label link
>     fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
>     #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d “%d: “, key=0x4545fc “ifindex”, t=PRINT_ANY) at ip_common.h:189
>     #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
>     #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
>     #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
>         at libnetlink.c:761
> 	#7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
> 	#8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 “mon”, argc=3, argv=0x7fffffffe588) at ip.c:116
> 	#9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
> 
> Traceback can be seen when running the following command in a new terminal.
> $ ip link add dev br0 type bridge
> 
> Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
> ---

The fix looks correct, but the patch has minor style format issue.

ERROR: "foo* bar" should be "foo *bar"
#54: FILE: ip/ip_print.c:26:
+static inline FILE* _get_fp(void)

^ permalink raw reply

* Re: [patch net-next 00/16] mlxsw: Multicast flood update
From: David Miller @ 2017-09-21  1:05 UTC (permalink / raw)
  To: jiri; +Cc: netdev, nogahf, idosch, mlxsw
In-Reply-To: <20170920141516.1402-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 20 Sep 2017 16:15:00 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Nogah says:
> 
> Currently, there are four erroneous flows in MC flood:
> 1. When MC is disabled it affects only the flood table for unregistered
>    MC packets, but packets that match an entry in the MDB are unaffected.
> 2. When MC is disabled, MC packets are being sent to all the ports in the
>    bridge (like BC and link-local MC packets) regardless of the designated
>    flag (BR_MCAST_FLAG).
> 3. When a port is being deleted from a bridge it might remain in the MDB.
> 4. When MC is enabled packets are flooded to the mrouter ports only if
>    they don't match any entry in the MDB, when they should always be
>    flooded to them.
> 
> What these problems have in common is the discrepancy between how the
> hardware handles MDB and mcast flood, and how the driver does it. Each
> of these problems needs fixing either in the MDB code, or in mcast flood
> code, and some in both.
> 
> Patches 1-6 change the way the MDB is handled in the driver to make the
> following changes easier.
> Patches 7-8 fix problem number 1 by removing the MDB from the HW when MC
> is being disabled and restoring it when it is being enabled.
> Patches 9-10 fix problem number 2 by offloading the flood table by the
> appropriate flag.
> Patch 11 fixes problem number 3 by adding MDB flush to the port removal.
> Patches 12-14 fix problem number 4 by adding the mrouter ports to every
> MDB entry in the HW to mimic the wanted behaviour.

Looks good, series applied, thanks!

^ permalink raw reply

* Re: [PATCH iproute2 v2] ip: ipaddress: fix missing space after prefixlen
From: Stephen Hemminger @ 2017-09-21  1:06 UTC (permalink / raw)
  To: Julien Fortin; +Cc: netdev, roopa, nikolay, dsa, sd
In-Reply-To: <20170920202651.95860-1-julien@cumulusnetworks.com>

On Wed, 20 Sep 2017 13:26:51 -0700
Julien Fortin <julien@cumulusnetworks.com> wrote:

> From: Julien Fortin <julien@cumulusnetworks.com>
> 
> Fixes: d0e720111aad2 ("ip: ipaddress.c: add support for json output")
> Reported-by: Sabrina Dubroca <sd@queasysnail.net>
> Reviewed-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>

Applied, thanks for spotting this.

^ permalink raw reply

* Re: [PATCH iproute2/net-next] tc: flower: support for matching MPLS labels
From: Stephen Hemminger @ 2017-09-21  1:09 UTC (permalink / raw)
  To: Simon Horman; +Cc: Jiri Pirko, Jamal Hadi Salim, Cong Wang, netdev, oss-drivers
In-Reply-To: <1505225175-13830-1-git-send-email-simon.horman@netronome.com>

On Tue, 12 Sep 2017 16:06:15 +0200
Simon Horman <simon.horman@netronome.com> wrote:

> From: Benjamin LaHaise <benjamin.lahaise@netronome.com>
> 
> This patch adds support to the iproute2 tc filter command for matching MPLS
> labels in the flower classifier.  The ability to match the Time To Live,
> Bottom Of Stack, Traffic Control and Label fields are added as options to
> the flower filter.
> 
> e.g.:
>   tc filter add dev eth0 protocol 0x8847 parent ffff: \
>     flower mpls_label 1 mpls_tc 2 mpls_ttl 3 mpls_bos 0 \
>     action drop
> 
> Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Looks good, applied

^ permalink raw reply

* Re: Latest net-next from GIT panic
From: Wei Wang @ 2017-09-21  1:09 UTC (permalink / raw)
  To: Paweł Staszewski
  Cc: Cong Wang, Eric Dumazet, Linux Kernel Network Developers,
	Eric Dumazet
In-Reply-To: <CAEA6p_DpSr84cYu3pXqoHV=V=daSsbiOcnzsF+Kt0r4Wv_jmdg@mail.gmail.com>

> Thanks very much Pawel for the feedback.
>
> I was looking into the code (specifically IPv4 part) and found that in
> free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> fnhe_lock. I am wondering if that could cause some race condition on
> fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
> same dst could be happening.
>
> But as we call free_fib_info_rcu() only after the grace period, and
> the lookup code which could potentially modify
> fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
> fine...
>

Hi Pawel,

Could you try the following debug patch on top of net-next branch and
reproduce the issue check if there are warning msg showing?

diff --git a/include/net/dst.h b/include/net/dst.h
index 93568bd0a352..82aff41c6f63 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry
*dst, unsigned long time)
 static inline struct dst_entry *dst_clone(struct dst_entry *dst)
 {
        if (dst)
-               atomic_inc(&dst->__refcnt);
+               dst_hold(dst);
        return dst;
 }

Thanks.
Wei


On Wed, Sep 20, 2017 at 3:09 PM, Wei Wang <weiwan@google.com> wrote:
>>>> bisected again and same result:
>>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>> Author: Wei Wang <weiwan@google.com>
>>>> Date:   Sat Jun 17 10:42:32 2017 -0700
>>>>
>>>>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>>
>>>>     With the previous preparation patches, we are ready to get rid of the
>>>>     dst gc operation in ipv4 code and release dst based on refcnt only.
>>>>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>>> calls
>>>>     to dst_free().
>>>>     At this point, all dst created in ipv4 code do not use the dst gc
>>>>     anymore and will be destroyed at the point when refcnt drops to 0.
>>>>
>>>>     Signed-off-by: Wei Wang <weiwan@google.com>
>>>>     Acked-by: Martin KaFai Lau <kafai@fb.com>
>>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M      net
>>>>
>>>> Will add now version 2 of patch from Eric and we will see
>>>>
>>>>
>>> after adding patch
>>> perf top catch
>>>    PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],
>>> (all, 40 CPUs)
>>>
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>>     60.95%  [kernel]            [k] dev_put.part.6
>>>      4.00%  [kernel]            [k] ixgbe_poll
>>>      3.63%  [kernel]            [k] irq_entries_start
>>>      1.22%  [kernel]            [k] fib_table_lookup
>>>      1.15%  [kernel]            [k] do_raw_spin_lock
>>>      1.05%  [kernel]            [k] ixgbe_xmit_frame_ring
>>>      1.04%  [kernel]            [k] lookup
>>>      0.87%  [kernel]            [k] eth_type_trans
>>>
>>>
>>> no panic on console - rebooting to check logs
>>>
>>>
>> Nothing logged
>>
>
> Thanks very much Pawel for the feedback.
>
> I was looking into the code (specifically IPv4 part) and found that in
> free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> fnhe_lock. I am wondering if that could cause some race condition on
> fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
> same dst could be happening.
>
> But as we call free_fib_info_rcu() only after the grace period, and
> the lookup code which could potentially modify
> fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
> fine...
>
>
> On Wed, Sep 20, 2017 at 2:25 PM, Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>>
>>
>> W dniu 2017-09-20 o 23:24, Paweł Staszewski pisze:
>>
>>>
>>>
>>> W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze:
>>>>
>>>>
>>>>
>>>> W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:
>>>>>
>>>>>
>>>>>
>>>>> W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:
>>>>>>
>>>>>>
>>>>>>
>>>>>> W dniu 2017-09-20 o 20:36, Cong Wang pisze:
>>>>>>>
>>>>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet
>>>>>>> <eric.dumazet@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>>>>>>>>
>>>>>>>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>>>>>>>
>>>>>>>>> This is very odd.
>>>>>>>>>
>>>>>>>>> We only free netdevice in free_netdev() and it is only called when
>>>>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>>>>>>>> to be NULL.
>>>>>>>>
>>>>>>>> If there is a missing dev_hold() or one dev_put() in excess,
>>>>>>>> this would allow the netdev to be freed too soon.
>>>>>>>>
>>>>>>>> -> Use after free.
>>>>>>>> memory holding netdev could be reallocated-cleared by some other
>>>>>>>> kernel
>>>>>>>> user.
>>>>>>>>
>>>>>>> Sure, but only unregister could trigger a free. If there is no
>>>>>>> unregister,
>>>>>>> like what Pawel claims, then there is no free, the refcnt just goes to
>>>>>>> 0 but the memory is still there.
>>>>>>>
>>>>>> About possible mistake from my side with bisect - i can judge too early
>>>>>> that some bisect was good
>>>>>> the road was:
>>>>>> git bisect start
>>>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>>>>>> 'pinctrl-v4.13-1' of
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>>>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next'
>>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>>>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>>>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using
>>>>>> stack larger than 1024.
>>>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>>>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
>>>>>> 'udp-reduce-cache-pressure'
>>>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>>>>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch
>>>>>> 's390-net-updates-part-2'
>>>>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
>>>>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch
>>>>>> 'bpf-ctx-narrow'
>>>>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
>>>>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove
>>>>>> cp_outgoing
>>>>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
>>>>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add
>>>>>> TCP_MD5SIG_EXT socket option to set a key address prefix
>>>>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
>>>>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new
>>>>>> function dst_dev_put()
>>>>>>
>>>>>> And currently have this running for about 4 hours without problems.
>>>>>>
>>>>>>
>>>>>>
>>>>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
>>>>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove
>>>>>> DST_NOCACHE flag
>>>>>>
>>>>>> Here for sure - panic
>>>>>>
>>>>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
>>>>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
>>>>>> dst_hold_safe() properly
>>>>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
>>>>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call
>>>>>> dst_hold_safe() properly
>>>>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
>>>>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take
>>>>>> dst->__refcnt for insertion into fib6 tree
>>>>>>
>>>>>> im not 100% sure tor last two
>>>>>> Will test them again starting from
>>>>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put()
>>>>>> properly
>>>>>>
>>>>>>
>>>>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
>>>>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC
>>>>>> and remove the operation of dst_free()
>>>>>>
>>>>>>
>>>>>>
>>>>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4:
>>>>>> mark DST_NOGC and remove the operation of dst_free()
>>>>>>
>>>>>>
>>>>>>
>>>>> What i can say more
>>>>> I can reproduce this on any server with similar configuration
>>>>> the difference can be teamd instead of bonding
>>>>> ixgbe or i40e and mlx5
>>>>> Same problems
>>>>>
>>>>> vlans - more or less prefixes learned from bgp -> zebra -> netlink ->
>>>>> kernel
>>>>> But normally in lab when using only plain routing no bgpd and about 128
>>>>> vlans - with 128 routes - cant reproduce this - this apperas only with bgp -
>>>>> minimum where i can reproduce this was about 130k prefixes with about 286
>>>>> nexthops
>>>>>
>>>>>
>>>>>
>>>>>
>>>> bisected again and same result:
>>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>> Author: Wei Wang <weiwan@google.com>
>>>> Date:   Sat Jun 17 10:42:32 2017 -0700
>>>>
>>>>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>>
>>>>     With the previous preparation patches, we are ready to get rid of the
>>>>     dst gc operation in ipv4 code and release dst based on refcnt only.
>>>>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>>> calls
>>>>     to dst_free().
>>>>     At this point, all dst created in ipv4 code do not use the dst gc
>>>>     anymore and will be destroyed at the point when refcnt drops to 0.
>>>>
>>>>     Signed-off-by: Wei Wang <weiwan@google.com>
>>>>     Acked-by: Martin KaFai Lau <kafai@fb.com>
>>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M      net
>>>>
>>>> Will add now version 2 of patch from Eric and we will see
>>>>
>>>>
>>> after adding patch
>>> perf top catch
>>>    PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],
>>> (all, 40 CPUs)
>>>
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>>     60.95%  [kernel]            [k] dev_put.part.6
>>>      4.00%  [kernel]            [k] ixgbe_poll
>>>      3.63%  [kernel]            [k] irq_entries_start
>>>      1.22%  [kernel]            [k] fib_table_lookup
>>>      1.15%  [kernel]            [k] do_raw_spin_lock
>>>      1.05%  [kernel]            [k] ixgbe_xmit_frame_ring
>>>      1.04%  [kernel]            [k] lookup
>>>      0.87%  [kernel]            [k] eth_type_trans
>>>
>>>
>>> no panic on console - rebooting to check logs
>>>
>>>
>> Nothing logged
>>

^ permalink raw reply related

* [PATCH iproute2 v2] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-21  1:16 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nikolay, dsa, Julien Fortin

From: Julien Fortin <julien@cumulusnetworks.com>

The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.

$ ip -t mon label link
fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
at libnetlink.c:761

Traceback can be seen when running the following command in a new terminal.
$ ip link add dev br0 type bridge

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
 ip/ip_print.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/ip/ip_print.c b/ip/ip_print.c
index 4cd6a0bc..7eb4c6da 100644
--- a/ip/ip_print.c
+++ b/ip/ip_print.c
@@ -23,6 +23,11 @@ static FILE *_fp;
 #define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
 #define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
 
+static inline FILE *_get_fp(void)
+{
+	return _fp ? _fp : stdout;
+};
+
 void new_json_obj(int json, FILE *fp)
 {
 	if (json) {
@@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
 			jsonw_name(_jw, str);
 		jsonw_start_array(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
 		jsonw_end_array(_jw);
 		jsonw_pretty(_jw, true);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
 			else						\
 				jsonw_##type_name##_field(_jw, key, value); \
 		} else if (_IS_FP_CONTEXT(t)) {				\
-			color_fprintf(_fp, color, fmt, value);          \
+			color_fprintf(_get_fp(), color, fmt, value);	\
 		}							\
 	}
 _PRINT_FUNC(int, int);
@@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
 		else
 			jsonw_string_field(_jw, key, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
 
@@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
 		else
 			jsonw_bool(_jw, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value ? "true" : "false");
+		color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
 	}
 }
 
@@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
 		snprintf(b1, sizeof(b1), "%#x", hex);
 		print_string(PRINT_JSON, key, NULL, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
 		else
 			jsonw_string(_jw, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
 		else
 			jsonw_null(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
-- 
2.14.1

^ permalink raw reply related

* Re: Latest net-next from GIT panic
From: Eric Dumazet @ 2017-09-21  1:17 UTC (permalink / raw)
  To: Wei Wang
  Cc: Paweł Staszewski, Cong Wang, Linux Kernel Network Developers,
	Eric Dumazet
In-Reply-To: <CAEA6p_AiLvZxsyu+iOCrPzW-H28N=HBnWycy8w6gdZHFk2Q1dQ@mail.gmail.com>

On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
> > Thanks very much Pawel for the feedback.
> >
> > I was looking into the code (specifically IPv4 part) and found that in
> > free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> > fnhe_lock. I am wondering if that could cause some race condition on
> > fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
> > same dst could be happening.
> >
> > But as we call free_fib_info_rcu() only after the grace period, and
> > the lookup code which could potentially modify
> > fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
> > fine...
> >
> 
> Hi Pawel,
> 
> Could you try the following debug patch on top of net-next branch and
> reproduce the issue check if there are warning msg showing?
> 
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 93568bd0a352..82aff41c6f63 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry
> *dst, unsigned long time)
>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>  {
>         if (dst)
> -               atomic_inc(&dst->__refcnt);
> +               dst_hold(dst);
>         return dst;
>  }
> 
> Thanks.
> Wei
> 


Yes, we believe skb_dst_force() and skb_dst_force_safe() should be
unified  (to the 'safe' version)

We no longer have gc to protect from 0 -> 1 transition of dst refcount.

^ permalink raw reply

* Re: [PATCH iproute2 v2] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-21  1:22 UTC (permalink / raw)
  To: netdev; +Cc: Roopa Prabhu, Nikolay Aleksandrov, David Ahern, Julien Fortin
In-Reply-To: <20170921011645.45555-1-julien@cumulusnetworks.com>

looks like git-mail ate all the traceback lines starting with #, will
re-submit v3

On Wed, Sep 20, 2017 at 6:16 PM, Julien Fortin
<julien@cumulusnetworks.com> wrote:
> From: Julien Fortin <julien@cumulusnetworks.com>
>
> The ip monitor didn't call `new_json_obj` (even for in non json context),
> so the static FILE* _fp variable wasn't initialized, thus raising a
> SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
> paths won't have to call `new_json_obj`.
>
> $ ip -t mon label link
> fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
> at libnetlink.c:761
>
> Traceback can be seen when running the following command in a new terminal.
> $ ip link add dev br0 type bridge
>
> Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
> ---
>  ip/ip_print.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/ip/ip_print.c b/ip/ip_print.c
> index 4cd6a0bc..7eb4c6da 100644
> --- a/ip/ip_print.c
> +++ b/ip/ip_print.c
> @@ -23,6 +23,11 @@ static FILE *_fp;
>  #define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
>  #define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
>
> +static inline FILE *_get_fp(void)
> +{
> +       return _fp ? _fp : stdout;
> +};
> +
>  void new_json_obj(int json, FILE *fp)
>  {
>         if (json) {
> @@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
>                         jsonw_name(_jw, str);
>                 jsonw_start_array(_jw);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               fprintf(_fp, "%s", str);
> +               fprintf(_get_fp(), "%s", str);
>         }
>  }
>
> @@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
>                 jsonw_end_array(_jw);
>                 jsonw_pretty(_jw, true);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               fprintf(_fp, "%s", str);
> +               fprintf(_get_fp(), "%s", str);
>         }
>  }
>
> @@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
>                         else                                            \
>                                 jsonw_##type_name##_field(_jw, key, value); \
>                 } else if (_IS_FP_CONTEXT(t)) {                         \
> -                       color_fprintf(_fp, color, fmt, value);          \
> +                       color_fprintf(_get_fp(), color, fmt, value);    \
>                 }                                                       \
>         }
>  _PRINT_FUNC(int, int);
> @@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
>                 else
>                         jsonw_string_field(_jw, key, value);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, value);
> +               color_fprintf(_get_fp(), color, fmt, value);
>         }
>  }
>
> @@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
>                 else
>                         jsonw_bool(_jw, value);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, value ? "true" : "false");
> +               color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
>         }
>  }
>
> @@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
>                 snprintf(b1, sizeof(b1), "%#x", hex);
>                 print_string(PRINT_JSON, key, NULL, b1);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, hex);
> +               color_fprintf(_get_fp(), color, fmt, hex);
>         }
>  }
>
> @@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
>                 else
>                         jsonw_string(_jw, b1);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, hex);
> +               color_fprintf(_get_fp(), color, fmt, hex);
>         }
>  }
>
> @@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
>                 else
>                         jsonw_null(_jw);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, value);
> +               color_fprintf(_get_fp(), color, fmt, value);
>         }
>  }
> --
> 2.14.1
>

^ permalink raw reply

* [PATCH iproute2 v3] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-21  1:23 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nikolay, dsa, Julien Fortin

From: Julien Fortin <julien@cumulusnetworks.com>

The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.

$ ip -t mon label link
        fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
        #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
        #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
        #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
        #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
            at libnetlink.c:761
            #7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
            #8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 "mon", argc=3, argv=0x7fffffffe588) at ip.c:116
            #9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
    
Traceback can be seen when running the following command in a new terminal.
$ ip link add dev br0 type bridge

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
 ip/ip_print.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/ip/ip_print.c b/ip/ip_print.c
index 4cd6a0bc..7eb4c6da 100644
--- a/ip/ip_print.c
+++ b/ip/ip_print.c
@@ -23,6 +23,11 @@ static FILE *_fp;
 #define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
 #define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
 
+static inline FILE *_get_fp(void)
+{
+	return _fp ? _fp : stdout;
+};
+
 void new_json_obj(int json, FILE *fp)
 {
 	if (json) {
@@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
 			jsonw_name(_jw, str);
 		jsonw_start_array(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
 		jsonw_end_array(_jw);
 		jsonw_pretty(_jw, true);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
 			else						\
 				jsonw_##type_name##_field(_jw, key, value); \
 		} else if (_IS_FP_CONTEXT(t)) {				\
-			color_fprintf(_fp, color, fmt, value);          \
+			color_fprintf(_get_fp(), color, fmt, value);	\
 		}							\
 	}
 _PRINT_FUNC(int, int);
@@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
 		else
 			jsonw_string_field(_jw, key, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
 
@@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
 		else
 			jsonw_bool(_jw, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value ? "true" : "false");
+		color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
 	}
 }
 
@@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
 		snprintf(b1, sizeof(b1), "%#x", hex);
 		print_string(PRINT_JSON, key, NULL, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
 		else
 			jsonw_string(_jw, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
 		else
 			jsonw_null(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
-- 
2.14.1

^ permalink raw reply related

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
From: Roman Gushchin @ 2017-09-21  1:46 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev, linux-kernel

> Hello.
>
> Since, IIRC, v4.11, there is some regression in TCP stack resulting in the 
> warning shown below. Most of the time it is harmless, but rarely it just 
> causes either freeze or (I believe, this is related too) panic in 
> tcp_sacktag_walk() (because sk_buff passed to this function is NULL). 
> Unfortunately, I still do not have proper stacktrace from panic, but will try 
> to capture it if possible.
> 
> Also, I have custom settings regarding TCP stack, shown below as well. ifb is 
> used to shape traffic with tc.
> 
> Please note this regression was already reported as BZ [1] and as a letter to 
> ML [2], but got neither attention nor resolution. It is reproducible for (not 
> only) me on my home router since v4.11 till v4.13.1 incl.
> 
> Please advise on how to deal with it. I'll provide any additional info if 
> necessary, also ready to test patches if any.
> 
> Thanks.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> [2] https://www.spinics.net/lists/netdev/msg436158.html

We're experiencing the same problems on some machines in our fleet.
Exactly the same symptoms: tcp_fastretrans_alert() warnings and
sometimes panics in tcp_sacktag_walk().

Here is an example of a backtrace with the panic log:

978.210080]  fuse
[973978.214099]  sg
[973978.217789]  loop
[973978.221829]  efivarfs
[973978.226544]  autofs4
[973978.231109] CPU: 12 PID: 3806320 Comm: ld:srv:W20 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
[973978.250563] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
[973978.266558] Call Trace:
[973978.271615]  <IRQ>
[973978.275817]  dump_stack+0x4d/0x70
[973978.282626]  __warn+0xd3/0xf0
[973978.288727]  warn_slowpath_null+0x1e/0x20
[973978.296910]  tcp_fastretrans_alert+0xacf/0xbd0
[973978.305974]  tcp_ack+0xbce/0x1390
[973978.312770]  tcp_rcv_established+0x1ce/0x740
[973978.321488]  tcp_v6_do_rcv+0x195/0x440
[973978.329166]  tcp_v6_rcv+0x94c/0x9f0
[973978.336323]  ip6_input_finish+0xea/0x430
[973978.344330]  ip6_input+0x32/0xa0
[973978.350968]  ? ip6_rcv_finish+0xa0/0xa0
[973978.358799]  ip6_rcv_finish+0x4b/0xa0
[973978.366289]  ipv6_rcv+0x2ec/0x4f0
[973978.373082]  ? ip6_make_skb+0x1c0/0x1c0
[973978.380919]  __netif_receive_skb_core+0x2d5/0x9a0
[973978.390505]  __netif_receive_skb+0x16/0x70
[973978.398875]  netif_receive_skb_internal+0x23/0x80
[973978.408462]  napi_gro_receive+0x113/0x1d0
[973978.416657]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[973978.426398]  mlx5e_poll_rx_cq+0x7c/0x7f0
[973978.434405]  mlx5e_napi_poll+0x8c/0x380
[973978.442238]  ? mlx5_cq_completion+0x54/0xb0
[973978.450770]  net_rx_action+0x22e/0x380
[973978.458447]  __do_softirq+0x106/0x2e8
[973978.465950]  irq_exit+0xb0/0xc0
[973978.472396]  do_IRQ+0x4f/0xd0
[973978.478495]  common_interrupt+0x86/0x86
[973978.486329] RIP: 0033:0x7f8dee58d8ae
[973978.493642] RSP: 002b:00007f8cb925f078 EFLAGS: 00000206
[973978.504251]  ORIG_RAX: ffffffffffffff5f
[973978.512085] RAX: 00007f8cb925f1a8 RBX: 0000000048000000 RCX: 00007f8764bd6a80
[973978.526508] RDX: 00000000000001ba RSI: 00007f7cb4ca3410 RDI: 00007f7cb4ca3410
[973978.540927] RBP: 000000000000000d R08: 00007f8764bd6b00 R09: 00007f8764bd6b80
[973978.555347] R10: 0000000000002400 R11: 00007f8dee58e240 R12: d3273c84146e8c29
[973978.569766] R13: 9dac83ddf04c235c R14: 00007f7cb4ca33b0 R15: 00007f7cb4ca4f50
[973978.584189]  </IRQ>
[973978.588650] ---[ end trace 5d1c83e12a57d039 ]---
[973995.178183] BUG: unable to handle kernel 
[973995.186385] NULL pointer dereference
[973995.193692]  at 0000000000000028
[973995.200323] IP: tcp_sacktag_walk+0x2b7/0x460
[973995.209032] PGD 102d856067 
[973995.214789] PUD fded0d067 
[973995.220385] PMD 0 
[973995.224577] 
[973995.227732] ------------[ cut here ]------------
[973995.237128] Oops: 0000 [#1] SMP
[973995.243575] Modules linked in:
[973995.249868]  mptctl
[973995.254251]  mptbase
[973995.258792]  xt_DSCP
[973995.263331]  xt_set
[973995.267698]  ip_set_hash_ip
[973995.273452]  cls_u32
[973995.277993]  sch_sfq
[973995.282535]  cls_fw
[973995.286904]  sch_htb
[973995.291444]  mpt3sas
[973995.295982]  raid_class
[973995.301044]  ip6table_mangle
[973995.306973]  iptable_mangle
[973995.312726]  cls_bpf
[973995.317268]  tcp_diag
[973995.321983]  udp_diag
[973995.326697]  inet_diag
[973995.331585]  ip6table_filter
[973995.337513]  xt_NFLOG
[973995.342226]  nfnetlink_log
[973995.347807]  xt_comment
[973995.352866]  xt_statistic
[973995.358276]  iptable_filter
[973995.364029]  xt_mark
[973995.368572]  sb_edac
[973995.373113]  edac_core
[973995.378001]  x86_pkg_temp_thermal
[973995.384795]  intel_powerclamp
[973995.390897]  coretemp
[973995.395608]  kvm_intel
[973995.400498]  kvm
[973995.404345]  irqbypass
[973995.409235]  ses
[973995.413078]  iTCO_wdt
[973995.417794]  iTCO_vendor_support
[973995.424415]  enclosure
[973995.429301]  lpc_ich
[973995.433843]  scsi_transport_sas
[973995.440292]  mfd_core
[973995.445007]  efivars
[973995.449548]  ipmi_si
[973995.454087]  ipmi_devintf
[973995.459496]  i2c_i801
[973995.464209]  ipmi_msghandler
[973995.470138]  acpi_cpufreq
[973995.475545]  button
[973995.479914]  sch_fq_codel
[973995.485319]  nfsd
[973995.489341]  auth_rpcgss
[973995.494573]  nfs_acl
[973995.499114]  oid_registry
[973995.504524]  lockd
[973995.508717]  grace
[973995.512912]  sunrpc
[973995.517280]  megaraid_sas
[973995.522689]  fuse
[973995.526709]  sg
[973995.530382]  loop
[973995.534405]  efivarfs
[973995.539118]  autofs4
[973995.543660] CPU: 19 PID: 3806297 Comm: ld:srv:W0 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
[973995.562936] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
[973995.578914] task: ffff880129e5c380 task.stack: ffffc900210cc000
[973995.590914] RIP: 0010:tcp_sacktag_walk+0x2b7/0x460
[973995.600648] RSP: 0000:ffff88203ef438e8 EFLAGS: 00010207
[973995.611254] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000004e4ec474
[973995.625677] RDX: 000000004e4ec2ad RSI: ffff8810361faa00 RDI: ffff881ffecf8840
[973995.640102] RBP: ffff88203ef43940 R08: 0000000045729921 R09: 0000000000000000
[973995.654519] R10: 00000000000085d6 R11: ffff881ffecf8998 R12: ffff881ffecf8840
[973995.668938] R13: ffff88203ef43a48 R14: 0000000000000000 R15: ffff881ffecf8998
[973995.683362] FS:  00007f8cc8cf7700(0000) GS:ffff88203ef40000(0000) knlGS:0000000000000000
[973995.699686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[973995.711331] CR2: 0000000000000028 CR3: 0000000104c1b000 CR4: 00000000003406e0
[973995.725755] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[973995.740175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[973995.754595] Call Trace:
[973995.759652]  <IRQ>
[973995.763855]  ? kprobe_perf_func+0x28/0x210
[973995.772210]  tcp_sacktag_write_queue+0x5ff/0x9e0
[973995.781615]  tcp_ack+0x677/0x1390
[973995.788408]  tcp_rcv_established+0x1ce/0x740
[973995.797112]  tcp_v6_do_rcv+0x195/0x440
[973995.804767]  tcp_v6_rcv+0x94c/0x9f0
[973995.811911]  ip6_input_finish+0xea/0x430
[973995.819917]  ip6_input+0x32/0xa0
[973995.826538]  ? ip6_rcv_finish+0xa0/0xa0
[973995.834373]  ip6_rcv_finish+0x4b/0xa0
[973995.841859]  ipv6_rcv+0x2ec/0x4f0
[973995.848653]  ? ip6_fragment+0x9f0/0x9f0
[973995.856489]  ? ip6_make_skb+0x1c0/0x1c0
[973995.864323]  __netif_receive_skb_core+0x2d5/0x9a0
[973995.873891]  __netif_receive_skb+0x16/0x70
[973995.882244]  netif_receive_skb_internal+0x23/0x80
[973995.891812]  napi_gro_receive+0x113/0x1d0
[973995.899993]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[973995.909735]  mlx5e_poll_rx_cq+0x7c/0x7f0
[973995.917740]  mlx5e_napi_poll+0x8c/0x380
[973995.925576]  ? mlx5_cq_completion+0x54/0xb0
[973995.934101]  net_rx_action+0x22e/0x380
[973995.941764]  __do_softirq+0x106/0x2e8
[973995.949255]  irq_exit+0xb0/0xc0
[973995.955696]  do_IRQ+0x4f/0xd0
[973995.961798]  common_interrupt+0x86/0x86
[973995.969634] RIP: 0033:0x7f8dec97a557
[973995.976945] RSP: 002b:00007f8cc8cf2f48 EFLAGS: 00000206
[973995.987552]  ORIG_RAX: ffffffffffffff20
[973995.995386] RAX: 00007f7fa9e15230 RBX: 00007f8c2153a160 RCX: 00007f7fa9e15230
[973996.009810] RDX: 0000000000000000 RSI: 00007f8cc8cf3040 RDI: 00007f8c204f90c0
[973996.024230] RBP: 00007f8cc8cf2f80 R08: 0000000000000001 R09: 000131aa4c002c01
[973996.038652] R10: 0000000000000000 R11: 0000000000000001 R12: 00007f8c2153a170
[973996.053073] R13: 00007f8cc8cf3040 R14: 00007f8c204f90c0 R15: 00007f8c2153a120
[973996.067494]  </IRQ>
[973996.071858] Code: 
[973996.076051] b9 
[973996.079723] 01 
[973996.083383] 00 
[973996.087056] 00 
[973996.090730] 00 
[973996.094388] 85 
[973996.098062] c0 
[973996.101738] 0f 
[973996.105410] 8e 
[973996.109087] da 
[973996.112759] fd 
[973996.116433] ff 
[973996.120109] ff 
[973996.123783] 85 
[973996.127457] c0 
[973996.131132] 75 
[973996.134806] 28 
[973996.138481] 0f 
[973996.142156] b7 
[973996.145831] 43 
[973996.149504] 30 
[973996.153180] 41 
[973996.156835] 01 
[973996.160511] 45 
[973996.164168] 04 
[973996.167843] 48 
[973996.171517] 8b 
[973996.175190] 1b 
[973996.178848] 4c 
[973996.182521] 39 
[973996.186198] fb 
[973996.189872] 74 
[973996.193529] 8c 
[973996.197202] 49 
[973996.200877] 3b 
[973996.204534] 9c 
[973996.208209] 24 
[973996.211883] 50 
[973996.215559] 01 
[973996.219215] 00 
[973996.222889] 00 
[973996.226562] 74 
[973996.230221] c1 
[973996.233894] <8b> 
[973996.237916] 43 
[973996.241590] 28 
[973996.245264] 3b 
[973996.248921] 45 
[973996.252596] d4 
[973996.256271] 0f 
[973996.259929] 88 
[973996.263601] 9f 
[973996.267276] fd 
[973996.270935] ff 
[973996.274592] ff 
[973996.278267] eb 
[973996.281938] b3 
[973996.285614] 48 
[973996.289289] 8d 
[973996.292964] 43 
[973996.296638] 10 
[973996.300296] 8b 
[973996.303969] 4b 
[973996.307642] 28 
[973996.311325] RIP: tcp_sacktag_walk+0x2b7/0x460 RSP: ffff88203ef438e8
[973996.324007] ------------[ cut here ]------------
[973996.333399] CR2: 0000000000000028
[973996.340218] ---[ end trace 5d1c83e12a57d03a ]---
[973996.349605] Kernel panic - not syncing: Fatal exception in interrupt
[973996.362521] Kernel Offset: disabled
TBOOT: wait until all APs ready for txt shutdown
TBOOT: IA32_FEATURE_CONTROL_MSR: 0000ff07
TBOOT: CPU is SMX-capable
TBOOT: CPU is VMX-capable
TBOOT: SMX is enabled
TBOOT: TXT chipset and all needed capabilities present
TBOOT: TPM: Pcr 17 extend, return value = 0000003D
TBOOT: TPM: Pcr 18 extend, return value = 0000003D
TBOOT: TPM: Pcr 19 extend, return value = 0000003D
TBOOT: cap'ed dynamic PCRs
TBOOT: waiting for APs (0) to exit guests...
TBOOT: .
TBOOT: 
TBOOT: all APs exited guests
TBOOT: calling txt_shutdown on AP


Thanks!

^ permalink raw reply

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
From: Steven Rostedt @ 2017-09-21  1:41 UTC (permalink / raw)
  To: Yonghong Song; +Cc: peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <20170918233836.1817062-1-yhs@fb.com>

On Mon, 18 Sep 2017 16:38:36 -0700
Yonghong Song <yhs@fb.com> wrote:

> This patch fixes a bug exhibited by the following scenario:
>   1. fd1 = perf_event_open with attr.config = ID1
>   2. attach bpf program prog1 to fd1
>   3. fd2 = perf_event_open with attr.config = ID1
>      <this will be successful>
>   4. user program closes fd2 and prog1 is detached from the tracepoint.
>   5. user program with fd1 does not work properly as tracepoint
>      no output any more.
> 
> The issue happens at step 4. Multiple perf_event_open can be called
> successfully, but only one bpf prog pointer in the tp_event. In the
> current logic, any fd release for the same tp_event will free
> the tp_event->prog.
> 
> The fix is to free tp_event->prog only when the closing fd
> corresponds to the one which registered the program.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  Additional context: discussed with Alexei internally but did not find
>  a solution which can avoid introducing the additional field in
>  trace_event_call structure.
> 
>  Peter, could you take a look as well and maybe you could have better
>  alternative? Thanks!
> 
>  include/linux/trace_events.h | 1 +
>  kernel/events/core.c         | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 7f11050..2e0f222 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -272,6 +272,7 @@ struct trace_event_call {
>  	int				perf_refcount;
>  	struct hlist_head __percpu	*perf_events;
>  	struct bpf_prog			*prog;
> +	struct perf_event		*bpf_prog_owner;

Does this have to be in the trace_event_call structure? Hmm, I'm
wondering if the prog needs to be there (I should look to see if we can
move it from it). The trace_event_call is created for *every* event,
and there's thousands of them now. Every byte to this structure adds
1000s of bytes to the kernel. Would it be possible to attach the prog
and the owner to the perf_event?

-- Steve


>  
>  	int	(*perf_perm)(struct trace_event_call *,
>  			     struct perf_event *);
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 3e691b7..6bc21e2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
>  		}
>  	}
>  	event->tp_event->prog = prog;
> +	event->tp_event->bpf_prog_owner = event;
>  
>  	return 0;
>  }
> @@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
>  		return;
>  
>  	prog = event->tp_event->prog;
> -	if (prog) {
> +	if (prog && event->tp_event->bpf_prog_owner == event) {
>  		event->tp_event->prog = NULL;
>  		bpf_prog_put(prog);
>  	}

^ permalink raw reply

* Re: [Patch v3 1/3] ipv4: Namespaceify tcp_fastopen knob
From: 严海双 @ 2017-09-21  1:55 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, edumazet, weiwan, lucab, netdev, linux-kernel
In-Reply-To: <20170920.142227.65942571438912956.davem@davemloft.net>



> On 2017年9月21日, at 上午5:22, David Miller <davem@davemloft.net> wrote:
> 
> From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> Date: Tue, 19 Sep 2017 17:38:14 +0800
> 
>> -		if ((sysctl_tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) &&
>> -		    (sysctl_tcp_fastopen & TFO_SERVER_ENABLE) &&
>> +		tcp_fastopen =  sock_net(sk)->ipv4.sysctl_tcp_fastopen;
>                              ^^
> 
> Please change that to one space.
> 
> And also please provide an appropriate "[PATCH vX 0/3] " header
> posting when you respin this series.

Sorry, it’s my mistake, thanks David.
> 
>> @@ -282,18 +280,19 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
>> 	struct tcp_fastopen_cookie valid_foc = { .len = -1 };
>> 	bool syn_data = TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq + 1;
>> 	struct sock *child;
>> +	int tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen;
> 
> Please order local variables from longest to shortest line (aka. reverse
> christmas tree format).
> 

Okay, I’ll take care of such coding style in next commit, thanks!

^ permalink raw reply

* Re: [PATCH iproute2 v3] ip: ip_print: if no json obj is initialized default fp is stdout
From: Stephen Hemminger @ 2017-09-21  2:22 UTC (permalink / raw)
  To: Julien Fortin; +Cc: netdev, roopa, nikolay, dsa
In-Reply-To: <20170921012356.46451-1-julien@cumulusnetworks.com>

On Wed, 20 Sep 2017 18:23:56 -0700
Julien Fortin <julien@cumulusnetworks.com> wrote:

> From: Julien Fortin <julien@cumulusnetworks.com>
> 
> The ip monitor didn't call `new_json_obj` (even for in non json context),
> so the static FILE* _fp variable wasn't initialized, thus raising a
> SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
> paths won't have to call `new_json_obj`.
> 
> $ ip -t mon label link
>         fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
>         #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
>         #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
>         #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
>         #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
>             at libnetlink.c:761
>             #7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
>             #8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 "mon", argc=3, argv=0x7fffffffe588) at ip.c:116
>             #9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
>     
> Traceback can be seen when running the following command in a new terminal.
> $ ip link add dev br0 type bridge
> 
> Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>

This is getting way more complex than it needs to be.

new_json_obj is always called with fp == stdout.
There is no reason to have the _fp at all!
Please rework new_json_obj to take only one argument.

^ permalink raw reply

* Re: [PATCH net-next 0/5] net: introduce noref sk
From: David Miller @ 2017-09-21  3:20 UTC (permalink / raw)
  To: pabeni; +Cc: netdev, pablo, fw, edumazet, hannes
In-Reply-To: <cover.1505926196.git.pabeni@redhat.com>

From: Paolo Abeni <pabeni@redhat.com>
Date: Wed, 20 Sep 2017 18:54:00 +0200

> This series introduce the infrastructure to store inside the skb a socket
> pointer without carrying a refcount to the socket.
> 
> Such infrastructure is then used in the network receive path - and
> specifically the early demux operation.
> 
> This allows the UDP early demux to perform a full lookup for UDP sockets,
> with many benefits:
> 
> - the UDP early demux code is now much simpler
> - the early demux does not hit any performance penalties in case of UDP hash
>   table collision - previously the early demux performed a partial, unsuccesful,
>   lookup
> - early demux is now operational also for unconnected sockets.
> 
> This infrastrcture will be used in follow-up series to allow dst caching for
> unconnected UDP sockets, and than to extend the same features to TCP listening
> sockets.

Like Eric, I find this series (while exciting) quite scary :-)

You really have to post some kind of performance numbers in your
header posting in order to justify something with these ramifications
and scale.

Thank you.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox