Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 3/3] net: stmmac: Cocci spatch "of_table"
From: Thomas Meyer @ 2017-09-21  6:24 UTC (permalink / raw)
  To: peppe.cavallaro, alexandre.torgue, netdev, linux-kernel
In-Reply-To: <1505974912018-1637125348-0-diffsplit-thomas@m3y3r.de>

Make sure (of/i2c/platform)_device_id tables are NULL terminated.
Found by coccinelle spatch "misc/of_table.cocci"

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
---

diff -u -p a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -315,6 +315,7 @@ static int stmmac_dt_phy(struct plat_stm
 		{ .compatible = "allwinner,sun8i-h3-emac" },
 		{ .compatible = "allwinner,sun8i-v3s-emac" },
 		{ .compatible = "allwinner,sun50i-a64-emac" },
+		{},
 	};
 
 	/* If phy-handle property is passed from DT, use it as the PHY */

^ permalink raw reply

* [PATCH 2/6] e100: Cocci spatch "pool_zalloc-simple"
From: Thomas Meyer @ 2017-09-21  6:15 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan, netdev, linux-kernel
In-Reply-To: <1505974468535-2047238743-0-diffsplit-thomas@m3y3r.de>

Use *_pool_zalloc rather than *_pool_alloc followed by memset with 0.
Found by coccinelle spatch "api/alloc/pool_zalloc-simple.cocci"

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
---

diff -u -p a/drivers/net/ethernet/intel/e100.c b/drivers/net/ethernet/intel/e100.c
--- a/drivers/net/ethernet/intel/e100.c
+++ b/drivers/net/ethernet/intel/e100.c
@@ -1910,11 +1910,10 @@ static int e100_alloc_cbs(struct nic *ni
 	nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
 	nic->cbs_avail = 0;
 
-	nic->cbs = pci_pool_alloc(nic->cbs_pool, GFP_KERNEL,
-				  &nic->cbs_dma_addr);
+	nic->cbs = pci_pool_zalloc(nic->cbs_pool, GFP_KERNEL,
+				   &nic->cbs_dma_addr);
 	if (!nic->cbs)
 		return -ENOMEM;
-	memset(nic->cbs, 0, count * sizeof(struct cb));
 
 	for (cb = nic->cbs, i = 0; i < count; cb++, i++) {
 		cb->next = (i + 1 < count) ? cb + 1 : nic->cbs;

^ permalink raw reply

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
From: Yonghong Song @ 2017-09-21  5:20 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <94f5a61e-1a88-f285-224b-66c92dc3da7b@fb.com>



On 9/20/17 10:17 PM, Yonghong Song wrote:
> 
> 
> On 9/20/17 6:41 PM, Steven Rostedt wrote:
>> On Mon, 18 Sep 2017 16:38:36 -0700
>> Yonghong Song <yhs@fb.com> wrote:
>>
>>> This patch fixes a bug exhibited by the following scenario:
>>>    1. fd1 = perf_event_open with attr.config = ID1
>>>    2. attach bpf program prog1 to fd1
>>>    3. fd2 = perf_event_open with attr.config = ID1
>>>       <this will be successful>
>>>    4. user program closes fd2 and prog1 is detached from the tracepoint.
>>>    5. user program with fd1 does not work properly as tracepoint
>>>       no output any more.
>>>
>>> The issue happens at step 4. Multiple perf_event_open can be called
>>> successfully, but only one bpf prog pointer in the tp_event. In the
>>> current logic, any fd release for the same tp_event will free
>>> the tp_event->prog.
>>>
>>> The fix is to free tp_event->prog only when the closing fd
>>> corresponds to the one which registered the program.
>>>
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>> ---
>>>   Additional context: discussed with Alexei internally but did not find
>>>   a solution which can avoid introducing the additional field in
>>>   trace_event_call structure.
>>>
>>>   Peter, could you take a look as well and maybe you could have better
>>>   alternative? Thanks!
>>>
>>>   include/linux/trace_events.h | 1 +
>>>   kernel/events/core.c         | 3 ++-
>>>   2 files changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>>> index 7f11050..2e0f222 100644
>>> --- a/include/linux/trace_events.h
>>> +++ b/include/linux/trace_events.h
>>> @@ -272,6 +272,7 @@ struct trace_event_call {
>>>       int                perf_refcount;
>>>       struct hlist_head __percpu    *perf_events;
>>>       struct bpf_prog            *prog;
>>> +    struct perf_event        *bpf_prog_owner;
>>
>> Does this have to be in the trace_event_call structure? Hmm, I'm
>> wondering if the prog needs to be there (I should look to see if we can
>> move it from it). The trace_event_call is created for *every* event,
>> and there's thousands of them now. Every byte to this structure adds
>> 1000s of bytes to the kernel. Would it be possible to attach the prog
>> and the owner to the perf_event?
> 
> Regarding whether we could move the prog and the owner to the 
> perf_event. It is possible. There is already a "prog" field in the
> perf_event structure for overflow handler. We could reuse the "prog"
> field and we do not need bpf_prog_owner any more. We can iterate
> through trace_event_call->perf_events to find the event which has the
> prog and executes it. This will support multiple prog's attaching to
> the same trace_event_call as well.
> 
> This approach may need careful evaluation though.
> (1). It adds runtime overhead although the overhead should be small
> since perf_event attaching to the same trace_event_call should be small.
> (2). trace_event_call->perf_events are per cpu data structure, that 
> means, some filtering logic is needed to avoid the same perf_event prog 
> is executing twice.

What I mean here is that the trace_event_call->perf_events need to be
checked on ALL cpus since bpf prog should be executed regardless of
cpu affiliation. It is possible that the same perf_event in different
per_cpu bucket and hence filtering is needed to avoid the same
perf_event bpf_prog is executed twice.

> (3). since the list is traversed, the locking (rcu?) may be required to
> pretect the list. Not sure whether additional locking is needed or not.
> 
> Alternative to using trace_event_call->perf_events, we may replace
> "struct bpf_prog *prog" to a list (or some other list like data 
> structure) to just record perf events which have bpf progs attached.
> But this will add memory overhead to trace_event_call data structure.
> 
>>
>> -- Steve
>>
>>
>>>       int    (*perf_perm)(struct trace_event_call *,
>>>                    struct perf_event *);
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index 3e691b7..6bc21e2 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct 
>>> perf_event *event, u32 prog_fd)
>>>           }
>>>       }
>>>       event->tp_event->prog = prog;
>>> +    event->tp_event->bpf_prog_owner = event;
>>>       return 0;
>>>   }
>>> @@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct 
>>> perf_event *event)
>>>           return;
>>>       prog = event->tp_event->prog;
>>> -    if (prog) {
>>> +    if (prog && event->tp_event->bpf_prog_owner == event) {
>>>           event->tp_event->prog = NULL;
>>>           bpf_prog_put(prog);
>>>       }
>>

^ permalink raw reply

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
From: Yonghong Song @ 2017-09-21  5:17 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <20170920214109.11002b7c@vmware.local.home>



On 9/20/17 6:41 PM, Steven Rostedt wrote:
> On Mon, 18 Sep 2017 16:38:36 -0700
> Yonghong Song <yhs@fb.com> wrote:
> 
>> This patch fixes a bug exhibited by the following scenario:
>>    1. fd1 = perf_event_open with attr.config = ID1
>>    2. attach bpf program prog1 to fd1
>>    3. fd2 = perf_event_open with attr.config = ID1
>>       <this will be successful>
>>    4. user program closes fd2 and prog1 is detached from the tracepoint.
>>    5. user program with fd1 does not work properly as tracepoint
>>       no output any more.
>>
>> The issue happens at step 4. Multiple perf_event_open can be called
>> successfully, but only one bpf prog pointer in the tp_event. In the
>> current logic, any fd release for the same tp_event will free
>> the tp_event->prog.
>>
>> The fix is to free tp_event->prog only when the closing fd
>> corresponds to the one which registered the program.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   Additional context: discussed with Alexei internally but did not find
>>   a solution which can avoid introducing the additional field in
>>   trace_event_call structure.
>>
>>   Peter, could you take a look as well and maybe you could have better
>>   alternative? Thanks!
>>
>>   include/linux/trace_events.h | 1 +
>>   kernel/events/core.c         | 3 ++-
>>   2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>> index 7f11050..2e0f222 100644
>> --- a/include/linux/trace_events.h
>> +++ b/include/linux/trace_events.h
>> @@ -272,6 +272,7 @@ struct trace_event_call {
>>   	int				perf_refcount;
>>   	struct hlist_head __percpu	*perf_events;
>>   	struct bpf_prog			*prog;
>> +	struct perf_event		*bpf_prog_owner;
> 
> Does this have to be in the trace_event_call structure? Hmm, I'm
> wondering if the prog needs to be there (I should look to see if we can
> move it from it). The trace_event_call is created for *every* event,
> and there's thousands of them now. Every byte to this structure adds
> 1000s of bytes to the kernel. Would it be possible to attach the prog
> and the owner to the perf_event?

Regarding whether we could move the prog and the owner to the 
perf_event. It is possible. There is already a "prog" field in the
perf_event structure for overflow handler. We could reuse the "prog"
field and we do not need bpf_prog_owner any more. We can iterate
through trace_event_call->perf_events to find the event which has the
prog and executes it. This will support multiple prog's attaching to
the same trace_event_call as well.

This approach may need careful evaluation though.
(1). It adds runtime overhead although the overhead should be small
since perf_event attaching to the same trace_event_call should be small.
(2). trace_event_call->perf_events are per cpu data structure, that 
means, some filtering logic is needed to avoid the same perf_event prog 
is executing twice.
(3). since the list is traversed, the locking (rcu?) may be required to
pretect the list. Not sure whether additional locking is needed or not.

Alternative to using trace_event_call->perf_events, we may replace
"struct bpf_prog *prog" to a list (or some other list like data 
structure) to just record perf events which have bpf progs attached.
But this will add memory overhead to trace_event_call data structure.

> 
> -- Steve
> 
> 
>>   
>>   	int	(*perf_perm)(struct trace_event_call *,
>>   			     struct perf_event *);
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 3e691b7..6bc21e2 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
>>   		}
>>   	}
>>   	event->tp_event->prog = prog;
>> +	event->tp_event->bpf_prog_owner = event;
>>   
>>   	return 0;
>>   }
>> @@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
>>   		return;
>>   
>>   	prog = event->tp_event->prog;
>> -	if (prog) {
>> +	if (prog && event->tp_event->bpf_prog_owner == event) {
>>   		event->tp_event->prog = NULL;
>>   		bpf_prog_put(prog);
>>   	}
> 

^ permalink raw reply

* Re:Re: [PATCH net-next ] net: Remove useless function skb_header_release
From: Gao Feng @ 2017-09-21  5:07 UTC (permalink / raw)
  To: Joe Perches; +Cc: davem, netdev
In-Reply-To: <1505969350.12311.20.camel@perches.com>

>On Thu, 2017-09-21 at 12:39 +0800, gfree.wind@vip.163.com wrote:
>> From: Gao Feng <gfree.wind@vip.163.com>
>> 
>> There is no one which would invokes the function skb_header_release.
>> So just remove it now.
>
>This in incomplete.
>There are other references to this function.
>
>$ git grep -w skb_header_release
>drivers/net/usb/asix_common.c:	 * TCP packets for example are cloned, but skb_header_release()
>include/linux/skbuff.h: *	skb_header_release - release reference to header
>include/linux/skbuff.h:static inline void skb_header_release(struct sk_buff *skb)
>include/linux/skbuff.h: *	Variant of skb_header_release() assuming skb is private to caller.
>net/batman-adv/soft-interface.c:	 * data using skb_header_release in our skbs to allow skb_cow_header to

But there are comments, not real codes.

Unless the skb_header_release may be used in the external modules

Best Regards
Feng

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: use dedicated CPU port
From: Florian Fainelli @ 2017-09-21  5:10 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170920162805.29503-1-vivien.didelot@savoirfairelinux.com>

On 09/20/2017 09:28 AM, Vivien Didelot wrote:
> Each port in DSA has its own dedicated CPU port currently available in
> its parent switch's ds->ports[port].cpu_dp. Use it instead of getting
> the unique tree CPU port, which will be deprecated soon.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> ---
>  drivers/net/dsa/b53/b53_common.c | 4 ++--
>  drivers/net/dsa/bcm_sf2.c        | 6 +++---
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: add port fdb dump
From: Florian Fainelli @ 2017-09-21  5:09 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170920233214.6936-1-vivien.didelot@savoirfairelinux.com>

On 09/20/2017 04:32 PM, Vivien Didelot wrote:
> Dumping a DSA port's FDB entries is not specific to a DSA slave, so add
> a dsa_port_fdb_dump function, similarly to dsa_port_fdb_add and
> dsa_port_fdb_del.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: better scoping of slave functions
From: Florian Fainelli @ 2017-09-21  5:08 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170920233157.6832-1-vivien.didelot@savoirfairelinux.com>

On 09/20/2017 04:31 PM, Vivien Didelot wrote:
> A few DSA slave functions take a dsa_slave_priv pointer as first
> argument, whereas the scope of the slave.c functions is the slave
> net_device structure. Fix this and rename dsa_netpoll_send_skb to
> dsa_slave_netpoll_send_skb.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 12/14] gtp: Configuration for zero UDP checksum
From: Harald Welte @ 2017-09-21  1:55 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Tom Herbert, Linux Kernel Network Developers,
	Pablo Neira Ayuso, Rohit Seth
In-Reply-To: <CALx6S35NP_BJ8oHwjkOZkNYJa=g7hWeUiNMZ5WNj-2FQaksCeQ@mail.gmail.com>

Hi Tom,

On Wed, Sep 20, 2017 at 11:09:29AM -0700, Tom Herbert wrote:
> On Mon, Sep 18, 2017 at 9:24 PM, David Miller <davem@davemloft.net> wrote:
> > From: Tom Herbert <tom@quantonium.net>
> >> Add configuration to control use of zero checksums on transmit for both
> >> IPv4 and IPv6, and control over accepting zero IPv6 checksums on
> >> receive.
> >
> > I thought we were trying to move away from this special case of allowing
> > zero UDP checksums with tunnels, especially for ipv6.
> 
> I don't have a strong preference either way. I like consistency with
> VXLAN and foo/UDP, but I guess it's not required. Interestingly, since
> GTP only carries IP, IPv6 zero checksums are actually safer here than
> VXLAN or GRE/UDP.

Just for the record: I don't care either way and I defer to the kernel
networking developers to decide if they want to have zero UDP checksum
in GTP or not.

The 3GPP specs don't say anything about UDP checksums.  So there's no
requirement to use them, and hence operation without UDP checksums
should be compliant.  Cisco GTP implementation has udp checksumming
configurable, so other implementations also seem to provide both ways.

In general, I would argue one wants UDP checksumming of GTP in all
setups, as while the inner IP packet might be protected, the GTP header
itself is not, and that's what contains important data suhc as the TEID
(Tunnel Endpoint ID).  But that's of course just my personal opinion,
and I'm not saying we should prevent people from using lower protection
if that's what they want.

-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply

* Re: [PATCH net-next ] net: Remove useless function skb_header_release
From: Joe Perches @ 2017-09-21  4:49 UTC (permalink / raw)
  To: gfree.wind, davem, netdev
In-Reply-To: <1505968771-92232-1-git-send-email-gfree.wind@vip.163.com>

On Thu, 2017-09-21 at 12:39 +0800, gfree.wind@vip.163.com wrote:
> From: Gao Feng <gfree.wind@vip.163.com>
> 
> There is no one which would invokes the function skb_header_release.
> So just remove it now.

This in incomplete.
There are other references to this function.

$ git grep -w skb_header_release
drivers/net/usb/asix_common.c:	 * TCP packets for example are cloned, but skb_header_release()
include/linux/skbuff.h: *	skb_header_release - release reference to header
include/linux/skbuff.h:static inline void skb_header_release(struct sk_buff *skb)
include/linux/skbuff.h: *	Variant of skb_header_release() assuming skb is private to caller.
net/batman-adv/soft-interface.c:	 * data using skb_header_release in our skbs to allow skb_cow_header to

^ permalink raw reply

* [PATCH net-next ] net: Remove useless function skb_header_release
From: gfree.wind @ 2017-09-21  4:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: Gao Feng

From: Gao Feng <gfree.wind@vip.163.com>

There is no one which would invokes the function skb_header_release.
So just remove it now.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
---
 include/linux/skbuff.h | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 72299ef..ce632cd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1457,22 +1457,6 @@ static inline int skb_header_unclone(struct sk_buff *skb, gfp_t pri)
 }
 
 /**
- *	skb_header_release - release reference to header
- *	@skb: buffer to operate on
- *
- *	Drop a reference to the header part of the buffer.  This is done
- *	by acquiring a payload reference.  You must not read from the header
- *	part of skb->data after this.
- *	Note : Check if you can use __skb_header_release() instead.
- */
-static inline void skb_header_release(struct sk_buff *skb)
-{
-	BUG_ON(skb->nohdr);
-	skb->nohdr = 1;
-	atomic_add(1 << SKB_DATAREF_SHIFT, &skb_shinfo(skb)->dataref);
-}
-
-/**
  *	__skb_header_release - release reference to header
  *	@skb: buffer to operate on
  *
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next 0/5] net: introduce noref sk
From: David Miller @ 2017-09-21  3:20 UTC (permalink / raw)
  To: pabeni; +Cc: netdev, pablo, fw, edumazet, hannes
In-Reply-To: <cover.1505926196.git.pabeni@redhat.com>

From: Paolo Abeni <pabeni@redhat.com>
Date: Wed, 20 Sep 2017 18:54:00 +0200

> This series introduce the infrastructure to store inside the skb a socket
> pointer without carrying a refcount to the socket.
> 
> Such infrastructure is then used in the network receive path - and
> specifically the early demux operation.
> 
> This allows the UDP early demux to perform a full lookup for UDP sockets,
> with many benefits:
> 
> - the UDP early demux code is now much simpler
> - the early demux does not hit any performance penalties in case of UDP hash
>   table collision - previously the early demux performed a partial, unsuccesful,
>   lookup
> - early demux is now operational also for unconnected sockets.
> 
> This infrastrcture will be used in follow-up series to allow dst caching for
> unconnected UDP sockets, and than to extend the same features to TCP listening
> sockets.

Like Eric, I find this series (while exciting) quite scary :-)

You really have to post some kind of performance numbers in your
header posting in order to justify something with these ramifications
and scale.

Thank you.

^ permalink raw reply

* Re: [PATCH iproute2 v3] ip: ip_print: if no json obj is initialized default fp is stdout
From: Stephen Hemminger @ 2017-09-21  2:22 UTC (permalink / raw)
  To: Julien Fortin; +Cc: netdev, roopa, nikolay, dsa
In-Reply-To: <20170921012356.46451-1-julien@cumulusnetworks.com>

On Wed, 20 Sep 2017 18:23:56 -0700
Julien Fortin <julien@cumulusnetworks.com> wrote:

> From: Julien Fortin <julien@cumulusnetworks.com>
> 
> The ip monitor didn't call `new_json_obj` (even for in non json context),
> so the static FILE* _fp variable wasn't initialized, thus raising a
> SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
> paths won't have to call `new_json_obj`.
> 
> $ ip -t mon label link
>         fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
>         #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
>         #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
>         #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
>         #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
>             at libnetlink.c:761
>             #7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
>             #8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 "mon", argc=3, argv=0x7fffffffe588) at ip.c:116
>             #9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
>     
> Traceback can be seen when running the following command in a new terminal.
> $ ip link add dev br0 type bridge
> 
> Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>

This is getting way more complex than it needs to be.

new_json_obj is always called with fp == stdout.
There is no reason to have the _fp at all!
Please rework new_json_obj to take only one argument.

^ permalink raw reply

* Re: [Patch v3 1/3] ipv4: Namespaceify tcp_fastopen knob
From: 严海双 @ 2017-09-21  1:55 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, edumazet, weiwan, lucab, netdev, linux-kernel
In-Reply-To: <20170920.142227.65942571438912956.davem@davemloft.net>



> On 2017年9月21日, at 上午5:22, David Miller <davem@davemloft.net> wrote:
> 
> From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> Date: Tue, 19 Sep 2017 17:38:14 +0800
> 
>> -		if ((sysctl_tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) &&
>> -		    (sysctl_tcp_fastopen & TFO_SERVER_ENABLE) &&
>> +		tcp_fastopen =  sock_net(sk)->ipv4.sysctl_tcp_fastopen;
>                              ^^
> 
> Please change that to one space.
> 
> And also please provide an appropriate "[PATCH vX 0/3] " header
> posting when you respin this series.

Sorry, it’s my mistake, thanks David.
> 
>> @@ -282,18 +280,19 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
>> 	struct tcp_fastopen_cookie valid_foc = { .len = -1 };
>> 	bool syn_data = TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq + 1;
>> 	struct sock *child;
>> +	int tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen;
> 
> Please order local variables from longest to shortest line (aka. reverse
> christmas tree format).
> 

Okay, I’ll take care of such coding style in next commit, thanks!

^ permalink raw reply

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
From: Steven Rostedt @ 2017-09-21  1:41 UTC (permalink / raw)
  To: Yonghong Song; +Cc: peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <20170918233836.1817062-1-yhs@fb.com>

On Mon, 18 Sep 2017 16:38:36 -0700
Yonghong Song <yhs@fb.com> wrote:

> This patch fixes a bug exhibited by the following scenario:
>   1. fd1 = perf_event_open with attr.config = ID1
>   2. attach bpf program prog1 to fd1
>   3. fd2 = perf_event_open with attr.config = ID1
>      <this will be successful>
>   4. user program closes fd2 and prog1 is detached from the tracepoint.
>   5. user program with fd1 does not work properly as tracepoint
>      no output any more.
> 
> The issue happens at step 4. Multiple perf_event_open can be called
> successfully, but only one bpf prog pointer in the tp_event. In the
> current logic, any fd release for the same tp_event will free
> the tp_event->prog.
> 
> The fix is to free tp_event->prog only when the closing fd
> corresponds to the one which registered the program.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  Additional context: discussed with Alexei internally but did not find
>  a solution which can avoid introducing the additional field in
>  trace_event_call structure.
> 
>  Peter, could you take a look as well and maybe you could have better
>  alternative? Thanks!
> 
>  include/linux/trace_events.h | 1 +
>  kernel/events/core.c         | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 7f11050..2e0f222 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -272,6 +272,7 @@ struct trace_event_call {
>  	int				perf_refcount;
>  	struct hlist_head __percpu	*perf_events;
>  	struct bpf_prog			*prog;
> +	struct perf_event		*bpf_prog_owner;

Does this have to be in the trace_event_call structure? Hmm, I'm
wondering if the prog needs to be there (I should look to see if we can
move it from it). The trace_event_call is created for *every* event,
and there's thousands of them now. Every byte to this structure adds
1000s of bytes to the kernel. Would it be possible to attach the prog
and the owner to the perf_event?

-- Steve


>  
>  	int	(*perf_perm)(struct trace_event_call *,
>  			     struct perf_event *);
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 3e691b7..6bc21e2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
>  		}
>  	}
>  	event->tp_event->prog = prog;
> +	event->tp_event->bpf_prog_owner = event;
>  
>  	return 0;
>  }
> @@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
>  		return;
>  
>  	prog = event->tp_event->prog;
> -	if (prog) {
> +	if (prog && event->tp_event->bpf_prog_owner == event) {
>  		event->tp_event->prog = NULL;
>  		bpf_prog_put(prog);
>  	}

^ permalink raw reply

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
From: Roman Gushchin @ 2017-09-21  1:46 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev, linux-kernel

> Hello.
>
> Since, IIRC, v4.11, there is some regression in TCP stack resulting in the 
> warning shown below. Most of the time it is harmless, but rarely it just 
> causes either freeze or (I believe, this is related too) panic in 
> tcp_sacktag_walk() (because sk_buff passed to this function is NULL). 
> Unfortunately, I still do not have proper stacktrace from panic, but will try 
> to capture it if possible.
> 
> Also, I have custom settings regarding TCP stack, shown below as well. ifb is 
> used to shape traffic with tc.
> 
> Please note this regression was already reported as BZ [1] and as a letter to 
> ML [2], but got neither attention nor resolution. It is reproducible for (not 
> only) me on my home router since v4.11 till v4.13.1 incl.
> 
> Please advise on how to deal with it. I'll provide any additional info if 
> necessary, also ready to test patches if any.
> 
> Thanks.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> [2] https://www.spinics.net/lists/netdev/msg436158.html

We're experiencing the same problems on some machines in our fleet.
Exactly the same symptoms: tcp_fastretrans_alert() warnings and
sometimes panics in tcp_sacktag_walk().

Here is an example of a backtrace with the panic log:

978.210080]  fuse
[973978.214099]  sg
[973978.217789]  loop
[973978.221829]  efivarfs
[973978.226544]  autofs4
[973978.231109] CPU: 12 PID: 3806320 Comm: ld:srv:W20 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
[973978.250563] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
[973978.266558] Call Trace:
[973978.271615]  <IRQ>
[973978.275817]  dump_stack+0x4d/0x70
[973978.282626]  __warn+0xd3/0xf0
[973978.288727]  warn_slowpath_null+0x1e/0x20
[973978.296910]  tcp_fastretrans_alert+0xacf/0xbd0
[973978.305974]  tcp_ack+0xbce/0x1390
[973978.312770]  tcp_rcv_established+0x1ce/0x740
[973978.321488]  tcp_v6_do_rcv+0x195/0x440
[973978.329166]  tcp_v6_rcv+0x94c/0x9f0
[973978.336323]  ip6_input_finish+0xea/0x430
[973978.344330]  ip6_input+0x32/0xa0
[973978.350968]  ? ip6_rcv_finish+0xa0/0xa0
[973978.358799]  ip6_rcv_finish+0x4b/0xa0
[973978.366289]  ipv6_rcv+0x2ec/0x4f0
[973978.373082]  ? ip6_make_skb+0x1c0/0x1c0
[973978.380919]  __netif_receive_skb_core+0x2d5/0x9a0
[973978.390505]  __netif_receive_skb+0x16/0x70
[973978.398875]  netif_receive_skb_internal+0x23/0x80
[973978.408462]  napi_gro_receive+0x113/0x1d0
[973978.416657]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[973978.426398]  mlx5e_poll_rx_cq+0x7c/0x7f0
[973978.434405]  mlx5e_napi_poll+0x8c/0x380
[973978.442238]  ? mlx5_cq_completion+0x54/0xb0
[973978.450770]  net_rx_action+0x22e/0x380
[973978.458447]  __do_softirq+0x106/0x2e8
[973978.465950]  irq_exit+0xb0/0xc0
[973978.472396]  do_IRQ+0x4f/0xd0
[973978.478495]  common_interrupt+0x86/0x86
[973978.486329] RIP: 0033:0x7f8dee58d8ae
[973978.493642] RSP: 002b:00007f8cb925f078 EFLAGS: 00000206
[973978.504251]  ORIG_RAX: ffffffffffffff5f
[973978.512085] RAX: 00007f8cb925f1a8 RBX: 0000000048000000 RCX: 00007f8764bd6a80
[973978.526508] RDX: 00000000000001ba RSI: 00007f7cb4ca3410 RDI: 00007f7cb4ca3410
[973978.540927] RBP: 000000000000000d R08: 00007f8764bd6b00 R09: 00007f8764bd6b80
[973978.555347] R10: 0000000000002400 R11: 00007f8dee58e240 R12: d3273c84146e8c29
[973978.569766] R13: 9dac83ddf04c235c R14: 00007f7cb4ca33b0 R15: 00007f7cb4ca4f50
[973978.584189]  </IRQ>
[973978.588650] ---[ end trace 5d1c83e12a57d039 ]---
[973995.178183] BUG: unable to handle kernel 
[973995.186385] NULL pointer dereference
[973995.193692]  at 0000000000000028
[973995.200323] IP: tcp_sacktag_walk+0x2b7/0x460
[973995.209032] PGD 102d856067 
[973995.214789] PUD fded0d067 
[973995.220385] PMD 0 
[973995.224577] 
[973995.227732] ------------[ cut here ]------------
[973995.237128] Oops: 0000 [#1] SMP
[973995.243575] Modules linked in:
[973995.249868]  mptctl
[973995.254251]  mptbase
[973995.258792]  xt_DSCP
[973995.263331]  xt_set
[973995.267698]  ip_set_hash_ip
[973995.273452]  cls_u32
[973995.277993]  sch_sfq
[973995.282535]  cls_fw
[973995.286904]  sch_htb
[973995.291444]  mpt3sas
[973995.295982]  raid_class
[973995.301044]  ip6table_mangle
[973995.306973]  iptable_mangle
[973995.312726]  cls_bpf
[973995.317268]  tcp_diag
[973995.321983]  udp_diag
[973995.326697]  inet_diag
[973995.331585]  ip6table_filter
[973995.337513]  xt_NFLOG
[973995.342226]  nfnetlink_log
[973995.347807]  xt_comment
[973995.352866]  xt_statistic
[973995.358276]  iptable_filter
[973995.364029]  xt_mark
[973995.368572]  sb_edac
[973995.373113]  edac_core
[973995.378001]  x86_pkg_temp_thermal
[973995.384795]  intel_powerclamp
[973995.390897]  coretemp
[973995.395608]  kvm_intel
[973995.400498]  kvm
[973995.404345]  irqbypass
[973995.409235]  ses
[973995.413078]  iTCO_wdt
[973995.417794]  iTCO_vendor_support
[973995.424415]  enclosure
[973995.429301]  lpc_ich
[973995.433843]  scsi_transport_sas
[973995.440292]  mfd_core
[973995.445007]  efivars
[973995.449548]  ipmi_si
[973995.454087]  ipmi_devintf
[973995.459496]  i2c_i801
[973995.464209]  ipmi_msghandler
[973995.470138]  acpi_cpufreq
[973995.475545]  button
[973995.479914]  sch_fq_codel
[973995.485319]  nfsd
[973995.489341]  auth_rpcgss
[973995.494573]  nfs_acl
[973995.499114]  oid_registry
[973995.504524]  lockd
[973995.508717]  grace
[973995.512912]  sunrpc
[973995.517280]  megaraid_sas
[973995.522689]  fuse
[973995.526709]  sg
[973995.530382]  loop
[973995.534405]  efivarfs
[973995.539118]  autofs4
[973995.543660] CPU: 19 PID: 3806297 Comm: ld:srv:W0 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
[973995.562936] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
[973995.578914] task: ffff880129e5c380 task.stack: ffffc900210cc000
[973995.590914] RIP: 0010:tcp_sacktag_walk+0x2b7/0x460
[973995.600648] RSP: 0000:ffff88203ef438e8 EFLAGS: 00010207
[973995.611254] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000004e4ec474
[973995.625677] RDX: 000000004e4ec2ad RSI: ffff8810361faa00 RDI: ffff881ffecf8840
[973995.640102] RBP: ffff88203ef43940 R08: 0000000045729921 R09: 0000000000000000
[973995.654519] R10: 00000000000085d6 R11: ffff881ffecf8998 R12: ffff881ffecf8840
[973995.668938] R13: ffff88203ef43a48 R14: 0000000000000000 R15: ffff881ffecf8998
[973995.683362] FS:  00007f8cc8cf7700(0000) GS:ffff88203ef40000(0000) knlGS:0000000000000000
[973995.699686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[973995.711331] CR2: 0000000000000028 CR3: 0000000104c1b000 CR4: 00000000003406e0
[973995.725755] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[973995.740175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[973995.754595] Call Trace:
[973995.759652]  <IRQ>
[973995.763855]  ? kprobe_perf_func+0x28/0x210
[973995.772210]  tcp_sacktag_write_queue+0x5ff/0x9e0
[973995.781615]  tcp_ack+0x677/0x1390
[973995.788408]  tcp_rcv_established+0x1ce/0x740
[973995.797112]  tcp_v6_do_rcv+0x195/0x440
[973995.804767]  tcp_v6_rcv+0x94c/0x9f0
[973995.811911]  ip6_input_finish+0xea/0x430
[973995.819917]  ip6_input+0x32/0xa0
[973995.826538]  ? ip6_rcv_finish+0xa0/0xa0
[973995.834373]  ip6_rcv_finish+0x4b/0xa0
[973995.841859]  ipv6_rcv+0x2ec/0x4f0
[973995.848653]  ? ip6_fragment+0x9f0/0x9f0
[973995.856489]  ? ip6_make_skb+0x1c0/0x1c0
[973995.864323]  __netif_receive_skb_core+0x2d5/0x9a0
[973995.873891]  __netif_receive_skb+0x16/0x70
[973995.882244]  netif_receive_skb_internal+0x23/0x80
[973995.891812]  napi_gro_receive+0x113/0x1d0
[973995.899993]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[973995.909735]  mlx5e_poll_rx_cq+0x7c/0x7f0
[973995.917740]  mlx5e_napi_poll+0x8c/0x380
[973995.925576]  ? mlx5_cq_completion+0x54/0xb0
[973995.934101]  net_rx_action+0x22e/0x380
[973995.941764]  __do_softirq+0x106/0x2e8
[973995.949255]  irq_exit+0xb0/0xc0
[973995.955696]  do_IRQ+0x4f/0xd0
[973995.961798]  common_interrupt+0x86/0x86
[973995.969634] RIP: 0033:0x7f8dec97a557
[973995.976945] RSP: 002b:00007f8cc8cf2f48 EFLAGS: 00000206
[973995.987552]  ORIG_RAX: ffffffffffffff20
[973995.995386] RAX: 00007f7fa9e15230 RBX: 00007f8c2153a160 RCX: 00007f7fa9e15230
[973996.009810] RDX: 0000000000000000 RSI: 00007f8cc8cf3040 RDI: 00007f8c204f90c0
[973996.024230] RBP: 00007f8cc8cf2f80 R08: 0000000000000001 R09: 000131aa4c002c01
[973996.038652] R10: 0000000000000000 R11: 0000000000000001 R12: 00007f8c2153a170
[973996.053073] R13: 00007f8cc8cf3040 R14: 00007f8c204f90c0 R15: 00007f8c2153a120
[973996.067494]  </IRQ>
[973996.071858] Code: 
[973996.076051] b9 
[973996.079723] 01 
[973996.083383] 00 
[973996.087056] 00 
[973996.090730] 00 
[973996.094388] 85 
[973996.098062] c0 
[973996.101738] 0f 
[973996.105410] 8e 
[973996.109087] da 
[973996.112759] fd 
[973996.116433] ff 
[973996.120109] ff 
[973996.123783] 85 
[973996.127457] c0 
[973996.131132] 75 
[973996.134806] 28 
[973996.138481] 0f 
[973996.142156] b7 
[973996.145831] 43 
[973996.149504] 30 
[973996.153180] 41 
[973996.156835] 01 
[973996.160511] 45 
[973996.164168] 04 
[973996.167843] 48 
[973996.171517] 8b 
[973996.175190] 1b 
[973996.178848] 4c 
[973996.182521] 39 
[973996.186198] fb 
[973996.189872] 74 
[973996.193529] 8c 
[973996.197202] 49 
[973996.200877] 3b 
[973996.204534] 9c 
[973996.208209] 24 
[973996.211883] 50 
[973996.215559] 01 
[973996.219215] 00 
[973996.222889] 00 
[973996.226562] 74 
[973996.230221] c1 
[973996.233894] <8b> 
[973996.237916] 43 
[973996.241590] 28 
[973996.245264] 3b 
[973996.248921] 45 
[973996.252596] d4 
[973996.256271] 0f 
[973996.259929] 88 
[973996.263601] 9f 
[973996.267276] fd 
[973996.270935] ff 
[973996.274592] ff 
[973996.278267] eb 
[973996.281938] b3 
[973996.285614] 48 
[973996.289289] 8d 
[973996.292964] 43 
[973996.296638] 10 
[973996.300296] 8b 
[973996.303969] 4b 
[973996.307642] 28 
[973996.311325] RIP: tcp_sacktag_walk+0x2b7/0x460 RSP: ffff88203ef438e8
[973996.324007] ------------[ cut here ]------------
[973996.333399] CR2: 0000000000000028
[973996.340218] ---[ end trace 5d1c83e12a57d03a ]---
[973996.349605] Kernel panic - not syncing: Fatal exception in interrupt
[973996.362521] Kernel Offset: disabled
TBOOT: wait until all APs ready for txt shutdown
TBOOT: IA32_FEATURE_CONTROL_MSR: 0000ff07
TBOOT: CPU is SMX-capable
TBOOT: CPU is VMX-capable
TBOOT: SMX is enabled
TBOOT: TXT chipset and all needed capabilities present
TBOOT: TPM: Pcr 17 extend, return value = 0000003D
TBOOT: TPM: Pcr 18 extend, return value = 0000003D
TBOOT: TPM: Pcr 19 extend, return value = 0000003D
TBOOT: cap'ed dynamic PCRs
TBOOT: waiting for APs (0) to exit guests...
TBOOT: .
TBOOT: 
TBOOT: all APs exited guests
TBOOT: calling txt_shutdown on AP


Thanks!

^ permalink raw reply

* [PATCH iproute2 v3] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-21  1:23 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nikolay, dsa, Julien Fortin

From: Julien Fortin <julien@cumulusnetworks.com>

The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.

$ ip -t mon label link
        fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
        #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
        #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
        #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
        #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
            at libnetlink.c:761
            #7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
            #8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 "mon", argc=3, argv=0x7fffffffe588) at ip.c:116
            #9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
    
Traceback can be seen when running the following command in a new terminal.
$ ip link add dev br0 type bridge

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
 ip/ip_print.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/ip/ip_print.c b/ip/ip_print.c
index 4cd6a0bc..7eb4c6da 100644
--- a/ip/ip_print.c
+++ b/ip/ip_print.c
@@ -23,6 +23,11 @@ static FILE *_fp;
 #define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
 #define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
 
+static inline FILE *_get_fp(void)
+{
+	return _fp ? _fp : stdout;
+};
+
 void new_json_obj(int json, FILE *fp)
 {
 	if (json) {
@@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
 			jsonw_name(_jw, str);
 		jsonw_start_array(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
 		jsonw_end_array(_jw);
 		jsonw_pretty(_jw, true);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
 			else						\
 				jsonw_##type_name##_field(_jw, key, value); \
 		} else if (_IS_FP_CONTEXT(t)) {				\
-			color_fprintf(_fp, color, fmt, value);          \
+			color_fprintf(_get_fp(), color, fmt, value);	\
 		}							\
 	}
 _PRINT_FUNC(int, int);
@@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
 		else
 			jsonw_string_field(_jw, key, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
 
@@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
 		else
 			jsonw_bool(_jw, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value ? "true" : "false");
+		color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
 	}
 }
 
@@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
 		snprintf(b1, sizeof(b1), "%#x", hex);
 		print_string(PRINT_JSON, key, NULL, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
 		else
 			jsonw_string(_jw, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
 		else
 			jsonw_null(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH iproute2 v2] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-21  1:22 UTC (permalink / raw)
  To: netdev; +Cc: Roopa Prabhu, Nikolay Aleksandrov, David Ahern, Julien Fortin
In-Reply-To: <20170921011645.45555-1-julien@cumulusnetworks.com>

looks like git-mail ate all the traceback lines starting with #, will
re-submit v3

On Wed, Sep 20, 2017 at 6:16 PM, Julien Fortin
<julien@cumulusnetworks.com> wrote:
> From: Julien Fortin <julien@cumulusnetworks.com>
>
> The ip monitor didn't call `new_json_obj` (even for in non json context),
> so the static FILE* _fp variable wasn't initialized, thus raising a
> SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
> paths won't have to call `new_json_obj`.
>
> $ ip -t mon label link
> fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
> at libnetlink.c:761
>
> Traceback can be seen when running the following command in a new terminal.
> $ ip link add dev br0 type bridge
>
> Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
> ---
>  ip/ip_print.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/ip/ip_print.c b/ip/ip_print.c
> index 4cd6a0bc..7eb4c6da 100644
> --- a/ip/ip_print.c
> +++ b/ip/ip_print.c
> @@ -23,6 +23,11 @@ static FILE *_fp;
>  #define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
>  #define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
>
> +static inline FILE *_get_fp(void)
> +{
> +       return _fp ? _fp : stdout;
> +};
> +
>  void new_json_obj(int json, FILE *fp)
>  {
>         if (json) {
> @@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
>                         jsonw_name(_jw, str);
>                 jsonw_start_array(_jw);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               fprintf(_fp, "%s", str);
> +               fprintf(_get_fp(), "%s", str);
>         }
>  }
>
> @@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
>                 jsonw_end_array(_jw);
>                 jsonw_pretty(_jw, true);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               fprintf(_fp, "%s", str);
> +               fprintf(_get_fp(), "%s", str);
>         }
>  }
>
> @@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
>                         else                                            \
>                                 jsonw_##type_name##_field(_jw, key, value); \
>                 } else if (_IS_FP_CONTEXT(t)) {                         \
> -                       color_fprintf(_fp, color, fmt, value);          \
> +                       color_fprintf(_get_fp(), color, fmt, value);    \
>                 }                                                       \
>         }
>  _PRINT_FUNC(int, int);
> @@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
>                 else
>                         jsonw_string_field(_jw, key, value);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, value);
> +               color_fprintf(_get_fp(), color, fmt, value);
>         }
>  }
>
> @@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
>                 else
>                         jsonw_bool(_jw, value);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, value ? "true" : "false");
> +               color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
>         }
>  }
>
> @@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
>                 snprintf(b1, sizeof(b1), "%#x", hex);
>                 print_string(PRINT_JSON, key, NULL, b1);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, hex);
> +               color_fprintf(_get_fp(), color, fmt, hex);
>         }
>  }
>
> @@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
>                 else
>                         jsonw_string(_jw, b1);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, hex);
> +               color_fprintf(_get_fp(), color, fmt, hex);
>         }
>  }
>
> @@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
>                 else
>                         jsonw_null(_jw);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, value);
> +               color_fprintf(_get_fp(), color, fmt, value);
>         }
>  }
> --
> 2.14.1
>

^ permalink raw reply

* Re: Latest net-next from GIT panic
From: Eric Dumazet @ 2017-09-21  1:17 UTC (permalink / raw)
  To: Wei Wang
  Cc: Paweł Staszewski, Cong Wang, Linux Kernel Network Developers,
	Eric Dumazet
In-Reply-To: <CAEA6p_AiLvZxsyu+iOCrPzW-H28N=HBnWycy8w6gdZHFk2Q1dQ@mail.gmail.com>

On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
> > Thanks very much Pawel for the feedback.
> >
> > I was looking into the code (specifically IPv4 part) and found that in
> > free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> > fnhe_lock. I am wondering if that could cause some race condition on
> > fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
> > same dst could be happening.
> >
> > But as we call free_fib_info_rcu() only after the grace period, and
> > the lookup code which could potentially modify
> > fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
> > fine...
> >
> 
> Hi Pawel,
> 
> Could you try the following debug patch on top of net-next branch and
> reproduce the issue check if there are warning msg showing?
> 
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 93568bd0a352..82aff41c6f63 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry
> *dst, unsigned long time)
>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>  {
>         if (dst)
> -               atomic_inc(&dst->__refcnt);
> +               dst_hold(dst);
>         return dst;
>  }
> 
> Thanks.
> Wei
> 


Yes, we believe skb_dst_force() and skb_dst_force_safe() should be
unified  (to the 'safe' version)

We no longer have gc to protect from 0 -> 1 transition of dst refcount.

^ permalink raw reply

* [PATCH iproute2 v2] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-21  1:16 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nikolay, dsa, Julien Fortin

From: Julien Fortin <julien@cumulusnetworks.com>

The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.

$ ip -t mon label link
fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
at libnetlink.c:761

Traceback can be seen when running the following command in a new terminal.
$ ip link add dev br0 type bridge

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
 ip/ip_print.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/ip/ip_print.c b/ip/ip_print.c
index 4cd6a0bc..7eb4c6da 100644
--- a/ip/ip_print.c
+++ b/ip/ip_print.c
@@ -23,6 +23,11 @@ static FILE *_fp;
 #define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
 #define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
 
+static inline FILE *_get_fp(void)
+{
+	return _fp ? _fp : stdout;
+};
+
 void new_json_obj(int json, FILE *fp)
 {
 	if (json) {
@@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
 			jsonw_name(_jw, str);
 		jsonw_start_array(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
 		jsonw_end_array(_jw);
 		jsonw_pretty(_jw, true);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
 			else						\
 				jsonw_##type_name##_field(_jw, key, value); \
 		} else if (_IS_FP_CONTEXT(t)) {				\
-			color_fprintf(_fp, color, fmt, value);          \
+			color_fprintf(_get_fp(), color, fmt, value);	\
 		}							\
 	}
 _PRINT_FUNC(int, int);
@@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
 		else
 			jsonw_string_field(_jw, key, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
 
@@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
 		else
 			jsonw_bool(_jw, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value ? "true" : "false");
+		color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
 	}
 }
 
@@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
 		snprintf(b1, sizeof(b1), "%#x", hex);
 		print_string(PRINT_JSON, key, NULL, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
 		else
 			jsonw_string(_jw, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
 		else
 			jsonw_null(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
-- 
2.14.1

^ permalink raw reply related

* Re: Latest net-next from GIT panic
From: Wei Wang @ 2017-09-21  1:09 UTC (permalink / raw)
  To: Paweł Staszewski
  Cc: Cong Wang, Eric Dumazet, Linux Kernel Network Developers,
	Eric Dumazet
In-Reply-To: <CAEA6p_DpSr84cYu3pXqoHV=V=daSsbiOcnzsF+Kt0r4Wv_jmdg@mail.gmail.com>

> Thanks very much Pawel for the feedback.
>
> I was looking into the code (specifically IPv4 part) and found that in
> free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> fnhe_lock. I am wondering if that could cause some race condition on
> fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
> same dst could be happening.
>
> But as we call free_fib_info_rcu() only after the grace period, and
> the lookup code which could potentially modify
> fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
> fine...
>

Hi Pawel,

Could you try the following debug patch on top of net-next branch and
reproduce the issue check if there are warning msg showing?

diff --git a/include/net/dst.h b/include/net/dst.h
index 93568bd0a352..82aff41c6f63 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry
*dst, unsigned long time)
 static inline struct dst_entry *dst_clone(struct dst_entry *dst)
 {
        if (dst)
-               atomic_inc(&dst->__refcnt);
+               dst_hold(dst);
        return dst;
 }

Thanks.
Wei


On Wed, Sep 20, 2017 at 3:09 PM, Wei Wang <weiwan@google.com> wrote:
>>>> bisected again and same result:
>>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>> Author: Wei Wang <weiwan@google.com>
>>>> Date:   Sat Jun 17 10:42:32 2017 -0700
>>>>
>>>>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>>
>>>>     With the previous preparation patches, we are ready to get rid of the
>>>>     dst gc operation in ipv4 code and release dst based on refcnt only.
>>>>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>>> calls
>>>>     to dst_free().
>>>>     At this point, all dst created in ipv4 code do not use the dst gc
>>>>     anymore and will be destroyed at the point when refcnt drops to 0.
>>>>
>>>>     Signed-off-by: Wei Wang <weiwan@google.com>
>>>>     Acked-by: Martin KaFai Lau <kafai@fb.com>
>>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M      net
>>>>
>>>> Will add now version 2 of patch from Eric and we will see
>>>>
>>>>
>>> after adding patch
>>> perf top catch
>>>    PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],
>>> (all, 40 CPUs)
>>>
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>>     60.95%  [kernel]            [k] dev_put.part.6
>>>      4.00%  [kernel]            [k] ixgbe_poll
>>>      3.63%  [kernel]            [k] irq_entries_start
>>>      1.22%  [kernel]            [k] fib_table_lookup
>>>      1.15%  [kernel]            [k] do_raw_spin_lock
>>>      1.05%  [kernel]            [k] ixgbe_xmit_frame_ring
>>>      1.04%  [kernel]            [k] lookup
>>>      0.87%  [kernel]            [k] eth_type_trans
>>>
>>>
>>> no panic on console - rebooting to check logs
>>>
>>>
>> Nothing logged
>>
>
> Thanks very much Pawel for the feedback.
>
> I was looking into the code (specifically IPv4 part) and found that in
> free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> fnhe_lock. I am wondering if that could cause some race condition on
> fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
> same dst could be happening.
>
> But as we call free_fib_info_rcu() only after the grace period, and
> the lookup code which could potentially modify
> fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
> fine...
>
>
> On Wed, Sep 20, 2017 at 2:25 PM, Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>>
>>
>> W dniu 2017-09-20 o 23:24, Paweł Staszewski pisze:
>>
>>>
>>>
>>> W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze:
>>>>
>>>>
>>>>
>>>> W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:
>>>>>
>>>>>
>>>>>
>>>>> W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:
>>>>>>
>>>>>>
>>>>>>
>>>>>> W dniu 2017-09-20 o 20:36, Cong Wang pisze:
>>>>>>>
>>>>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet
>>>>>>> <eric.dumazet@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>>>>>>>>
>>>>>>>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>>>>>>>
>>>>>>>>> This is very odd.
>>>>>>>>>
>>>>>>>>> We only free netdevice in free_netdev() and it is only called when
>>>>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>>>>>>>> to be NULL.
>>>>>>>>
>>>>>>>> If there is a missing dev_hold() or one dev_put() in excess,
>>>>>>>> this would allow the netdev to be freed too soon.
>>>>>>>>
>>>>>>>> -> Use after free.
>>>>>>>> memory holding netdev could be reallocated-cleared by some other
>>>>>>>> kernel
>>>>>>>> user.
>>>>>>>>
>>>>>>> Sure, but only unregister could trigger a free. If there is no
>>>>>>> unregister,
>>>>>>> like what Pawel claims, then there is no free, the refcnt just goes to
>>>>>>> 0 but the memory is still there.
>>>>>>>
>>>>>> About possible mistake from my side with bisect - i can judge too early
>>>>>> that some bisect was good
>>>>>> the road was:
>>>>>> git bisect start
>>>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>>>>>> 'pinctrl-v4.13-1' of
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>>>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next'
>>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>>>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>>>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using
>>>>>> stack larger than 1024.
>>>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>>>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
>>>>>> 'udp-reduce-cache-pressure'
>>>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>>>>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch
>>>>>> 's390-net-updates-part-2'
>>>>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
>>>>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch
>>>>>> 'bpf-ctx-narrow'
>>>>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
>>>>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove
>>>>>> cp_outgoing
>>>>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
>>>>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add
>>>>>> TCP_MD5SIG_EXT socket option to set a key address prefix
>>>>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
>>>>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new
>>>>>> function dst_dev_put()
>>>>>>
>>>>>> And currently have this running for about 4 hours without problems.
>>>>>>
>>>>>>
>>>>>>
>>>>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
>>>>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove
>>>>>> DST_NOCACHE flag
>>>>>>
>>>>>> Here for sure - panic
>>>>>>
>>>>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
>>>>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
>>>>>> dst_hold_safe() properly
>>>>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
>>>>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call
>>>>>> dst_hold_safe() properly
>>>>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
>>>>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take
>>>>>> dst->__refcnt for insertion into fib6 tree
>>>>>>
>>>>>> im not 100% sure tor last two
>>>>>> Will test them again starting from
>>>>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put()
>>>>>> properly
>>>>>>
>>>>>>
>>>>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
>>>>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC
>>>>>> and remove the operation of dst_free()
>>>>>>
>>>>>>
>>>>>>
>>>>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4:
>>>>>> mark DST_NOGC and remove the operation of dst_free()
>>>>>>
>>>>>>
>>>>>>
>>>>> What i can say more
>>>>> I can reproduce this on any server with similar configuration
>>>>> the difference can be teamd instead of bonding
>>>>> ixgbe or i40e and mlx5
>>>>> Same problems
>>>>>
>>>>> vlans - more or less prefixes learned from bgp -> zebra -> netlink ->
>>>>> kernel
>>>>> But normally in lab when using only plain routing no bgpd and about 128
>>>>> vlans - with 128 routes - cant reproduce this - this apperas only with bgp -
>>>>> minimum where i can reproduce this was about 130k prefixes with about 286
>>>>> nexthops
>>>>>
>>>>>
>>>>>
>>>>>
>>>> bisected again and same result:
>>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>> Author: Wei Wang <weiwan@google.com>
>>>> Date:   Sat Jun 17 10:42:32 2017 -0700
>>>>
>>>>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>>
>>>>     With the previous preparation patches, we are ready to get rid of the
>>>>     dst gc operation in ipv4 code and release dst based on refcnt only.
>>>>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>>> calls
>>>>     to dst_free().
>>>>     At this point, all dst created in ipv4 code do not use the dst gc
>>>>     anymore and will be destroyed at the point when refcnt drops to 0.
>>>>
>>>>     Signed-off-by: Wei Wang <weiwan@google.com>
>>>>     Acked-by: Martin KaFai Lau <kafai@fb.com>
>>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M      net
>>>>
>>>> Will add now version 2 of patch from Eric and we will see
>>>>
>>>>
>>> after adding patch
>>> perf top catch
>>>    PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],
>>> (all, 40 CPUs)
>>>
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>>     60.95%  [kernel]            [k] dev_put.part.6
>>>      4.00%  [kernel]            [k] ixgbe_poll
>>>      3.63%  [kernel]            [k] irq_entries_start
>>>      1.22%  [kernel]            [k] fib_table_lookup
>>>      1.15%  [kernel]            [k] do_raw_spin_lock
>>>      1.05%  [kernel]            [k] ixgbe_xmit_frame_ring
>>>      1.04%  [kernel]            [k] lookup
>>>      0.87%  [kernel]            [k] eth_type_trans
>>>
>>>
>>> no panic on console - rebooting to check logs
>>>
>>>
>> Nothing logged
>>

^ permalink raw reply related

* Re: [PATCH iproute2/net-next] tc: flower: support for matching MPLS labels
From: Stephen Hemminger @ 2017-09-21  1:09 UTC (permalink / raw)
  To: Simon Horman; +Cc: Jiri Pirko, Jamal Hadi Salim, Cong Wang, netdev, oss-drivers
In-Reply-To: <1505225175-13830-1-git-send-email-simon.horman@netronome.com>

On Tue, 12 Sep 2017 16:06:15 +0200
Simon Horman <simon.horman@netronome.com> wrote:

> From: Benjamin LaHaise <benjamin.lahaise@netronome.com>
> 
> This patch adds support to the iproute2 tc filter command for matching MPLS
> labels in the flower classifier.  The ability to match the Time To Live,
> Bottom Of Stack, Traffic Control and Label fields are added as options to
> the flower filter.
> 
> e.g.:
>   tc filter add dev eth0 protocol 0x8847 parent ffff: \
>     flower mpls_label 1 mpls_tc 2 mpls_ttl 3 mpls_bos 0 \
>     action drop
> 
> Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Looks good, applied

^ permalink raw reply

* Re: [PATCH iproute2 v2] ip: ipaddress: fix missing space after prefixlen
From: Stephen Hemminger @ 2017-09-21  1:06 UTC (permalink / raw)
  To: Julien Fortin; +Cc: netdev, roopa, nikolay, dsa, sd
In-Reply-To: <20170920202651.95860-1-julien@cumulusnetworks.com>

On Wed, 20 Sep 2017 13:26:51 -0700
Julien Fortin <julien@cumulusnetworks.com> wrote:

> From: Julien Fortin <julien@cumulusnetworks.com>
> 
> Fixes: d0e720111aad2 ("ip: ipaddress.c: add support for json output")
> Reported-by: Sabrina Dubroca <sd@queasysnail.net>
> Reviewed-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>

Applied, thanks for spotting this.

^ permalink raw reply

* Re: [patch net-next 00/16] mlxsw: Multicast flood update
From: David Miller @ 2017-09-21  1:05 UTC (permalink / raw)
  To: jiri; +Cc: netdev, nogahf, idosch, mlxsw
In-Reply-To: <20170920141516.1402-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 20 Sep 2017 16:15:00 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Nogah says:
> 
> Currently, there are four erroneous flows in MC flood:
> 1. When MC is disabled it affects only the flood table for unregistered
>    MC packets, but packets that match an entry in the MDB are unaffected.
> 2. When MC is disabled, MC packets are being sent to all the ports in the
>    bridge (like BC and link-local MC packets) regardless of the designated
>    flag (BR_MCAST_FLAG).
> 3. When a port is being deleted from a bridge it might remain in the MDB.
> 4. When MC is enabled packets are flooded to the mrouter ports only if
>    they don't match any entry in the MDB, when they should always be
>    flooded to them.
> 
> What these problems have in common is the discrepancy between how the
> hardware handles MDB and mcast flood, and how the driver does it. Each
> of these problems needs fixing either in the MDB code, or in mcast flood
> code, and some in both.
> 
> Patches 1-6 change the way the MDB is handled in the driver to make the
> following changes easier.
> Patches 7-8 fix problem number 1 by removing the MDB from the HW when MC
> is being disabled and restoring it when it is being enabled.
> Patches 9-10 fix problem number 2 by offloading the flood table by the
> appropriate flag.
> Patch 11 fixes problem number 3 by adding MDB flush to the port removal.
> Patches 12-14 fix problem number 4 by adding the mrouter ports to every
> MDB entry in the HW to mimic the wanted behaviour.

Looks good, series applied, thanks!

^ permalink raw reply

* Re: [PATCH 1/1] ip: ip_print: if no json obj is initialized default fp is stdout
From: Stephen Hemminger @ 2017-09-21  1:04 UTC (permalink / raw)
  To: Julien Fortin; +Cc: netdev, roopa, nikolay, dsa
In-Reply-To: <20170920221914.92740-1-julien@cumulusnetworks.com>

On Wed, 20 Sep 2017 15:19:14 -0700
Julien Fortin <julien@cumulusnetworks.com> wrote:

> From: Julien Fortin <julien@cumulusnetworks.com>
> 
> The ip monitor didn't call `new_json_obj` (even for in non json context),
> so the static FILE* _fp variable wasn't initialized, thus raising a
> SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
> paths won't have to call `new_json_obj`.
> 
> $ ip -t mon label link
>     fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
>     #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d “%d: “, key=0x4545fc “ifindex”, t=PRINT_ANY) at ip_common.h:189
>     #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
>     #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
>     #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
>         at libnetlink.c:761
> 	#7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
> 	#8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 “mon”, argc=3, argv=0x7fffffffe588) at ip.c:116
> 	#9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
> 
> Traceback can be seen when running the following command in a new terminal.
> $ ip link add dev br0 type bridge
> 
> Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
> ---

The fix looks correct, but the patch has minor style format issue.

ERROR: "foo* bar" should be "foo *bar"
#54: FILE: ip/ip_print.c:26:
+static inline FILE* _get_fp(void)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox