Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] net: dsa: add port fdb dump
From: Florian Fainelli @ 2017-09-21  5:09 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170920233214.6936-1-vivien.didelot@savoirfairelinux.com>



On 09/20/2017 04:32 PM, Vivien Didelot wrote:
> Dumping a DSA port's FDB entries is not specific to a DSA slave, so add
> a dsa_port_fdb_dump function, similarly to dsa_port_fdb_add and
> dsa_port_fdb_del.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: better scoping of slave functions
From: Florian Fainelli @ 2017-09-21  5:08 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170920233157.6832-1-vivien.didelot@savoirfairelinux.com>



On 09/20/2017 04:31 PM, Vivien Didelot wrote:
> A few DSA slave functions take a dsa_slave_priv pointer as first
> argument, whereas the scope of the slave.c functions is the slave
> net_device structure. Fix this and rename dsa_netpoll_send_skb to
> dsa_slave_netpoll_send_skb.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 12/14] gtp: Configuration for zero UDP checksum
From: Harald Welte @ 2017-09-21  1:55 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Tom Herbert, Linux Kernel Network Developers,
	Pablo Neira Ayuso, Rohit Seth
In-Reply-To: <CALx6S35NP_BJ8oHwjkOZkNYJa=g7hWeUiNMZ5WNj-2FQaksCeQ@mail.gmail.com>

Hi Tom,

On Wed, Sep 20, 2017 at 11:09:29AM -0700, Tom Herbert wrote:
> On Mon, Sep 18, 2017 at 9:24 PM, David Miller <davem@davemloft.net> wrote:
> > From: Tom Herbert <tom@quantonium.net>
> >> Add configuration to control use of zero checksums on transmit for both
> >> IPv4 and IPv6, and control over accepting zero IPv6 checksums on
> >> receive.
> >
> > I thought we were trying to move away from this special case of allowing
> > zero UDP checksums with tunnels, especially for ipv6.
> 
> I don't have a strong preference either way. I like consistency with
> VXLAN and foo/UDP, but I guess it's not required. Interestingly, since
> GTP only carries IP, IPv6 zero checksums are actually safer here than
> VXLAN or GRE/UDP.

Just for the record: I don't care either way and I defer to the kernel
networking developers to decide if they want to have zero UDP checksum
in GTP or not.

The 3GPP specs don't say anything about UDP checksums.  So there's no
requirement to use them, and hence operation without UDP checksums
should be compliant.  Cisco GTP implementation has udp checksumming
configurable, so other implementations also seem to provide both ways.

In general, I would argue one wants UDP checksumming of GTP in all
setups, as while the inner IP packet might be protected, the GTP header
itself is not, and that's what contains important data suhc as the TEID
(Tunnel Endpoint ID).  But that's of course just my personal opinion,
and I'm not saying we should prevent people from using lower protection
if that's what they want.

-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply

* Re: [PATCH net-next ] net: Remove useless function skb_header_release
From: Joe Perches @ 2017-09-21  4:49 UTC (permalink / raw)
  To: gfree.wind, davem, netdev
In-Reply-To: <1505968771-92232-1-git-send-email-gfree.wind@vip.163.com>

On Thu, 2017-09-21 at 12:39 +0800, gfree.wind@vip.163.com wrote:
> From: Gao Feng <gfree.wind@vip.163.com>
> 
> There is no one which would invokes the function skb_header_release.
> So just remove it now.

This in incomplete.
There are other references to this function.

$ git grep -w skb_header_release
drivers/net/usb/asix_common.c:	 * TCP packets for example are cloned, but skb_header_release()
include/linux/skbuff.h: *	skb_header_release - release reference to header
include/linux/skbuff.h:static inline void skb_header_release(struct sk_buff *skb)
include/linux/skbuff.h: *	Variant of skb_header_release() assuming skb is private to caller.
net/batman-adv/soft-interface.c:	 * data using skb_header_release in our skbs to allow skb_cow_header to

^ permalink raw reply

* [PATCH net-next ] net: Remove useless function skb_header_release
From: gfree.wind @ 2017-09-21  4:39 UTC (permalink / raw)
  To: davem, netdev; +Cc: Gao Feng

From: Gao Feng <gfree.wind@vip.163.com>

There is no one which would invokes the function skb_header_release.
So just remove it now.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
---
 include/linux/skbuff.h | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 72299ef..ce632cd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1457,22 +1457,6 @@ static inline int skb_header_unclone(struct sk_buff *skb, gfp_t pri)
 }
 
 /**
- *	skb_header_release - release reference to header
- *	@skb: buffer to operate on
- *
- *	Drop a reference to the header part of the buffer.  This is done
- *	by acquiring a payload reference.  You must not read from the header
- *	part of skb->data after this.
- *	Note : Check if you can use __skb_header_release() instead.
- */
-static inline void skb_header_release(struct sk_buff *skb)
-{
-	BUG_ON(skb->nohdr);
-	skb->nohdr = 1;
-	atomic_add(1 << SKB_DATAREF_SHIFT, &skb_shinfo(skb)->dataref);
-}
-
-/**
  *	__skb_header_release - release reference to header
  *	@skb: buffer to operate on
  *
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next 0/5] net: introduce noref sk
From: David Miller @ 2017-09-21  3:20 UTC (permalink / raw)
  To: pabeni; +Cc: netdev, pablo, fw, edumazet, hannes
In-Reply-To: <cover.1505926196.git.pabeni@redhat.com>

From: Paolo Abeni <pabeni@redhat.com>
Date: Wed, 20 Sep 2017 18:54:00 +0200

> This series introduce the infrastructure to store inside the skb a socket
> pointer without carrying a refcount to the socket.
> 
> Such infrastructure is then used in the network receive path - and
> specifically the early demux operation.
> 
> This allows the UDP early demux to perform a full lookup for UDP sockets,
> with many benefits:
> 
> - the UDP early demux code is now much simpler
> - the early demux does not hit any performance penalties in case of UDP hash
>   table collision - previously the early demux performed a partial, unsuccesful,
>   lookup
> - early demux is now operational also for unconnected sockets.
> 
> This infrastrcture will be used in follow-up series to allow dst caching for
> unconnected UDP sockets, and than to extend the same features to TCP listening
> sockets.

Like Eric, I find this series (while exciting) quite scary :-)

You really have to post some kind of performance numbers in your
header posting in order to justify something with these ramifications
and scale.

Thank you.

^ permalink raw reply

* Re: [PATCH iproute2 v3] ip: ip_print: if no json obj is initialized default fp is stdout
From: Stephen Hemminger @ 2017-09-21  2:22 UTC (permalink / raw)
  To: Julien Fortin; +Cc: netdev, roopa, nikolay, dsa
In-Reply-To: <20170921012356.46451-1-julien@cumulusnetworks.com>

On Wed, 20 Sep 2017 18:23:56 -0700
Julien Fortin <julien@cumulusnetworks.com> wrote:

> From: Julien Fortin <julien@cumulusnetworks.com>
> 
> The ip monitor didn't call `new_json_obj` (even for in non json context),
> so the static FILE* _fp variable wasn't initialized, thus raising a
> SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
> paths won't have to call `new_json_obj`.
> 
> $ ip -t mon label link
>         fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
>         #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
>         #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
>         #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
>         #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
>             at libnetlink.c:761
>             #7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
>             #8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 "mon", argc=3, argv=0x7fffffffe588) at ip.c:116
>             #9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
>     
> Traceback can be seen when running the following command in a new terminal.
> $ ip link add dev br0 type bridge
> 
> Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>

This is getting way more complex than it needs to be.

new_json_obj is always called with fp == stdout.
There is no reason to have the _fp at all!
Please rework new_json_obj to take only one argument.

^ permalink raw reply

* Re: [Patch v3 1/3] ipv4: Namespaceify tcp_fastopen knob
From: 严海双 @ 2017-09-21  1:55 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, edumazet, weiwan, lucab, netdev, linux-kernel
In-Reply-To: <20170920.142227.65942571438912956.davem@davemloft.net>



> On 2017年9月21日, at 上午5:22, David Miller <davem@davemloft.net> wrote:
> 
> From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> Date: Tue, 19 Sep 2017 17:38:14 +0800
> 
>> -		if ((sysctl_tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) &&
>> -		    (sysctl_tcp_fastopen & TFO_SERVER_ENABLE) &&
>> +		tcp_fastopen =  sock_net(sk)->ipv4.sysctl_tcp_fastopen;
>                              ^^
> 
> Please change that to one space.
> 
> And also please provide an appropriate "[PATCH vX 0/3] " header
> posting when you respin this series.

Sorry, it’s my mistake, thanks David.
> 
>> @@ -282,18 +280,19 @@ struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
>> 	struct tcp_fastopen_cookie valid_foc = { .len = -1 };
>> 	bool syn_data = TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq + 1;
>> 	struct sock *child;
>> +	int tcp_fastopen = sock_net(sk)->ipv4.sysctl_tcp_fastopen;
> 
> Please order local variables from longest to shortest line (aka. reverse
> christmas tree format).
> 

Okay, I’ll take care of such coding style in next commit, thanks!

^ permalink raw reply

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
From: Steven Rostedt @ 2017-09-21  1:41 UTC (permalink / raw)
  To: Yonghong Song; +Cc: peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <20170918233836.1817062-1-yhs@fb.com>

On Mon, 18 Sep 2017 16:38:36 -0700
Yonghong Song <yhs@fb.com> wrote:

> This patch fixes a bug exhibited by the following scenario:
>   1. fd1 = perf_event_open with attr.config = ID1
>   2. attach bpf program prog1 to fd1
>   3. fd2 = perf_event_open with attr.config = ID1
>      <this will be successful>
>   4. user program closes fd2 and prog1 is detached from the tracepoint.
>   5. user program with fd1 does not work properly as tracepoint
>      no output any more.
> 
> The issue happens at step 4. Multiple perf_event_open can be called
> successfully, but only one bpf prog pointer in the tp_event. In the
> current logic, any fd release for the same tp_event will free
> the tp_event->prog.
> 
> The fix is to free tp_event->prog only when the closing fd
> corresponds to the one which registered the program.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  Additional context: discussed with Alexei internally but did not find
>  a solution which can avoid introducing the additional field in
>  trace_event_call structure.
> 
>  Peter, could you take a look as well and maybe you could have better
>  alternative? Thanks!
> 
>  include/linux/trace_events.h | 1 +
>  kernel/events/core.c         | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 7f11050..2e0f222 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -272,6 +272,7 @@ struct trace_event_call {
>  	int				perf_refcount;
>  	struct hlist_head __percpu	*perf_events;
>  	struct bpf_prog			*prog;
> +	struct perf_event		*bpf_prog_owner;

Does this have to be in the trace_event_call structure? Hmm, I'm
wondering if the prog needs to be there (I should look to see if we can
move it from it). The trace_event_call is created for *every* event,
and there's thousands of them now. Every byte to this structure adds
1000s of bytes to the kernel. Would it be possible to attach the prog
and the owner to the perf_event?

-- Steve


>  
>  	int	(*perf_perm)(struct trace_event_call *,
>  			     struct perf_event *);
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 3e691b7..6bc21e2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
>  		}
>  	}
>  	event->tp_event->prog = prog;
> +	event->tp_event->bpf_prog_owner = event;
>  
>  	return 0;
>  }
> @@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
>  		return;
>  
>  	prog = event->tp_event->prog;
> -	if (prog) {
> +	if (prog && event->tp_event->bpf_prog_owner == event) {
>  		event->tp_event->prog = NULL;
>  		bpf_prog_put(prog);
>  	}

^ permalink raw reply

* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
From: Roman Gushchin @ 2017-09-21  1:46 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev, linux-kernel

> Hello.
>
> Since, IIRC, v4.11, there is some regression in TCP stack resulting in the 
> warning shown below. Most of the time it is harmless, but rarely it just 
> causes either freeze or (I believe, this is related too) panic in 
> tcp_sacktag_walk() (because sk_buff passed to this function is NULL). 
> Unfortunately, I still do not have proper stacktrace from panic, but will try 
> to capture it if possible.
> 
> Also, I have custom settings regarding TCP stack, shown below as well. ifb is 
> used to shape traffic with tc.
> 
> Please note this regression was already reported as BZ [1] and as a letter to 
> ML [2], but got neither attention nor resolution. It is reproducible for (not 
> only) me on my home router since v4.11 till v4.13.1 incl.
> 
> Please advise on how to deal with it. I'll provide any additional info if 
> necessary, also ready to test patches if any.
> 
> Thanks.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
> [2] https://www.spinics.net/lists/netdev/msg436158.html

We're experiencing the same problems on some machines in our fleet.
Exactly the same symptoms: tcp_fastretrans_alert() warnings and
sometimes panics in tcp_sacktag_walk().

Here is an example of a backtrace with the panic log:

978.210080]  fuse
[973978.214099]  sg
[973978.217789]  loop
[973978.221829]  efivarfs
[973978.226544]  autofs4
[973978.231109] CPU: 12 PID: 3806320 Comm: ld:srv:W20 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
[973978.250563] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
[973978.266558] Call Trace:
[973978.271615]  <IRQ>
[973978.275817]  dump_stack+0x4d/0x70
[973978.282626]  __warn+0xd3/0xf0
[973978.288727]  warn_slowpath_null+0x1e/0x20
[973978.296910]  tcp_fastretrans_alert+0xacf/0xbd0
[973978.305974]  tcp_ack+0xbce/0x1390
[973978.312770]  tcp_rcv_established+0x1ce/0x740
[973978.321488]  tcp_v6_do_rcv+0x195/0x440
[973978.329166]  tcp_v6_rcv+0x94c/0x9f0
[973978.336323]  ip6_input_finish+0xea/0x430
[973978.344330]  ip6_input+0x32/0xa0
[973978.350968]  ? ip6_rcv_finish+0xa0/0xa0
[973978.358799]  ip6_rcv_finish+0x4b/0xa0
[973978.366289]  ipv6_rcv+0x2ec/0x4f0
[973978.373082]  ? ip6_make_skb+0x1c0/0x1c0
[973978.380919]  __netif_receive_skb_core+0x2d5/0x9a0
[973978.390505]  __netif_receive_skb+0x16/0x70
[973978.398875]  netif_receive_skb_internal+0x23/0x80
[973978.408462]  napi_gro_receive+0x113/0x1d0
[973978.416657]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[973978.426398]  mlx5e_poll_rx_cq+0x7c/0x7f0
[973978.434405]  mlx5e_napi_poll+0x8c/0x380
[973978.442238]  ? mlx5_cq_completion+0x54/0xb0
[973978.450770]  net_rx_action+0x22e/0x380
[973978.458447]  __do_softirq+0x106/0x2e8
[973978.465950]  irq_exit+0xb0/0xc0
[973978.472396]  do_IRQ+0x4f/0xd0
[973978.478495]  common_interrupt+0x86/0x86
[973978.486329] RIP: 0033:0x7f8dee58d8ae
[973978.493642] RSP: 002b:00007f8cb925f078 EFLAGS: 00000206
[973978.504251]  ORIG_RAX: ffffffffffffff5f
[973978.512085] RAX: 00007f8cb925f1a8 RBX: 0000000048000000 RCX: 00007f8764bd6a80
[973978.526508] RDX: 00000000000001ba RSI: 00007f7cb4ca3410 RDI: 00007f7cb4ca3410
[973978.540927] RBP: 000000000000000d R08: 00007f8764bd6b00 R09: 00007f8764bd6b80
[973978.555347] R10: 0000000000002400 R11: 00007f8dee58e240 R12: d3273c84146e8c29
[973978.569766] R13: 9dac83ddf04c235c R14: 00007f7cb4ca33b0 R15: 00007f7cb4ca4f50
[973978.584189]  </IRQ>
[973978.588650] ---[ end trace 5d1c83e12a57d039 ]---
[973995.178183] BUG: unable to handle kernel 
[973995.186385] NULL pointer dereference
[973995.193692]  at 0000000000000028
[973995.200323] IP: tcp_sacktag_walk+0x2b7/0x460
[973995.209032] PGD 102d856067 
[973995.214789] PUD fded0d067 
[973995.220385] PMD 0 
[973995.224577] 
[973995.227732] ------------[ cut here ]------------
[973995.237128] Oops: 0000 [#1] SMP
[973995.243575] Modules linked in:
[973995.249868]  mptctl
[973995.254251]  mptbase
[973995.258792]  xt_DSCP
[973995.263331]  xt_set
[973995.267698]  ip_set_hash_ip
[973995.273452]  cls_u32
[973995.277993]  sch_sfq
[973995.282535]  cls_fw
[973995.286904]  sch_htb
[973995.291444]  mpt3sas
[973995.295982]  raid_class
[973995.301044]  ip6table_mangle
[973995.306973]  iptable_mangle
[973995.312726]  cls_bpf
[973995.317268]  tcp_diag
[973995.321983]  udp_diag
[973995.326697]  inet_diag
[973995.331585]  ip6table_filter
[973995.337513]  xt_NFLOG
[973995.342226]  nfnetlink_log
[973995.347807]  xt_comment
[973995.352866]  xt_statistic
[973995.358276]  iptable_filter
[973995.364029]  xt_mark
[973995.368572]  sb_edac
[973995.373113]  edac_core
[973995.378001]  x86_pkg_temp_thermal
[973995.384795]  intel_powerclamp
[973995.390897]  coretemp
[973995.395608]  kvm_intel
[973995.400498]  kvm
[973995.404345]  irqbypass
[973995.409235]  ses
[973995.413078]  iTCO_wdt
[973995.417794]  iTCO_vendor_support
[973995.424415]  enclosure
[973995.429301]  lpc_ich
[973995.433843]  scsi_transport_sas
[973995.440292]  mfd_core
[973995.445007]  efivars
[973995.449548]  ipmi_si
[973995.454087]  ipmi_devintf
[973995.459496]  i2c_i801
[973995.464209]  ipmi_msghandler
[973995.470138]  acpi_cpufreq
[973995.475545]  button
[973995.479914]  sch_fq_codel
[973995.485319]  nfsd
[973995.489341]  auth_rpcgss
[973995.494573]  nfs_acl
[973995.499114]  oid_registry
[973995.504524]  lockd
[973995.508717]  grace
[973995.512912]  sunrpc
[973995.517280]  megaraid_sas
[973995.522689]  fuse
[973995.526709]  sg
[973995.530382]  loop
[973995.534405]  efivarfs
[973995.539118]  autofs4
[973995.543660] CPU: 19 PID: 3806297 Comm: ld:srv:W0 Tainted: G        W       4.11.3-7_fbk1_1174_ga56eebf #7
[973995.562936] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM06   10/28/2016
[973995.578914] task: ffff880129e5c380 task.stack: ffffc900210cc000
[973995.590914] RIP: 0010:tcp_sacktag_walk+0x2b7/0x460
[973995.600648] RSP: 0000:ffff88203ef438e8 EFLAGS: 00010207
[973995.611254] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000004e4ec474
[973995.625677] RDX: 000000004e4ec2ad RSI: ffff8810361faa00 RDI: ffff881ffecf8840
[973995.640102] RBP: ffff88203ef43940 R08: 0000000045729921 R09: 0000000000000000
[973995.654519] R10: 00000000000085d6 R11: ffff881ffecf8998 R12: ffff881ffecf8840
[973995.668938] R13: ffff88203ef43a48 R14: 0000000000000000 R15: ffff881ffecf8998
[973995.683362] FS:  00007f8cc8cf7700(0000) GS:ffff88203ef40000(0000) knlGS:0000000000000000
[973995.699686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[973995.711331] CR2: 0000000000000028 CR3: 0000000104c1b000 CR4: 00000000003406e0
[973995.725755] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[973995.740175] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[973995.754595] Call Trace:
[973995.759652]  <IRQ>
[973995.763855]  ? kprobe_perf_func+0x28/0x210
[973995.772210]  tcp_sacktag_write_queue+0x5ff/0x9e0
[973995.781615]  tcp_ack+0x677/0x1390
[973995.788408]  tcp_rcv_established+0x1ce/0x740
[973995.797112]  tcp_v6_do_rcv+0x195/0x440
[973995.804767]  tcp_v6_rcv+0x94c/0x9f0
[973995.811911]  ip6_input_finish+0xea/0x430
[973995.819917]  ip6_input+0x32/0xa0
[973995.826538]  ? ip6_rcv_finish+0xa0/0xa0
[973995.834373]  ip6_rcv_finish+0x4b/0xa0
[973995.841859]  ipv6_rcv+0x2ec/0x4f0
[973995.848653]  ? ip6_fragment+0x9f0/0x9f0
[973995.856489]  ? ip6_make_skb+0x1c0/0x1c0
[973995.864323]  __netif_receive_skb_core+0x2d5/0x9a0
[973995.873891]  __netif_receive_skb+0x16/0x70
[973995.882244]  netif_receive_skb_internal+0x23/0x80
[973995.891812]  napi_gro_receive+0x113/0x1d0
[973995.899993]  mlx5e_handle_rx_cqe_mpwrq+0x5b6/0x8d0
[973995.909735]  mlx5e_poll_rx_cq+0x7c/0x7f0
[973995.917740]  mlx5e_napi_poll+0x8c/0x380
[973995.925576]  ? mlx5_cq_completion+0x54/0xb0
[973995.934101]  net_rx_action+0x22e/0x380
[973995.941764]  __do_softirq+0x106/0x2e8
[973995.949255]  irq_exit+0xb0/0xc0
[973995.955696]  do_IRQ+0x4f/0xd0
[973995.961798]  common_interrupt+0x86/0x86
[973995.969634] RIP: 0033:0x7f8dec97a557
[973995.976945] RSP: 002b:00007f8cc8cf2f48 EFLAGS: 00000206
[973995.987552]  ORIG_RAX: ffffffffffffff20
[973995.995386] RAX: 00007f7fa9e15230 RBX: 00007f8c2153a160 RCX: 00007f7fa9e15230
[973996.009810] RDX: 0000000000000000 RSI: 00007f8cc8cf3040 RDI: 00007f8c204f90c0
[973996.024230] RBP: 00007f8cc8cf2f80 R08: 0000000000000001 R09: 000131aa4c002c01
[973996.038652] R10: 0000000000000000 R11: 0000000000000001 R12: 00007f8c2153a170
[973996.053073] R13: 00007f8cc8cf3040 R14: 00007f8c204f90c0 R15: 00007f8c2153a120
[973996.067494]  </IRQ>
[973996.071858] Code: 
[973996.076051] b9 
[973996.079723] 01 
[973996.083383] 00 
[973996.087056] 00 
[973996.090730] 00 
[973996.094388] 85 
[973996.098062] c0 
[973996.101738] 0f 
[973996.105410] 8e 
[973996.109087] da 
[973996.112759] fd 
[973996.116433] ff 
[973996.120109] ff 
[973996.123783] 85 
[973996.127457] c0 
[973996.131132] 75 
[973996.134806] 28 
[973996.138481] 0f 
[973996.142156] b7 
[973996.145831] 43 
[973996.149504] 30 
[973996.153180] 41 
[973996.156835] 01 
[973996.160511] 45 
[973996.164168] 04 
[973996.167843] 48 
[973996.171517] 8b 
[973996.175190] 1b 
[973996.178848] 4c 
[973996.182521] 39 
[973996.186198] fb 
[973996.189872] 74 
[973996.193529] 8c 
[973996.197202] 49 
[973996.200877] 3b 
[973996.204534] 9c 
[973996.208209] 24 
[973996.211883] 50 
[973996.215559] 01 
[973996.219215] 00 
[973996.222889] 00 
[973996.226562] 74 
[973996.230221] c1 
[973996.233894] <8b> 
[973996.237916] 43 
[973996.241590] 28 
[973996.245264] 3b 
[973996.248921] 45 
[973996.252596] d4 
[973996.256271] 0f 
[973996.259929] 88 
[973996.263601] 9f 
[973996.267276] fd 
[973996.270935] ff 
[973996.274592] ff 
[973996.278267] eb 
[973996.281938] b3 
[973996.285614] 48 
[973996.289289] 8d 
[973996.292964] 43 
[973996.296638] 10 
[973996.300296] 8b 
[973996.303969] 4b 
[973996.307642] 28 
[973996.311325] RIP: tcp_sacktag_walk+0x2b7/0x460 RSP: ffff88203ef438e8
[973996.324007] ------------[ cut here ]------------
[973996.333399] CR2: 0000000000000028
[973996.340218] ---[ end trace 5d1c83e12a57d03a ]---
[973996.349605] Kernel panic - not syncing: Fatal exception in interrupt
[973996.362521] Kernel Offset: disabled
TBOOT: wait until all APs ready for txt shutdown
TBOOT: IA32_FEATURE_CONTROL_MSR: 0000ff07
TBOOT: CPU is SMX-capable
TBOOT: CPU is VMX-capable
TBOOT: SMX is enabled
TBOOT: TXT chipset and all needed capabilities present
TBOOT: TPM: Pcr 17 extend, return value = 0000003D
TBOOT: TPM: Pcr 18 extend, return value = 0000003D
TBOOT: TPM: Pcr 19 extend, return value = 0000003D
TBOOT: cap'ed dynamic PCRs
TBOOT: waiting for APs (0) to exit guests...
TBOOT: .
TBOOT: 
TBOOT: all APs exited guests
TBOOT: calling txt_shutdown on AP


Thanks!

^ permalink raw reply

* [PATCH iproute2 v3] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-21  1:23 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nikolay, dsa, Julien Fortin

From: Julien Fortin <julien@cumulusnetworks.com>

The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.

$ ip -t mon label link
        fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
        #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
        #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
        #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
        #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
            at libnetlink.c:761
            #7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
            #8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 "mon", argc=3, argv=0x7fffffffe588) at ip.c:116
            #9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
    
Traceback can be seen when running the following command in a new terminal.
$ ip link add dev br0 type bridge

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
 ip/ip_print.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/ip/ip_print.c b/ip/ip_print.c
index 4cd6a0bc..7eb4c6da 100644
--- a/ip/ip_print.c
+++ b/ip/ip_print.c
@@ -23,6 +23,11 @@ static FILE *_fp;
 #define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
 #define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
 
+static inline FILE *_get_fp(void)
+{
+	return _fp ? _fp : stdout;
+};
+
 void new_json_obj(int json, FILE *fp)
 {
 	if (json) {
@@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
 			jsonw_name(_jw, str);
 		jsonw_start_array(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
 		jsonw_end_array(_jw);
 		jsonw_pretty(_jw, true);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
 			else						\
 				jsonw_##type_name##_field(_jw, key, value); \
 		} else if (_IS_FP_CONTEXT(t)) {				\
-			color_fprintf(_fp, color, fmt, value);          \
+			color_fprintf(_get_fp(), color, fmt, value);	\
 		}							\
 	}
 _PRINT_FUNC(int, int);
@@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
 		else
 			jsonw_string_field(_jw, key, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
 
@@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
 		else
 			jsonw_bool(_jw, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value ? "true" : "false");
+		color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
 	}
 }
 
@@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
 		snprintf(b1, sizeof(b1), "%#x", hex);
 		print_string(PRINT_JSON, key, NULL, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
 		else
 			jsonw_string(_jw, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
 		else
 			jsonw_null(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH iproute2 v2] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-21  1:22 UTC (permalink / raw)
  To: netdev; +Cc: Roopa Prabhu, Nikolay Aleksandrov, David Ahern, Julien Fortin
In-Reply-To: <20170921011645.45555-1-julien@cumulusnetworks.com>

looks like git-mail ate all the traceback lines starting with #, will
re-submit v3

On Wed, Sep 20, 2017 at 6:16 PM, Julien Fortin
<julien@cumulusnetworks.com> wrote:
> From: Julien Fortin <julien@cumulusnetworks.com>
>
> The ip monitor didn't call `new_json_obj` (even for in non json context),
> so the static FILE* _fp variable wasn't initialized, thus raising a
> SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
> paths won't have to call `new_json_obj`.
>
> $ ip -t mon label link
> fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
> at libnetlink.c:761
>
> Traceback can be seen when running the following command in a new terminal.
> $ ip link add dev br0 type bridge
>
> Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
> ---
>  ip/ip_print.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/ip/ip_print.c b/ip/ip_print.c
> index 4cd6a0bc..7eb4c6da 100644
> --- a/ip/ip_print.c
> +++ b/ip/ip_print.c
> @@ -23,6 +23,11 @@ static FILE *_fp;
>  #define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
>  #define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
>
> +static inline FILE *_get_fp(void)
> +{
> +       return _fp ? _fp : stdout;
> +};
> +
>  void new_json_obj(int json, FILE *fp)
>  {
>         if (json) {
> @@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
>                         jsonw_name(_jw, str);
>                 jsonw_start_array(_jw);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               fprintf(_fp, "%s", str);
> +               fprintf(_get_fp(), "%s", str);
>         }
>  }
>
> @@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
>                 jsonw_end_array(_jw);
>                 jsonw_pretty(_jw, true);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               fprintf(_fp, "%s", str);
> +               fprintf(_get_fp(), "%s", str);
>         }
>  }
>
> @@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
>                         else                                            \
>                                 jsonw_##type_name##_field(_jw, key, value); \
>                 } else if (_IS_FP_CONTEXT(t)) {                         \
> -                       color_fprintf(_fp, color, fmt, value);          \
> +                       color_fprintf(_get_fp(), color, fmt, value);    \
>                 }                                                       \
>         }
>  _PRINT_FUNC(int, int);
> @@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
>                 else
>                         jsonw_string_field(_jw, key, value);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, value);
> +               color_fprintf(_get_fp(), color, fmt, value);
>         }
>  }
>
> @@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
>                 else
>                         jsonw_bool(_jw, value);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, value ? "true" : "false");
> +               color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
>         }
>  }
>
> @@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
>                 snprintf(b1, sizeof(b1), "%#x", hex);
>                 print_string(PRINT_JSON, key, NULL, b1);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, hex);
> +               color_fprintf(_get_fp(), color, fmt, hex);
>         }
>  }
>
> @@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
>                 else
>                         jsonw_string(_jw, b1);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, hex);
> +               color_fprintf(_get_fp(), color, fmt, hex);
>         }
>  }
>
> @@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
>                 else
>                         jsonw_null(_jw);
>         } else if (_IS_FP_CONTEXT(type)) {
> -               color_fprintf(_fp, color, fmt, value);
> +               color_fprintf(_get_fp(), color, fmt, value);
>         }
>  }
> --
> 2.14.1
>

^ permalink raw reply

* Re: Latest net-next from GIT panic
From: Eric Dumazet @ 2017-09-21  1:17 UTC (permalink / raw)
  To: Wei Wang
  Cc: Paweł Staszewski, Cong Wang, Linux Kernel Network Developers,
	Eric Dumazet
In-Reply-To: <CAEA6p_AiLvZxsyu+iOCrPzW-H28N=HBnWycy8w6gdZHFk2Q1dQ@mail.gmail.com>

On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
> > Thanks very much Pawel for the feedback.
> >
> > I was looking into the code (specifically IPv4 part) and found that in
> > free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> > fnhe_lock. I am wondering if that could cause some race condition on
> > fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
> > same dst could be happening.
> >
> > But as we call free_fib_info_rcu() only after the grace period, and
> > the lookup code which could potentially modify
> > fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
> > fine...
> >
> 
> Hi Pawel,
> 
> Could you try the following debug patch on top of net-next branch and
> reproduce the issue check if there are warning msg showing?
> 
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 93568bd0a352..82aff41c6f63 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry
> *dst, unsigned long time)
>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>  {
>         if (dst)
> -               atomic_inc(&dst->__refcnt);
> +               dst_hold(dst);
>         return dst;
>  }
> 
> Thanks.
> Wei
> 


Yes, we believe skb_dst_force() and skb_dst_force_safe() should be
unified  (to the 'safe' version)

We no longer have gc to protect from 0 -> 1 transition of dst refcount.

^ permalink raw reply

* [PATCH iproute2 v2] ip: ip_print: if no json obj is initialized default fp is stdout
From: Julien Fortin @ 2017-09-21  1:16 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nikolay, dsa, Julien Fortin

From: Julien Fortin <julien@cumulusnetworks.com>

The ip monitor didn't call `new_json_obj` (even for in non json context),
so the static FILE* _fp variable wasn't initialized, thus raising a
SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
paths won't have to call `new_json_obj`.

$ ip -t mon label link
fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
at libnetlink.c:761

Traceback can be seen when running the following command in a new terminal.
$ ip link add dev br0 type bridge

Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
 ip/ip_print.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/ip/ip_print.c b/ip/ip_print.c
index 4cd6a0bc..7eb4c6da 100644
--- a/ip/ip_print.c
+++ b/ip/ip_print.c
@@ -23,6 +23,11 @@ static FILE *_fp;
 #define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
 #define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
 
+static inline FILE *_get_fp(void)
+{
+	return _fp ? _fp : stdout;
+};
+
 void new_json_obj(int json, FILE *fp)
 {
 	if (json) {
@@ -91,7 +96,7 @@ void open_json_array(enum output_type type, const char *str)
 			jsonw_name(_jw, str);
 		jsonw_start_array(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -105,7 +110,7 @@ void close_json_array(enum output_type type, const char *str)
 		jsonw_end_array(_jw);
 		jsonw_pretty(_jw, true);
 	} else if (_IS_FP_CONTEXT(type)) {
-		fprintf(_fp, "%s", str);
+		fprintf(_get_fp(), "%s", str);
 	}
 }
 
@@ -126,7 +131,7 @@ void close_json_array(enum output_type type, const char *str)
 			else						\
 				jsonw_##type_name##_field(_jw, key, value); \
 		} else if (_IS_FP_CONTEXT(t)) {				\
-			color_fprintf(_fp, color, fmt, value);          \
+			color_fprintf(_get_fp(), color, fmt, value);	\
 		}							\
 	}
 _PRINT_FUNC(int, int);
@@ -149,7 +154,7 @@ void print_color_string(enum output_type type,
 		else
 			jsonw_string_field(_jw, key, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
 
@@ -170,7 +175,7 @@ void print_color_bool(enum output_type type,
 		else
 			jsonw_bool(_jw, value);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value ? "true" : "false");
+		color_fprintf(_get_fp(), color, fmt, value ? "true" : "false");
 	}
 }
 
@@ -189,7 +194,7 @@ void print_color_0xhex(enum output_type type,
 		snprintf(b1, sizeof(b1), "%#x", hex);
 		print_string(PRINT_JSON, key, NULL, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -208,7 +213,7 @@ void print_color_hex(enum output_type type,
 		else
 			jsonw_string(_jw, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, hex);
+		color_fprintf(_get_fp(), color, fmt, hex);
 	}
 }
 
@@ -228,6 +233,6 @@ void print_color_null(enum output_type type,
 		else
 			jsonw_null(_jw);
 	} else if (_IS_FP_CONTEXT(type)) {
-		color_fprintf(_fp, color, fmt, value);
+		color_fprintf(_get_fp(), color, fmt, value);
 	}
 }
-- 
2.14.1

^ permalink raw reply related

* Re: Latest net-next from GIT panic
From: Wei Wang @ 2017-09-21  1:09 UTC (permalink / raw)
  To: Paweł Staszewski
  Cc: Cong Wang, Eric Dumazet, Linux Kernel Network Developers,
	Eric Dumazet
In-Reply-To: <CAEA6p_DpSr84cYu3pXqoHV=V=daSsbiOcnzsF+Kt0r4Wv_jmdg@mail.gmail.com>

> Thanks very much Pawel for the feedback.
>
> I was looking into the code (specifically IPv4 part) and found that in
> free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> fnhe_lock. I am wondering if that could cause some race condition on
> fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
> same dst could be happening.
>
> But as we call free_fib_info_rcu() only after the grace period, and
> the lookup code which could potentially modify
> fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
> fine...
>

Hi Pawel,

Could you try the following debug patch on top of net-next branch and
reproduce the issue check if there are warning msg showing?

diff --git a/include/net/dst.h b/include/net/dst.h
index 93568bd0a352..82aff41c6f63 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry
*dst, unsigned long time)
 static inline struct dst_entry *dst_clone(struct dst_entry *dst)
 {
        if (dst)
-               atomic_inc(&dst->__refcnt);
+               dst_hold(dst);
        return dst;
 }

Thanks.
Wei


On Wed, Sep 20, 2017 at 3:09 PM, Wei Wang <weiwan@google.com> wrote:
>>>> bisected again and same result:
>>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>> Author: Wei Wang <weiwan@google.com>
>>>> Date:   Sat Jun 17 10:42:32 2017 -0700
>>>>
>>>>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>>
>>>>     With the previous preparation patches, we are ready to get rid of the
>>>>     dst gc operation in ipv4 code and release dst based on refcnt only.
>>>>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>>> calls
>>>>     to dst_free().
>>>>     At this point, all dst created in ipv4 code do not use the dst gc
>>>>     anymore and will be destroyed at the point when refcnt drops to 0.
>>>>
>>>>     Signed-off-by: Wei Wang <weiwan@google.com>
>>>>     Acked-by: Martin KaFai Lau <kafai@fb.com>
>>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M      net
>>>>
>>>> Will add now version 2 of patch from Eric and we will see
>>>>
>>>>
>>> after adding patch
>>> perf top catch
>>>    PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],
>>> (all, 40 CPUs)
>>>
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>>     60.95%  [kernel]            [k] dev_put.part.6
>>>      4.00%  [kernel]            [k] ixgbe_poll
>>>      3.63%  [kernel]            [k] irq_entries_start
>>>      1.22%  [kernel]            [k] fib_table_lookup
>>>      1.15%  [kernel]            [k] do_raw_spin_lock
>>>      1.05%  [kernel]            [k] ixgbe_xmit_frame_ring
>>>      1.04%  [kernel]            [k] lookup
>>>      0.87%  [kernel]            [k] eth_type_trans
>>>
>>>
>>> no panic on console - rebooting to check logs
>>>
>>>
>> Nothing logged
>>
>
> Thanks very much Pawel for the feedback.
>
> I was looking into the code (specifically IPv4 part) and found that in
> free_fib_info_rcu(), we call free_nh_exceptions() without holding the
> fnhe_lock. I am wondering if that could cause some race condition on
> fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
> same dst could be happening.
>
> But as we call free_fib_info_rcu() only after the grace period, and
> the lookup code which could potentially modify
> fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
> fine...
>
>
> On Wed, Sep 20, 2017 at 2:25 PM, Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>>
>>
>> W dniu 2017-09-20 o 23:24, Paweł Staszewski pisze:
>>
>>>
>>>
>>> W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze:
>>>>
>>>>
>>>>
>>>> W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:
>>>>>
>>>>>
>>>>>
>>>>> W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:
>>>>>>
>>>>>>
>>>>>>
>>>>>> W dniu 2017-09-20 o 20:36, Cong Wang pisze:
>>>>>>>
>>>>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet
>>>>>>> <eric.dumazet@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>>>>>>>>
>>>>>>>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>>>>>>>
>>>>>>>>> This is very odd.
>>>>>>>>>
>>>>>>>>> We only free netdevice in free_netdev() and it is only called when
>>>>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>>>>>>>> to be NULL.
>>>>>>>>
>>>>>>>> If there is a missing dev_hold() or one dev_put() in excess,
>>>>>>>> this would allow the netdev to be freed too soon.
>>>>>>>>
>>>>>>>> -> Use after free.
>>>>>>>> memory holding netdev could be reallocated-cleared by some other
>>>>>>>> kernel
>>>>>>>> user.
>>>>>>>>
>>>>>>> Sure, but only unregister could trigger a free. If there is no
>>>>>>> unregister,
>>>>>>> like what Pawel claims, then there is no free, the refcnt just goes to
>>>>>>> 0 but the memory is still there.
>>>>>>>
>>>>>> About possible mistake from my side with bisect - i can judge too early
>>>>>> that some bisect was good
>>>>>> the road was:
>>>>>> git bisect start
>>>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>>>>>> 'pinctrl-v4.13-1' of
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>>>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next'
>>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>>>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>>>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using
>>>>>> stack larger than 1024.
>>>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>>>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
>>>>>> 'udp-reduce-cache-pressure'
>>>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>>>>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch
>>>>>> 's390-net-updates-part-2'
>>>>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
>>>>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch
>>>>>> 'bpf-ctx-narrow'
>>>>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
>>>>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove
>>>>>> cp_outgoing
>>>>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
>>>>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add
>>>>>> TCP_MD5SIG_EXT socket option to set a key address prefix
>>>>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
>>>>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new
>>>>>> function dst_dev_put()
>>>>>>
>>>>>> And currently have this running for about 4 hours without problems.
>>>>>>
>>>>>>
>>>>>>
>>>>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
>>>>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove
>>>>>> DST_NOCACHE flag
>>>>>>
>>>>>> Here for sure - panic
>>>>>>
>>>>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
>>>>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
>>>>>> dst_hold_safe() properly
>>>>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
>>>>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call
>>>>>> dst_hold_safe() properly
>>>>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
>>>>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take
>>>>>> dst->__refcnt for insertion into fib6 tree
>>>>>>
>>>>>> im not 100% sure tor last two
>>>>>> Will test them again starting from
>>>>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put()
>>>>>> properly
>>>>>>
>>>>>>
>>>>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
>>>>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC
>>>>>> and remove the operation of dst_free()
>>>>>>
>>>>>>
>>>>>>
>>>>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4:
>>>>>> mark DST_NOGC and remove the operation of dst_free()
>>>>>>
>>>>>>
>>>>>>
>>>>> What i can say more
>>>>> I can reproduce this on any server with similar configuration
>>>>> the difference can be teamd instead of bonding
>>>>> ixgbe or i40e and mlx5
>>>>> Same problems
>>>>>
>>>>> vlans - more or less prefixes learned from bgp -> zebra -> netlink ->
>>>>> kernel
>>>>> But normally in lab when using only plain routing no bgpd and about 128
>>>>> vlans - with 128 routes - cant reproduce this - this apperas only with bgp -
>>>>> minimum where i can reproduce this was about 130k prefixes with about 286
>>>>> nexthops
>>>>>
>>>>>
>>>>>
>>>>>
>>>> bisected again and same result:
>>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>>> Author: Wei Wang <weiwan@google.com>
>>>> Date:   Sat Jun 17 10:42:32 2017 -0700
>>>>
>>>>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>>
>>>>     With the previous preparation patches, we are ready to get rid of the
>>>>     dst gc operation in ipv4 code and release dst based on refcnt only.
>>>>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>>> calls
>>>>     to dst_free().
>>>>     At this point, all dst created in ipv4 code do not use the dst gc
>>>>     anymore and will be destroyed at the point when refcnt drops to 0.
>>>>
>>>>     Signed-off-by: Wei Wang <weiwan@google.com>
>>>>     Acked-by: Martin KaFai Lau <kafai@fb.com>
>>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M      net
>>>>
>>>> Will add now version 2 of patch from Eric and we will see
>>>>
>>>>
>>> after adding patch
>>> perf top catch
>>>    PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],
>>> (all, 40 CPUs)
>>>
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>>     60.95%  [kernel]            [k] dev_put.part.6
>>>      4.00%  [kernel]            [k] ixgbe_poll
>>>      3.63%  [kernel]            [k] irq_entries_start
>>>      1.22%  [kernel]            [k] fib_table_lookup
>>>      1.15%  [kernel]            [k] do_raw_spin_lock
>>>      1.05%  [kernel]            [k] ixgbe_xmit_frame_ring
>>>      1.04%  [kernel]            [k] lookup
>>>      0.87%  [kernel]            [k] eth_type_trans
>>>
>>>
>>> no panic on console - rebooting to check logs
>>>
>>>
>> Nothing logged
>>

^ permalink raw reply related

* Re: [PATCH iproute2/net-next] tc: flower: support for matching MPLS labels
From: Stephen Hemminger @ 2017-09-21  1:09 UTC (permalink / raw)
  To: Simon Horman; +Cc: Jiri Pirko, Jamal Hadi Salim, Cong Wang, netdev, oss-drivers
In-Reply-To: <1505225175-13830-1-git-send-email-simon.horman@netronome.com>

On Tue, 12 Sep 2017 16:06:15 +0200
Simon Horman <simon.horman@netronome.com> wrote:

> From: Benjamin LaHaise <benjamin.lahaise@netronome.com>
> 
> This patch adds support to the iproute2 tc filter command for matching MPLS
> labels in the flower classifier.  The ability to match the Time To Live,
> Bottom Of Stack, Traffic Control and Label fields are added as options to
> the flower filter.
> 
> e.g.:
>   tc filter add dev eth0 protocol 0x8847 parent ffff: \
>     flower mpls_label 1 mpls_tc 2 mpls_ttl 3 mpls_bos 0 \
>     action drop
> 
> Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Looks good, applied

^ permalink raw reply

* Re: [PATCH iproute2 v2] ip: ipaddress: fix missing space after prefixlen
From: Stephen Hemminger @ 2017-09-21  1:06 UTC (permalink / raw)
  To: Julien Fortin; +Cc: netdev, roopa, nikolay, dsa, sd
In-Reply-To: <20170920202651.95860-1-julien@cumulusnetworks.com>

On Wed, 20 Sep 2017 13:26:51 -0700
Julien Fortin <julien@cumulusnetworks.com> wrote:

> From: Julien Fortin <julien@cumulusnetworks.com>
> 
> Fixes: d0e720111aad2 ("ip: ipaddress.c: add support for json output")
> Reported-by: Sabrina Dubroca <sd@queasysnail.net>
> Reviewed-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>

Applied, thanks for spotting this.

^ permalink raw reply

* Re: [patch net-next 00/16] mlxsw: Multicast flood update
From: David Miller @ 2017-09-21  1:05 UTC (permalink / raw)
  To: jiri; +Cc: netdev, nogahf, idosch, mlxsw
In-Reply-To: <20170920141516.1402-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 20 Sep 2017 16:15:00 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Nogah says:
> 
> Currently, there are four erroneous flows in MC flood:
> 1. When MC is disabled it affects only the flood table for unregistered
>    MC packets, but packets that match an entry in the MDB are unaffected.
> 2. When MC is disabled, MC packets are being sent to all the ports in the
>    bridge (like BC and link-local MC packets) regardless of the designated
>    flag (BR_MCAST_FLAG).
> 3. When a port is being deleted from a bridge it might remain in the MDB.
> 4. When MC is enabled packets are flooded to the mrouter ports only if
>    they don't match any entry in the MDB, when they should always be
>    flooded to them.
> 
> What these problems have in common is the discrepancy between how the
> hardware handles MDB and mcast flood, and how the driver does it. Each
> of these problems needs fixing either in the MDB code, or in mcast flood
> code, and some in both.
> 
> Patches 1-6 change the way the MDB is handled in the driver to make the
> following changes easier.
> Patches 7-8 fix problem number 1 by removing the MDB from the HW when MC
> is being disabled and restoring it when it is being enabled.
> Patches 9-10 fix problem number 2 by offloading the flood table by the
> appropriate flag.
> Patch 11 fixes problem number 3 by adding MDB flush to the port removal.
> Patches 12-14 fix problem number 4 by adding the mrouter ports to every
> MDB entry in the HW to mimic the wanted behaviour.

Looks good, series applied, thanks!

^ permalink raw reply

* Re: [PATCH 1/1] ip: ip_print: if no json obj is initialized default fp is stdout
From: Stephen Hemminger @ 2017-09-21  1:04 UTC (permalink / raw)
  To: Julien Fortin; +Cc: netdev, roopa, nikolay, dsa
In-Reply-To: <20170920221914.92740-1-julien@cumulusnetworks.com>

On Wed, 20 Sep 2017 15:19:14 -0700
Julien Fortin <julien@cumulusnetworks.com> wrote:

> From: Julien Fortin <julien@cumulusnetworks.com>
> 
> The ip monitor didn't call `new_json_obj` (even for in non json context),
> so the static FILE* _fp variable wasn't initialized, thus raising a
> SIGSEGV in ipaddress.c. This patch should fix this issue for good, new
> paths won't have to call `new_json_obj`.
> 
> $ ip -t mon label link
>     fmt=fmt@entry=0x45460d “%d: “, value=<optimized out>) at ip_print.c:132
>     #3  0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d “%d: “, key=0x4545fc “ifindex”, t=PRINT_ANY) at ip_common.h:189
>     #4  print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
>     #5  0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
>     #6  0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
>         at libnetlink.c:761
> 	#7  0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
> 	#8  0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 “mon”, argc=3, argv=0x7fffffffe588) at ip.c:116
> 	#9  0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
> 
> Traceback can be seen when running the following command in a new terminal.
> $ ip link add dev br0 type bridge
> 
> Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
> ---

The fix looks correct, but the patch has minor style format issue.

ERROR: "foo* bar" should be "foo *bar"
#54: FILE: ip/ip_print.c:26:
+static inline FILE* _get_fp(void)

^ permalink raw reply

* Re: [PATCH iproute2] tc: fix typo in tc-tcindex man page
From: Stephen Hemminger @ 2017-09-21  1:01 UTC (permalink / raw)
  To: Davide Caratti; +Cc: netdev
In-Reply-To: <de07579ec6e41c65103ada6733072874af95773e.1505397960.git.dcaratti@redhat.com>

On Thu, 14 Sep 2017 17:00:46 +0200
Davide Caratti <dcaratti@redhat.com> wrote:

> fix mis-typed 'pass_on' keyword.
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied, thanks Davide

^ permalink raw reply

* Re: [iproute2 PATCH] tc/mirred: Clean up white-space noise
From: Stephen Hemminger @ 2017-09-21  0:59 UTC (permalink / raw)
  To: Amritha Nambiar; +Cc: netdev, alexander.h.duyck
In-Reply-To: <150529711361.57333.10151730051655064515.stgit@anamdev.jf.intel.com>

On Wed, 13 Sep 2017 03:05:13 -0700
Amritha Nambiar <amritha.nambiar@intel.com> wrote:

> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
> ---
>  include/linux/tc_act/tc_mirred.h |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/tc_act/tc_mirred.h b/include/linux/tc_act/tc_mirred.h
> index 3d7a2b3..69038c2 100644
> --- a/include/linux/tc_act/tc_mirred.h
> +++ b/include/linux/tc_act/tc_mirred.h

All files in include/linux are perodically refreshed from the uapi directory
in the kernel source.

Please re-submit this as a patch to net-next in kernel against include/uapi/tc_act/tc_mirred.h
and then iproute2 will pick it up in next pass.

^ permalink raw reply

* Re: [PATCH iproute2] Add information about COLORFGBG to ip.8 man page
From: Stephen Hemminger @ 2017-09-21  0:56 UTC (permalink / raw)
  To: Roland Hopferwieser; +Cc: netdev
In-Reply-To: <873bbae3-db89-b572-748b-cfa62922f883@ica.jku.at>

On Fri, 15 Sep 2017 22:04:16 +0200
Roland Hopferwieser <rhopfer@ica.jku.at> wrote:

> diff --git a/man/man8/ip.8 b/man/man8/ip.8
> index ae018fdf..2a27a56e 100644
> --- a/man/man8/ip.8
> +++ b/man/man8/ip.8
> @@ -187,7 +187,8 @@ executes specified command over all objects, it 
> depends if command supports this
> 
>   .TP
>   .BR "\-c" , " -color"
> -Use color output.
> +Use color output. The color palette is affected by the COLORFGBG 
> environment variable, which typically has the form "fg;bg".
> +If "bg" is set to 0-6 or 8, the dark color palette is used.
> 
>   .TP
>   .BR "\-t" , " \-timestamp"

Your patch was damaged by the mailer you used.
Please fix and resubmit.

$ patch -p1 < ~/Downloads/iproute2-Add-information-about-COLORFGBG-to-ip.8-man-page.patch 
patching file man/man8/ip.8
patch: **** malformed patch at line 23: depends if command supports this

^ permalink raw reply

* Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
From: Tom Herbert @ 2017-09-21  0:55 UTC (permalink / raw)
  To: Harald Welte
  Cc: Andreas Schultz, Tom Herbert, David S. Miller,
	Linux Kernel Network Developers, Pablo Neira Ayuso, Rohit Seth
In-Reply-To: <20170921001313.n2uo56oxmphcvd5g@nataraja>

On Wed, Sep 20, 2017 at 5:13 PM, Harald Welte <laforge@gnumonks.org> wrote:
> Hi Tom,
>
> On Wed, Sep 20, 2017 at 09:24:07AM -0700, Tom Herbert wrote:
>> On Wed, Sep 20, 2017 at 9:07 AM, Andreas Schultz <aschultz@tpip.net> wrote:
>> > GTP isn't special, I just don't like to have testing only features in there
>> > when the same goal can be reached without having to add extra stuff. Adding
>> > code that is not going to be useful in real production setups (or in this
>> > case would even break production setups when enabled accidentally) makes the
>> > implementation more complex than it needs to be.
>>
>> Well, you could make the same argument that allowing GTP to configured
>> as standalone interface is a problem since GTP is only allowed to be
>> with used with GTP-C. But, then we have something in the kernel that
>> the community is expected to support, but requires jumping through a
>> whole bunch of hoops just to run a simple netperf.
>
> "A whole bunch of hoops" without your new interface would consist of
> running a single command-line program that is supplied with libgtpnl.
> This is not a complete 3GPP network, but a simple libmnl-based helper
> library with no other depenencies.
>
You have the point of view of someone who has a lot of experience
dealing with this protocol. Try to imagine if you were some random
kernel network programmer with no experience in the area. If they
happen to find a one-off bug and want to do the right thing by running
a test, you want to make that as easy as possible. From that
perspective, building protocol specific libraries and finding the
right cmd line to run is significant hoops (I can attest to this).
There are other examples in the kernel of systems bigger than GTP that
require a whole lot of effort just to run a simple test; you'll notice
for those it's rare that best developers ever bother to look at them
unless they're making a global change that affects the code. We don't
want GTP to take be like that!

> I'm not neccessarily against introducing features like the 'standalone
> interface configuration'.  However, we must make sure that any
> significant new feature contributions like IPv6 are tested in a
> "realistic setup" and not just using those 'interfaces added for easy
> development'.  Also, I would argue those 'interfaces added for easy
> deveopment/benchmarking' should probably be clearly marked as such to
> avoid raising the impression that this is what leads to a
> standard-conforming / production-type setup.
>
Given the obvious complexity of running a real GTP stack, I don't
think we have to worry about this. In order to test a "realistic
setup" a whole bunch of other support is needed. So the forward
looking question now is how to get to be able to run a "realistic
setup"?

Tom

^ permalink raw reply

* Re: [iproute PATCH v2] ipaddress: Fix segfault in 'addr showdump'
From: Stephen Hemminger @ 2017-09-21  0:54 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Hangbin Liu, netdev, Julien Fortin
In-Reply-To: <20170913092034.7002-1-phil@nwl.cc>

On Wed, 13 Sep 2017 11:20:34 +0200
Phil Sutter <phil@nwl.cc> wrote:

> Obviously, 'addr showdump' feature wasn't adjusted to json output
> support. As a consequence, calls to print_string() in print_addrinfo()
> tried to dereference a NULL FILE pointer.
> 
> Cc: Julien Fortin <julien@cumulusnetworks.com>
> Fixes: d0e720111aad2 ("ip: ipaddress.c: add support for json output")
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> --
> Changes since v1:
> Align json output with that of 'ip -j addr show':
> - Interface index label is 'ifindex', not 'index' and it doesn't belong
>   to 'addr_info' array.
> - Create one 'addr_info' array per dumped address, not one for all.
> ---
>  ip/ipaddress.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 

Looks good, applied.

^ permalink raw reply

* Re: [PATCH iproute2] tc: fq: support low_rate_threshold attribute
From: Stephen Hemminger @ 2017-09-21  0:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Neal Cardwell, Soheil Hassas Yeganeh
In-Reply-To: <1504905179.15310.101.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, 08 Sep 2017 14:12:59 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> TCA_FQ_LOW_RATE_THRESHOLD sch_fq attribute was added in linux-4.9
> 
> Tested:
> 
> lpaa5:/tmp# tc -qd add dev eth1 root fq
> lpaa5:/tmp# tc -s qd sh dev eth1
> qdisc fq 8003: root refcnt 5 limit 10000p flow_limit 1000p buckets 4096 \
>  orphan_mask 4095 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3648 \
>  initial_quantum 18240 low_rate_threshold 550Kbit refill_delay 40.0ms
>  Sent 62139 bytes 395 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   116 flows (114 inactive, 0 throttled)
>   1 gc, 0 highprio, 0 throttled
>     
> lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
> 99th Percentile Latency Microseconds
> 7081
>     
> lpaa5:/tmp# tc qd replace dev eth1 root fq low_rate_threshold 10Mbit
> lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 500000 -r 300,300 -o P99_LATENCY
> 99th Percentile Latency Microseconds
> 858
> 
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox