Netdev List

* Re: [PATCH v4 net-next] fix unsafe set_memory_rw from softirq
From: David Miller @ 2013-10-07 19:17 UTC
  To: eric.dumazet
  Cc: ast, dborkman, edumazet, heiko.carstens, linux-arm-kernel,
	linuxppc-dev, linux-s390, netdev
In-Reply-To: <1381078592.12191.0.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 06 Oct 2013 09:56:32 -0700

> On Fri, 2013-10-04 at 00:14 -0700, Alexei Starovoitov wrote:
>> on x86 system with net.core.bpf_jit_enable = 1
> 
>> cannot reuse jited filter memory, since it's readonly,
>> so use original bpf insns memory to hold work_struct
>> 
>> defer kfree of sk_filter until jit completed freeing
>> 
>> tested on x86_64 and i386
>> 
>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

I've decided to apply this to 'net', thanks.

* Re: bug in passing file descriptors
From: Steve Rago @ 2013-10-07 19:17 UTC
  To: David Miller; +Cc: luto, netdev, mtk.manpages, ebiederm
In-Reply-To: <20131007.151233.2237348893254566536.davem@davemloft.net>

On 10/07/2013 03:12 PM, David Miller wrote:
> From: Steve Rago <sar@nec-labs.com>
> Date: Mon, 7 Oct 2013 15:06:02 -0400
>
>> Maybe.  So a client expecting to receive x bytes of control
>> information should make sure their buffer is at least CMSG_SPACE(x)
>> bytes long instead of CMSG_LEN(x) bytes long, because you feel
>> compelled to copy the final padding from kernel space to user space?
>> Seems wrong to me.  IMHO, the final padding should only come into play
>> when calculating where the next header should begin.
>
> Yes, all control messages must be aligned to, and be of a length of a
> multiple of, "sizeof(long)".
>
> This is the only correct way to program control messages.
>

Except when sizeof(long) can change and you need to maintain binary
compatibility with older applications.  x86 comes to mind as a
relevant example: used to be 32 bits, but is 64 now.

Steve

* Re: [PATCH 1/2] net: ethernet: cpsw: Search childs for slave nodes
From: David Miller @ 2013-10-07 19:19 UTC
  To: mpa; +Cc: florian, mugunthanvnm, linux-arm-kernel, netdev, kernel
In-Reply-To: <1380890680-30941-1-git-send-email-mpa@pengutronix.de>

From: Markus Pargmann <mpa@pengutronix.de>
Date: Fri,  4 Oct 2013 14:44:39 +0200

> The current implementation searches the whole DT for nodes named
> "slave".
> 
> This patch changes it to search only child nodes for slaves.
> 
> Signed-off-by: Markus Pargmann <mpa@pengutronix.de>

Applied.

* Re: [PATCH 2/2] net/ethernet: cpsw: DT read bool dual_emac
From: David Miller @ 2013-10-07 19:19 UTC
  To: mpa; +Cc: florian, mugunthanvnm, linux-arm-kernel, netdev, kernel
In-Reply-To: <1380890680-30941-2-git-send-email-mpa@pengutronix.de>

From: Markus Pargmann <mpa@pengutronix.de>
Date: Fri,  4 Oct 2013 14:44:40 +0200

> Signed-off-by: Markus Pargmann <mpa@pengutronix.de>

Applied.

* Re: [PATCH] net: Separate the close_list and the unreg_list v2
From: David Miller @ 2013-10-07 19:22 UTC
  To: ebiederm; +Cc: fruggeri, netdev
In-Reply-To: <87txgv9ltu.fsf@xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 05 Oct 2013 19:26:05 -0700

> 
> Separate the unreg_list and the close_list in dev_close_many, preventing
> dev_close_many from permuting the unreg_list.  The permutations of the
> unreg_list have resulted in cases where the loopback device is accessed
> after it has been freed in code such as dst_ifdown, resulting in subtle
> memory corruption.
> 
> This is the second bug from sharing the storage between the close_list
> and the unreg_list.  The issues that crop up with sharing are
> apparently too subtle to show up in normal testing or usage, so let's
> forget about being clever and use two separate lists.
> 
> v2: Make all callers pass in a close_list to dev_close_many
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> 
> Sending the complete diff because this version is actually more
> readable and more obviously correct.

I'll apply this, thanks Eric.

* Re: [PATCH] ipv4: fix ineffective source address selection
From: David Miller @ 2013-10-07 19:27 UTC
  To: jbenc; +Cc: netdev
In-Reply-To: <245895a0777442b56ecea1453be041aa1b31c5a2.1380898983.git.jbenc@redhat.com>

From: Jiri Benc <jbenc@redhat.com>
Date: Fri,  4 Oct 2013 17:04:48 +0200

> When sending out multicast messages, the source address in inet->mc_addr is
> ignored and rewritten by an autoselected one. This is caused by a typo in
> commit 813b3b5db831 ("ipv4: Use caller's on-stack flowi as-is in output
> route lookups").
> 
> Signed-off-by: Jiri Benc <jbenc@redhat.com>

My bad :-)  Applied and queued up for -stable, thanks!

* Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
From: David Miller @ 2013-10-07 19:36 UTC
  To: paul.durrant
  Cc: xen-devel, netdev, xixiong, msw, annie.li, wei.liu2, Ian.Campbell
In-Reply-To: <1380903983-27429-1-git-send-email-paul.durrant@citrix.com>

From: Paul Durrant <paul.durrant@citrix.com>
Date: Fri, 4 Oct 2013 17:26:23 +0100

> Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> xenvif_count_skb_slots() for skbs with a linear area spanning a page
> boundary. The alignment of skb->data needs to be taken into account, not
> just the head length. This patch fixes the issue by dry-running the code
> from xenvif_gop_skb() (and adjusting the comment above the function to note
> that).
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

There seems to be a lot of back and forth about what is the most
desirable way forward wrt. this commit and another similar one.

Please advise.
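
The point in the quoted commit message, that the alignment of skb->data
matters and not just the head length, can be illustrated with a small
userspace sketch. This is not the xen-netback code: the 4096-byte page
size and both function names are assumptions made for illustration only,
and the real xenvif_gop_skb() logic involves more than this arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration */

/* Naive count: looks only at the length of the linear area. */
static size_t pages_by_length_only(size_t len)
{
    return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Alignment-aware count: also accounts for where the data starts
 * inside its first page. */
static size_t pages_spanned(uintptr_t start, size_t len)
{
    size_t offset = start % SKETCH_PAGE_SIZE;

    /* Guard the rounding arithmetic below against overflow. */
    assert(len <= SIZE_MAX - 2 * (size_t)SKETCH_PAGE_SIZE);

    if (len == 0)
        return 0;
    return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

With these assumptions, a 200-byte linear area starting at offset 4000
within a page touches two pages, while the length-only count reports one.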

* Re: pull request: wireless-next 2013-10-04
From: David Miller @ 2013-10-07 19:41 UTC
  To: linville-2XuSBdqkA4R54TAoqtyWWQ
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20131004181607.GK3142-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

From: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Date: Fri, 4 Oct 2013 14:16:07 -0400

> Please pull this batch of patches intended for the 3.13 stream!

Pulled, thanks a lot John.

* Re: bug in passing file descriptors
From: David Miller @ 2013-10-07 19:42 UTC
  To: sar; +Cc: luto, netdev, mtk.manpages, ebiederm
In-Reply-To: <525308C4.1030007@nec-labs.com>

From: Steve Rago <sar@nec-labs.com>
Date: Mon, 7 Oct 2013 15:17:24 -0400

> Except when sizeof(long) can change and you need to maintain binary
> compatibility with older applications.  x86 comes to mind as a
> relevant example: used to be 32 bits, but is 64 now.

There is no compatibility issue.

32-bit tasks will always see the 4-byte align/length.
64-bit tasks will always see the 8-byte align/length.
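
The sizing question in this thread can be sketched in userspace C as
follows. CMSG_LEN() and CMSG_SPACE() are the standard macros from
<sys/socket.h>; the three helper names are hypothetical and exist only
for this illustration. The sketch shows the difference between the
length recorded in a control message header and the buffer space that
includes the trailing padding.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Value that goes into cmsg_len for `payload` bytes of data:
 * aligned header plus data, without trailing padding. */
static size_t cmsg_len_for(size_t payload)
{
    return CMSG_LEN(payload);
}

/* Buffer bytes to reserve for one control message carrying
 * `payload` bytes: aligned header plus data plus trailing padding. */
static size_t cmsg_space_for(size_t payload)
{
    return CMSG_SPACE(payload);
}

/* Trailing padding that CMSG_SPACE() adds on top of CMSG_LEN(). */
static size_t cmsg_tail_padding(size_t payload)
{
    assert(CMSG_SPACE(payload) >= CMSG_LEN(payload));
    return CMSG_SPACE(payload) - CMSG_LEN(payload);
}
```

On x86_64 Linux with glibc, a single file descriptor (payload of
sizeof(int)) gives a CMSG_LEN of 20 and a CMSG_SPACE of 24, so a receive
buffer sized with CMSG_SPACE() has room for the padded message.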

* [PATCH] pkt_sched: fq: fix typo for initial_quantum
From: Eric Dumazet @ 2013-10-07 19:50 UTC
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

TCA_FQ_INITIAL_QUANTUM should set q->initial_quantum

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_fq.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index a2fef8b..d8cb3b5 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -656,7 +656,7 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt)
 		q->quantum = nla_get_u32(tb[TCA_FQ_QUANTUM]);
 
 	if (tb[TCA_FQ_INITIAL_QUANTUM])
-		q->quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
+		q->initial_quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
 
 	if (tb[TCA_FQ_FLOW_DEFAULT_RATE])
 		q->flow_default_rate = nla_get_u32(tb[TCA_FQ_FLOW_DEFAULT_RATE]);

* Re: IPv6 kernel warning
From: Yuchung Cheng @ 2013-10-07 19:51 UTC
  To: dormando
  Cc: Michele Baldessari, Russell King - ARM Linux, netdev,
	Neal Cardwell, Nandita Dukkipati
In-Reply-To: <alpine.DEB.2.02.1310071111240.17658@dtop>

On Mon, Oct 7, 2013 at 11:13 AM, dormando <dormando@rydia.net> wrote:
>
> > >
> > > there's been multiple reports about this one:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=989251
> > > http://bugzilla.kernel.org/show_bug.cgi?id=60779
> > >
> > > Could you try Yuchung's debug patch?
> > > http://www.spinics.net/lists/netdev/msg250193.html
> > Yes it looks like the same bug. Please try that patch to help identify
> > this elusive bug.
> >
>
> Hi!
>
> We get this one a few times a day in production. Here's a warning with
> your debug trace in the line immediately following:
> (I censored a few things)
>
>  [125311.721950] ------------[ cut here ]------------
>  [125311.721961] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xb58/0xc80()
>  [125311.721962] Modules linked in: bridge ip_vs macvlan coretemp crc32_pclmul ghash_clmulni_intel gpio_ich ipmi_watchdog microcode ipmi_devintf sb_edac lpc_ich edac_core mfd_core ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat ixgbe igb mdio i2c_algo_bit ptp pps_core
>  [125311.721981] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.13 #1
>  [125311.721982] Hardware name: Supermicro XXXXXXXXXXX, BIOS 1.1 10/03/2012
>  [125311.721984]  ffffffff81a82007 ffff88407fc63958 ffffffff816bb9cc ffff88407fc63998
>  [125311.721986]  ffffffff8104b940 00ff8840ad904f82 ffff883b8a165b00 0000000000004120
>  [125311.721989]  0000000000000001 0000000000000019 0000000000000000 ffff88407fc639a8
>  [125311.721991] Call Trace:
>  [125311.721992]  <IRQ>  [<ffffffff816bb9cc>] dump_stack+0x19/0x1d
>  [125311.722002]  [<ffffffff8104b940>] warn_slowpath_common+0x70/0xa0
>  [125311.722005]  [<ffffffff8104b98a>] warn_slowpath_null+0x1a/0x20
>  [125311.722007]  [<ffffffff81616db8>] tcp_fastretrans_alert+0xb58/0xc80
>  [125311.722011]  [<ffffffff8161891f>] tcp_ack+0x6df/0xe90
>  [125311.722016]  [<ffffffff8164e0ca>] ? ipt_do_table+0x22a/0x680
>  [125311.722018]  [<ffffffff816194b3>] ? tcp_validate_incoming+0x63/0x320
>  [125311.722021]  [<ffffffff8161a55c>] tcp_rcv_established+0x2cc/0x810
>  [125311.722023]  [<ffffffff81622c84>] tcp_v4_do_rcv+0x254/0x4f0
>  [125311.722025]  [<ffffffff816245ac>] tcp_v4_rcv+0x5fc/0x750
>  [125311.722027]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
>  [125311.722032]  [<ffffffff815df3ad>] ? nf_hook_slow+0x7d/0x160
>  [125311.722034]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
>  [125311.722036]  [<ffffffff815fface>] ip_local_deliver_finish+0xce/0x250
>  [125311.722037]  [<ffffffff815ffc9c>] ip_local_deliver+0x4c/0x80
>  [125311.722039]  [<ffffffff815ff329>] ip_rcv_finish+0x119/0x360
>  [125311.722040]  [<ffffffff815ff8e0>] ip_rcv+0x230/0x350
>  [125311.722046]  [<ffffffff815b4067>] __netif_receive_skb_core+0x477/0x600
>  [125311.722049]  [<ffffffff815b4217>] __netif_receive_skb+0x27/0x70
>  [125311.722051]  [<ffffffff815b4354>] process_backlog+0xf4/0x1e0
>  [125311.722053]  [<ffffffff815b4b45>] net_rx_action+0xf5/0x250
>  [125311.722056]  [<ffffffff81053a5f>] __do_softirq+0xef/0x270
>  [125311.722058]  [<ffffffff81053cb5>] irq_exit+0x95/0xa0
>  [125311.722062]  [<ffffffff816c8f26>] do_IRQ+0x66/0xe0
>  [125311.722065]  [<ffffffff816bf62a>] common_interrupt+0x6a/0x6a
>  [125311.722065]  <EOI>  [<ffffffff8100abf1>] ? default_idle+0x21/0xc0
>  [125311.722082]  [<ffffffff8100a54f>] arch_cpu_idle+0xf/0x20
>  [125311.722086]  [<ffffffff8108f353>] cpu_startup_entry+0xb3/0x230
>  [125311.722091]  [<ffffffff816b439e>] start_secondary+0x1dc/0x1e3
>  [125311.722093] ---[ end trace e77cd5ba583fcbe9 ]---
>  [125311.722096] 355.355.1.355:22496 F0x4120 S1 s7 IF25+17-1-24f0 ur57 rr3 rt0 um0 hs23120 nxt23120
>
> It's been happening with all 3.10 kernels, and the one above is .13 as
> stated in the trace.

Thanks! could you post the output of `sysctl -a |grep tcp`?

I suspect tcp_process_tlp_ack() should not revert state to Open
directly, but should call tcp_try_keep_open() instead, similar to all the
undo processing in the tcp_fastretrans_alert(): after
tcp_end_cwnd_reduction(), the process (E) falls back to check other
stats before moving to CA_Open.


index 9c62257..9012b42 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack,
                        tcp_init_cwnd_reduction(sk, true);
                        tcp_set_ca_state(sk, TCP_CA_CWR);
                        tcp_end_cwnd_reduction(sk);
-                       tcp_set_ca_state(sk, TCP_CA_Open);
+                       tcp_try_keep_open(sk);
                        NET_INC_STATS_BH(sock_net(sk),
                                         LINUX_MIB_TCPLOSSPROBERECOVERY);
                }

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
From: David Miller @ 2013-10-07 19:52 UTC
  To: nhorman; +Cc: netdev, john.r.fastabend, andy
In-Reply-To: <1380917405-23801-2-git-send-email-nhorman@tuxdriver.com>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri,  4 Oct 2013 16:10:04 -0400

> @@ -426,9 +426,12 @@ struct sk_buff {
>  	char			cb[48] __aligned(8);
>  
>  	unsigned long		_skb_refdst;
> -#ifdef CONFIG_XFRM
> -	struct	sec_path	*sp;
> -#endif
> +
> +	union {
> +		struct	sec_path	*sp;
> +		void 			*accel_priv;
> +	};
> +

I'm not 100% sure these two things are really mutually exclusive.

What if bridging ebtables does an input route lookup?  That can
populate the security path.

Also, why have you not added this to the usual netdev_ops and
hw_features?
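
The concern about mutual exclusivity comes down to how a union works:
both members occupy the same storage, so setting one overwrites the
other. A minimal userspace sketch of that effect follows. The struct and
function names are hypothetical stand-ins, not the kernel's sk_buff or
sec_path definitions.

```c
#include <assert.h>
#include <stddef.h>

struct sec_path_sketch { int len; };   /* stand-in for struct sec_path */

struct skb_sketch {
    unsigned long refdst;
    union {                             /* anonymous union, as in the quoted hunk */
        struct sec_path_sketch *sp;
        void *accel_priv;
    };
};

/* Returns 1 if storing accel_priv changed the value seen through sp. */
static int accel_priv_clobbers_sp(void)
{
    static struct sec_path_sketch path;
    static int accel_ctx;
    struct skb_sketch skb = { 0 };

    /* The sketch assumes both pointer types have the same size. */
    assert(sizeof(skb.sp) == sizeof(skb.accel_priv));

    skb.sp = &path;              /* e.g. a lookup populated the security path */
    skb.accel_priv = &accel_ctx; /* later, the acceleration pointer is set */

    return (void *)skb.sp != (void *)&path;
}
```

If any code path can populate sp on a packet that also needs accel_priv,
one of the two pointers is lost, which is the situation the question
about ebtables input route lookups is probing.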

* Re: [PATCH net-next] net: fujitsu: Remove ISA depdendency from Kconfig
From: David Miller @ 2013-10-07 19:53 UTC
  To: tedheadster; +Cc: netdev
In-Reply-To: <1380920588-4121-1-git-send-email-tedheadster@gmail.com>

From: Matthew Whitehead <tedheadster@gmail.com>
Date: Fri,  4 Oct 2013 17:03:08 -0400

> There no longer are ISA drivers in the fujitsu directory, so remove the
> dependency from the Kconfig.
> 
> Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>

Applied, thank you.

* Re: [PATCH] can: dev: fix nlmsg size calculation in can_get_size()
From: David Miller @ 2013-10-07 19:56 UTC
  To: mkl; +Cc: netdev, linux-can, kernel, wg
In-Reply-To: <1381001117-19624-1-git-send-email-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Sat,  5 Oct 2013 21:25:17 +0200

> This patch fixes the calculation of the nlmsg size, by adding the missing
> nla_total_size().
> 
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Applied and queued up for -stable, thanks.
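
Why a missing nla_total_size() undercounts can be shown with a userspace
sketch of the netlink attribute arithmetic. The constants below mirror
the 4-byte attribute alignment and 4-byte attribute header used by
netlink; the *_sketch names are hypothetical re-implementations written
for this example, and the payload sizes in the usage note are made up
rather than taken from can_get_size().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NLA_ALIGNTO 4u  /* netlink attributes are 4-byte aligned */
#define SKETCH_NLA_HDRLEN  4u  /* attribute header: u16 length + u16 type */

/* Round len up to the attribute alignment. */
static size_t nla_align_sketch(size_t len)
{
    return (len + SKETCH_NLA_ALIGNTO - 1) & ~(size_t)(SKETCH_NLA_ALIGNTO - 1);
}

/* Space one attribute occupies in a message: header + payload + padding. */
static size_t nla_total_size_sketch(size_t payload)
{
    return nla_align_sketch(SKETCH_NLA_HDRLEN + payload);
}

/* Space needed for n attributes with the given payload sizes. */
static size_t msg_size_sketch(const size_t *payloads, size_t n)
{
    size_t total = 0;

    assert(payloads != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += nla_total_size_sketch(payloads[i]);
    return total;
}
```

For two attributes with payloads of 4 and 6 bytes, the raw payload sum
is 10 bytes, but the attributes occupy 8 + 12 = 20 bytes once headers
and padding are counted.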

* Re: IPv6 kernel warning
From: dormando @ 2013-10-07 19:56 UTC
  To: Yuchung Cheng
  Cc: Michele Baldessari, Russell King - ARM Linux, netdev,
	Neal Cardwell, Nandita Dukkipati
In-Reply-To: <CAK6E8=czoej81t=-J=gjjyQiGVbZ0qiNKBbeRVSWYtweXfSRNQ@mail.gmail.com>

On Mon, 7 Oct 2013, Yuchung Cheng wrote:

> On Mon, Oct 7, 2013 at 11:13 AM, dormando <dormando@rydia.net> wrote:
> > [...]
>
> Thanks! could you post the output of `sysctl -a |grep tcp`?
>
> I suspect tcp_process_tlp_ack() should not revert state to Open
> directly, but calling tcp_try_keep_open() instead, similar to all the
> undo processing in the tcp_fastretrans_alert(): after
> tcp_end_cwnd_reduction(), the process (E) falls back to check other
> stats before moving to CA_Open.
>
>
> index 9c62257..9012b42 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack,
>                         tcp_init_cwnd_reduction(sk, true);
>                         tcp_set_ca_state(sk, TCP_CA_CWR);
>                         tcp_end_cwnd_reduction(sk);
> -                       tcp_set_ca_state(sk, TCP_CA_Open);
> +                       tcp_try_keep_open(sk);
>                         NET_INC_STATS_BH(sock_net(sk),
>                                          LINUX_MIB_TCPLOSSPROBERECOVERY);
>                 }
>

Should I apply this and see if the warning stops?

net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_allowed_congestion_control = cubic reno
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_available_congestion_control = cubic reno westwood
net.ipv4.tcp_base_mss = 512
net.ipv4.tcp_challenge_ack_limit = 100
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_dma_copybreak = 262144
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_early_retrans = 3
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_fack = 1
net.ipv4.tcp_fastopen = 0
net.ipv4.tcp_fastopen_key = 009dc92c-82e3e514-d440ed23-c49b1a89
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_frto = 0
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_limit_output_bytes = 131072
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_max_orphans = 2000000
net.ipv4.tcp_max_ssthresh = 0
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_mem = 6188001	8250670	12376002
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_mtu_probing = 0
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_rmem = 4096	87380	16777216
net.ipv4.tcp_sack = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_syn_retries = 6
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_thin_dupack = 0
net.ipv4.tcp_thin_linear_timeouts = 0
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tso_win_divisor = 3
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_user_cwnd_max = 20
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096	65536	16777216
net.ipv4.tcp_workaround_signed_windows = 0
net.ipv4.vs.secure_tcp = 0

* Re: [PATCH] net: Update the sysctl permissions handler to test effective uid/gid
From: David Miller @ 2013-10-07 19:58 UTC
  To: ebiederm; +Cc: sandeen, akpm, security, pmatouse, netdev, keescook
In-Reply-To: <87y567lbj1.fsf@xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 05 Oct 2013 13:15:30 -0700

> 
> On Tue, 20 Aug 2013 11:40:04 -0500 Eric Sandeen <sandeen@redhat.com> wrote:
>> This was brought up in a Red Hat bug (which may be marked private, I'm sorry):
>>
>> Bug 987055 - open O_WRONLY succeeds on some root owned files in /proc for process running with unprivileged EUID
>>
>> "On RHEL7 some of the files in /proc can be opened for writing by an unprivileged EUID."
>>
>> The flaw existed upstream as well last I checked.
>>
>> This commit in kernel v3.8 caused the regression:
>>
>> commit cff109768b2d9c03095848f4cd4b0754117262aa
>> Author: Eric W. Biederman <ebiederm@xmission.com>
>> Date:   Fri Nov 16 03:03:01 2012 +0000
>>
>>     net: Update the per network namespace sysctls to be available to the network namespace owner
>>
>>     - Allow anyone with CAP_NET_ADMIN rights in the user namespace of the
>>       network namespace to change sysctls.
>>     - Allow anyone the uid of the user namespace root the same
>>       permissions over the network namespace sysctls as the global root.
>>     - Allow anyone with gid of the user namespace root group the same
>>       permissions over the network namespace sysctl as the global root group.
>>
>>     Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> because it changed /sys/net's special permission handler to test current_uid, not
>> current_euid; same for current_gid/current_egid.
>>
>> So in this case, root cannot drop privs via set[ug]id, and retains all privs
>> in this codepath.
> 
> Modify the code to use current_euid() and in_egroup_p(), as is done
> in fs/proc/proc_sysctl.c:test_perm()
> 
> Cc: stable@vger.kernel.org
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> Reported-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Applied and queued up for -stable, thanks.
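
The difference between testing the real uid and the effective uid can be
modelled in userspace. The sketch below is a simplified model under
stated assumptions: it ignores group checks and namespaces, treats id 0
as root, and every name in it is hypothetical. It only shows why a
process with uid 0 and a dropped euid keeps owner permissions when the
real uid is tested.

```c
#include <assert.h>
#include <errno.h>

#define MAY_EXEC_SKETCH  1
#define MAY_WRITE_SKETCH 2
#define MAY_READ_SKETCH  4

struct cred_sketch {
    unsigned uid;   /* real uid */
    unsigned euid;  /* effective uid */
};

/* Check `op` against `mode`: owner bits if `id` is root (0),
 * otherwise the "other" bits. Group handling is omitted. */
static int perm_check_sketch(int mode, int op, unsigned id)
{
    assert((op & ~7) == 0);
    if (id == 0)
        mode >>= 6;
    if ((op & ~mode & 7) == 0)
        return 0;
    return -EACCES;
}

/* Behaviour described as the regression: tests the real uid. */
static int check_with_uid(const struct cred_sketch *c, int mode, int op)
{
    return perm_check_sketch(mode, op, c->uid);
}

/* Behaviour described as the fix: tests the effective uid. */
static int check_with_euid(const struct cred_sketch *c, int mode, int op)
{
    return perm_check_sketch(mode, op, c->euid);
}
```

For a mode 0644 entry and a process with uid 0 and euid 1000, the
uid-based check allows a write while the euid-based check returns
-EACCES; reads are allowed in both cases.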

* Re: [PATCH v2] net: secure_seq: Fix warning when CONFIG_IPV6 and CONFIG_INET are not selected
From: David Miller @ 2013-10-07 19:59 UTC
  To: festevam; +Cc: edumazet, hannes, netdev, olof, fabio.estevam
In-Reply-To: <1381006619-17126-2-git-send-email-festevam@gmail.com>

From: Fabio Estevam <festevam@gmail.com>
Date: Sat,  5 Oct 2013 17:56:59 -0300

> From: Fabio Estevam <fabio.estevam@freescale.com>
> 
> net_secret() is only used when CONFIG_IPV6 or CONFIG_INET are selected.
> 
> Building a defconfig with both of these symbols unselected (Using the ARM
> at91sam9rl_defconfig, for example) leads to the following build warning:
 ...
> Fix this warning by protecting the definition of net_secret() with these
> symbols.
> 
> Reported-by: Olof Johansson <olof@lixom.net>
> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
> ---
> Changes since v1:
> - Add #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)

This looks a lot better, applied, thanks.

* Re: IPv6 kernel warning
From: Yuchung Cheng @ 2013-10-07 20:00 UTC
  To: dormando
  Cc: Michele Baldessari, Russell King - ARM Linux, netdev,
	Neal Cardwell, Nandita Dukkipati
In-Reply-To: <alpine.DEB.2.02.1310071256190.17658@dtop>

On Mon, Oct 7, 2013 at 12:56 PM, dormando <dormando@rydia.net> wrote:
> On Mon, 7 Oct 2013, Yuchung Cheng wrote:
>
>> On Mon, Oct 7, 2013 at 11:13 AM, dormando <dormando@rydia.net> wrote:
>> > [...]
>>
>> Thanks! could you post the output of `sysctl -a |grep tcp`?
>>
>> I suspect tcp_process_tlp_ack() should not revert state to Open
>> directly, but calling tcp_try_keep_open() instead, similar to all the
>> undo processing in the tcp_fastretrans_alert(): after
>> tcp_end_cwnd_reduction(), the process (E) falls back to check other
>> stats before moving to CA_Open.
>>
>>
>> index 9c62257..9012b42 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack,
>>                         tcp_init_cwnd_reduction(sk, true);
>>                         tcp_set_ca_state(sk, TCP_CA_CWR);
>>                         tcp_end_cwnd_reduction(sk);
>> -                       tcp_set_ca_state(sk, TCP_CA_Open);
>> +                       tcp_try_keep_open(sk);
>>                         NET_INC_STATS_BH(sock_net(sk),
>>                                          LINUX_MIB_TCPLOSSPROBERECOVERY);
>>                 }
>>
>
> Should I apply this and see if the warning stops?
I'd like to hear what the authors of TLP think. In the meantime, could
you help us collect more evidence by disabling TLP with
sysctl net.ipv4.tcp_early_retrans=2
and checking whether the problem still occurs? (It should not.)

thanks

>
> net.ipv4.tcp_abort_on_overflow = 0
> net.ipv4.tcp_adv_win_scale = 1
> net.ipv4.tcp_allowed_congestion_control = cubic reno
> net.ipv4.tcp_app_win = 31
> net.ipv4.tcp_available_congestion_control = cubic reno westwood
> net.ipv4.tcp_base_mss = 512
> net.ipv4.tcp_challenge_ack_limit = 100
> net.ipv4.tcp_congestion_control = cubic
> net.ipv4.tcp_dma_copybreak = 262144
> net.ipv4.tcp_dsack = 1
> net.ipv4.tcp_early_retrans = 3


> [...]
>


* Re: [PATCH] eisa: standardize on eisa_register_driver like similar bus registrations
From: David Miller @ 2013-10-07 20:02 UTC (permalink / raw)
  To: tedheadster; +Cc: netdev, linux-scsi
In-Reply-To: <1381026958-4459-1-git-send-email-tedheadster@gmail.com>

From: Matthew Whitehead <tedheadster@gmail.com>
Date: Sat,  5 Oct 2013 22:35:58 -0400

> The other buses (isa, pci, pnp, parport, usb, tty, etc) all use the convention
> of ${BUSNAME}_register_driver. Rewrite the little remaining code that uses EISA
> to follow this convention for easier readability.
> 
> This affects the EISA bus, SCSI, and networking subsystems so only one should
> ultimately merge the patch if it is accepted.
> 
> Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>

I'm fine with someone else taking this, for networking parts:

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
From: Ian Campbell @ 2013-10-07 20:03 UTC (permalink / raw)
  To: David Miller
  Cc: paul.durrant, xen-devel, netdev, xixiong, msw, annie.li, wei.liu2
In-Reply-To: <20131007.153622.1392772445100851507.davem@davemloft.net>

On Mon, 2013-10-07 at 15:36 -0400, David Miller wrote:
> From: Paul Durrant <paul.durrant@citrix.com>
> Date: Fri, 4 Oct 2013 17:26:23 +0100
> 
> > Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> > xenvif_count_skb_slots() for skbs with a linear area spanning a page
> > boundary. The alignment of skb->data needs to be taken into account, not
> > just the head length. This patch fixes the issue by dry-running the code
> > from xenvif_gop_skb() (and adjusting the comment above the function to note
> > that).
> > 
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> There seems to be a lot of back and forth about what is the most
> desirable way forward wrt. this commit and another similar one.
> 
> Please advise.

Let's revert 4f0581d25827d5e864bcf07b05d73d0d12a20a5c and see about
making this stuff less fragile in the future.

Thanks,
Ian.
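
A small illustration of the alignment point in Paul's changelog: a
buffer shorter than a page can still touch two pages when it starts
near the end of one. This sketch only shows the page-span arithmetic.
It is not the netback slot accounting itself, and SKETCH_PAGE_SIZE and
both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration */

/* Page count if only the length is considered (ignores alignment). */
unsigned int pages_by_len(size_t len)
{
	return (unsigned int)((len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/* Page count actually touched by the byte range [addr, addr + len). */
unsigned int pages_spanned(uintptr_t addr, size_t len)
{
	size_t off = addr % SKETCH_PAGE_SIZE;	/* offset within first page */

	return (unsigned int)((off + len + SKETCH_PAGE_SIZE - 1) /
			      SKETCH_PAGE_SIZE);
}
```

For example, 200 bytes starting at page offset 4000 give
pages_by_len() == 1 but pages_spanned() == 2, which is the kind of
undercount the changelog describes.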


* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Benjamin Herrenschmidt @ 2013-10-07 20:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alexander Gordeev, Ben Hutchings, linux-kernel, Bjorn Helgaas,
	Ralf Baechle, Michael Ellerman, Martin Schwidefsky, Ingo Molnar,
	Dan Williams, Andy King, Jon Mason, Matt Porter, linux-pci,
	linux-mips, linuxppc-dev, linux390, linux-s390, x86, linux-ide,
	iss_storagedev, linux-nvme, linux-rdma, netdev, e1000-devel
In-Reply-To: <20131007180111.GC2481@htj.dyndns.org>

On Mon, 2013-10-07 at 14:01 -0400, Tejun Heo wrote:
> I don't think the same race condition would happen with the loop.  The
> problem case is where multiple msi(x) allocation fails completely
> because the global limit went down before inquiry and allocation.  In
> the loop based interface, it'd retry with the lower number.
> 
> As long as the number of drivers which need this sort of adaptive
> allocation isn't too high and the common cases can be made simple, I
> don't think the "complex" part of interface is all that important.
> Maybe we can have reserve / cancel type interface or just keep the
> loop with more explicit function names (ie. try_enable or something
> like that).

I'm thinking a better API overall might just have been to request
individual MSI-X one by one :-)

We want to be able to request an MSI-X at runtime anyway ... if I want
to dynamically add a queue to my network interface, I want it to be able
to pop a new arbitrary MSI-X.

And we don't want to lock drivers into contiguous MSI-X sets either.

And for the cleanup ... well that's what the "pcim" functions are for,
we can just make MSI-X variants.

Ben.
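
The loop-based pattern Tejun describes can be sketched in userspace as
follows. fake_enable_msix() is a hypothetical stand-in that only mimics
the pci_enable_msix() return convention under discussion (0 on success,
a positive count of vectors that could be allocated, or a negative
errno); struct fake_pool and enable_msix_adaptive() are likewise
made-up names, not kernel API.

```c
#include <errno.h>

/* Hypothetical model of a shared, shrinking pool of vectors. */
struct fake_pool {
	int available;
};

/* Mimics the return convention only: 0, positive hint, or -errno. */
int fake_enable_msix(struct fake_pool *pool, int nvec)
{
	if (nvec <= 0)
		return -EINVAL;
	if (pool->available <= 0)
		return -ENOSPC;
	if (nvec > pool->available)
		return pool->available;	/* hint: retry with this many */
	pool->available -= nvec;
	return 0;
}

/*
 * Retry with the lower number until it succeeds or drops below 'min'.
 * Returns the number of vectors enabled, or a negative errno.
 */
int enable_msix_adaptive(struct fake_pool *pool, int want, int min)
{
	int nvec = want;

	while (nvec >= min) {
		int rc = fake_enable_msix(pool, nvec);

		if (rc == 0)
			return nvec;
		if (rc < 0)
			return rc;
		nvec = rc;	/* positive hint is always smaller than nvec */
	}
	return -ENOSPC;
}
```

Because the hint is consumed inside the loop, a limit that shrinks
between one attempt and the next just leads to another, smaller retry
rather than a complete failure.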


* Re: IPv6 kernel warning
From: dormando @ 2013-10-07 20:15 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Michele Baldessari, Russell King - ARM Linux, netdev,
	Neal Cardwell, Nandita Dukkipati
In-Reply-To: <CAK6E8=d_O+HHTKb37zGKxpU8E0tTUL6m77g_o6a7ASEiJfXSkw@mail.gmail.com>



On Mon, 7 Oct 2013, Yuchung Cheng wrote:

> On Mon, Oct 7, 2013 at 12:56 PM, dormando <dormando@rydia.net> wrote:
> > On Mon, 7 Oct 2013, Yuchung Cheng wrote:
> >
> >> On Mon, Oct 7, 2013 at 11:13 AM, dormando <dormando@rydia.net> wrote:
> >> >
> >> > > >
> >> > > > there's been multiple reports about this one:
> >> > > > https://bugzilla.redhat.com/show_bug.cgi?id=989251
> >> > > > http://bugzilla.kernel.org/show_bug.cgi?id=60779
> >> > > >
> >> > > > Could you try Yuchung's debug patch?
> >> > > > http://www.spinics.net/lists/netdev/msg250193.html
> >> > > Yes it looks like the same bug. Please try that patch to help identify
> >> > > this elusive bug.
> >> > >
> >> >
> >> > Hi!
> >> >
> >> > We get this one a few times a day in production. Here's a warning with
> >> > your debug trace in the line immediately following:
> >> > (I censored a few things)
> >> >
> >> >  [125311.721950] ------------[ cut here ]------------
> >> >  [125311.721961] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xb58/0xc80()
> >> >  [125311.721962] Modules linked in: bridge ip_vs macvlan coretemp crc32_pclmul ghash_clmulni_intel gpio_ich ipmi_watchdog microcode ipmi_devintf sb_edac lpc_ich edac_core mfd_core ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat ixgbe igb mdio i2c_algo_bit ptp pps_core
> >> >  [125311.721981] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.13 #1
> >> >  [125311.721982] Hardware name: Supermicro XXXXXXXXXXX, BIOS 1.1 10/03/2012
> >> >  [125311.721984]  ffffffff81a82007 ffff88407fc63958 ffffffff816bb9cc ffff88407fc63998
> >> >  [125311.721986]  ffffffff8104b940 00ff8840ad904f82 ffff883b8a165b00 0000000000004120
> >> >  [125311.721989]  0000000000000001 0000000000000019 0000000000000000 ffff88407fc639a8
> >> >  [125311.721991] Call Trace:
> >> >  [125311.721992]  <IRQ>  [<ffffffff816bb9cc>] dump_stack+0x19/0x1d
> >> >  [125311.722002]  [<ffffffff8104b940>] warn_slowpath_common+0x70/0xa0
> >> >  [125311.722005]  [<ffffffff8104b98a>] warn_slowpath_null+0x1a/0x20
> >> >  [125311.722007]  [<ffffffff81616db8>] tcp_fastretrans_alert+0xb58/0xc80
> >> >  [125311.722011]  [<ffffffff8161891f>] tcp_ack+0x6df/0xe90
> >> >  [125311.722016]  [<ffffffff8164e0ca>] ? ipt_do_table+0x22a/0x680
> >> >  [125311.722018]  [<ffffffff816194b3>] ? tcp_validate_incoming+0x63/0x320
> >> >  [125311.722021]  [<ffffffff8161a55c>] tcp_rcv_established+0x2cc/0x810
> >> >  [125311.722023]  [<ffffffff81622c84>] tcp_v4_do_rcv+0x254/0x4f0
> >> >  [125311.722025]  [<ffffffff816245ac>] tcp_v4_rcv+0x5fc/0x750
> >> >  [125311.722027]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
> >> >  [125311.722032]  [<ffffffff815df3ad>] ? nf_hook_slow+0x7d/0x160
> >> >  [125311.722034]  [<ffffffff815ffa00>] ? ip_rcv+0x350/0x350
> >> >  [125311.722036]  [<ffffffff815fface>] ip_local_deliver_finish+0xce/0x250
> >> >  [125311.722037]  [<ffffffff815ffc9c>] ip_local_deliver+0x4c/0x80
> >> >  [125311.722039]  [<ffffffff815ff329>] ip_rcv_finish+0x119/0x360
> >> >  [125311.722040]  [<ffffffff815ff8e0>] ip_rcv+0x230/0x350
> >> >  [125311.722046]  [<ffffffff815b4067>] __netif_receive_skb_core+0x477/0x600
> >> >  [125311.722049]  [<ffffffff815b4217>] __netif_receive_skb+0x27/0x70
> >> >  [125311.722051]  [<ffffffff815b4354>] process_backlog+0xf4/0x1e0
> >> >  [125311.722053]  [<ffffffff815b4b45>] net_rx_action+0xf5/0x250
> >> >  [125311.722056]  [<ffffffff81053a5f>] __do_softirq+0xef/0x270
> >> >  [125311.722058]  [<ffffffff81053cb5>] irq_exit+0x95/0xa0
> >> >  [125311.722062]  [<ffffffff816c8f26>] do_IRQ+0x66/0xe0
> >> >  [125311.722065]  [<ffffffff816bf62a>] common_interrupt+0x6a/0x6a
> >> >  [125311.722065]  <EOI>  [<ffffffff8100abf1>] ? default_idle+0x21/0xc0
> >> >  [125311.722082]  [<ffffffff8100a54f>] arch_cpu_idle+0xf/0x20
> >> >  [125311.722086]  [<ffffffff8108f353>] cpu_startup_entry+0xb3/0x230
> >> >  [125311.722091]  [<ffffffff816b439e>] start_secondary+0x1dc/0x1e3
> >> >  [125311.722093] ---[ end trace e77cd5ba583fcbe9 ]---
> >> >  [125311.722096] 355.355.1.355:22496 F0x4120 S1 s7 IF25+17-1-24f0 ur57 rr3 rt0 um0 hs23120 nxt23120
> >> >
> >> > It's been happening with all 3.10 kernels, and the one above is .13 as
> >> > stated in the trace.
> >>
> >> Thanks! could you post the output of `sysctl -a |grep tcp`?
> >>
> >> I suspect tcp_process_tlp_ack() should not revert state to Open
> >> directly, but call tcp_try_keep_open() instead, similar to all the
> >> undo processing in tcp_fastretrans_alert(): after
> >> tcp_end_cwnd_reduction(), the process (E) falls back to check other
> >> stats before moving to CA_Open.
> >>
> >>
> >> index 9c62257..9012b42 100644
> >> --- a/net/ipv4/tcp_input.c
> >> +++ b/net/ipv4/tcp_input.c
> >> @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack,
> >>                         tcp_init_cwnd_reduction(sk, true);
> >>                         tcp_set_ca_state(sk, TCP_CA_CWR);
> >>                         tcp_end_cwnd_reduction(sk);
> >> -                       tcp_set_ca_state(sk, TCP_CA_Open);
> >> +                       tcp_try_keep_open(sk);
> >>                         NET_INC_STATS_BH(sock_net(sk),
> >>                                          LINUX_MIB_TCPLOSSPROBERECOVERY);
> >>                 }
> >>
> >
> > Should I apply this and see if the warning stops?
> I'd like to hear what the authors of TLP think. In the meantime, could
> you help us collect more evidence by disabling TLP with
> sysctl net.ipv4.tcp_early_retrans=2
> and see if the problem still occurs? (It should not.)
>
> thanks

Changed on one machine. We tend to only see one per box every 12-24 hours,
so it'll take a while to confirm.

> >
> > net.ipv4.tcp_abort_on_overflow = 0
> > net.ipv4.tcp_adv_win_scale = 1
> > net.ipv4.tcp_allowed_congestion_control = cubic reno
> > net.ipv4.tcp_app_win = 31
> > net.ipv4.tcp_available_congestion_control = cubic reno westwood
> > net.ipv4.tcp_base_mss = 512
> > net.ipv4.tcp_challenge_ack_limit = 100
> > net.ipv4.tcp_congestion_control = cubic
> > net.ipv4.tcp_dma_copybreak = 262144
> > net.ipv4.tcp_dsack = 1
> > net.ipv4.tcp_early_retrans = 3
>
>
> > net.ipv4.tcp_ecn = 2
> > net.ipv4.tcp_fack = 1
> > net.ipv4.tcp_fastopen = 0
> > net.ipv4.tcp_fastopen_key = 009dc92c-82e3e514-d440ed23-c49b1a89
> > net.ipv4.tcp_fin_timeout = 5
> > net.ipv4.tcp_frto = 0
> > net.ipv4.tcp_keepalive_intvl = 75
> > net.ipv4.tcp_keepalive_probes = 9
> > net.ipv4.tcp_keepalive_time = 1800
> > net.ipv4.tcp_limit_output_bytes = 131072
> > net.ipv4.tcp_low_latency = 0
> > net.ipv4.tcp_max_orphans = 2000000
> > net.ipv4.tcp_max_ssthresh = 0
> > net.ipv4.tcp_max_syn_backlog = 65536
> > net.ipv4.tcp_max_tw_buckets = 2000000
> > net.ipv4.tcp_mem = 6188001      8250670 12376002
> > net.ipv4.tcp_moderate_rcvbuf = 1
> > net.ipv4.tcp_mtu_probing = 0
> > net.ipv4.tcp_no_metrics_save = 1
> > net.ipv4.tcp_orphan_retries = 0
> > net.ipv4.tcp_reordering = 3
> > net.ipv4.tcp_retrans_collapse = 1
> > net.ipv4.tcp_retries1 = 3
> > net.ipv4.tcp_retries2 = 15
> > net.ipv4.tcp_rfc1337 = 0
> > net.ipv4.tcp_rmem = 4096        87380   16777216
> > net.ipv4.tcp_sack = 1
> > net.ipv4.tcp_slow_start_after_idle = 0
> > net.ipv4.tcp_stdurg = 0
> > net.ipv4.tcp_syn_retries = 6
> > net.ipv4.tcp_synack_retries = 5
> > net.ipv4.tcp_syncookies = 1
> > net.ipv4.tcp_thin_dupack = 0
> > net.ipv4.tcp_thin_linear_timeouts = 0
> > net.ipv4.tcp_timestamps = 1
> > net.ipv4.tcp_tso_win_divisor = 3
> > net.ipv4.tcp_tw_recycle = 0
> > net.ipv4.tcp_tw_reuse = 0
> > net.ipv4.tcp_user_cwnd_max = 20
> > net.ipv4.tcp_window_scaling = 1
> > net.ipv4.tcp_wmem = 4096        65536   16777216
> > net.ipv4.tcp_workaround_signed_windows = 0
> > net.ipv4.vs.secure_tcp = 0
> >
>


* Re: bug in passing file descriptors
From: Steve Rago @ 2013-10-07 20:29 UTC (permalink / raw)
  To: David Miller; +Cc: luto, netdev, mtk.manpages, ebiederm
In-Reply-To: <20131007.154226.533738557474978526.davem@davemloft.net>

On 10/07/2013 03:42 PM, David Miller wrote:
> There is no compatibility issue.
>
> 32-bit tasks will always see the 4-byte align/length.
> 64-bit tasks will always see the 8-byte align/length.
>

Really?  So when I compile my application on a 32-bit Linux box and
then try to run it on a 64-bit Linux box, you're not going to overrun
my buffer when CMSG_SPACE led me to allocate an insufficient amount of
memory needed to account for padding on the 64-bit platform?

By the way, FreeBSD, Mac OS X, and Solaris all behave as I described,
so if you're happy with Linux behaving differently, then I'll stop
wasting bandwidth.

Steve
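
To make the two sizes in this thread concrete, here is a minimal
sketch for a control message carrying a single file descriptor.
CMSG_LEN() and CMSG_SPACE() are the standard macros from
<sys/socket.h>; the wrapper functions and the union name are
hypothetical and exist only for illustration. The actual values depend
on the ABI the program is compiled for.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Header plus payload, without trailing padding. */
size_t fd_cmsg_len(void)
{
	return CMSG_LEN(sizeof(int));
}

/* Header plus payload, rounded up so a following header is aligned. */
size_t fd_cmsg_space(void)
{
	return CMSG_SPACE(sizeof(int));
}

/*
 * Receive buffer sized with CMSG_SPACE(), as David recommends above.
 * The union keeps the buffer suitably aligned for struct cmsghdr.
 */
union fd_ctrl_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};
```

The difference fd_cmsg_space() - fd_cmsg_len() is exactly the trailing
padding being argued about: a receiver that sizes msg_control with
CMSG_LEN() leaves no room for it.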


* Re: [PATCH RFC 54/77] ntb: Ensure number of MSIs on SNB is enough for the link interrupt
From: Jon Mason @ 2013-10-07 20:31 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Tejun Heo, Dan Williams, Andy King, Matt Porter, stable,
	linux-pci, linux-mips, linuxppc-dev, linux390, linux-s390, x86,
	linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel, linux-driver, Solarflare linux maintainers,
	VMware, Inc.
In-Reply-To: <20131007183845.GA1834@dhcp-26-207.brq.redhat.com>

On Mon, Oct 07, 2013 at 08:38:45PM +0200, Alexander Gordeev wrote:
> On Mon, Oct 07, 2013 at 09:50:57AM -0700, Jon Mason wrote:
> > On Sat, Oct 05, 2013 at 11:43:04PM +0200, Alexander Gordeev wrote:
> > > On Wed, Oct 02, 2013 at 05:48:05PM -0700, Jon Mason wrote:
> > > > On Wed, Oct 02, 2013 at 12:49:10PM +0200, Alexander Gordeev wrote:
> > > > > Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> > > > > ---
> > > > >  drivers/ntb/ntb_hw.c |    2 +-
> > > > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
> > > > > index de2062c..eccd5e5 100644
> > > > > --- a/drivers/ntb/ntb_hw.c
> > > > > +++ b/drivers/ntb/ntb_hw.c
> > > > > @@ -1066,7 +1066,7 @@ static int ntb_setup_msix(struct ntb_device *ndev)
> > > > >  		/* On SNB, the link interrupt is always tied to 4th vector.  If
> > > > >  		 * we can't get all 4, then we can't use MSI-X.
> > > > >  		 */
> > > > > -		if (ndev->hw_type != BWD_HW) {
> > > > > +		if ((rc < SNB_MSIX_CNT) && (ndev->hw_type != BWD_HW)) {
> > > > 
> > > > Nack, this check is unnecessary.
> > > 
> > > If SNB can do more than SNB_MSIX_CNT MSI-Xs then this check is needed
> > > to enable less than maximum MSI-Xs in case the maximum was not allocated.
> > > Otherwise SNB will fall back to a single MSI instead of multiple MSI-Xs.
> > 
> > Per the comment in the code snippet above, "If we can't get all 4,
> > then we can't use MSI-X".  There is already a check to see if more
> > than 4 were acquired.  So it's not possible to hit this.  Even if it
> > was, don't use SNB_MSIX_CNT here (limits.msix_cnt is the preferred
> > variable).  Also, the "()" are unnecessary.
> 
> The changelog is definitely bogus. I meant here an improvement to the
> existing scheme, not a conversion to the new one:
> 
> 	msix_entries = msix_table_size(val);
> 
> Getting e.g. 16 vectors here.
> 
> 	if (msix_entries > ndev->limits.msix_cnt) {

On SNB HW, limits.msix_cnt is set to SNB_MSIX_CNT (4)
http://lxr.free-electrons.com/source/drivers/ntb/ntb_hw.c#L558

> 		rc = -EINVAL;
> 		goto err;
> 	}
> 
> The upper limit check succeeds in this example.
> 
> 	[...]
> 
> 	rc = pci_enable_msix(pdev, ndev->msix_entries, msix_entries);
> 
> pci_enable_msix() does not succeed and returns e.g. 8 here; should retry.

Per the above, our upper bound is 4, so this will either return 0 for
all 4 or a number between 1 and 3 (or an error, but that's not
relevant to this discussion).

> 	if (rc < 0)
> 		goto err1;
> 	if (rc > 0) {
> 		/* On SNB, the link interrupt is always tied to 4th vector.  If
> 		 * we can't get all 4, then we can't use MSI-X.
> 		 */
> 		if (ndev->hw_type != BWD_HW) {
> 
> On SNB we bail out here, although we could have continued with 8 vectors.
> Can only use SNB_MSIX_CNT here, since limits.msix_cnt is the upper limit.

Since we can guarantee that rc is between 1 and 3 at this point (on
SNB HW), we should error out.

Thanks,
Jon


> 
> 			rc = -EIO;
> 			goto err1;
> 		}
> 
> 		[...]
> 	}
> 
> -- 
> Regards,
> Alexander Gordeev
> agordeev@redhat.com


* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Ben Hutchings @ 2013-10-07 20:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Tejun Heo, Alexander Gordeev, linux-kernel, Bjorn Helgaas,
	Ralf Baechle, Michael Ellerman, Martin Schwidefsky, Ingo Molnar,
	Dan Williams, Andy King, Jon Mason, Matt Porter, linux-pci,
	linux-mips, linuxppc-dev, linux390, linux-s390, x86, linux-ide,
	iss_storagedev, linux-nvme, linux-rdma, netdev, e1000-devel,
	linux-driver, Solarflare linux maintainers
In-Reply-To: <1381176656.645.171.camel@pasglop>

On Tue, 2013-10-08 at 07:10 +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2013-10-07 at 14:01 -0400, Tejun Heo wrote:
> > I don't think the same race condition would happen with the loop.  The
> > problem case is where multiple msi(x) allocation fails completely
> > because the global limit went down before inquiry and allocation.  In
> > the loop based interface, it'd retry with the lower number.
> > 
> > As long as the number of drivers which need this sort of adaptive
> > allocation isn't too high and the common cases can be made simple, I
> > don't think the "complex" part of interface is all that important.
> > Maybe we can have reserve / cancel type interface or just keep the
> > loop with more explicit function names (ie. try_enable or something
> > like that).
> 
> I'm thinking a better API overall might just have been to request
> individual MSI-X one by one :-)
> 
> We want to be able to request an MSI-X at runtime anyway ... if I want
> to dynamically add a queue to my network interface, I want it to be able
> to pop a new arbitrary MSI-X.

Yes, this would be very useful.

> And we don't want to lock drivers into contiguous MSI-X sets either.

I don't think there's any such limitation now.  The entries array passed
to pci_enable_msix() specifies which MSI-X vectors the driver wants to
enable.  It's usually filled with 0..nvec-1 in order, but not always.
And the IRQ numbers returned aren't usually contiguous either, on x86.
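
As a rough illustration of that point, the sketch below fills an
entries array with table indices that are not 0..nvec-1. The struct is
a userspace mirror of the kernel's struct msix_entry as I understand
it (a driver-chosen table index plus a kernel-filled IRQ number); the
type and function names here are hypothetical.

```c
#include <stdint.h>

/* Userspace mirror of struct msix_entry, for illustration only. */
struct msix_entry_sketch {
	uint32_t vector;	/* written by the kernel: allocated IRQ */
	uint16_t entry;		/* written by the driver: MSI-X table index */
};

/* Request caller-chosen table indices; they need not be contiguous. */
void fill_entries(struct msix_entry_sketch *entries,
		  const uint16_t *wanted, int n)
{
	for (int i = 0; i < n; i++) {
		entries[i].entry = wanted[i];
		entries[i].vector = 0;	/* placeholder until enabled */
	}
}
```

A driver could pass, say, indices {0, 2, 5} this way; the IRQ numbers
written back into .vector are independent of those indices.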

Ben.

> And for the cleanup ... well that's what the "pcim" functions are for,
> we can just make MSI-X variants.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

