* soft lockup in inet_csk_get_port
@ 2009-12-01  2:02 kapil dakhane
  2009-12-01  6:10 ` Eric Dumazet
  2009-12-01 15:00 ` [PATCH] tcp: Fix a connect() race with timewait sockets Eric Dumazet
  0 siblings, 2 replies; 31+ messages in thread
From: kapil dakhane @ 2009-12-01  2:02 UTC (permalink / raw)
  To: netdev; +Cc: netfilter

Hello,

I am trying to analyze the capacity of the Linux network stack on an x6270,
which has 16 hyper-threads on two 8-core Intel(r) Xeon(r) CPU. At around
150000 simultaneous connections, once throughput reaches around 1.6 Gbps,
a CPU gets stuck in an infinite loop in inet_csk_bind_conflict, and then the
other CPUs lock up in spin_lock. Before the lockup, CPU usage was around 25%.
It appears to be a bug, unless I am hitting some kind of resource limit. It
would be good if someone familiar with the network code could confirm this,
or point me in the right direction.

Important details are:

I am using kernel version 2.6.31.4 recompiled with the TPROXY-related
options: NF_CONNTRACK, NETFILTER_TPROXY, NETFILTER_XT_MATCH_SOCKET,
NETFILTER_XT_TARGET_TPROXY.

I have enabled transparent capture and transparent forwarding using
iptables and ip rules. I have 10 instances of a single-threaded user-space
bits-forwarding proxy (fast), each bound to a different hyper-thread (CPU).
The remaining 6 CPUs are dedicated to interrupt processing, each handling
interrupts from one of six different network cards. A TCP flow from a given
4-tuple is always handled by the same proxy process, interrupt thread, and
network card. In this way, network traffic is segregated as much as possible
to achieve a high degree of parallelism.

The first /var/log/messages entry shows CPU#7 stuck in inet_csk_bind_conflict:

Nov 17 23:02:04 cap-x6270-01 kernel: BUG: soft lockup - CPU#7 stuck for 61s!
[fast:20701] Nov 17 23:02:04 cap-x6270-01 kernel: Modules linked in: xt_TPROXY xt_MARK xt_socket nf_defrag_ipv4 nf_tproxy_core iptable_mangle ipv6 autofs4 hidp rfcomm l2cap bluetooth rfkill sunrpc 8021q xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand acpi_cpufreq freq_table dm_multipath scsi_dh video output sbs sbshc battery acpi_memhotplug ac parport_pc lp parport joydev sg rtc_cmos serio_raw rtc_core button igb rtc_lib niu i2c_i801 i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ahci libata shpchp aacraid sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Nov 17 23:02:04 cap-x6270-01 kernel: CPU 7: Nov 17 23:02:04 cap-x6270-01 kernel: Modules linked in: ... Nov 17 23:02:04 cap-x6270-01 kernel: Pid: 20701, comm: fast Not tainted 2.6.31.4 #1 SUN BLADE X6270 SERVER MODULE Nov 17 23:02:04 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c53>] [<ffffffff81285c53>] inet_csk_bind_conflict+0x99/0xa6 Nov 17 23:02:04 cap-x6270-01 kernel: RSP: 0018:ffff88095ac7fe30 EFLAGS: 00000202 Nov 17 23:02:04 cap-x6270-01 kernel: RAX: 000000003c0ba8c0 RBX: ffff88097b14bae0 RCX: ffff8804b3d57820 Nov 17 23:02:04 cap-x6270-01 kernel: RDX: ffff8804b3d57800 RSI: 0000000000000000 RDI: ffff880940421840 Nov 17 23:02:04 cap-x6270-01 kernel: RBP: ffffffff8100c42e R08: 000000002d01960c R09: ffff880909832ea0 Nov 17 23:02:04 cap-x6270-01 kernel: R10: 00007fffc5d92700 R11: ffff880940421840 R12: 0000000000000001 Nov 17 23:02:04 cap-x6270-01 kernel: R13: ffff88097c5e9400 R14: ffffffff810b7a92 R15: 0000000000000001 Nov 17 23:02:04 cap-x6270-01 kernel: FS: 00007f20a08416e0(0000) GS:ffffc90000e00000(0000) knlGS:0000000000000000 Nov 17 23:02:04 cap-x6270-01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 17 23:02:04 cap-x6270-01 kernel: CR2: 00000000081df408 CR3: 000000049ac01000 CR4: 00000000000006e0 Nov 17 23:02:04 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000 Nov 17 23:02:04 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Nov 17 23:02:04 cap-x6270-01 kernel: Call Trace: Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff812859e4>] ? inet_csk_get_port+0x1b2/0x29e Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff812a1596>] ? inet_bind+0x10c/0x1b7 Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8124ef53>] ? sys_bind+0x6e/0x9e Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8106de14>] ? audit_syscall_entry+0x1a4/0x1cf Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8100b92b>] ? system_call_fastpath+0x16/0x1b While other CPUs get stuck doing _spin_lock: Nov 17 23:02:04 cap-x6270-01 kernel: BUG: soft lockup - CPU#15 stuck for 61s! [fast:20702] Nov 17 23:02:04 cap-x6270-01 kernel: Modules linked in: ... Nov 17 23:02:04 cap-x6270-01 kernel: CPU 15: Nov 17 23:02:04 cap-x6270-01 kernel: Modules linked in: ... Nov 17 23:02:04 cap-x6270-01 kernel: Pid: 20702, comm: fast Not tainted 2.6.31.4 #1 SUN BLADE X6270 SERVER MODULE Nov 17 23:02:04 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dedff>] [<ffffffff812dedff>] _spin_lock+0x10/0x15 Nov 17 23:02:04 cap-x6270-01 kernel: RSP: 0018:ffff88090ecabe30 EFLAGS: 00000297 Nov 17 23:02:04 cap-x6270-01 kernel: RAX: 0000000000000504 RBX: 00000000ffffffea RCX: 0000000000000000 Nov 17 23:02:04 cap-x6270-01 kernel: RDX: 00000000000000a2 RSI: 0000000000000fa2 RDI: ffffc90019f82a20 Nov 17 23:02:04 cap-x6270-01 kernel: RBP: ffffffff8100c42e R08: ffff880905d89840 R09: 0000000000000000 Nov 17 23:02:04 cap-x6270-01 kernel: R10: 00007fffd8db4574 R11: ffff880905d89840 R12: ffffffff812509d4 Nov 17 23:02:04 cap-x6270-01 kernel: R13: ffff88094c0bd280 R14: 0000000000000246 R15: 0000000000000001 Nov 17 23:02:04 cap-x6270-01 kernel: FS: 00007f0deff696e0(0000) GS:ffffc90001e00000(0000) knlGS:0000000000000000 Nov 17 23:02:04 cap-x6270-01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 17 23:02:04 cap-x6270-01 kernel: CR2: 0000000007ec2258 CR3: 
00000009494b1000 CR4: 00000000000006e0 Nov 17 23:02:04 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Nov 17 23:02:04 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Nov 17 23:02:04 cap-x6270-01 kernel: Call Trace: Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff81285989>] ? inet_csk_get_port+0x157/0x29e Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff812a1596>] ? inet_bind+0x10c/0x1b7 Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8124ef53>] ? sys_bind+0x6e/0x9e Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8106de14>] ? audit_syscall_entry+0x1a4/0x1cf Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8100b92b>] ? system_call_fastpath+0x16/0x1b ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: soft lockup in inet_csk_get_port 2009-12-01 2:02 soft lockup in inet_csk_get_port kapil dakhane @ 2009-12-01 6:10 ` Eric Dumazet 2009-12-01 15:00 ` [PATCH] tcp: Fix a connect() race with timewait sockets Eric Dumazet 1 sibling, 0 replies; 31+ messages in thread From: Eric Dumazet @ 2009-12-01 6:10 UTC (permalink / raw) To: kapil dakhane; +Cc: netdev, netfilter kapil dakhane a écrit : > Hello, > > I am trying to analyze the capacity of linux network stack on x6270 > which has 16 Hyper threads on two 8-core Intel(r) Xeon(r) CPU. I see > that at around 150000 simultaneous connections, after around 1.6 gbps, > a cpu get stuck in an infinite loop in inet_csk_bind_conflict, then > other cpus get locked up doing spin_lock. Before the lockup cpu usage > was around 25%. It appears to be a bug, unless I am hitting some kind > of resource limit. It would be good if someone familiar with network > code would confirm this, or point me in the right direction. > > Important details are: > > I am using kernel version 2.6.31.4 recompiled with TPROXY related > options: NF_CONNTRACK, NETFILTER_TPROXY, NETFILTER_XT_MATCH_SOCKET, > NETFILTER_XT_TARGET_TPROXY. > > > I have enabled transparent capture and transparent forward using > iptables and ip rules. I have 10 instances of a single threaded user > space bits-forwarding-proxy (fast), each bound to different > hyper-threads (CPUs). Rest 6 CPUs are dedicated to interrupt > processing, each handling interrupts from six different network cards. > TCP flow from a 4-tuple always get handled by the same proxy process, > interrupt thread, and network card. In this way, network traffic is > segregated as much as possible to achieve high degree of parallelism. > > First /var/log/message entry shows CPU#7 is stuck in inet_csk_bind_conflict > > Nov 17 23:02:04 cap-x6270-01 kernel: BUG: soft lockup - CPU#7 stuck > for 61s! 
[fast:20701] > Nov 17 23:02:04 cap-x6270-01 kernel: Modules linked in: xt_TPROXY > xt_MARK xt_socket nf_defrag_ipv4 nf_tproxy_core iptable_mangle ipv6 > autofs4 hidp rfcomm l2cap bluetooth rfkill sunrpc 8021q xt_state > nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables > cpufreq_ondemand acpi_cpufreq freq_table dm_multipath scsi_dh video > output sbs sbshc battery acpi_memhotplug ac parport_pc lp parport > joydev sg rtc_cmos serio_raw rtc_core button igb rtc_lib niu i2c_i801 > i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log > dm_mod usb_storage ahci libata shpchp aacraid sd_mod scsi_mod ext3 jbd > uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] > Nov 17 23:02:04 cap-x6270-01 kernel: CPU 7: > Nov 17 23:02:04 cap-x6270-01 kernel: Modules linked in: ... > Nov 17 23:02:04 cap-x6270-01 kernel: Pid: 20701, comm: fast Not > tainted 2.6.31.4 #1 SUN BLADE X6270 SERVER MODULE > Nov 17 23:02:04 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c53>] > [<ffffffff81285c53>] inet_csk_bind_conflict+0x99/0xa6 > Nov 17 23:02:04 cap-x6270-01 kernel: RSP: 0018:ffff88095ac7fe30 > EFLAGS: 00000202 > Nov 17 23:02:04 cap-x6270-01 kernel: RAX: 000000003c0ba8c0 RBX: > ffff88097b14bae0 RCX: ffff8804b3d57820 > Nov 17 23:02:04 cap-x6270-01 kernel: RDX: ffff8804b3d57800 RSI: > 0000000000000000 RDI: ffff880940421840 > Nov 17 23:02:04 cap-x6270-01 kernel: RBP: ffffffff8100c42e R08: > 000000002d01960c R09: ffff880909832ea0 > Nov 17 23:02:04 cap-x6270-01 kernel: R10: 00007fffc5d92700 R11: > ffff880940421840 R12: 0000000000000001 > Nov 17 23:02:04 cap-x6270-01 kernel: R13: ffff88097c5e9400 R14: > ffffffff810b7a92 R15: 0000000000000001 > Nov 17 23:02:04 cap-x6270-01 kernel: FS: 00007f20a08416e0(0000) > GS:ffffc90000e00000(0000) knlGS:0000000000000000 > Nov 17 23:02:04 cap-x6270-01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > Nov 17 23:02:04 cap-x6270-01 kernel: CR2: 00000000081df408 CR3: > 000000049ac01000 CR4: 00000000000006e0 > Nov 17 23:02:04 
cap-x6270-01 kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > Nov 17 23:02:04 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: > 00000000ffff0ff0 DR7: 0000000000000400 > Nov 17 23:02:04 cap-x6270-01 kernel: Call Trace: > Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff812859e4>] ? > inet_csk_get_port+0x1b2/0x29e > Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff812a1596>] ? > inet_bind+0x10c/0x1b7 > Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8124ef53>] ? sys_bind+0x6e/0x9e > Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8106de14>] ? > audit_syscall_entry+0x1a4/0x1cf > Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8100b92b>] ? > system_call_fastpath+0x16/0x1b > > While other CPUs get stuck doing _spin_lock: > > > Nov 17 23:02:04 cap-x6270-01 kernel: BUG: soft lockup - CPU#15 stuck > for 61s! [fast:20702] > Nov 17 23:02:04 cap-x6270-01 kernel: Modules linked in: ... > Nov 17 23:02:04 cap-x6270-01 kernel: CPU 15: > Nov 17 23:02:04 cap-x6270-01 kernel: Modules linked in: ... 
> Nov 17 23:02:04 cap-x6270-01 kernel: Pid: 20702, comm: fast Not > tainted 2.6.31.4 #1 SUN BLADE X6270 SERVER MODULE > Nov 17 23:02:04 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dedff>] > [<ffffffff812dedff>] _spin_lock+0x10/0x15 > Nov 17 23:02:04 cap-x6270-01 kernel: RSP: 0018:ffff88090ecabe30 > EFLAGS: 00000297 > Nov 17 23:02:04 cap-x6270-01 kernel: RAX: 0000000000000504 RBX: > 00000000ffffffea RCX: 0000000000000000 > Nov 17 23:02:04 cap-x6270-01 kernel: RDX: 00000000000000a2 RSI: > 0000000000000fa2 RDI: ffffc90019f82a20 > Nov 17 23:02:04 cap-x6270-01 kernel: RBP: ffffffff8100c42e R08: > ffff880905d89840 R09: 0000000000000000 > Nov 17 23:02:04 cap-x6270-01 kernel: R10: 00007fffd8db4574 R11: > ffff880905d89840 R12: ffffffff812509d4 > Nov 17 23:02:04 cap-x6270-01 kernel: R13: ffff88094c0bd280 R14: > 0000000000000246 R15: 0000000000000001 > Nov 17 23:02:04 cap-x6270-01 kernel: FS: 00007f0deff696e0(0000) > GS:ffffc90001e00000(0000) knlGS:0000000000000000 > Nov 17 23:02:04 cap-x6270-01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > Nov 17 23:02:04 cap-x6270-01 kernel: CR2: 0000000007ec2258 CR3: > 00000009494b1000 CR4: 00000000000006e0 > Nov 17 23:02:04 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > Nov 17 23:02:04 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: > 00000000ffff0ff0 DR7: 0000000000000400 > Nov 17 23:02:04 cap-x6270-01 kernel: Call Trace: > Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff81285989>] ? > inet_csk_get_port+0x157/0x29e > Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff812a1596>] ? > inet_bind+0x10c/0x1b7 > Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8124ef53>] ? sys_bind+0x6e/0x9e > Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8106de14>] ? > audit_syscall_entry+0x1a4/0x1cf > Nov 17 23:02:04 cap-x6270-01 kernel: [<ffffffff8100b92b>] ? > system_call_fastpath+0x16/0x1b > -- Hmm, I did an one hour audit and could not yet find the bug. 
Is it a reproducible error, and is there any chance I could have a snapshot of
"netstat -atn" taken before the lockup? (Maybe privately, since it might be
too big for netdev.)

What is the 'fast' program? Is it freely available somewhere?

Thanks

^ permalink raw reply	[flat|nested] 31+ messages in thread
* [PATCH] tcp: Fix a connect() race with timewait sockets 2009-12-01 2:02 soft lockup in inet_csk_get_port kapil dakhane 2009-12-01 6:10 ` Eric Dumazet @ 2009-12-01 15:00 ` Eric Dumazet 2009-12-02 8:59 ` David Miller ` (3 more replies) 1 sibling, 4 replies; 31+ messages in thread From: Eric Dumazet @ 2009-12-01 15:00 UTC (permalink / raw) To: kapil dakhane; +Cc: netdev, netfilter, David S. Miller, Evgeniy Polyakov kapil dakhane a écrit : > Hello, > > I am trying to analyze the capacity of linux network stack on x6270 > which has 16 Hyper threads on two 8-core Intel(r) Xeon(r) CPU. I see > that at around 150000 simultaneous connections, after around 1.6 gbps, > a cpu get stuck in an infinite loop in inet_csk_bind_conflict, then > other cpus get locked up doing spin_lock. Before the lockup cpu usage > was around 25%. It appears to be a bug, unless I am hitting some kind > of resource limit. It would be good if someone familiar with network > code would confirm this, or point me in the right direction. > > Important details are: > > I am using kernel version 2.6.31.4 recompiled with TPROXY related > options: NF_CONNTRACK, NETFILTER_TPROXY, NETFILTER_XT_MATCH_SOCKET, > NETFILTER_XT_TARGET_TPROXY. > > > I have enabled transparent capture and transparent forward using > iptables and ip rules. I have 10 instances of a single threaded user > space bits-forwarding-proxy (fast), each bound to different > hyper-threads (CPUs). Rest 6 CPUs are dedicated to interrupt > processing, each handling interrupts from six different network cards. > TCP flow from a 4-tuple always get handled by the same proxy process, > interrupt thread, and network card. In this way, network traffic is > segregated as much as possible to achieve high degree of parallelism. > > First /var/log/message entry shows CPU#7 is stuck in inet_csk_bind_conflict > > Nov 17 23:02:04 cap-x6270-01 kernel: BUG: soft lockup - CPU#7 stuck > for 61s! 
[fast:20701]

After some more auditing and coffee, I finally found one subtle bug in our
connect() code, one that triggers periodically but never got tracked down.

Here is a patch cooked on top of the current linux-2.6 git tree; it should
probably apply on 2.6.31.6 as well...

Thanks

[PATCH] tcp: Fix a connect() race with timewait sockets

When we find a timewait connection in __inet_hash_connect() and reuse
it for a new connection request, we have a race window, releasing bind
list lock and reacquiring it in __inet_twsk_kill() to remove timewait
socket from list.

Another thread might find the timewait socket we already chose, leading to
list corruption and crashes.

Fix is to remove timewait socket from bind list before releasing the lock.

Reported-by: kapil dakhane <kdakhane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/inet_timewait_sock.h |    4 +++
 net/ipv4/inet_hashtables.c       |    4 +++
 net/ipv4/inet_timewait_sock.c    |   37 ++++++++++++++++++++---------
 3 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index f93ad90..e18e5df 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -206,6 +206,10 @@ extern void __inet_twsk_hashdance(struct inet_timewait_sock *tw,
 				  struct sock *sk,
 				  struct inet_hashinfo *hashinfo);
 
+extern void inet_twsk_unhash(struct inet_timewait_sock *tw,
+			     struct inet_hashinfo *hashinfo,
+			     bool mustlock);
+
 extern void inet_twsk_schedule(struct inet_timewait_sock *tw,
 			       struct inet_timewait_death_row *twdr,
 			       const int timeo, const int timewait_len);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 625cc5f..76d81e4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -488,6 +488,10 @@ ok:
 			inet_sk(sk)->sport = htons(port);
 			hash(sk);
 		}
+
+		if (tw)
+			inet_twsk_unhash(tw, hinfo, false);
+
 		spin_unlock(&head->lock);
 
 		if (tw) {
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 13f0781..2d6d543 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -14,12 +14,34 @@
 #include <net/inet_timewait_sock.h>
 #include <net/ip.h>
 
+
+void inet_twsk_unhash(struct inet_timewait_sock *tw,
+		      struct inet_hashinfo *hashinfo,
+		      bool mustlock)
+{
+	struct inet_bind_hashbucket *bhead;
+	struct inet_bind_bucket *tb = tw->tw_tb;
+
+	if (!tb)
+		return;
+
+	/* Disassociate with bind bucket. */
+	bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw),
+					      tw->tw_num,
+					      hashinfo->bhash_size)];
+	if (mustlock)
+		spin_lock(&bhead->lock);
+	__hlist_del(&tw->tw_bind_node);
+	tw->tw_tb = NULL;
+	inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb);
+	if (mustlock)
+		spin_unlock(&bhead->lock);
+}
+
 /* Must be called with locally disabled BHs. */
 static void __inet_twsk_kill(struct inet_timewait_sock *tw,
 			     struct inet_hashinfo *hashinfo)
 {
-	struct inet_bind_hashbucket *bhead;
-	struct inet_bind_bucket *tb;
 	/* Unlink from established hashes. */
 	spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash);
 
@@ -32,15 +54,8 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw,
 	sk_nulls_node_init(&tw->tw_node);
 	spin_unlock(lock);
 
-	/* Disassociate with bind bucket. */
-	bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), tw->tw_num,
-			hashinfo->bhash_size)];
-	spin_lock(&bhead->lock);
-	tb = tw->tw_tb;
-	__hlist_del(&tw->tw_bind_node);
-	tw->tw_tb = NULL;
-	inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb);
-	spin_unlock(&bhead->lock);
+	inet_twsk_unhash(tw, hashinfo, true);
+
 #ifdef SOCK_REFCNT_DEBUG
 	if (atomic_read(&tw->tw_refcnt) != 1) {
 		printk(KERN_DEBUG "%s timewait_sock %p refcnt=%d\n",

^ permalink raw reply related	[flat|nested] 31+ messages in thread
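[Illustrative sketch, not part of the thread.] The race described in the patch changelog can be modelled in user space with a toy singly linked list. Everything here (struct toy_tw, struct toy_bucket, toy_find(), toy_unhash()) is a hypothetical stand-in, not kernel code, and the two callers are interleaved by hand rather than with real threads, on the assumption that each step runs with the bucket lock held. The point is only the ordering: if the entry stays on the list after the first caller has chosen it, a second caller can choose it too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not the kernel structures. */
struct toy_tw {
    struct toy_tw *next;
};

struct toy_bucket {
    struct toy_tw *head;    /* entries a caller may pick for reuse */
};

/* Step 1 of a caller: look for a reusable entry (lock assumed held). */
struct toy_tw *toy_find(const struct toy_bucket *b)
{
    return b->head;
}

/* Step 2: unlink the chosen entry (lock assumed held).
 * Returns false if the entry is no longer on the list. */
bool toy_unhash(struct toy_bucket *b, struct toy_tw *tw)
{
    struct toy_tw **pp;

    for (pp = &b->head; *pp; pp = &(*pp)->next) {
        if (*pp == tw) {
            *pp = tw->next;
            tw->next = NULL;
            return true;
        }
    }
    return false;
}

/* Old ordering: A finds an entry and "drops the lock"; B then finds the
 * same entry; both unlink later. Returns how many callers picked it. */
int toy_owners_unlink_late(void)
{
    struct toy_tw tw = { NULL };
    struct toy_bucket b = { &tw };
    struct toy_tw *a_pick = toy_find(&b);   /* caller A */
    struct toy_tw *b_pick = toy_find(&b);   /* caller B, inside A's window */

    toy_unhash(&b, a_pick);
    toy_unhash(&b, b_pick);   /* the toy notices; a raw list delete would not */
    assert(b.head == NULL);
    return (a_pick != NULL) + (b_pick != NULL);
}

/* Patched ordering: each caller unlinks before "dropping the lock". */
int toy_owners_unlink_early(void)
{
    struct toy_tw tw = { NULL };
    struct toy_bucket b = { &tw };
    struct toy_tw *a_pick, *b_pick;

    a_pick = toy_find(&b);          /* caller A, lock held */
    if (a_pick)
        toy_unhash(&b, a_pick);     /* unlinked before the lock is released */

    b_pick = toy_find(&b);          /* caller B now sees an empty list */
    if (b_pick)
        toy_unhash(&b, b_pick);

    assert(b.head == NULL);
    return (a_pick != NULL) + (b_pick != NULL);
}
```

In this model toy_owners_unlink_late() reports two owners for the single entry, while toy_owners_unlink_early() reports one, which is the property the patch aims for by calling inet_twsk_unhash() before spin_unlock().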
* Re: [PATCH] tcp: Fix a connect() race with timewait sockets
  2009-12-01 15:00 ` [PATCH] tcp: Fix a connect() race with timewait sockets Eric Dumazet
@ 2009-12-02  8:59   ` David Miller
  2009-12-02  9:23     ` Eric Dumazet
  2009-12-02 16:05     ` [PATCH] tcp: Fix a connect() race with timewait sockets Ashwani Wason
  2009-12-04 13:45   ` [PATCH 0/2] tcp: Fix connect() races " Eric Dumazet
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 31+ messages in thread
From: David Miller @ 2009-12-02  8:59 UTC (permalink / raw)
  To: eric.dumazet; +Cc: kdakhane, netdev, netfilter, zbr

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 01 Dec 2009 16:00:39 +0100

> [PATCH] tcp: Fix a connect() race with timewait sockets

This condition would only trigger if the timewait recycling sysctl is
enabled.

It is off by default, and I can't find any mention in this bug report
that it has been turned on.

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] tcp: Fix a connect() race with timewait sockets
  2009-12-02  8:59   ` David Miller
@ 2009-12-02  9:23     ` Eric Dumazet
  2009-12-02 10:33       ` Eric Dumazet
  2009-12-02 16:05     ` [PATCH] tcp: Fix a connect() race with timewait sockets Ashwani Wason
  1 sibling, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2009-12-02  9:23 UTC (permalink / raw)
  To: David Miller; +Cc: kdakhane, netdev, netfilter, zbr

David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 01 Dec 2009 16:00:39 +0100
>
>> [PATCH] tcp: Fix a connect() race with timewait sockets
>
> This condition would only trigger if the timewait recycling sysctl is
> enabled.
>
> It is off by default, and I can't find any mention in this bug report
> that it has been turned on.

Very true. I know nothing about the reporter's context; he has not answered
my queries.

Yes, if sysctl_tcp_tw_reuse is set, the bug can trigger without any extra
conditions.

But even if sysctl_tcp_tw_reuse is cleared, we might trigger the bug if the
local port is bound to a value.
[User application called bind(port=XXX) before connect()]

__inet_hash_connect() can indeed call check_established(... twp = NULL)

...
	head = &hinfo->bhash[inet_bhashfn(net, snum, hinfo->bhash_size)];
	tb  = inet_csk(sk)->icsk_bind_hash;
	spin_lock_bh(&head->lock);
	if (sk_head(&tb->owners) == sk && !sk->sk_bind_node.next) {
		hash(sk);
		spin_unlock_bh(&head->lock);
		return 0;
	} else {
		spin_unlock(&head->lock);
		/* No definite answer... Walk to established hash table */
		ret = check_established(death_row, sk, snum, NULL);   <<< HERE >>>
out:
		local_bh_enable();
		return ret;
	}

In this case, we call tcp_twsk_unique() with twp = NULL, which bypasses the
sysctl_tcp_tw_reuse test.

int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
{
	const struct tcp_timewait_sock *tcptw = tcp_twsk(sktw);
	struct tcp_sock *tp = tcp_sk(sk);

	/* With PAWS, it is safe from the viewpoint
	   of data integrity. Even without PAWS it is safe provided sequence
	   spaces do not overlap i.e. at data rates <= 80Mbit/sec.

	   Actually, the idea is close to VJ's one, only timestamp cache is
	   held not per host, but per port pair and TW bucket is used as state
	   holder.

	   If TW bucket has been already destroyed we fall back to VJ's scheme
	   and use initial timestamp retrieved from peer table.
	 */
	if (tcptw->tw_ts_recent_stamp &&
<<HERE>>    (twp == NULL || (sysctl_tcp_tw_reuse &&
			     get_seconds() - tcptw->tw_ts_recent_stamp > 1))) {

^ permalink raw reply	[flat|nested] 31+ messages in thread
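[Illustrative sketch, not part of the thread.] The condition quoted from tcp_twsk_unique() can be restated as a small user-space predicate to make the bypass explicit. The function name tw_may_be_reused() and its parameters are hypothetical; it models only the boolean test shown in the message above and nothing else the kernel function does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the test quoted from tcp_twsk_unique():
 *   ts_recent_stamp  stands in for tcptw->tw_ts_recent_stamp
 *   twp              the pointer passed by the caller (NULL on the
 *                    bind()-before-connect() path)
 *   tw_reuse_sysctl  stands in for sysctl_tcp_tw_reuse
 *   now              stands in for get_seconds()
 */
bool tw_may_be_reused(long ts_recent_stamp, const void *twp,
                      bool tw_reuse_sysctl, long now)
{
    if (ts_recent_stamp == 0)
        return false;           /* no timestamp state recorded */
    if (twp == NULL)
        return true;            /* sysctl is not consulted at all */
    return tw_reuse_sysctl && (now - ts_recent_stamp > 1);
}

/* Truth-table check of the model, written as assertions. */
void tw_model_selfcheck(void)
{
    int marker;                 /* any non-NULL address will do for twp */

    assert(tw_may_be_reused(100, NULL, false, 100));      /* bypass */
    assert(!tw_may_be_reused(100, &marker, false, 200));  /* sysctl off */
    assert(!tw_may_be_reused(100, &marker, true, 101));   /* too recent */
    assert(tw_may_be_reused(100, &marker, true, 102));    /* sysctl on, aged */
    assert(!tw_may_be_reused(0, NULL, true, 102));        /* no timestamp */
}
```

The first assertion is the case discussed here: with twp == NULL the predicate is true even though the reuse sysctl is off.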
* Re: [PATCH] tcp: Fix a connect() race with timewait sockets
  2009-12-02  9:23     ` Eric Dumazet
@ 2009-12-02 10:33       ` Eric Dumazet
  2009-12-02 11:32         ` Evgeniy Polyakov
  2009-12-02 15:08         ` [PATCH net-next-2.6] tcp: connect() race with timewait reuse Eric Dumazet
  0 siblings, 2 replies; 31+ messages in thread
From: Eric Dumazet @ 2009-12-02 10:33 UTC (permalink / raw)
  To: David Miller; +Cc: kdakhane, netdev, netfilter, zbr

Eric Dumazet wrote:
>
> But even if sysctl_tcp_tw_reuse is cleared, we might trigger the bug if
> the local port is bound to a value.

Oh well, it's more subtle than that.

__inet_check_established() is called not only with BHs disabled, but also
with a lock on the bind list if twp != NULL.

However, if twp is NULL, the lock is not held by the caller.

[ That's the final
	ret = check_established(death_row, sk, snum, NULL);
  in __inet_hash_connect() ]

So triggering this bug with tw_reuse clear is tricky:

You need several threads, using sockets with REUSEADDR set,
and bind() to the same address/port before connect() to the same target.

We need another patch to correct this.

I wonder if always holding the lock before calling check_established()
would be cleaner.

^ permalink raw reply	[flat|nested] 31+ messages in thread
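[Illustrative sketch, not part of the thread.] The user-space call sequence that reaches the bind()-before-connect() path looks roughly like the helper below. The name bound_connect() is hypothetical and the 'fast' proxy's actual code is not shown in this thread; this is only the standard sockets sequence (SO_REUSEADDR, then bind(), then connect()) that the message above describes. It does not by itself reproduce the race, which according to the message also needs several threads binding the same address/port and connecting to the same target.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind to an explicit local address/port with
 * SO_REUSEADDR set, then connect to the remote address.
 * Returns a connected descriptor, or -1 on any failure. */
int bound_connect(const struct sockaddr_in *local,
                  const struct sockaddr_in *remote)
{
    int one = 1;
    int fd;

    assert(local != NULL && remote != NULL);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow the local address/port to be shared with other such sockets. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
        /* Explicit bind: the local port is chosen before connect(). */
        bind(fd, (const struct sockaddr *)local, sizeof(*local)) < 0 ||
        connect(fd, (const struct sockaddr *)remote, sizeof(*remote)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would fill in two struct sockaddr_in values (family AF_INET, address and port in network byte order) and close the returned descriptor when done.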
* Re: [PATCH] tcp: Fix a connect() race with timewait sockets
  2009-12-02 10:33       ` Eric Dumazet
@ 2009-12-02 11:32         ` Evgeniy Polyakov
  2009-12-02 19:18           ` kapil dakhane
  2009-12-02 15:08         ` [PATCH net-next-2.6] tcp: connect() race with timewait reuse Eric Dumazet
  1 sibling, 1 reply; 31+ messages in thread
From: Evgeniy Polyakov @ 2009-12-02 11:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, kdakhane, netdev, netfilter

On Wed, Dec 02, 2009 at 11:33:55AM +0100, Eric Dumazet (eric.dumazet@gmail.com) wrote:
> You need several threads, using sockets with REUSEADDR set,
> and bind() to same address/port before connect() to same target.
>
> We need another patch to correct this.
>
> I wonder if always hold lock before calling check_established()
> would be cleaner.

Isn't this too big an overhead?

--
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] tcp: Fix a connect() race with timewait sockets
  2009-12-02 11:32         ` Evgeniy Polyakov
@ 2009-12-02 19:18           ` kapil dakhane
  2009-12-03  2:43             ` kapil dakhane
  0 siblings, 1 reply; 31+ messages in thread
From: kapil dakhane @ 2009-12-02 19:18 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Eric Dumazet, David Miller, netdev, netfilter

Here's the list of tuning parameters used:

net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 180
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_max_tw_buckets = 360000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 0
net.core.netdev_max_backlog = 5000

Kapil

On Wed, Dec 2, 2009 at 3:32 AM, Evgeniy Polyakov <zbr@ioremap.net> wrote:
> On Wed, Dec 02, 2009 at 11:33:55AM +0100, Eric Dumazet (eric.dumazet@gmail.com) wrote:
>> You need several threads, using sockets with REUSEADDR set,
>> and bind() to same address/port before connect() to same target.
>>
>> We need another patch to correct this.
>>
>> I wonder if always hold lock before calling check_established()
>> would be cleaner.
>
> Isn't this too big an overhead?
>
> --
>        Evgeniy Polyakov
>

^ permalink raw reply	[flat|nested] 31+ messages in thread
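[Illustrative sketch, not part of the thread.] Since the discussion turns on whether tcp_tw_reuse and tcp_tw_recycle are enabled, a test program could record them at startup. The helper below is a hypothetical sketch, assuming the values are exposed as single integers under /proc/sys (for example /proc/sys/net/ipv4/tcp_tw_reuse); read_sysctl_long() is a made-up name, and -1 is used as the error value on the assumption that these particular settings are never negative.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one integer from a procfs-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
long read_sysctl_long(const char *path)
{
    char buf[64];
    char *end;
    long val;
    FILE *f;

    assert(path != NULL);

    f = fopen(path, "r");
    if (!f)
        return -1;              /* e.g. the sysctl does not exist here */
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    errno = 0;
    val = strtol(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -1;              /* not a number */
    return val;
}
```

With the settings listed above, read_sysctl_long("/proc/sys/net/ipv4/tcp_tw_reuse") would be expected to return 1. On kernels that do not provide a given file, the helper returns -1 rather than failing.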
* Re: [PATCH] tcp: Fix a connect() race with timewait sockets 2009-12-02 19:18 ` kapil dakhane @ 2009-12-03 2:43 ` kapil dakhane 2009-12-03 10:49 ` [PATCH] tcp: fix a timewait refcnt race Eric Dumazet 0 siblings, 1 reply; 31+ messages in thread From: kapil dakhane @ 2009-12-03 2:43 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev, netfilter, Evgeniy Polyakov Eric, I ran the test again after patching my kernel with your changes. Unfortunately, the result appear to be the same. Here's what I get from /var/log/messages: Dec 2 14:42:17 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 ---same message repeats every minute--- Dec 2 14:55:25 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 14:56:31 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 14:57:37 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 14:58:42 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 14:59:48 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:00:54 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:01:59 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:03:05 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:04:11 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:05:16 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:06:22 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c24>] [<ffffffff81285c24>] 
inet_csk_bind_conflict+0x1e/0xa6 Dec 2 15:07:28 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:08:33 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c45>] [<ffffffff81285c45>] inet_csk_bind_conflict+0x3f/0xa6 Dec 2 15:09:39 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:10:45 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:11:50 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:12:56 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:14:02 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c65>] [<ffffffff81285c65>] inet_csk_bind_conflict+0x5f/0xa6 Dec 2 15:15:07 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:16:13 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:17:19 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:18:25 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:19:30 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:20:36 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:21:42 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:22:47 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:23:53 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:24:59 cap-x6270-01 
kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:26:04 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:27:10 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 ---same message repeats every minute--- Dec 2 15:43:35 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:44:41 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:45:47 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:46:52 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:47:58 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:49:04 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:50:09 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:51:15 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:52:21 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c9f>] [<ffffffff81285c9f>] inet_csk_bind_conflict+0x99/0xa6 Dec 2 15:53:26 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 15:54:32 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Here's detailed stack from first minute: Dec 2 14:42:17 cap-x6270-01 kernel: BUG: soft lockup - CPU#14 stuck for 61s! 
[fast:14591] Dec 2 14:42:17 cap-x6270-01 kernel: Modules linked in: xt_TPROXY xt_MARK xt_socket nf_defrag_ipv4 nf_tproxy_core iptable_mangle ipv6 autofs4 hidp rfcomm l2cap bluetooth rfkill sunrpc 8021q xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand acpi_cpufreq freq_table dm_multipath scsi_dh video output sbs sbshc battery acpi_memhotplug ac parport_pc lp parport joydev sg serio_raw rtc_cmos button rtc_core rtc_lib igb niu i2c_i801 i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ahci libata shpchp aacraid sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Dec 2 14:42:17 cap-x6270-01 kernel: CPU 14: Dec 2 14:42:17 cap-x6270-01 kernel: Modules linked in: .... Dec 2 14:42:17 cap-x6270-01 kernel: Pid: 14591, comm: fast Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:42:17 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 14:42:17 cap-x6270-01 kernel: RSP: 0018:ffff8804e1471e30 EFLAGS: 00000282 Dec 2 14:42:17 cap-x6270-01 kernel: RAX: ffffffff815c4101 RBX: ffff8804ea477da0 RCX: ffff8808c54a5820 Dec 2 14:42:17 cap-x6270-01 kernel: RDX: ffff8809041922c0 RSI: 0000000000000000 RDI: ffff8809071340c0 Dec 2 14:42:17 cap-x6270-01 kernel: RBP: ffffffff8100c42e R08: 00000000a900220e R09: ffff8804de0bc0a0 Dec 2 14:42:17 cap-x6270-01 kernel: R10: 00007fff281a1501 R11: ffff8809071340c0 R12: ffffffff812509d4 Dec 2 14:42:17 cap-x6270-01 kernel: R13: ffff88097b9a38c0 R14: 0000000000000246 R15: 0000000000000001 Dec 2 14:42:17 cap-x6270-01 kernel: FS: 00007f2c0e2006e0(0000) GS:ffffc90001c00000(0000) knlGS:0000000000000000 Dec 2 14:42:17 cap-x6270-01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 2 14:42:17 cap-x6270-01 kernel: CR2: 0000000020fe7000 CR3: 000000097b1b1000 CR4: 00000000000006e0 Dec 2 14:42:17 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 
2 14:42:17 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:42:17 cap-x6270-01 kernel: Call Trace: Dec 2 14:42:17 cap-x6270-01 kernel: [<ffffffff81285a30>] ? inet_csk_get_port+0x1b2/0x29e Dec 2 14:42:17 cap-x6270-01 kernel: [<ffffffff812a15e2>] ? inet_bind+0x10c/0x1b7 Dec 2 14:42:17 cap-x6270-01 kernel: [<ffffffff8124ef53>] ? sys_bind+0x6e/0x9e Dec 2 14:42:17 cap-x6270-01 kernel: [<ffffffff8106de14>] ? audit_syscall_entry+0x1a4/0x1cf Dec 2 14:42:17 cap-x6270-01 kernel: [<ffffffff8100b92b>] ? system_call_fastpath+0x16/0x1b Dec 2 14:42:26 cap-x6270-01 kernel: BUG: soft lockup - CPU#4 stuck for 61s! [swapper:0] Dec 2 14:42:26 cap-x6270-01 kernel: Modules linked in: ... Dec 2 14:42:26 cap-x6270-01 kernel: CPU 4: Dec 2 14:42:26 cap-x6270-01 kernel: Modules linked in: ... Dec 2 14:42:26 cap-x6270-01 kernel: Pid: 0, comm: swapper Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:42:26 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dee49>] [<ffffffff812dee49>] _spin_lock+0xa/0x15 Dec 2 14:42:26 cap-x6270-01 kernel: RSP: 0018:ffffc90000803c98 EFLAGS: 00000297 Dec 2 14:42:26 cap-x6270-01 kernel: RAX: 000000000000e5e4 RBX: ffff8808d94f1bc0 RCX: ffff8808d94f1bc0 Dec 2 14:42:26 cap-x6270-01 kernel: RDX: ffffffff81b4b340 RSI: ffff8808c8570e80 RDI: ffffc90019f82a20 Dec 2 14:42:26 cap-x6270-01 kernel: RBP: ffffffff8100c433 R08: 8000000000000000 R09: 0400000000000000 Dec 2 14:42:26 cap-x6270-01 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000803c10 Dec 2 14:42:26 cap-x6270-01 kernel: R13: ffffc90019f82a20 R14: ffffc90000803c10 R15: ffffffff8101da86 Dec 2 14:42:26 cap-x6270-01 kernel: FS: 0000000000000000(0000) GS:ffffc90000800000(0000) knlGS:0000000000000000 Dec 2 14:42:26 cap-x6270-01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Dec 2 14:42:26 cap-x6270-01 kernel: CR2: 00007f436db8f000 CR3: 0000000001001000 CR4: 00000000000006e0 Dec 2 14:42:26 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000 Dec 2 14:42:26 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:42:26 cap-x6270-01 kernel: Call Trace: Dec 2 14:42:26 cap-x6270-01 kernel: <IRQ> [<ffffffff81284d1d>] ? __inet_twsk_hashdance+0x54/0x127 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff812979d0>] ? tcp_time_wait+0x13c/0x1c0 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8128bbc5>] ? tcp_fin+0x7e/0x178 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8128c782>] ? tcp_data_queue+0x2b4/0xaf9 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8128fce1>] ? tcp_rcv_state_process+0x8a7/0x909 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff812951cc>] ? tcp_v4_do_rcv+0x181/0x1d5 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff81296be2>] ? tcp_v4_rcv+0x4ac/0x706 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8127c950>] ? ip_rcv_finish+0x0/0x366 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8127cdd6>] ? ip_local_deliver_finish+0x120/0x1e3 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8127cc9c>] ? ip_rcv_finish+0x34c/0x366 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8127d20e>] ? ip_rcv+0x289/0x2d0 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8125e156>] ? process_backlog+0x6f/0x98 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8125df97>] ? net_rx_action+0xa9/0x17d Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff810444ca>] ? __do_softirq+0xc5/0x183 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8100ca5c>] ? call_softirq+0x1c/0x28 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8100ddf2>] ? do_softirq+0x2c/0x68 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8100d472>] ? do_IRQ+0xa0/0xb6 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8100c2d3>] ? ret_from_intr+0x0/0xa Dec 2 14:42:26 cap-x6270-01 kernel: <EOI> [<ffffffff8100c42e>] ? apic_timer_interrupt+0xe/0x20 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff811b3292>] ? acpi_idle_enter_simple+0x120/0x14e Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff811b3288>] ? 
acpi_idle_enter_simple+0x116/0x14e Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8123bbb4>] ? cpuidle_idle_call+0x7f/0xbb Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8100aa1d>] ? cpu_idle+0x40/0x5e Dec 2 14:42:26 cap-x6270-01 kernel: BUG: soft lockup - CPU#12 stuck for 61s! [swapper:0] ... Dec 2 14:42:26 cap-x6270-01 kernel: Pid: 0, comm: swapper Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:42:26 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dee4f>] [<ffffffff812dee4f>] _spin_lock+0x10/0x15 Dec 2 14:42:26 cap-x6270-01 kernel: RSP: 0018:ffffc90001803ec8 EFLAGS: 00000297 Dec 2 14:42:26 cap-x6270-01 kernel: RAX: 0000000000004a49 RBX: ffff8808c8570e80 RCX: ffff8808c8571168 Dec 2 14:42:26 cap-x6270-01 kernel: RDX: ffffc90001803f00 RSI: 0000000000000100 RDI: ffff8808c8570ec8 Dec 2 14:42:26 cap-x6270-01 kernel: RBP: ffffffff8100c433 R08: 0000000000000010 R09: 0000000000000000 Dec 2 14:42:26 cap-x6270-01 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90001803e40 Dec 2 14:42:26 cap-x6270-01 kernel: R13: ffff88097cdf0000 R14: 0000000000000082 R15: ffffffff8101da86 Dec 2 14:42:26 cap-x6270-01 kernel: FS: 0000000000000000(0000) GS:ffffc90001800000(0000) knlGS:0000000000000000 Dec 2 14:42:26 cap-x6270-01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Dec 2 14:42:26 cap-x6270-01 kernel: CR2: 0000000021601ff8 CR3: 0000000001001000 CR4: 00000000000006e0 Dec 2 14:42:26 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 2 14:42:26 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:42:26 cap-x6270-01 kernel: Call Trace: Dec 2 14:42:26 cap-x6270-01 kernel: <IRQ> [<ffffffff81293b71>] ? tcp_write_timer+0x16/0x5c9 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff81293b5b>] ? tcp_write_timer+0x0/0x5c9 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff81048575>] ? run_timer_softirq+0x131/0x197 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff810444ca>] ? 
__do_softirq+0xc5/0x183 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8100ca5c>] ? call_softirq+0x1c/0x28 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8100ddf2>] ? do_softirq+0x2c/0x68 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8101da8b>] ? smp_apic_timer_interrupt+0x88/0x95 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8100c433>] ? apic_timer_interrupt+0x13/0x20 Dec 2 14:42:26 cap-x6270-01 kernel: <EOI> [<ffffffff8100c42e>] ? apic_timer_interrupt+0xe/0x20 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff811b3147>] ? acpi_idle_enter_bm+0x249/0x274 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff811b313d>] ? acpi_idle_enter_bm+0x23f/0x274 Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8123bbb4>] ? cpuidle_idle_call+0x7f/0xbb Dec 2 14:42:26 cap-x6270-01 kernel: [<ffffffff8100aa1d>] ? cpu_idle+0x40/0x5e Dec 2 14:42:28 cap-x6270-01 kernel: BUG: soft lockup - CPU#11 stuck for 61s! [swapper:0] ... Dec 2 14:42:28 cap-x6270-01 kernel: Pid: 0, comm: swapper Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:42:28 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dee4f>] [<ffffffff812dee4f>] _spin_lock+0x10/0x15 Dec 2 14:42:28 cap-x6270-01 kernel: RSP: 0018:ffffc90001603eb8 EFLAGS: 00000297 Dec 2 14:42:28 cap-x6270-01 kernel: RAX: 0000000000004746 RBX: ffff8804a798ad00 RCX: ffff8804a798afe8 Dec 2 14:42:28 cap-x6270-01 kernel: RDX: ffffc90001603ef0 RSI: ffff8804f2858a40 RDI: ffff8804a798ad48 Dec 2 14:42:28 cap-x6270-01 kernel: RBP: ffffffff8100c433 R08: ffff88097cdea000 R09: 0000000000000003 Dec 2 14:42:28 cap-x6270-01 kernel: R10: ffffffff811a52f0 R11: 0000000000000000 R12: ffffc90001603e30 Dec 2 14:42:28 cap-x6270-01 kernel: R13: ffff8804fcdd4000 R14: ffffc90001603e30 R15: ffffffff8101da86 Dec 2 14:42:28 cap-x6270-01 kernel: FS: 0000000000000000(0000) GS:ffffc90001600000(0000) knlGS:0000000000000000 Dec 2 14:42:28 cap-x6270-01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Dec 2 14:42:28 cap-x6270-01 kernel: CR2: 0000000007225000 CR3: 00000004e318d000 CR4: 
00000000000006e0 Dec 2 14:42:28 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 2 14:42:28 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:42:28 cap-x6270-01 kernel: Call Trace: Dec 2 14:42:28 cap-x6270-01 kernel: <IRQ> [<ffffffff81293b71>] ? tcp_write_timer+0x16/0x5c9 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff81293b5b>] ? tcp_write_timer+0x0/0x5c9 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff81048575>] ? run_timer_softirq+0x131/0x197 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff810444ca>] ? __do_softirq+0xc5/0x183 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8100ca5c>] ? call_softirq+0x1c/0x28 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8100ddf2>] ? do_softirq+0x2c/0x68 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8100d472>] ? do_IRQ+0xa0/0xb6 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8100c2d3>] ? ret_from_intr+0x0/0xa Dec 2 14:42:28 cap-x6270-01 kernel: <EOI> [<ffffffff811a52f0>] ? acpi_hw_register_read+0x52/0xe5 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff811b3292>] ? acpi_idle_enter_simple+0x120/0x14e Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff811b3288>] ? acpi_idle_enter_simple+0x116/0x14e Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff811b2fd3>] ? acpi_idle_enter_bm+0xd5/0x274 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8123bbb4>] ? cpuidle_idle_call+0x7f/0xbb Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8100aa1d>] ? cpu_idle+0x40/0x5e Dec 2 14:42:28 cap-x6270-01 kernel: BUG: soft lockup - CPU#15 stuck for 61s! [swapper:0] ... 
Dec 2 14:42:28 cap-x6270-01 kernel: Pid: 0, comm: swapper Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:42:28 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dee51>] [<ffffffff812dee51>] _spin_lock+0x12/0x15 Dec 2 14:42:28 cap-x6270-01 kernel: RSP: 0018:ffffc90001e03ce8 EFLAGS: 00000293 Dec 2 14:42:28 cap-x6270-01 kernel: RAX: 000000000000e6e4 RBX: ffff8808df9e9b40 RCX: ffff8808df9e9b40 Dec 2 14:42:28 cap-x6270-01 kernel: RDX: ffffffff81b4b340 RSI: ffff8804a798ad00 RDI: ffffc90019f82a20 Dec 2 14:42:28 cap-x6270-01 kernel: RBP: ffffffff8100c433 R08: 000000001a8082c7 R09: 0000000003069b61 Dec 2 14:42:28 cap-x6270-01 kernel: R10: 0000001400c21ca1 R11: ffff8804681763c0 R12: ffffc90001e03c60 Dec 2 14:42:28 cap-x6270-01 kernel: R13: ffffc90019f82a20 R14: ffffc90001e03c60 R15: ffffffff8101da86 Dec 2 14:42:28 cap-x6270-01 kernel: FS: 0000000000000000(0000) GS:ffffc90001e00000(0000) knlGS:0000000000000000 Dec 2 14:42:28 cap-x6270-01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Dec 2 14:42:28 cap-x6270-01 kernel: CR2: 0000000007438398 CR3: 00000004f45a5000 CR4: 00000000000006e0 Dec 2 14:42:28 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 2 14:42:28 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:42:28 cap-x6270-01 kernel: Call Trace: Dec 2 14:42:28 cap-x6270-01 kernel: <IRQ> [<ffffffff81284d1d>] ? __inet_twsk_hashdance+0x54/0x127 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff812979d0>] ? tcp_time_wait+0x13c/0x1c0 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8128fc1a>] ? tcp_rcv_state_process+0x7e0/0x909 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff812951cc>] ? tcp_v4_do_rcv+0x181/0x1d5 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff81296be2>] ? tcp_v4_rcv+0x4ac/0x706 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8127c950>] ? ip_rcv_finish+0x0/0x366 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8127cdd6>] ? 
ip_local_deliver_finish+0x120/0x1e3 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8127cc9c>] ? ip_rcv_finish+0x34c/0x366 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8127d20e>] ? ip_rcv+0x289/0x2d0 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8125e156>] ? process_backlog+0x6f/0x98 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8125df97>] ? net_rx_action+0xa9/0x17d Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff810444ca>] ? __do_softirq+0xc5/0x183 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8100ca5c>] ? call_softirq+0x1c/0x28 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8100ddf2>] ? do_softirq+0x2c/0x68 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8100d472>] ? do_IRQ+0xa0/0xb6 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8100c2d3>] ? ret_from_intr+0x0/0xa Dec 2 14:42:28 cap-x6270-01 kernel: <EOI> [<ffffffff811a52f0>] ? acpi_hw_register_read+0x52/0xe5 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff812def52>] ? _spin_unlock_irqrestore+0x4/0x5 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff811b32ab>] ? acpi_idle_enter_simple+0x139/0x14e Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff811b2fd3>] ? acpi_idle_enter_bm+0xd5/0x274 Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8123bbb4>] ? cpuidle_idle_call+0x7f/0xbb Dec 2 14:42:28 cap-x6270-01 kernel: [<ffffffff8100aa1d>] ? cpu_idle+0x40/0x5e Dec 2 14:42:31 cap-x6270-01 kernel: BUG: soft lockup - CPU#13 stuck for 61s! [fast:14590] ... 
Dec 2 14:42:31 cap-x6270-01 kernel: Pid: 14590, comm: fast Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:42:31 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dee4f>] [<ffffffff812dee4f>] _spin_lock+0x10/0x15 Dec 2 14:42:31 cap-x6270-01 kernel: RSP: 0018:ffff8804dd95be30 EFLAGS: 00000293 Dec 2 14:42:31 cap-x6270-01 kernel: RAX: 000000000000e7e4 RBX: 00000000ffffffea RCX: 0000000000000000 Dec 2 14:42:31 cap-x6270-01 kernel: RDX: 00000000000000a2 RSI: 0000000000000fa2 RDI: ffffc90019f82a20 Dec 2 14:42:31 cap-x6270-01 kernel: RBP: ffffffff8100c42e R08: ffff8809311a6840 R09: 000000004b16ed16 Dec 2 14:42:31 cap-x6270-01 kernel: R10: 00007ffff6230734 R11: ffff8809311a6840 R12: ffffffff812509d4 Dec 2 14:42:31 cap-x6270-01 kernel: R13: ffff88097404b2c0 R14: 0000000000000246 R15: 0000000000000000 Dec 2 14:42:31 cap-x6270-01 kernel: FS: 00007f0b9511d6e0(0000) GS:ffffc90001a00000(0000) knlGS:0000000000000000 Dec 2 14:42:31 cap-x6270-01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 2 14:42:31 cap-x6270-01 kernel: CR2: 00007f11263c7000 CR3: 00000004de426000 CR4: 00000000000006e0 Dec 2 14:42:31 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 2 14:42:31 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:42:31 cap-x6270-01 kernel: Call Trace: Dec 2 14:42:31 cap-x6270-01 kernel: [<ffffffff812859d5>] ? inet_csk_get_port+0x157/0x29e Dec 2 14:42:31 cap-x6270-01 kernel: [<ffffffff812a15e2>] ? inet_bind+0x10c/0x1b7 Dec 2 14:42:31 cap-x6270-01 kernel: [<ffffffff8124ef53>] ? sys_bind+0x6e/0x9e Dec 2 14:42:31 cap-x6270-01 kernel: [<ffffffff8106de14>] ? audit_syscall_entry+0x1a4/0x1cf Dec 2 14:42:31 cap-x6270-01 kernel: [<ffffffff8100b92b>] ? system_call_fastpath+0x16/0x1b ... Dec 2 14:42:45 cap-x6270-01 kernel: BUG: soft lockup - CPU#5 stuck for 61s! [swapper:0] ... 
Dec 2 14:42:45 cap-x6270-01 kernel: Pid: 0, comm: swapper Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:42:45 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dee4f>] [<ffffffff812dee4f>] _spin_lock+0x10/0x15 Dec 2 14:42:45 cap-x6270-01 kernel: RSP: 0018:ffffc90000a03ce8 EFLAGS: 00000297 Dec 2 14:42:45 cap-x6270-01 kernel: RAX: 000000000000e8e4 RBX: ffff8808be888200 RCX: ffff8808be888200 Dec 2 14:42:45 cap-x6270-01 kernel: RDX: ffffffff81b4b340 RSI: ffff88092090ec40 RDI: ffffc90019f82a20 Dec 2 14:42:45 cap-x6270-01 kernel: RBP: ffffffff8100c433 R08: 000000006d4b0f59 R09: 00000000a15a9061 Dec 2 14:42:45 cap-x6270-01 kernel: R10: 0000001400c25cd8 R11: ffff880460516d80 R12: ffffc90000a03c60 Dec 2 14:42:45 cap-x6270-01 kernel: R13: ffffc90019f82a20 R14: ffffc90000a03c60 R15: ffffffff8101da86 Dec 2 14:42:45 cap-x6270-01 kernel: FS: 0000000000000000(0000) GS:ffffc90000a00000(0000) knlGS:0000000000000000 Dec 2 14:42:45 cap-x6270-01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Dec 2 14:42:45 cap-x6270-01 kernel: CR2: 00007f436db8f000 CR3: 0000000001001000 CR4: 00000000000006e0 Dec 2 14:42:45 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 2 14:42:45 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:42:45 cap-x6270-01 kernel: Call Trace: Dec 2 14:42:45 cap-x6270-01 kernel: <IRQ> [<ffffffff81284d1d>] ? __inet_twsk_hashdance+0x54/0x127 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff812979d0>] ? tcp_time_wait+0x13c/0x1c0 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8128fc1a>] ? tcp_rcv_state_process+0x7e0/0x909 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff812951cc>] ? tcp_v4_do_rcv+0x181/0x1d5 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff81296be2>] ? tcp_v4_rcv+0x4ac/0x706 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8127c950>] ? ip_rcv_finish+0x0/0x366 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8127cdd6>] ? 
ip_local_deliver_finish+0x120/0x1e3 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8127cc9c>] ? ip_rcv_finish+0x34c/0x366 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8127d20e>] ? ip_rcv+0x289/0x2d0 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8125e156>] ? process_backlog+0x6f/0x98 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8125df97>] ? net_rx_action+0xa9/0x17d Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff810444ca>] ? __do_softirq+0xc5/0x183 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8100ca5c>] ? call_softirq+0x1c/0x28 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8100ddf2>] ? do_softirq+0x2c/0x68 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8100d472>] ? do_IRQ+0xa0/0xb6 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8100c2d3>] ? ret_from_intr+0x0/0xa Dec 2 14:42:45 cap-x6270-01 kernel: <EOI> [<ffffffff8100c2ce>] ? common_interrupt+0xe/0x13 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff811b3147>] ? acpi_idle_enter_bm+0x249/0x274 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff811b313d>] ? acpi_idle_enter_bm+0x23f/0x274 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8123bbb4>] ? cpuidle_idle_call+0x7f/0xbb Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8100aa1d>] ? cpu_idle+0x40/0x5e ... Dec 2 14:42:45 cap-x6270-01 kernel: BUG: soft lockup - CPU#7 stuck for 61s! [swapper:0] ... 
Dec 2 14:42:45 cap-x6270-01 kernel: Pid: 0, comm: swapper Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:42:45 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dee4f>] [<ffffffff812dee4f>] _spin_lock+0x10/0x15 Dec 2 14:42:45 cap-x6270-01 kernel: RSP: 0018:ffffc90000e03ec8 EFLAGS: 00000297 Dec 2 14:42:45 cap-x6270-01 kernel: RAX: 0000000000005251 RBX: ffff88092090ec40 RCX: ffff88092090ef28 Dec 2 14:42:45 cap-x6270-01 kernel: RDX: ffffc90000e03f00 RSI: 000000001a35d954 RDI: ffff88092090ec88 Dec 2 14:42:45 cap-x6270-01 kernel: RBP: ffffffff8100c433 R08: 0000000000000010 R09: 0000000000000000 Dec 2 14:42:45 cap-x6270-01 kernel: R10: ffffffff811a52f0 R11: 0000000000000000 R12: ffffc90000e03e40 Dec 2 14:42:45 cap-x6270-01 kernel: R13: ffff88097cd9c000 R14: ffffc90000e03e40 R15: ffffffff8101da86 Dec 2 14:42:45 cap-x6270-01 kernel: FS: 0000000000000000(0000) GS:ffffc90000e00000(0000) knlGS:0000000000000000 Dec 2 14:42:45 cap-x6270-01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Dec 2 14:42:45 cap-x6270-01 kernel: CR2: 000000000c0b25f8 CR3: 0000000973c37000 CR4: 00000000000006e0 Dec 2 14:42:45 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 2 14:42:45 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:42:45 cap-x6270-01 kernel: Call Trace: Dec 2 14:42:45 cap-x6270-01 kernel: <IRQ> [<ffffffff81054c63>] ? hrtimer_run_queues+0xed/0x193 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff81293b71>] ? tcp_write_timer+0x16/0x5c9 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff81293b5b>] ? tcp_write_timer+0x0/0x5c9 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff81048575>] ? run_timer_softirq+0x131/0x197 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff810444ca>] ? __do_softirq+0xc5/0x183 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8100ca5c>] ? call_softirq+0x1c/0x28 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8100ddf2>] ? 
do_softirq+0x2c/0x68 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8101da8b>] ? smp_apic_timer_interrupt+0x88/0x95 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8100c433>] ? apic_timer_interrupt+0x13/0x20 Dec 2 14:42:45 cap-x6270-01 kernel: <EOI> [<ffffffff811a52f0>] ? acpi_hw_register_read+0x52/0xe5 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff811b3292>] ? acpi_idle_enter_simple+0x120/0x14e Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff811b3288>] ? acpi_idle_enter_simple+0x116/0x14e Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff811b2fd3>] ? acpi_idle_enter_bm+0xd5/0x274 Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8123bbb4>] ? cpuidle_idle_call+0x7f/0xbb Dec 2 14:42:45 cap-x6270-01 kernel: [<ffffffff8100aa1d>] ? cpu_idle+0x40/0x5e ... Dec 2 14:43:01 cap-x6270-01 kernel: BUG: soft lockup - CPU#1 stuck for 61s! [swapper:0] ... Dec 2 14:43:01 cap-x6270-01 kernel: Pid: 0, comm: swapper Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:43:01 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dee4f>] [<ffffffff812dee4f>] _spin_lock+0x10/0x15 Dec 2 14:43:01 cap-x6270-01 kernel: RSP: 0018:ffffc90000203c98 EFLAGS: 00000293 Dec 2 14:43:01 cap-x6270-01 kernel: RAX: 000000000000e9e4 RBX: ffff88045ad1a3c0 RCX: ffff88045ad1a3c0 Dec 2 14:43:01 cap-x6270-01 kernel: RDX: ffffffff81b4b340 RSI: ffff880496117800 RDI: ffffc90019f82a20 Dec 2 14:43:01 cap-x6270-01 kernel: RBP: ffffffff8100c433 R08: 8000000000000000 R09: 0400000000000000 Dec 2 14:43:01 cap-x6270-01 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000203c10 Dec 2 14:43:01 cap-x6270-01 kernel: R13: ffffc90019f82a20 R14: ffffc90000203c10 R15: ffffffff8101da86 Dec 2 14:43:01 cap-x6270-01 kernel: FS: 0000000000000000(0000) GS:ffffc90000200000(0000) knlGS:0000000000000000 Dec 2 14:43:01 cap-x6270-01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Dec 2 14:43:01 cap-x6270-01 kernel: CR2: 00007f3ff426a000 CR3: 0000000001001000 CR4: 00000000000006e0 Dec 2 14:43:01 cap-x6270-01 kernel: DR0: 
0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 2 14:43:01 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:43:01 cap-x6270-01 kernel: Call Trace: Dec 2 14:43:01 cap-x6270-01 kernel: <IRQ> [<ffffffff81284d1d>] ? __inet_twsk_hashdance+0x54/0x127 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff812979d0>] ? tcp_time_wait+0x13c/0x1c0 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8128bbc5>] ? tcp_fin+0x7e/0x178 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8128c782>] ? tcp_data_queue+0x2b4/0xaf9 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8128fce1>] ? tcp_rcv_state_process+0x8a7/0x909 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff812951cc>] ? tcp_v4_do_rcv+0x181/0x1d5 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff81296be2>] ? tcp_v4_rcv+0x4ac/0x706 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8127c950>] ? ip_rcv_finish+0x0/0x366 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8127cdd6>] ? ip_local_deliver_finish+0x120/0x1e3 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8127cc9c>] ? ip_rcv_finish+0x34c/0x366 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8127d20e>] ? ip_rcv+0x289/0x2d0 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8125e156>] ? process_backlog+0x6f/0x98 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8125df97>] ? net_rx_action+0xa9/0x17d Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff810444ca>] ? __do_softirq+0xc5/0x183 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8100ca5c>] ? call_softirq+0x1c/0x28 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8100ddf2>] ? do_softirq+0x2c/0x68 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8100d472>] ? do_IRQ+0xa0/0xb6 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8100c2d3>] ? ret_from_intr+0x0/0xa Dec 2 14:43:01 cap-x6270-01 kernel: <EOI> [<ffffffff8100c2ce>] ? common_interrupt+0xe/0x13 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff811b3147>] ? acpi_idle_enter_bm+0x249/0x274 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff811b313d>] ? 
acpi_idle_enter_bm+0x23f/0x274 Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8123bbb4>] ? cpuidle_idle_call+0x7f/0xbb Dec 2 14:43:01 cap-x6270-01 kernel: [<ffffffff8100aa1d>] ? cpu_idle+0x40/0x5e ... Dec 2 14:43:04 cap-x6270-01 kernel: BUG: soft lockup - CPU#9 stuck for 61s! [swapper:0] ... Dec 2 14:43:04 cap-x6270-01 kernel: Pid: 0, comm: swapper Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:43:04 cap-x6270-01 kernel: RIP: 0010:[<ffffffff812dee4f>] [<ffffffff812dee4f>] _spin_lock+0x10/0x15 Dec 2 14:43:04 cap-x6270-01 kernel: RSP: 0018:ffffc90001203ec8 EFLAGS: 00000297 Dec 2 14:43:04 cap-x6270-01 kernel: RAX: 0000000000008382 RBX: ffff880496117800 RCX: ffff880496117ae8 Dec 2 14:43:04 cap-x6270-01 kernel: RDX: ffff88048a8fe4e8 RSI: 0000000039b18490 RDI: ffff880496117848 Dec 2 14:43:04 cap-x6270-01 kernel: RBP: ffffffff8100c433 R08: 0000000000000010 R09: 0000000000000000 Dec 2 14:43:04 cap-x6270-01 kernel: R10: 0000000000000009 R11: 0000000000000000 R12: ffffc90001203e40 Dec 2 14:43:04 cap-x6270-01 kernel: R13: ffff8804fcd9c000 R14: ffffc90001203e40 R15: ffffffff8101da86 Dec 2 14:43:04 cap-x6270-01 kernel: FS: 0000000000000000(0000) GS:ffffc90001200000(0000) knlGS:0000000000000000 Dec 2 14:43:04 cap-x6270-01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Dec 2 14:43:04 cap-x6270-01 kernel: CR2: 0000000003dd0000 CR3: 0000000001001000 CR4: 00000000000006e0 Dec 2 14:43:04 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 2 14:43:04 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:43:04 cap-x6270-01 kernel: Call Trace: Dec 2 14:43:04 cap-x6270-01 kernel: <IRQ> [<ffffffff81054c63>] ? hrtimer_run_queues+0xed/0x193 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff81293b71>] ? tcp_write_timer+0x16/0x5c9 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff81293b5b>] ? tcp_write_timer+0x0/0x5c9 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff81048575>] ? 
run_timer_softirq+0x131/0x197 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff810444ca>] ? __do_softirq+0xc5/0x183 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff8100ca5c>] ? call_softirq+0x1c/0x28 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff8100ddf2>] ? do_softirq+0x2c/0x68 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff8101da8b>] ? smp_apic_timer_interrupt+0x88/0x95 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff8100c433>] ? apic_timer_interrupt+0x13/0x20 Dec 2 14:43:04 cap-x6270-01 kernel: <EOI> [<ffffffff8100c42e>] ? apic_timer_interrupt+0xe/0x20 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff811b3147>] ? acpi_idle_enter_bm+0x249/0x274 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff811b313d>] ? acpi_idle_enter_bm+0x23f/0x274 Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff8123bbb4>] ? cpuidle_idle_call+0x7f/0xbb Dec 2 14:43:04 cap-x6270-01 kernel: [<ffffffff8100aa1d>] ? cpu_idle+0x40/0x5e ... Dec 2 14:43:23 cap-x6270-01 kernel: BUG: soft lockup - CPU#14 stuck for 61s! [fast:14591] ... Dec 2 14:43:23 cap-x6270-01 kernel: Pid: 14591, comm: fast Tainted: G W 2.6.31.4 #3 SUN BLADE X6270 SERVER MODULE Dec 2 14:43:23 cap-x6270-01 kernel: RIP: 0010:[<ffffffff81285c94>] [<ffffffff81285c94>] inet_csk_bind_conflict+0x8e/0xa6 Dec 2 14:43:23 cap-x6270-01 kernel: RSP: 0018:ffff8804e1471e30 EFLAGS: 00000286 Dec 2 14:43:23 cap-x6270-01 kernel: RAX: ffffffff815c4101 RBX: ffff8804ea477da0 RCX: ffff8804e1028ea0 Dec 2 14:43:23 cap-x6270-01 kernel: RDX: ffff88048a4842c0 RSI: 0000000000000000 RDI: ffff8809071340c0 Dec 2 14:43:23 cap-x6270-01 kernel: RBP: ffffffff8100c42e R08: 00000000a900220e R09: ffff8804d2cbcca0 Dec 2 14:43:23 cap-x6270-01 kernel: R10: 00007fff281a1501 R11: ffff8809071340c0 R12: ffffffff812509d4 Dec 2 14:43:23 cap-x6270-01 kernel: R13: ffff88097b9a38c0 R14: 0000000000000246 R15: 0000000000000001 Dec 2 14:43:23 cap-x6270-01 kernel: FS: 00007f2c0e2006e0(0000) GS:ffffc90001c00000(0000) knlGS:0000000000000000 Dec 2 14:43:23 cap-x6270-01 kernel: CS: 0010 DS: 0000 ES: 0000 
CR0: 0000000080050033 Dec 2 14:43:23 cap-x6270-01 kernel: CR2: 0000000020fe7000 CR3: 000000097b1b1000 CR4: 00000000000006e0 Dec 2 14:43:23 cap-x6270-01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 2 14:43:23 cap-x6270-01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 2 14:43:23 cap-x6270-01 kernel: Call Trace: Dec 2 14:43:23 cap-x6270-01 kernel: [<ffffffff81285a30>] ? inet_csk_get_port+0x1b2/0x29e Dec 2 14:43:23 cap-x6270-01 kernel: [<ffffffff812a15e2>] ? inet_bind+0x10c/0x1b7 Dec 2 14:43:23 cap-x6270-01 kernel: [<ffffffff8124ef53>] ? sys_bind+0x6e/0x9e Dec 2 14:43:23 cap-x6270-01 kernel: [<ffffffff8106de14>] ? audit_syscall_entry+0x1a4/0x1cf Dec 2 14:43:23 cap-x6270-01 kernel: [<ffffffff8100b92b>] ? system_call_fastpath+0x16/0x1b Either there are more places for race condition, or the fix didn't address the issue effectively. Regards, Kapil ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH] tcp: fix a timewait refcnt race
  2009-12-03  2:43 ` kapil dakhane
@ 2009-12-03 10:49 ` Eric Dumazet
  2009-12-04  0:19   ` David Miller
  2009-12-04  3:20   ` kapil dakhane
  0 siblings, 2 replies; 31+ messages in thread
From: Eric Dumazet @ 2009-12-03 10:49 UTC (permalink / raw)
  To: kapil dakhane; +Cc: David Miller, netdev, netfilter, Evgeniy Polyakov

kapil dakhane wrote:

> Either there are more places for race condition, or the fix didn't
> address the issue effectively.

Thanks a lot for all these details! They are definitely very useful for
localizing problems.

I believe I found another timewait problem. I am not sure
it is what makes your test fail, but we are making progress :)

I cooked a patch against the latest net-next-2.6 + my previous patch.

(2nd take of [PATCH net-next-2.6] tcp: connect() race with timewait reuse)

[PATCH net-next-2.6] tcp: fix a timewait refcnt race

After the TCP RCU conversion, tw->tw_refcnt should not be set to 1 in
inet_twsk_alloc(). Doing so allows an RCU reader to get this timewait
socket while we have not yet stabilized it.

The only choice we have is to set tw_refcnt to 0 in inet_twsk_alloc(),
then atomic_add() it later, once everything is done.

The location of this atomic_add() is tricky, because we don't want another
writer to find this timewait in ehash while tw_refcnt is still zero!

Thanks to Kapil Dakhane for tests and reports.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/inet_timewait_sock.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 11380e6..91680ec 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -109,7 +109,6 @@ void __inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
 	tw->tw_tb = icsk->icsk_bind_hash;
 	WARN_ON(!icsk->icsk_bind_hash);
 	inet_twsk_add_bind_node(tw, &tw->tw_tb->owners);
-	atomic_inc(&tw->tw_refcnt);
 	spin_unlock(&bhead->lock);
 
 	spin_lock(lock);
@@ -119,13 +118,22 @@ void __inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
 	 * Should be done before removing sk from established chain
 	 * because readers are lockless and search established first.
 	 */
-	atomic_inc(&tw->tw_refcnt);
 	inet_twsk_add_node_rcu(tw, &ehead->twchain);
 
 	/* Step 3: Remove SK from established hash. */
 	if (__sk_nulls_del_node_init_rcu(sk))
 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
 
+	/*
+	 * Notes :
+	 * - We initially set tw_refcnt to 0 in inet_twsk_alloc()
+	 * - We add one reference for the bhash link
+	 * - We add one reference for the ehash link
+	 * - We want this refcnt update done before allowing other
+	 *   threads to find this tw in ehash chain.
+	 */
+	atomic_add(1 + 1 + 1, &tw->tw_refcnt);
+
 	spin_unlock(lock);
 }
 
@@ -157,7 +165,12 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int stat
 		tw->tw_transparent = inet->transparent;
 		tw->tw_prot = sk->sk_prot_creator;
 		twsk_net_set(tw, hold_net(sock_net(sk)));
-		atomic_set(&tw->tw_refcnt, 1);
+		/*
+		 * Because we use RCU lookups, we should not set tw_refcnt
+		 * to a non null value before everything is setup for this
+		 * timewait socket.
+		 */
+		atomic_set(&tw->tw_refcnt, 0);
 		inet_twsk_dead_node_init(tw);
 		__module_get(tw->tw_prot->owner);
 	}

^ permalink raw reply related	[flat|nested] 31+ messages in thread
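The idea behind the patch above (keep the reference count at zero until the object is fully linked, then raise it in a single step) can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: `struct tw_obj`, `tw_alloc_init()`, `tw_publish()` and `tw_get_unless_zero()` are hypothetical names, C11 atomics stand in for the kernel's `atomic_t` helpers, and lockless readers are assumed to take a reference only when the count is already non-zero. The patch notes name the bhash and ehash links as two of the three references; the third is assumed here to be the caller's own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a timewait socket; not the kernel structure. */
struct tw_obj {
    atomic_int refcnt;
    int port;
};

/* Allocation step: the object is not yet visible to readers, so the
 * count starts at 0 (mirrors atomic_set(&tw->tw_refcnt, 0) in the patch). */
static void tw_alloc_init(struct tw_obj *tw, int port)
{
    atomic_init(&tw->refcnt, 0);
    tw->port = port;
}

/* Publication step: once the object is linked everywhere, add all
 * references at once (mirrors atomic_add(1 + 1 + 1, ...) in the patch).
 * Assumption: one per hash link plus one for the caller. */
static void tw_publish(struct tw_obj *tw)
{
    int old = atomic_fetch_add(&tw->refcnt, 3);
    assert(old == 0); /* must not have been published before */
}

/* Reader side: take a reference only if the count is non-zero, so a
 * half-initialized object is never handed out. */
static bool tw_get_unless_zero(struct tw_obj *tw)
{
    int old = atomic_load(&tw->refcnt);
    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&tw->refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

In this sketch a reader calling `tw_get_unless_zero()` between `tw_alloc_init()` and `tw_publish()` gets `false` and has to skip or retry, while a reader arriving after publication succeeds and bumps the count from 3 to 4.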
* Re: [PATCH] tcp: fix a timewait refcnt race 2009-12-03 10:49 ` [PATCH] tcp: fix a timewait refcnt race Eric Dumazet @ 2009-12-04 0:19 ` David Miller 2009-12-04 3:20 ` kapil dakhane 1 sibling, 0 replies; 31+ messages in thread From: David Miller @ 2009-12-04 0:19 UTC (permalink / raw) To: eric.dumazet; +Cc: kdakhane, netdev, netfilter, zbr From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 03 Dec 2009 11:49:01 +0100 > [PATCH net-next-2.6] tcp: fix a timewait refcnt race Applied. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] tcp: fix a timewait refcnt race 2009-12-03 10:49 ` [PATCH] tcp: fix a timewait refcnt race Eric Dumazet 2009-12-04 0:19 ` David Miller @ 2009-12-04 3:20 ` kapil dakhane 2009-12-04 6:29 ` Eric Dumazet 1 sibling, 1 reply; 31+ messages in thread From: kapil dakhane @ 2009-12-04 3:20 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev, netfilter, Evgeniy Polyakov Eric, On Thu, Dec 3, 2009 at 2:49 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > I believe I found another timewait problem, I am not sure > it is what makes your test fail, but we make progress :) > > I cooked a patch against last net-next-2.6 + my previous patch. > > (2nd take of [PATCH net-next-2.6] tcp: connect() race with timewait reuse) > > [PATCH net-next-2.6] tcp: fix a timewait refcnt race > I applied your changes as suggested above. It appears that the patch successfully resolves the race condition it addressed. I managed to push the system to 1.9 gbps; previously I could not push it beyond 1.6 gbps. Unfortunately, there appear to be more race conditions, as the fault happened again when I attempted to push it to 2.3 gbps. This time I did not get any error messages in /var/log/messages, although they appeared on the console the same way as before. Kapil ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] tcp: fix a timewait refcnt race 2009-12-04 3:20 ` kapil dakhane @ 2009-12-04 6:29 ` Eric Dumazet 2009-12-04 6:39 ` David Miller 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2009-12-04 6:29 UTC (permalink / raw) To: kapil dakhane; +Cc: David Miller, netdev, netfilter, Evgeniy Polyakov kapil dakhane wrote: > > I applied your changes are suggested above. It appears that the patch > successfully resolves the race condition it addressed. I managed to > push the system to 1.9 gbps, previously I could not push it beyond 1.6 > gbps. Unfortunately, there appear to be more race conditions, as the > fault happened again when I attempted to push it to 2.3 gbps. This > time I did not get any error message in /var/log/messages, although > they appear on the console the same way as before. > Thanks for testing, Kapil. I am not sure of the exact set of changes you have in your kernel. David, could you push your net-next-2.6 tree with pending changes so that I can work on the bind() side (interaction with timewait) and be sure Kapil can work on the latest net-next-2.6? Thanks ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] tcp: fix a timewait refcnt race 2009-12-04 6:29 ` Eric Dumazet @ 2009-12-04 6:39 ` David Miller 0 siblings, 0 replies; 31+ messages in thread From: David Miller @ 2009-12-04 6:39 UTC (permalink / raw) To: eric.dumazet; +Cc: kdakhane, netdev, netfilter, zbr From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 04 Dec 2009 07:29:05 +0100 > kapil dakhane wrote: >> >> I applied your changes are suggested above. It appears that the patch >> successfully resolves the race condition it addressed. I managed to >> push the system to 1.9 gbps, previously I could not push it beyond 1.6 >> gbps. Unfortunately, there appear to be more race conditions, as the >> fault happened again when I attempted to push it to 2.3 gbps. This >> time I did not get any error message in /var/log/messages, although >> they appear on the console the same way as before. >> > > Thanks for testing Kapil > > I am not sure of exact set of changes you have in your kernel. > > David, could you push your net-next-2.6 tree with pending changes so that > I can work on bind() side (interaction with timewait) and be sure Kapil > can work on latest net-next-2.6 ? Done. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH net-next-2.6] tcp: connect() race with timewait reuse 2009-12-02 10:33 ` Eric Dumazet 2009-12-02 11:32 ` Evgeniy Polyakov @ 2009-12-02 15:08 ` Eric Dumazet 2009-12-02 22:15 ` Evgeniy Polyakov 1 sibling, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2009-12-02 15:08 UTC (permalink / raw) To: David Miller; +Cc: kdakhane, netdev, netfilter, zbr, Evgeniy Polyakov Eric Dumazet wrote: > Eric Dumazet wrote: >> But even if sysctl_tw_reuse is cleared, we might trigger the bug if >> local port is bound to a value. > > Oh well, that's more subtle than that. > > __inet_check_established() is called not only with bh disabled, > but also with a lock on bind list if twp != NULL. > > However, if twp is NULL, lock is not held by caller. > > [ Thats the final > ret = check_established(death_row, sk, snum, NULL); > in __inet_hash_connect()] > > So triggering this bug with tw_reuse clear is tricky : > > You need several threads, using sockets with REUSEADDR set, > and bind() to same address/port before connect() to same target. > > We need another patch to correct this. > Here is a separate patch for this issue, cooked on top of net-next-2.6 for testing purposes and public discussion. Thanks [PATCH net-next-2.6] tcp: connect() race with timewait reuse It is currently possible for several threads issuing a connect() to find the same timewait socket and try to reuse it, leading to list corruption. The condition for the bug is that these threads bound their sockets to the same address/port as the timewait socket to be found, and connected to the same target (SO_REUSEADDR needed). To fix this problem, we can unhash the timewait socket while holding the ehash lock, to make sure lookups/changes are serialized. Only the first thread finds the timewait socket; the others find the established socket and return an EADDRNOTAVAIL error. 
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- include/net/inet_timewait_sock.h | 2 + net/ipv4/inet_hashtables.c | 7 +++-- net/ipv4/inet_timewait_sock.c | 36 ++++++++++++++++++++--------- net/ipv6/inet6_hashtables.c | 12 +++++---- 4 files changed, 39 insertions(+), 18 deletions(-) diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h index 773b10f..59c80a0 100644 --- a/include/net/inet_timewait_sock.h +++ b/include/net/inet_timewait_sock.h @@ -199,6 +199,8 @@ static inline __be32 inet_rcv_saddr(const struct sock *sk) extern void inet_twsk_put(struct inet_timewait_sock *tw); +extern void inet_twsk_unhash(struct inet_timewait_sock *tw); + extern struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int state); diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 94ef51a..143ddb4 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -318,20 +318,21 @@ unique: sk->sk_hash = hash; WARN_ON(!sk_unhashed(sk)); __sk_nulls_add_node_rcu(sk, &head->chain); + if (tw) { + inet_twsk_unhash(tw); + NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); + } spin_unlock(lock); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); if (twp) { *twp = tw; - NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); } else if (tw) { /* Silly. Should hash-dance instead... 
*/ inet_twsk_deschedule(tw, death_row); - NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); inet_twsk_put(tw); } - return 0; not_unique: diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 1f5d508..680d09b 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -14,6 +14,21 @@ #include <net/inet_timewait_sock.h> #include <net/ip.h> + +/* + * unhash a timewait socket from established hash + * lock must be hold by caller + */ +void inet_twsk_unhash(struct inet_timewait_sock *tw) +{ + if (hlist_nulls_unhashed(&tw->tw_node)) + return; + + hlist_nulls_del_rcu(&tw->tw_node); + sk_nulls_node_init(&tw->tw_node); + inet_twsk_put(tw); +} + /* Must be called with locally disabled BHs. */ static void __inet_twsk_kill(struct inet_timewait_sock *tw, struct inet_hashinfo *hashinfo) @@ -24,12 +39,9 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw, spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash); spin_lock(lock); - if (hlist_nulls_unhashed(&tw->tw_node)) { - spin_unlock(lock); - return; - } - hlist_nulls_del_rcu(&tw->tw_node); - sk_nulls_node_init(&tw->tw_node); + + inet_twsk_unhash(tw); + spin_unlock(lock); /* Disassociate with bind bucket. 
*/ @@ -37,9 +49,11 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw, hashinfo->bhash_size)]; spin_lock(&bhead->lock); tb = tw->tw_tb; - __hlist_del(&tw->tw_bind_node); - tw->tw_tb = NULL; - inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb); + if (tb) { + __hlist_del(&tw->tw_bind_node); + tw->tw_tb = NULL; + inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb); + } spin_unlock(&bhead->lock); #ifdef SOCK_REFCNT_DEBUG if (atomic_read(&tw->tw_refcnt) != 1) { @@ -47,7 +61,8 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw, tw->tw_prot->name, tw, atomic_read(&tw->tw_refcnt)); } #endif - inet_twsk_put(tw); + if (tb) + inet_twsk_put(tw); } static noinline void inet_twsk_free(struct inet_timewait_sock *tw) @@ -92,6 +107,7 @@ void __inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk, tw->tw_tb = icsk->icsk_bind_hash; WARN_ON(!icsk->icsk_bind_hash); inet_twsk_add_bind_node(tw, &tw->tw_tb->owners); + atomic_inc(&tw->tw_refcnt); spin_unlock(&bhead->lock); spin_lock(lock); diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 00c6a3e..3681c00 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -250,19 +250,21 @@ unique: * in hash table socket with a funny identity. */ inet->inet_num = lport; inet->inet_sport = htons(lport); + sk->sk_hash = hash; WARN_ON(!sk_unhashed(sk)); __sk_nulls_add_node_rcu(sk, &head->chain); - sk->sk_hash = hash; + if (tw) { + inet_twsk_unhash(tw); + NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); + } spin_unlock(lock); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); - if (twp != NULL) { + if (twp) { *twp = tw; - NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); - } else if (tw != NULL) { + } else if (tw) { /* Silly. Should hash-dance instead... */ inet_twsk_deschedule(tw, death_row); - NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); inet_twsk_put(tw); } ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH net-next-2.6] tcp: connect() race with timewait reuse 2009-12-02 15:08 ` [PATCH net-next-2.6] tcp: connect() race with timewait reuse Eric Dumazet @ 2009-12-02 22:15 ` Evgeniy Polyakov 2009-12-03 6:44 ` Eric Dumazet 0 siblings, 1 reply; 31+ messages in thread From: Evgeniy Polyakov @ 2009-12-02 22:15 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, kdakhane, netdev, netfilter Hi. Looks very good, thanks Eric, I have one question. On Wed, Dec 02, 2009 at 04:08:59PM +0100, Eric Dumazet (eric.dumazet@gmail.com) wrote: > + > +/* > + * unhash a timewait socket from established hash > + * lock must be hold by caller > + */ > +void inet_twsk_unhash(struct inet_timewait_sock *tw) > +{ > + if (hlist_nulls_unhashed(&tw->tw_node)) > + return; > + > + hlist_nulls_del_rcu(&tw->tw_node); > + sk_nulls_node_init(&tw->tw_node); > + inet_twsk_put(tw); Is it safe to call in locked context? inet_twsk_put() schedules preemption, also I did not check what tw destructor does. -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next-2.6] tcp: connect() race with timewait reuse 2009-12-02 22:15 ` Evgeniy Polyakov @ 2009-12-03 6:44 ` Eric Dumazet 2009-12-03 8:31 ` Eric Dumazet 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2009-12-03 6:44 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: David Miller, kdakhane, netdev, netfilter Evgeniy Polyakov wrote: > Hi. > > Looks very good, thanks Eric, I have one question. > > On Wed, Dec 02, 2009 at 04:08:59PM +0100, Eric Dumazet (eric.dumazet@gmail.com) wrote: >> + >> +/* >> + * unhash a timewait socket from established hash >> + * lock must be hold by caller >> + */ >> +void inet_twsk_unhash(struct inet_timewait_sock *tw) >> +{ >> + if (hlist_nulls_unhashed(&tw->tw_node)) >> + return; >> + >> + hlist_nulls_del_rcu(&tw->tw_node); >> + sk_nulls_node_init(&tw->tw_node); >> + inet_twsk_put(tw); > > Is it safe to call in locked context? inet_twsk_put() schedules > preemption, also I did not check what tw destructor does. > You are probably right; we could defer the inet_twsk_put(tw) until after the locked section. You or I will submit another patch to correct this. Thanks! ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next-2.6] tcp: connect() race with timewait reuse 2009-12-03 6:44 ` Eric Dumazet @ 2009-12-03 8:31 ` Eric Dumazet 2009-12-03 23:22 ` Evgeniy Polyakov 2009-12-04 0:18 ` David Miller 0 siblings, 2 replies; 31+ messages in thread From: Eric Dumazet @ 2009-12-03 8:31 UTC (permalink / raw) To: Evgeniy Polyakov, David Miller; +Cc: kdakhane, netdev, netfilter Eric Dumazet wrote: > > > You are probably right, we could defer the inet_twsk_put(tw) out of locked > section, you or I will submit another patch to correct this. > Here is an updated patch, tested on my dev machine. I found another problem with the tw refcnt that I am going to address ASAP. Thanks! [PATCH net-next-2.6] tcp: connect() race with timewait reuse It is currently possible for several threads issuing a connect() to find the same timewait socket and try to reuse it, leading to list corruption. The condition for the bug is that these threads bound their sockets to the same address/port as the timewait socket to be found, and connected to the same target (SO_REUSEADDR needed). To fix this problem, we can unhash the timewait socket while holding the ehash lock, to make sure lookups/changes are serialized. Only the first thread finds the timewait socket; the others find the established socket and return an EADDRNOTAVAIL error. This second version takes into account Evgeniy's review and makes sure inet_twsk_put() is called outside of locked sections. 
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- include/net/inet_timewait_sock.h | 2 + net/ipv4/inet_hashtables.c | 10 +++++-- net/ipv4/inet_timewait_sock.c | 38 +++++++++++++++++++++-------- net/ipv6/inet6_hashtables.c | 15 +++++++---- 4 files changed, 47 insertions(+), 18 deletions(-) diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h index 773b10f..cb7d93b 100644 --- a/include/net/inet_timewait_sock.h +++ b/include/net/inet_timewait_sock.h @@ -199,6 +199,8 @@ static inline __be32 inet_rcv_saddr(const struct sock *sk) extern void inet_twsk_put(struct inet_timewait_sock *tw); +extern int inet_twsk_unhash(struct inet_timewait_sock *tw); + extern struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int state); diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 94ef51a..30e73c5 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -286,6 +286,7 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row, struct sock *sk2; const struct hlist_nulls_node *node; struct inet_timewait_sock *tw; + int twrefcnt = 0; spin_lock(lock); @@ -318,20 +319,23 @@ unique: sk->sk_hash = hash; WARN_ON(!sk_unhashed(sk)); __sk_nulls_add_node_rcu(sk, &head->chain); + if (tw) { + twrefcnt = inet_twsk_unhash(tw); + NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); + } spin_unlock(lock); + if (twrefcnt) + inet_twsk_put(tw); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); if (twp) { *twp = tw; - NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); } else if (tw) { /* Silly. Should hash-dance instead... 
*/ inet_twsk_deschedule(tw, death_row); - NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); inet_twsk_put(tw); } - return 0; not_unique: diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 1f5d508..11380e6 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -14,22 +14,33 @@ #include <net/inet_timewait_sock.h> #include <net/ip.h> + +/* + * unhash a timewait socket from established hash + * lock must be hold by caller + */ +int inet_twsk_unhash(struct inet_timewait_sock *tw) +{ + if (hlist_nulls_unhashed(&tw->tw_node)) + return 0; + + hlist_nulls_del_rcu(&tw->tw_node); + sk_nulls_node_init(&tw->tw_node); + return 1; +} + /* Must be called with locally disabled BHs. */ static void __inet_twsk_kill(struct inet_timewait_sock *tw, struct inet_hashinfo *hashinfo) { struct inet_bind_hashbucket *bhead; struct inet_bind_bucket *tb; + int refcnt; /* Unlink from established hashes. */ spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash); spin_lock(lock); - if (hlist_nulls_unhashed(&tw->tw_node)) { - spin_unlock(lock); - return; - } - hlist_nulls_del_rcu(&tw->tw_node); - sk_nulls_node_init(&tw->tw_node); + refcnt = inet_twsk_unhash(tw); spin_unlock(lock); /* Disassociate with bind bucket. 
*/ @@ -37,9 +48,12 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw, hashinfo->bhash_size)]; spin_lock(&bhead->lock); tb = tw->tw_tb; - __hlist_del(&tw->tw_bind_node); - tw->tw_tb = NULL; - inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb); + if (tb) { + __hlist_del(&tw->tw_bind_node); + tw->tw_tb = NULL; + inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb); + refcnt++; + } spin_unlock(&bhead->lock); #ifdef SOCK_REFCNT_DEBUG if (atomic_read(&tw->tw_refcnt) != 1) { @@ -47,7 +61,10 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw, tw->tw_prot->name, tw, atomic_read(&tw->tw_refcnt)); } #endif - inet_twsk_put(tw); + while (refcnt) { + inet_twsk_put(tw); + refcnt--; + } } static noinline void inet_twsk_free(struct inet_timewait_sock *tw) @@ -92,6 +109,7 @@ void __inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk, tw->tw_tb = icsk->icsk_bind_hash; WARN_ON(!icsk->icsk_bind_hash); inet_twsk_add_bind_node(tw, &tw->tw_tb->owners); + atomic_inc(&tw->tw_refcnt); spin_unlock(&bhead->lock); spin_lock(lock); diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 00c6a3e..7207801 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -223,6 +223,7 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row, struct sock *sk2; const struct hlist_nulls_node *node; struct inet_timewait_sock *tw; + int twrefcnt = 0; spin_lock(lock); @@ -250,19 +251,23 @@ unique: * in hash table socket with a funny identity. 
*/ inet->inet_num = lport; inet->inet_sport = htons(lport); + sk->sk_hash = hash; WARN_ON(!sk_unhashed(sk)); __sk_nulls_add_node_rcu(sk, &head->chain); - sk->sk_hash = hash; + if (tw) { + twrefcnt = inet_twsk_unhash(tw); + NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); + } spin_unlock(lock); + if (twrefcnt) + inet_twsk_put(tw); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); - if (twp != NULL) { + if (twp) { *twp = tw; - NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); - } else if (tw != NULL) { + } else if (tw) { /* Silly. Should hash-dance instead... */ inet_twsk_deschedule(tw, death_row); - NET_INC_STATS_BH(net, LINUX_MIB_TIMEWAITRECYCLED); inet_twsk_put(tw); } ^ permalink raw reply related [flat|nested] 31+ messages in thread
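Editorial sketch of the pattern this second version introduces: the unhash helper no longer drops the reference itself but returns 1 when the caller owes a put, so the put can happen after the lock is released. This is a user-space C11 approximation, not the kernel code. The toy spinlock stands in for the ehash bucket lock, and tw_sketch, lock_ehash, unlock_ehash, tw_init, tw_put, tw_unhash and tw_claim are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a timewait socket. */
struct tw_sketch {
    bool hashed;        /* stands in for "linked into the ehash chain" */
    atomic_int refcnt;
};

/* Toy spinlock standing in for the ehash bucket lock. */
static atomic_flag ehash_lock = ATOMIC_FLAG_INIT;

static void lock_ehash(void)
{
    while (atomic_flag_test_and_set_explicit(&ehash_lock, memory_order_acquire))
        ;   /* spin */
}

static void unlock_ehash(void)
{
    atomic_flag_clear_explicit(&ehash_lock, memory_order_release);
}

static void tw_init(struct tw_sketch *tw, int refs)
{
    tw->hashed = true;
    atomic_init(&tw->refcnt, refs);
}

/* Drop one reference; a real put would free the object at zero. */
static void tw_put(struct tw_sketch *tw)
{
    int old = atomic_fetch_sub(&tw->refcnt, 1);

    assert(old > 0);
    (void)old;
}

/* Caller must hold the lock. Returns 1 if the caller now owes one tw_put(). */
static int tw_unhash(struct tw_sketch *tw)
{
    if (!tw->hashed)
        return 0;
    tw->hashed = false;
    return 1;
}

/* Returns 1 for the single caller that actually unhashed tw, else 0. */
static int tw_claim(struct tw_sketch *tw)
{
    int owed;

    lock_ehash();
    owed = tw_unhash(tw);
    unlock_ehash();

    if (owed)
        tw_put(tw);     /* deliberately after the unlock */
    return owed;
}
```

Because the hashed check and the unlink happen under the same lock, at most one of several racing callers sees a return value of 1; the others see 0 and owe no put.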
* Re: [PATCH net-next-2.6] tcp: connect() race with timewait reuse 2009-12-03 8:31 ` Eric Dumazet @ 2009-12-03 23:22 ` Evgeniy Polyakov 2009-12-04 0:18 ` David Miller 1 sibling, 0 replies; 31+ messages in thread From: Evgeniy Polyakov @ 2009-12-03 23:22 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, kdakhane, netdev, netfilter Hi Eric. Patch looks good, thank you! On Thu, Dec 03, 2009 at 09:31:19AM +0100, Eric Dumazet (eric.dumazet@gmail.com) wrote: > +/* > + * unhash a timewait socket from established hash > + * lock must be hold by caller > + */ > +int inet_twsk_unhash(struct inet_timewait_sock *tw) > +{ > + if (hlist_nulls_unhashed(&tw->tw_node)) > + return 0; > + > + hlist_nulls_del_rcu(&tw->tw_node); > + sk_nulls_node_init(&tw->tw_node); > + return 1; > +} > + > /* Must be called with locally disabled BHs. */ > static void __inet_twsk_kill(struct inet_timewait_sock *tw, > struct inet_hashinfo *hashinfo) > { > struct inet_bind_hashbucket *bhead; > struct inet_bind_bucket *tb; > + int refcnt; > /* Unlink from established hashes. */ > spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash); > > spin_lock(lock); > - if (hlist_nulls_unhashed(&tw->tw_node)) { > - spin_unlock(lock); > - return; > - } > - hlist_nulls_del_rcu(&tw->tw_node); > - sk_nulls_node_init(&tw->tw_node); > + refcnt = inet_twsk_unhash(tw); Tricky :) > spin_unlock(lock); > > /* Disassociate with bind bucket. 
*/ > @@ -37,9 +48,12 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw, > hashinfo->bhash_size)]; > spin_lock(&bhead->lock); > tb = tw->tw_tb; > - __hlist_del(&tw->tw_bind_node); > - tw->tw_tb = NULL; > - inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb); > + if (tb) { > + __hlist_del(&tw->tw_bind_node); > + tw->tw_tb = NULL; > + inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb); > + refcnt++; > + } > spin_unlock(&bhead->lock); > #ifdef SOCK_REFCNT_DEBUG > if (atomic_read(&tw->tw_refcnt) != 1) { > @@ -47,7 +61,10 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw, > tw->tw_prot->name, tw, atomic_read(&tw->tw_refcnt)); > } > #endif > - inet_twsk_put(tw); > + while (refcnt) { > + inet_twsk_put(tw); > + refcnt--; > + } > } -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH net-next-2.6] tcp: connect() race with timewait reuse 2009-12-03 8:31 ` Eric Dumazet 2009-12-03 23:22 ` Evgeniy Polyakov @ 2009-12-04 0:18 ` David Miller 1 sibling, 0 replies; 31+ messages in thread From: David Miller @ 2009-12-04 0:18 UTC (permalink / raw) To: eric.dumazet; +Cc: zbr, kdakhane, netdev, netfilter From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 03 Dec 2009 09:31:19 +0100 > [PATCH net-next-2.6] tcp: connect() race with timewait reuse Applied. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] tcp: Fix a connect() race with timewait sockets 2009-12-02 8:59 ` David Miller 2009-12-02 9:23 ` Eric Dumazet @ 2009-12-02 16:05 ` Ashwani Wason 2009-12-03 6:38 ` David Miller 1 sibling, 1 reply; 31+ messages in thread From: Ashwani Wason @ 2009-12-02 16:05 UTC (permalink / raw) To: David Miller; +Cc: eric.dumazet, kdakhane, netdev, netfilter, zbr Both reuse and recycle were enabled for this test. (I know because we, Kapil and I are working together on different aspects of this.) - Ashwani On Wed, Dec 2, 2009 at 12:59 AM, David Miller <davem@davemloft.net> wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 01 Dec 2009 16:00:39 +0100 > >> [PATCH] tcp: Fix a connect() race with timewait sockets > > This condition would only trigger if the timewait recycling sysctl is > enabled. > > It is off by default, and I can't find any mention in this bug report > that it has been turned on. > -- > To unsubscribe from this list: send the line "unsubscribe netfilter" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] tcp: Fix a connect() race with timewait sockets 2009-12-02 16:05 ` [PATCH] tcp: Fix a connect() race with timewait sockets Ashwani Wason @ 2009-12-03 6:38 ` David Miller 0 siblings, 0 replies; 31+ messages in thread From: David Miller @ 2009-12-03 6:38 UTC (permalink / raw) To: ashwas; +Cc: eric.dumazet, kdakhane, netdev, netfilter, zbr From: Ashwani Wason <ashwas@gmail.com> Date: Wed, 2 Dec 2009 08:05:51 -0800 > Both reuse and recycle were enabled for this test. (I know because we, > Kapil and I are working together on different aspects of this.) Thanks, so the timewait recycling code paths really are relevant. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 0/2] tcp: Fix connect() races with timewait sockets 2009-12-01 15:00 ` [PATCH] tcp: Fix a connect() race with timewait sockets Eric Dumazet 2009-12-02 8:59 ` David Miller @ 2009-12-04 13:45 ` Eric Dumazet 2009-12-04 13:46 ` [PATCH 1/2] tcp: Fix a connect() race " Eric Dumazet 2009-12-04 13:47 ` [PATCH 2/2] " Eric Dumazet 3 siblings, 0 replies; 31+ messages in thread From: Eric Dumazet @ 2009-12-04 13:45 UTC (permalink / raw) To: kapil dakhane, David S. Miller; +Cc: netdev, netfilter, Evgeniy Polyakov Eric Dumazet wrote: > [PATCH] tcp: Fix a connect() race with timewait sockets > > When we find a timewait connection in __inet_hash_connect() and reuse > it for a new connection request, we have a race window, releasing bind > list lock and reacquiring it in __inet_twsk_kill() to remove timewait > socket from list. > > Another thread might find the timewait socket we already chose, leading to > list corruption and crashes. > > Fix is to remove timewait socket from bind list before releasing the lock. I cooked two patches on top of net-next-2.6 to solve the last two race problems I am aware of. Kapil, if you want to test them, make sure you take the latest net-next-2.6 snapshot. The first patch changes __inet_hash_nolisten() and __inet6_hash() to take a timewait parameter, so that the timewait socket can be unhashed from ehash at the same time the new socket is inserted into ehash. The second patch is a respin of the first patch I sent: it makes sure __inet_hash_connect() cannot give the same timewait socket to different threads. Thanks! Reported-by: kapil dakhane <kdakhane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 1/2] tcp: Fix a connect() race with timewait sockets 2009-12-01 15:00 ` [PATCH] tcp: Fix a connect() race with timewait sockets Eric Dumazet 2009-12-02 8:59 ` David Miller 2009-12-04 13:45 ` [PATCH 0/2] tcp: Fix connect() races " Eric Dumazet @ 2009-12-04 13:46 ` Eric Dumazet 2009-12-05 21:21 ` Evgeniy Polyakov 2009-12-09 4:18 ` [PATCH 1/2] tcp: Fix a connect() race with timewait sockets David Miller 2009-12-04 13:47 ` [PATCH 2/2] " Eric Dumazet 3 siblings, 2 replies; 31+ messages in thread From: Eric Dumazet @ 2009-12-04 13:46 UTC (permalink / raw) To: kapil dakhane, David S. Miller; +Cc: netdev, netfilter, Evgeniy Polyakov The first patch changes __inet_hash_nolisten() and __inet6_hash() to take a timewait parameter, so that the timewait socket can be unhashed from ehash at the same time the new socket is inserted into the hash. This makes sure the timewait socket won't be found by a concurrent writer in __inet_check_established(). Reported-by: kapil dakhane <kdakhane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- include/net/inet6_hashtables.h | 2 +- include/net/inet_hashtables.h | 8 +++++--- net/dccp/ipv4.c | 2 +- net/dccp/ipv6.c | 4 ++-- net/ipv4/inet_hashtables.c | 22 ++++++++++++++++------ net/ipv4/tcp_ipv4.c | 2 +- net/ipv6/inet6_hashtables.c | 8 +++++++- net/ipv6/tcp_ipv6.c | 4 ++-- 8 files changed, 35 insertions(+), 17 deletions(-) diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 92838d3..e46674d 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -53,7 +53,7 @@ static inline int inet6_sk_ehashfn(const struct sock *sk) return inet6_ehashfn(net, laddr, lport, faddr, fport); } -extern void __inet6_hash(struct sock *sk); +extern int __inet6_hash(struct sock *sk, struct inet_timewait_sock *twp); /* * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 41cbddd..74358d1 100644 --- a/include/net/inet_hashtables.h 
+++ b/include/net/inet_hashtables.h @@ -251,7 +251,7 @@ extern void inet_put_port(struct sock *sk); void inet_hashinfo_init(struct inet_hashinfo *h); -extern void __inet_hash_nolisten(struct sock *sk); +extern int __inet_hash_nolisten(struct sock *sk, struct inet_timewait_sock *tw); extern void inet_hash(struct sock *sk); extern void inet_unhash(struct sock *sk); @@ -391,10 +391,12 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, } extern int __inet_hash_connect(struct inet_timewait_death_row *death_row, - struct sock *sk, u32 port_offset, + struct sock *sk, + u32 port_offset, int (*check_established)(struct inet_timewait_death_row *, struct sock *, __u16, struct inet_timewait_sock **), - void (*hash)(struct sock *sk)); + int (*hash)(struct sock *sk, struct inet_timewait_sock *twp)); + extern int inet_hash_connect(struct inet_timewait_death_row *death_row, struct sock *sk); #endif /* _INET_HASHTABLES_H */ diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index efbcfdc..dad7bc4 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -408,7 +408,7 @@ struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb, dccp_sync_mss(newsk, dst_mtu(dst)); - __inet_hash_nolisten(newsk); + __inet_hash_nolisten(newsk, NULL); __inet_inherit_port(sk, newsk); return newsk; diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 6574215..baf05cf 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -46,7 +46,7 @@ static void dccp_v6_hash(struct sock *sk) return; } local_bh_disable(); - __inet6_hash(sk); + __inet6_hash(sk, NULL); local_bh_enable(); } } @@ -644,7 +644,7 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk, newinet->inet_daddr = newinet->inet_saddr = LOOPBACK4_IPV6; newinet->inet_rcv_saddr = LOOPBACK4_IPV6; - __inet6_hash(newsk); + __inet6_hash(newsk, NULL); __inet_inherit_port(sk, newsk); return newsk; diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 21e5e32..c4201b7 100644 --- 
a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -351,12 +351,13 @@ static inline u32 inet_sk_port_offset(const struct sock *sk) inet->inet_dport); } -void __inet_hash_nolisten(struct sock *sk) +int __inet_hash_nolisten(struct sock *sk, struct inet_timewait_sock *tw) { struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; struct hlist_nulls_head *list; spinlock_t *lock; struct inet_ehash_bucket *head; + int twrefcnt = 0; WARN_ON(!sk_unhashed(sk)); @@ -367,8 +368,13 @@ void __inet_hash_nolisten(struct sock *sk) spin_lock(lock); __sk_nulls_add_node_rcu(sk, list); + if (tw) { + WARN_ON(sk->sk_hash != tw->tw_hash); + twrefcnt = inet_twsk_unhash(tw); + } spin_unlock(lock); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); + return twrefcnt; } EXPORT_SYMBOL_GPL(__inet_hash_nolisten); @@ -378,7 +384,7 @@ static void __inet_hash(struct sock *sk) struct inet_listen_hashbucket *ilb; if (sk->sk_state != TCP_LISTEN) { - __inet_hash_nolisten(sk); + __inet_hash_nolisten(sk, NULL); return; } @@ -427,7 +433,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, struct sock *sk, u32 port_offset, int (*check_established)(struct inet_timewait_death_row *, struct sock *, __u16, struct inet_timewait_sock **), - void (*hash)(struct sock *sk)) + int (*hash)(struct sock *sk, struct inet_timewait_sock *twp)) { struct inet_hashinfo *hinfo = death_row->hashinfo; const unsigned short snum = inet_sk(sk)->inet_num; @@ -435,6 +441,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, struct inet_bind_bucket *tb; int ret; struct net *net = sock_net(sk); + int twrefcnt = 1; if (!snum) { int i, remaining, low, high, port; @@ -493,13 +500,16 @@ ok: inet_bind_hash(sk, tb, port); if (sk_unhashed(sk)) { inet_sk(sk)->inet_sport = htons(port); - hash(sk); + twrefcnt += hash(sk, tw); } spin_unlock(&head->lock); if (tw) { inet_twsk_deschedule(tw, death_row); - inet_twsk_put(tw); + while (twrefcnt) { + twrefcnt--; + inet_twsk_put(tw); + } } ret = 0; @@ 
-510,7 +520,7 @@ ok: tb = inet_csk(sk)->icsk_bind_hash; spin_lock_bh(&head->lock); if (sk_head(&tb->owners) == sk && !sk->sk_bind_node.next) { - hash(sk); + hash(sk, NULL); spin_unlock_bh(&head->lock); return 0; } else { diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 29002ab..15e9603 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1464,7 +1464,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb, } #endif - __inet_hash_nolisten(newsk); + __inet_hash_nolisten(newsk, NULL); __inet_inherit_port(sk, newsk); return newsk; diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index c813e29..633a6c2 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -22,9 +22,10 @@ #include <net/inet6_hashtables.h> #include <net/ip.h> -void __inet6_hash(struct sock *sk) +int __inet6_hash(struct sock *sk, struct inet_timewait_sock *tw) { struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; + int twrefcnt = 0; WARN_ON(!sk_unhashed(sk)); @@ -45,10 +46,15 @@ void __inet6_hash(struct sock *sk) lock = inet_ehash_lockp(hashinfo, hash); spin_lock(lock); __sk_nulls_add_node_rcu(sk, list); + if (tw) { + WARN_ON(sk->sk_hash != tw->tw_hash); + twrefcnt = inet_twsk_unhash(tw); + } spin_unlock(lock); } sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); + return twrefcnt; } EXPORT_SYMBOL(__inet6_hash); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index aadd7ce..ee9cf62 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -96,7 +96,7 @@ static void tcp_v6_hash(struct sock *sk) return; } local_bh_disable(); - __inet6_hash(sk); + __inet6_hash(sk, NULL); local_bh_enable(); } } @@ -1496,7 +1496,7 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, } #endif - __inet6_hash(newsk); + __inet6_hash(newsk, NULL); __inet_inherit_port(sk, newsk); return newsk; ^ permalink raw reply related [flat|nested] 31+ messages in thread
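Editorial sketch of what PATCH 1/2 does in __inet_hash_connect(): the hash callback inserts the new socket and unhashes the reused timewait socket under one lock, then reports how many extra references the caller must drop. The caller starts its counter at 1 for the reference it already holds and drops everything after unlocking. This is a user-space C11 approximation under simplifying assumptions: the bind-hash reference and inet_twsk_deschedule() are left out, and all names (sk_sketch, tw_sketch, hash_and_replace, connect_reuse, and the lock helpers) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the new socket and the reused timewait socket. */
struct sk_sketch { bool hashed; };
struct tw_sketch { bool hashed; atomic_int refcnt; };

/* Toy spinlock standing in for the ehash bucket lock. */
static atomic_flag ehash_lock = ATOMIC_FLAG_INIT;

static void lock_ehash(void)
{
    while (atomic_flag_test_and_set_explicit(&ehash_lock, memory_order_acquire))
        ;   /* spin */
}

static void unlock_ehash(void)
{
    atomic_flag_clear_explicit(&ehash_lock, memory_order_release);
}

static void tw_init(struct tw_sketch *tw, int refs)
{
    tw->hashed = true;
    atomic_init(&tw->refcnt, refs);
}

static void tw_put(struct tw_sketch *tw)
{
    int old = atomic_fetch_sub(&tw->refcnt, 1);

    assert(old > 0);
    (void)old;
}

/*
 * Insert sk and, in the same locked section, unhash tw if one was given.
 * Returns the number of tw references the caller must drop (0 or 1).
 */
static int hash_and_replace(struct sk_sketch *sk, struct tw_sketch *tw)
{
    int twrefcnt = 0;

    lock_ehash();
    sk->hashed = true;
    if (tw != NULL && tw->hashed) {
        tw->hashed = false;
        twrefcnt = 1;
    }
    unlock_ehash();
    return twrefcnt;
}

/* Caller side: one reference already held, plus whatever the hash step owes. */
static void connect_reuse(struct sk_sketch *sk, struct tw_sketch *tw)
{
    int twrefcnt = 1;

    twrefcnt += hash_and_replace(sk, tw);
    if (tw != NULL) {
        while (twrefcnt) {
            twrefcnt--;
            tw_put(tw);     /* all puts happen outside the lock */
        }
    }
}
```

Returning a count instead of calling the put inside the helper keeps the locked section free of anything that might free the object, which is the concern raised earlier in the thread.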
* Re: [PATCH 1/2] tcp: Fix a connect() race with timewait sockets
  2009-12-04 13:46   ` [PATCH 1/2] tcp: Fix a connect() race " Eric Dumazet
@ 2009-12-05 21:21     ` Evgeniy Polyakov
  2009-12-07  9:59       ` [PATCH] tcp: documents timewait refcnt tricks Eric Dumazet
  2009-12-09  4:18     ` [PATCH 1/2] tcp: Fix a connect() race with timewait sockets David Miller
  1 sibling, 1 reply; 31+ messages in thread
From: Evgeniy Polyakov @ 2009-12-05 21:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: kapil dakhane, David S. Miller, netdev, netfilter

Hi Eric.

On Fri, Dec 04, 2009 at 02:46:54PM +0100, Eric Dumazet (eric.dumazet@gmail.com) wrote:
> First patch changes __inet_hash_nolisten() and __inet6_hash()
> to get a timewait parameter to be able to unhash it from ehash
> at same time the new socket is inserted in hash.
>
> This makes sure timewait socket wont be found by a concurrent
> writer in __inet_check_established()

Both patches look good, although trick with returning reference counter
may look like a hack especially when only viewing into ip code and not
hashtable itself. Can you please cook up a documentation update for hash
function that it is supposed to return refcnt when socket was in hash
table.

-- 
	Evgeniy Polyakov

^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH] tcp: documents timewait refcnt tricks
  2009-12-05 21:21     ` Evgeniy Polyakov
@ 2009-12-07  9:59       ` Eric Dumazet
  2009-12-07 16:06         ` Randy Dunlap
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2009-12-07 9:59 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: kapil dakhane, David S. Miller, netdev, netfilter

Evgeniy Polyakov wrote:
> Hi Eric.
>
> On Fri, Dec 04, 2009 at 02:46:54PM +0100, Eric Dumazet (eric.dumazet@gmail.com) wrote:
>> First patch changes __inet_hash_nolisten() and __inet6_hash()
>> to get a timewait parameter to be able to unhash it from ehash
>> at same time the new socket is inserted in hash.
>>
>> This makes sure timewait socket wont be found by a concurrent
>> writer in __inet_check_established()
>
> Both patches look good, although trick with returning reference counter
> may look like a hack especially when only viewing into ip code and not
> hashtable itself. Can you please cook up a documentation update for hash
> function that it is supposed to return refcnt when socket was in hash
> table.
>

Sure, here it is :

Thanks !

[PATCH] tcp: documents timewait refcnt tricks

Adds kerneldoc for inet_twsk_unhash() & inet_twsk_bind_unhash().

Suggested-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 1958cf5..cf719c2 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -15,9 +15,13 @@
 #include <net/ip.h>
 
 
-/*
- * unhash a timewait socket from established hash
- * lock must be hold by caller
+/**
+ * inet_twsk_unhash - unhash a timewait socket from established hash
+ * @tw: timewait socket
+ *
+ * unhash a timewait socket from established hash, if hashed.
+ * ehash lock must be hold by caller.
+ * Returns 1 if caller should call inet_twsk_put() after lock release.
  */
 int inet_twsk_unhash(struct inet_timewait_sock *tw)
 {
@@ -26,12 +30,21 @@ int inet_twsk_unhash(struct inet_timewait_sock *tw)
 
 	hlist_nulls_del_rcu(&tw->tw_node);
 	sk_nulls_node_init(&tw->tw_node);
+	/*
+	 * We cannot call inet_twsk_put() ourself under lock,
+	 * caller must call it for us.
+	 */
 	return 1;
 }
 
-/*
- * unhash a timewait socket from bind hash
- * lock must be hold by caller
+/**
+ * inet_twsk_bind_unhash - unhash a timewait socket from bind hash
+ * @tw: timewait socket
+ * @hashinfo: hashinfo pointer
+ *
+ * unhash a timewait socket from bind hash, if hashed.
+ * bind hash lock must be hold by caller.
+ * Returns 1 if caller should call inet_twsk_put() after lock release.
  */
 int inet_twsk_bind_unhash(struct inet_timewait_sock *tw,
 			  struct inet_hashinfo *hashinfo)
@@ -44,6 +57,10 @@ int inet_twsk_bind_unhash(struct inet_timewait_sock *tw,
 	__hlist_del(&tw->tw_bind_node);
 	tw->tw_tb = NULL;
 	inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb);
+	/*
+	 * We cannot call inet_twsk_put() ourself under lock,
+	 * caller must call it for us.
+	 */
 	return 1;
 }
 
@@ -140,7 +157,7 @@ void __inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
 
 	/*
 	 * Notes :
-	 * - We initially set tw_refcnt to 0 in inet_twsk_alloc()
+	 * - We initially set tw_refcnt to 0 in inet_twsk_alloc()
 	 * - We add one reference for the bhash link
 	 * - We add one reference for the ehash link
 	 * - We want this refcnt update done before allowing other
@@ -150,7 +167,6 @@ void __inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
 
 	spin_unlock(lock);
 }
-
 EXPORT_SYMBOL_GPL(__inet_twsk_hashdance);
 
 struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int state)
@@ -191,7 +207,6 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int stat
 
 	return tw;
 }
-
 EXPORT_SYMBOL_GPL(inet_twsk_alloc);
 
 /* Returns non-zero if quota exceeded. */
@@ -270,7 +285,6 @@ void inet_twdr_hangman(unsigned long data)
 out:
 	spin_unlock(&twdr->death_lock);
 }
-
 EXPORT_SYMBOL_GPL(inet_twdr_hangman);
 
 void inet_twdr_twkill_work(struct work_struct *work)
@@ -301,7 +315,6 @@ void inet_twdr_twkill_work(struct work_struct *work)
 		spin_unlock_bh(&twdr->death_lock);
 	}
 }
-
 EXPORT_SYMBOL_GPL(inet_twdr_twkill_work);
 
 /* These are always called from BH context. See callers in
@@ -321,7 +334,6 @@ void inet_twsk_deschedule(struct inet_timewait_sock *tw,
 	spin_unlock(&twdr->death_lock);
 	__inet_twsk_kill(tw, twdr->hashinfo);
 }
-
 EXPORT_SYMBOL(inet_twsk_deschedule);
 
 void inet_twsk_schedule(struct inet_timewait_sock *tw,
@@ -402,7 +414,6 @@ void inet_twsk_schedule(struct inet_timewait_sock *tw,
 	mod_timer(&twdr->tw_timer, jiffies + twdr->period);
 	spin_unlock(&twdr->death_lock);
 }
-
 EXPORT_SYMBOL_GPL(inet_twsk_schedule);
 
 void inet_twdr_twcal_tick(unsigned long data)
@@ -463,7 +474,6 @@ out:
 #endif
 	spin_unlock(&twdr->death_lock);
 }
-
 EXPORT_SYMBOL_GPL(inet_twdr_twcal_tick);
 
 void inet_twsk_purge(struct inet_hashinfo *hashinfo,

^ permalink raw reply related [flat|nested] 31+ messages in thread
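The calling convention documented in the patch above (the unhash helper reports whether the caller owes a put, and the caller drops references only after the lock is released) can be sketched in plain userspace C. This is a single-threaded illustration under stated assumptions, not kernel code: struct mock_tw, the mock_lock_held flag and every mock_* function are hypothetical stand-ins, the counter is a plain int rather than an atomic, and the starting value of 1 models the reference the connect path is assumed to hold already.

```c
#include <assert.h>

/* Hypothetical stand-in for a timewait socket; not the kernel structure. */
struct mock_tw {
	int refcnt;	/* plain int here; the kernel uses an atomic counter */
	int hashed;	/* nonzero while linked in the mock hash table */
	int freed;	/* set once the last reference has been dropped */
};

/* Mock lock: a flag is enough to check the ordering rules in one thread. */
static int mock_lock_held;

static void mock_lock(void)   { assert(!mock_lock_held); mock_lock_held = 1; }
static void mock_unlock(void) { assert(mock_lock_held);  mock_lock_held = 0; }

/* Drop one reference. The sketch forbids calling this with the lock held. */
static void mock_tw_put(struct mock_tw *tw)
{
	assert(!mock_lock_held);
	assert(tw->refcnt > 0);
	if (--tw->refcnt == 0)
		tw->freed = 1;
}

/*
 * Unlink from the mock hash; the lock must be held by the caller.
 * Returns 1 if the caller owes one mock_tw_put() after unlocking,
 * 0 if the object was not hashed.
 */
static int mock_tw_unhash(struct mock_tw *tw)
{
	assert(mock_lock_held);
	if (!tw->hashed)
		return 0;
	tw->hashed = 0;
	return 1;
}

/*
 * Caller side, loosely modelled on the loop in __inet_hash_connect():
 * accumulate owed references under the lock, drop them afterwards.
 * Returns how many references were dropped.
 */
static int mock_reuse_timewait(struct mock_tw *tw)
{
	int twrefcnt = 1;	/* assumed: reference already held by the caller */
	int dropped = 0;

	mock_lock();
	twrefcnt += mock_tw_unhash(tw);
	mock_unlock();

	while (twrefcnt) {
		twrefcnt--;
		mock_tw_put(tw);
		dropped++;
	}
	return dropped;
}
```

With an object that starts hashed and holds two references (one for the hash link, one for the caller), mock_reuse_timewait() drops both and the object ends up freed; with an object that was never hashed, only the caller's own reference is dropped.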
* Re: [PATCH] tcp: documents timewait refcnt tricks
  2009-12-07  9:59       ` [PATCH] tcp: documents timewait refcnt tricks Eric Dumazet
@ 2009-12-07 16:06         ` Randy Dunlap
  2009-12-09  4:20           ` David Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Randy Dunlap @ 2009-12-07 16:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Evgeniy Polyakov, kapil dakhane, David S. Miller, netdev, netfilter

Eric Dumazet wrote:
> Evgeniy Polyakov wrote:
>> Hi Eric.
>>
>> On Fri, Dec 04, 2009 at 02:46:54PM +0100, Eric Dumazet (eric.dumazet@gmail.com) wrote:
>>> First patch changes __inet_hash_nolisten() and __inet6_hash()
>>> to get a timewait parameter to be able to unhash it from ehash
>>> at same time the new socket is inserted in hash.
>>>
>>> This makes sure timewait socket wont be found by a concurrent
>>> writer in __inet_check_established()
>> Both patches look good, although trick with returning reference counter
>> may look like a hack especially when only viewing into ip code and not
>> hashtable itself. Can you please cook up a documentation update for hash
>> function that it is supposed to return refcnt when socket was in hash
>> table.
>>
>
> Sure, here it is :
>
> Thanks !
>
> [PATCH] tcp: documents timewait refcnt tricks
>
> Adds kerneldoc for inet_twsk_unhash() & inet_twsk_bind_unhash().
>
> Suggested-by: Evgeniy Polyakov <zbr@ioremap.net>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>
> diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> index 1958cf5..cf719c2 100644
> --- a/net/ipv4/inet_timewait_sock.c
> +++ b/net/ipv4/inet_timewait_sock.c
> @@ -15,9 +15,13 @@
>  #include <net/ip.h>
>
>
> -/*
> - * unhash a timewait socket from established hash
> - * lock must be hold by caller
> +/**
> + * inet_twsk_unhash - unhash a timewait socket from established hash
> + * @tw: timewait socket
> + *
> + * unhash a timewait socket from established hash, if hashed.
> + * ehash lock must be hold by caller.

held

> + * Returns 1 if caller should call inet_twsk_put() after lock release.
>   */
>  int inet_twsk_unhash(struct inet_timewait_sock *tw)
>  {
> @@ -26,12 +30,21 @@ int inet_twsk_unhash(struct inet_timewait_sock *tw)
>
>  	hlist_nulls_del_rcu(&tw->tw_node);
>  	sk_nulls_node_init(&tw->tw_node);
> +	/*
> +	 * We cannot call inet_twsk_put() ourself under lock,
> +	 * caller must call it for us.
> +	 */
>  	return 1;
>  }
>
> -/*
> - * unhash a timewait socket from bind hash
> - * lock must be hold by caller
> +/**
> + * inet_twsk_bind_unhash - unhash a timewait socket from bind hash
> + * @tw: timewait socket
> + * @hashinfo: hashinfo pointer
> + *
> + * unhash a timewait socket from bind hash, if hashed.
> + * bind hash lock must be hold by caller.

held

> + * Returns 1 if caller should call inet_twsk_put() after lock release.
>   */
>  int inet_twsk_bind_unhash(struct inet_timewait_sock *tw,
>  			  struct inet_hashinfo *hashinfo)

thanks.
-- 
~Randy

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] tcp: documents timewait refcnt tricks
  2009-12-07 16:06         ` Randy Dunlap
@ 2009-12-09  4:20           ` David Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David Miller @ 2009-12-09 4:20 UTC (permalink / raw)
  To: rdunlap; +Cc: eric.dumazet, zbr, kdakhane, netdev, netfilter

From: Randy Dunlap <rdunlap@xenotime.net>
Date: Mon, 07 Dec 2009 08:06:58 -0800

> Eric Dumazet wrote:
>> Evgeniy Polyakov wrote:
>> [PATCH] tcp: documents timewait refcnt tricks
>>
>> Adds kerneldoc for inet_twsk_unhash() & inet_twsk_bind_unhash().
>>
>> Suggested-by: Evgeniy Polyakov <zbr@ioremap.net>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
 ...
>> + * unhash a timewait socket from established hash, if hashed.
>> + * ehash lock must be hold by caller.
>
> held
>
 ...
>> + * unhash a timewait socket from bind hash, if hashed.
>> + * bind hash lock must be hold by caller.
>
> held

I've applied the patch with Randy's corrections added.

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] tcp: Fix a connect() race with timewait sockets
  2009-12-04 13:46   ` [PATCH 1/2] tcp: Fix a connect() race " Eric Dumazet
  2009-12-05 21:21     ` Evgeniy Polyakov
@ 2009-12-09  4:18     ` David Miller
  1 sibling, 0 replies; 31+ messages in thread
From: David Miller @ 2009-12-09 4:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: kdakhane, netdev, netfilter, zbr

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 04 Dec 2009 14:46:54 +0100

> First patch changes __inet_hash_nolisten() and __inet6_hash()
> to get a timewait parameter to be able to unhash it from ehash
> at same time the new socket is inserted in hash.
>
> This makes sure timewait socket wont be found by a concurrent
> writer in __inet_check_established()
>
> Reported-by: kapil dakhane <kdakhane@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply [flat|nested] 31+ messages in thread
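The idea in the changelog quoted above, inserting the new socket and unlinking the timewait socket inside one critical section so that a second connect() scanning the chain cannot still find the timewait entry, might be sketched as follows. This is a minimal single-threaded illustration, not the kernel implementation: struct demo_node, struct demo_bucket and all demo_* functions are hypothetical names, the chain is a plain singly linked list rather than an RCU nulls list, and the lock is a flag used only to assert ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node in one hash chain; not a kernel structure. */
struct demo_node {
	unsigned int key;	/* stands in for the connection 4-tuple hash */
	int is_timewait;
	struct demo_node *next;
};

struct demo_bucket {
	int locked;		/* mock lock flag for a single-threaded sketch */
	struct demo_node *head;
};

static void demo_lock(struct demo_bucket *b)   { assert(!b->locked); b->locked = 1; }
static void demo_unlock(struct demo_bucket *b) { assert(b->locked);  b->locked = 0; }

/* Unlink victim from the chain; returns 1 if it was linked. Lock must be held. */
static int demo_unlink(struct demo_bucket *b, struct demo_node *victim)
{
	struct demo_node **pp;

	assert(b->locked);
	for (pp = &b->head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			victim->next = NULL;
			return 1;
		}
	}
	return 0;
}

/*
 * Insert sk and, if tw is non-NULL, remove tw in the same critical section.
 * Returns 1 if tw was unlinked, mirroring the "caller owes a put" result.
 */
static int demo_hash_and_replace(struct demo_bucket *b, struct demo_node *sk,
				 struct demo_node *tw)
{
	int unlinked = 0;

	demo_lock(b);
	sk->next = b->head;
	b->head = sk;
	if (tw)
		unlinked = demo_unlink(b, tw);
	demo_unlock(b);
	return unlinked;
}

/* Count timewait entries matching key, as a later lookup would see them. */
static int demo_count_timewait(struct demo_bucket *b, unsigned int key)
{
	struct demo_node *n;
	int count = 0;

	demo_lock(b);
	for (n = b->head; n; n = n->next)
		if (n->key == key && n->is_timewait)
			count++;
	demo_unlock(b);
	return count;
}
```

Because the lookup takes the same lock, it observes either the state before the swap (timewait entry present, new socket absent) or the state after it, never a window in which the timewait entry has been chosen for reuse but is still on the chain.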
* [PATCH 2/2] tcp: Fix a connect() race with timewait sockets
  2009-12-01 15:00 ` [PATCH] tcp: Fix a connect() race with timewait sockets Eric Dumazet
                     ` (2 preceding siblings ...)
  2009-12-04 13:46   ` [PATCH 1/2] tcp: Fix a connect() race " Eric Dumazet
@ 2009-12-04 13:47   ` Eric Dumazet
  2009-12-09  4:19     ` David Miller
  3 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2009-12-04 13:47 UTC (permalink / raw)
  To: kapil dakhane, David S. Miller; +Cc: netdev, netfilter, Evgeniy Polyakov

When we find a timewait connection in __inet_hash_connect() and reuse
it for a new connection request, we have a race window, releasing bind
list lock and reacquiring it in __inet_twsk_kill() to remove timewait
socket from list.

Another thread might find the timewait socket we already chose, leading to
list corruption and crashes.

Fix is to remove timewait socket from bind list before releasing the bind lock.

Note: This problem happens if sysctl_tcp_tw_reuse is set.

Reported-by: kapil dakhane <kdakhane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/inet_timewait_sock.h |    3 +++
 net/ipv4/inet_hashtables.c       |    2 ++
 net/ipv4/inet_timewait_sock.c    |   29 +++++++++++++++++++++--------
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index b801ade..79f67ea 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -201,6 +201,9 @@ extern void inet_twsk_put(struct inet_timewait_sock *tw);
 
 extern int inet_twsk_unhash(struct inet_timewait_sock *tw);
 
+extern int inet_twsk_bind_unhash(struct inet_timewait_sock *tw,
+				 struct inet_hashinfo *hashinfo);
+
 extern struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
 						   const int state);
 
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index c4201b7..2b79377 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -502,6 +502,8 @@ ok:
 			inet_sk(sk)->inet_sport = htons(port);
 			twrefcnt += hash(sk, tw);
 		}
+		if (tw)
+			twrefcnt += inet_twsk_bind_unhash(tw, hinfo);
 		spin_unlock(&head->lock);
 
 		if (tw) {
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 0a1c62b..1958cf5 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -29,12 +29,29 @@ int inet_twsk_unhash(struct inet_timewait_sock *tw)
 	return 1;
 }
 
+/*
+ * unhash a timewait socket from bind hash
+ * lock must be hold by caller
+ */
+int inet_twsk_bind_unhash(struct inet_timewait_sock *tw,
+			  struct inet_hashinfo *hashinfo)
+{
+	struct inet_bind_bucket *tb = tw->tw_tb;
+
+	if (!tb)
+		return 0;
+
+	__hlist_del(&tw->tw_bind_node);
+	tw->tw_tb = NULL;
+	inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb);
+	return 1;
+}
+
 /* Must be called with locally disabled BHs. */
 static void __inet_twsk_kill(struct inet_timewait_sock *tw,
 			     struct inet_hashinfo *hashinfo)
 {
 	struct inet_bind_hashbucket *bhead;
-	struct inet_bind_bucket *tb;
 	int refcnt;
 	/* Unlink from established hashes. */
 	spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash);
@@ -46,15 +63,11 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw,
 	/* Disassociate with bind bucket. */
 	bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), tw->tw_num,
 			hashinfo->bhash_size)];
+
 	spin_lock(&bhead->lock);
-	tb = tw->tw_tb;
-	if (tb) {
-		__hlist_del(&tw->tw_bind_node);
-		tw->tw_tb = NULL;
-		inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb);
-		refcnt++;
-	}
+	refcnt += inet_twsk_bind_unhash(tw, hashinfo);
 	spin_unlock(&bhead->lock);
+
 #ifdef SOCK_REFCNT_DEBUG
 	if (atomic_read(&tw->tw_refcnt) != 1) {
 		printk(KERN_DEBUG "%s timewait_sock %p refcnt=%d\n",

^ permalink raw reply related [flat|nested] 31+ messages in thread
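One detail of the patch above that is easy to miss: after this change both the connect path and __inet_twsk_kill() call inet_twsk_bind_unhash() on the same timewait socket, and the early return on a NULL tw_tb appears to be what keeps the second call from counting the bind-hash reference twice. A minimal userspace sketch of that property follows. It assumes nothing about the kernel beyond what the diff shows; struct demo_tw, struct demo_bind_bucket and the demo_* functions are hypothetical, locking is omitted, and the owners counter is a simplification of the real bucket bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bind bucket; the owners counter is a simplification. */
struct demo_bind_bucket {
	int owners;
};

/* Hypothetical timewait socket holding a pointer to its bind bucket. */
struct demo_tw {
	struct demo_bind_bucket *tb;	/* NULL once unlinked from the bind hash */
};

/*
 * Unlink from the mock bind hash (the caller is assumed to hold the
 * bind lock; locking is not modelled here).
 * Returns 1 the first time, 0 on any later call for the same object.
 */
static int demo_bind_unhash(struct demo_tw *tw)
{
	struct demo_bind_bucket *tb = tw->tb;

	if (!tb)
		return 0;

	tb->owners--;
	tw->tb = NULL;
	return 1;
}

/*
 * Connect path first, kill path second. Returns the total number of
 * bind-hash references the two paths believe they must drop.
 */
static int demo_connect_then_kill(struct demo_tw *tw)
{
	int owed_by_connect = demo_bind_unhash(tw);	/* before unlocking */
	int owed_by_kill = demo_bind_unhash(tw);	/* later, from the kill path */

	return owed_by_connect + owed_by_kill;
}
```

Whichever path reaches the socket first reports the single bind-hash reference; the other path sees tb already cleared and adds nothing, so the total owed stays at one.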
* Re: [PATCH 2/2] tcp: Fix a connect() race with timewait sockets
  2009-12-04 13:47   ` [PATCH 2/2] " Eric Dumazet
@ 2009-12-09  4:19     ` David Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David Miller @ 2009-12-09 4:19 UTC (permalink / raw)
  To: eric.dumazet; +Cc: kdakhane, netdev, netfilter, zbr

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 04 Dec 2009 14:47:42 +0100

> When we find a timewait connection in __inet_hash_connect() and reuse
> it for a new connection request, we have a race window, releasing bind
> list lock and reacquiring it in __inet_twsk_kill() to remove timewait
> socket from list.
>
> Another thread might find the timewait socket we already chose, leading to
> list corruption and crashes.
>
> Fix is to remove timewait socket from bind list before releasing the bind lock.
>
> Note: This problem happens if sysctl_tcp_tw_reuse is set.
>
> Reported-by: kapil dakhane <kdakhane@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied and queued up for -stable, thanks!

^ permalink raw reply [flat|nested] 31+ messages in thread