* [PATCH 1/2] net: dont leave active on stack LIST_HEAD
From: Eric Dumazet @ 2011-02-18 8:54 UTC
To: David Miller
Cc: torvalds, ebiederm, opurdila, mingo, mhocko, linux-mm, linux-kernel, netdev

From: Linus Torvalds <torvalds@linux-foundation.org>

Eric W. Biederman and Michal Hocko reported various memory corruptions
that we suspected to be related to a LIST head located on stack, that
was manipulated after thread left function frame (and eventually exited,
so its stack was freed and reused).

Eric Dumazet suggested the problem was probably coming from commit
443457242beb (net: factorize sync-rcu call in
unregister_netdevice_many)

This patch fixes __dev_close() and dev_close() to properly deinit their
respective LIST_HEAD(single) before exiting.

References: https://lkml.org/lkml/2011/2/16/304
References: https://lkml.org/lkml/2011/2/14/223

Reported-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Octavian Purdila <opurdila@ixiacom.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/dev.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8e726cb..a18c164 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1280,10 +1280,13 @@ static int __dev_close_many(struct list_head *head)
 
 static int __dev_close(struct net_device *dev)
 {
+	int retval;
 	LIST_HEAD(single);
 
 	list_add(&dev->unreg_list, &single);
-	return __dev_close_many(&single);
+	retval = __dev_close_many(&single);
+	list_del(&single);
+	return retval;
 }
 
 int dev_close_many(struct list_head *head)
@@ -1325,7 +1328,7 @@ int dev_close(struct net_device *dev)
 
 	list_add(&dev->unreg_list, &single);
 	dev_close_many(&single);
-
+	list_del(&single);
 	return 0;
 }
 EXPORT_SYMBOL(dev_close);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH 1/2] net: dont leave active on stack LIST_HEAD
From: David Miller @ 2011-02-18 20:14 UTC
To: eric.dumazet
Cc: torvalds, ebiederm, opurdila, mingo, mhocko, linux-mm, linux-kernel, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 18 Feb 2011 09:54:38 +0100

> From: Linus Torvalds <torvalds@linux-foundation.org>
>
> Eric W. Biederman and Michal Hocko reported various memory corruptions
> that we suspected to be related to a LIST head located on stack, that
> was manipulated after thread left function frame (and eventually exited,
> so its stack was freed and reused).
>
> Eric Dumazet suggested the problem was probably coming from commit
> 443457242beb (net: factorize
> sync-rcu call in unregister_netdevice_many)
>
> This patch fixes __dev_close() and dev_close() to properly deinit their
> respective LIST_HEAD(single) before exiting.
>
> References: https://lkml.org/lkml/2011/2/16/304
> References: https://lkml.org/lkml/2011/2/14/223
>
> Reported-by: Michal Hocko <mhocko@suse.cz>
> Reported-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

* [PATCH 2/2] net: deinit automatic LIST_HEAD
From: Eric Dumazet @ 2011-02-18 8:59 UTC
To: David Miller
Cc: torvalds, ebiederm, mingo, opurdila, linux-kernel, linux-mm, mhocko, netdev, stable

commit 9b5e383c11b08784 (net: Introduce
unregister_netdevice_many()) left an active LIST_HEAD() in
rollback_registered(), with possible memory corruption.

Even if device is freed without touching its unreg_list (and therefore
touching the previous memory location holding LIST_HEAD(single)), better
close the bug for good, since it's really subtle.

(Same fix for default_device_exit_batch() for completeness)

Reported-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Octavian Purdila <opurdila@ixiacom.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: stable <stable@kernel.org> [.33+]
---
 net/core/dev.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index a18c164..8ae6631 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5066,6 +5066,7 @@ static void rollback_registered(struct net_device *dev)
 
 	list_add(&dev->unreg_list, &single);
 	rollback_registered_many(&single);
+	list_del(&single);
 }
 
 unsigned long netdev_fix_features(unsigned long features, const char *name)
@@ -6219,6 +6220,7 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
 		}
 	}
 	unregister_netdevice_many(&dev_kill_list);
+	list_del(&dev_kill_list);
 	rtnl_unlock();
 }

* Re: [PATCH 2/2] net: deinit automatic LIST_HEAD
From: David Miller @ 2011-02-18 20:14 UTC
To: eric.dumazet
Cc: torvalds, ebiederm, mingo, opurdila, linux-kernel, linux-mm, mhocko, netdev, stable

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 18 Feb 2011 09:59:19 +0100

> commit 9b5e383c11b08784 (net: Introduce
> unregister_netdevice_many()) left an active LIST_HEAD() in
> rollback_registered(), with possible memory corruption.
>
> Even if device is freed without touching its unreg_list (and therefore
> touching the previous memory location holding LIST_HEAD(single)), better
> close the bug for good, since it's really subtle.
>
> (Same fix for default_device_exit_batch() for completeness)
>
> Reported-by: Michal Hocko <mhocko@suse.cz>
> Reported-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
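
The hazard both patches close can be modelled outside the kernel. What follows is a minimal userspace sketch, not kernel code: the list helpers are simplified stand-ins for the ones in include/linux/list.h (the real list_del() poisons the pointers rather than clearing them), and struct fake_dev, node_references(), close_without_deinit() and close_with_deinit() are hypothetical names used only for illustration. It shows that a node added to an on-stack head still points at that head when the function is about to return, and that calling list_del() on the head leaves the node linked only to itself.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, like the kernel's list_add(). */
static void list_add(struct list_head *entry, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = entry;
	entry->next = next;
	entry->prev = head;
	head->next = entry;
}

/* Unlink entry from its neighbours. The kernel poisons the pointers
 * afterwards; this sketch just clears them. */
static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = NULL;
	entry->prev = NULL;
}

/* Hypothetical stand-in for struct net_device. */
struct fake_dev {
	struct list_head unreg_list;
};

/* Nonzero if the device's list node still references head. */
static int node_references(const struct fake_dev *dev,
			   const struct list_head *head)
{
	return dev->unreg_list.next == head || dev->unreg_list.prev == head;
}

/* Pre-patch shape: dev->unreg_list stays linked to the on-stack head.
 * The check runs just before the frame is left. */
static int close_without_deinit(struct fake_dev *dev)
{
	struct list_head single;

	init_list_head(&single);
	list_add(&dev->unreg_list, &single);
	/* __dev_close_many(&single) would run here */
	return node_references(dev, &single);
}

/* Post-patch shape: list_del(&single) detaches the head before return. */
static int close_with_deinit(struct fake_dev *dev)
{
	struct list_head single;

	init_list_head(&single);
	list_add(&dev->unreg_list, &single);
	/* __dev_close_many(&single) would run here */
	list_del(&single);
	return node_references(dev, &single);
}
```

In the first variant the device keeps two pointers into a stack frame that is about to disappear, so any later list operation on dev->unreg_list would write into whatever reuses that memory. In the second variant the node ends up self-linked, which is why the patches call list_del() on the head rather than on the device entry.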

* Re: BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4
From: Eric W. Biederman @ 2011-02-18 18:08 UTC
To: Linus Torvalds
Cc: Michal Hocko, Ingo Molnar, linux-mm, LKML, David Miller, Eric Dumazet, netdev

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Feb 18, 2011 at 8:26 AM, Michal Hocko <mhocko@suse.cz> wrote:
>>> Now, I will try with the 2 patches in this thread. I will also
>>> turn on DEBUG_LIST and DEBUG_PAGEALLOC.
>>
>> I am not able to reproduce with those 2 patches applied.
>
> Thanks for verifying. Davem/EricD - you can add Michal's tested-by to
> the patches too.
>
> And I think we can consider this whole thing solved. It hopefully also
> explains all the other random crashes that EricB saw - just random
> memory corruption in other datastructures.
>
> EricB - do all your stress-testers run ok now?

Things are looking better and PAGEALLOC debug isn't firing.
So this looks like one bug down. I have not seen the bad page map
symptom.

I am still getting programs segfaulting but that is happening on other
machines running on older kernels so I am going to chalk that up to a
buggy test and a false positive.

I am having OOM problems getting my test runs to complete. On a good
day that happens about 1 time in 3 right now. I guess I will have
to turn off DEBUG_PAGEALLOC to get everything to complete.
DEBUG_PAGEALLOC causes us to use more memory doesn't it?

The most interesting thing I have right now is a networking lockdep
issue. Does anyone know what is going on there?

Eric

=================================
[ INFO: inconsistent lock state ]
2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kworker/u:1/10833 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (tcp_death_row.death_lock){+.?...}, at: [<ffffffff81460e69>] inet_twsk_deschedule+0x29/0xa0
{IN-SOFTIRQ-W} state was registered at:
  [<ffffffff810840ce>] __lock_acquire+0x70e/0x1d30
  [<ffffffff81085cff>] lock_acquire+0x9f/0x120
  [<ffffffff814deb6c>] _raw_spin_lock+0x2c/0x40
  [<ffffffff8146066b>] inet_twsk_schedule+0x3b/0x1e0
  [<ffffffff8147bf7d>] tcp_time_wait+0x20d/0x380
  [<ffffffff8146b46e>] tcp_fin.clone.39+0x10e/0x1c0
  [<ffffffff8146c628>] tcp_data_queue+0x798/0xd50
  [<ffffffff8146fdd9>] tcp_rcv_state_process+0x799/0xbb0
  [<ffffffff814786d8>] tcp_v4_do_rcv+0x238/0x500
  [<ffffffff8147a90a>] tcp_v4_rcv+0x86a/0xbe0
  [<ffffffff81455d4d>] ip_local_deliver_finish+0x10d/0x380
  [<ffffffff81456180>] ip_local_deliver+0x80/0x90
  [<ffffffff81455832>] ip_rcv_finish+0x192/0x5a0
  [<ffffffff814563c4>] ip_rcv+0x234/0x300
  [<ffffffff81420e83>] __netif_receive_skb+0x443/0x700
  [<ffffffff81421d68>] netif_receive_skb+0xb8/0xf0
  [<ffffffff81421ed8>] napi_skb_finish+0x48/0x60
  [<ffffffff81422d35>] napi_gro_receive+0xb5/0xc0
  [<ffffffffa006b4cf>] igb_poll+0x89f/0xd20 [igb]
  [<ffffffff81422279>] net_rx_action+0x149/0x270
  [<ffffffff81054bc0>] __do_softirq+0xc0/0x1f0
  [<ffffffff81003d1c>] call_softirq+0x1c/0x30
  [<ffffffff81005825>] do_softirq+0xa5/0xe0
  [<ffffffff81054dfd>] irq_exit+0x8d/0xa0
  [<ffffffff81005391>] do_IRQ+0x61/0xe0
  [<ffffffff814df793>] ret_from_intr+0x0/0x1a
  [<ffffffff810ec9ed>] ____pagevec_lru_add+0x16d/0x1a0
  [<ffffffff810ed073>] lru_add_drain+0x73/0xe0
  [<ffffffff8110a44c>] exit_mmap+0x5c/0x180
  [<ffffffff8104aad2>] mmput+0x52/0xe0
  [<ffffffff810513c0>] exit_mm+0x120/0x150
  [<ffffffff81051522>] do_exit+0x132/0x8c0
  [<ffffffff81051f39>] do_group_exit+0x59/0xd0
  [<ffffffff81051fc2>] sys_exit_group+0x12/0x20
  [<ffffffff81002d92>] system_call_fastpath+0x16/0x1b
irq event stamp: 187417
hardirqs last  enabled at (187417): [<ffffffff81127db5>] kmem_cache_free+0x125/0x160
hardirqs last disabled at (187416): [<ffffffff81127d02>] kmem_cache_free+0x72/0x160
softirqs last  enabled at (187410): [<ffffffff81411c52>] sk_common_release+0x62/0xc0
softirqs last disabled at (187408): [<ffffffff814dec41>] _raw_write_lock_bh+0x11/0x40

other info that might help us debug this:
3 locks held by kworker/u:1/10833:
 #0:  (netns){.+.+.+}, at: [<ffffffff81068be1>] process_one_work+0x121/0x4b0
 #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff81068be1>] process_one_work+0x121/0x4b0
 #2:  (net_mutex){+.+.+.}, at: [<ffffffff8141c4a0>] cleanup_net+0x80/0x1b0

stack backtrace:
Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
Call Trace:
 [<ffffffff810835b0>] ? print_usage_bug+0x170/0x180
 [<ffffffff8108393f>] ? mark_lock+0x37f/0x400
 [<ffffffff81084150>] ? __lock_acquire+0x790/0x1d30
 [<ffffffff81083d8f>] ? __lock_acquire+0x3cf/0x1d30
 [<ffffffff81124acf>] ? check_object+0xaf/0x270
 [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
 [<ffffffff81085cff>] ? lock_acquire+0x9f/0x120
 [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
 [<ffffffff81410f49>] ? __sk_free+0xd9/0x160
 [<ffffffff814deb6c>] ? _raw_spin_lock+0x2c/0x40
 [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
 [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
 [<ffffffff81460fd6>] ? inet_twsk_purge+0xf6/0x180
 [<ffffffff81460f10>] ? inet_twsk_purge+0x30/0x180
 [<ffffffff814760fc>] ? tcp_sk_exit_batch+0x1c/0x20
 [<ffffffff8141c1d3>] ? ops_exit_list.clone.0+0x53/0x60
 [<ffffffff8141c520>] ? cleanup_net+0x100/0x1b0
 [<ffffffff81068c47>] ? process_one_work+0x187/0x4b0
 [<ffffffff81068be1>] ? process_one_work+0x121/0x4b0
 [<ffffffff8141c420>] ? cleanup_net+0x0/0x1b0
 [<ffffffff8106a65c>] ? worker_thread+0x15c/0x330
 [<ffffffff8106a500>] ? worker_thread+0x0/0x330
 [<ffffffff8106f226>] ? kthread+0xb6/0xc0
 [<ffffffff8108678d>] ? trace_hardirqs_on_caller+0x13d/0x180
 [<ffffffff81003c24>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff814df854>] ? restore_args+0x0/0x30
 [<ffffffff8106f170>] ? kthread+0x0/0xc0
 [<ffffffff81003c20>] ? kernel_thread_helper+0x0/0x10

* Re: BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4
From: Linus Torvalds @ 2011-02-18 18:48 UTC
To: Eric W. Biederman
Cc: Michal Hocko, Ingo Molnar, linux-mm, LKML, David Miller, Eric Dumazet, netdev, Arnaldo Carvalho de Melo

On Fri, Feb 18, 2011 at 10:08 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> I am still getting programs segfaulting but that is happening on other
> machines running on older kernels so I am going to chalk that up to a
> buggy test and a false positive.

Ok.

> I am having OOM problems getting my test runs to complete. On a good
> day that happens about 1 time in 3 right now. I guess I will have
> to turn off DEBUG_PAGEALLOC to get everything to complete.
> DEBUG_PAGEALLOC causes us to use more memory doesn't it?

It does use a bit more memory, but it shouldn't be _that_ noticeable.
The real cost of DEBUG_PAGEALLOC is all the crazy page table
operations and TLB flushes we do for each allocation/deallocation. So
DEBUG_PAGEALLOC is very CPU-intensive, but it shouldn't have _that_
much of a memory overhead - just some trivial overhead due to not
being able to use largepages for the normal kernel identity mappings.

But there might be some other interaction with OOM that I haven't thought about.

> The most interesting thing I have right now is a networking lockdep
> issue. Does anyone know what is going on there?

This seems to be a fairly straightforward bug.

In net/ipv4/inet_timewait_sock.c we have this:

  /* These are always called from BH context.  See callers in
   * tcp_input.c to verify this.
   */

  /* This is for handling early-kills of TIME_WAIT sockets. */
  void inet_twsk_deschedule(struct inet_timewait_sock *tw,
                            struct inet_timewait_death_row *twdr)
  {
          spin_lock(&twdr->death_lock);
          ..

and the intention is clearly that that spin_lock is BH-safe because
it's called from BH context.

Except that clearly isn't true. It's called from a worker thread:

> stack backtrace:
> Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
> Call Trace:
>  [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
>  [<ffffffff81460fd6>] ? inet_twsk_purge+0xf6/0x180
>  [<ffffffff81460f10>] ? inet_twsk_purge+0x30/0x180
>  [<ffffffff814760fc>] ? tcp_sk_exit_batch+0x1c/0x20
>  [<ffffffff8141c1d3>] ? ops_exit_list.clone.0+0x53/0x60
>  [<ffffffff8141c520>] ? cleanup_net+0x100/0x1b0
>  [<ffffffff81068c47>] ? process_one_work+0x187/0x4b0
>  [<ffffffff81068be1>] ? process_one_work+0x121/0x4b0
>  [<ffffffff8141c420>] ? cleanup_net+0x0/0x1b0
>  [<ffffffff8106a65c>] ? worker_thread+0x15c/0x330

so it can deadlock with a BH happening at the same time, afaik.

The code (and comment) is all from 2005, it looks like the BH->worker
thread has broken the code. But somebody who knows that code better
should take a deeper look at it.

Added acme to the cc, since the code is attributed to him back in 2005
;). Although I don't know how active he's been in networking lately
(seems to be all perf-related). Whatever, it can't hurt.

                     Linus

* Re: BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4
From: Arnaldo Carvalho de Melo @ 2011-02-18 19:01 UTC
To: Linus Torvalds
Cc: Eric W. Biederman, Michal Hocko, Ingo Molnar, linux-mm, LKML, David Miller, Eric Dumazet, netdev

On Fri, Feb 18, 2011 at 10:48:18AM -0800, Linus Torvalds wrote:
> On Fri, Feb 18, 2011 at 10:08 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
> >
> > I am still getting programs segfaulting but that is happening on other
> > machines running on older kernels so I am going to chalk that up to a
> > buggy test and a false positive.
>
> Ok.
>
> > I am having OOM problems getting my test runs to complete. On a good
> > day that happens about 1 time in 3 right now. I guess I will have
> > to turn off DEBUG_PAGEALLOC to get everything to complete.
> > DEBUG_PAGEALLOC causes us to use more memory doesn't it?
>
> It does use a bit more memory, but it shouldn't be _that_ noticeable.
> The real cost of DEBUG_PAGEALLOC is all the crazy page table
> operations and TLB flushes we do for each allocation/deallocation. So
> DEBUG_PAGEALLOC is very CPU-intensive, but it shouldn't have _that_
> much of a memory overhead - just some trivial overhead due to not
> being able to use largepages for the normal kernel identity mappings.
>
> But there might be some other interaction with OOM that I haven't thought about.
>
> > The most interesting thing I have right now is a networking lockdep
> > issue. Does anyone know what is going on there?
>
> This seems to be a fairly straightforward bug.
>
> In net/ipv4/inet_timewait_sock.c we have this:
>
>   /* These are always called from BH context.  See callers in
>    * tcp_input.c to verify this.
>    */
>
>   /* This is for handling early-kills of TIME_WAIT sockets. */
>   void inet_twsk_deschedule(struct inet_timewait_sock *tw,
>                             struct inet_timewait_death_row *twdr)
>   {
>           spin_lock(&twdr->death_lock);
>           ..
>
> and the intention is clearly that that spin_lock is BH-safe because
> it's called from BH context.
>
> Except that clearly isn't true. It's called from a worker thread:
>
> > stack backtrace:
> > Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
> > Call Trace:
> >  [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
> >  [<ffffffff81460fd6>] ? inet_twsk_purge+0xf6/0x180
> >  [<ffffffff81460f10>] ? inet_twsk_purge+0x30/0x180
> >  [<ffffffff814760fc>] ? tcp_sk_exit_batch+0x1c/0x20
> >  [<ffffffff8141c1d3>] ? ops_exit_list.clone.0+0x53/0x60
> >  [<ffffffff8141c520>] ? cleanup_net+0x100/0x1b0
> >  [<ffffffff81068c47>] ? process_one_work+0x187/0x4b0
> >  [<ffffffff81068be1>] ? process_one_work+0x121/0x4b0
> >  [<ffffffff8141c420>] ? cleanup_net+0x0/0x1b0
> >  [<ffffffff8106a65c>] ? worker_thread+0x15c/0x330
>
> so it can deadlock with a BH happening at the same time, afaik.
>
> The code (and comment) is all from 2005, it looks like the BH->worker
> thread has broken the code. But somebody who knows that code better
> should take a deeper look at it.
>
> Added acme to the cc, since the code is attributed to him back in 2005
> ;). Although I don't know how active he's been in networking lately
> (seems to be all perf-related). Whatever, it can't hurt.

Original code is ANK's, I just made it possible to use with DCCP, and
yeah, the smiley is appropriate, something 6 years old and the world
around it changing continually... well, thanks for the git blame ;-)

- Arnaldo

* Re: BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4
From: Arnaldo Carvalho de Melo @ 2011-02-18 19:11 UTC
To: Linus Torvalds
Cc: Eric W. Biederman, Michal Hocko, Ingo Molnar, linux-mm, LKML, David Miller, Eric Dumazet, netdev, Pavel Emelyanov

On Fri, Feb 18, 2011 at 05:01:28PM -0200, Arnaldo Carvalho de Melo wrote:
> On Fri, Feb 18, 2011 at 10:48:18AM -0800, Linus Torvalds wrote:
> > This seems to be a fairly straightforward bug.
> >
> > In net/ipv4/inet_timewait_sock.c we have this:
> >
> >   /* These are always called from BH context.  See callers in
> >    * tcp_input.c to verify this.
> >    */
> >
> >   /* This is for handling early-kills of TIME_WAIT sockets. */
> >   void inet_twsk_deschedule(struct inet_timewait_sock *tw,
> >                             struct inet_timewait_death_row *twdr)
> >   {
> >           spin_lock(&twdr->death_lock);
> >           ..
> >
> > and the intention is clearly that that spin_lock is BH-safe because
> > it's called from BH context.
> >
> > Except that clearly isn't true. It's called from a worker thread:
> >
> > > stack backtrace:
> > > Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
> > > Call Trace:
> > >  [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
> > >  [<ffffffff81460fd6>] ? inet_twsk_purge+0xf6/0x180
> > >  [<ffffffff81460f10>] ? inet_twsk_purge+0x30/0x180
> > >  [<ffffffff814760fc>] ? tcp_sk_exit_batch+0x1c/0x20
> > >  [<ffffffff8141c1d3>] ? ops_exit_list.clone.0+0x53/0x60
> > >  [<ffffffff8141c520>] ? cleanup_net+0x100/0x1b0
> > >  [<ffffffff81068c47>] ? process_one_work+0x187/0x4b0
> > >  [<ffffffff81068be1>] ? process_one_work+0x121/0x4b0
> > >  [<ffffffff8141c420>] ? cleanup_net+0x0/0x1b0
> > >  [<ffffffff8106a65c>] ? worker_thread+0x15c/0x330
> >
> > so it can deadlock with a BH happening at the same time, afaik.
> >
> > The code (and comment) is all from 2005, it looks like the BH->worker
> > thread has broken the code. But somebody who knows that code better
> > should take a deeper look at it.
> >
> > Added acme to the cc, since the code is attributed to him back in 2005
> > ;). Although I don't know how active he's been in networking lately
> > (seems to be all perf-related). Whatever, it can't hurt.
>
> Original code is ANK's, I just made it possible to use with DCCP, and
> yeah, the smiley is appropriate, something 6 years old and the world
> around it changing continually... well, thanks for the git blame ;-)

But yeah, your analysis seems correct, with the bug being introduced by
one of these world around it changing continually issues, networking
namespaces broke the rules of the game on its cleanup_net() routine,
adding Pavel to the CC list since it doesn't hurt ;-)

- Arnaldo

* Re: BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4
From: Eric W. Biederman @ 2011-02-18 20:38 UTC
To: Arnaldo Carvalho de Melo
Cc: Linus Torvalds, Michal Hocko, Ingo Molnar, linux-mm, LKML, David Miller, Eric Dumazet, netdev, Pavel Emelyanov, Daniel Lezcano

Arnaldo Carvalho de Melo <acme@redhat.com> writes:

> On Fri, Feb 18, 2011 at 05:01:28PM -0200, Arnaldo Carvalho de Melo wrote:
>> On Fri, Feb 18, 2011 at 10:48:18AM -0800, Linus Torvalds wrote:
>> > This seems to be a fairly straightforward bug.
>> >
>> > In net/ipv4/inet_timewait_sock.c we have this:
>> >
>> >   /* These are always called from BH context.  See callers in
>> >    * tcp_input.c to verify this.
>> >    */
>> >
>> >   /* This is for handling early-kills of TIME_WAIT sockets. */
>> >   void inet_twsk_deschedule(struct inet_timewait_sock *tw,
>> >                             struct inet_timewait_death_row *twdr)
>> >   {
>> >           spin_lock(&twdr->death_lock);
>> >           ..
>> >
>> > and the intention is clearly that that spin_lock is BH-safe because
>> > it's called from BH context.
>> >
>> > Except that clearly isn't true. It's called from a worker thread:
>> >
>> > > stack backtrace:
>> > > Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
>> > > Call Trace:
>> > >  [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
>> > >  [<ffffffff81460fd6>] ? inet_twsk_purge+0xf6/0x180
>> > >  [<ffffffff81460f10>] ? inet_twsk_purge+0x30/0x180
>> > >  [<ffffffff814760fc>] ? tcp_sk_exit_batch+0x1c/0x20
>> > >  [<ffffffff8141c1d3>] ? ops_exit_list.clone.0+0x53/0x60
>> > >  [<ffffffff8141c520>] ? cleanup_net+0x100/0x1b0
>> > >  [<ffffffff81068c47>] ? process_one_work+0x187/0x4b0
>> > >  [<ffffffff81068be1>] ? process_one_work+0x121/0x4b0
>> > >  [<ffffffff8141c420>] ? cleanup_net+0x0/0x1b0
>> > >  [<ffffffff8106a65c>] ? worker_thread+0x15c/0x330
>> >
>> > so it can deadlock with a BH happening at the same time, afaik.
>> >
>> > The code (and comment) is all from 2005, it looks like the BH->worker
>> > thread has broken the code. But somebody who knows that code better
>> > should take a deeper look at it.
>> >
>> > Added acme to the cc, since the code is attributed to him back in 2005
>> > ;). Although I don't know how active he's been in networking lately
>> > (seems to be all perf-related). Whatever, it can't hurt.
>>
>> Original code is ANK's, I just made it possible to use with DCCP, and
>> yeah, the smiley is appropriate, something 6 years old and the world
>> around it changing continually... well, thanks for the git blame ;-)
>
> But yeah, your analysis seems correct, with the bug being introduced by
> one of these world around it changing continually issues, networking
> namespaces broke the rules of the game on its cleanup_net() routine,
> adding Pavel to the CC list since it doesn't hurt ;-)

Which probably gets the bug back around to me.

I guess this must be one of those ipv4 cases where the cleanup
simply did not exist in the rmmod sense, so we had to invent it.

I think that was Daniel who did the time wait sockets. I do remember
they were a real pain.

Would a bh_disable be sufficient? I guess I should stop remembering and
look at the code now.

Eric
* [PATCH] tcp: fix inet_twsk_deschedule()
  2011-02-18 20:38 ` Eric W. Biederman
@ 2011-02-19  8:35 ` Eric Dumazet
  2011-02-20  2:59   ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2011-02-19  8:35 UTC
  To: Eric W. Biederman, David Miller
  Cc: Arnaldo Carvalho de Melo, Linus Torvalds, Michal Hocko,
	Ingo Molnar, linux-mm, LKML, netdev, Pavel Emelyanov,
	Daniel Lezcano

On Friday, 18 February 2011 at 12:38 -0800, Eric W. Biederman wrote:
> Arnaldo Carvalho de Melo <acme@redhat.com> writes:
>
> > On Fri, Feb 18, 2011 at 05:01:28PM -0200, Arnaldo Carvalho de Melo wrote:
> >> On Fri, Feb 18, 2011 at 10:48:18AM -0800, Linus Torvalds wrote:
> >> > This seems to be a fairly straightforward bug.
> >> >
> >> > In net/ipv4/inet_timewait_sock.c we have this:
> >> >
> >> >   /* These are always called from BH context.  See callers in
> >> >    * tcp_input.c to verify this.
> >> >    */
> >> >
> >> >   /* This is for handling early-kills of TIME_WAIT sockets. */
> >> >   void inet_twsk_deschedule(struct inet_timewait_sock *tw,
> >> >                             struct inet_timewait_death_row *twdr)
> >> >   {
> >> >           spin_lock(&twdr->death_lock);
> >> >           ..
> >> >
> >> > and the intention is clearly that that spin_lock is BH-safe because
> >> > it's called from BH context.
> >> >
> >> > Except that clearly isn't true. It's called from a worker thread:
> >> >
> >> > > stack backtrace:
> >> > > Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
> >> > > Call Trace:
> >> > >  [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0
> >> > >  [<ffffffff81460fd6>] ? inet_twsk_purge+0xf6/0x180
> >> > >  [<ffffffff81460f10>] ? inet_twsk_purge+0x30/0x180
> >> > >  [<ffffffff814760fc>] ? tcp_sk_exit_batch+0x1c/0x20
> >> > >  [<ffffffff8141c1d3>] ? ops_exit_list.clone.0+0x53/0x60
> >> > >  [<ffffffff8141c520>] ? cleanup_net+0x100/0x1b0
> >> > >  [<ffffffff81068c47>] ? process_one_work+0x187/0x4b0
> >> > >  [<ffffffff81068be1>] ? process_one_work+0x121/0x4b0
> >> > >  [<ffffffff8141c420>] ? cleanup_net+0x0/0x1b0
> >> > >  [<ffffffff8106a65c>] ? worker_thread+0x15c/0x330
> >> >
> >> > so it can deadlock with a BH happening at the same time, afaik.
> >> >
> >> > The code (and comment) is all from 2005, it looks like the BH->worker
> >> > thread has broken the code. But somebody who knows that code better
> >> > should take a deeper look at it.
> >> >
> >> > Added acme to the cc, since the code is attributed to him back in 2005
> >> > ;). Although I don't know how active he's been in networking lately
> >> > (seems to be all perf-related). Whatever, it can't hurt.
> >>
> >> Original code is ANK's, I just made it possible to use with DCCP, and
> >> yeah, the smiley is appropriate, something 6 years old and the world
> >> around it changing continually... well, thanks for the git blame ;-)
> >
> > But yeah, your analysis seems correct, with the bug being introduced by
> > one of these world around it changing continually issues, networking
> > namespaces broke the rules of the game on its cleanup_net() routine,
> > adding Pavel to the CC list since it doesn't hurt ;-)
>
> Which probably gets the bug back around to me.
>
> I guess this must be one of those ipv4 cases where the cleanup
> simply did not exist in the rmmod sense that we had to invent.
>
> I think that was Daniel who did the time wait sockets.  I do remember
> they were a real pain.
>
> Would a bh_disable be sufficient?  I guess I should stop remembering and
> look at the code now.
>

Here is the patch to fix the problem.

Daniel's commit d315492b1a6ba29d (netns : fix kernel panic in timewait
socket destruction) was OK (it did use local_bh_disable()).

The problem comes from commit 575f4cd5a5b6394577 (net: Use rcu lookups in
inet_twsk_purge.), added in 2.6.33.

Thanks !

[PATCH] tcp: fix inet_twsk_deschedule()

Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule()

This is caused by inet_twsk_purge(), run from process context,
and commit 575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.)
removed the BH disabling that was necessary.

Add the BH disabling but fine grained, right before calling
inet_twsk_deschedule(), instead of whole function.

With help from Linus Torvalds and Eric W. Biederman

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Lezcano <daniel.lezcano@free.fr>
CC: Pavel Emelyanov <xemul@openvz.org>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: stable <stable@kernel.org> (# 2.6.33+)
---
 net/ipv4/inet_timewait_sock.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index c5af909..3c8dfa1 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -505,7 +505,9 @@ restart:
 			}
 
 			rcu_read_unlock();
+			local_bh_disable();
 			inet_twsk_deschedule(tw, twdr);
+			local_bh_enable();
 			inet_twsk_put(tw);
 			goto restart_rcu;
 		}

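For readers following the reasoning in the patch above, here is a minimal userspace sketch of why a plain spin_lock() taken in process context can deadlock against a softirq on the same CPU, and why wrapping the call in local_bh_disable()/local_bh_enable() avoids it. This is a toy single-CPU model, not kernel code: every model_* function, run_purge(), and the state variables are hypothetical names invented for illustration, and the "deadlock" is only detected and flagged rather than actually spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; none of this is the kernel's real implementation. */
static int  bh_disable_depth;   /* > 0 means softirqs (BHs) are masked */
static bool death_lock_held;    /* stands in for twdr->death_lock */
static bool softirq_pending;    /* a softirq that needs death_lock wants to run */
static bool deadlock_detected;

/* Softirq handler: it needs the same lock as the process-context path. */
static void model_softirq(void)
{
	if (death_lock_held) {
		/* On a real CPU the softirq would spin forever here, because
		 * the lock owner is the very context it interrupted. */
		deadlock_detected = true;
		return;
	}
	death_lock_held = true;   /* take the lock, do the work ... */
	death_lock_held = false;  /* ... and release it */
}

/* A point at which a pending softirq may interrupt the current context. */
static void model_interrupt_point(void)
{
	if (softirq_pending && bh_disable_depth == 0) {
		softirq_pending = false;
		model_softirq();
	}
}

static void model_local_bh_disable(void)
{
	bh_disable_depth++;
}

static void model_local_bh_enable(void)
{
	assert(bh_disable_depth > 0);
	bh_disable_depth--;
	model_interrupt_point();  /* deferred softirqs run once BHs are back on */
}

/* Stands in for inet_twsk_deschedule(): plain lock, assumes no BH can run. */
static void model_deschedule(void)
{
	death_lock_held = true;
	model_interrupt_point();  /* a softirq arrives while the lock is held */
	death_lock_held = false;
}

/* Process-context caller, like inet_twsk_purge(). Returns true on deadlock. */
static bool run_purge(bool disable_bh)
{
	bh_disable_depth = 0;
	death_lock_held = false;
	deadlock_detected = false;
	softirq_pending = true;

	if (disable_bh)
		model_local_bh_disable();
	model_deschedule();
	if (disable_bh)
		model_local_bh_enable();

	return deadlock_detected;
}
```

In this model, run_purge(false) flags a deadlock because the softirq fires while the lock is held, whereas run_purge(true) defers the softirq until model_local_bh_enable() and completes cleanly. That matches the fine-grained placement chosen in the patch: BHs are masked only around the one call that takes the lock.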
* Re: [PATCH] tcp: fix inet_twsk_deschedule()
  2011-02-19  8:35 ` [PATCH] tcp: fix inet_twsk_deschedule() Eric Dumazet
@ 2011-02-20  2:59 ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2011-02-20  2:59 UTC
  To: eric.dumazet
  Cc: ebiederm, acme, torvalds, mhocko, mingo, linux-mm, linux-kernel,
	netdev, xemul, daniel.lezcano

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 19 Feb 2011 09:35:56 +0100

> [PATCH] tcp: fix inet_twsk_deschedule()
>
> Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule()
>
> This is caused by inet_twsk_purge(), run from process context,
> and commit 575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.)
> removed the BH disabling that was necessary.
>
> Add the BH disabling but fine grained, right before calling
> inet_twsk_deschedule(), instead of whole function.
>
> With help from Linus Torvalds and Eric W. Biederman
>
> Reported-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Daniel Lezcano <daniel.lezcano@free.fr>
> CC: Pavel Emelyanov <xemul@openvz.org>
> CC: Arnaldo Carvalho de Melo <acme@redhat.com>
> CC: stable <stable@kernel.org> (# 2.6.33+)

Applied, thanks Eric.

* Re: BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4
  2011-02-18 19:01 ` Arnaldo Carvalho de Melo
  2011-02-18 19:11   ` Arnaldo Carvalho de Melo
@ 2011-02-18 19:13 ` Eric Dumazet
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2011-02-18 19:13 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Linus Torvalds, Eric W. Biederman, Michal Hocko, Ingo Molnar,
	linux-mm, LKML, David Miller, netdev

On Friday, 18 February 2011 at 17:01 -0200, Arnaldo Carvalho de Melo wrote:

> Original code is ANK's, I just made it possible to use with DCCP, and
> yeah, the smiley is appropriate, something 6 years old and the world
> around it changing continually... well, thanks for the git blame ;-)
>

At that time, net namespaces did not exist.

I would blame people implementing them ;)

end of thread, other threads:[~2011-02-20 2:59 UTC | newest]
Thread overview: 12+ messages
[not found] <20110216185234.GA11636@tiehlicka.suse.cz>
[not found] ` <20110216193700.GA6377@elte.hu>
[not found] ` <AANLkTinDxxbVjrUViCs=UaMD9Wg9PR7b0ShNud5zKE3w@mail.gmail.com>
[not found] ` <AANLkTi=xnbcs5BKj3cNE_aLtBO7W5m+2uaUacu7M8g_S@mail.gmail.com>
[not found] ` <20110217090910.GA3781@tiehlicka.suse.cz>
[not found] ` <AANLkTikPKpNHxDQAYBd3fiQsmVozLtCVDsNn=+eF_q2r@mail.gmail.com>
[not found] ` <20110217163531.GF14168@elte.hu>
[not found] ` <m1pqqqfpzh.fsf@fess.ebiederm.org>
[not found] ` <AANLkTinB=EgDGNv-v-qD-MvHVAmstfP_CyyLNhhotkZx@mail.gmail.com>
[not found] ` <m1sjvm822m.fsf@fess.ebiederm.org>
[not found] ` <AANLkTimzP0UNRXutkt1zJ+OGhmeg6ga87HFyMuZQmpMj@mail.gmail.com>
[not found] ` <20110217.203647.193696765.davem@davemloft.net>
[not found] ` <1298010320.2642.7.camel@edumazet-laptop>
[not found] ` <1298014191.2642.11.camel@edumazet-laptop>
2011-02-18 8:54 ` [PATCH 1/2] net: dont leave active on stack LIST_HEAD Eric Dumazet
2011-02-18 20:14 ` David Miller
[not found] ` <m17hcx7wca.fsf@fess.ebiederm.org>
2011-02-18 8:59 ` [PATCH 2/2] net: deinit automatic LIST_HEAD Eric Dumazet
2011-02-18 20:14 ` David Miller
[not found] ` <20110218122938.GB26779@tiehlicka.suse.cz>
[not found] ` <20110218162623.GD4862@tiehlicka.suse.cz>
[not found] ` <AANLkTimO=M5xG_mnDBSxPKwSOTrp3JhHVBa8=wHsiVHY@mail.gmail.com>
2011-02-18 18:08 ` BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4 Eric W. Biederman
2011-02-18 18:48 ` Linus Torvalds
2011-02-18 19:01 ` Arnaldo Carvalho de Melo
2011-02-18 19:11 ` Arnaldo Carvalho de Melo
2011-02-18 20:38 ` Eric W. Biederman
2011-02-19 8:35 ` [PATCH] tcp: fix inet_twsk_deschedule() Eric Dumazet
2011-02-20 2:59 ` David Miller
2011-02-18 19:13 ` BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4 Eric Dumazet