echo 3 > /proc/.../drop_caches goes mad with 3.1-rc6, maybe fsnotify related

* echo 3 > /proc/.../drop_caches goes mad with 3.1-rc6, maybe fsnotify related
@ 2011-09-15 20:05 Tino Keitel
  2011-09-15 21:42 ` Hugh Dickins
  0 siblings, 1 reply; 3+ messages in thread
From: Tino Keitel @ 2011-09-15 20:05 UTC
  To: linux-kernel

Hi,

"echo 3 > /proc/sys/vm/drop_caches" never returns here, and the kernel
log shows the entries below. Disk access becomes partly unusable, and
I have to reboot.

I currently use 3.1-rc6, but it also happened with older 3.1-rc
kernels.

Since fsnotify shows up in the trace: I always have an inotifywait
process running, which triggers a mail queue run whenever something
changes in my mail queue directory.

INFO: rcu_sched_state detected stall on CPU 1 (t=18000 jiffies)
INFO: rcu_sched_state detected stall on CPU 1 (t=72030 jiffies)
INFO: rcu_sched_state detected stall on CPU 1 (t=126060 jiffies)
INFO: rcu_sched_state detected stall on CPU 1 (t=180090 jiffies)
INFO: rcu_sched_state detected stall on CPU 1 (t=234120 jiffies)
INFO: rcu_sched_state detected stall on CPU 1 (t=288150 jiffies)
INFO: task fsnotify_mark:491 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsnotify_mark   D ffff88021fb10700     0   491      2 0x00000000
 ffff88021eac20d0 0000000000000046 ffff880200000000 ffff88021e8be0d0
 ffff880216497fd8 ffff880216497fd8 ffff880216497fd8 ffff88021eac20d0
 ffff880216497e4c 0000000181037707 0000000200000086 ffffffff819577b0
Call Trace:
 [<ffffffff814de368>] ? __mutex_lock_slowpath+0xc8/0x140
 [<ffffffff8108ae20>] ? synchronize_rcu_bh+0x60/0x60
 [<ffffffff814de013>] ? mutex_lock+0x23/0x40
 [<ffffffff8106468c>] ? __synchronize_srcu+0x2c/0xc0
 [<ffffffff81103583>] ? fsnotify_mark_destroy+0x83/0x160
 [<ffffffff8105fca0>] ? add_wait_queue+0x60/0x60
 [<ffffffff81103500>] ? fsnotify_put_mark+0x20/0x20
 [<ffffffff8105f53e>] ? kthread+0x7e/0x90
 [<ffffffff814e0b74>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff8105f4c0>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff814e0b70>] ? gs_change+0xb/0xb
INFO: task inotifywait:25496 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
inotifywait     D ffff88021fa10700     0 25496   2060 0x00000000
 ffff88006ef46650 0000000000000046 ffff880200000000 ffffffff81826020
 ffff88011355bfd8 ffff88011355bfd8 ffff88011355bfd8 ffff88006ef46650
 000000000c800000 0000000100000000 0000000000000002 ffff88011355bd88
Call Trace:
 [<ffffffff814ddc55>] ? schedule_timeout+0x1c5/0x240
 [<ffffffff814d89dd>] ? cache_alloc_refill+0x84/0x4c5
 [<ffffffff8124e997>] ? idr_remove+0x127/0x1c0
 [<ffffffff814dd61b>] ? wait_for_common+0xcb/0x160
 [<ffffffff8103ef00>] ? try_to_wake_up+0x270/0x270
 [<ffffffff8108ae20>] ? synchronize_rcu_bh+0x60/0x60
 [<ffffffff8108ae6d>] ? synchronize_sched+0x4d/0x60
 [<ffffffff8105ca60>] ? find_ge_pid+0x40/0x40
 [<ffffffff810646c3>] ? __synchronize_srcu+0x63/0xc0
 [<ffffffff81102e41>] ? fsnotify_put_group+0x21/0x40
 [<ffffffff81104838>] ? inotify_release+0x18/0x20
 [<ffffffff810d096a>] ? fput+0xea/0x240
 [<ffffffff810cd1ef>] ? filp_close+0x5f/0x90
 [<ffffffff81047116>] ? put_files_struct+0x76/0xe0

Regards,
Tino

* Re: echo 3 > /proc/.../drop_caches goes mad with 3.1-rc6, maybe fsnotify related
  2011-09-15 20:05 echo 3 > /proc/.../drop_caches goes mad with 3.1-rc6, maybe fsnotify related Tino Keitel
@ 2011-09-15 21:42 ` Hugh Dickins
  2011-09-16 18:19   ` Tino Keitel
  0 siblings, 1 reply; 3+ messages in thread
From: Hugh Dickins @ 2011-09-15 21:42 UTC
  To: Tino Keitel; +Cc: Shaohua Li, linux-kernel

On Thu, 15 Sep 2011, Tino Keitel wrote:
> 
> "echo 3 > /proc/sys/vm/drop_caches" never returns here, and the kernel
> log shows the entries below. Disk access becomes partly unusable, and
> I have to reboot.
> 
> I currently use 3.1-rc6, but it also happened with older 3.1-rc
> kernels.
> 
> Since fsnotify shows up in the trace: I always have an inotifywait
> process running, which triggers a mail queue run whenever something
> changes in my mail queue directory.
> 
> INFO: rcu_sched_state detected stall on CPU 1 (t=18000 jiffies)
> INFO: rcu_sched_state detected stall on CPU 1 (t=72030 jiffies)
> INFO: rcu_sched_state detected stall on CPU 1 (t=126060 jiffies)
> INFO: rcu_sched_state detected stall on CPU 1 (t=180090 jiffies)
> INFO: rcu_sched_state detected stall on CPU 1 (t=234120 jiffies)
> INFO: rcu_sched_state detected stall on CPU 1 (t=288150 jiffies)
> INFO: task fsnotify_mark:491 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> fsnotify_mark   D ffff88021fb10700     0   491      2 0x00000000
>  ffff88021eac20d0 0000000000000046 ffff880200000000 ffff88021e8be0d0
>  ffff880216497fd8 ffff880216497fd8 ffff880216497fd8 ffff88021eac20d0
>  ffff880216497e4c 0000000181037707 0000000200000086 ffffffff819577b0
> Call Trace:
>  [<ffffffff814de368>] ? __mutex_lock_slowpath+0xc8/0x140
>  [<ffffffff8108ae20>] ? synchronize_rcu_bh+0x60/0x60
>  [<ffffffff814de013>] ? mutex_lock+0x23/0x40
>  [<ffffffff8106468c>] ? __synchronize_srcu+0x2c/0xc0
>  [<ffffffff81103583>] ? fsnotify_mark_destroy+0x83/0x160
>  [<ffffffff8105fca0>] ? add_wait_queue+0x60/0x60
>  [<ffffffff81103500>] ? fsnotify_put_mark+0x20/0x20
>  [<ffffffff8105f53e>] ? kthread+0x7e/0x90
>  [<ffffffff814e0b74>] ? kernel_thread_helper+0x4/0x10
>  [<ffffffff8105f4c0>] ? kthread_worker_fn+0x180/0x180
>  [<ffffffff814e0b70>] ? gs_change+0xb/0xb
> INFO: task inotifywait:25496 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> inotifywait     D ffff88021fa10700     0 25496   2060 0x00000000
>  ffff88006ef46650 0000000000000046 ffff880200000000 ffffffff81826020
>  ffff88011355bfd8 ffff88011355bfd8 ffff88011355bfd8 ffff88006ef46650
>  000000000c800000 0000000100000000 0000000000000002 ffff88011355bd88
> Call Trace:
>  [<ffffffff814ddc55>] ? schedule_timeout+0x1c5/0x240
>  [<ffffffff814d89dd>] ? cache_alloc_refill+0x84/0x4c5
>  [<ffffffff8124e997>] ? idr_remove+0x127/0x1c0
>  [<ffffffff814dd61b>] ? wait_for_common+0xcb/0x160
>  [<ffffffff8103ef00>] ? try_to_wake_up+0x270/0x270
>  [<ffffffff8108ae20>] ? synchronize_rcu_bh+0x60/0x60
>  [<ffffffff8108ae6d>] ? synchronize_sched+0x4d/0x60
>  [<ffffffff8105ca60>] ? find_ge_pid+0x40/0x40
>  [<ffffffff810646c3>] ? __synchronize_srcu+0x63/0xc0
>  [<ffffffff81102e41>] ? fsnotify_put_group+0x21/0x40
>  [<ffffffff81104838>] ? inotify_release+0x18/0x20
>  [<ffffffff810d096a>] ? fput+0xea/0x240
>  [<ffffffff810cd1ef>] ? filp_close+0x5f/0x90
>  [<ffffffff81047116>] ? put_files_struct+0x76/0xe0

Although these stacktraces don't implicate find_get_pages() at all,
please try Shaohua's fix below (see thread: [BUG] infinite loop in
find_get_pages()), which Linus put in his tree yesterday.

Hugh

Subject: mm: account skipped entries to avoid looping in find_get_pages

The entries found by find_get_pages() could all be swap entries. In
that case we skip them, but we must account for the skipped entries so
that we don't keep looping.
Use nr_found > nr_skip to simplify the code, as suggested by Eric.

Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>

diff --git a/mm/filemap.c b/mm/filemap.c
index 645a080..7771871 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -827,13 +827,14 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
 {
 	unsigned int i;
 	unsigned int ret;
-	unsigned int nr_found;
+	unsigned int nr_found, nr_skip;
 
 	rcu_read_lock();
 restart:
 	nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
 				(void ***)pages, NULL, start, nr_pages);
 	ret = 0;
+	nr_skip = 0;
 	for (i = 0; i < nr_found; i++) {
 		struct page *page;
 repeat:
@@ -856,6 +857,7 @@ repeat:
 			 * here as an exceptional entry: so skip over it -
 			 * we only reach this from invalidate_mapping_pages().
 			 */
+			nr_skip++;
 			continue;
 		}
 
@@ -876,7 +878,7 @@ repeat:
 	 * If all entries were removed before we could secure them,
 	 * try again, because callers stop trying once 0 is returned.
 	 */
-	if (unlikely(!ret && nr_found))
+	if (unlikely(!ret && nr_found > nr_skip))
 		goto restart;
 	rcu_read_unlock();
 	return ret;
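
Illustration (not part of the patch): the restart condition changed by
the hunk above can be modelled in userspace. This is only a rough
sketch under stated assumptions. The slot_kind enum, the helper names,
and the max_passes cap are all hypothetical and exist only for this
example; the real function walks a radix tree under RCU, and pages can
also vanish concurrently, which this model does not represent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a looked-up slot: a real page or an
 * exceptional (swap) entry. Not the kernel's data structure. */
enum slot_kind { SLOT_PAGE, SLOT_EXCEPTIONAL };

/* Restart rule before the patch: retry whenever entries were found
 * but none were returned. */
static bool must_restart_old(unsigned ret, unsigned nr_found)
{
	return !ret && nr_found;
}

/* Restart rule after the patch: retry only if some found entry was
 * neither returned nor deliberately skipped. */
static bool must_restart_new(unsigned ret, unsigned nr_found,
			     unsigned nr_skip)
{
	return !ret && nr_found > nr_skip;
}

/* Count how many lookup passes the loop makes over a fixed set of
 * slots. max_passes is an artificial cap (an assumption of this
 * sketch) so that the pre-patch behaviour terminates here. */
static unsigned count_passes(const enum slot_kind *slots,
			     unsigned nr_found, bool use_fix,
			     unsigned max_passes)
{
	unsigned passes = 0;

	assert(max_passes > 0);
	for (;;) {
		unsigned i, ret = 0, nr_skip = 0;
		bool again;

		passes++;
		for (i = 0; i < nr_found; i++) {
			if (slots[i] == SLOT_EXCEPTIONAL)
				nr_skip++;	/* skipped, but accounted */
			else
				ret++;		/* would be returned */
		}
		again = use_fix ? must_restart_new(ret, nr_found, nr_skip)
				: must_restart_old(ret, nr_found);
		if (!again || passes >= max_passes)
			return passes;
	}
}
```

With a lookup result made up only of exceptional entries, the old rule
keeps restarting until the artificial cap is reached, while the new
rule stops after a single pass. A result with at least one real page
ends after one pass under either rule.
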

* Re: echo 3 > /proc/.../drop_caches goes mad with 3.1-rc6, maybe fsnotify related
  2011-09-15 21:42 ` Hugh Dickins
@ 2011-09-16 18:19   ` Tino Keitel
  0 siblings, 0 replies; 3+ messages in thread
From: Tino Keitel @ 2011-09-16 18:19 UTC
  To: Hugh Dickins; +Cc: Shaohua Li, linux-kernel

On Thu, Sep 15, 2011 at 14:42:05 -0700, Hugh Dickins wrote:

[...]

> Although these stacktraces don't implicate find_get_pages() at all,
> please try Shaohua's fix below (see thread: [BUG] infinite loop in
> find_get_pages()), which Linus put in his tree yesterday.

Hi,

thanks, I upgraded to git master
c455ea4f122d21c91fcf4c36c3f0c08535ba3ce8 and the problem is gone.

Regards,
Tino

