linux-fsdevel.vger.kernel.org archive mirror
* Java Stop-the-World GC stall induced by FS flush or many large file deletions
@ 2013-09-12  4:17 Cuong Tran
From: Cuong Tran @ 2013-09-12  4:17 UTC
  To: linux-ext4, linux-fsdevel

We have seen GC stalls that are NOT due to memory usage of applications.

The GC log reports the user and system CPU time of the GC threads, which is
almost 0, and the stop-the-world time, which can be multiple seconds. This
suggests the GC threads are waiting for I/O, even though GC threads should be
CPU-bound in user mode.
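
A minimal sketch of how such pauses could be picked out of a GC log automatically. It assumes HotSpot's -XX:+PrintGCDetails timing suffix ("[Times: user=... sys=..., real=... secs]"). The function name and both thresholds are hypothetical:

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed format: the suffix HotSpot prints with -XX:+PrintGCDetails, e.g.
#   [Times: user=0.00 sys=0.00, real=1.52 secs]
TIMES_RE = re.compile(
    r"\[Times: user=(\d+\.\d+) sys=(\d+\.\d+), real=(\d+\.\d+) secs\]"
)


def suspicious_pauses(
    lines: Iterable[str], min_real: float = 1.0, max_cpu_ratio: float = 0.1
) -> Iterator[Tuple[float, float, float]]:
    """Yield (user, sys, real) for GC events whose wall-clock time is at least
    min_real seconds while user+sys CPU time stays below max_cpu_ratio * real.
    Parallel GC threads normally burn CPU time comparable to or larger than
    real, so a near-zero ratio hints that they were waiting, not collecting."""
    for line in lines:
        m = TIMES_RE.search(line)
        if m is None:
            continue
        user, sys_t, real = (float(g) for g in m.groups())
        if real >= min_real and user + sys_t <= max_cpu_ratio * real:
            yield user, sys_t, real
```

A normal parallel collection shows user+sys well above real, so only the "waiting" pauses are reported.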

We could reproduce the problem using a simple Java program that just
appends to a log file via log4j. If the test runs by itself, it
does not incur any GC stalls. However, if we concurrently run a script that
loops, creating multiple large files via fallocate() and then deleting
them, GC stalls of 1+ seconds happen fairly predictably.
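
A minimal sketch of one round of that background load, using Python's os.posix_fallocate() wrapper. The function name, file naming scheme and sizes are hypothetical, not the script actually used:

```python
import os


def churn_large_files(directory: str, count: int, size_bytes: int) -> int:
    """Preallocate `count` files of `size_bytes` each, then delete them all.
    Returns the number of bytes that were allocated and then freed."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"churn-{i}.dat")  # hypothetical naming
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
        try:
            # On file systems with fallocate support this reserves extents
            # without writing data; glibc falls back to writing zeros otherwise.
            os.posix_fallocate(fd, 0, size_bytes)
        finally:
            os.close(fd)
        paths.append(path)
    for path in paths:
        os.unlink(path)  # deleting large files frees many extents at once
    return count * size_bytes
```

Calling it in a `while True:` loop against the disk that holds the log4j output approximates the reproducer. An example size is `2 << 30` bytes (2 GiB) per file.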

We can also reproduce the problem by periodically rotating the log and
gzipping the older log. When this happens, the I/O device, a single disk
drive, is overloaded by FS flush.
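
A minimal sketch of that rotate-and-compress step. It assumes the writer reopens the original path after the rename; the function name and the `.N.gz` naming are hypothetical:

```python
import gzip
import os
import shutil


def rotate_and_compress(log_path: str, generation: int) -> str:
    """Rename the live log aside and gzip the rotated copy.
    Returns the path of the compressed file."""
    rotated = f"{log_path}.{generation}"
    os.rename(log_path, rotated)  # assumes the writer reopens log_path afterwards
    compressed = rotated + ".gz"
    with open(rotated, "rb") as src, gzip.open(compressed, "wb") as dst:
        # Reading the old log and writing the .gz dirties a burst of pages
        # that writeback later has to push to the single disk.
        shutil.copyfileobj(src, dst)
    os.unlink(rotated)  # drop the uncompressed copy
    return compressed
```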

Our guess is that GC has to quiesce its threads, and if one of the threads
is stuck in the kernel (say, in uninterruptible sleep), GC has to wait
until that thread unblocks. In the meantime, it has already stopped
the world.
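
The hypothesis above can be illustrated with a toy model. This is not how HotSpot implements safepoints. A coordinator requests a stop, and the pause lasts until every worker has parked. One worker stuck in a blocking call (stood in for here by time.sleep()) therefore stretches the pause for everyone. The function name is hypothetical:

```python
import threading
import time


def simulate_safepoint(block_seconds: float, n_workers: int = 4) -> float:
    """Toy model of a stop-the-world request: return how long the coordinator
    waits until every worker has parked, when worker 0 is stuck in a blocking
    call for block_seconds before it can notice the request."""
    stop_requested = threading.Event()
    restart = threading.Event()
    parked = threading.Barrier(n_workers + 1)  # all workers plus the coordinator

    def worker(idx: int) -> None:
        if idx == 0:
            # Stand-in for a thread blocked in the kernel (e.g. uninterruptible
            # sleep on I/O): it cannot poll for the stop request meanwhile.
            time.sleep(block_seconds)
        while not stop_requested.is_set():
            time.sleep(0.001)  # "mutator work" with frequent safepoint polls
        parked.wait()   # reached the safepoint
        restart.wait()  # stay parked until the world is restarted

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    start = time.monotonic()
    stop_requested.set()   # begin stopping the world
    parked.wait()          # returns only once every worker has parked
    pause = time.monotonic() - start
    restart.set()          # resume the world
    for t in threads:
        t.join()
    return pause
```

`simulate_safepoint(1.0)` returns roughly one second, even though the three unblocked workers park within milliseconds.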

Another test that shows a similar problem does deferred (buffered) writes
that append to a file. Deferred writes are usually very fast, but once in
a while a single write can take more than 1 second.
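
A minimal sketch for timing each deferred append individually, so that the rare slow write is visible rather than averaged away. The function names and the 1-second default threshold are hypothetical:

```python
import os
import time
from typing import List, Sequence


def append_latencies(path: str, record: bytes, count: int) -> List[float]:
    """Append `record` to `path` `count` times with plain buffered writes
    (no fsync) and return the wall-clock latency of each write() in seconds."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for _ in range(count):
            t0 = time.perf_counter()
            os.write(fd, record)  # normally just copies into the page cache
            latencies.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
    return latencies


def outliers(latencies: Sequence[float], threshold_s: float = 1.0) -> List[int]:
    """Indices of writes slower than threshold_s (the rare multi-second stalls)."""
    return [i for i, lat in enumerate(latencies) if lat > threshold_s]
```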

We would really appreciate it if you could shed some light on possible
causes (threads blocked by a journal checkpoint? delayed allocation
unable to proceed?). We could alleviate the problem by tuning
dirty_expire_centisecs and dirty_writeback_centisecs to flush more
frequently, and thus even out the workload on the disk drive. But we
would like to know if there is a methodology to model the flush rate
against the rate of changes and the I/O throughput of the drive (SAS,
15K RPM).
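
A back-of-envelope sketch that could serve as a starting point for such a model. It is a simplification, not the kernel's writeback algorithm. It ignores the dirty_background_* and dirty_ratio thresholds and journal and metadata I/O. It also ignores the large gap between sequential and random throughput on a 15K RPM drive. The class name and example numbers are hypothetical. The linear formulas follow from the assumption that data stays dirty for up to E (expiry) plus W (the wait for the next flusher wake-up):

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class WritebackModel:
    dirty_rate_mb_s: float       # R: MB/s the workload dirties in the page cache
    disk_throughput_mb_s: float  # T: MB/s the drive sustains for this write pattern
    writeback_centisecs: int     # W: vm.dirty_writeback_centisecs (flusher wake-up period)
    expire_centisecs: int        # E: vm.dirty_expire_centisecs (age that forces writeback)

    def utilization(self) -> float:
        """Long-run fraction of drive time spent on writeback; >= 1 means the
        dirty backlog keeps growing until writers get throttled."""
        return self.dirty_rate_mb_s / self.disk_throughput_mb_s

    def max_batch_mb(self) -> float:
        """Upper estimate of one age-triggered batch: R * (E + W)."""
        return self.dirty_rate_mb_s * (self.expire_centisecs + self.writeback_centisecs) / 100.0

    def max_batch_seconds(self) -> float:
        """How long such a batch keeps the drive saturated."""
        return self.max_batch_mb() / self.disk_throughput_mb_s

    def burst_seconds(self, burst_mb: float) -> float:
        """Drive time needed to write a burst (e.g. a gzipped log) dirtied at once."""
        return burst_mb / self.disk_throughput_mb_s

    def burst_flush_window(self) -> Tuple[float, float]:
        """Seconds after a burst when age-based writeback picks it up: [E, E + W]."""
        return (self.expire_centisecs / 100.0,
                (self.expire_centisecs + self.writeback_centisecs) / 100.0)
```

With the common defaults of 500 and 3000 centisecs, `WritebackModel(10, 100, 500, 3000).max_batch_seconds()` gives 3.5 seconds of saturated drive time. `WritebackModel(10, 100, 100, 500)` cuts that to 0.6 seconds. This is consistent with the observation that flushing more often evens out the load.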

Many thanks.

Thread overview: 14+ messages
2013-09-12  4:17 Java Stop-the-World GC stall induced by FS flush or many large file deletions Cuong Tran
2013-09-12  5:32 ` Sidorov, Andrei
2013-09-12  5:45   ` Cuong Tran
2013-09-12  6:01     ` Sidorov, Andrei
2013-09-12  6:02     ` Sidorov, Andrei
2013-09-12  6:08       ` Cuong Tran
2013-09-12 15:47         ` Sidorov, Andrei
2013-09-12 16:58           ` Cuong Tran
2013-09-12  7:46 ` Zheng Liu
2013-09-12 11:46   ` Cuong Tran
2013-09-12 19:02 ` Theodore Ts'o
2013-09-13 15:25   ` Cuong Tran
2013-09-13 18:36     ` Theodore Ts'o
2013-09-14 18:47       ` Cuong Tran
