* How can we report being OOM killed?
From: Erwan Velu @ 2013-08-01  9:57 UTC
  To: fio@vger.kernel.org

Hi,

I'm currently facing a weird issue with fio 2.0.8.

I'm running the following job, which is supposed to write for at least
300 seconds. Instead, it exits almost immediately. After a short search,
I saw that it was OOM killed:
[1732289.080181] Killed process 3175 (fio) total-vm:225292kB, anon-rss:131440kB, file-rss:0kB

My first thought was that fio had done something wrong, when in fact it
had simply been killed. Is there any way to report that we got killed?
It would be very valuable to know that fio was stopped early and that
the results are incomplete.

cheers,

[global]
ioengine=libaio
invalidate=1
ramp_time=5
iodepth=32
runtime=300
time_based
direct=1

[write-vdb-4m-para]
bs=4m
stonewall
filename=/dev/vdb
rw=write
write_bw_log=vm1-1-4m-vdb-write-para.results
write_iops_log=vm1-1-4m-vdb-write-para.results


I have run this job routinely, but recently it has started behaving like this:

[root@host] fio vm1-1-4m-parallel-write-vdb.fio
write-vdb-4m-para: (g=0): rw=write, bs=4M-4M/4M-4M, ioengine=libaio, iodepth=32
2.0.8
Starting 1 process
fio: pid=3147, got signal=9

write-vdb-4m-para: (groupid=0, jobs=1): err= 0: pid=3147
   cpu          : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
   IO depths    : 1=2.2%, 2=4.3%, 4=8.7%, 8=17.4%, 16=34.8%, 32=32.6%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued    : total=r=0/w=46/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):

Disk stats (read/write):
   vdb: ios=0/376, merge=0/0, ticks=0/24204, in_queue=24204, util=33.25%
fio: file hash not empty on exit
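
The output above still shows "err= 0" in the job summary, and the only
trace of the kill is the "fio: pid=3147, got signal=9" line. Until fio
reports this more prominently, a wrapper script can scan the captured
output for that line. The following is a minimal sketch, not part of fio:
the function names are hypothetical, the regular expression is based only
on the message format shown above, and the caller is assumed to capture
both stdout and stderr, since this sketch makes no assumption about which
stream carries the message.

```python
import re
from typing import List, Tuple

# Matches the line fio prints when a job process dies from a signal,
# e.g. "fio: pid=3147, got signal=9" (format taken from the output above).
SIGNAL_LINE = re.compile(r"^fio: pid=(\d+), got signal=(\d+)[ \t]*$", re.MULTILINE)


def find_signalled_jobs(fio_output: str) -> List[Tuple[int, int]]:
    """Return (pid, signal) pairs for every job fio reported as signalled."""
    return [(int(pid), int(sig)) for pid, sig in SIGNAL_LINE.findall(fio_output)]


def results_look_complete(fio_output: str) -> bool:
    """Return True when no job was reported as killed by a signal."""
    return not find_signalled_jobs(fio_output)
```

A benchmark harness could call results_look_complete() on the combined
output and discard the run when it returns False, rather than trusting
the partial statistics.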



* Re: How can we report being OOM killed?
From: Jens Axboe @ 2013-08-01 17:29 UTC
  To: Erwan Velu; +Cc: fio@vger.kernel.org

It's the IOPS and bandwidth logging (write_iops_log and write_bw_log in
your job). Fio doesn't flush these logs until the job is done, so if you
are tight on memory, I'm sure that would make the OOM killer consider
fio an ever-growing monster.

At some point I had a patch to cap the number of entries and flush them
out periodically. Fio doesn't do this right now to avoid perturbing the
workload. But, arguably, using too much memory is even worse. So if you
feel up to it, it would not hurt to add this logic to the log handling.

Right now setup_log() sets up the initial log and allocates a stack of
entries. __add_log_sample() will increase the size of the log as needed
when adding entries. finish_log() flushes it out, that's done when the
job has completed.
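
The capping idea described above can be sketched as follows. This is an
illustration only, written in Python rather than fio's C, and it is not
the patch mentioned in this message: the class name, the max_entries
parameter, and the "time, value" line format are all assumptions made
for the example. Samples are buffered in memory, written out whenever
the buffer reaches the cap, and flushed one last time when the job ends.

```python
from typing import List, TextIO, Tuple


class CappedSampleLog:
    """Buffer (time_ms, value) samples and flush once the cap is reached."""

    def __init__(self, sink: TextIO, max_entries: int = 1024) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.sink = sink                # hypothetical destination for log lines
        self.max_entries = max_entries  # hypothetical cap on buffered samples
        self.entries: List[Tuple[int, int]] = []

    def add_sample(self, time_ms: int, value: int) -> None:
        """Record one sample; write the buffer out when it hits the cap."""
        self.entries.append((time_ms, value))
        if len(self.entries) >= self.max_entries:
            self.flush()

    def flush(self) -> None:
        """Write all buffered samples to the sink and empty the buffer."""
        for time_ms, value in self.entries:
            self.sink.write(f"{time_ms}, {value}\n")
        self.entries.clear()

    def finish(self) -> None:
        """Flush whatever is left when the job has completed."""
        self.flush()
        self.sink.flush()
```

The trade-off is the one raised above: memory use stays bounded by the
cap, but each intermediate flush adds I/O that may perturb the workload
being measured, so the cap would need to be chosen with that in mind.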

-- 
Jens Axboe

