Flexible I/O Tester development
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Mark Nelson <mark.a.nelson@gmail.com>, fio@vger.kernel.org
Subject: Re: bug in logging code?
Date: Wed, 02 Apr 2014 12:55:43 -0600	[thread overview]
Message-ID: <533C5D2F.1080308@kernel.dk> (raw)
In-Reply-To: <533C55E8.1010108@gmail.com>

On 04/02/2014 12:24 PM, Mark Nelson wrote:
> On 04/02/2014 12:55 PM, Jens Axboe wrote:
>> On 04/02/2014 11:53 AM, Mark Nelson wrote:
>>> On 04/02/2014 12:38 PM, Jens Axboe wrote:
>>>> On 04/02/2014 09:23 AM, Mark Nelson wrote:
>>>>> I've been playing around with usng the various write_*_log options in
>>>>> fio to record run statistics.  What I'm finding is that some of the
>>>>> logs
>>>>> have malformed lines in them.  The following snippet is from the bw
>>>>> log:
>>>>>
>>>>> 811820, 12263, 0, 4194304
>>>>> 812619, 35929, 0, 4194304
>>>>> 818437, 8448, 0, 419430, 4194304
>>>>> 1256299, 160627, 0, 4194304
>>>>> 1256831, 154274, 0, 4194304
>>>>>
>>>>> Notice that the malformed line appears to be a partial write that gets
>>>>> truncated.  The fio command being used to run the tests is
>>>>> autogenerated, so job files aren't used.  here's the one that created
>>>>> the above log snippet:
>>>>>
>>>>> /home/nhm/bin/fio
>>>>> --rw=read
>>>>> --ioengine=libaio
>>>>> --runtime=7200
>>>>> --ramp_time=0
>>>>> --numjobs=1
>>>>> --direct=1
>>>>> --bs=4194304B
>>>>> --iodepth=16
>>>>> --size=9216M
>>>>> --write_iops_log=/tmp/cbt/ceph/kvmrbdfio/osd_ra-00004096/client_ra-00000128/op_size-04194304/concurrent_procs-008/iodepth-016/read/output
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --write_bw_log=/tmp/cbt/ceph/kvmrbdfio/osd_ra-00004096/client_ra-00000128/op_size-04194304/concurrent_procs-008/iodepth-016/read/output
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --write_lat_log=/tmp/cbt/ceph/kvmrbdfio/osd_ra-00004096/client_ra-00000128/op_size-04194304/concurrent_procs-008/iodepth-016/read/output
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --time_based
>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-0
>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-1
>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-2
>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-3
>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-4
>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-5
>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-6
>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-7
>>>>>
>>>>> The --numjobs=1 line should be removed, but I believe it gets
>>>>> overridden
>>>>> anyway with multiple name parameters.  I wonder if having multiple
>>>>> name
>>>>> parameters (ie multiple fio processes) is causing those processes to
>>>>> all
>>>>> try to write to the same log files, potentially at the same time?  Any
>>>>> thoughts would be much appreicated!
>>>>
>>>> What version of fio are you running?
>>>>
>>>
>>> f42557f (ie master from yesterday) with a slightly different fix for the
>>> basename issues that just got resolved.
>>>
>>> I can verify now that running fio with a single --name flag appears to
>>> generate logs without malformated lines.
>>
>> OK, I'll take a look, We should already be serializing the writing to
>> the same log files, but your example definitely looks like we are not.
>>
>> BTW, the numjobs option applies to each --name section. Just to clear
>> that up from your original email. So you really get numjobs *
>> name_sections running.
>
> Aha, thanks for clearing that up!
>
> Is github the right place to create new issues?  If so, want me to
> create one for this?

Sure, you can go ahead and do that.

I'm puzzled by this report. The writeout of the logs is fully 
serialized, there should be no interleaving. Looking at your output, 
it's definitely either the output from two jobs, or mangled output from 
just one job. The corrupted line is at time 818437, and the next line is 
much further ahead. Normally you would expect to see:

500, some-bw, 0, blocksize		<- job X
1000, some-bw, 0, blocksize
[...]
end-time, some-bw, 0, blocksize		<- last entry from job X
500, some-bw, 0, blocksize		<- job Y now starts

Note that the log writing always appends. Did you possibly have this 
half-done files in that location from a previous run?

-- 
Jens Axboe



  reply	other threads:[~2014-04-02 18:55 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-02 15:23 bug in logging code? Mark Nelson
2014-04-02 17:38 ` Jens Axboe
2014-04-02 17:53   ` Mark Nelson
2014-04-02 17:55     ` Jens Axboe
2014-04-02 18:24       ` Mark Nelson
2014-04-02 18:55         ` Jens Axboe [this message]
2014-04-02 20:38           ` Mark Nelson
2014-04-03  3:24             ` Jens Axboe
2014-04-04  4:58               ` Andrey Kuzmin
2014-04-04  5:05                 ` Jens Axboe
2014-05-28 14:01               ` Mark Nelson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=533C5D2F.1080308@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=fio@vger.kernel.org \
    --cc=mark.a.nelson@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox