From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Message-ID: <533CD452.7080205@kernel.dk> Date: Wed, 02 Apr 2014 21:24:02 -0600 From: Jens Axboe MIME-Version: 1.0 Subject: Re: bug in logging code? References: <533C2B54.6030606@gmail.com> <533C4B06.9070108@kernel.dk> <533C4E9D.8050901@gmail.com> <533C4F2B.5070708@kernel.dk> <533C55E8.1010108@gmail.com> <533C5D2F.1080308@kernel.dk> <533C755D.1070601@gmail.com> In-Reply-To: <533C755D.1070601@gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit To: Mark Nelson , fio@vger.kernel.org List-ID: On 2014-04-02 14:38, Mark Nelson wrote: > On 04/02/2014 01:55 PM, Jens Axboe wrote: >> On 04/02/2014 12:24 PM, Mark Nelson wrote: >>> On 04/02/2014 12:55 PM, Jens Axboe wrote: >>>> On 04/02/2014 11:53 AM, Mark Nelson wrote: >>>>> On 04/02/2014 12:38 PM, Jens Axboe wrote: >>>>>> On 04/02/2014 09:23 AM, Mark Nelson wrote: >>>>>>> I've been playing around with usng the various write_*_log >>>>>>> options in >>>>>>> fio to record run statistics. What I'm finding is that some of the >>>>>>> logs >>>>>>> have malformed lines in them. The following snippet is from the bw >>>>>>> log: >>>>>>> >>>>>>> 811820, 12263, 0, 4194304 >>>>>>> 812619, 35929, 0, 4194304 >>>>>>> 818437, 8448, 0, 419430, 4194304 >>>>>>> 1256299, 160627, 0, 4194304 >>>>>>> 1256831, 154274, 0, 4194304 >>>>>>> >>>>>>> Notice that the malformed line appears to be a partial write that >>>>>>> gets >>>>>>> truncated. The fio command being used to run the tests is >>>>>>> autogenerated, so job files aren't used. here's the one that >>>>>>> created >>>>>>> the above log snippet: >>>>>>> >>>>>>> /home/nhm/bin/fio >>>>>>> --rw=read >>>>>>> --ioengine=libaio >>>>>>> --runtime=7200 >>>>>>> --ramp_time=0 >>>>>>> --numjobs=1 >>>>>>> --direct=1 >>>>>>> --bs=4194304B >>>>>>> --iodepth=16 >>>>>>> --size=9216M >>>>>>> --write_iops_log=/tmp/cbt/ceph/kvmrbdfio/osd_ra-00004096/client_ra-00000128/op_size-04194304/concurrent_procs-008/iodepth-016/read/output >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> --write_bw_log=/tmp/cbt/ceph/kvmrbdfio/osd_ra-00004096/client_ra-00000128/op_size-04194304/concurrent_procs-008/iodepth-016/read/output >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> --write_lat_log=/tmp/cbt/ceph/kvmrbdfio/osd_ra-00004096/client_ra-00000128/op_size-04194304/concurrent_procs-008/iodepth-016/read/output >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> --time_based >>>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-0 >>>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-1 >>>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-2 >>>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-3 >>>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-4 >>>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-5 >>>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-6 >>>>>>> --name=/srv/rbdfio-`hostname -s`-0/cbt-kvmrbdfio-7 >>>>>>> >>>>>>> The --numjobs=1 line should be removed, but I believe it gets >>>>>>> overridden >>>>>>> anyway with multiple name parameters. I wonder if having multiple >>>>>>> name >>>>>>> parameters (ie multiple fio processes) is causing those processes to >>>>>>> all >>>>>>> try to write to the same log files, potentially at the same time? >>>>>>> Any >>>>>>> thoughts would be much appreicated! >>>>>> >>>>>> What version of fio are you running? >>>>>> >>>>> >>>>> f42557f (ie master from yesterday) with a slightly different fix for >>>>> the >>>>> basename issues that just got resolved. >>>>> >>>>> I can verify now that running fio with a single --name flag appears to >>>>> generate logs without malformated lines. >>>> >>>> OK, I'll take a look, We should already be serializing the writing to >>>> the same log files, but your example definitely looks like we are not. >>>> >>>> BTW, the numjobs option applies to each --name section. Just to clear >>>> that up from your original email. So you really get numjobs * >>>> name_sections running. >>> >>> Aha, thanks for clearing that up! >>> >>> Is github the right place to create new issues? If so, want me to >>> create one for this? >> >> Sure, you can go ahead and do that. > > Actually, I'm going to wait after hearing your response below! > >> >> I'm puzzled by this report. The writeout of the logs is fully >> serialized, there should be no interleaving. Looking at your output, >> it's definitely either the output from two jobs, or mangled output from >> just one job. The corrupted line is at time 818437, and the next line is >> much further ahead. Normally you would expect to see: >> >> 500, some-bw, 0, blocksize <- job X >> 1000, some-bw, 0, blocksize >> [...] >> end-time, some-bw, 0, blocksize <- last entry from job X >> 500, some-bw, 0, blocksize <- job Y now starts >> >> Note that the log writing always appends. Did you possibly have this >> half-done files in that location from a previous run? > > I think this is possible. My tool is supposed to clean out old tmp > directories but it was broken (and unnoticed because all other output > files were getting overwritten each time). Guess I should prioritize a > long-standing plan to use UUIDs for all tests/directories to make this > kind of bug more obvious. :) That makes me feel better, and does make a lot more sense! But since I was looking at this code anyway, I could not help but be annoyed at how we serialize all the logs when writing them out. If the logs are big, it can take a long time per log. So I added some code to only lock per file instead. This means that if no log sharing is happening, we'll write all 5 logs in parallel. A quick test here with a 5s null run which resulted in 4 logs (two tiny, the bw and iops log) reduced the log writing from 26 seconds to 14 seconds. And that was just two 18M logs. So even if we didn't have a real bug here, at least the logging is now substantially faster. -- Jens Axboe