From: Jens Axboe <jens.axboe@oracle.com>
To: Shawn Lewis <shawnlewis@google.com>
Cc: fio <fio@vger.kernel.org>
Subject: Re: BUG: option runtime not working during a particular failure mode.
Date: Tue, 7 Oct 2008 11:30:14 +0200
Message-ID: <20081007093014.GT19428@kernel.dk>
In-Reply-To: <7a2ad2470810061504x2b973b80v3caa136d6d883dd8@mail.gmail.com>

On Mon, Oct 06 2008, Shawn Lewis wrote:
> Hi,
> 
> I have a random read load in which fio hung on a machine. It is
> time_based with runtime=60. A few of the disks in question experienced
> errors at the same time so I would expect fio to fail or stop after
> 60 seconds.
> 
> I haven't tried to debug this in depth yet. Jens, I thought an answer
> might jump out at you. If not, I'll take a look.
> 
> Full disclosure: I modified the config and the strace output to show
> fewer disks than were actually being accessed.
> 
> Here is the config file:
> [sda-randomaccess]
> filename=/export/sda3/datafile.tmp
> rw=randread
> bs=64k
> ioengine=sync
> time_based=1
> runtime=3600
> bwavgtime=5000
> direct=1
> thread=1
> 
> [sdb-randomaccess]
> filename=/export/sdb3/datafile.tmp
> rw=randread
> bs=64k
> ioengine=sync
> time_based=1
> runtime=3600
> bwavgtime=5000
> direct=1
> thread=1
> 
> [sdc-randomaccess]
> filename=/export/sdc3/datafile.tmp
> rw=randread
> bs=64k
> ioengine=sync
> time_based=1
> runtime=3600
> bwavgtime=5000
> direct=1
> thread=1
> 
> [sdd-randomaccess]
> filename=/export/sdd3/datafile.tmp
> rw=randread
> bs=64k
> ioengine=sync
> time_based=1
> runtime=3600
> bwavgtime=5000
> direct=1
> thread=1
> 
> 
> We get some hints from strace. It looks like we're just doing the
> sig_alrm loop. But why aren't we hitting runtime? Are the other threads
> stopped already for some reason?
> 
> static void sig_alrm(int sig)
> {
>         if (threads) {
>                 update_io_ticks();
>                 print_thread_status();
>                 status_timer_arm();
>         }
> }

Good question. What are the other threads doing, have you poked around
to see what they are up to? You mention IO errors, so are some of the
threads stuck in error handling or did they all exit? If they did exit,
did they exit nicely or did they get killed by the kernel?

> strace. This is repeated over and over:
> 
> --- SIGALRM (Alarm clock) @ 0 (0) ---
> open("/sys/block/sdd/stat", O_RDONLY)   = 8
> fstat(8, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaadf29000
> read(8, "15490269   238448 1950476298  87"..., 4096) = 105
> close(8)                                = 0
> munmap(0x2aaaadf29000, 4096)            = 0
> open("/sys/block/sdc/stat", O_RDONLY)   = 8
> fstat(8, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaadf29000
> read(8, "15489598   238450 1950388618  86"..., 4096) = 105
> close(8)                                = 0
> munmap(0x2aaaadf29000, 4096)            = 0
> open("/sys/block/sda/stat", O_RDONLY)   = 8
> fstat(8, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaadf29000
> read(8, "15368259   237021 1934418422 103"..., 4096) = 105
> close(8)                                = 0
> munmap(0x2aaaadf29000, 4096)            = 0
> open("/sys/block/sdb/stat", O_RDONLY)   = 8
> fstat(8, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaadf29000
> read(8, "15665652   241115 1972572554 101"..., 4096) = 105
> close(8)                                = 0
> munmap(0x2aaaadf29000, 4096)            = 0
> setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
> rt_sigreturn(0x5875e0)                  = -1 EINTR (Interrupted system call)
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, NULL)          = 0
> nanosleep({0, 10000000}, 0)             = ? ERESTART_RESTARTBLOCK (To be restarted)
> 
> 
> Thanks,
> Shawn
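
For reference, the pattern in the quoted strace (a one-shot 250 msec
ITIMER_REAL followed by a run of 10 msec nanosleep calls until SIGALRM
arrives) can be reproduced in isolation. The sketch below only mirrors
the system calls visible in the trace; it is not taken from fio, and
`count_sleeps_until_alarm` and `on_alarm` are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Set from the signal handler, polled by the sleep loop. */
static volatile sig_atomic_t alarm_fired;

static void on_alarm(int sig)
{
	(void)sig;
	alarm_fired = 1;
}

/*
 * Arm a one-shot ITIMER_REAL for timer_usec microseconds, then sleep
 * in sleep_nsec slices until SIGALRM has been delivered.
 * Returns the number of sleeps that ran to completion, or -1 if the
 * handler or timer could not be set up.
 */
static int count_sleeps_until_alarm(long timer_usec, long sleep_nsec)
{
	struct sigaction sa;
	struct itimerval itv;
	struct timespec slice = { 0, sleep_nsec };
	int completed = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;

	alarm_fired = 0;
	memset(&itv, 0, sizeof(itv));	/* it_interval = 0: one-shot */
	itv.it_value.tv_sec = timer_usec / 1000000;
	itv.it_value.tv_usec = timer_usec % 1000000;
	if (setitimer(ITIMER_REAL, &itv, NULL) < 0)
		return -1;

	while (!alarm_fired) {
		/* An interrupted sleep returns -1 and is not counted. */
		if (nanosleep(&slice, NULL) == 0)
			completed++;
	}
	return completed;
}
```

Calling `count_sleeps_until_alarm(250000, 10000000)` uses the same
timer and sleep values as the trace. The exact count of completed
sleeps depends on scheduling, so it will vary from run to run.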

-- 
Jens Axboe

