linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Lutomirski <andy@luto.us>
To: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: 2.6.37: Multi-second I/O latency while untarring
Date: Mon, 28 Mar 2011 23:50:01 -0400	[thread overview]
Message-ID: <AANLkTikCKjrtSJbcR7_GmcZfxS+jfekkkiOP9jyAaM7F@mail.gmail.com> (raw)
In-Reply-To: <1297696565-sup-8163@think>

[-- Attachment #1: Type: text/plain, Size: 2154 bytes --]

On Mon, Feb 14, 2011 at 10:22 AM, Chris Mason <chris.mason@oracle.com> wrote:
> Excerpts from Andrew Lutomirski's message of 2011-02-11 19:35:02 -0500:
>> On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason <chris.mason@oracle.com> wrote:

>> >
>> > We can tell more if you post the full traces from latencytop.  I have a
>> > patch here for latencytop that adds a -c mode, which dumps the traces
>> > out to a text files.
>> >
>> > http://oss.oracle.com/~mason/latencytop.patch
>> >
>> > Based on what you have here, I think it's probably a latency problem
>> > between btrfs and the dm-crypt stuff.  How easily can setup a test
>> > partition without dm-crypt?
>>
>> Done, on the same physical disk as before.  The latency is just as
>> bad.  On this test, I wrote a total of 3.1G, which is under half of my
>> RAM.  That should rule out lots of VM issues.  latencytop trace below.
>
> Just to confirm, you say on a physical disk you mean without dm-crypt?

Sorry for the exceedingly slow reply.

This problem is really bad with 2.6.38.1.  To make it a little easier
to demonstrate, I wrote a tool that shows off the problem.

I made a test btrfs partition on a plain disk partition (same disk as
my dm-crypt but an unencrypted partition).  Now clone a kernel tree
there and run make -j8.  Wait until the disk starts to write data out
in earnest (takes awhile to dirty enough pages).  Watch crap like this
happen (with nr_requests = 2048, scheduler = deadline).

io_latency_watch read <1M file on test partition>

read took 0.000 seconds (worst = 0.963s)
read took 0.000 seconds (worst = 0.963s)
read took 0.022 seconds (worst = 0.963s)
read took 0.000 seconds (worst = 0.963s)
read took 0.028 seconds (worst = 0.963s)
read took 1.430 seconds (worst = 1.430s)
read took 0.270 seconds (worst = 1.430s)
read took 1.237 seconds (worst = 1.430s)
read took 0.282 seconds (worst = 1.430s)
read took 0.131 seconds (worst = 1.430s)

io_latency_watch read <1M file on other partition on same disk> is
similar, and io_latency_test write <dir on other partition> is even
worse.

The cfq scheduler is similar.

--Andy

[-- Attachment #2: io_latency_watch.c --]
[-- Type: text/x-csrc, Size: 3105 bytes --]

/* io_latency_test.c
 * Copyright (c) 2011 Andy Lutomirski
 * Licensed under GPLv2.
 *
 * Compile with gcc -O2 -std=gnu99 -lrt
 */

#define _FILE_OFFSET_BITS 64
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>
#include <time.h>
#include <stdint.h>
#include <string.h>
#include <signal.h>
#include <inttypes.h>
#include <fcntl.h>

volatile const char *file_to_unlink;

void handler(int x)
{
  if (file_to_unlink)
    unlink((char*)file_to_unlink);

  _exit(0);
}

void do_read(const char *name)
{
  int fd = open(name, O_RDONLY | O_DIRECT);
  if (fd < 0) {
    perror("open");
    exit(1);
  }

  uint64_t worst = 0;
  off_t size = lseek(fd, 0, SEEK_END);
  if (size == (off_t)-1) {
    perror("lseek");
    abort();
  }

  size -= (size % 4096);

  if (size < 4096) {
    printf("File is smaller than 4k\n");
    exit(1);
  }

  printf("File size is %" PRIu64 " bytes -- bigger is better\n", (uint64_t)size);

  while(true)
    {
      uint64_t pos = 4096 * (random() % (size / 4096));

      struct timespec start;
      clock_gettime(CLOCK_MONOTONIC, &start);

      unsigned char x[4096];
      if (pread(fd, x, 4096, pos) != 4096) {
	perror("pread");
	abort();
      }

      struct timespec end;
      clock_gettime(CLOCK_MONOTONIC, &end);
      
      uint64_t ns = (end.tv_nsec - start.tv_nsec) + 1000000000ULL * (end.tv_sec - start.tv_sec);

      if (ns > worst)
	worst = ns;

      printf("read took %.3f seconds (worst = %.3fs)\n",
	     1e-9 * ns, 1e-9 * worst);

      if (posix_fadvise(fd, 0, size, POSIX_FADV_DONTNEED) != 0)
	perror("posix_fadvise");

      usleep(1000000);
    }
}

void do_write(const char *dir)
{
  char *name;
  if (asprintf(&name, "%s/tmp.XXXXXX", dir) == -1)
    abort();

  int fd = mkstemp(name);
  if (fd == -1) {
    perror("mkstemp");
    abort();
  }

  file_to_unlink = name;

  uint64_t worst = 0;

  unsigned char x;
  while(true)
    {
      x++;
      struct timespec start;
      clock_gettime(CLOCK_MONOTONIC, &start);

      if (pwrite(fd, &x, 1, 0) != 1) {
	perror("pwrite");
	abort();
      }

      if (fdatasync(fd) != 0) {
	perror("fdatasync");
	abort();
      }

      struct timespec end;
      clock_gettime(CLOCK_MONOTONIC, &end);
      
      uint64_t ns = (end.tv_nsec - start.tv_nsec) + 1000000000ULL * (end.tv_sec - start.tv_sec);

      if (ns > worst)
	worst = ns;

      printf("write + fsync took %.3f seconds (worst = %.3fs)\n",
	     1e-9 * ns, 1e-9 * worst);

      usleep(1000000);
    }
}

int main(int argc, char **argv)
{
  if (argc != 3) {
    printf("Usage: %s write <dir> or %s read <file>\n", argv[0], argv[0]);
    return 1;
  }

  bool write;
  if (!strcmp(argv[1], "write")) {
    write = true;
  } else if (!strcmp(argv[1], "read")) {
    write = false;
  } else {
    printf("Bad mode\n");
    return 1;
  }

  struct sigaction sa;
  sa.sa_handler = handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  if (sigaction(SIGINT, &sa, 0) != 0) {
    perror("sigaction");
    exit(1);
  }

  if (write)
    do_write(argv[2]);
  else
    do_read(argv[2]);

  return 0;
}

  reply	other threads:[~2011-03-29  3:50 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-11 15:08 2.6.37: Multi-second I/O latency while untarring Andrew Lutomirski
2011-02-11 15:44 ` Chris Mason
2011-02-11 19:56   ` Andrew Lutomirski
2011-02-12  0:35   ` Andrew Lutomirski
2011-02-14 15:22     ` Chris Mason
2011-03-29  3:50       ` Andrew Lutomirski [this message]
2011-02-11 15:54 ` Matt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AANLkTikCKjrtSJbcR7_GmcZfxS+jfekkkiOP9jyAaM7F@mail.gmail.com \
    --to=andy@luto.us \
    --cc=chris.mason@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).