From: Martin Mailand <martin@tuxadero.com>
To: ceph-devel@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org, Sage Weil <sage@newdream.net>,
	chb@muc.de, Josef Bacik <josef@redhat.com>,
	chris.mason@oracle.com
Subject: Re: ceph on btrfs [was Re: ceph on non-btrfs file systems]
Date: Thu, 27 Oct 2011 12:53:56 +0200
Message-ID: <4EA93844.3010601@tuxadero.com>
In-Reply-To: <4EA86FD7.4030407@tuxadero.com>

[-- Attachment #1: Type: text/plain, Size: 5372 bytes --]

Hi,
resending without the perf attachment, which can be found here:
http://tuxadero.com/multistorage/perf.report.txt.bz2

Best Regards,
  martin

-------- Original Message --------
Subject: Re: ceph on btrfs [was Re: ceph on non-btrfs file systems]
Date: Wed, 26 Oct 2011 22:38:47 +0200
From: Martin Mailand <martin@tuxadero.com>
Reply-To: martin@tuxadero.com
To: Sage Weil <sage@newdream.net>
CC: Christian Brunner <chb@muc.de>, ceph-devel@vger.kernel.org,
  linux-btrfs@vger.kernel.org

Hi,
I have more or less the same setup as Christian and suffer from the same
problems.
However, as far as I can see, my latencytop and perf output differs from
Christian's; both are attached.
I was wondering about the high latency of btrfs-submit:

Process btrfs-submit-0 (970) Total: 2123.5 msec

I also see the same high I/O rate and high I/O wait:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             0.60    0.00    2.20   82.40    0.00   14.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    8.40     0.00    74.40    17.71     0.03    3.81    0.00    3.81   3.81   3.20
sdb               0.00     7.00    0.00  269.80     0.00  1224.80     9.08   107.19  398.69    0.00  398.69   3.15  85.00
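The sdb line can be cross-checked with a little arithmetic. The sketch below (the function name is hypothetical, the figures are copied from the iostat output above) recomputes the average request size, the queue depth via Little's law, and the utilization from the other columns. It assumes iostat's usual conventions: kB means 1024 bytes, avgrq-sz is in 512-byte sectors, and await/svctm are in milliseconds. Since iostat derives svctm from %util itself, the utilization line is only a consistency check.

```python
# Cross-check of the sdb line from the iostat output above.
# Assumes iostat's conventional units: kB = 1024 bytes, avgrq-sz in
# 512-byte sectors, await/svctm in milliseconds, %util in percent.

def derived_metrics(r_s, w_s, rkb_s, wkb_s, await_ms, svctm_ms):
    iops = r_s + w_s                      # total requests per second
    avg_kb = (rkb_s + wkb_s) / iops       # average request size in KiB
    avgrq_sz = avg_kb * 1024 / 512        # same size in 512-byte sectors
    # Little's law: mean requests in flight = arrival rate * mean time in system
    avgqu_sz = iops * await_ms / 1000.0
    # Fraction of each second the device spent servicing requests
    util_pct = iops * svctm_ms / 1000.0 * 100.0
    return avg_kb, avgrq_sz, avgqu_sz, util_pct

avg_kb, avgrq_sz, avgqu_sz, util_pct = derived_metrics(
    r_s=0.00, w_s=269.80, rkb_s=0.00, wkb_s=1224.80,
    await_ms=398.69, svctm_ms=3.15)

print(f"avg request size: {avg_kb:.2f} KiB ({avgrq_sz:.2f} sectors)")
print(f"queue depth (Little's law): {avgqu_sz:.1f}")
print(f"utilization: {util_pct:.1f}%")
```

For sdb this gives roughly 4.5 KiB per write, about 107 requests in flight and about 85% utilization, in line with the reported columns. The disk services each small write in about 3 ms, but with over a hundred requests queued the average wait climbs to roughly 400 ms.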

top - 21:57:41 up  8:41,  1 user,  load average: 0.65, 0.79, 0.76
Tasks: 179 total,   1 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.6%us,  2.4%sy,  0.0%ni, 70.8%id, 25.8%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4018276k total,  1577728k used,  2440548k free,    10496k buffers
Swap:  1998844k total,        0k used,  1998844k free,  1316696k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   1399 root      20   0  548m 103m 3428 S  0.0  2.6   2:01.85 ceph-osd
   1401 root      20   0  548m 103m 3428 S  0.0  2.6   1:51.71 ceph-osd
   1400 root      20   0  548m 103m 3428 S  0.0  2.6   1:50.30 ceph-osd
   1391 root      20   0     0    0    0 S  0.0  0.0   1:18.39 btrfs-endio-wri
    976 root      20   0     0    0    0 S  0.0  0.0   1:18.11 btrfs-endio-wri
   1367 root      20   0     0    0    0 S  0.0  0.0   1:05.60 btrfs-worker-1
    968 root      20   0     0    0    0 S  0.0  0.0   1:05.45 btrfs-worker-0
   1163 root      20   0  141m 1636 1100 S  0.0  0.0   1:00.56 collectd
    970 root      20   0     0    0    0 S  0.0  0.0   0:47.73 btrfs-submit-0
   1402 root      20   0  548m 103m 3428 S  0.0  2.6   0:34.86 ceph-osd
   1392 root      20   0     0    0    0 S  0.0  0.0   0:33.70 btrfs-endio-met
    975 root      20   0     0    0    0 S  0.0  0.0   0:32.70 btrfs-endio-met
   1415 root      20   0  548m 103m 3428 S  0.0  2.6   0:28.29 ceph-osd
   1414 root      20   0  548m 103m 3428 S  0.0  2.6   0:28.24 ceph-osd
   1397 root      20   0  548m 103m 3428 S  0.0  2.6   0:24.60 ceph-osd
   1436 root      20   0  548m 103m 3428 S  0.0  2.6   0:13.31 ceph-osd


Here is my setup:
Kernel v3.1 + Josef's patches

The config for this osd (ceph version 0.37
(commit:a6f3bbb744a6faea95ae48317f0b838edb16a896)) is:
[osd.1]
        host = s-brick-003
        osd journal = /dev/sda7
        btrfs devs = /dev/sdb
        btrfs options = noatime
        filestore_btrfs_snap = false

I hope this helps to pinpoint the problem.

Best Regards,
martin


Sage Weil wrote:
> On Wed, 26 Oct 2011, Christian Brunner wrote:
>> 2011/10/26 Sage Weil <sage@newdream.net>:
>>> On Wed, 26 Oct 2011, Christian Brunner wrote:
>>>>>>> Christian, have you tweaked those settings in your ceph.conf?  It would be
>>>>>>> something like 'journal dio = false'.  If not, can you verify that
>>>>>>> directio shows true when the journal is initialized from your osd log?
>>>>>>> E.g.,
>>>>>>>
>>>>>>>  2011-10-21 15:21:02.026789 7ff7e5c54720 journal _open dev/osd0.journal fd 14: 104857600 bytes, block size 4096 bytes, directio = 1
>>>>>>>
>>>>>>> If directio = 1 for you, something else funky is causing those
>>>>>>> blkdev_fsync's...
>>>>>> I've looked it up in the logs - directio is 1:
>>>>>>
>>>>>> Oct 25 17:20:16 os00 osd.000[1696]: 7f0016841740 journal _open
>>>>>> /dev/vg01/lv_osd_journal_0 fd 15: 17179869184 bytes, block size 4096
>>>>>> bytes, directio = 1
>>>>> Do you mind capturing an strace?  I'd like to see where that blkdev_fsync
>>>>> is coming from.
>>>> Here is an strace. I can see a lot of sync_file_range operations.
>>> Yeah, these all look like the flusher thread, and shouldn't be hitting
>>> blkdev_fsync.  Can you confirm that with
>>>
>>>        filestore flusher = false
>>>        filestore sync flush = false
>>>
>>> you get no sync_file_range at all?  I wonder if this is also perf lying
>>> about the call chain.
>> Yes, setting this makes the sync_file_range calls go away.
>
> Okay.  That means either sync_file_range on a regular btrfs file is
> triggering blkdev_fsync somewhere in btrfs, there is an extremely sneaky
> bug that is mixing up file descriptors, or latencytop is lying.  I'm
> guessing the latter, given the other weirdness Josef and Chris were
> seeing.  :)
>
>> Is it safe to use these settings with "filestore btrfs snap = 0"?
>
> Yeah.  They're purely a performance thing to push as much dirty data to
> disk as quickly as possible to minimize the snapshot create latency.
> You'll notice the write throughput tends to tank when they're off.
>
> sage
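Sage's request to check directio in the osd log can be scripted. The sketch below is a minimal, hypothetical helper that pulls the journal path, size, block size and directio flag out of a "journal _open" message. It assumes the message looks like the two excerpts quoted above; other ceph versions may word it differently. Whitespace is collapsed first so a line wrapped by mail or syslog still matches.

```python
import re

# Matches the "journal _open" message as it appears in the two log excerpts
# quoted above; the exact wording in other ceph versions is an assumption.
JOURNAL_OPEN = re.compile(
    r"journal _open (?P<path>\S+) fd (?P<fd>\d+): (?P<size>\d+) bytes, "
    r"block size (?P<block>\d+) bytes, directio = (?P<directio>\d+)"
)

def parse_journal_open(text):
    """Return a dict describing the journal, or None if no message is found."""
    flat = " ".join(text.split())  # undo line wrapping from mail/syslog
    m = JOURNAL_OPEN.search(flat)
    if m is None:
        return None
    return {
        "path": m.group("path"),
        "fd": int(m.group("fd")),
        "size_bytes": int(m.group("size")),
        "block_size": int(m.group("block")),
        "directio": m.group("directio") == "1",
    }

line = ("Oct 25 17:20:16 os00 osd.000[1696]: 7f0016841740 journal _open "
        "/dev/vg01/lv_osd_journal_0 fd 15: 17179869184 bytes, block size 4096 "
        "bytes, directio = 1")
info = parse_journal_open(line)
print(info["path"], info["size_bytes"] // 2**30, "GiB, directio:", info["directio"])
```

Run against Christian's excerpt, this reports a 16 GiB journal on /dev/vg01/lv_osd_journal_0 with directio enabled.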


[-- Attachment #2: latencytop.txt.bz2 --]
[-- Type: application/x-bzip, Size: 5203 bytes --]

