Slow file creating and deleting using bonnie ++ on Hammer

All of lore.kernel.org
 help / color / mirror / Atom feed

* Slow file creating and deleting using bonnie ++ on Hammer
@ 2015-05-20 18:15 Barclay Jameson
  2015-05-20 18:22 ` Gregory Farnum
  0 siblings, 1 reply; 13+ messages in thread
From: Barclay Jameson @ 2015-05-20 18:15 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

I am trying to find out why boniee++ is choking at the creating files
sequentially and deleting sequentially on cephfs.
I enabled mds debug for about 30 seconds and I find a bunch of lines
like the following:

2015-05-19 15:54:12.930829 7f6de4ecf700  1 -- 192.168.40.3:6800/461756
--> 192.168.40.100:0/1741116310 -- client_caps(grant ino 10000004ca9
19656 seq 2 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0
size 0/4194304 ts 1 mtime 2015-05-19 15:54:09.098825) v5 -- ?+0
0x557a800 con 0x4fdeca0

Is this statement showing that it takes 4 seconds to create an inode?

Watching ( watch --interval=.2 -d 'ceph df') ceph df, rados df, and
ceph osd pool stats seems to show that an object is being created or
deleted about every 4 seconds.

The test is now running for over 24 hours with the following bonnie++ comand:

~/bonnie++-1.03e/bonnie++ -u root:root -s 256g -r 131072 -d /cephfs/
-m CephBench -f -b

I can give more log output if needed.

OSD + MON/MDS
CentOS 7
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
kernel version : 3.10.0-229.el7.x86_64

Client
CentOS 6
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
kernel version : 4.0.0-1.el6.elrepo.x86_64

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-20 18:15 Slow file creating and deleting using bonnie ++ on Hammer Barclay Jameson
@ 2015-05-20 18:22 ` Gregory Farnum
  2015-05-20 19:00   ` Barclay Jameson
  0 siblings, 1 reply; 13+ messages in thread
From: Gregory Farnum @ 2015-05-20 18:22 UTC (permalink / raw)
  To: Barclay Jameson; +Cc: ceph-devel@vger.kernel.org

On Wed, May 20, 2015 at 11:15 AM, Barclay Jameson
<almightybeeij@gmail.com> wrote:
> I am trying to find out why boniee++ is choking at the creating files
> sequentially and deleting sequentially on cephfs.
> I enabled mds debug for about 30 seconds and I find a bunch of lines
> like the following:
>
> 2015-05-19 15:54:12.930829 7f6de4ecf700  1 -- 192.168.40.3:6800/461756
> --> 192.168.40.100:0/1741116310 -- client_caps(grant ino 10000004ca9
> 19656 seq 2 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0
> size 0/4194304 ts 1 mtime 2015-05-19 15:54:09.098825) v5 -- ?+0
> 0x557a800 con 0x4fdeca0
>
> Is this statement showing that it takes 4 seconds to create an inode?

I'm not sure where you're getting that from just this line — are you
comparing mtime and the log timestamp?
This particular message is just a message going to the client about
what access capabilities (caps) it has on that inode.

>
> Watching ( watch --interval=.2 -d 'ceph df') ceph df, rados df, and
> ceph osd pool stats seems to show that an object is being created or
> deleted about every 4 seconds.

A 4 second file create sounds much slower than it should be. What does
your cluster look like? Can you upload the output of ceph -s and your
MDS log somewhere?
-Greg

>
> The test is now running for over 24 hours with the following bonnie++ comand:
>
> ~/bonnie++-1.03e/bonnie++ -u root:root -s 256g -r 131072 -d /cephfs/
> -m CephBench -f -b
>
> I can give more log output if needed.
>
> OSD + MON/MDS
> CentOS 7
> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> kernel version : 3.10.0-229.el7.x86_64
>
> Client
> CentOS 6
> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> kernel version : 4.0.0-1.el6.elrepo.x86_64
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-20 18:22 ` Gregory Farnum
@ 2015-05-20 19:00   ` Barclay Jameson
  2015-05-22 15:25     ` Barclay Jameson
  0 siblings, 1 reply; 13+ messages in thread
From: Barclay Jameson @ 2015-05-20 19:00 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org

I dumped it out into my github wiki under the pages ceph MDS Log and
Ceph s output:

https://github.com/almightybeeij/Debug/wiki

Let me know if you would like anything else or if you can't get to it.

Thanks Greg!

On Wed, May 20, 2015 at 1:22 PM, Gregory Farnum <greg@gregs42.com> wrote:
> On Wed, May 20, 2015 at 11:15 AM, Barclay Jameson
> <almightybeeij@gmail.com> wrote:
>> I am trying to find out why boniee++ is choking at the creating files
>> sequentially and deleting sequentially on cephfs.
>> I enabled mds debug for about 30 seconds and I find a bunch of lines
>> like the following:
>>
>> 2015-05-19 15:54:12.930829 7f6de4ecf700  1 -- 192.168.40.3:6800/461756
>> --> 192.168.40.100:0/1741116310 -- client_caps(grant ino 10000004ca9
>> 19656 seq 2 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0
>> size 0/4194304 ts 1 mtime 2015-05-19 15:54:09.098825) v5 -- ?+0
>> 0x557a800 con 0x4fdeca0
>>
>> Is this statement showing that it takes 4 seconds to create an inode?
>
> I'm not sure where you're getting that from just this line — are you
> comparing mtime and the log timestamp?
> This particular message is just a message going to the client about
> what access capabilities (caps) it has on that inode.
>
>>
>> Watching ( watch --interval=.2 -d 'ceph df') ceph df, rados df, and
>> ceph osd pool stats seems to show that an object is being created or
>> deleted about every 4 seconds.
>
> A 4 second file create sounds much slower than it should be. What does
> your cluster look like? Can you upload the output of ceph -s and your
> MDS log somewhere?
> -Greg
>
>>
>> The test is now running for over 24 hours with the following bonnie++ comand:
>>
>> ~/bonnie++-1.03e/bonnie++ -u root:root -s 256g -r 131072 -d /cephfs/
>> -m CephBench -f -b
>>
>> I can give more log output if needed.
>>
>> OSD + MON/MDS
>> CentOS 7
>> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> kernel version : 3.10.0-229.el7.x86_64
>>
>> Client
>> CentOS 6
>> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>> kernel version : 4.0.0-1.el6.elrepo.x86_64
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-20 19:00   ` Barclay Jameson
@ 2015-05-22 15:25     ` Barclay Jameson
  2015-05-22 15:34       ` John Spray
  0 siblings, 1 reply; 13+ messages in thread
From: Barclay Jameson @ 2015-05-22 15:25 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org

The Bonnie++ job _FINALLY_ finished. If I am reading this correctly it
took days to create, stat, and delete 16 files??
[root@blarg cephfs]# ~/bonnie++-1.03e/bonnie++ -u root:root -s 256g -r
131072 -d /cephfs/ -m CephBench -f -b
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
CephBench      256G           1006417  76 90114  13           137110
8 329.8   7
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0   0 +++++ +++     0   0     0   0  5267  19     0   0
CephBench,256G,,,1006417,76,90114,13,,,137110,8,329.8,7,16,0,0,+++++,+++,0,0,0,0,5267,19,0,0

Any thoughts?

On Wed, May 20, 2015 at 2:00 PM, Barclay Jameson
<almightybeeij@gmail.com> wrote:
> I dumped it out into my github wiki under the pages ceph MDS Log and
> Ceph s output:
>
> https://github.com/almightybeeij/Debug/wiki
>
> Let me know if you would like anything else or if you can't get to it.
>
> Thanks Greg!
>
> On Wed, May 20, 2015 at 1:22 PM, Gregory Farnum <greg@gregs42.com> wrote:
>> On Wed, May 20, 2015 at 11:15 AM, Barclay Jameson
>> <almightybeeij@gmail.com> wrote:
>>> I am trying to find out why boniee++ is choking at the creating files
>>> sequentially and deleting sequentially on cephfs.
>>> I enabled mds debug for about 30 seconds and I find a bunch of lines
>>> like the following:
>>>
>>> 2015-05-19 15:54:12.930829 7f6de4ecf700  1 -- 192.168.40.3:6800/461756
>>> --> 192.168.40.100:0/1741116310 -- client_caps(grant ino 10000004ca9
>>> 19656 seq 2 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0
>>> size 0/4194304 ts 1 mtime 2015-05-19 15:54:09.098825) v5 -- ?+0
>>> 0x557a800 con 0x4fdeca0
>>>
>>> Is this statement showing that it takes 4 seconds to create an inode?
>>
>> I'm not sure where you're getting that from just this line — are you
>> comparing mtime and the log timestamp?
>> This particular message is just a message going to the client about
>> what access capabilities (caps) it has on that inode.
>>
>>>
>>> Watching ( watch --interval=.2 -d 'ceph df') ceph df, rados df, and
>>> ceph osd pool stats seems to show that an object is being created or
>>> deleted about every 4 seconds.
>>
>> A 4 second file create sounds much slower than it should be. What does
>> your cluster look like? Can you upload the output of ceph -s and your
>> MDS log somewhere?
>> -Greg
>>
>>>
>>> The test is now running for over 24 hours with the following bonnie++ comand:
>>>
>>> ~/bonnie++-1.03e/bonnie++ -u root:root -s 256g -r 131072 -d /cephfs/
>>> -m CephBench -f -b
>>>
>>> I can give more log output if needed.
>>>
>>> OSD + MON/MDS
>>> CentOS 7
>>> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>>> kernel version : 3.10.0-229.el7.x86_64
>>>
>>> Client
>>> CentOS 6
>>> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>>> kernel version : 4.0.0-1.el6.elrepo.x86_64
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-22 15:25     ` Barclay Jameson
@ 2015-05-22 15:34       ` John Spray
  2015-05-22 17:34         ` Barclay Jameson
  0 siblings, 1 reply; 13+ messages in thread
From: John Spray @ 2015-05-22 15:34 UTC (permalink / raw)
  To: Barclay Jameson, Gregory Farnum; +Cc: ceph-devel@vger.kernel.org



On 22/05/2015 16:25, Barclay Jameson wrote:
> The Bonnie++ job _FINALLY_ finished. If I am reading this correctly it
> took days to create, stat, and delete 16 files??
> [root@blarg cephfs]# ~/bonnie++-1.03e/bonnie++ -u root:root -s 256g -r
> 131072 -d /cephfs/ -m CephBench -f -b
> Using uid:0, gid:0.
> Writing intelligently...done
> Rewriting...done
> Reading intelligently...done
> start 'em...done...done...done...
> Create files in sequential order...done.
> Stat files in sequential order...done.
> Delete files in sequential order...done.
> Create files in random order...done.
> Stat files in random order...done.
> Delete files in random order...done.
> Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
>                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> CephBench      256G           1006417  76 90114  13           137110
> 8 329.8   7
>                      ------Sequential Create------ --------Random Create--------
>                      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                   16     0   0 +++++ +++     0   0     0   0  5267  19     0   0
> CephBench,256G,,,1006417,76,90114,13,,,137110,8,329.8,7,16,0,0,+++++,+++,0,0,0,0,5267,19,0,0
>
> Any thoughts?
>
It's 16000 files by default (not 16), but this usually takes only a few 
minutes.

FWIW I tried running a quick bonnie++ (with -s 0 to skip the IO phase) 
on a development (vstart.sh) cluster with a fuse client, and it readily 
handles several hundred client requests per second (checked with "ceph 
daemonperf mds.<id>")

Nothing immediately leapt out at me from a quick look at the log you 
posted, but with issues like these it is always worth trying to narrow 
it down by trying the fuse client instead of the kernel client, and/or 
different kernel versions.

You may also want to check that your underlying RADOS cluster is 
performing reasonably by doing a rados bench too.

Cheers,
John

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-22 15:34       ` John Spray
@ 2015-05-22 17:34         ` Barclay Jameson
  2015-05-26  6:55           ` Yan, Zheng
  0 siblings, 1 reply; 13+ messages in thread
From: Barclay Jameson @ 2015-05-22 17:34 UTC (permalink / raw)
  To: John Spray; +Cc: Gregory Farnum, ceph-devel@vger.kernel.org

Here are some more info :
rados bench -p cephfs_data 100 write --no-cleanup

Total time run:         100.096473
Total writes made:      21900
Write size:             4194304
Bandwidth (MB/sec):     875.156

Stddev Bandwidth:       96.1234
Max bandwidth (MB/sec): 932
Min bandwidth (MB/sec): 0
Average Latency:        0.0731273
Stddev Latency:         0.0439909
Max latency:            1.23972
Min latency:            0.0306901

(Again the numbers from bench don't match what is listed in client io.
Ceph -s shows anywhere from 200 MB/s to 1700 MB/s even when the max
bandwidth lists 932 as the highest)

rados bench -p cephfs_data 100 seq

Total time run:        29.460172
Total reads made:     21900
Read size:            4194304
Bandwidth (MB/sec):    2973.506

Average Latency:       0.0215173
Max latency:           0.693831
Min latency:           0.00519763



On client:

[root@blarg cephfs]# time for i in {1..100000}; do mkdir blarg"$i" ; done

real    10m36.794s
user    1m45.329s
sys    6m29.982s
[root@blarg cephfs]# time for i in {1..100000}; do touch yadda"$i" ; done

real    13m29.155s
user    3m55.256s
sys    7m50.301s

What variables are most important in the perf dump?
I would like to grep out the vars (ceph daemon
/var/run/ceph-mds.cephnautilus01.asok perf dump | jq '.') that are of
meaning while running the bonnie++ test again with -s 0.

Thanks,
BJ

On Fri, May 22, 2015 at 10:34 AM, John Spray <john.spray@redhat.com> wrote:
>
>
> On 22/05/2015 16:25, Barclay Jameson wrote:
>>
>> The Bonnie++ job _FINALLY_ finished. If I am reading this correctly it
>> took days to create, stat, and delete 16 files??
>> [root@blarg cephfs]# ~/bonnie++-1.03e/bonnie++ -u root:root -s 256g -r
>> 131072 -d /cephfs/ -m CephBench -f -b
>> Using uid:0, gid:0.
>> Writing intelligently...done
>> Rewriting...done
>> Reading intelligently...done
>> start 'em...done...done...done...
>> Create files in sequential order...done.
>> Stat files in sequential order...done.
>> Delete files in sequential order...done.
>> Create files in random order...done.
>> Stat files in random order...done.
>> Delete files in random order...done.
>> Version 1.03e       ------Sequential Output------ --Sequential Input-
>> --Random-
>>                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>> --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>> /sec %CP
>> CephBench      256G           1006417  76 90114  13           137110
>> 8 329.8   7
>>                      ------Sequential Create------ --------Random
>> Create--------
>>                      -Create-- --Read--- -Delete-- -Create-- --Read---
>> -Delete--
>>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> /sec %CP
>>                   16     0   0 +++++ +++     0   0     0   0  5267  19
>> 0   0
>>
>> CephBench,256G,,,1006417,76,90114,13,,,137110,8,329.8,7,16,0,0,+++++,+++,0,0,0,0,5267,19,0,0
>>
>> Any thoughts?
>>
> It's 16000 files by default (not 16), but this usually takes only a few
> minutes.
>
> FWIW I tried running a quick bonnie++ (with -s 0 to skip the IO phase) on a
> development (vstart.sh) cluster with a fuse client, and it readily handles
> several hundred client requests per second (checked with "ceph daemonperf
> mds.<id>")
>
> Nothing immediately leapt out at me from a quick look at the log you posted,
> but with issues like these it is always worth trying to narrow it down by
> trying the fuse client instead of the kernel client, and/or different kernel
> versions.
>
> You may also want to check that your underlying RADOS cluster is performing
> reasonably by doing a rados bench too.
>
> Cheers,
> John

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-22 17:34         ` Barclay Jameson
@ 2015-05-26  6:55           ` Yan, Zheng
  2015-05-26 11:57             ` John Spray
  0 siblings, 1 reply; 13+ messages in thread
From: Yan, Zheng @ 2015-05-26  6:55 UTC (permalink / raw)
  To: Barclay Jameson; +Cc: John Spray, Gregory Farnum, ceph-devel@vger.kernel.org

On Sat, May 23, 2015 at 1:34 AM, Barclay Jameson
<almightybeeij@gmail.com> wrote:
> Here are some more info :
> rados bench -p cephfs_data 100 write --no-cleanup
>
> Total time run:         100.096473
> Total writes made:      21900
> Write size:             4194304
> Bandwidth (MB/sec):     875.156
>
> Stddev Bandwidth:       96.1234
> Max bandwidth (MB/sec): 932
> Min bandwidth (MB/sec): 0
> Average Latency:        0.0731273
> Stddev Latency:         0.0439909
> Max latency:            1.23972
> Min latency:            0.0306901
>
> (Again the numbers from bench don't match what is listed in client io.
> Ceph -s shows anywhere from 200 MB/s to 1700 MB/s even when the max
> bandwidth lists 932 as the highest)
>
> rados bench -p cephfs_data 100 seq
>
> Total time run:        29.460172
> Total reads made:     21900
> Read size:            4194304
> Bandwidth (MB/sec):    2973.506
>
> Average Latency:       0.0215173
> Max latency:           0.693831
> Min latency:           0.00519763
>
>
>
> On client:
>
> [root@blarg cephfs]# time for i in {1..100000}; do mkdir blarg"$i" ; done
>
> real    10m36.794s
> user    1m45.329s
> sys    6m29.982s
> [root@blarg cephfs]# time for i in {1..100000}; do touch yadda"$i" ; done
>
> real    13m29.155s
> user    3m55.256s
> sys    7m50.301s
>
> What variables are most important in the perf dump?
> I would like to grep out the vars (ceph daemon
> /var/run/ceph-mds.cephnautilus01.asok perf dump | jq '.') that are of
> meaning while running the bonnie++ test again with -s 0.
>
> Thanks,
> BJ
>
> On Fri, May 22, 2015 at 10:34 AM, John Spray <john.spray@redhat.com> wrote:
>>
>>
>> On 22/05/2015 16:25, Barclay Jameson wrote:
>>>
>>> The Bonnie++ job _FINALLY_ finished. If I am reading this correctly it
>>> took days to create, stat, and delete 16 files??
>>> [root@blarg cephfs]# ~/bonnie++-1.03e/bonnie++ -u root:root -s 256g -r
>>> 131072 -d /cephfs/ -m CephBench -f -b
>>> Using uid:0, gid:0.
>>> Writing intelligently...done
>>> Rewriting...done
>>> Reading intelligently...done
>>> start 'em...done...done...done...
>>> Create files in sequential order...done.
>>> Stat files in sequential order...done.
>>> Delete files in sequential order...done.
>>> Create files in random order...done.
>>> Stat files in random order...done.
>>> Delete files in random order...done.
>>> Version 1.03e       ------Sequential Output------ --Sequential Input-
>>> --Random-
>>>                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>>> --Seeks--
>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>>> /sec %CP
>>> CephBench      256G           1006417  76 90114  13           137110
>>> 8 329.8   7
>>>                      ------Sequential Create------ --------Random
>>> Create--------
>>>                      -Create-- --Read--- -Delete-- -Create-- --Read---
>>> -Delete--
>>>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>> /sec %CP
>>>                   16     0   0 +++++ +++     0   0     0   0  5267  19
>>> 0   0
>>>
>>> CephBench,256G,,,1006417,76,90114,13,,,137110,8,329.8,7,16,0,0,+++++,+++,0,0,0,0,5267,19,0,0
>>>
>>> Any thoughts?
>>>
>> It's 16000 files by default (not 16), but this usually takes only a few
>> minutes.
>>
>> FWIW I tried running a quick bonnie++ (with -s 0 to skip the IO phase) on a
>> development (vstart.sh) cluster with a fuse client, and it readily handles
>> several hundred client requests per second (checked with "ceph daemonperf
>> mds.<id>")
>>
>> Nothing immediately leapt out at me from a quick look at the log you posted,
>> but with issues like these it is always worth trying to narrow it down by
>> trying the fuse client instead of the kernel client, and/or different kernel
>> versions.
>>
>> You may also want to check that your underlying RADOS cluster is performing
>> reasonably by doing a rados bench too.
>>
>> Cheers,
>> John
> --

the reason for slow file creations is that bonnie++ call fsync(2)
after each creat(2). fsync() wait for safe replies of the create
requests.  MDS sends safe reply when log event for the request gets
journaled safely. MDS flush the journal every 5 seconds
(mds_tick_interval). So the speed of file creation for bonnie++ is one
file every file seconds.



> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-26  6:55           ` Yan, Zheng
@ 2015-05-26 11:57             ` John Spray
  2015-05-26 14:08               ` Mark Nelson
  2015-05-26 14:30               ` Gregory Farnum
  0 siblings, 2 replies; 13+ messages in thread
From: John Spray @ 2015-05-26 11:57 UTC (permalink / raw)
  To: Yan, Zheng, Barclay Jameson; +Cc: Gregory Farnum, ceph-devel@vger.kernel.org



On 26/05/2015 07:55, Yan, Zheng wrote:
> the reason for slow file creations is that bonnie++ call fsync(2) 
> after each creat(2). fsync() wait for safe replies of the create 
> requests. MDS sends safe reply when log event for the request gets 
> journaled safely. MDS flush the journal every 5 seconds 
> (mds_tick_interval). So the speed of file creation for bonnie++ is one 
> file every file seconds.

Ah, I hadn't noticed that the benchmark called... I wonder if I'm seeing 
the fuse client return quickly because it simply doesn't implement the 
fsyncdir call.  We should fix that!

It looks like we used to have an OP_FSYNC in the client-server protocol 
(perhaps for flushing the log immediately on fsyncs), anyone have any 
background on why that went away?

Cheers,
John

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-26 11:57             ` John Spray
@ 2015-05-26 14:08               ` Mark Nelson
  2015-05-26 14:30               ` Gregory Farnum
  1 sibling, 0 replies; 13+ messages in thread
From: Mark Nelson @ 2015-05-26 14:08 UTC (permalink / raw)
  To: John Spray, Yan, Zheng, Barclay Jameson
  Cc: Gregory Farnum, ceph-devel@vger.kernel.org



On 05/26/2015 06:57 AM, John Spray wrote:
>
>
> On 26/05/2015 07:55, Yan, Zheng wrote:
>> the reason for slow file creations is that bonnie++ call fsync(2)
>> after each creat(2). fsync() wait for safe replies of the create
>> requests. MDS sends safe reply when log event for the request gets
>> journaled safely. MDS flush the journal every 5 seconds
>> (mds_tick_interval). So the speed of file creation for bonnie++ is one
>> file every file seconds.
>
> Ah, I hadn't noticed that the benchmark called... I wonder if I'm seeing
> the fuse client return quickly because it simply doesn't implement the
> fsyncdir call.  We should fix that!

I seem to remember someone grumbling about this a while back but I don't 
really remember the details or who it was. :)

>
> It looks like we used to have an OP_FSYNC in the client-server protocol
> (perhaps for flushing the log immediately on fsyncs), anyone have any
> background on why that went away?
>
> Cheers,
> John
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-26 11:57             ` John Spray
  2015-05-26 14:08               ` Mark Nelson
@ 2015-05-26 14:30               ` Gregory Farnum
  2015-05-26 14:50                 ` Barclay Jameson
  1 sibling, 1 reply; 13+ messages in thread
From: Gregory Farnum @ 2015-05-26 14:30 UTC (permalink / raw)
  To: John Spray; +Cc: Yan, Zheng, Barclay Jameson, ceph-devel@vger.kernel.org

On Tue, May 26, 2015 at 4:57 AM, John Spray <john.spray@redhat.com> wrote:
>
>
> On 26/05/2015 07:55, Yan, Zheng wrote:
>>
>> the reason for slow file creations is that bonnie++ call fsync(2) after
>> each creat(2). fsync() wait for safe replies of the create requests. MDS
>> sends safe reply when log event for the request gets journaled safely. MDS
>> flush the journal every 5 seconds (mds_tick_interval). So the speed of file
>> creation for bonnie++ is one file every file seconds.
>
>
> Ah, I hadn't noticed that the benchmark called... I wonder if I'm seeing the
> fuse client return quickly because it simply doesn't implement the fsyncdir
> call.  We should fix that!
>
> It looks like we used to have an OP_FSYNC in the client-server protocol
> (perhaps for flushing the log immediately on fsyncs), anyone have any
> background on why that went away?

Following up, we discussed this in standup and the answer was "no".

Seeing the question in context of this thread, though, I think maybe
I've asked before and either the answer or the deduced reason was
"because we didn't want to let clients force an (expensive) MDS log
flush".
-Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-26 14:30               ` Gregory Farnum
@ 2015-05-26 14:50                 ` Barclay Jameson
  2015-05-26 15:40                   ` John Spray
  0 siblings, 1 reply; 13+ messages in thread
From: Barclay Jameson @ 2015-05-26 14:50 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: John Spray, Yan, Zheng, ceph-devel@vger.kernel.org

Thank you for the great explanation Zheng! That definitely shows what
I was seeing with the bonnie++ test. What bad things would happen if I
modified the config option mds_tick_interval to flush the journal to a
second or less? Does this also mean any custom code written should
avoid use of fsync() if writing a large number of files?

On Tue, May 26, 2015 at 9:30 AM, Gregory Farnum <greg@gregs42.com> wrote:
> On Tue, May 26, 2015 at 4:57 AM, John Spray <john.spray@redhat.com> wrote:
>>
>>
>> On 26/05/2015 07:55, Yan, Zheng wrote:
>>>
>>> the reason for slow file creations is that bonnie++ call fsync(2) after
>>> each creat(2). fsync() wait for safe replies of the create requests. MDS
>>> sends safe reply when log event for the request gets journaled safely. MDS
>>> flush the journal every 5 seconds (mds_tick_interval). So the speed of file
>>> creation for bonnie++ is one file every file seconds.
>>
>>
>> Ah, I hadn't noticed that the benchmark called... I wonder if I'm seeing the
>> fuse client return quickly because it simply doesn't implement the fsyncdir
>> call.  We should fix that!
>>
>> It looks like we used to have an OP_FSYNC in the client-server protocol
>> (perhaps for flushing the log immediately on fsyncs), anyone have any
>> background on why that went away?
>
> Following up, we discussed this in standup and the answer was "no".
>
> Seeing the question in context of this thread, though, I think maybe
> I've asked before and either the answer or the deduced reason was
> "because we didn't want to let clients force an (expensive) MDS log
> flush".
> -Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-26 14:50                 ` Barclay Jameson
@ 2015-05-26 15:40                   ` John Spray
  2015-05-26 15:54                     ` Barclay Jameson
  0 siblings, 1 reply; 13+ messages in thread
From: John Spray @ 2015-05-26 15:40 UTC (permalink / raw)
  To: Barclay Jameson, Gregory Farnum; +Cc: Yan, Zheng, ceph-devel@vger.kernel.org

On 26/05/2015 15:50, Barclay Jameson wrote:
> Thank you for the great explanation Zheng! That definitely shows what
> I was seeing with the bonnie++ test. What bad things would happen if I
> modified the config option mds_tick_interval to flush the journal to a
> second or less?

The MDS does various pieces of housekeeping according to that interval, 
so setting it extremely low will cause some CPU cycles to be wasted, and 
flushing the log more often will cause a larger number of smaller IOs to 
get generated.  I would be very surprised if decreasing it to approx 1s 
was harmful though.

On a busy real world system, other metadata operations will often drive 
log writes through faster than waiting for a tick.

> Does this also mean any custom code written should
> avoid use of fsync() if writing a large number of files?

You should call it only when your application requires it for 
consistency, and always expect it to be a high latency operation. Add up 
the latency from your client to your server and from the server to the 
disk, and the length of the IO queue on the disk, and then the return 
leg -- that is the *minimum* time you should expect to wait for an fsync.

For example, a real world workload creating N files in a directory would 
hopefully call fsync on the directory once at the end, rather than in 
between every file, unless you really do need to be sure that the dentry 
for the preceding file will be persistent before you start writing the 
next file.

Sometimes it's easier to reason about it in terms of concurrency: if you 
have a bunch of IOs that you could safely run in parallel in a thread 
each, then you shouldn't be fsyncing between them, just at the point you 
join them.

John

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Slow file creating and deleting using bonnie ++ on Hammer
  2015-05-26 15:40                   ` John Spray
@ 2015-05-26 15:54                     ` Barclay Jameson
  0 siblings, 0 replies; 13+ messages in thread
From: Barclay Jameson @ 2015-05-26 15:54 UTC (permalink / raw)
  To: John Spray; +Cc: Gregory Farnum, Yan, Zheng, ceph-devel@vger.kernel.org

Thanks John for the insight. If I understand you right, even if
programs used fsync() if there is sufficient clients and/or workload
then the journal logs will be flushed out to disks faster then the
tick. This will be helpful to know in case some of the programs we use
happen to use fsync() for ACD's sake.

On Tue, May 26, 2015 at 10:40 AM, John Spray <john.spray@redhat.com> wrote:
>
>
> On 26/05/2015 15:50, Barclay Jameson wrote:
>>
>> Thank you for the great explanation Zheng! That definitely shows what
>> I was seeing with the bonnie++ test. What bad things would happen if I
>> modified the config option mds_tick_interval to flush the journal to a
>> second or less?
>
>
> The MDS does various pieces of housekeeping according to that interval, so
> setting it extremely low will cause some CPU cycles to be wasted, and
> flushing the log more often will cause a larger number of smaller IOs to get
> generated.  I would be very surprised if decreasing it to approx 1s was
> harmful though.
>
> On a busy real world system, other metadata operations will often drive log
> writes through faster than waiting for a tick.
>
>> Does this also mean any custom code written should
>> avoid use of fsync() if writing a large number of files?
>
>
> You should call it only when your application requires it for consistency,
> and always expect it to be a high latency operation. Add up the latency from
> your client to your server and from the server to the disk, and the length
> of the IO queue on the disk, and then the return leg -- that is the
> *minimum* time you should expect to wait for an fsync.
>
> For example, a real world workload creating N files in a directory would
> hopefully call fsync on the directory once at the end, rather than in
> between every file, unless you really do need to be sure that the dentry for
> the preceding file will be persistent before you start writing the next
> file.
>
> Sometimes it's easier to reason about it in terms of concurrency: if you
> have a bunch of IOs that you could safely run in parallel in a thread each,
> then you shouldn't be fsyncing between them, just at the point you join
> them.
>
> John

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-05-26 15:54 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-20 18:15 Slow file creating and deleting using bonnie ++ on Hammer Barclay Jameson
2015-05-20 18:22 ` Gregory Farnum
2015-05-20 19:00   ` Barclay Jameson
2015-05-22 15:25     ` Barclay Jameson
2015-05-22 15:34       ` John Spray
2015-05-22 17:34         ` Barclay Jameson
2015-05-26  6:55           ` Yan, Zheng
2015-05-26 11:57             ` John Spray
2015-05-26 14:08               ` Mark Nelson
2015-05-26 14:30               ` Gregory Farnum
2015-05-26 14:50                 ` Barclay Jameson
2015-05-26 15:40                   ` John Spray
2015-05-26 15:54                     ` Barclay Jameson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.