From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jevon Qiao
Subject: Any suggestion to deal with slow request?
Date: Thu, 7 Jan 2016 12:03:41 +0800
Message-ID: <568DE39D.6060409@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0889333693=="
Errors-To: ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
Sender: "ceph-users"
To: ceph-users-Qp0mS5GaXlQ@public.gmane.org, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: ceph-devel.vger.kernel.org

This is a multi-part message in MIME format.
--===============0889333693==
Content-Type: multipart/alternative; boundary="------------040705000803090007020709"

This is a multi-part message in MIME format.
--------------040705000803090007020709
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Hi Cephers,

We have a Ceph cluster running 0.80.9 with 36 OSDs and 3 replicas. Recently, some OSDs keep reporting slow requests and cluster performance has degraded.

In the log of one OSD, I see that all the slow requests result from waiting for the replicas to complete. The replica OSDs involved are not a specific pair; they can be any two other OSDs.

2016-01-06 08:17:11.887016 7f175ef25700 0 log [WRN] : slow request 1.162776 seconds old, received at 2016-01-06 08:17:11.887092: osd_op(client.13302933.0:839452 rbd_data.c2659c728b0ddb.0000000000000024 [stat,set-alloc-hint object_size 16777216 write_size 16777216,write 12099584~8192] 3.abd08522 ack+ondisk+write e4661) v4 currently waiting for subops from 24,31

I dumped the historic ops of the OSD and noticed the following:

1) An op waits about 8 seconds for the replies from the replica OSDs.
    { "time": "2016-01-06 08:17:03.879264", "event": "op_applied"},
    { "time": "2016-01-06 08:17:11.684598", "event": "sub_op_applied_rec"},
    { "time": "2016-01-06 08:17:11.687016", "event": "sub_op_commit_rec"},

2) An op spends more than 3 seconds in the writeq and about 2 seconds writing the journal.
    { "time": "2016-01-06 08:19:16.887519", "event": "commit_queued_for_journal_write"},
    { "time": "2016-01-06 08:19:20.109339", "event": "write_thread_in_journal_buffer"},
    { "time": "2016-01-06 08:19:22.177952", "event": "journaled_completion_queued"},

Any ideas or suggestions?

BTW, I checked the underlying network with iperf and it works fine.

Thanks,
Jevon
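
As a sanity check on the two excerpts above, the gap between consecutive events can be computed with a few lines of Python. This is only a sketch: the event lists are copied by hand from the historic ops excerpts in this message (they belong to two different ops), and the helper name event_gaps is made up for illustration.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Events copied by hand from the two excerpts (two different ops).
REPLICA_WAIT = [
    {"time": "2016-01-06 08:17:03.879264", "event": "op_applied"},
    {"time": "2016-01-06 08:17:11.684598", "event": "sub_op_applied_rec"},
    {"time": "2016-01-06 08:17:11.687016", "event": "sub_op_commit_rec"},
]
JOURNAL_WAIT = [
    {"time": "2016-01-06 08:19:16.887519", "event": "commit_queued_for_journal_write"},
    {"time": "2016-01-06 08:19:20.109339", "event": "write_thread_in_journal_buffer"},
    {"time": "2016-01-06 08:19:22.177952", "event": "journaled_completion_queued"},
]


def event_gaps(events):
    """Return (event, seconds since the previous event) for each event after the first."""
    stamps = [datetime.strptime(e["time"], TIME_FORMAT) for e in events]
    return [
        (cur["event"], (t_cur - t_prev).total_seconds())
        for cur, t_prev, t_cur in zip(events[1:], stamps, stamps[1:])
    ]


if __name__ == "__main__":
    for name, events in (("replica wait", REPLICA_WAIT), ("journal wait", JOURNAL_WAIT)):
        print(name)
        for event, gap in event_gaps(events):
            print(f"  {gap:9.6f} s before {event}")
```

For these timestamps the first excerpt shows roughly 7.8 s between op_applied and the first replica reply, and the second shows roughly 3.2 s queued for the journal followed by about 2.1 s in the journal write.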
--------------040705000803090007020709--

--===============0889333693==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--===============0889333693==--

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robert LeBlanc
Subject: Re: [ceph-users] Any suggestion to deal with slow request?
Date: Thu, 7 Jan 2016 09:43:55 -0700
References: <568DE39D.6060409@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Received: from mail-vk0-f43.google.com ([209.85.213.43]:34915 "EHLO
 mail-vk0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1750961AbcAGQn4 (ORCPT ); Thu, 7 Jan 2016 11:43:56 -0500
Received: by mail-vk0-f43.google.com with SMTP id k1so177028101vkb.2
 for ; Thu, 07 Jan 2016 08:43:56 -0800 (PST)
In-Reply-To: <568DE39D.6060409@gmail.com>
Sender: ceph-devel-owner@vger.kernel.org
To: Jevon Qiao
Cc: Ceph-User, ceph-devel

What is the file system on the OSDs? Anything interesting in iostat/atop? What are the drives backing the OSDs? A few more details would be helpful.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jevon Qiao
Subject: Re: Any suggestion to deal with slow request?
Date: Fri, 8 Jan 2016 12:22:04 +0800
Message-ID: <568F396C.9020700@gmail.com>
References: <568DE39D.6060409@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0196118009=="
Errors-To: ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
Sender: "ceph-users"
To: Robert LeBlanc
Cc: ceph-devel, Ceph-User
List-Id: ceph-devel.vger.kernel.org

This is a multi-part message in MIME format.
--===============0196118009==
Content-Type: multipart/alternative; boundary="------------080103060507070002030505"

This is a multi-part message in MIME format.
--------------080103060507070002030505
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Hi Robert,

Thank you for the prompt response.

The OSDs are built on XFS and the drives are Intel SSDs. Each SSD is split into two partitions, one for the journal and the other for data. The partitions are properly aligned.

When the slow request messages are logged, the workload on the replica OSDs is quite light:

Device:  rrqm/s  wrqm/s   r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00    0.00  0.50  30.00   0.00   0.18     12.33      0.00   0.08   0.08   0.25
sdb        0.00    0.50  0.50  78.00   0.00   0.75     19.57      0.09   1.20   0.08   0.60
sdc        0.00    0.50  0.00  28.00   0.00   0.24     17.75      0.01   0.32   0.11   0.30

I benchmarked some OSDs with 'ceph tell osd.x bench' and found that the throughput of some OSDs (those with disk usage over 60%) is around 21MB/s, which seems abnormal.

$ ceph tell osd.24 bench
{ "bytes_written": 1073741824,
  "blocksize": 4194304,
  "bytes_per_sec": "22995975.000000"}

But the throughput of some newly added OSDs can reach 370MB/s. I suspect this is related to garbage collection (GC) on the SSDs. If so, it might explain why it takes so long to write the journal. Any ideas?

Another puzzle is that the journal write sits in the writeq for 3 seconds. I checked the corresponding logic in FileJournal::submit_entry() and FileJournal::write_thread_entry() and did not find anything suspicious.

Thanks,
Jevon
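
For comparing OSDs, the bytes_per_sec value from the bench output can be converted into more familiar units. The following is a minimal sketch, assuming the JSON shape shown in the osd.24 output above; the helper name bench_throughput is hypothetical and not part of Ceph.

```python
import json

# Output of 'ceph tell osd.24 bench' as pasted in this message.
BENCH_OUTPUT = """
{ "bytes_written": 1073741824,
  "blocksize": 4194304,
  "bytes_per_sec": "22995975.000000"}
"""


def bench_throughput(bench_json):
    """Convert a bench result into MB/s, MiB/s and elapsed seconds."""
    result = json.loads(bench_json)
    # bytes_per_sec is reported as a string in this output, so convert it.
    bytes_per_sec = float(result["bytes_per_sec"])
    return {
        "MB/s": bytes_per_sec / 1e6,        # decimal megabytes per second
        "MiB/s": bytes_per_sec / 2**20,     # binary mebibytes per second
        "seconds": result["bytes_written"] / bytes_per_sec,
    }


if __name__ == "__main__":
    for unit, value in bench_throughput(BENCH_OUTPUT).items():
        print(f"{unit}: {value:.2f}")
```

For the osd.24 result this works out to about 23 MB/s (just under 22 MiB/s), with the 1 GiB test write taking roughly 47 seconds.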
--------------080103060507070002030505--

--===============0196118009==--