From: Alexandre DERUMIER <aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>
To: Sage Weil <sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org>
Cc: ceph-users <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>,
ceph-devel <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: ceph osd commit latency increase over time, until restart
Date: Wed, 30 Jan 2019 14:45:09 +0100 (CET)
Message-ID: <986643198.245013.1548855909082.JavaMail.zimbra@oxygem.tv>
In-Reply-To: <alpine.DEB.2.11.1901301331580.5535-qHenpvqtifaMSRpgCs4c+g@public.gmane.org>
>>I don't see any smoking gun here... :/
I need to run the comparison again once latency has climbed very high, but that means waiting several more days or weeks.
>>The main difference between a warm OSD and a cold one is that on startup
>>the bluestore cache is empty. You might try setting the bluestore cache
>>size to something much smaller and see if that has an effect on the CPU
>>utilization?
I will test that. I also wonder whether Mark's new automatic memory tuning could help?
(I'm still on mimic 13.2.1, planning to update to 13.2.5 next month)
Also, are there bluestore-related counters I could check? (onodes, rocksdb, bluestore cache, ...)
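One way to compare such counters between a slow OSD and a freshly restarted one is to save two `ceph daemon osd.N perf dump` outputs as JSON and diff them. The sketch below is only an illustration: the snapshots are synthetic, the counter names `commit_lat` and `bluestore_onodes` are assumptions about what the dump contains on this release, and the helper names are hypothetical. It relies on latency counters being reported as an `avgcount`/`sum` pair, so the average over an interval is the sum delta divided by the count delta.

```python
import json


def load_snapshot(path):
    """Load a saved perf dump (assumed to be plain JSON) from a file."""
    with open(path) as fh:
        return json.load(fh)


def flatten(tree, prefix=""):
    """Flatten nested dicts into {"section.counter": number}."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[path] = value
    return flat


def counter_deltas(before, after):
    """Return {path: after - before} for counters that changed."""
    old, new = flatten(before), flatten(after)
    return {k: new[k] - old[k] for k in old.keys() & new.keys() if new[k] != old[k]}


def interval_avg(before, after, section, name):
    """Average of an avgcount/sum counter over the interval, in seconds."""
    d_count = after[section][name]["avgcount"] - before[section][name]["avgcount"]
    d_sum = after[section][name]["sum"] - before[section][name]["sum"]
    return d_sum / d_count if d_count else None


# Synthetic snapshots standing in for two real perf dumps (values invented).
before = {"bluestore": {"commit_lat": {"avgcount": 1000, "sum": 1.0},
                        "bluestore_onodes": 5000}}
after = {"bluestore": {"commit_lat": {"avgcount": 3000, "sum": 6.0},
                       "bluestore_onodes": 5200}}

avg = interval_avg(before, after, "bluestore", "commit_lat")
deltas = counter_deltas(before, after)
print(f"commit latency over interval: {avg * 1000:.2f} ms")
for path in sorted(deltas):
    print(path, deltas[path])
```

With real data, `load_snapshot()` would replace the two literal dictionaries; taking the snapshots a fixed time apart on both a slow and a freshly restarted OSD makes the deltas comparable.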
>>Note that this doesn't necessarily mean that's what you want. Maybe the
>>reason why the CPU utilization is higher is because the cache is warm and
>>the OSD is serving more requests per second...
Well, currently, the server is really quiet
Device:  rrqm/s  wrqm/s    r/s      w/s   rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
nvme0n1    2,00  515,00  48,00  1182,00  304,00  11216,00     18,73      0,01   0,00     0,00     0,00   0,01   1,20
%Cpu(s): 1,5 us, 1,0 sy, 0,0 ni, 97,2 id, 0,2 wa, 0,0 hi, 0,1 si, 0,0 st
And this is only with writes, not reads
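As a quick sanity check on the iostat line above, the request rate and average request size can be derived from the reported columns. This is a minimal sketch: it assumes the comma is a locale decimal separator and that avgrq-sz is expressed in 512-byte sectors, and the function name `parse_iostat` is hypothetical.

```python
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util")
SAMPLE = ("nvme0n1 2,00 515,00 48,00 1182,00 304,00 11216,00 18,73 0,01 "
          "0,00 0,00 0,00 0,01 1,20")


def parse_iostat(header_line, data_line):
    """Map iostat column names to floats, accepting comma decimals."""
    names = header_line.split()[1:]
    fields = data_line.split()
    values = [float(field.replace(",", ".")) for field in fields[1:]]
    return fields[0], dict(zip(names, values))


device, stats = parse_iostat(HEADER, SAMPLE)

iops = stats["r/s"] + stats["w/s"]                  # total requests per second
avg_kb = (stats["rkB/s"] + stats["wkB/s"]) / iops   # average request size in kB
avg_sectors = avg_kb * 2                            # 1 kB = two 512-byte sectors

print(f"{device}: {iops:.0f} req/s, {avg_kb:.2f} kB per request "
      f"({avg_sectors:.2f} sectors), util {stats['%util']:.2f}%")
```

The derived request size agrees with the avgrq-sz column (18,73), which suggests the line was parsed consistently: roughly 1230 small requests per second at about 1% device utilization.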
----- Original Message -----
From: "Sage Weil" <sage@newdream.net>
To: "aderumier" <aderumier@odiso.com>
Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Wednesday, 30 January 2019 14:33:23
Subject: Re: ceph osd commit latency increase over time, until restart
On Wed, 30 Jan 2019, Alexandre DERUMIER wrote:
> Hi,
>
> here are some new results,
> from a different osd on a different cluster
>
> before the osd restart, latency was between 2-5ms
> after the osd restart, it is around 1-1.5ms
>
> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms)
> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms)
> http://odisoweb1.odiso.net/cephperf2/diff.txt
I don't see any smoking gun here... :/
The main difference between a warm OSD and a cold one is that on startup
the bluestore cache is empty. You might try setting the bluestore cache
size to something much smaller and see if that has an effect on the CPU
utilization?
Note that this doesn't necessarily mean that's what you want. Maybe the
reason why the CPU utilization is higher is because the cache is warm and
the OSD is serving more requests per second...
sage
>
> From what I see in the diff, the biggest difference is in tcmalloc, but maybe I'm wrong.
>
> (I'm using tcmalloc 2.5-2.2)
>
>
> ----- Original Message -----
> From: "Sage Weil" <sage@newdream.net>
> To: "aderumier" <aderumier@odiso.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Friday, 25 January 2019 10:49:02
> Subject: Re: ceph osd commit latency increase over time, until restart
>
> Can you capture a perf top or perf record to see where the CPU time is
> going on one of the OSDs with a high latency?
>
> Thanks!
> sage
>
>
> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote:
>
> >
> > Hi,
> >
> > I am seeing strange behaviour from my OSDs, on multiple clusters.
> >
> > All clusters are running mimic 13.2.1, bluestore, with SSD or NVMe drives.
> > The workload is rbd only, with qemu-kvm VMs running with librbd, plus snapshot / rbd export-diff / snapshot delete each day for backup.
> >
> > When the OSDs are freshly started, the commit latency is between 0.5 and 1 ms.
> >
> > But over time, this latency increases slowly (maybe around 1 ms per day), until it reaches crazy
> > values like 20-200 ms.
> >
> > Some example graphs:
> >
> > http://odisoweb1.odiso.net/osdlatency1.png
> > http://odisoweb1.odiso.net/osdlatency2.png
> >
> > All osds have this behaviour, in all clusters.
> >
> > The latency of the physical disks is fine. (The clusters are far from fully loaded.)
> >
> > And if I restart the OSD, the latency comes back to 0.5-1 ms.
> >
> > This reminds me of the old tcmalloc bug, but could it be a bluestore memory bug?
> >
> > Any hints for counters/logs to check?
> >
> >
> > Regards,
> >
> > Alexandre
> >
> >
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com