Multiple OSDs suicide because of client issues?


* Multiple OSDs suicide because of client issues?
@ 2015-11-21  7:34 Robert LeBlanc
  2015-11-23 16:03 ` Gregory Farnum
  0 siblings, 1 reply; 15+ messages in thread
From: Robert LeBlanc @ 2015-11-21  7:34 UTC (permalink / raw)
  To: ceph-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

We had two interesting issues today. In both cases multiple OSDs
suicided at the exact same moment. The first incident had four OSDs,
the second had 12.

First set:
145,159,79,176

Second Set:
osd.177 down at 20:59:48,
osd.131, osd.136, osd.133, osd.139, osd.175, osd.170, osd.73 down at 20:00:03,
osd.178, osd.179 down at 21:00:07,
osd.159 down at 21:01:22,
osd.110 down at 21:01:28

Only one OSD failed both times and only a couple of boxes had more
than one OSD fail. The failures were spread out throughout the
cluster. There is nothing in dmesg/messages/sar that indicates there
was any type of hardware problem on the OSD hosts. All the OSDs
indicate slow I/O and heartbeats missing starting at 20:57:32.
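
To put the timeline in one place, here is a minimal Python sketch (an illustration, not part of the original report) that computes how long after the 20:57:32 onset of slow I/O each group of OSDs went down. The times are copied from the list above exactly as written, the same day is assumed for all of them, and `seconds_after_onset` is a hypothetical helper name.

```python
from datetime import datetime

ONSET = "20:57:32"  # first slow I/O / missed heartbeats, per the report

# Down times exactly as listed in the report (same day assumed).
DOWN_TIMES = {
    "osd.177": "20:59:48",
    "osd.131,136,133,139,175,170,73": "20:00:03",
    "osd.178,179": "21:00:07",
    "osd.159": "21:01:22",
    "osd.110": "21:01:28",
}

def seconds_after_onset(hms, onset=ONSET):
    """Return the signed number of seconds between onset and hms."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(hms, fmt) - datetime.strptime(onset, fmt)
    return int(delta.total_seconds())

offsets = {osd: seconds_after_onset(t) for osd, t in DOWN_TIMES.items()}
for osd, secs in offsets.items():
    print(f"{osd}: {secs:+d} s")
```

As listed, the 20:00:03 group comes out at a negative offset, that is, before the onset. Whether that timestamp is a typo cannot be confirmed from the report itself.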

The other odd thing is that most of the VMs across our 16 KVM hosts
were fine, but several VMs on one host had kernel panics. In the
messages logs of that host we see a kernel backtrace:

Nov 20 20:58:32 compute8 kernel: WARNING: CPU: 4 PID: 0 at
net/core/dev.c:2223 skb_warn_bad_offload+0xb6/0xbd()

That host's clock was exactly one minute fast. Everything points to
this host as having the issue, but I'm having a hard time
understanding how a client (or several clients) could cause several
OSDs to suicide. Can a non-responsive client in some way cause the OSD
to fault?
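
The one-minute skew matters when lining up logs from different hosts. As a small illustration (a sketch, assuming the skew was exactly 60 seconds as stated and that both timestamps are from the same day), correcting compute8's kernel warning timestamp onto cluster time looks like this. The helper name `corrected` is hypothetical.

```python
from datetime import datetime, timedelta

CLOCK_SKEW = timedelta(minutes=1)  # compute8 reported as one minute fast

def corrected(ts, skew=CLOCK_SKEW):
    """Shift a timestamp from the fast host back onto cluster time."""
    return ts - skew

warning_local = datetime(2015, 11, 20, 20, 58, 32)  # kernel WARNING on compute8
osd_onset = datetime(2015, 11, 20, 20, 57, 32)      # slow I/O begins on the OSDs

warning_cluster = corrected(warning_local)
print(warning_cluster.time(), warning_cluster == osd_onset)
```

Under those assumptions the kernel warning lines up with the moment the OSDs began reporting slow I/O.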

We have migrated all the VMs off this host and will continue to
monitor the cluster. If there is interest in the logs (librbd did not
dump any logs) from the OSDs, I can make them available.

Thanks,
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.2.3
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWUB5pCRDmVDuy+mK58QAAXbIP/jZyfbAalXRr4dFpEU4n
OL5X0vBLCeg1UMbXjBRXbWlrUKHBvkruU0JCWEWEb9FFDfdkEggYwZazUVx/
b3LU8LUGWD56wtaho8/V9FbDPsRD943k6TC+FoF4TL/FFuuiJ/Elnt97Fkkg
xSBTKWS3p0I7PpSrefX+lUsDMzWJ1n6HTlYnE8SmlkkOgudh4IFFRObrv0Yr
VPOzcD3RQULdFhEdtNZYUfVudypfKz1uFyq/FtgMSQbiHeQTn0JgD6ykWKZg
9VWAVIUiHjyQn97KasCwpJjc2Vab5cUKJuUzLg72WlEKV/Q8rRcqHmxP10Pm
+h30G2N1F9JsDmEeFtYNdd/AcvsoDIRaqQ7GzJf99sJAbLQCY9VT/LWd5H52
PPUtRTHU8pr78rtVdOtQG1sxOZHvaNpPM9MQYnoxRkiCixazbO6dWVmuq32S
iEaom2J1jNxUE+RUxHMVtb+qv4jOEMHBGpdragajslqiWKZrvtsPfVyn/E0s
8m3nj67jkN4xMro3/fRJqeLUqc6QHAN/BXoTMm7flzFJyQ1fZ1l/Up8xR07J
5xtl15vOf2Xa+IVFYPkLOoV+J/mTNiIQYaQKnkqYkL2OcbOq88TFHPUJ011+
SegMD1aIYCUjLYbq+DVqarsnsJbSC51B6aR5Ko+ZOvHyMYYyRPfU4DBqGWO/
GlcH
=sFlz
-----END PGP SIGNATURE-----


* Re: Multiple OSDs suicide because of client issues?
  2015-11-21  7:34 Multiple OSDs suicide because of client issues? Robert LeBlanc
@ 2015-11-23 16:03 ` Gregory Farnum
       [not found]   ` <CAANLjFo=vCsny5=JW1wYiQk5S=oXdtVd0OzXEC=uTGgmDO9ydA@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Gregory Farnum @ 2015-11-23 16:03 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: ceph-devel

On Sat, Nov 21, 2015 at 1:34 AM, Robert LeBlanc <robert@leblancnet.us> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> We had two interesting issues today. In both cases multiple OSDs
> suicided at the exact same moment. The first incident had four OSDs,
> the second had 12.
>
> First set:
> 145,159,79,176
>
> Second Set:
> osd.177 down at 20:59:48,
> osd.131, osd.136, osd.133, osd.139, osd.175, osd.170, osd.73 down at 20:00:03,
> osd.178, osd.179 down at 21:00:07,
> osd.159 down at 21:01:22,
> osd.110 down at 21:01:28
>
> Only one OSD failed both times and only a couple of boxes had more
> than one OSD fail. The failures were spread out throughout the
> cluster. There is nothing in dmesg/messages/sar that indicates there
> was any type of hardware problem on the OSD hosts. All the OSDs
> indicate slow I/O and heartbeats missing starting at 20:57:32.
>
> The other odd thing is that most of the VMs across our 16 KVM hosts
> were fine, but several VMs on one host had kernel panics. In the
> messages logs of that host we see a kernel backtrace:
>
> Nov 20 20:58:32 compute8 kernel: WARNING: CPU: 4 PID: 0 at
> net/core/dev.c:2223 skb_warn_bad_offload+0xb6/0xbd()
>
> That host's clock was exactly one minute fast. Everything points to
> this host as having the issue, but I'm having a hard time
> understanding how a client (or several clients) could cause several
> OSDs to suicide. Can a non-responsive client in some way cause the OSD
> to fault?

No, it shouldn't be able to just by having clock issues or whatever.
There *are* still some ways a malformed request can cause the OSDs to
crash, though — it looks like maybe this is a network card issue? That
could have maybe flipped some bits that broke stuff. What's the
backtrace on the OSDs?
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Multiple OSDs suicide because of client issues?
       [not found]   ` <CAANLjFo=vCsny5=JW1wYiQk5S=oXdtVd0OzXEC=uTGgmDO9ydA@mail.gmail.com>
@ 2015-11-23 17:17     ` Gregory Farnum
  2015-11-23 17:27       ` Robert LeBlanc
  0 siblings, 1 reply; 15+ messages in thread
From: Gregory Farnum @ 2015-11-23 17:17 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: ceph-devel

On Mon, Nov 23, 2015 at 11:03 AM, Robert LeBlanc <robert@leblancnet.us> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> The backtrace is:
>
> 2015-11-20 20:59:48.856679 7f7012ff7700 -1 common/HeartbeatMap.cc: In
> function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*,
> const char*, time_t)' thread 7f7012ff7700 time 2015-11-20
> 20:59:48.833166
> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>
>  ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x85) [0xbc9d85]
>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char
> const*, long)+0x2d9) [0xaff1f9]
>  3: (ceph::HeartbeatMap::is_healthy()+0xde) [0xaffaee]
>  4: (OSD::handle_osd_ping(MOSDPing*)+0x733) [0x696c43]
>  5: (OSD::heartbeat_dispatch(Message*)+0x2fb) [0x697ebb]
>  6: (DispatchQueue::entry()+0x62a) [0xc84c9a]
>  7: (DispatchQueue::DispatchThread::entry()+0xd) [0xba81cd]
>  8: (()+0x7df5) [0x7f702d85ddf5]
>  9: (clone()+0x6d) [0x7f702c3401ad]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> - --- begin dump of recent events ---
>
> We have had problems with Large Receive Offloads and KVM VMs before. I
> think this host just got missed, or maybe it is something different.
> I'm ok with a host having a hard time accessing the Ceph cluster. I'm
> a bit concerned if a misbehaving client can cause multiple OSDs to
> fault. It would be good if the OSD is resistant to things like this by
> compartmentalizing them to only those clients/connections.

Just this backtrace doesn't help much (something was slow, and it
timed out!), but there should be a log line including "had suicide
timed out after" just ahead of it (in that thread).
I guess it's vaguely possible the LRO got busted since the network
card on your client was dead? Not really anything we can do about that
though...
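
For checking the offload state on a host, here is a minimal Python sketch (an illustration, not something taken from this thread) that parses text in the `name: on|off [fixed]` form printed by `ethtool -k <iface>`. The sample text, the interface name `eth0`, and the `parse_offloads` helper are all hypothetical; real output should be captured from the interface in question.

```python
def parse_offloads(ethtool_output):
    """Map each offload feature name to True (on) or False (off)."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip headers such as "Features for eth0:"
        state = value.split()[0]
        if state in ("on", "off"):
            features[name] = (state == "on")
    return features

# Hypothetical sample in the format printed by `ethtool -k eth0`.
SAMPLE = """Features for eth0:
generic-receive-offload: on
large-receive-offload: on
tcp-segmentation-offload: off [fixed]
"""

features = parse_offloads(SAMPLE)
print("LRO enabled:", features.get("large-receive-offload"))
```

If LRO turns out to be enabled where it should not be, `ethtool -K <iface> lro off` is the usual way to disable it, although some drivers mark the feature as fixed.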

> I'm attaching the entire OSD log in case it is useful.

Uh, that doesn't have the backtrace in it.
-Greg

>
> Thanks for taking a look at this.
>
> - ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Mon, Nov 23, 2015 at 9:03 AM, Gregory Farnum <gfarnum@redhat.com> wrote:
>> No, it shouldn't be able to just by having clock issues or whatever.
>> There *are* still some ways a malformed request can cause the OSDs to
>> crash, though — it looks like maybe this is a network card issue? That
>> could have maybe flipped some bits that broke stuff. What's the
>> backtrace on the OSDs?
>> -Greg
>
> -----BEGIN PGP SIGNATURE-----
> Version: Mailvelope v1.2.3
> Comment: https://www.mailvelope.com
>
> wsFcBAEBCAAQBQJWU0bgCRDmVDuy+mK58QAAcysP/1xI6paI89WDozrmE2sY
> ehaF4sZsyy6y6mizsp+g7dXErNXtCIRQIg+LDjtS+SOnni+Z/XAhmLlCb5xM
> tid3xqQhQPLD66QhFQsxEGQxvWI5urqHnGWRhpbjpz8Xa0ReAHYCLj8K6hh0
> f7FHyqEjsEDtcqrk3+EI6bklBW7xgJy4zHQG+0MiZarzh5gSXvEpxrXo2KIr
> qBUcEE585jddVhvEv+VQVuBagQlBEMLo4RTz+5mdwneijIGAIQlOUCXVTogp
> d6aLaVQyCNMiAblJoFzr/UeV7E5ajQzd4QZ5i9H7ZD1sCwWMdV/pQNyYoDWk
> 3dBQXeYrkU2KlH14iKOJa1jxAPWg9mnnsguesir1aWunR+LamL2tbBlgXcXG
> 0NjIfl7q0yMm89jb7/JVAr8nyp3gOHdNaPRfd8FTilYoLGJFEB1j25q2qlBP
> 8IBSZbldXlXi9HB78cU3/I2o44CsrPPzZgN0iJ0fT7mbRPujkZbsdk3SbFtu
> eG1dXsZLNdSOgll5gSj11U8Kt4HvkF9dhatmqYeyZGFeBHOJqKhi0dw6yZ2T
> sSFPsHRNt6vbc8ckF4NqyFyPTK5PTSqB8TdLiZXW8vHvWooxNOtdCFgjQtNY
> kdb1kLsNW/z5dgE218kvwUnAObXaB9RkEJ47xi9o2FbVya+eHMYdM0JaEYxt
> I48o
> =Uufa
> -----END PGP SIGNATURE-----

* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 17:17     ` Gregory Farnum
@ 2015-11-23 17:27       ` Robert LeBlanc
  2015-11-23 17:33         ` Gregory Farnum
  0 siblings, 1 reply; 15+ messages in thread
From: Robert LeBlanc @ 2015-11-23 17:27 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I checked the SAR data and the disks for all the OSDs showed usual
performance until 20:57:32, when over the next few minutes the IOPS,
bandwidth and latency all decreased. The only thing that I can think
of is that some replies to the client got hung up and backed up the
OSD process or something. There are a couple of other backtraces in
the log file, but I could not trace any of them to something useful.

2015-11-20 20:59:48.867197 7f6f95637700  0 --
10.217.89.30:6804/1028318 >> 10.217.89.12:6800/29050 pipe(0x2fdd0000
sd=35 :57978 s=2 pgs=273 cs=1 l=0 c=0x419a9700).fault with nothing to
send, going to standby
2015-11-20 20:59:48.917626 7f7012ff7700 -1 *** Caught signal (Aborted) **
 in thread 7f7012ff7700

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xac8a32]
 2: (()+0xf130) [0x7f702d865130]
 3: (gsignal()+0x37) [0x7f702c27f5d7]
 4: (abort()+0x148) [0x7f702c280cc8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f702cb839b5]
 6: (()+0x5e926) [0x7f702cb81926]
 7: (()+0x5e953) [0x7f702cb81953]
 8: (()+0x5eb73) [0x7f702cb81b73]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x27a) [0xbc9f7a]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char
const*, long)+0x2d9) [0xaff1f9]
 11: (ceph::HeartbeatMap::is_healthy()+0xde) [0xaffaee]
 12: (OSD::handle_osd_ping(MOSDPing*)+0x733) [0x696c43]
 13: (OSD::heartbeat_dispatch(Message*)+0x2fb) [0x697ebb]
 14: (DispatchQueue::entry()+0x62a) [0xc84c9a]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0xba81cd]
 16: (()+0x7df5) [0x7f702d85ddf5]
 17: (clone()+0x6d) [0x7f702c3401ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

- --- begin dump of recent events ---
    -1> 2015-11-20 20:59:48.867197 7f6f95637700  0 --
10.217.89.30:6804/1028318 >> 10.217.89.12:6800/29050 pipe(0x2fdd0000
sd=35 :57978 s=2 pgs=273 cs=1 l=0 c=0x419a9700).fault with nothing to
send, going to standby
     0> 2015-11-20 20:59:48.917626 7f7012ff7700 -1 *** Caught signal
(Aborted) **
 in thread 7f7012ff7700

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xac8a32]
 2: (()+0xf130) [0x7f702d865130]
 3: (gsignal()+0x37) [0x7f702c27f5d7]
 4: (abort()+0x148) [0x7f702c280cc8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f702cb839b5]
 6: (()+0x5e926) [0x7f702cb81926]
 7: (()+0x5e953) [0x7f702cb81953]
 8: (()+0x5eb73) [0x7f702cb81b73]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x27a) [0xbc9f7a]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char
const*, long)+0x2d9) [0xaff1f9]
 11: (ceph::HeartbeatMap::is_healthy()+0xde) [0xaffaee]
 12: (OSD::handle_osd_ping(MOSDPing*)+0x733) [0x696c43]
 13: (OSD::heartbeat_dispatch(Message*)+0x2fb) [0x697ebb]
 14: (DispatchQueue::entry()+0x62a) [0xc84c9a]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0xba81cd]
 16: (()+0x7df5) [0x7f702d85ddf5]
 17: (clone()+0x6d) [0x7f702c3401ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
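
Because the mail client wrapped several frames across two lines, the backtrace is awkward to compare by eye. The following Python sketch (an illustration only; `parse_frames` is a hypothetical helper) rejoins wrapped frames and splits each one into index, symbol, and address. The sample lines are copied from the backtrace above.

```python
import re

FRAME_START = re.compile(r"^\s*\d+: ")
FRAME = re.compile(r"^\s*(\d+): (.*) \[(0x[0-9a-f]+)\]$")

def parse_frames(lines):
    """Join wrapped lines and return (index, symbol, address) tuples."""
    frames, current = [], None
    for raw in lines:
        line = raw.rstrip()
        if FRAME_START.match(line):
            current = line                    # a new frame begins
        elif current is not None:
            current += " " + line.strip()     # continuation of a wrapped frame
        else:
            continue                          # unrelated line outside a frame
        if current.endswith("]"):             # frame is complete
            m = FRAME.match(current)
            if m:
                frames.append((int(m.group(1)), m.group(2), m.group(3)))
            current = None
    return frames

SAMPLE = [
    " 8: (()+0x5eb73) [0x7f702cb81b73]",
    " 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char",
    "const*)+0x27a) [0xbc9f7a]",
    " 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char",
    "const*, long)+0x2d9) [0xaff1f9]",
]

frames = parse_frames(SAMPLE)
for idx, symbol, addr in frames:
    print(idx, addr, symbol)
```

This only tidies the text; resolving the addresses still needs the matching executable, as the NOTE line says.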

Since we took the VMs off that client, we haven't had the problem show up again.
-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.2.3
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWU0yICRDmVDuy+mK58QAAyxcQAL7oA6TaXAEFLMzJRdO8
nt1LgGe0Q+l+PXqCatmk1kAKh8YM/yss0xriGCPpiar0m8KhiQtzlWOXTExk
DZIoYtFR7ZVzJCU2/1gQn8I/+tcYH7naxj2mCfyBuWz71wy1FFKfvdc/tUBx
h8pQ7e1w3eQfLayDw7ir/iU+iFlh4918DY61cqdblyAu5ALVvbNM1hdqVBau
nAwJsfIgtJyuzUXpxEk+TbH5VaZGwly1iJ2cVHvpPuSWhM0EzFGKsKYkHJbh
/XPecqMepzH6W9YK6cgmcqqKcWQoNoPoTCVvpBBkgzBCz5QiNIUobRKEx9yL
pQIy0eHlE7btLREEQRJ6jXXuvaBmLzVCHYiIBP68Efe5c9JU0+ZxmVjJ/H5b
gKWfi6SC80VMVyLPNEV35p+SK2UAjhmsplxpxErEkSj8U/8YdC0TzwauKwYN
k48ZiIWHfDN40cgcP/RuSZMuhfvqTSIyFifIGs5ADuDe47o3SIpI6rBt5MPs
ebmbvAMTT/1ez/JQ9ugJ83QKiSgPD/Sw5YffMF1S+J4mMKOGEl8mfv8HFyjo
J9chHcVYrQt8T3AaGKqJqwc4C4BKTGDm314Hf+iDxsROjMMzgtbGxGyQC7vv
SQnpMsQjikIZKsI/9hoAentFe9f3/ks7GZH2aEbUNTzz+BIn5pXHSycdXwb6
1TxG
=FmEY
-----END PGP SIGNATURE-----
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1



* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 17:27       ` Robert LeBlanc
@ 2015-11-23 17:33         ` Gregory Farnum
  2015-11-23 18:03           ` Robert LeBlanc
  0 siblings, 1 reply; 15+ messages in thread
From: Gregory Farnum @ 2015-11-23 17:33 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: ceph-devel

On Mon, Nov 23, 2015 at 11:27 AM, Robert LeBlanc <robert@leblancnet.us> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> I checked the SAR data and the disks for all the OSDs showed usual
> performance until 20:57:32, when over the next few minutes the IOPS,
> bandwidth and latency all decreased. The only thing that I can think
> of is that some replies to the client got hung up and backed up the
> OSD process or something.

That shouldn't really be possible but I seem to recall you've got a
weird network? So maybe.

> There are a couple of other backtraces in
> the log file, but I could not trace any of them to something useful.
>
> 2015-11-20 20:59:48.867197 7f6f95637700  0 --
> 10.217.89.30:6804/1028318 >> 10.217.89.12:6800/29050 pipe(0x2fdd0000
> sd=35 :57978 s=2 pgs=273 cs=1 l=0 c=0x419a9700).fault with nothing to
> send, going to standby
> 2015-11-20 20:59:48.917626 7f7012ff7700 -1 *** Caught signal (Aborted) **
>  in thread 7f7012ff7700
>
>  ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>  1: /usr/bin/ceph-osd() [0xac8a32]
>  2: (()+0xf130) [0x7f702d865130]
>  3: (gsignal()+0x37) [0x7f702c27f5d7]
>  4: (abort()+0x148) [0x7f702c280cc8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f702cb839b5]
>  6: (()+0x5e926) [0x7f702cb81926]
>  7: (()+0x5e953) [0x7f702cb81953]
>  8: (()+0x5eb73) [0x7f702cb81b73]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x27a) [0xbc9f7a]
>  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char
> const*, long)+0x2d9) [0xaff1f9]
>  11: (ceph::HeartbeatMap::is_healthy()+0xde) [0xaffaee]
>  12: (OSD::handle_osd_ping(MOSDPing*)+0x733) [0x696c43]
>  13: (OSD::heartbeat_dispatch(Message*)+0x2fb) [0x697ebb]
>  14: (DispatchQueue::entry()+0x62a) [0xc84c9a]
>  15: (DispatchQueue::DispatchThread::entry()+0xd) [0xba81cd]
>  16: (()+0x7df5) [0x7f702d85ddf5]
>  17: (clone()+0x6d) [0x7f702c3401ad]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> - --- begin dump of recent events ---
>     -1> 2015-11-20 20:59:48.867197 7f6f95637700  0 --
> 10.217.89.30:6804/1028318 >> 10.217.89.12:6800/29050 pipe(0x2fdd0000
> sd=35 :57978 s=2 pgs=273 cs=1 l=0 c=0x419a9700).fault with nothing to
> send, going to standby
>      0> 2015-11-20 20:59:48.917626 7f7012ff7700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f7012ff7700
>
>  ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>  1: /usr/bin/ceph-osd() [0xac8a32]
>  2: (()+0xf130) [0x7f702d865130]
>  3: (gsignal()+0x37) [0x7f702c27f5d7]
>  4: (abort()+0x148) [0x7f702c280cc8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f702cb839b5]
>  6: (()+0x5e926) [0x7f702cb81926]
>  7: (()+0x5e953) [0x7f702cb81953]
>  8: (()+0x5eb73) [0x7f702cb81b73]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x27a) [0xbc9f7a]
>  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char
> const*, long)+0x2d9) [0xaff1f9]
>  11: (ceph::HeartbeatMap::is_healthy()+0xde) [0xaffaee]
>  12: (OSD::handle_osd_ping(MOSDPing*)+0x733) [0x696c43]
>  13: (OSD::heartbeat_dispatch(Message*)+0x2fb) [0x697ebb]
>  14: (DispatchQueue::entry()+0x62a) [0xc84c9a]
>  15: (DispatchQueue::DispatchThread::entry()+0xd) [0xba81cd]
>  16: (()+0x7df5) [0x7f702d85ddf5]
>  17: (clone()+0x6d) [0x7f702c3401ad]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Since we took the VMs off that client, we haven't had the problem show up again.

Yeah, we'd really need the actual log output that gets dumped to logs
on crash — it specifies precisely which thing failed.
-Greg
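
Earlier in this thread the line to look for was described as containing "had suicide timed out after". A minimal Python sketch for pulling such lines out of an OSD log follows. Only that phrase comes from the thread; the layout around it (a quoted worker name before the phrase, a number of seconds after it), the sample lines, and the `find_suicide_timeouts` helper are assumptions made for illustration.

```python
import re

# Phrase quoted earlier in this thread; the surrounding layout is an assumption.
PATTERN = re.compile(r"'([^']+)' had suicide timed out after (\d+)")

def find_suicide_timeouts(lines):
    """Return (worker, timeout_seconds) for each matching log line."""
    hits = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            hits.append((m.group(1), int(m.group(2))))
    return hits

# Hypothetical log lines, for illustration only.
SAMPLE = [
    "2015-11-20 20:59:48.800000 7f7012ff7700  1 heartbeat_map is_healthy "
    "'OSD::osd_op_tp thread 0x7f7000000000' had suicide timed out after 150",
    "2015-11-20 20:59:48.867197 7f6f95637700  0 -- fault with nothing to send",
]

hits = find_suicide_timeouts(SAMPLE)
print(hits)
```

To run it against a real log, pass an open file object instead of `SAMPLE`; the path of the OSD log depends on the installation.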

> -----BEGIN PGP SIGNATURE-----
> Version: Mailvelope v1.2.3
> Comment: https://www.mailvelope.com
>
> wsFcBAEBCAAQBQJWU0yICRDmVDuy+mK58QAAyxcQAL7oA6TaXAEFLMzJRdO8
> nt1LgGe0Q+l+PXqCatmk1kAKh8YM/yss0xriGCPpiar0m8KhiQtzlWOXTExk
> DZIoYtFR7ZVzJCU2/1gQn8I/+tcYH7naxj2mCfyBuWz71wy1FFKfvdc/tUBx
> h8pQ7e1w3eQfLayDw7ir/iU+iFlh4918DY61cqdblyAu5ALVvbNM1hdqVBau
> nAwJsfIgtJyuzUXpxEk+TbH5VaZGwly1iJ2cVHvpPuSWhM0EzFGKsKYkHJbh
> /XPecqMepzH6W9YK6cgmcqqKcWQoNoPoTCVvpBBkgzBCz5QiNIUobRKEx9yL
> pQIy0eHlE7btLREEQRJ6jXXuvaBmLzVCHYiIBP68Efe5c9JU0+ZxmVjJ/H5b
> gKWfi6SC80VMVyLPNEV35p+SK2UAjhmsplxpxErEkSj8U/8YdC0TzwauKwYN
> k48ZiIWHfDN40cgcP/RuSZMuhfvqTSIyFifIGs5ADuDe47o3SIpI6rBt5MPs
> ebmbvAMTT/1ez/JQ9ugJ83QKiSgPD/Sw5YffMF1S+J4mMKOGEl8mfv8HFyjo
> J9chHcVYrQt8T3AaGKqJqwc4C4BKTGDm314Hf+iDxsROjMMzgtbGxGyQC7vv
> SQnpMsQjikIZKsI/9hoAentFe9f3/ks7GZH2aEbUNTzz+BIn5pXHSycdXwb6
> 1TxG
> =FmEY
> -----END PGP SIGNATURE-----
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 17:33         ` Gregory Farnum
@ 2015-11-23 18:03           ` Robert LeBlanc
  2015-11-23 18:14             ` Gregory Farnum
  0 siblings, 1 reply; 15+ messages in thread
From: Robert LeBlanc @ 2015-11-23 18:03 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 2398 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

This is one of our production clusters which is dual 40 Gb Ethernet
using VLANs for cluster and public networks. I don't think this is
unusual, not like my dev cluster which runs Infiniband and IPoIB. The
client nodes are connected at 10 Gb Ethernet.

I wonder if you are talking about the system logs, not the Ceph OSD
logs. I'm attaching a snippet that includes the hour before and after.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Nov 23, 2015 at 10:33 AM, Gregory Farnum <gfarnum@redhat.com> wrote:
> On Mon, Nov 23, 2015 at 11:27 AM, Robert LeBlanc <robert@leblancnet.us> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> I checked the SAR data and the disks for all the OSDs showed usual
>> performance until 20:57:32, when over the next few minutes the IOPS,
>> bandwidth and latency all decreased. The only thing that I can think
>> of is that some replies to the client got hung up and backed up the
>> OSD process or something.
>
> That shouldn't really be possible but I seem to recall you've got a
> weird network? So maybe.
>
>> There are a couple of other backtraces in
>> the log file, but I could not trace any of them to something useful.
>>
>> Since we took the VMs off that client, we haven't had the problem show up again.
>
> Yeah, we'd really need the actual log output that gets dumped to logs
> on crash — it specifies precisely which thing failed.
> -Greg

-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.2.3
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWU1UGCRDmVDuy+mK58QAAOa0P/RIAO06Fd3myuzyyqlYo
N2VA9bWGaq06iwTLF1mufiEmbVaIPAIAQk+GaODgv/PKSJj6ecqS1/au832d
oO2LocnreeOTLJPL/n+mdeglos63ocwyvM4LP/XpvWJJ1C694mUWjvIxlWKR
4zFXH9V5DMTmCwm3kkY4qXqNUS/FJZyd5fwOg7NnqSzuy2UHIxEOzjGaKUwf
ipgVgy8iIn5tprx/rCawrYvuY141z4nOu1jIzEkXEa+F7pxfpKsXeKFQvEnw
aax/RNuikhLKu6rbCJKCQWL3uUZzrshp6EE3T/uXDP8rMX1ojOcmL1L1bJhh
4XqNdgXYuUXlP2cJtJSfxy7RFayZIw4Htn3YnWCrg7uqzrfwf2Hh2DGAE+06
ggH7qo9Z99hg7ENTDSzpFOyE5eM+oA8OQgpn+/8X7OyNG/eNwJnBlHTT0C+f
LunPV8I4HjRAuCNpkz16ZO/+pLnMAbk/Vp1wGJ3Qcdmxwk1UQ3L+UKASrwWd
S861pU4GOGoRymcse20DDRaChbhQRmK0nxjFq4/YXIo36lbMH2gcXyuAza5z
oFvmEkGwDoYneL0JZHJdHhRqkapMMMRqODC/2YU2EXa3fYatamKCwaHqPSdp
c0BN/yRFlB74RA7szvItUHORyiROxo/MnmGKlCBUNud0cVbBoyzSwfSBwCN1
zA7x
=g7l3
-----END PGP SIGNATURE-----

[-- Attachment #2: messages-20151122.snip.log.gz --]
[-- Type: application/x-gzip, Size: 17920 bytes --]


* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 18:03           ` Robert LeBlanc
@ 2015-11-23 18:14             ` Gregory Farnum
  2015-11-23 19:03               ` Robert LeBlanc
  0 siblings, 1 reply; 15+ messages in thread
From: Gregory Farnum @ 2015-11-23 18:14 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: ceph-devel

On Mon, Nov 23, 2015 at 12:03 PM, Robert LeBlanc <robert@leblancnet.us> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> This is one of our production clusters which is dual 40 Gb Ethernet
> using VLANs for cluster and public networks. I don't think this is
> unusual, not like my dev cluster which runs Infiniband and IPoIB. The
> client nodes are connected at 10 Gb Ethernet.
>
> I wonder if you are talking about the system logs, not the Ceph OSD
> logs. I'm attaching a snippet that includes the hour before and after.

Nope, I meant the OSD logs. Whenever they crash, it should dump out
the last 10000 in-memory log entries — the one you sent along didn't
have a crash included at all. The exact system which timed out will
certainly be in those log entries (it's output at level 1, so unless
you manually turned everything to 0, it'll show up on a crash.)
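
The crash dump described here appears in the OSD log after a "begin dump of recent events" marker, with entries numbered like `-1>` and `0>` as in the excerpts earlier in this thread. A minimal Python sketch for extracting those entries is shown below. The closing marker text, the `recent_events` helper, and the abbreviated sample lines are assumptions made for illustration.

```python
import re

BEGIN = "--- begin dump of recent events ---"
END = "--- end dump of recent events ---"  # assumed closing marker
ENTRY = re.compile(r"^\s*(-?\d+)> (.*)$")

def recent_events(lines):
    """Return [(index, text)] for entries inside the recent-events dump."""
    events, inside = [], False
    for raw in lines:
        line = raw.rstrip("\n")
        if BEGIN in line:
            inside = True
            continue
        if END in line:
            inside = False
            continue
        if not inside:
            continue
        m = ENTRY.match(line)
        if m:
            events.append([int(m.group(1)), m.group(2)])
        elif events:
            events[-1][1] += "\n" + line  # continuation of the previous entry
    return [tuple(e) for e in events]

# Abbreviated sample based on the log excerpt quoted in this thread.
SAMPLE = [
    "--- begin dump of recent events ---",
    "    -1> 2015-11-20 20:59:48.867197 7f6f95637700  0 -- fault with nothing to send",
    "     0> 2015-11-20 20:59:48.917626 7f7012ff7700 -1 *** Caught signal (Aborted) **",
    " in thread 7f7012ff7700",
    "--- end dump of recent events ---",
]

events = recent_events(SAMPLE)
print([idx for idx, _ in events])
```

The entries can then be searched for the timeout line to see which worker stalled.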

Anyway, I wouldn't expect that cluster config to have any issues with
a client dying since it's TCP over ethernet, but I have seen some
weird behaviors out of bonded NICs when one of them dies, so maybe.
-Greg

> - ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: Mailvelope v1.2.3
> Comment: https://www.mailvelope.com
>
> wsFcBAEBCAAQBQJWU1UGCRDmVDuy+mK58QAAOa0P/RIAO06Fd3myuzyyqlYo
> N2VA9bWGaq06iwTLF1mufiEmbVaIPAIAQk+GaODgv/PKSJj6ecqS1/au832d
> oO2LocnreeOTLJPL/n+mdeglos63ocwyvM4LP/XpvWJJ1C694mUWjvIxlWKR
> 4zFXH9V5DMTmCwm3kkY4qXqNUS/FJZyd5fwOg7NnqSzuy2UHIxEOzjGaKUwf
> ipgVgy8iIn5tprx/rCawrYvuY141z4nOu1jIzEkXEa+F7pxfpKsXeKFQvEnw
> aax/RNuikhLKu6rbCJKCQWL3uUZzrshp6EE3T/uXDP8rMX1ojOcmL1L1bJhh
> 4XqNdgXYuUXlP2cJtJSfxy7RFayZIw4Htn3YnWCrg7uqzrfwf2Hh2DGAE+06
> ggH7qo9Z99hg7ENTDSzpFOyE5eM+oA8OQgpn+/8X7OyNG/eNwJnBlHTT0C+f
> LunPV8I4HjRAuCNpkz16ZO/+pLnMAbk/Vp1wGJ3Qcdmxwk1UQ3L+UKASrwWd
> S861pU4GOGoRymcse20DDRaChbhQRmK0nxjFq4/YXIo36lbMH2gcXyuAza5z
> oFvmEkGwDoYneL0JZHJdHhRqkapMMMRqODC/2YU2EXa3fYatamKCwaHqPSdp
> c0BN/yRFlB74RA7szvItUHORyiROxo/MnmGKlCBUNud0cVbBoyzSwfSBwCN1
> zA7x
> =g7l3
> -----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 18:14             ` Gregory Farnum
@ 2015-11-23 19:03               ` Robert LeBlanc
  2015-11-23 19:12                 ` Sage Weil
  2015-11-23 19:14                 ` Mark Nelson
  0 siblings, 2 replies; 15+ messages in thread
From: Robert LeBlanc @ 2015-11-23 19:03 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

We set the debugging to 0/0, but are you talking about lines like:

   -12> 2015-11-20 20:59:47.138746 7f70067de700 -1 osd.177 103793
heartbeat_check: no reply from osd.133 since back 2015-11-20
20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
20:59:27.138720)
   -11> 2015-11-20 20:59:47.138749 7f70067de700 -1 osd.177 103793
heartbeat_check: no reply from osd.136 since back 2015-11-20
20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
20:59:27.138720)
   -10> 2015-11-20 20:59:47.138751 7f70067de700 -1 osd.177 103793
heartbeat_check: no reply from osd.139 since back 2015-11-20
20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
20:59:27.138720)
    -9> 2015-11-20 20:59:47.138758 7f70067de700 -1 osd.177 103793
heartbeat_check: no reply from osd.147 since back 2015-11-20
20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
20:59:27.138720)
    -8> 2015-11-20 20:59:47.138761 7f70067de700 -1 osd.177 103793
heartbeat_check: no reply from osd.159 since back 2015-11-20
20:58:51.427880 front 2015-11-20 20:58:51.427880 (cutoff 2015-11-20
20:59:27.138720)
    -7> 2015-11-20 20:59:47.138789 7f70067de700 -1 osd.177 103793
heartbeat_check: no reply from osd.170 since back 2015-11-20
20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
20:59:27.138720)
    -6> 2015-11-20 20:59:47.138794 7f70067de700 -1 osd.177 103793
heartbeat_check: no reply from osd.175 since back 2015-11-20
20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
20:59:27.138720)

There are 10,000 of those lines in the OSD log which shows all the
logs up to the crash. Unless setting the value to 0/0 is eliminating
what you are looking for. I've been wondering if setting it to 0/1 or
0/5 or even 0/20 has any runtime performance penalty? It seems like
more detailed info on crashes would be helpful, but we don't want to
write too much to the SATADOMs.

We do have the NICs bonded all across our environment.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Nov 23, 2015 at 11:14 AM, Gregory Farnum  wrote:
> On Mon, Nov 23, 2015 at 12:03 PM, Robert LeBlanc  wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> This is one of our production clusters which is dual 40 Gb Ethernet
>> using VLANs for cluster and public networks. I don't think this is
>> unusual, not like my dev cluster which runs Infiniband and IPoIB. The
>> client nodes are connected at 10 Gb Ethernet.
>>
>> I wonder if you are talking about the system logs, not the Ceph OSD
>> logs. I'm attaching a snippet that includes the hour before and after.
>
> Nope, I meant the OSD logs. Whenever they crash, it should dump out
> the last 10000 in-memory log entries — the one you sent along didn't
> have a crash included at all. The exact system which timed out will
> certainly be in those log entries (it's output at level 1, so unless
> you manually turned everything to 0, it'll show up on a crash.)
>
> Anyway, I wouldn't expect that cluster config to have any issues with
> a client dying since it's TCP over ethernet, but I have seen some
> weird behaviors out of bonded NICs when one of them dies, so maybe.
> -Greg
>
>> - ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.2.3
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWU2LkCRDmVDuy+mK58QAA2EUP/22eOBNzAYDV5lGI4J9Z
wnSZE39UycEfo8e6v8cfikLdAUT7fbY8HBq+VPylLo7OtxA+sGwgjrcz3hzu
azRi9QuCeWNm+squPQpgISzXWnpDtSjlsA+7iQb+HJGW7/kcR+opixzMX/W5
AE0Z/hrRwImw3r7Ze3Avl/j+l7iamUznfZAnaBdeWyle7Nge/D8kV+QJSeHe
/zXDoWW8wPNiRwU/puJrH/GEzyYVZFZ4F9aPUKf9rXsp0chK5k55yysI8ABL
CfBLtZ1yXPbD20knMdEyuQrDXWMGQplQ+7Z2qFAKsbp+qMFGNqeIbtA6xmbM
+8RIXT5hTLmgH6lVLYFbk6wgiSphxTVFrkR4Bm6NzFHnloxZ3KuU1pqOZf2k
iJZ8eDPfUxuforHO2L8TWMDWAsrqTm5A2u0GFtvm7uPWvxWo6sv08sq5IICD
C75mnCRUIDGl/bQLxt06qvq7WwAtezwnNcwCth3kDFFS85WTgZGEtPgpFizt
IpBQI4ustiT6lNmYQr6V2cj4HT1G8YBT1ykKwSYmsbRnT2PWGQc7IJ11DxgC
E7i0c6UYcOMpWT18t+RTOzvv8AZGpna2X/xTJSPL2H10zIkiuXAwO/gZQ5oa
mgN/3fdhcki8q7uWbZaBCNtv814sZIoTzQy7C7kApQdxFu+kbe5LHRhHZJbZ
CExf
=cjG0
-----END PGP SIGNATURE-----

* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 19:03               ` Robert LeBlanc
@ 2015-11-23 19:12                 ` Sage Weil
  2015-11-23 19:29                   ` Robert LeBlanc
  2015-11-23 19:14                 ` Mark Nelson
  1 sibling, 1 reply; 15+ messages in thread
From: Sage Weil @ 2015-11-23 19:12 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: Gregory Farnum, ceph-devel

On Mon, 23 Nov 2015, Robert LeBlanc wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> We set the debugging to 0/0, but are you talking about lines like:
> 
>    -12> 2015-11-20 20:59:47.138746 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.133 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>    -11> 2015-11-20 20:59:47.138749 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.136 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>    -10> 2015-11-20 20:59:47.138751 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.139 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>     -9> 2015-11-20 20:59:47.138758 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.147 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>     -8> 2015-11-20 20:59:47.138761 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.159 since back 2015-11-20
> 20:58:51.427880 front 2015-11-20 20:58:51.427880 (cutoff 2015-11-20
> 20:59:27.138720)
>     -7> 2015-11-20 20:59:47.138789 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.170 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>     -6> 2015-11-20 20:59:47.138794 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.175 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
> 
> There are 10,000 of those lines in the OSD log which shows all the
> logs up to the crash. Unless setting the value to 0/0 is eliminating
> what you are looking for. I've been wondering if setting it to 0/1 or
> 0/5 or even 0/20 has any runtime performance penalty? It seems like
> more detailed info on crashes would be helpful, but we don't want to
> write too much to the SATADOMs.

There is a performance impact but no disk IO (logs are accumulated in 
memory and only flushed out on a crash).

sage



> 
> We do have the NICs bonded all across our environment.
> - ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Mon, Nov 23, 2015 at 11:14 AM, Gregory Farnum  wrote:
> > On Mon, Nov 23, 2015 at 12:03 PM, Robert LeBlanc  wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA256
> >>
> >> This is one of our production clusters which is dual 40 Gb Ethernet
> >> using VLANs for cluster and public networks. I don't think this is
> >> unusual, not like my dev cluster which runs Infiniband and IPoIB. The
> >> client nodes are connected at 10 Gb Ethernet.
> >>
> >> I wonder if you are talking about the system logs, not the Ceph OSD
> >> logs. I'm attaching a snippet that includes the hour before and after.
> >
> > Nope, I meant the OSD logs. Whenever they crash, it should dump out
> > the last 10000 in-memory log entries -- the one you sent along didn't
> > have a crash included at all. The exact system which timed out will
> > certainly be in those log entries (it's output at level 1, so unless
> > you manually turned everything to 0, it'll show up on a crash.)
> >
> > Anyway, I wouldn't expect that cluster config to have any issues with
> > a client dying since it's TCP over ethernet, but I have seen some
> > weird behaviors out of bonded NICs when one of them dies, so maybe.
> > -Greg
> >
> >> - ----------------
> >> Robert LeBlanc
> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> -----BEGIN PGP SIGNATURE-----
> Version: Mailvelope v1.2.3
> Comment: https://www.mailvelope.com
> 
> wsFcBAEBCAAQBQJWU2LkCRDmVDuy+mK58QAA2EUP/22eOBNzAYDV5lGI4J9Z
> wnSZE39UycEfo8e6v8cfikLdAUT7fbY8HBq+VPylLo7OtxA+sGwgjrcz3hzu
> azRi9QuCeWNm+squPQpgISzXWnpDtSjlsA+7iQb+HJGW7/kcR+opixzMX/W5
> AE0Z/hrRwImw3r7Ze3Avl/j+l7iamUznfZAnaBdeWyle7Nge/D8kV+QJSeHe
> /zXDoWW8wPNiRwU/puJrH/GEzyYVZFZ4F9aPUKf9rXsp0chK5k55yysI8ABL
> CfBLtZ1yXPbD20knMdEyuQrDXWMGQplQ+7Z2qFAKsbp+qMFGNqeIbtA6xmbM
> +8RIXT5hTLmgH6lVLYFbk6wgiSphxTVFrkR4Bm6NzFHnloxZ3KuU1pqOZf2k
> iJZ8eDPfUxuforHO2L8TWMDWAsrqTm5A2u0GFtvm7uPWvxWo6sv08sq5IICD
> C75mnCRUIDGl/bQLxt06qvq7WwAtezwnNcwCth3kDFFS85WTgZGEtPgpFizt
> IpBQI4ustiT6lNmYQr6V2cj4HT1G8YBT1ykKwSYmsbRnT2PWGQc7IJ11DxgC
> E7i0c6UYcOMpWT18t+RTOzvv8AZGpna2X/xTJSPL2H10zIkiuXAwO/gZQ5oa
> mgN/3fdhcki8q7uWbZaBCNtv814sZIoTzQy7C7kApQdxFu+kbe5LHRhHZJbZ
> CExf
> =cjG0
> -----END PGP SIGNATURE-----

* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 19:03               ` Robert LeBlanc
  2015-11-23 19:12                 ` Sage Weil
@ 2015-11-23 19:14                 ` Mark Nelson
  2015-11-23 19:32                   ` Robert LeBlanc
  1 sibling, 1 reply; 15+ messages in thread
From: Mark Nelson @ 2015-11-23 19:14 UTC (permalink / raw)
  To: Robert LeBlanc, Gregory Farnum; +Cc: ceph-devel

FWIW, if you've got collectl per-process logs, you might look for major 
pagefaults associated with the osd processes.  I've seen process 
swapping cause heartbeat timeouts in the past.  Not to say that's the 
issue, but worth confirming it's not happening.

Mark

On 11/23/2015 01:03 PM, Robert LeBlanc wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> We set the debugging to 0/0, but are you talking about lines like:
>
>     -12> 2015-11-20 20:59:47.138746 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.133 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>     -11> 2015-11-20 20:59:47.138749 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.136 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>     -10> 2015-11-20 20:59:47.138751 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.139 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>      -9> 2015-11-20 20:59:47.138758 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.147 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>      -8> 2015-11-20 20:59:47.138761 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.159 since back 2015-11-20
> 20:58:51.427880 front 2015-11-20 20:58:51.427880 (cutoff 2015-11-20
> 20:59:27.138720)
>      -7> 2015-11-20 20:59:47.138789 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.170 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>      -6> 2015-11-20 20:59:47.138794 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.175 since back 2015-11-20
> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
> 20:59:27.138720)
>
> There are 10,000 of those lines in the OSD log which shows all the
> logs up to the crash. Unless setting the value to 0/0 is eliminating
> what you are looking for. I've been wondering if setting it to 0/1 or
> 0/5 or even 0/20 has any runtime performance penalty? It seems like
> more detailed info on crashes would be helpful, but we don't want to
> write too much to the SATADOMs.
>
> We do have the NICs bonded all across our environment.
> - ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Mon, Nov 23, 2015 at 11:14 AM, Gregory Farnum  wrote:
>> On Mon, Nov 23, 2015 at 12:03 PM, Robert LeBlanc  wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA256
>>>
>>> This is one of our production clusters which is dual 40 Gb Ethernet
>>> using VLANs for cluster and public networks. I don't think this is
>>> unusual, not like my dev cluster which runs Infiniband and IPoIB. The
>>> client nodes are connected at 10 Gb Ethernet.
>>>
>>> I wonder if you are talking about the system logs, not the Ceph OSD
>>> logs. I'm attaching a snippet that includes the hour before and after.
>>
>> Nope, I meant the OSD logs. Whenever they crash, it should dump out
>> the last 10000 in-memory log entries — the one you sent along didn't
>> have a crash included at all. The exact system which timed out will
>> certainly be in those log entries (it's output at level 1, so unless
>> you manually turned everything to 0, it'll show up on a crash.)
>>
>> Anyway, I wouldn't expect that cluster config to have any issues with
>> a client dying since it's TCP over ethernet, but I have seen some
>> weird behaviors out of bonded NICs when one of them dies, so maybe.
>> -Greg
>>
>>> - ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
> -----BEGIN PGP SIGNATURE-----
> Version: Mailvelope v1.2.3
> Comment: https://www.mailvelope.com
>
> wsFcBAEBCAAQBQJWU2LkCRDmVDuy+mK58QAA2EUP/22eOBNzAYDV5lGI4J9Z
> wnSZE39UycEfo8e6v8cfikLdAUT7fbY8HBq+VPylLo7OtxA+sGwgjrcz3hzu
> azRi9QuCeWNm+squPQpgISzXWnpDtSjlsA+7iQb+HJGW7/kcR+opixzMX/W5
> AE0Z/hrRwImw3r7Ze3Avl/j+l7iamUznfZAnaBdeWyle7Nge/D8kV+QJSeHe
> /zXDoWW8wPNiRwU/puJrH/GEzyYVZFZ4F9aPUKf9rXsp0chK5k55yysI8ABL
> CfBLtZ1yXPbD20knMdEyuQrDXWMGQplQ+7Z2qFAKsbp+qMFGNqeIbtA6xmbM
> +8RIXT5hTLmgH6lVLYFbk6wgiSphxTVFrkR4Bm6NzFHnloxZ3KuU1pqOZf2k
> iJZ8eDPfUxuforHO2L8TWMDWAsrqTm5A2u0GFtvm7uPWvxWo6sv08sq5IICD
> C75mnCRUIDGl/bQLxt06qvq7WwAtezwnNcwCth3kDFFS85WTgZGEtPgpFizt
> IpBQI4ustiT6lNmYQr6V2cj4HT1G8YBT1ykKwSYmsbRnT2PWGQc7IJ11DxgC
> E7i0c6UYcOMpWT18t+RTOzvv8AZGpna2X/xTJSPL2H10zIkiuXAwO/gZQ5oa
> mgN/3fdhcki8q7uWbZaBCNtv814sZIoTzQy7C7kApQdxFu+kbe5LHRhHZJbZ
> CExf
> =cjG0
> -----END PGP SIGNATURE-----

* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 19:12                 ` Sage Weil
@ 2015-11-23 19:29                   ` Robert LeBlanc
  2015-11-23 19:32                     ` Sage Weil
  0 siblings, 1 reply; 15+ messages in thread
From: Robert LeBlanc @ 2015-11-23 19:29 UTC (permalink / raw)
  To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Is there a way through the admin socket or injectargs to tell the OSD
process to dump the in-memory logs without crashing? Do you have an
idea of the overhead? From the code it looks like the message is always
evaluated, and the level only decides whether it is kept in memory or
written to disk. I'm trying to figure out an issue with dout() right now
in the code I'm working on (invalid use of non-static data member) and
I'm trying to understand how it works.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Nov 23, 2015 at 12:12 PM, Sage Weil  wrote:
> On Mon, 23 Nov 2015, Robert LeBlanc wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> We set the debugging to 0/0, but are you talking about lines like:
>>
>>    -12> 2015-11-20 20:59:47.138746 7f70067de700 -1 osd.177 103793
>> heartbeat_check: no reply from osd.133 since back 2015-11-20
>> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
>> 20:59:27.138720)
>>    -11> 2015-11-20 20:59:47.138749 7f70067de700 -1 osd.177 103793
>> heartbeat_check: no reply from osd.136 since back 2015-11-20
>> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
>> 20:59:27.138720)
>>    -10> 2015-11-20 20:59:47.138751 7f70067de700 -1 osd.177 103793
>> heartbeat_check: no reply from osd.139 since back 2015-11-20
>> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
>> 20:59:27.138720)
>>     -9> 2015-11-20 20:59:47.138758 7f70067de700 -1 osd.177 103793
>> heartbeat_check: no reply from osd.147 since back 2015-11-20
>> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
>> 20:59:27.138720)
>>     -8> 2015-11-20 20:59:47.138761 7f70067de700 -1 osd.177 103793
>> heartbeat_check: no reply from osd.159 since back 2015-11-20
>> 20:58:51.427880 front 2015-11-20 20:58:51.427880 (cutoff 2015-11-20
>> 20:59:27.138720)
>>     -7> 2015-11-20 20:59:47.138789 7f70067de700 -1 osd.177 103793
>> heartbeat_check: no reply from osd.170 since back 2015-11-20
>> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
>> 20:59:27.138720)
>>     -6> 2015-11-20 20:59:47.138794 7f70067de700 -1 osd.177 103793
>> heartbeat_check: no reply from osd.175 since back 2015-11-20
>> 20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutoff 2015-11-20
>> 20:59:27.138720)
>>
>> There are 10,000 of those lines in the OSD log which shows all the
>> logs up to the crash. Unless setting the value to 0/0 is eliminating
>> what you are looking for. I've been wondering if setting it to 0/1 or
>> 0/5 or even 0/20 has any runtime performance penalty? It seems like
>> more detailed info on crashes would be helpful, but we don't want to
>> write too much to the SATADOMs.
>
> There is a performance impact but no disk IO (logs are accumulated in
> memory and only flushed out on a crash).
>
> sage

-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.2.3
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWU2kWCRDmVDuy+mK58QAAIgUQALpFu8+tdK3+oEktPy5t
J8JTqp/XBRBeb80nvQTBi4ePt5T6O0mDTtbiGE7mcHjNR4Nh/a30CQmWeO//
yRZ3fX+iv4Q2yAzhOArTnYhGPHVwo0mWPNHmvCAlkeLqZ8KAmYzNOaHSU+C0
aJKe7krtaGC/bJC5nYqp/uQza9++3OL9acI8ZnqbfdXAFDRrXIdyjfdg26+h
XJe27ietL83ZyOmtYq0NUaFyrxR14x0prvJhZpqLKuufvKoqGSd/DO6/+mZx
3Gr+w9erhBKdd5Wed454pIWw5AGvoqmIJySfcnqvbdS2M9DhDG4Cl+3Hdu/X
5RQiX//zS4Wq2ego2qISjt00X3ul+4RKOUlfKApQ1ATsLOKR6OWYlgwcSRo9
UWtU5A8cSKctqE+w1ltHW7dQ7D7vxuTxgHmMQi5j76MVvWzg9Rdw0V/IJOvk
vn9CWxpkXKcZIEadaEMx6hHfflW01Z3/6DUq8qpXpJtdbLGyzZcqCzqOEc4R
/o96otd14AXLdjokg8HNJ8FLa9hSd1vLCosm0bRRPLpN9JP5qyOGjeSkemaO
7MjwIubog5eOStsMuIhfsFOsUMttpWyL+BQmAh5YwObkepJl7w0u2IhBV3OB
f+jglWvwHdTPnSQ236gI+KdFTBv+jkoazyvmqviYuCQRM5RKiqQB7e5a5Wsc
va31
=h1p4
-----END PGP SIGNATURE-----


* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 19:29                   ` Robert LeBlanc
@ 2015-11-23 19:32                     ` Sage Weil
  2015-11-23 19:37                       ` Robert LeBlanc
  0 siblings, 1 reply; 15+ messages in thread
From: Sage Weil @ 2015-11-23 19:32 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: Gregory Farnum, ceph-devel

On Mon, 23 Nov 2015, Robert LeBlanc wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> Is there a way through the admin socket or inject args that can tell
> the OSD process to dump the in memory logs without crashing? Do you

Yep, 'ceph daemon osd.NN log dump'.

> have an idea of the overhead? From the code it looks like it is always
> evaluated, just depends on if it is stored in memory or dumped to
> disk. I'm trying to figure out an issue with dout() right now in the
> code I'm working on (invalid use of non-static data member) and I'm trying to
> understand how it works.

What's the error and problematic line?

sage


* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 19:14                 ` Mark Nelson
@ 2015-11-23 19:32                   ` Robert LeBlanc
  0 siblings, 0 replies; 15+ messages in thread
From: Robert LeBlanc @ 2015-11-23 19:32 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Gregory Farnum, ceph-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I saw posts about that on the mailing lists. According to SAR, there
wasn't an abnormal number of page faults. We have swap disabled and
have vm.min_free_kbytes set to 6GB, which has worked well for us so far.
We kicked around still setting swappiness to 10 (which should make the
kernel more aggressive about freeing up memory), but decided to migrate
the VMs first.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Nov 23, 2015 at 12:14 PM, Mark Nelson  wrote:
> FWIW, if you've got collectl per-process logs, you might look for major
> pagefaults associated with the osd processes.  I've seen process swapping
> cause heartbeat timeouts in the past.  Not to say that's the issue, but
> worth confirming it's not happening.
>
> Mark

-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.2.3
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWU2nICRDmVDuy+mK58QAASAAP/ixjjqyg7a0tHpgt+Nn4
lSBUyAk5AQrvT0hv7cyMi5tt4g/R5ur05jeIJUKcrrKhroInO4s20B6cErwl
tRPJwVjFgvsmDxxvZIbqxwpZgPu1GpzHalV3XnNU0ez+pFIWG7Cu1dEFxezI
5hccNSy6afoIlC5wWCMb4I12bvpijtvhAP43pauFIDheUg0Eev4mbDCbSfU/
WuzzNiK9EZXnHcF1LVsqJtumT5PmcvmMktWFLm8BvhJk5k7ZZr7/27hgmjSK
aNddnld2RZGomDmCttUxERNeIV0I/xg0h39gc8kTc6P9buiGpABwfTlUCRhz
xI1iagmnSuV5hkCWCMc0KcZtSBdAeH1kNgkbxFI3qr1/AwPVVvAssUIrBBlV
spL+z2e/DKqNf5Hq4WHpc5GdJsBDlETFDaLYc6LquAzrYN/Bt1XbXD2pP+R1
JRNJITSrTzKcXxCFYVdpDKPzvmtojh8aVGr7LsnJYdrUTgNwYn5MBdoermjH
xKoHuP91Rmr1+0aMABU+2AD1LZ2+xY6SIOV3TDCJSUk/gLVvSwLuTOTNKbtP
3tdanJhqamh1QCX9ze4Wb96mLTCWmn0SFBzPb0dlHXCcwUinGg76po79tq0L
jTpI6Ont33fRwxC4R6lfYucanYMuDQxnrF3rrUad9K3cx8EfpCruHd6PE0Yc
iRYm
=b7ZB
-----END PGP SIGNATURE-----


* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 19:32                     ` Sage Weil
@ 2015-11-23 19:37                       ` Robert LeBlanc
  2015-11-23 19:54                         ` Sage Weil
  0 siblings, 1 reply; 15+ messages in thread
From: Robert LeBlanc @ 2015-11-23 19:37 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Thanks for the log dump command, I'll keep that in the back pocket, it
would have been helpful in a few situations.

I'm trying to microbenchmark the new Weighted Round Robin queue I've
been working on and just trying to dump the info to the logs so that I
can see it at runtime. So this is in a branch that isn't published
yet.

In file included from osd/OSD.cc:37:0:
osd/OSD.h: In member function ‘virtual void
OSD::ShardedOpWQ::_process(uint32_t, ceph::heartbeat_handle_d*)’:
osd/OSD.h:1072:7: error: invalid use of non-static data member ‘OSD::whoami’
   int whoami;
       ^
osd/OSD.cc:8270:388: error: from this location
   dout(15) << "Wrr (" << dendl;


               ^
osd/OSD.cc:8270:413: error: cannot call member function ‘epoch_t
OSD::get_osdmap_epoch()’ without object
   dout(15) << "Wrr (" << dendl;


                                        ^
In file included from osd/OSD.cc:37:0:
osd/OSD.h: In member function ‘virtual void
OSD::ShardedOpWQ::_enqueue(std::pair<boost::intrusive_ptr<PG>,
PGQueueable>)’:
osd/OSD.h:1072:7: error: invalid use of non-static data member ‘OSD::whoami’
   int whoami;
       ^
osd/OSD.cc:8361:388: error: from this location
   dout(15) << "Wrr (" << std::this_thread::get_id() << ") " <<
sdata->pqueue.get_queue() <<


               ^
osd/OSD.cc:8361:413: error: cannot call member function ‘epoch_t
OSD::get_osdmap_epoch()’ without object
   dout(15) << "Wrr (" << std::this_thread::get_id() << ") " <<
sdata->pqueue.get_queue() <<


The second set of errors just gets a string from the get_queue()
function, but the problem seems to be with just "dout(15) << "Wrr ("
<< dendl;" I would think that there is already an object of the OSD
created, so I'm a bit confused. Thanks for helping with a simple
issue.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Nov 23, 2015 at 12:32 PM, Sage Weil <sweil@redhat.com> wrote:
> On Mon, 23 Nov 2015, Robert LeBlanc wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> Is there a way through the admin socket or inject args that can tell
>> the OSD process to dump the in memory logs without crashing? Do you
>
> Yep, 'ceph daemon osd.NN log dump'.
>
>> have an idea of the overhead? From the code it looks like it is always
>> evaluated, just depends on if it is stored in memory or dumped to
>> disk. I'm trying to figure out an issue with dout() right now in the
>> code I'm working on (invalid use of non-static data member) and I'm trying to
>> understand how it works.
>
> What's the error and problematic line?
>
> sage

* Re: Multiple OSDs suicide because of client issues?
  2015-11-23 19:37                       ` Robert LeBlanc
@ 2015-11-23 19:54                         ` Sage Weil
  0 siblings, 0 replies; 15+ messages in thread
From: Sage Weil @ 2015-11-23 19:54 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: ceph-devel

On Mon, 23 Nov 2015, Robert LeBlanc wrote:
> Thanks for the log dump command, I'll keep that in the back pocket, it
> would have been helpful in a few situations.
> 
> I'm trying to microbenchmark the new Weighted Round Robin queue I've
> been working on and just trying to dump the info to the logs so that I
> can see it at runtime. So this is in a branch that isn't published
> yet.
> 
> In file included from osd/OSD.cc:37:0:
> osd/OSD.h: In member function ‘virtual void
> OSD::ShardedOpWQ::_process(uint32_t, ceph::heartbeat_handle_d*)’:
> osd/OSD.h:1072:7: error: invalid use of non-static data member ‘OSD::whoami’
>    int whoami;
>        ^
> osd/OSD.cc:8270:388: error: from this location
>    dout(15) << "Wrr (" << dendl;

#undef dout_prefix
#define dout_prefix *_dout << "something: "

Whatever dout_prefix currently is for this code includes whoami. You're
probably in a class other than OSD but still in OSD.cc, so either
redefine the prefix as above, move the code to a different .cc file, or
put it above the current class OSD stuff.

sage


end of thread, other threads:[~2015-11-23 19:54 UTC | newest]

Thread overview: 15+ messages
2015-11-21  7:34 Multiple OSDs suicide because of client issues? Robert LeBlanc
2015-11-23 16:03 ` Gregory Farnum
     [not found]   ` <CAANLjFo=vCsny5=JW1wYiQk5S=oXdtVd0OzXEC=uTGgmDO9ydA@mail.gmail.com>
2015-11-23 17:17     ` Gregory Farnum
2015-11-23 17:27       ` Robert LeBlanc
2015-11-23 17:33         ` Gregory Farnum
2015-11-23 18:03           ` Robert LeBlanc
2015-11-23 18:14             ` Gregory Farnum
2015-11-23 19:03               ` Robert LeBlanc
2015-11-23 19:12                 ` Sage Weil
2015-11-23 19:29                   ` Robert LeBlanc
2015-11-23 19:32                     ` Sage Weil
2015-11-23 19:37                       ` Robert LeBlanc
2015-11-23 19:54                         ` Sage Weil
2015-11-23 19:14                 ` Mark Nelson
2015-11-23 19:32                   ` Robert LeBlanc
