* RHEL 6.5 shared library upgrade safety
From: Loic Dachary @ 2014-08-18 11:57 UTC
To: Ceph Development

Hi Ceph,

In RHEL 6.5, is the following scenario possible:

a) an OSD dlopens a shared library for erasure-code,
b) the shared library file is replaced while the OSD is running,
c) the OSD starts using the new file instead of the old one.

It seems unlikely but it would explain a weird stack trace at http://tracker.ceph.com/issues/9153#note-5 so I'm double-checking ;-)

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
* Re: RHEL 6.5 shared library upgrade safety
From: Wido den Hollander @ 2014-08-18 12:11 UTC
To: Loic Dachary, Ceph Development

On 08/18/2014 01:57 PM, Loic Dachary wrote:
> Hi Ceph,
>
> In RHEL 6.5, is the following scenario possible:
>
> a) an OSD dlopens a shared library for erasure-code,
> b) the shared library file is replaced while the OSD is running,
> c) the OSD starts using the new file instead of the old one.
>
> It seems unlikely but it would explain a weird stack trace at http://tracker.ceph.com/issues/9153#note-5 so I'm double-checking ;-)
>

Well, it could be that it does so. I'm not 100% sure, but afaik it could happen that when you replace a library, certain parts might not be in memory.

See: http://stackoverflow.com/questions/7767325/replacing-shared-object-so-file-while-main-program-is-running

> Cheers
>

--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
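A minimal sketch of why the way a library file gets replaced matters, assuming Linux. It maps a stand-in file (the name `libplugin.so` and its contents are hypothetical), replaces it either by writing a new file and renaming it over the old one or by overwriting it in place, and returns what the existing mapping sees. This is only an illustration of the mechanism: the dynamic loader uses private mappings, for which mmap(2) leaves the visibility of later file changes unspecified, whereas the sketch uses a read-only shared mapping.

```python
import mmap
import os
import tempfile

OLD = b"OLD-LIBRARY-CODE"
NEW = b"NEW-LIBRARY-CODE"  # same length as OLD so the in-place write covers it


def mapped_bytes_after_replace(in_place: bool) -> bytes:
    """Map a file, replace it on disk, and return what the old mapping sees."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "libplugin.so")  # hypothetical plugin name
        with open(path, "wb") as f:
            f.write(OLD)

        # Stand-in for the mapping a running process holds on a loaded library.
        with open(path, "rb") as f:
            mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            if in_place:
                # Same inode: the mapped file itself is modified.
                with open(path, "r+b") as f:
                    f.write(NEW)
            else:
                # New inode: the old one stays alive while it is mapped.
                new_path = path + ".new"
                with open(new_path, "wb") as f:
                    f.write(NEW)
                os.replace(new_path, path)
            return mapping[:]
        finally:
            mapping.close()
```

With the rename-style replacement the mapping keeps the old bytes, because the old inode survives until the last mapping goes away. With the in-place overwrite the mapping is expected to show the new bytes, which for a real library means a running process could execute a mix of old and new code.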
* Re: RHEL 6.5 shared library upgrade safety
From: Loic Dachary @ 2014-08-18 14:06 UTC
To: Wido den Hollander, Ceph Development

Hi Wido,

On 18/08/2014 14:11, Wido den Hollander wrote:
> On 08/18/2014 01:57 PM, Loic Dachary wrote:
>> Hi Ceph,
>>
>> In RHEL 6.5, is the following scenario possible:
>>
>> a) an OSD dlopens a shared library for erasure-code,
>> b) the shared library file is replaced while the OSD is running,
>> c) the OSD starts using the new file instead of the old one.
>>
>> It seems unlikely but it would explain a weird stack trace at http://tracker.ceph.com/issues/9153#note-5 so I'm double-checking ;-)
>>
>
> Well, it could be that it does so. I'm not 100% sure, but afaik it could happen that when you replace a library, certain parts might not be in memory.
>
> See: http://stackoverflow.com/questions/7767325/replacing-shared-object-so-file-while-main-program-is-running

As it turns out, the problem is simpler, but I still have no clue how it can happen.

http://tracker.ceph.com/issues/9153 shows

  537187718- ceph version 0.80.5-164-gcc4e625 (cc4e6258d67fb16d4a92c25078a0822a9849cd77)
  537187795- 1: ceph-osd() [0x9b58c1]
  537187821- 2: (()+0xf710) [0x7f06a3e24710]
  537187854- 3: (memcpy()+0x15b) [0x7f06a2d4daab]
  537187892- 4: (jerasure_matrix_dotprod()+0xc8) [0x7f067fd11618]
  537187946- 5: (jerasure_matrix_encode()+0x75) [0x7f067fd11865]
  537187999- 6: (ErasureCodeJerasureReedSolomonVandermonde::jerasure_encode(char**, char**, int)+0x21) [0x7f067fd294b1]
  537188107- 7: (ErasureCodeJerasure::encode_chunks(std::set<int, std::less<int>, std::allocator<int> > const&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<int const, ceph::buffer::list> > >*)+0x607) [0x7f067fd2a807]

Meaning ceph-osd firefly crashed trying to use a jerasure plugin coming from master, which is no surprise because the API is incompatible although the data coding / encoding is compatible.

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
* Re: RHEL 6.5 shared library upgrade safety
From: Sage Weil @ 2014-08-18 15:17 UTC
To: Loic Dachary
Cc: Ceph Development

On Mon, 18 Aug 2014, Loic Dachary wrote:
> Hi Ceph,
>
> In RHEL 6.5, is the following scenario possible:
>
> a) an OSD dlopens a shared library for erasure-code,
> b) the shared library file is replaced while the OSD is running,
> c) the OSD starts using the new file instead of the old one.
>
> It seems unlikely but it would explain a weird stack trace at
> http://tracker.ceph.com/issues/9153#note-5 so I'm double-checking ;-)

I think this is possible and likely. We had similar problems with the rados classes and eventually just made them load all available plugins on startup (and also on demand in case one is installed later).

The simplest thing is probably to do that here as well...

sage
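The "load all available plugins on startup" idea can be sketched as follows. This is only an illustration in Python, not how the rados classes or the erasure-code plugin registry are implemented: the function name `preload_plugins`, the directory argument, and the `libec_*.so` glob pattern are assumptions made for the example. `ctypes.CDLL` calls dlopen() underneath, and on POSIX systems ctypes always adds RTLD_NOW, so symbols are resolved at load time.

```python
import ctypes
import glob
import os


def preload_plugins(plugin_dir, pattern="libec_*.so"):
    """dlopen every matching plugin at startup.

    Returns (loaded, failed): handles keyed by file name, and error
    messages for files that could not be loaded. The pattern is an
    assumption for this sketch, not a documented convention.
    """
    loaded, failed = {}, {}
    for path in sorted(glob.glob(os.path.join(plugin_dir, pattern))):
        name = os.path.basename(path)
        try:
            # Keeping the handle keeps the library mapped for the life
            # of the process, so a later lookup reuses this copy.
            loaded[name] = ctypes.CDLL(path)
        except OSError as exc:
            # A broken plugin should not prevent the others from loading.
            failed[name] = str(exc)
    return loaded, failed
```

A daemon that calls something like this once at startup never has to dlopen a plugin file that was installed by a later package upgrade, which is the situation behind the stack trace. Loading on demand would still be needed for plugins installed after startup.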
* Re: RHEL 6.5 shared library upgrade safety
From: Loic Dachary @ 2014-08-18 15:25 UTC
To: Sage Weil
Cc: Ceph Development

On 18/08/2014 17:17, Sage Weil wrote:
> On Mon, 18 Aug 2014, Loic Dachary wrote:
>> Hi Ceph,
>>
>> In RHEL 6.5, is the following scenario possible:
>>
>> a) an OSD dlopens a shared library for erasure-code,
>> b) the shared library file is replaced while the OSD is running,
>> c) the OSD starts using the new file instead of the old one.
>>
>> It seems unlikely but it would explain a weird stack trace at
>> http://tracker.ceph.com/issues/9153#note-5 so I'm double-checking ;-)
>
> I think this is possible and likely. We had similar problems with the
> rados classes and eventually just made them load all available plugins on
> startup (and also on demand in case one is installed later).
>
> The simplest thing is probably to do that here as well...

This will not solve the upgrade problem for Firefly daemons which are already running, unfortunately. Stopping the daemons while the package is being upgraded seems safer and more generic (see the other thread). Or are there issues with this approach?

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
* Re: RHEL 6.5 shared library upgrade safety
From: Sage Weil @ 2014-08-18 15:32 UTC
To: Loic Dachary
Cc: Ceph Development

On Mon, 18 Aug 2014, Loic Dachary wrote:
> On 18/08/2014 17:17, Sage Weil wrote:
> > On Mon, 18 Aug 2014, Loic Dachary wrote:
> >> Hi Ceph,
> >>
> >> In RHEL 6.5, is the following scenario possible:
> >>
> >> a) an OSD dlopens a shared library for erasure-code,
> >> b) the shared library file is replaced while the OSD is running,
> >> c) the OSD starts using the new file instead of the old one.
> >>
> >> It seems unlikely but it would explain a weird stack trace at
> >> http://tracker.ceph.com/issues/9153#note-5 so I'm double-checking ;-)
> >
> > I think this is possible and likely. We had similar problems with the
> > rados classes and eventually just made them load all available plugins on
> > startup (and also on demand in case one is installed later).
> >
> > The simplest thing is probably to do that here as well...
>
> This will not solve the upgrade problem for Firefly daemons which are
> already running, unfortunately. Stopping the daemons while the
> package is being upgraded seems safer and more generic (see the other
> thread). Or are there issues with this approach?

Operationally it is not something people want to do. Usually admins upgrade and then do the restarts in a controlled way. At least, that's what I've heard anecdotally.

FWIW the crash is also something that testing turns up but is unlikely to happen in production. In testing, the workload is just starting when we start upgrading, so the plugins haven't always loaded. In production, it is unlikely that a user will be *just* starting to use the EC features right as they are also doing an upgrade. Unless they forgot to restart daemons...

sage
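For the "forgot to restart daemons" case, one way to spot a daemon that still runs pre-upgrade code is to look for shared objects it has mapped whose files no longer exist on disk. The sketch below is an assumption-laden illustration, not an existing Ceph tool: it relies on the Linux convention that /proc/<pid>/maps appends " (deleted)" to the path of a mapping whose file was unlinked, and the helper names are made up for the example. It only catches replacements done by unlinking or renaming over the old file, not in-place overwrites.

```python
import os

DELETED_SUFFIX = " (deleted)"


def stale_libraries(maps_text):
    """Return mapped shared objects whose backing file has been unlinked.

    maps_text is the content of /proc/<pid>/maps, whose columns are:
    address, perms, offset, dev, inode, pathname.
    """
    stale = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname column
        path = fields[5]
        if not path.endswith(DELETED_SUFFIX):
            continue
        path = path[: -len(DELETED_SUFFIX)]
        if ".so" in os.path.basename(path):
            stale.add(path)
    return sorted(stale)


def stale_libraries_of(pid):
    """Convenience wrapper that reads the maps file of a running process."""
    with open("/proc/{}/maps".format(pid)) as f:
        return stale_libraries(f.read())
```

A non-empty result for a daemon's pid after a package upgrade suggests that the daemon has not been restarted since the upgrade and may later load plugins that do not match the code it is running.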