From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ajitha Robert
Subject: Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)
Date: Fri, 26 Jul 2019 12:31:59 +0530
To: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-users-Qp0mS5GaXlQ@public.gmane.org, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: ceph-devel.vger.kernel.org

I have an rbd mirroring setup with primary and secondary clusters as
peers, and I have a pool enabled in image mode. In this pool I created
an rbd image with journaling enabled.

But whenever I enable mirroring on the image, I get errors in
rbdmirror.log and osd.log. I have increased the timeouts, but nothing
worked and I could not trace the error. Please guide me to solve this
error.

*Logs*
http://paste.openstack.org/show/754766/

--
Regards,
Ajitha R

_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mykola Golub
Subject: Re: Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)
Date: Fri, 26 Jul 2019 12:31:48 +0300
Message-ID: <20190726093147.GA31242@gmail.com>
To: Ajitha Robert
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-users-Qp0mS5GaXlQ@public.gmane.org, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, Jul 26, 2019 at 12:31:59PM +0530, Ajitha Robert wrote:
> I have an rbd mirroring setup with primary and secondary clusters as
> peers, and I have a pool enabled in image mode. In this pool I created
> an rbd image with journaling enabled.
>
> But whenever I enable mirroring on the image, I get errors in
> rbdmirror.log and osd.log. I have increased the timeouts, but nothing
> worked and I could not trace the error. Please guide me to solve this
> error.
>
> *Logs*
> http://paste.openstack.org/show/754766/

What do you mean by "nothing worked"? According to the mirroring status
the image is mirroring: it is in "up+stopped" state on the primary, as
expected, and in "up+replaying" state on the secondary with 0 entries
behind master.
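As a rough illustration of the status check described above (a sketch,
not part of Ceph): the two expected states can be encoded in a small
Python helper. The names `EXPECTED_STATE`, `state_ok` and `image_state`
are made up for this example, and the assumption that
`rbd mirror image status --format json` prints an object with a "state"
field should be verified against your Ceph release.

```python
import json
import subprocess

# Expected healthy states as described in this thread: the primary image
# is "up+stopped", the secondary (non-primary) image is "up+replaying".
EXPECTED_STATE = {"primary": "up+stopped", "secondary": "up+replaying"}


def state_ok(role, state):
    """Return True if `state` is the expected mirror state for `role`."""
    return EXPECTED_STATE[role] == state


def image_state(cluster, pool, image):
    """Read the mirror state of one image through the rbd CLI.

    Assumption: the JSON output contains a top-level "state" field.
    """
    out = subprocess.check_output(
        ["rbd", "--cluster", cluster, "mirror", "image", "status",
         f"{pool}/{image}", "--format", "json"]
    )
    return json.loads(out)["state"]
```

For example, `state_ok("secondary", image_state("secondary", "volumes", "my-image"))`
would check the secondary side; the cluster, pool and image names here
are placeholders.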
The "failed to get omap key" error in the osd log is harmless, and just
a week ago a fix was merged upstream so that it is no longer displayed.

The cause of the "InstanceWatcher: ... resending after timeout" error in
the rbd-mirror log is not clear, but if it is not repeating it is
harmless too.

I see you were trying to map the image with krbd. That is expected to
fail, as krbd does not support the "journaling" feature, which is
necessary for mirroring. You can access those images only with librbd
(e.g. mapping with the rbd-nbd driver or via qemu).

--
Mykola Golub

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ajitha Robert
Subject: Re: Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)
Date: Fri, 26 Jul 2019 16:40:35 +0530
References: <20190726093147.GA31242@gmail.com>
In-Reply-To: <20190726093147.GA31242-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Mykola Golub
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-users-Qp0mS5GaXlQ@public.gmane.org, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Thank you for the clarification.

But I was trying with openstack-cinder. When I load some data into the
volume (around 50 GB), the image sync stops at about 5%, or somewhere
below 15%. What could be the reason?

--
Regards,
Ajitha R

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mykola Golub
Subject: Re: Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)
Date: Fri, 26 Jul 2019 16:25:46 +0300
Message-ID: <20190726132546.GA6825@gmail.com>
References: <20190726093147.GA31242@gmail.com>
To: Ajitha Robert
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-users-Qp0mS5GaXlQ@public.gmane.org, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, Jul 26, 2019 at 04:40:35PM +0530, Ajitha Robert wrote:
> Thank you for the clarification.
>
> But I was trying with openstack-cinder. When I load some data into the
> volume (around 50 GB), the image sync stops at about 5%, or somewhere
> below 15%. What could be the reason?

I suppose you see the image sync stop in the mirror status output? Could
you please provide an example? And I suppose you don't see any other
messages in the rbd-mirror log apart from what you have already posted?
Depending on configuration, rbd-mirror might write to several logs.
Could you please try to find all of them? `lsof |grep 'rbd-mirror.*log'`
may be useful for this.

BTW, what rbd-mirror version are you running?
--
Mykola Golub

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jason Dillaman
Subject: Re: Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)
Date: Fri, 26 Jul 2019 09:58:59 -0400
References: <20190726093147.GA31242@gmail.com> <20190726132546.GA6825@gmail.com>
Reply-To: dillaman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
In-Reply-To: <20190726132546.GA6825-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Mykola Golub
Cc: Ajitha Robert, ceph-users, ceph-users, ceph-devel

On Fri, Jul 26, 2019 at 9:26 AM Mykola Golub wrote:
>
> On Fri, Jul 26, 2019 at 04:40:35PM +0530, Ajitha Robert wrote:
> > Thank you for the clarification.
> >
> > But I was trying with openstack-cinder. When I load some data into the
> > volume (around 50 GB), the image sync stops at about 5%, or somewhere
> > below 15%. What could be the reason?
>
> I suppose you see the image sync stop in the mirror status output? Could
> you please provide an example? And I suppose you don't see any other
> messages in the rbd-mirror log apart from what you have already posted?
> Depending on configuration, rbd-mirror might write to several logs.
> Could you please try to find all of them? `lsof |grep 'rbd-mirror.*log'`
> may be useful for this.
>
> BTW, what rbd-mirror version are you running?

Judging from the previous thread a few days ago (not sure why a new
thread was started on this same topic), it sounded to me like one or
more OSDs aren't reachable from the secondary site:

> > Scenario 2:
> > But when I create a 50 GB volume with another glance image, the volume
> > gets created, and in the backend I can see the rbd images in both the
> > primary and the secondary.
> >
> > In the rbd mirror image status I found that the secondary cluster
> > starts copying, and syncing was stuck at around 14%. It stays at 14%
> > with no progress at all. Should I set any parameters for this, like a
> > timeout?
> >
> > I manually ran rbd --cluster primary object-map check. No results came
> > back for the objects and the command hung. That is why I got worried
> > about the "failed to map object key" log. I couldn't even rebuild the
> > object map.

> It sounds like one or more of your primary OSDs are not reachable from
> the secondary site. If you run w/ "debug rbd-mirror = 20" and "debug
> rbd = 20", you should be able to see the last object it attempted to
> copy. From that, you could use "ceph osd map" to figure out the
> primary OSD for that object.

--
Jason

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ajitha Robert
Subject: Re: Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)
Date: Sat, 27 Jul 2019 18:08:58 +0530
References: <20190726093147.GA31242@gmail.com> <20190726132546.GA6825@gmail.com>
In-Reply-To: <20190726132546.GA6825-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Mykola Golub
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-users-Qp0mS5GaXlQ@public.gmane.org, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Thanks for the clarification.

1) Will there be any folder related to rbd-mirroring in /var/lib/ceph?

2) Is ceph rbd-mirror authentication mandatory?

3) Whenever I create a cinder volume loaded with a glance image, I get
the following error:

2019-07-27 17:26:46.762571 7f93eb0a5780 20 librbd::api::Mirror: peer_list:
2019-07-27 17:27:07.541701 7f939d7fa700 0 rbd::mirror::ImageReplayer: 0x7f93c800e9e0 [19/b6656be7-6006-4246-ba93-a49a220e33ce] handle_shut_down: remote image no longer exists: scheduling deletion
2019-07-27 17:27:16.766199 7f93eb0a5780 20 librbd::api::Mirror: peer_list:
2019-07-27 17:27:22.568970 7f939d7fa700 0 rbd::mirror::ImageReplayer: 0x7f93c800e9e0 [19/b6656be7-6006-4246-ba93-a49a220e33ce] handle_shut_down: mirror image no longer exists
2019-07-27 17:27:46.769158 7f93eb0a5780 20 librbd::api::Mirror: peer_list:
2019

4) Finally, at times I am able to create a bootable cinder volume apart
from the above errors, but at other times I face the following.

For example, for a 50 GB volume the local image gets created, but the
mirror image could not be created.

The image status logs show "replaying", but please see the rbd-mirror
log:
http://paste.openstack.org/show/754917/

rbd-mirror.log
http://paste.openstack.org/show/754916/

--
Regards,
Ajitha R

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mykola Golub
Subject: Re: Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)
Date: Mon, 29 Jul 2019 19:30:36 +0300
Message-ID: <20190729163035.GB8882@gmail.com>
References: <20190726093147.GA31242@gmail.com> <20190726132546.GA6825@gmail.com>
To: Ajitha Robert
Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org, ceph-users-Qp0mS5GaXlQ@public.gmane.org, ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Sat, Jul 27, 2019 at 06:08:58PM +0530, Ajitha Robert wrote:

> 1) Will there be any folder related to rbd-mirroring in /var/lib/ceph?

No.

> 2) Is ceph rbd-mirror authentication mandatory?

No. But why are you asking?

> 3) Whenever I create a cinder volume loaded with a glance image, I get
> the following error:
>
> 2019-07-27 17:26:46.762571 7f93eb0a5780 20 librbd::api::Mirror: peer_list:
> 2019-07-27 17:27:07.541701 7f939d7fa700 0 rbd::mirror::ImageReplayer: 0x7f93c800e9e0 [19/b6656be7-6006-4246-ba93-a49a220e33ce] handle_shut_down: remote image no longer exists: scheduling deletion
> 2019-07-27 17:27:16.766199 7f93eb0a5780 20 librbd::api::Mirror: peer_list:
> 2019-07-27 17:27:22.568970 7f939d7fa700 0 rbd::mirror::ImageReplayer: 0x7f93c800e9e0 [19/b6656be7-6006-4246-ba93-a49a220e33ce] handle_shut_down: mirror image no longer exists
> 2019-07-27 17:27:46.769158 7f93eb0a5780 20 librbd::api::Mirror: peer_list:
> 2019

The log says that the primary image was deleted for some reason, and
rbd-mirror scheduled deletion of the secondary (mirrored) image. The
logs do not show why the primary image was deleted. It might be cinder,
but I can't exclude some bug in the rbd-mirror running on the primary
cluster, though I don't recall any issues like this.

> At times I am able to create a bootable cinder volume apart from the
> above errors, but at other times I face the following.
>
> For example, for a 50 GB volume the local image gets created, but the
> mirror image could not be created.

The "Connection timed out" errors suggest you have a connectivity issue
between the sites?

--
Mykola Golub
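A rough sketch for pulling the ImageReplayer messages quoted above out
of a larger rbd-mirror log, grouped per image. This is only an
illustration in Python, not a Ceph tool: the regular expression is
derived from the few log lines in this thread, the helper name
`replayer_events` is made up, and reading the bracketed tag as
"pool id / global image id" is an assumption.

```python
import re
from collections import defaultdict

# Line layout assumed from the excerpts in this thread:
#   <date> <time> <thread> <level> rbd::mirror::ImageReplayer: <ptr> [<tag>] <func>: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+\s+\d+ "
    r"rbd::mirror::ImageReplayer: \S+ \[(?P<image>[^\]]+)\] "
    r"(?P<func>\w+): (?P<msg>.*)$"
)


def replayer_events(lines):
    """Group ImageReplayer messages by the bracketed image tag."""
    events = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            events[m.group("image")].append(
                (m.group("ts"), m.group("func"), m.group("msg"))
            )
    return dict(events)


# Sample input: the complete log lines quoted in this thread.
SAMPLE = """\
2019-07-27 17:26:46.762571 7f93eb0a5780 20 librbd::api::Mirror: peer_list:
2019-07-27 17:27:07.541701 7f939d7fa700 0 rbd::mirror::ImageReplayer: 0x7f93c800e9e0 [19/b6656be7-6006-4246-ba93-a49a220e33ce] handle_shut_down: remote image no longer exists: scheduling deletion
2019-07-27 17:27:16.766199 7f93eb0a5780 20 librbd::api::Mirror: peer_list:
2019-07-27 17:27:22.568970 7f939d7fa700 0 rbd::mirror::ImageReplayer: 0x7f93c800e9e0 [19/b6656be7-6006-4246-ba93-a49a220e33ce] handle_shut_down: mirror image no longer exists
""".splitlines()

for image, items in replayer_events(SAMPLE).items():
    for ts, func, msg in items:
        print(image, ts, func, msg)
```

Run against a full rbd-mirror log (for instance
`replayer_events(open(path))`, with `path` pointing at whichever log
file lsof reports), this makes it easier to see which images were shut
down and when, and to line those times up with cinder activity.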