* Osd failure detection
@ 2017-11-09 9:43 Wei Jin
2017-11-09 11:03 ` David Disseldorp
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Wei Jin @ 2017-11-09 9:43 UTC (permalink / raw)
To: ceph-devel
Hi, List,
In the Luminous release notes, I noticed the following:
"Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire. This prevents IO from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running."
This is critical for us, and we have no plan to upgrade to Luminous so far.
Is there any plan to backport it to Jewel? Or does anybody know the related PR or patches? Maybe I could do it myself.
Thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Osd failure detection
2017-11-09 9:43 Osd failure detection Wei Jin
@ 2017-11-09 11:03 ` David Disseldorp
2017-11-09 12:13 ` Wei Jin
2017-11-09 14:19 ` Sage Weil
[not found] ` <A80E9066-6266-49E2-9BD7-137E8136D6B9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2 siblings, 1 reply; 8+ messages in thread
From: David Disseldorp @ 2017-11-09 11:03 UTC (permalink / raw)
To: Wei Jin; +Cc: ceph-devel
On Thu, 9 Nov 2017 17:43:04 +0800, Wei Jin wrote:
> Hi, List,
>
> From Luminous release, I noticed following information:
>
> "Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire. This prevents IO from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running."
I assume you're referring to the ECONNREFUSED-fast-fail functionality added by Piotr Dałek.
> This is critical and we have no plan to upgrade to Luminous so far.
> Is there any plan to back port it Jewel? Or anybody know the related pr or patches? Maybe I could do it by myself.
It was backported to Jewel, alongside a bunch of other async messenger fixes, and submitted via https://github.com/ceph/ceph/pull/13212 . IIRC, there's still a small async messenger leak blocking the PR.
Cheers, David
^ permalink raw reply [flat|nested] 8+ messages in thread
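As a rough illustration of the idea behind the ECONNREFUSED fast-fail path mentioned above (this is not the Ceph messenger code): when a host is up but no process is listening on a port, a TCP connect is typically refused at once, so the caller can tell "process gone" apart from "host unreachable" without waiting for a heartbeat timeout. The helper name `probe` and the loopback address are assumptions made for this sketch.

```python
import socket


def probe(host, port, timeout=1.0):
    """Classify a single TCP connection attempt.

    Returns "up" if the connection succeeds, "refused" if the peer host
    answered but nothing is listening on the port (ECONNREFUSED), and
    "unknown" for timeouts and any other socket error.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "up"
    except ConnectionRefusedError:
        # The host is reachable, but the listening process is not running.
        return "refused"
    except OSError:
        # Timeouts, unreachable networks, and so on: no fast conclusion.
        return "unknown"
```

A caller could treat "refused" as an immediate failure report and fall back to the normal heartbeat grace period for "unknown".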
* Re: Osd failure detection
2017-11-09 11:03 ` David Disseldorp
@ 2017-11-09 12:13 ` Wei Jin
2017-11-09 12:25 ` Piotr Dałek
0 siblings, 1 reply; 8+ messages in thread
From: Wei Jin @ 2017-11-09 12:13 UTC (permalink / raw)
To: David Disseldorp; +Cc: ceph-devel
On Thu, Nov 9, 2017 at 7:03 PM, David Disseldorp <ddiss@suse.de> wrote:
> On Thu, 9 Nov 2017 17:43:04 +0800, Wei Jin wrote:
>
>> Hi, List,
>>
>> From Luminous release, I noticed following information:
>>
>> "Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire. This prevents IO from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running."
>
> I assume you're referring to the ECONNREFUSED-fast-fail functionality
> added by Piotr Dałek.
>
Exactly.
>> This is critical and we have no plan to upgrade to Luminous so far.
>> Is there any plan to back port it Jewel? Or anybody know the related pr or patches? Maybe I could do it by myself.
>
> It was backported to Jewel, alongside a bunch of other async messenger
> fixes, and submitted via https://github.com/ceph/ceph/pull/13212 . IIRC,
> there's still a small async messenger leak blocking the PR.
>
Yeah. It is still open and marked as DNM. And there are some issues with async messenger. As async is not the default one, why not just make it available for simple messenger?
> Cheers, David
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Osd failure detection
2017-11-09 12:13 ` Wei Jin
@ 2017-11-09 12:25 ` Piotr Dałek
0 siblings, 0 replies; 8+ messages in thread
From: Piotr Dałek @ 2017-11-09 12:25 UTC (permalink / raw)
To: Wei Jin; +Cc: David Disseldorp, ceph-devel
On Thu, Nov 09, 2017 at 08:13:57PM +0800, Wei Jin wrote:
> On Thu, Nov 9, 2017 at 7:03 PM, David Disseldorp <ddiss@suse.de> wrote:
> > On Thu, 9 Nov 2017 17:43:04 +0800, Wei Jin wrote:
> >> This is critical and we have no plan to upgrade to Luminous so far.
> >> Is there any plan to back port it Jewel? Or anybody know the related pr or patches? Maybe I could do it by myself.
> >
> > It was backported to Jewel, alongside a bunch of other async messenger
> > fixes, and submitted via https://github.com/ceph/ceph/pull/13212 . IIRC,
> > there's still a small async messenger leak blocking the PR.
> >
>
> Yeah. It is still open and marked as DNM.
> And there are some issues with async messenger. As async is not the
> default one, why not just make it available for simple messenger?
Because the change is so vast that it would affect async messenger anyway. In fact I didn't bother backporting it myself because I was certain that maintainers/backporters wouldn't give it a green light anyway.
--
Piotr Dałek
branch@predictor.org.pl
http://blog.predictor.org.pl
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Osd failure detection
2017-11-09 9:43 Osd failure detection Wei Jin
2017-11-09 11:03 ` David Disseldorp
@ 2017-11-09 14:19 ` Sage Weil
[not found] ` <A80E9066-6266-49E2-9BD7-137E8136D6B9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2 siblings, 0 replies; 8+ messages in thread
From: Sage Weil @ 2017-11-09 14:19 UTC (permalink / raw)
To: Wei Jin; +Cc: ceph-devel
On Thu, 9 Nov 2017, Wei Jin wrote:
> Hi, List,
>
> From Luminous release, I noticed following information:
>
> "Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire. This prevents IO from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running."
>
> This is critical and we have no plan to upgrade to Luminous so far.
> Is there any plan to back port it Jewel? Or anybody know the related pr or patches? Maybe I could do it by myself.
No plans. The original series merged here, a033dc6f5b4cef357db6f5951062d680e880ba0e, but there were likely many follow-on patches as well that you'll need to find.
sage
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <A80E9066-6266-49E2-9BD7-137E8136D6B9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* ceph-deploy failed to deploy osd randomly
[not found] ` <A80E9066-6266-49E2-9BD7-137E8136D6B9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-11-15 12:24 ` Wei Jin
2017-11-15 13:31 ` Wei Jin
0 siblings, 1 reply; 8+ messages in thread
From: Wei Jin @ 2017-11-15 12:24 UTC (permalink / raw)
To: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users-Qp0mS5GaXlQ
[-- Attachment #1.1: Type: text/plain, Size: 13407 bytes --]
Hi, List,
Each of my machines has 12 SSDs, and I use ceph-deploy to deploy them. But on some machines/disks, the OSD fails to start.
I tried many times; some attempts succeeded but others failed, and there is no error info.
The following is the ceph-deploy log for one disk:
root@n10-075-012:~# ceph-deploy osd create --zap-disk n10-075-094:sdb:sdb
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.39): /usr/bin/ceph-deploy osd create --zap-disk n10-075-094:sdb:sdb
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] block_db : None
[ceph_deploy.cli][INFO ] disk : [('n10-075-094', '/dev/sdb', '/dev/sdb')]
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] bluestore : None
[ceph_deploy.cli][INFO ] block_wal : None
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf object at 0x7f566b82a110>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] filestore : None
[ceph_deploy.cli][INFO ] func : <function osd at 0x7f566ae9a938>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] zap_disk : True
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks n10-075-094:/dev/sdb:/dev/sdb
[n10-075-094][DEBUG ] connected to host: n10-075-094
[n10-075-094][DEBUG ] detect platform information from remote host
[n10-075-094][DEBUG ] detect machine type
[n10-075-094][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: debian 8.9 jessie
[ceph_deploy.osd][DEBUG ] Deploying osd to n10-075-094
[n10-075-094][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][DEBUG ] Preparing host n10-075-094 disk /dev/sdb journal /dev/sdb activate True
[n10-075-094][DEBUG ] find the location of an executable
[n10-075-094][INFO ] Running command: /usr/sbin/ceph-disk -v prepare --zap-disk --cluster ceph --fs-type xfs -- /dev/sdb /dev/sdb
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb2 uuid path is /sys/dev/block/8:18/dm/uuid
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] zap: Zapping partition table on /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk --zap-all -- /dev/sdb
[n10-075-094][WARNIN] Caution: invalid backup GPT header, but valid main header; regenerating
[n10-075-094][WARNIN] backup header from main header.
[n10-075-094][WARNIN]
[n10-075-094][WARNIN] Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
[n10-075-094][WARNIN] on the recovery & transformation menu to examine the two tables.
[n10-075-094][WARNIN]
[n10-075-094][WARNIN] Warning! One or more CRCs don't match. You should repair the disk!
[n10-075-094][WARNIN]
[n10-075-094][DEBUG ] ****************************************************************************
[n10-075-094][DEBUG ] Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
[n10-075-094][DEBUG ] verification and recovery are STRONGLY recommended.
[n10-075-094][DEBUG ] ****************************************************************************
[n10-075-094][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[n10-075-094][DEBUG ] other utilities.
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk --clear --mbrtogpt -- /dev/sdb
[n10-075-094][DEBUG ] Creating new GPT entries.
[n10-075-094][DEBUG ] The operation has completed successfully.
[n10-075-094][WARNIN] update_partition: Calling partprobe on zapped device /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb /sbin/partprobe /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] ptype_tobe_for_name: name = journal
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] create_partition: Creating journal partition num 2 size 40960 on /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk --new=2:0:+40960M --change-name=2:ceph journal --partition-guid=2:b7f01f38-f0d5-45ba-a913-ac7242820aed --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdb
[n10-075-094][DEBUG ] Setting name!
[n10-075-094][DEBUG ] partNum is 1
[n10-075-094][DEBUG ] REALLY setting name!
[n10-075-094][DEBUG ] The operation has completed successfully.
[n10-075-094][WARNIN] update_partition: Calling partprobe on created device /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb /sbin/partprobe /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb2 uuid path is /sys/dev/block/8:18/dm/uuid
[n10-075-094][WARNIN] prepare_device: Journal is GPT partition /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
[n10-075-094][WARNIN] prepare_device: Journal is GPT partition /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] set_data_partition: Creating osd partition on /dev/sdb
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] ptype_tobe_for_name: name = data
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] create_partition: Creating data partition num 1 size 0 on /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:6e984e11-1b4b-4741-9080-131f13a73daa --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb
[n10-075-094][DEBUG ] Setting name!
[n10-075-094][DEBUG ] partNum is 0
[n10-075-094][DEBUG ] REALLY setting name!
[n10-075-094][DEBUG ] The operation has completed successfully.
[n10-075-094][WARNIN] update_partition: Calling partprobe on created device /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb /sbin/partprobe /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
[n10-075-094][WARNIN] populate_data_path_device: Creating xfs fs on /dev/sdb1
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdb1
[n10-075-094][DEBUG ] meta-data=/dev/sdb1 isize=2048 agcount=4, agsize=55984277 blks
[n10-075-094][DEBUG ] = sectsz=4096 attr=2, projid32bit=1
[n10-075-094][DEBUG ] = crc=0 finobt=0
[n10-075-094][DEBUG ] data = bsize=4096 blocks=223937105, imaxpct=25
[n10-075-094][DEBUG ] = sunit=0 swidth=0 blks
[n10-075-094][DEBUG ] naming =version 2 bsize=4096 ascii-ci=0 ftype=0
[n10-075-094][DEBUG ] log =internal log bsize=4096 blocks=109344, version=2
[n10-075-094][DEBUG ] = sectsz=4096 sunit=1 blks, lazy-count=1
[n10-075-094][DEBUG ] realtime =none extsz=4096 blocks=0, rtextents=0
[n10-075-094][WARNIN] mount: Mounting /dev/sdb1 on /var/lib/ceph/tmp/mnt.N8D5Kd with options rw,noexec,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota
[n10-075-094][WARNIN] command_check_call: Running command: /bin/mount -t xfs -o rw,noexec,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota -- /dev/sdb1 /var/lib/ceph/tmp/mnt.N8D5Kd
[n10-075-094][WARNIN] populate_data_path: Preparing osd data dir /var/lib/ceph/tmp/mnt.N8D5Kd
[n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.N8D5Kd/ceph_fsid.11531.tmp
[n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.N8D5Kd/fsid.11531.tmp
[n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.N8D5Kd/magic.11531.tmp
[n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.N8D5Kd/journal_uuid.11531.tmp
[n10-075-094][WARNIN] adjust_symlink: Creating symlink /var/lib/ceph/tmp/mnt.N8D5Kd/journal -> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
[n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.N8D5Kd
[n10-075-094][WARNIN] unmount: Unmounting /var/lib/ceph/tmp/mnt.N8D5Kd
[n10-075-094][WARNIN] command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.N8D5Kd
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdb
[n10-075-094][DEBUG ] Warning: The kernel is still using the old partition table.
[n10-075-094][DEBUG ] The new table will be used at the next reboot.
[n10-075-094][DEBUG ] The operation has completed successfully.
[n10-075-094][WARNIN] update_partition: Calling partprobe on prepared device /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb /sbin/partprobe /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm trigger --action=add --sysname-match sdb1
[n10-075-094][INFO ] Running command: systemctl enable ceph.target
[n10-075-094][INFO ] checking OSD status...
[n10-075-094][DEBUG ] find the location of an executable
[n10-075-094][INFO ] Running command: /usr/bin/ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host n10-075-094 is now ready for osd use.
[-- Attachment #1.2: Type: text/html, Size: 18218 bytes --]
[-- Attachment #2: Type: text/plain, Size: 178 bytes --]
_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
^ permalink raw reply [flat|nested] 8+ messages in thread
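In the log above, routine ceph-disk command lines are tagged WARNIN, while the sgdisk notice that the kernel is still using the old partition table is tagged DEBUG, so the level tag alone is a poor filter. A minimal sketch, assuming the `[source][LEVEL ] message` layout seen in this log, that pulls out lines whose message text mentions a warning or caution. The function name `find_tool_warnings` and the keyword list are hypothetical choices for illustration, not part of ceph-deploy.

```python
import re

# Matches lines such as "[n10-075-094][DEBUG ] some message".
LINE_RE = re.compile(
    r"^\[(?P<source>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\s?(?P<message>.*)$"
)

# Keywords are an assumption; extend them to suit the tools being wrapped.
KEYWORDS = ("warning", "caution")


def find_tool_warnings(log_text):
    """Return (source, level, message) for lines whose message looks like a warning."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a ceph-deploy style log line
        message = match.group("message")
        if any(keyword in message.lower() for keyword in KEYWORDS):
            hits.append((match.group("source"), match.group("level"), message))
    return hits
```

Running this over a saved copy of the output is a quick way to surface the sgdisk messages that are otherwise buried among the command traces.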
* Re: ceph-deploy failed to deploy osd randomly
2017-11-15 12:24 ` ceph-deploy failed to deploy osd randomly Wei Jin
@ 2017-11-15 13:31 ` Wei Jin
[not found] ` <CAPpSHbUxYxekAOib+jE7y+7nQE44GkBwbrWN3FFaUh2Ev3ENvg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Wei Jin @ 2017-11-15 13:31 UTC (permalink / raw)
To: ceph-devel, ceph-users
I ran purge/purgedata and then reran the deploy command a few times, and the OSD still fails to start.
There is no error log. Does anyone know what the problem is?
BTW, my OS is Debian with a 4.4 kernel.
Thanks.
On Wed, Nov 15, 2017 at 8:24 PM, Wei Jin <wjin.cn@gmail.com> wrote:
> Hi, List,
>
> My machine has 12 SSDs disk, and I use ceph-deploy to deploy them. But for
> some machine/disks,it failed to start osd.
> I tried many times, some success but others failed. But there is no error
> info.
> Following is ceph-deploy log for one disk:
>
> [...]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: ceph-deploy failed to deploy osd randomly
@ 2017-11-16 13:08 ` Alfredo Deza
0 siblings, 0 replies; 8+ messages in thread
From: Alfredo Deza @ 2017-11-16 13:08 UTC (permalink / raw)
To: Wei Jin; +Cc: ceph-devel, ceph-users-Qp0mS5GaXlQ@public.gmane.org

On Wed, Nov 15, 2017 at 8:31 AM, Wei Jin <wjin.cn@gmail.com> wrote:
> I tried purge/purgedata and then reran the deploy command a
> few times, and it still fails to start the osd.
> And there is no error log. Does anyone know what the problem is?

Seems like this is OSD 0, right? Have you checked for startup errors in
/var/log/ceph/, or in the daemon's output via systemctl?

If nothing is working still, maybe try running the OSD in the foreground
(assuming OSD 0):

    /usr/bin/ceph-osd --debug_osd 20 -d -f --cluster ceph --id 0 --setuser ceph --setgroup ceph

Behind the scenes, ceph-disk is getting these devices ready and
associated with the cluster as OSD 0. If you've tried this many times
already, I suspect the same OSD id is being reused or the drives are
polluted.

Seems like you are using filestore as well, so sdb1 will probably be
your data, mounted at /var/lib/ceph/osd/ceph-0, and sdb2 your journal,
linked at /var/lib/ceph/osd/ceph-0/journal.

Make sure those are mounted and linked properly.

> BTW, my OS is Debian with a 4.4 kernel.
> Thanks.
>
>
> On Wed, Nov 15, 2017 at 8:24 PM, Wei Jin <wjin.cn@gmail.com> wrote:
>> Hi, List,
>>
>> My machine has 12 SSD disks, and I use ceph-deploy to deploy them. But for
>> some machines/disks, it failed to start the osd.
>> I tried many times; some succeeded but others failed. But there is no error
>> info.
>> Following is the ceph-deploy log for one disk:
>>
>> root@n10-075-012:~# ceph-deploy osd create --zap-disk n10-075-094:sdb:sdb
>> [...]
>> [ceph_deploy.osd][DEBUG ] Host n10-075-094 is now ready for osd use.

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
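
The mount and journal-link check suggested in the reply can be sketched as a small shell helper. This is only an illustration, not something posted in the thread: the function names journal_link_ok and check_osd_dir are hypothetical, and the default path assumes OSD id 0 in a cluster named "ceph", as guessed in the reply.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ceph-deploy or ceph-disk).
# Default path assumes OSD id 0 and cluster name "ceph".

# Succeeds if DIR/journal is a symlink whose target exists.
journal_link_ok() {
    dir=$1
    [ -L "$dir/journal" ] && [ -e "$dir/journal" ]
}

# Prints one line per check for a filestore OSD data directory.
check_osd_dir() {
    dir=${1:-/var/lib/ceph/osd/ceph-0}
    # mountpoint(1) comes from util-linux; -q suppresses its own output.
    if mountpoint -q "$dir"; then
        echo "mounted: $dir"
    else
        echo "NOT mounted: $dir"
    fi
    if journal_link_ok "$dir"; then
        echo "journal ok: $(readlink -f "$dir/journal")"
    else
        echo "journal missing or dangling: $dir/journal"
    fi
}
```

Running check_osd_dir /var/lib/ceph/osd/ceph-0 on the OSD host would then show whether the data partition is mounted and whether the journal symlink resolves. If both look fine, the daemon's own log and the foreground run from the reply are the next places to look.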
Thread overview: 8+ messages
2017-11-09 9:43 Osd failure detection Wei Jin
2017-11-09 11:03 ` David Disseldorp
2017-11-09 12:13 ` Wei Jin
2017-11-09 12:25 ` Piotr Dałek
2017-11-09 14:19 ` Sage Weil
[not found] ` <A80E9066-6266-49E2-9BD7-137E8136D6B9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-15 12:24 ` ceph-deploy failed to deploy osd randomly Wei Jin
2017-11-15 13:31 ` Wei Jin
[not found] ` <CAPpSHbUxYxekAOib+jE7y+7nQE44GkBwbrWN3FFaUh2Ev3ENvg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-16 13:08 ` Alfredo Deza