Osd failure detection

* Osd failure detection
@ 2017-11-09  9:43 Wei Jin
  2017-11-09 11:03 ` David Disseldorp
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Wei Jin @ 2017-11-09  9:43 UTC
  To: ceph-devel

Hi, List,

In the Luminous release, I noticed the following information:

 "Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire.  This prevents IO from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running."

This is critical for us, and so far we have no plan to upgrade to Luminous.
Is there any plan to backport it to Jewel? Or does anybody know the related PR or patches? Maybe I could do it myself.
Thanks.

* Re: Osd failure detection
  2017-11-09  9:43 Osd failure detection Wei Jin
@ 2017-11-09 11:03 ` David Disseldorp
  2017-11-09 12:13   ` Wei Jin
  2017-11-09 14:19 ` Sage Weil
       [not found] ` <A80E9066-6266-49E2-9BD7-137E8136D6B9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 8+ messages in thread
From: David Disseldorp @ 2017-11-09 11:03 UTC
  To: Wei Jin; +Cc: ceph-devel

On Thu, 9 Nov 2017 17:43:04 +0800, Wei Jin wrote:

> Hi, List,
> 
> In the Luminous release, I noticed the following information:
> 
>  "Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire.  This prevents IO from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running."

I assume you're referring to the ECONNREFUSED-fast-fail functionality
added by Piotr Dałek.
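
The general idea behind a connection-refused fast fail can be shown outside Ceph. When a host is up but no process listens on the target port, the kernel rejects a TCP connect right away, so the caller sees ECONNREFUSED without waiting for any timeout. The following is a minimal Python sketch of that behaviour only, not Ceph code; the names `probe` and `free_port` and the use of a loopback port are assumptions made for illustration.

```python
import socket


def probe(host, port, timeout=2.0):
    """Classify a TCP endpoint as 'up', 'refused', or 'unreachable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "up"
    except ConnectionRefusedError:
        # Host answered, but nothing is listening: the fast-fail case.
        return "refused"
    except OSError:
        # Timeouts, routing errors, and so on: host state is unknown.
        return "unreachable"


def free_port():
    """Bind to port 0 to get an unused loopback port, then release it."""
    s = socket.socket()
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port


if __name__ == "__main__":
    print(probe("127.0.0.1", free_port()))
```

A "refused" result corresponds to the situation described in the release note (host up, daemon gone), while "unreachable" is the case where only a timeout can tell.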

> This is critical for us, and so far we have no plan to upgrade to Luminous.
> Is there any plan to backport it to Jewel? Or does anybody know the related PR or patches? Maybe I could do it myself.

It was backported to Jewel, alongside a bunch of other async messenger
fixes, and submitted via https://github.com/ceph/ceph/pull/13212 . IIRC,
there's still a small async messenger leak blocking the PR.

Cheers, David

* Re: Osd failure detection
  2017-11-09 11:03 ` David Disseldorp
@ 2017-11-09 12:13   ` Wei Jin
  2017-11-09 12:25     ` Piotr Dałek
  0 siblings, 1 reply; 8+ messages in thread
From: Wei Jin @ 2017-11-09 12:13 UTC
  To: David Disseldorp; +Cc: ceph-devel

On Thu, Nov 9, 2017 at 7:03 PM, David Disseldorp <ddiss@suse.de> wrote:
> On Thu, 9 Nov 2017 17:43:04 +0800, Wei Jin wrote:
>
>> Hi, List,
>>
>> In the Luminous release, I noticed the following information:
>>
>>  "Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire.  This prevents IO from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running."
>
> I assume you're referring to the ECONNREFUSED-fast-fail functionality
> added by Piotr Dałek.
>

Exactly.

>> This is critical for us, and so far we have no plan to upgrade to Luminous.
>> Is there any plan to backport it to Jewel? Or does anybody know the related PR or patches? Maybe I could do it myself.
>
> It was backported to Jewel, alongside a bunch of other async messenger
> fixes, and submitted via https://github.com/ceph/ceph/pull/13212 . IIRC,
> there's still a small async messenger leak blocking the PR.
>

Yeah. It is still open and marked as DNM.
There are also some issues with the async messenger. As async is not the
default one, why not just make it available for the simple messenger?

> Cheers, David

* Re: Osd failure detection
  2017-11-09 12:13   ` Wei Jin
@ 2017-11-09 12:25     ` Piotr Dałek
  0 siblings, 0 replies; 8+ messages in thread
From: Piotr Dałek @ 2017-11-09 12:25 UTC
  To: Wei Jin; +Cc: David Disseldorp, ceph-devel

On Thu, Nov 09, 2017 at 08:13:57PM +0800, Wei Jin wrote:
> On Thu, Nov 9, 2017 at 7:03 PM, David Disseldorp <ddiss@suse.de> wrote:
> > On Thu, 9 Nov 2017 17:43:04 +0800, Wei Jin wrote:
> >> This is critical for us, and so far we have no plan to upgrade to Luminous.
> >> Is there any plan to backport it to Jewel? Or does anybody know the related PR or patches? Maybe I could do it myself.
> >
> > It was backported to Jewel, alongside a bunch of other async messenger
> > fixes, and submitted via https://github.com/ceph/ceph/pull/13212 . IIRC,
> > there's still a small async messenger leak blocking the PR.
> >
> 
> Yeah. It is still open and marked as DNM.
> There are also some issues with the async messenger. As async is not the
> default one, why not just make it available for the simple messenger?

Because the change is so vast that it would affect async messenger anyway.
In fact I didn't bother backporting it myself because I was certain that
maintainers/backporters wouldn't give it a green light anyway.

-- 
Piotr Dałek
branch@predictor.org.pl
http://blog.predictor.org.pl

* Re: Osd failure detection
  2017-11-09  9:43 Osd failure detection Wei Jin
  2017-11-09 11:03 ` David Disseldorp
@ 2017-11-09 14:19 ` Sage Weil
       [not found] ` <A80E9066-6266-49E2-9BD7-137E8136D6B9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2 siblings, 0 replies; 8+ messages in thread
From: Sage Weil @ 2017-11-09 14:19 UTC
  To: Wei Jin; +Cc: ceph-devel

On Thu, 9 Nov 2017, Wei Jin wrote:
> Hi, List,
> 
> In the Luminous release, I noticed the following information:
> 
>  "Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire.  This prevents IO from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running."
> 
> This is critical for us, and so far we have no plan to upgrade to Luminous.
> Is there any plan to backport it to Jewel? Or does anybody know the related PR or patches? Maybe I could do it myself.

No plans.  The original series merged here, 
a033dc6f5b4cef357db6f5951062d680e880ba0e, but there were likely many 
follow-on patches as well that you'll need to find.

sage


[parent not found: <A80E9066-6266-49E2-9BD7-137E8136D6B9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]

* ceph-deploy failed to deploy osd randomly
       [not found] ` <A80E9066-6266-49E2-9BD7-137E8136D6B9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-11-15 12:24   ` Wei Jin
  2017-11-15 13:31     ` Wei Jin
  0 siblings, 1 reply; 8+ messages in thread
From: Wei Jin @ 2017-11-15 12:24 UTC
  To: ceph-devel, ceph-users

Hi, List,

My machine has 12 SSDs, and I use ceph-deploy to deploy OSDs on them. On some machines/disks, however, the OSD fails to start.
I tried many times; some attempts succeeded and others failed, but there is no error message.
The following is the ceph-deploy log for one disk:

root@n10-075-012:~# ceph-deploy osd create --zap-disk n10-075-094:sdb:sdb
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.39): /usr/bin/ceph-deploy osd create --zap-disk n10-075-094:sdb:sdb
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  block_db                      : None
[ceph_deploy.cli][INFO  ]  disk                          : [('n10-075-094', '/dev/sdb', '/dev/sdb')]
[ceph_deploy.cli][INFO  ]  dmcrypt                       : False
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  bluestore                     : None
[ceph_deploy.cli][INFO  ]  block_wal                     : None
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf object at 0x7f566b82a110>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  fs_type                       : xfs
[ceph_deploy.cli][INFO  ]  filestore                     : None
[ceph_deploy.cli][INFO  ]  func                          : <function osd at 0x7f566ae9a938>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  zap_disk                      : True
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks n10-075-094:/dev/sdb:/dev/sdb
[n10-075-094][DEBUG ] connected to host: n10-075-094
[n10-075-094][DEBUG ] detect platform information from remote host
[n10-075-094][DEBUG ] detect machine type
[n10-075-094][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: debian 8.9 jessie
[ceph_deploy.osd][DEBUG ] Deploying osd to n10-075-094
[n10-075-094][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][DEBUG ] Preparing host n10-075-094 disk /dev/sdb journal /dev/sdb activate True
[n10-075-094][DEBUG ] find the location of an executable
[n10-075-094][INFO  ] Running command: /usr/sbin/ceph-disk -v prepare --zap-disk --cluster ceph --fs-type xfs -- /dev/sdb /dev/sdb
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb2 uuid path is /sys/dev/block/8:18/dm/uuid
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] zap: Zapping partition table on /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk --zap-all -- /dev/sdb
[n10-075-094][WARNIN] Caution: invalid backup GPT header, but valid main header; regenerating
[n10-075-094][WARNIN] backup header from main header.
[n10-075-094][WARNIN]
[n10-075-094][WARNIN] Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
[n10-075-094][WARNIN] on the recovery & transformation menu to examine the two tables.
[n10-075-094][WARNIN]
[n10-075-094][WARNIN] Warning! One or more CRCs don't match. You should repair the disk!
[n10-075-094][WARNIN]
[n10-075-094][DEBUG ] ****************************************************************************
[n10-075-094][DEBUG ] Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
[n10-075-094][DEBUG ] verification and recovery are STRONGLY recommended.
[n10-075-094][DEBUG ] ****************************************************************************
[n10-075-094][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[n10-075-094][DEBUG ] other utilities.
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk --clear --mbrtogpt -- /dev/sdb
[n10-075-094][DEBUG ] Creating new GPT entries.
[n10-075-094][DEBUG ] The operation has completed successfully.
[n10-075-094][WARNIN] update_partition: Calling partprobe on zapped device /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb /sbin/partprobe /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] ptype_tobe_for_name: name = journal
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] create_partition: Creating journal partition num 2 size 40960 on /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk --new=2:0:+40960M --change-name=2:ceph journal --partition-guid=2:b7f01f38-f0d5-45ba-a913-ac7242820aed --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdb
[n10-075-094][DEBUG ] Setting name!
[n10-075-094][DEBUG ] partNum is 1
[n10-075-094][DEBUG ] REALLY setting name!
[n10-075-094][DEBUG ] The operation has completed successfully.
[n10-075-094][WARNIN] update_partition: Calling partprobe on created device /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb /sbin/partprobe /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb2 uuid path is /sys/dev/block/8:18/dm/uuid
[n10-075-094][WARNIN] prepare_device: Journal is GPT partition /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
[n10-075-094][WARNIN] prepare_device: Journal is GPT partition /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] set_data_partition: Creating osd partition on /dev/sdb
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] ptype_tobe_for_name: name = data
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] create_partition: Creating data partition num 1 size 0 on /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:6e984e11-1b4b-4741-9080-131f13a73daa --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb
[n10-075-094][DEBUG ] Setting name!
[n10-075-094][DEBUG ] partNum is 0
[n10-075-094][DEBUG ] REALLY setting name!
[n10-075-094][DEBUG ] The operation has completed successfully.
[n10-075-094][WARNIN] update_partition: Calling partprobe on created device /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb /sbin/partprobe /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
[n10-075-094][WARNIN] populate_data_path_device: Creating xfs fs on /dev/sdb1
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdb1
[n10-075-094][DEBUG ] meta-data=/dev/sdb1              isize=2048   agcount=4, agsize=55984277 blks
[n10-075-094][DEBUG ]          =                       sectsz=4096  attr=2, projid32bit=1
[n10-075-094][DEBUG ]          =                       crc=0        finobt=0
[n10-075-094][DEBUG ] data     =                       bsize=4096   blocks=223937105, imaxpct=25
[n10-075-094][DEBUG ]          =                       sunit=0      swidth=0 blks
[n10-075-094][DEBUG ] naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
[n10-075-094][DEBUG ] log      =internal log           bsize=4096   blocks=109344, version=2
[n10-075-094][DEBUG ]          =                       sectsz=4096  sunit=1 blks, lazy-count=1
[n10-075-094][DEBUG ] realtime =none                   extsz=4096   blocks=0, rtextents=0
[n10-075-094][WARNIN] mount: Mounting /dev/sdb1 on /var/lib/ceph/tmp/mnt.N8D5Kd with options rw,noexec,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota
[n10-075-094][WARNIN] command_check_call: Running command: /bin/mount -t xfs -o rw,noexec,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota -- /dev/sdb1 /var/lib/ceph/tmp/mnt.N8D5Kd
[n10-075-094][WARNIN] populate_data_path: Preparing osd data dir /var/lib/ceph/tmp/mnt.N8D5Kd
[n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.N8D5Kd/ceph_fsid.11531.tmp
[n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.N8D5Kd/fsid.11531.tmp
[n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.N8D5Kd/magic.11531.tmp
[n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.N8D5Kd/journal_uuid.11531.tmp
[n10-075-094][WARNIN] adjust_symlink: Creating symlink /var/lib/ceph/tmp/mnt.N8D5Kd/journal -> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
[n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.N8D5Kd
[n10-075-094][WARNIN] unmount: Unmounting /var/lib/ceph/tmp/mnt.N8D5Kd
[n10-075-094][WARNIN] command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.N8D5Kd
[n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdb
[n10-075-094][DEBUG ] Warning: The kernel is still using the old partition table.
[n10-075-094][DEBUG ] The new table will be used at the next reboot.
[n10-075-094][DEBUG ] The operation has completed successfully.
[n10-075-094][WARNIN] update_partition: Calling partprobe on prepared device /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb /sbin/partprobe /dev/sdb
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm trigger --action=add --sysname-match sdb1
[n10-075-094][INFO  ] Running command: systemctl enable ceph.target
[n10-075-094][INFO  ] checking OSD status...
[n10-075-094][DEBUG ] find the location of an executable
[n10-075-094][INFO  ] Running command: /usr/bin/ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host n10-075-094 is now ready for osd use.
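
To follow what the log above does to the partitions, it can help to pull out the sgdisk `--typecode` arguments in order. The sketch below is a hypothetical helper in Python, not part of ceph-deploy or ceph-disk. The labels attached to the three GUIDs are an assumption based on the usual ceph-disk conventions and on the `ptype_tobe_for_name` messages in the log itself.

```python
import re

# GUIDs copied from the log above; the labels are assumed, not taken from it.
PTYPE_NAMES = {
    "45b0969e-9b03-4f30-b4c6-b4b80ceff106": "ceph journal",
    "89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be": "ceph data (being prepared)",
    "4fbd7e29-9d25-41b8-afd0-062c0ceff05d": "ceph data (ready)",
}

# Matches arguments such as --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d
TYPECODE_RE = re.compile(r"--typecode=(\d+):([0-9a-fA-F-]{36})")


def typecode_changes(log_text):
    """Return (partition_number, guid, label) for each --typecode in order."""
    changes = []
    for num, guid in TYPECODE_RE.findall(log_text):
        guid = guid.lower()
        changes.append((int(num), guid, PTYPE_NAMES.get(guid, "unknown")))
    return changes
```

Run against this log, the helper would list partition 2 first (journal), then partition 1 twice, which matches the prepare sequence shown: the data partition gets a temporary type while it is populated and a final type just before `udevadm trigger` is called.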

* Re: ceph-deploy failed to deploy osd randomly
  2017-11-15 12:24   ` ceph-deploy failed to deploy osd randomly Wei Jin
@ 2017-11-15 13:31     ` Wei Jin
       [not found]       ` <CAPpSHbUxYxekAOib+jE7y+7nQE44GkBwbrWN3FFaUh2Ev3ENvg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Wei Jin @ 2017-11-15 13:31 UTC
  To: ceph-devel, ceph-users

I ran purge/purgedata and then reran the deploy command a few
times, but the OSD still fails to start.
There is no error log; does anyone know what the problem is?
BTW, my OS is Debian with a 4.4 kernel.
Thanks.
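
Since ceph-deploy reports no explicit error here, one way to narrow things down is to scan its output for messages whose text, rather than log level, looks suspicious. In the log quoted below, routine command traces also carry the WARNIN tag, so the tag alone does not help. The following Python sketch is a hypothetical helper, not part of ceph-deploy; the keyword list is an assumption.

```python
import re

# Matches the "[source][LEVEL ] message" prefix used by the log in this thread.
LINE_RE = re.compile(r"^\[(?P<src>[^\]]+)\]\[(?P<level>\w+)\s*\]\s?(?P<msg>.*)$")

# Assumed keywords; extend as needed.
KEYWORDS = ("warning", "caution", "error", "fail")


def suspicious_lines(log_text):
    """Return message texts that mention one of KEYWORDS (case-insensitive)."""
    hits = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not a ceph-deploy style log line
        msg = m.group("msg")
        if any(k in msg.lower() for k in KEYWORDS):
            hits.append(msg)
    return hits
```

On this log, such a scan would surface the sgdisk cautions about the GPT headers and the line "Warning: The kernel is still using the old partition table." Whether any of these explains the failed OSD start is not established in this thread; they are only candidates worth checking.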


On Wed, Nov 15, 2017 at 8:24 PM, Wei Jin <wjin.cn@gmail.com> wrote:
> Hi, List,
>
> My machine has 12 SSDs, and I use ceph-deploy to deploy OSDs on them. On
> some machines/disks, however, the OSD fails to start.
> I tried many times; some attempts succeeded and others failed, but there is
> no error message.
> The following is the ceph-deploy log for one disk:
>
>
> root@n10-075-012:~# ceph-deploy osd create --zap-disk n10-075-094:sdb:sdb
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /root/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.39): /usr/bin/ceph-deploy osd create
> --zap-disk n10-075-094:sdb:sdb
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
> [ceph_deploy.cli][INFO  ]  username                      : None
> [ceph_deploy.cli][INFO  ]  block_db                      : None
> [ceph_deploy.cli][INFO  ]  disk                          : [('n10-075-094',
> '/dev/sdb', '/dev/sdb')]
> [ceph_deploy.cli][INFO  ]  dmcrypt                       : False
> [ceph_deploy.cli][INFO  ]  verbose                       : False
> [ceph_deploy.cli][INFO  ]  bluestore                     : None
> [ceph_deploy.cli][INFO  ]  block_wal                     : None
> [ceph_deploy.cli][INFO  ]  overwrite_conf                : False
> [ceph_deploy.cli][INFO  ]  subcommand                    : create
> [ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               :
> /etc/ceph/dmcrypt-keys
> [ceph_deploy.cli][INFO  ]  quiet                         : False
> [ceph_deploy.cli][INFO  ]  cd_conf                       :
> <ceph_deploy.conf.cephdeploy.Conf object at 0x7f566b82a110>
> [ceph_deploy.cli][INFO  ]  cluster                       : ceph
> [ceph_deploy.cli][INFO  ]  fs_type                       : xfs
> [ceph_deploy.cli][INFO  ]  filestore                     : None
> [ceph_deploy.cli][INFO  ]  func                          : <function osd at
> 0x7f566ae9a938>
> [ceph_deploy.cli][INFO  ]  ceph_conf                     : None
> [ceph_deploy.cli][INFO  ]  default_release               : False
> [ceph_deploy.cli][INFO  ]  zap_disk                      : True
> [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
> n10-075-094:/dev/sdb:/dev/sdb
> [n10-075-094][DEBUG ] connected to host: n10-075-094
> [n10-075-094][DEBUG ] detect platform information from remote host
> [n10-075-094][DEBUG ] detect machine type
> [n10-075-094][DEBUG ] find the location of an executable
> [ceph_deploy.osd][INFO  ] Distro info: debian 8.9 jessie
> [ceph_deploy.osd][DEBUG ] Deploying osd to n10-075-094
> [n10-075-094][DEBUG ] write cluster configuration to
> /etc/ceph/{cluster}.conf
> [ceph_deploy.osd][DEBUG ] Preparing host n10-075-094 disk /dev/sdb journal
> /dev/sdb activate True
> [n10-075-094][DEBUG ] find the location of an executable
> [n10-075-094][INFO  ] Running command: /usr/sbin/ceph-disk -v prepare
> --zap-disk --cluster ceph --fs-type xfs -- /dev/sdb /dev/sdb
> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
> --cluster=ceph --show-config-value=fsid
> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
> --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log
> --cluster ceph --setuser ceph --setgroup ceph
> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
> --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log
> --cluster ceph --setuser ceph --setgroup ceph
> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
> --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log
> --cluster ceph --setuser ceph --setgroup ceph
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
> --cluster=ceph --show-config-value=osd_journal_size
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is
> /sys/dev/block/8:17/dm/uuid
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb2 uuid path is
> /sys/dev/block/8:18/dm/uuid
> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf
> --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf
> --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf
> --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] zap: Zapping partition table on /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
> --zap-all -- /dev/sdb
> [n10-075-094][WARNIN] Caution: invalid backup GPT header, but valid main
> header; regenerating
> [n10-075-094][WARNIN] backup header from main header.
> [n10-075-094][WARNIN]
> [n10-075-094][WARNIN] Warning! Main and backup partition tables differ! Use
> the 'c' and 'e' options
> [n10-075-094][WARNIN] on the recovery & transformation menu to examine the
> two tables.
> [n10-075-094][WARNIN]
> [n10-075-094][WARNIN] Warning! One or more CRCs don't match. You should
> repair the disk!
> [n10-075-094][WARNIN]
> [n10-075-094][DEBUG ]
> ****************************************************************************
> [n10-075-094][DEBUG ] Caution: Found protective or hybrid MBR and corrupt
> GPT. Using GPT, but disk
> [n10-075-094][DEBUG ] verification and recovery are STRONGLY recommended.
> [n10-075-094][DEBUG ]
> ****************************************************************************
> [n10-075-094][DEBUG ] GPT data structures destroyed! You may now partition
> the disk using fdisk or
> [n10-075-094][DEBUG ] other utilities.
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
> --clear --mbrtogpt -- /dev/sdb
> [n10-075-094][DEBUG ] Creating new GPT entries.
> [n10-075-094][DEBUG ] The operation has completed successfully.
> [n10-075-094][WARNIN] update_partition: Calling partprobe on zapped device
> /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
> settle --timeout=600
> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
> /sbin/partprobe /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
> settle --timeout=600
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] ptype_tobe_for_name: name = journal
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] create_partition: Creating journal partition num 2
> size 40960 on /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
> --new=2:0:+40960M --change-name=2:ceph journal
> --partition-guid=2:b7f01f38-f0d5-45ba-a913-ac7242820aed
> --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdb
> [n10-075-094][DEBUG ] Setting name!
> [n10-075-094][DEBUG ] partNum is 1
> [n10-075-094][DEBUG ] REALLY setting name!
> [n10-075-094][DEBUG ] The operation has completed successfully.
> [n10-075-094][WARNIN] update_partition: Calling partprobe on created device
> /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
> settle --timeout=600
> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
> /sbin/partprobe /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
> settle --timeout=600
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb2 uuid path is
> /sys/dev/block/8:18/dm/uuid
> [n10-075-094][WARNIN] prepare_device: Journal is GPT partition
> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
> [n10-075-094][WARNIN] prepare_device: Journal is GPT partition
> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] set_data_partition: Creating osd partition on /dev/sdb
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] ptype_tobe_for_name: name = data
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] create_partition: Creating data partition num 1 size 0
> on /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
> --largest-new=1 --change-name=1:ceph data
> --partition-guid=1:6e984e11-1b4b-4741-9080-131f13a73daa
> --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb
> [n10-075-094][DEBUG ] Setting name!
> [n10-075-094][DEBUG ] partNum is 0
> [n10-075-094][DEBUG ] REALLY setting name!
> [n10-075-094][DEBUG ] The operation has completed successfully.
> [n10-075-094][WARNIN] update_partition: Calling partprobe on created device
> /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
> settle --timeout=600
> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
> /sbin/partprobe /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
> settle --timeout=600
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is
> /sys/dev/block/8:17/dm/uuid
> [n10-075-094][WARNIN] populate_data_path_device: Creating xfs fs on
> /dev/sdb1
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/mkfs -t xfs
> -f -i size=2048 -- /dev/sdb1
> [n10-075-094][DEBUG ] meta-data=/dev/sdb1              isize=2048
> agcount=4, agsize=55984277 blks
> [n10-075-094][DEBUG ]          =                       sectsz=4096  attr=2,
> projid32bit=1
> [n10-075-094][DEBUG ]          =                       crc=0        finobt=0
> [n10-075-094][DEBUG ] data     =                       bsize=4096
> blocks=223937105, imaxpct=25
> [n10-075-094][DEBUG ]          =                       sunit=0      swidth=0
> blks
> [n10-075-094][DEBUG ] naming   =version 2              bsize=4096
> ascii-ci=0 ftype=0
> [n10-075-094][DEBUG ] log      =internal log           bsize=4096
> blocks=109344, version=2
> [n10-075-094][DEBUG ]          =                       sectsz=4096  sunit=1
> blks, lazy-count=1
> [n10-075-094][DEBUG ] realtime =none                   extsz=4096
> blocks=0, rtextents=0
> [n10-075-094][WARNIN] mount: Mounting /dev/sdb1 on
> /var/lib/ceph/tmp/mnt.N8D5Kd with options
> rw,noexec,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota
> [n10-075-094][WARNIN] command_check_call: Running command: /bin/mount -t xfs
> -o rw,noexec,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota --
> /dev/sdb1 /var/lib/ceph/tmp/mnt.N8D5Kd
> [n10-075-094][WARNIN] populate_data_path: Preparing osd data dir
> /var/lib/ceph/tmp/mnt.N8D5Kd
> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
> /var/lib/ceph/tmp/mnt.N8D5Kd/ceph_fsid.11531.tmp
> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
> /var/lib/ceph/tmp/mnt.N8D5Kd/fsid.11531.tmp
> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
> /var/lib/ceph/tmp/mnt.N8D5Kd/magic.11531.tmp
> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
> /var/lib/ceph/tmp/mnt.N8D5Kd/journal_uuid.11531.tmp
> [n10-075-094][WARNIN] adjust_symlink: Creating symlink
> /var/lib/ceph/tmp/mnt.N8D5Kd/journal ->
> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
> /var/lib/ceph/tmp/mnt.N8D5Kd
> [n10-075-094][WARNIN] unmount: Unmounting /var/lib/ceph/tmp/mnt.N8D5Kd
> [n10-075-094][WARNIN] command_check_call: Running command: /bin/umount --
> /var/lib/ceph/tmp/mnt.N8D5Kd
> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
> /sys/dev/block/8:16/dm/uuid
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
> --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdb
> [n10-075-094][DEBUG ] Warning: The kernel is still using the old partition
> table.
> [n10-075-094][DEBUG ] The new table will be used at the next reboot.
> [n10-075-094][DEBUG ] The operation has completed successfully.
> [n10-075-094][WARNIN] update_partition: Calling partprobe on prepared device
> /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
> settle --timeout=600
> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
> /sbin/partprobe /dev/sdb
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
> settle --timeout=600
> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
> trigger --action=add --sysname-match sdb1
> [n10-075-094][INFO  ] Running command: systemctl enable ceph.target
> [n10-075-094][INFO  ] checking OSD status...
> [n10-075-094][DEBUG ] find the location of an executable
> [n10-075-094][INFO  ] Running command: /usr/bin/ceph --cluster=ceph osd stat
> --format=json
> [ceph_deploy.osd][DEBUG ] Host n10-075-094 is now ready for osd use.

[parent not found: <CAPpSHbUxYxekAOib+jE7y+7nQE44GkBwbrWN3FFaUh2Ev3ENvg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: ceph-deploy failed to deploy osd randomly
       [not found]       ` <CAPpSHbUxYxekAOib+jE7y+7nQE44GkBwbrWN3FFaUh2Ev3ENvg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-11-16 13:08         ` Alfredo Deza
  0 siblings, 0 replies; 8+ messages in thread
From: Alfredo Deza @ 2017-11-16 13:08 UTC (permalink / raw)
  To: Wei Jin; +Cc: ceph-devel, ceph-users-Qp0mS5GaXlQ@public.gmane.org

On Wed, Nov 15, 2017 at 8:31 AM, Wei Jin <wjin.cn@gmail.com> wrote:
> I ran purge/purgedata and then redid the deploy command a few
> times, and the osd still fails to start.
> There is no error log either. Does anyone know what the problem is?

This looks like OSD 0, right? Have you checked for startup errors
in /var/log/ceph/, or looked at the daemon's output with
systemctl?

If that still turns up nothing, try running the OSD in the
foreground (assuming OSD 0):

    /usr/bin/ceph-osd --debug_osd 20 -d -f --cluster ceph --id 0 --setuser ceph --setgroup ceph

Behind the scenes, ceph-disk gets these devices ready and
associates them with the cluster as OSD 0. Since you've tried this
many times already, I suspect that the same OSD id is being reused
or that the drives are polluted by the earlier attempts.

It also looks like you are using filestore, so sdb1 is probably
your data partition, mounted at /var/lib/ceph/osd/ceph-0, and sdb2
your journal, linked at /var/lib/ceph/osd/ceph-0/journal.

Make sure those are mounted and linked properly.
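
As a rough way to run that check, here is a minimal Python sketch.
It assumes the filestore layout guessed at above (data partition
mounted at /var/lib/ceph/osd/ceph-0, with a "journal" symlink inside
it). The function name check_osd_dir is made up for illustration and
is not part of ceph-disk or any Ceph tool.

```python
import os


def check_osd_dir(osd_dir):
    """Return a list of problems found under a filestore OSD directory.

    Hypothetical helper: it only checks that osd_dir is a mount point
    and that osd_dir/journal is a symlink whose target exists.
    """
    problems = []

    # The data partition (sdb1 in this thread) should be mounted here.
    if not os.path.ismount(osd_dir):
        problems.append(f"{osd_dir} is not a mount point")

    journal = os.path.join(osd_dir, "journal")
    if not os.path.islink(journal):
        problems.append(f"{journal} is not a symlink")
    elif not os.path.exists(journal):
        # os.path.exists follows symlinks, so this is a dangling link.
        problems.append(
            f"{journal} points to a missing target: {os.readlink(journal)}"
        )

    return problems
```

Calling check_osd_dir("/var/lib/ceph/osd/ceph-0") on the OSD host
returns an empty list when both the mount and the journal link look
sane, and otherwise one line per problem. It does not look inside the
journal or the data partition, so an empty list only rules out the
two issues mentioned above.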

> BTW, my OS is Debian with a 4.4 kernel.
> Thanks.
>
>
> On Wed, Nov 15, 2017 at 8:24 PM, Wei Jin <wjin.cn@gmail.com> wrote:
>> Hi, List,
>>
>> My machine has 12 SSD disks, and I use ceph-deploy to deploy them. But for
>> some machines/disks, it fails to start the osd.
>> I tried many times; some attempts succeeded but others failed, and there
>> is no error info.
>> Following is the ceph-deploy log for one disk:
>>
>>
>> root@n10-075-012:~# ceph-deploy osd create --zap-disk n10-075-094:sdb:sdb
>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>> /root/.cephdeploy.conf
>> [ceph_deploy.cli][INFO  ] Invoked (1.5.39): /usr/bin/ceph-deploy osd create
>> --zap-disk n10-075-094:sdb:sdb
>> [ceph_deploy.cli][INFO  ] ceph-deploy options:
>> [ceph_deploy.cli][INFO  ]  username                      : None
>> [ceph_deploy.cli][INFO  ]  block_db                      : None
>> [ceph_deploy.cli][INFO  ]  disk                          : [('n10-075-094',
>> '/dev/sdb', '/dev/sdb')]
>> [ceph_deploy.cli][INFO  ]  dmcrypt                       : False
>> [ceph_deploy.cli][INFO  ]  verbose                       : False
>> [ceph_deploy.cli][INFO  ]  bluestore                     : None
>> [ceph_deploy.cli][INFO  ]  block_wal                     : None
>> [ceph_deploy.cli][INFO  ]  overwrite_conf                : False
>> [ceph_deploy.cli][INFO  ]  subcommand                    : create
>> [ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               :
>> /etc/ceph/dmcrypt-keys
>> [ceph_deploy.cli][INFO  ]  quiet                         : False
>> [ceph_deploy.cli][INFO  ]  cd_conf                       :
>> <ceph_deploy.conf.cephdeploy.Conf object at 0x7f566b82a110>
>> [ceph_deploy.cli][INFO  ]  cluster                       : ceph
>> [ceph_deploy.cli][INFO  ]  fs_type                       : xfs
>> [ceph_deploy.cli][INFO  ]  filestore                     : None
>> [ceph_deploy.cli][INFO  ]  func                          : <function osd at
>> 0x7f566ae9a938>
>> [ceph_deploy.cli][INFO  ]  ceph_conf                     : None
>> [ceph_deploy.cli][INFO  ]  default_release               : False
>> [ceph_deploy.cli][INFO  ]  zap_disk                      : True
>> [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
>> n10-075-094:/dev/sdb:/dev/sdb
>> [n10-075-094][DEBUG ] connected to host: n10-075-094
>> [n10-075-094][DEBUG ] detect platform information from remote host
>> [n10-075-094][DEBUG ] detect machine type
>> [n10-075-094][DEBUG ] find the location of an executable
>> [ceph_deploy.osd][INFO  ] Distro info: debian 8.9 jessie
>> [ceph_deploy.osd][DEBUG ] Deploying osd to n10-075-094
>> [n10-075-094][DEBUG ] write cluster configuration to
>> /etc/ceph/{cluster}.conf
>> [ceph_deploy.osd][DEBUG ] Preparing host n10-075-094 disk /dev/sdb journal
>> /dev/sdb activate True
>> [n10-075-094][DEBUG ] find the location of an executable
>> [n10-075-094][INFO  ] Running command: /usr/sbin/ceph-disk -v prepare
>> --zap-disk --cluster ceph --fs-type xfs -- /dev/sdb /dev/sdb
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --cluster=ceph --show-config-value=fsid
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log
>> --cluster ceph --setuser ceph --setgroup ceph
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log
>> --cluster ceph --setuser ceph --setgroup ceph
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log
>> --cluster ceph --setuser ceph --setgroup ceph
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --cluster=ceph --show-config-value=osd_journal_size
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is
>> /sys/dev/block/8:17/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb2 uuid path is
>> /sys/dev/block/8:18/dm/uuid
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf
>> --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf
>> --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf
>> --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] zap: Zapping partition table on /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
>> --zap-all -- /dev/sdb
>> [n10-075-094][WARNIN] Caution: invalid backup GPT header, but valid main
>> header; regenerating
>> [n10-075-094][WARNIN] backup header from main header.
>> [n10-075-094][WARNIN]
>> [n10-075-094][WARNIN] Warning! Main and backup partition tables differ! Use
>> the 'c' and 'e' options
>> [n10-075-094][WARNIN] on the recovery & transformation menu to examine the
>> two tables.
>> [n10-075-094][WARNIN]
>> [n10-075-094][WARNIN] Warning! One or more CRCs don't match. You should
>> repair the disk!
>> [n10-075-094][WARNIN]
>> [n10-075-094][DEBUG ]
>> ****************************************************************************
>> [n10-075-094][DEBUG ] Caution: Found protective or hybrid MBR and corrupt
>> GPT. Using GPT, but disk
>> [n10-075-094][DEBUG ] verification and recovery are STRONGLY recommended.
>> [n10-075-094][DEBUG ]
>> ****************************************************************************
>> [n10-075-094][DEBUG ] GPT data structures destroyed! You may now partition
>> the disk using fdisk or
>> [n10-075-094][DEBUG ] other utilities.
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
>> --clear --mbrtogpt -- /dev/sdb
>> [n10-075-094][DEBUG ] Creating new GPT entries.
>> [n10-075-094][DEBUG ] The operation has completed successfully.
>> [n10-075-094][WARNIN] update_partition: Calling partprobe on zapped device
>> /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
>> /sbin/partprobe /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] ptype_tobe_for_name: name = journal
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] create_partition: Creating journal partition num 2
>> size 40960 on /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
>> --new=2:0:+40960M --change-name=2:ceph journal
>> --partition-guid=2:b7f01f38-f0d5-45ba-a913-ac7242820aed
>> --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdb
>> [n10-075-094][DEBUG ] Setting name!
>> [n10-075-094][DEBUG ] partNum is 1
>> [n10-075-094][DEBUG ] REALLY setting name!
>> [n10-075-094][DEBUG ] The operation has completed successfully.
>> [n10-075-094][WARNIN] update_partition: Calling partprobe on created device
>> /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
>> /sbin/partprobe /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb2 uuid path is
>> /sys/dev/block/8:18/dm/uuid
>> [n10-075-094][WARNIN] prepare_device: Journal is GPT partition
>> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
>> [n10-075-094][WARNIN] prepare_device: Journal is GPT partition
>> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] set_data_partition: Creating osd partition on /dev/sdb
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] ptype_tobe_for_name: name = data
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] create_partition: Creating data partition num 1 size 0
>> on /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
>> --largest-new=1 --change-name=1:ceph data
>> --partition-guid=1:6e984e11-1b4b-4741-9080-131f13a73daa
>> --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb
>> [n10-075-094][DEBUG ] Setting name!
>> [n10-075-094][DEBUG ] partNum is 0
>> [n10-075-094][DEBUG ] REALLY setting name!
>> [n10-075-094][DEBUG ] The operation has completed successfully.
>> [n10-075-094][WARNIN] update_partition: Calling partprobe on created device
>> /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
>> /sbin/partprobe /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is
>> /sys/dev/block/8:17/dm/uuid
>> [n10-075-094][WARNIN] populate_data_path_device: Creating xfs fs on
>> /dev/sdb1
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/mkfs -t xfs
>> -f -i size=2048 -- /dev/sdb1
>> [n10-075-094][DEBUG ] meta-data=/dev/sdb1              isize=2048
>> agcount=4, agsize=55984277 blks
>> [n10-075-094][DEBUG ]          =                       sectsz=4096  attr=2,
>> projid32bit=1
>> [n10-075-094][DEBUG ]          =                       crc=0        finobt=0
>> [n10-075-094][DEBUG ] data     =                       bsize=4096
>> blocks=223937105, imaxpct=25
>> [n10-075-094][DEBUG ]          =                       sunit=0      swidth=0
>> blks
>> [n10-075-094][DEBUG ] naming   =version 2              bsize=4096
>> ascii-ci=0 ftype=0
>> [n10-075-094][DEBUG ] log      =internal log           bsize=4096
>> blocks=109344, version=2
>> [n10-075-094][DEBUG ]          =                       sectsz=4096  sunit=1
>> blks, lazy-count=1
>> [n10-075-094][DEBUG ] realtime =none                   extsz=4096
>> blocks=0, rtextents=0
>> [n10-075-094][WARNIN] mount: Mounting /dev/sdb1 on
>> /var/lib/ceph/tmp/mnt.N8D5Kd with options
>> rw,noexec,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota
>> [n10-075-094][WARNIN] command_check_call: Running command: /bin/mount -t xfs
>> -o rw,noexec,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota --
>> /dev/sdb1 /var/lib/ceph/tmp/mnt.N8D5Kd
>> [n10-075-094][WARNIN] populate_data_path: Preparing osd data dir
>> /var/lib/ceph/tmp/mnt.N8D5Kd
>> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
>> /var/lib/ceph/tmp/mnt.N8D5Kd/ceph_fsid.11531.tmp
>> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
>> /var/lib/ceph/tmp/mnt.N8D5Kd/fsid.11531.tmp
>> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
>> /var/lib/ceph/tmp/mnt.N8D5Kd/magic.11531.tmp
>> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
>> /var/lib/ceph/tmp/mnt.N8D5Kd/journal_uuid.11531.tmp
>> [n10-075-094][WARNIN] adjust_symlink: Creating symlink
>> /var/lib/ceph/tmp/mnt.N8D5Kd/journal ->
>> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
>> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
>> /var/lib/ceph/tmp/mnt.N8D5Kd
>> [n10-075-094][WARNIN] unmount: Unmounting /var/lib/ceph/tmp/mnt.N8D5Kd
>> [n10-075-094][WARNIN] command_check_call: Running command: /bin/umount --
>> /var/lib/ceph/tmp/mnt.N8D5Kd
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
>> --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdb
>> [n10-075-094][DEBUG ] Warning: The kernel is still using the old partition
>> table.
>> [n10-075-094][DEBUG ] The new table will be used at the next reboot.
>> [n10-075-094][DEBUG ] The operation has completed successfully.
>> [n10-075-094][WARNIN] update_partition: Calling partprobe on prepared device
>> /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
>> /sbin/partprobe /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> trigger --action=add --sysname-match sdb1
>> [n10-075-094][INFO  ] Running command: systemctl enable ceph.target
>> [n10-075-094][INFO  ] checking OSD status...
>> [n10-075-094][DEBUG ] find the location of an executable
>> [n10-075-094][INFO  ] Running command: /usr/bin/ceph --cluster=ceph osd stat
>> --format=json
>> [ceph_deploy.osd][DEBUG ] Host n10-075-094 is now ready for osd use.
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

end of thread, other threads:[~2017-11-16 13:08 UTC | newest]

Thread overview: 8+ messages
2017-11-09  9:43 Osd failure detection Wei Jin
2017-11-09 11:03 ` David Disseldorp
2017-11-09 12:13   ` Wei Jin
2017-11-09 12:25     ` Piotr Dałek
2017-11-09 14:19 ` Sage Weil
     [not found] ` <A80E9066-6266-49E2-9BD7-137E8136D6B9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-15 12:24   ` ceph-deploy failed to deploy osd randomly Wei Jin
2017-11-15 13:31     ` Wei Jin
     [not found]       ` <CAPpSHbUxYxekAOib+jE7y+7nQE44GkBwbrWN3FFaUh2Ev3ENvg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-16 13:08         ` Alfredo Deza
