From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin Wilderoth
Subject: Re: osd stops
Date: Wed, 13 Apr 2011 14:12:51 +0200 (CEST)
Message-ID: <174374365.14569.1302696771139.JavaMail.root@mail.linserv.se>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Received: from 194-17-14-101.customer.telia.com ([194.17.14.101]:54946 "EHLO mail.linserv.se" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1755673Ab1DMMTw; Wed, 13 Apr 2011 08:19:52 -0400
Sender: ceph-devel-owner@vger.kernel.org
To: Gregory Farnum
Cc: ceph-devel@vger.kernel.org

This is my config,

;
; Sample ceph ceph.conf file.
;
; This file defines cluster membership, the various locations
; that Ceph stores data, and any other runtime options.

; If a 'host' is defined for a daemon, the start/stop script will
; verify that it matches the hostname (or else ignore it). If it is
; not defined, it is assumed that the daemon is intended to start on
; the current host (e.g., in a setup with a startup.conf on each
; node).

; global
[global]
    ; enable secure authentication
    auth supported = cephx
    keyring = /etc/ceph/keyring.bin

    ; allow ourselves to open a lot of files
    max open files = 131072
    pid file = /var/run/ceph/$name.pid
    debug ms = 1

; monitors
; You need at least one. You need at least three if you want to
; tolerate any node failures. Always create an odd number.
[mon]
    mon data = /data/mon$id

    ; logging, for debugging monitor crashes, in order of
    ; their likelihood of being helpful :)
    ;debug ms = 1
    ;debug mon = 20
    ;debug paxos = 20
    ;debug auth = 20

[mon0]
    host = ceph1
    mon addr = 10.0.6.10:6789

[mon1]
    host = ceph2
    mon addr = 10.0.6.11:6789

[mon2]
    host = ceph3
    mon addr = 10.0.6.12:6789

; mds
; You need at least one. Define two to get a standby.
[mds]
    ; where the mds keeps its secret encryption keys
    keyring = /etc/ceph/keyring.$name

    ; mds logging to debug issues.
    ;debug ms = 1
    ;debug mds = 20

[mds0]
    host = ceph1

[mds1]
    host = ceph2

[mds2]
    host = ceph3

; osd
; You need at least one. Two if you want data to be replicated.
; Define as many as you like.
[osd]
    sudo = true

    ; This is where the btrfs volume will be mounted.
    osd data = /data/osd$id

    ; where the osd keeps its secret encryption keys
    keyring = /etc/ceph/keyring.$name

    ; Ideally, make this a separate disk or partition. A few
    ; hundred MB should be enough; more if you have fast or many
    ; disks. You can use a file under the osd data dir if need be
    ; (e.g. /data/osd$id/journal), but it will be slower than a
    ; separate disk or partition.

    ; This is an example of a file-based journal.
    ;osd journal = /data/osd$id/journal
    ;osd journal size = 1000 ; journal size, in megabytes

    ; osd logging to debug osd issues, in order of likelihood of being
    ; helpful
    ; debug ms = 1
    ; debug osd = 25
    ; debug monc = 20
    ; debug journal = 20
    ; debug filestore = 10
    ; osd use stale snap = true

[osd0]
    host = ceph1

    ; if 'btrfs devs' is not specified, you're responsible for
    ; setting up the 'osd data' dir. if it is not btrfs, things
    ; will behave up until you try to recover from a crash (which
    ; is usually fine for basic testing).
    btrfs devs = /dev/sdc
    osd journal = /dev/sda1

[osd1]
    host = ceph1
    btrfs devs = /dev/sdd
    osd journal = /dev/sda2

[osd2]
    host = ceph2
    btrfs devs = /dev/sdc
    osd journal = /dev/sda1

[osd3]
    host = ceph2
    btrfs devs = /dev/sdd
    osd journal = /dev/sda2

[osd4]
    host = ceph3
    btrfs devs = /dev/sdc
    osd journal = /dev/sda1

[osd5]
    host = ceph3
    btrfs devs = /dev/sdd
    osd journal = /dev/sda2

These are the disk statistics, taken after the crash of osd2 and osd4:

Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sdc       143373312 124954676  18418636  88% /data/osd0
/dev/sdd       143373312 137639524   5733788  97% /data/osd1
/dev/sdc       143373312 120350584  23022728  84% /data/osd2
/dev/sdd       143373312 141986188   1387124 100% /data/osd3
/dev/sdc       143373312 112025716  31347596  79% /data/osd4
/dev/sdd       143373312 115163124  28210188  81% /data/osd5

I will send some statistics for ext3 as well.

----- Original message -----
From: "Gregory Farnum"
To: "Martin Wilderoth"
Cc: ceph-devel@vger.kernel.org
Sent: Tuesday, 12 Apr 2011 14:24:14
Subject: Re: osd stops

On Tuesday, April 12, 2011 at 11:05 AM, Martin Wilderoth wrote:
> Thanks for the answer, now I know the reason. Some of my OSDs were
> about 90% full, and dmesg also shows btrfs errors on the hosts. I will
> run the test with another filesystem, ext3 :-) or is any other
> filesystem better? It's a BackupPC filesystem with a lot of hard links
> and data that I would like to test running in Ceph.

ext3 or really any other FS will handle it better, although Ceph itself
is also not super-resilient to such situations. Eventually we will have
automatic rebalancing of data, but it's not in there right now.

Could you maybe send along your config file and the local filesystem
statistics on each of your OSDs? CRUSH is pseudo-random and so it's not
going to have perfectly even utilization, but if the variance is too
high we'll want to look into it sooner rather than later.
-Greg

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
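
Since the question in this thread is whether the variance in OSD utilization is too high, here is a rough sketch of how the spread could be quantified from the disk statistics above. The block counts are copied from the listing in this message; the helper name utilization_summary and the choice of population standard deviation are my own assumptions for illustration and are not anything Ceph provides.

```python
# Sketch: summarize how unevenly the six OSD data disks are filled.
# Numbers are copied from the disk statistics in this message; the helper
# name `utilization_summary` is hypothetical and not part of Ceph.
from statistics import mean, pstdev

# (mount point, total blocks, used blocks)
DF_ROWS = [
    ("/data/osd0", 143373312, 124954676),
    ("/data/osd1", 143373312, 137639524),
    ("/data/osd2", 143373312, 120350584),
    ("/data/osd3", 143373312, 141986188),
    ("/data/osd4", 143373312, 112025716),
    ("/data/osd5", 143373312, 115163124),
]


def utilization_summary(rows):
    """Return per-OSD used fraction plus mean, population std dev and spread."""
    used = {mount: used_blocks / total for mount, total, used_blocks in rows}
    values = list(used.values())
    return {
        "used": used,
        "mean": mean(values),
        "stdev": pstdev(values),
        "spread": max(values) - min(values),  # fullest minus emptiest
        "fullest": max(used, key=used.get),
    }


if __name__ == "__main__":
    summary = utilization_summary(DF_ROWS)
    for mount, fraction in sorted(summary["used"].items()):
        print(f"{mount}: {fraction:6.1%}")
    print(
        f"mean {summary['mean']:.1%}, "
        f"stdev {summary['stdev']:.1%}, "
        f"spread {summary['spread']:.1%}"
    )
```

The spread (fullest minus emptiest disk) is probably the more useful figure here, since a single full OSD is what causes trouble, regardless of how the average looks.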