* Fwd: Kernel 3.0.0 + ext4 + ceph == ...
From: Christian Brunner @ 2011-07-30 14:53 UTC (permalink / raw)
To: linux-ext4, ceph-devel
Fyodor and I are struggling to get a fully stable ceph cluster up and running.
When we run a Ceph object store (OSD) on top of an ext4 filesystem, we
get fsck errors when we check the filesystem (see below).
Fyodor is running 3.0.
I am running a RHEL6.1 Kernel (2.6.32-131.6.1.el6.x86_64).
Any help or hints on how to trace the bug would be appreciated.
Thanks,
Christian
2011/7/30 Fyodor Ustinov <ufm@ufm.su>:
> fail. Epic fail.
>
> Absolutely reproducible.
>
> I have a ceph cluster with this configuration:
>
> 8 physical servers
> 14 osd servers
> Each osd server has its own fs.
> 48T total size of ceph cluster.
> 17T used.
>
> Now, step by step:
>
> 1. Stop ceph server osd0
> /etc/init.d/ceph stop
>
> 2. Make a fresh fs for the osd
> umount /osd.0
> mkfs.ext4 /dev/sdc1
> tune2fs -o journal_data_writeback /dev/sdc1
> mount -a
> # line from /etc/fstab:
> # /dev/sdc1 /osd.0 ext4 user_xattr,rw,noexec,nodev,noatime,nodiratime,data=writeback,barrier=0 0 2
> ceph mon getmap -o /tmp/monmap
> cosd --mkfs -i 0 --monmap /tmp/monmap
>
> 3. Start ceph server osd0
> /etc/init.d/ceph start
>
> Now, make a big cup of coffee and begin to wait.
>
> After rebalancing completes, do:
> /etc/init.d/ceph stop
> umount /osd.0
> fsck.ext4 -fy /dev/sdc1
>
> and see many, many messages like:
>
> Inode 238551053, i_blocks is 24, should be 32. Fix? yes
>
> Inode 238551054, i_blocks is 40, should be 32. Fix? yes
>
> Inode 238551066, i_blocks is 24, should be 32. Fix? yes
>
> Inode 238944257, i_blocks is 8, should be 16. Fix? yes
>
> Inode 239206414, i_blocks is 8, should be 16. Fix? yes
>
> Inode 239206416, i_blocks is 40, should be 32. Fix? yes
>
> Inode 239206431, i_blocks is 8, should be 16. Fix? yes
>
> Inode 239206441, i_blocks is 24, should be 32. Fix? yes
>
> Voila.
>
> P.S. No messages in syslog. No messages on the console.
>
> WBR,
> Fyodor.
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
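Every mismatch in the fsck output above differs by exactly 8 (24 vs 32, 40 vs 32, 8 vs 16). Assuming e2fsck reports i_blocks in 512-byte units and the filesystem uses the default 4 KiB block size, each of these inodes is off by exactly one filesystem block. The sketch below summarizes such output so runs can be compared. The function name summarize_iblocks is hypothetical, and the factor of 8 units per block is an assumption that only holds for 4 KiB blocks without huge_file scaling.

```sh
#!/bin/sh
# summarize_iblocks (hypothetical helper): read e2fsck output on stdin and,
# for every "i_blocks is X, should be Y" line, print the difference in
# filesystem blocks, followed by a one-line summary.
# ASSUMPTION: i_blocks is in 512-byte units and the fs uses 4 KiB blocks,
# so 8 units == 1 block.
summarize_iblocks() {
    awk '
    /i_blocks is [0-9]+, should be [0-9]+/ {
        ino = $2;  sub(/,$/, "", ino)      # "238551053," -> "238551053"
        have = $5; sub(/,$/, "", have)     # "24," -> "24"
        want = $8; sub(/\.$/, "", want)    # "32." -> "32"
        delta = (have - want) / 8          # 512-byte units -> 4 KiB blocks
        printf "inode %s: %+d block(s)\n", ino, delta
        n++
        if (delta > 0) over++; else if (delta < 0) under++
    }
    END { printf "%d inode(s): %d over-counted, %d under-counted\n", n, over, under }
    '
}
```

For example, `fsck.ext4 -fn /dev/sdc1 2>&1 | summarize_iblocks` summarizes a read-only check without repairing anything, unlike the `-fy` run above.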
* Re: Kernel 3.0.0 + ext4 + ceph == ...
From: Christian Brunner @ 2011-08-08 20:07 UTC (permalink / raw)
To: Yehuda Sadeh Weinraub
Cc: Sage Weil, Theodore Tso, Fyodor Ustinov, ceph-devel, linux-ext4
I tried 3.0.1 today, which contains the commit Theodore suggested, and
was no longer able to reproduce the problem.
So I think the corruption we have seen is indeed related to:
commit 7132de744ba76930d13033061018ddd7e3e8cd91
Author: Maxim Patlasov <maxim.patlasov@gmail.com>
Date: Sun Jul 10 19:37:48 2011 -0400
ext4: fix i_blocks/quota accounting when extent insertion fails
I will now try to apply this patch to the RHEL6.1 kernel and see what
happens...
Thanks for your help.
Christian
2011/8/3 Yehuda Sadeh Weinraub <yehuda.sadeh@dreamhost.com>:
> On Wed, Aug 3, 2011 at 7:16 AM, Christian Brunner <chb@muc.de> wrote:
> ...
>> I tried to reproduce this without ceph, but wasn't able to...
>>
>> In the meantime, it seems that I can also see the side effects on the
>> librbd side: I get a "librbd: data error!" when I do an "rbd copy".
>>
>> Looking at the librbd code, this is related to a sparse_read not
>> returning the right size of the object.
>>
>> I don't know if it helps, but I think that the problem is also related
>> to sparse file usage.
>>
>
> There were a few sparse-read issues that we fixed not too long ago,
> but those fixes should already be in at least the previous ceph version. I'm
> not sure what version you're using.
> There was an ext4 fiemap issue that I was hitting on specific
> environments but couldn't determine whether it was fixed in later
> kernel versions (I was using 2.6.32). Now is a good time to try and
> get to the bottom of it. Here's a script I was using to reproduce it:
>
> #!/bin/sh
> dd if=/dev/urandom of=bla bs=1 seek=$((0x6f000)) count=$((0x1000)); sync
> dd if=/dev/urandom of=bla bs=1 seek=$((0x70000)) count=$((0x1000)); sync
> dd if=/dev/urandom of=bla bs=1 seek=$((0x71000)) count=$((0x1000)); sync
> dd if=/dev/urandom of=bla bs=1 seek=$((0x72000)) count=$((0x1000)); sync
> dd if=/dev/urandom of=bla bs=1 seek=$((0x73000)) count=$((0x1000)); sync
> dd if=/dev/urandom of=bla bs=1 seek=$((0x74000)) count=$((0x2000)); sync
> dd if=/dev/urandom of=bla bs=1 seek=$((0x2ae000)) count=$((0x2000)); sync
>
> You can compile and run the following utility to dump all the extents:
> http://pastebin.com/h2Cnpk2Q
>
> Thanks,
> Yehuda
>
> Oh, btw, you can effectively disable the use of fiemap by setting the
> 'filestore fiemap threshold' config option to a large enough value
> (e.g., anything bigger than 4 MB should be enough for rbd).
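Yehuda's script issues thousands of one-byte writes per chunk (bs=1). As a sketch, the same layout can be produced faster by writing whole 4 KiB blocks with dd (assuming a 4 KiB block size) and passing conv=notrunc. Note that this changes the write syscall pattern, so it may not exercise exactly the same code path as the original script. make_sparse_file is a hypothetical helper name. Instead of the pastebin utility, `filefrag -v` from e2fsprogs can print the extent map, which it obtains through FIEMAP where the filesystem supports it.

```sh
#!/bin/sh
# make_sparse_file (hypothetical helper): recreate the layout of the dd
# script above: random data at bytes 0x6f000..0x76000 and
# 0x2ae000..0x2b0000, with a hole in between.
# ASSUMPTION: writing 4 KiB blocks instead of bs=1 is an acceptable
# stand-in; it is much faster but issues fewer, larger writes.
make_sparse_file() {
    f=$1
    rm -f "$f"    # start from an empty file on every run
    # each entry is "offset:count"; offset in hex, both in 4 KiB units
    for spec in 6f:1 70:1 71:1 72:1 73:1 74:2 2ae:2; do
        off=${spec%:*}
        cnt=${spec#*:}
        # conv=notrunc: do not truncate the file at the seek offset
        dd if=/dev/urandom of="$f" bs=4096 seek=$((0x$off)) count="$cnt" \
            conv=notrunc 2>/dev/null || return 1
        sync
    done
}
```

For example, `make_sparse_file bla && filefrag -v bla` should show data only at logical 4 KiB blocks 111-117 and 686-687, with a hole between them, and a file size of 0x2b0000 bytes.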
* Re: Kernel 3.0.0 + ext4 + ceph == ...
From: Christian Brunner @ 2011-08-18 9:19 UTC (permalink / raw)
To: linux-ext4; +Cc: Sage Weil, Theodore Tso, Fyodor Ustinov, ceph-devel
I'm sorry, but I have to correct this:
The problem is still happening with 3.0.1, although it only seems to
happen under high load now.
I also did some tracing (with 3.0.0, as the problem is easier to
reproduce there). What might be interesting to note is that the
corruption does not occur when I run "strace -f cosd" (maybe a
race condition?).
To reproduce the problem, I have now set up a ceph cluster on a single machine
with replication between /ceph/osd.000 and /ceph/osd.001.
My setup now has only two active placement groups with 2 objects.
The corruption happens when I start replication from osd.000 to
osd.001. It is reproducible most of the time (but not always) when I
do the following:
# mkfs.ext4 -T largefile /dev/sdb1
# mount -o noatime,user_xattr /dev/sdb1 /ceph/osd.001/
# cosd -i 001 --mkjournal --mkfs --monmap /tmp/monmap
# /usr/bin/cosd -d -i 001 -c /etc/ceph/ceph.conf
### wait until replication has finished and then stop the cosd
# umount /dev/sdb1
# fsck.ext4 -f /dev/sdb1
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Inode 43, i_blocks is 8, should be 16. Fix<y>? no
Inode 2078, i_blocks is 24, should be 16. Fix<y>? no
I can also provide an e2image with the metadata and the strace output
of the cosd, if this would be helpful.
Regards,
Christian
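Christian answered "no" to the fsck prompts, while Fyodor's steps used `-fy`, which repairs the inodes and discards the evidence. A minimal sketch that turns the final check into a read-only pass/fail step, assuming e2fsck's `-n` behavior (open the filesystem read-only and answer "no" to all questions). check_iblocks and the FSCK override variable are hypothetical names introduced here.

```sh
#!/bin/sh
# check_iblocks DEV (hypothetical helper): run a forced, read-only fsck on
# an unmounted ext4 device and report whether any inode has a wrong
# i_blocks count. Returns 0 if clean, 1 if mismatches were found.
# -n leaves the damage in place (e.g. for an e2image) instead of fixing it.
# FSCK is a hypothetical override (e.g. for testing); default fsck.ext4.
check_iblocks() {
    dev=$1
    out=$("${FSCK:-fsck.ext4}" -fn "$dev" 2>&1)
    # grep -c prints 0 and exits 1 on no match; "|| true" keeps the count
    n=$(printf '%s\n' "$out" | grep -c 'i_blocks is [0-9]*, should be [0-9]*' || true)
    if [ "$n" -gt 0 ]; then
        echo "$dev: $n inode(s) with wrong i_blocks"
        return 1
    fi
    echo "$dev: no i_blocks mismatches"
    return 0
}
```

After stopping cosd and unmounting, `check_iblocks /dev/sdb1 || e2image /dev/sdb1 /tmp/sdb1.e2i` would capture a metadata image only when mismatches are found, which is the kind of e2image offered above.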
* Re: Fwd: Kernel 3.0.0 + ext4 + ceph == ...
From: Eric Sandeen @ 2011-11-15 15:46 UTC (permalink / raw)
To: chb; +Cc: linux-ext4, ceph-devel
On 7/30/11 9:53 AM, Christian Brunner wrote:
> Fyodor and I are struggling to get a fully stable ceph cluster up and running.
>
> When we run a Ceph object store (OSD) on top of an ext4 filesystem, we
> get fsck errors when we check the filesystem (see below).
BTW, this should be fixed now as of my commit 6d6a435190bdf2e04c9465cde5bdc3ac68cf11a4
ext4: fix race in xattr block allocation path
I think it made its way to a couple of older -stable kernels, too.
-Eric