* Random data corruption in VM, possibly caused by rbd
@ 2012-06-07 18:04 Guido Winkelmann
2012-06-07 18:18 ` Stefan Priebe
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Guido Winkelmann @ 2012-06-07 18:04 UTC (permalink / raw)
To: ceph-devel@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 2119 bytes --]
Hi,
I'm using Ceph with RBD to provide network-transparent disk images for KVM-
based virtual servers. The last two days, I've been hunting some weird elusive
bug where data in the virtual machines would be corrupted in weird ways. It
usually manifests in files having some random data - usually zeroes - at the
start before the actual contents that should be in there start.
To track this down, I wrote a simple io tester. It does the following:
- Create 1 Megabyte of random data
- Calculate the SHA256 hash of that data
- Write the data to a file on the harddisk, in a given directory, using the
hash as the filename
- Repeat until the disk is full
- Delete the last file (because it is very likely to be incompletely written)
- Read and delete all the files just written while checking that their sha256
sums are equal to their filenames
When running this io tester in a VM that uses a qcow2 file on a local harddisk
for its virtual disk, no errors are found. When the same VM is running using
rbd, the io tester finds on average about one corruption every 200 Megabytes,
reproducably.
(As in an interesting aside, the io tester also prints how long it took to
read or write 100 MB, and it turns out reading the data back in again is about
three times slower than writing them in the first place...)
Ceph is version 0.47.2. Qemu KVM is 1.0, compiled with the spec file from
http://pkgs.fedoraproject.org/gitweb/?p=qemu.git;a=summary
(And compiled after ceph 0.47.2 was installed on that machine, so it would use
the correct headers...)
Both the Ceph cluster and the KVM host machines are running on Fedora 16, with
a fairly recent 3.3.x kernel.
The ceph cluster uses btrf for the osd's data dirs. The journal is on a tmpfs.
(This is not a production setup - luckily.)
The virtual machine is using ext4 as its filesystem.
There were no obvious other problems with either the ceph cluster or the KVM
host machines.
I have attached a copy of the ceph.conf in use, in case it might be helpful.
This is a huge problem, and any help in tracking it down would be much
appreciated.
Regards,
Guido
[-- Attachment #2: ceph.conf --]
[-- Type: application/octet-stream, Size: 1358 bytes --]
; global
[global]
; enable secure authentication
; auth supported = cephx
max open files = 131072
log file = /var/log/ceph/$name.log
; log_to_syslog = true ; uncomment this line to log to syslog
pid file = /var/run/ceph/$name.pid
; monitors
[mon]
mon data = /mondata/$name
[mon.alpha]
host = storage1
mon addr = 10.6.224.129:6789
[mon.beta]
host = storage2
mon addr = 10.6.224.130:6789
[mon.gamma]
host = storage3
mon addr = 10.6.224.131:6789
; mds
[mds]
; where the mds keeps it's secret encryption keys
keyring = /mdsdata/keyring.$name
[mds.alpha]
host = storage1
[mds.beta]
host = storage2
[mds.gamma]
host = storage3
; osd
[osd]
osd data = /osddata/$name
osd journal = /journaldata/$name/journal
osd journal size = 1000 ; journal size, in megabytes
; If you want to run the journal on a tmpfs, disable DirectIO
journal dio = false
osd recovery max active = 5
btrfs devs = /dev/sda5 /dev/sdb5
keyring = /osddata/$name/keyring
[osd.0]
host = storage1
cluster addr = 10.6.224.193
public addr = 10.6.224.129
[osd.1]
host = storage2
cluster addr = 10.6.224.194
public addr = 10.6.224.130
[osd.2]
host = storage3
cluster addr = 10.6.224.195
public addr = 10.6.224.131
[client] ; userspace client
; debug ms = 1
; debug client = 10
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 18:04 Random data corruption in VM, possibly caused by rbd Guido Winkelmann @ 2012-06-07 18:18 ` Stefan Priebe 2012-06-07 18:37 ` Guido Winkelmann 2012-06-07 18:40 ` Oliver Francke 2012-06-07 19:48 ` Josh Durgin 2 siblings, 1 reply; 30+ messages in thread From: Stefan Priebe @ 2012-06-07 18:18 UTC (permalink / raw) To: Guido Winkelmann; +Cc: ceph-devel@vger.kernel.org I think the test script would help a lot so others can test too. Am 07.06.2012 um 20:04 schrieb Guido Winkelmann <guido-ceph@thisisnotatest.de>: > Hi, > > I'm using Ceph with RBD to provide network-transparent disk images for KVM- > based virtual servers. The last two days, I've been hunting some weird elusive > bug where data in the virtual machines would be corrupted in weird ways. It > usually manifests in files having some random data - usually zeroes - at the > start before the actual contents that should be in there start. > > To track this down, I wrote a simple io tester. It does the following: > > - Create 1 Megabyte of random data > - Calculate the SHA256 hash of that data > - Write the data to a file on the harddisk, in a given directory, using the > hash as the filename > - Repeat until the disk is full > - Delete the last file (because it is very likely to be incompletely written) > - Read and delete all the files just written while checking that their sha256 > sums are equal to their filenames > > When running this io tester in a VM that uses a qcow2 file on a local harddisk > for its virtual disk, no errors are found. When the same VM is running using > rbd, the io tester finds on average about one corruption every 200 Megabytes, > reproducably. > > (As in an interesting aside, the io tester also prints how long it took to > read or write 100 MB, and it turns out reading the data back in again is about > three times slower than writing them in the first place...) > > Ceph is version 0.47.2. Qemu KVM is 1.0, compiled with the spec file from > http://pkgs.fedoraproject.org/gitweb/?p=qemu.git;a=summary > (And compiled after ceph 0.47.2 was installed on that machine, so it would use > the correct headers...) > Both the Ceph cluster and the KVM host machines are running on Fedora 16, with > a fairly recent 3.3.x kernel. > The ceph cluster uses btrf for the osd's data dirs. The journal is on a tmpfs. > (This is not a production setup - luckily.) > The virtual machine is using ext4 as its filesystem. > There were no obvious other problems with either the ceph cluster or the KVM > host machines. > > I have attached a copy of the ceph.conf in use, in case it might be helpful. > > This is a huge problem, and any help in tracking it down would be much > appreciated. > > Regards, > > Guido > <ceph.conf> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 18:18 ` Stefan Priebe @ 2012-06-07 18:37 ` Guido Winkelmann 2012-06-07 19:54 ` Andrey Korolyov 2012-06-07 21:53 ` Marcus Sorensen 0 siblings, 2 replies; 30+ messages in thread From: Guido Winkelmann @ 2012-06-07 18:37 UTC (permalink / raw) To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 356 bytes --] Am Donnerstag, 7. Juni 2012, 20:18:52 schrieb Stefan Priebe: > I think the test script would help a lot so others can test too. Okay, I've attached the program. It's barely 2 KB. You need Boost 1.45+, CMake 2.6+ and Crypto++ to compile it. Warning: This will fill up your harddisk completely, which is not a good idea on in-production machines. Guido [-- Attachment #2: iotester.tgz --] [-- Type: application/x-compressed-tar, Size: 1864 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 18:37 ` Guido Winkelmann @ 2012-06-07 19:54 ` Andrey Korolyov 2012-06-07 21:03 ` Guido Winkelmann 2012-06-07 21:53 ` Marcus Sorensen 1 sibling, 1 reply; 30+ messages in thread From: Andrey Korolyov @ 2012-06-07 19:54 UTC (permalink / raw) To: Guido Winkelmann; +Cc: ceph-devel Hmm, can`t reproduce that(phew!). Qemu-1.1-release, 0.47.2, guest/host mainly debian wheezy. Only one main difference with my setup from yours is a underlying fs - I`m tired of btrfs unpredictable load issues and moved back to xfs. BTW you calculate sha1 in test suite, not sha256 as you mentioned above. On Thu, Jun 7, 2012 at 10:37 PM, Guido Winkelmann <guido-ceph@thisisnotatest.de> wrote: > Am Donnerstag, 7. Juni 2012, 20:18:52 schrieb Stefan Priebe: >> I think the test script would help a lot so others can test too. > > Okay, I've attached the program. It's barely 2 KB. You need Boost 1.45+, CMake > 2.6+ and Crypto++ to compile it. > > Warning: This will fill up your harddisk completely, which is not a good idea > on in-production machines. > > Guido -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 19:54 ` Andrey Korolyov @ 2012-06-07 21:03 ` Guido Winkelmann 0 siblings, 0 replies; 30+ messages in thread From: Guido Winkelmann @ 2012-06-07 21:03 UTC (permalink / raw) To: Andrey Korolyov; +Cc: ceph-devel On Thursday 07 June 2012 23:54:04 Andrey Korolyov wrote: > Hmm, can`t reproduce that(phew!). Qemu-1.1-release, 0.47.2, guest/host > mainly debian wheezy. Only one main difference with my setup from > yours is a underlying fs - I`m tired of btrfs unpredictable load > issues and moved back to xfs. I guess I'll try that tomorrow as well. Is there a guide somewhere for switching from btrfs to xfs? > BTW you calculate sha1 in test suite, not sha256 as you mentioned above. True. I must have mixed that up somehow. Not that it matters - for what I'm doing with it here, I might as well be using CRC32. Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 18:37 ` Guido Winkelmann 2012-06-07 19:54 ` Andrey Korolyov @ 2012-06-07 21:53 ` Marcus Sorensen 2012-06-07 22:12 ` Guido Winkelmann 1 sibling, 1 reply; 30+ messages in thread From: Marcus Sorensen @ 2012-06-07 21:53 UTC (permalink / raw) To: Guido Winkelmann; +Cc: Stefan Priebe, ceph-devel@vger.kernel.org Maybe I did something wrong with your iotester, but I had to mkdir ./iotest to get it to run. I straced and found that it died on 'no such file'. On Thu, Jun 7, 2012 at 12:37 PM, Guido Winkelmann <guido-ceph@thisisnotatest.de> wrote: > Am Donnerstag, 7. Juni 2012, 20:18:52 schrieb Stefan Priebe: >> I think the test script would help a lot so others can test too. > > Okay, I've attached the program. It's barely 2 KB. You need Boost 1.45+, CMake > 2.6+ and Crypto++ to compile it. > > Warning: This will fill up your harddisk completely, which is not a good idea > on in-production machines. > > Guido -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 21:53 ` Marcus Sorensen @ 2012-06-07 22:12 ` Guido Winkelmann 0 siblings, 0 replies; 30+ messages in thread From: Guido Winkelmann @ 2012-06-07 22:12 UTC (permalink / raw) To: Marcus Sorensen; +Cc: Stefan Priebe, ceph-devel@vger.kernel.org On Thursday 07 June 2012 15:53:18 Marcus Sorensen wrote: > Maybe I did something wrong with your iotester, but I had to mkdir > ./iotest to get it to run. I straced and found that it died on 'no > such file'. It's a bit quick and dirty... You are supposed to pass the directory where it is to put its test files on the commandline, like ./iotester /mnt/filesystem_to_test Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 18:04 Random data corruption in VM, possibly caused by rbd Guido Winkelmann 2012-06-07 18:18 ` Stefan Priebe @ 2012-06-07 18:40 ` Oliver Francke 2012-06-07 19:48 ` Josh Durgin 2 siblings, 0 replies; 30+ messages in thread From: Oliver Francke @ 2012-06-07 18:40 UTC (permalink / raw) To: Guido Winkelmann; +Cc: ceph-devel@vger.kernel.org Hi Guido, unfortunately this sounds very familiar to me. We have been on a long road with similar "weird" errors. Our setup is something like "start a couple of VM's ( qemu-*), let them create a 1G-file each and randomly seek and write 4MB blocks filled with md5sums of the block as payload, to be verifiable after completely written. Furthermore create some 10000 files every-now-and-then and try to remove them after the verify-run. This produced the same things than you are experiencing - zero'ed blocks - with the main difference, that my tests are now clean with 0.47-2 and friends. After a couple of hundreds of runs. Our setup is with XFS as OSD-data partition, as we had too many errors with btrfs in the past. My assumption now would be, that there are some relations to your filesystem…?! Would be cool if you are able to change your setup to XFS. At least that would be a starting-point for further investigations. Regards, Oliver. Am 07.06.2012 um 20:04 schrieb Guido Winkelmann: > Hi, > > I'm using Ceph with RBD to provide network-transparent disk images for KVM- > based virtual servers. The last two days, I've been hunting some weird elusive > bug where data in the virtual machines would be corrupted in weird ways. It > usually manifests in files having some random data - usually zeroes - at the > start before the actual contents that should be in there start. > > To track this down, I wrote a simple io tester. It does the following: > > - Create 1 Megabyte of random data > - Calculate the SHA256 hash of that data > - Write the data to a file on the harddisk, in a given directory, using the > hash as the filename > - Repeat until the disk is full > - Delete the last file (because it is very likely to be incompletely written) > - Read and delete all the files just written while checking that their sha256 > sums are equal to their filenames > > When running this io tester in a VM that uses a qcow2 file on a local harddisk > for its virtual disk, no errors are found. When the same VM is running using > rbd, the io tester finds on average about one corruption every 200 Megabytes, > reproducably. > > (As in an interesting aside, the io tester also prints how long it took to > read or write 100 MB, and it turns out reading the data back in again is about > three times slower than writing them in the first place...) > > Ceph is version 0.47.2. Qemu KVM is 1.0, compiled with the spec file from > http://pkgs.fedoraproject.org/gitweb/?p=qemu.git;a=summary > (And compiled after ceph 0.47.2 was installed on that machine, so it would use > the correct headers...) > Both the Ceph cluster and the KVM host machines are running on Fedora 16, with > a fairly recent 3.3.x kernel. > The ceph cluster uses btrf for the osd's data dirs. The journal is on a tmpfs. > (This is not a production setup - luckily.) > The virtual machine is using ext4 as its filesystem. > There were no obvious other problems with either the ceph cluster or the KVM > host machines. > > I have attached a copy of the ceph.conf in use, in case it might be helpful. > > This is a huge problem, and any help in tracking it down would be much > appreciated. > > Regards, > > Guido<ceph.conf> -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 18:04 Random data corruption in VM, possibly caused by rbd Guido Winkelmann 2012-06-07 18:18 ` Stefan Priebe 2012-06-07 18:40 ` Oliver Francke @ 2012-06-07 19:48 ` Josh Durgin 2012-06-07 21:36 ` Guido Winkelmann 2012-06-08 12:55 ` Guido Winkelmann 2 siblings, 2 replies; 30+ messages in thread From: Josh Durgin @ 2012-06-07 19:48 UTC (permalink / raw) To: Guido Winkelmann; +Cc: ceph-devel@vger.kernel.org On 06/07/2012 11:04 AM, Guido Winkelmann wrote: > Hi, > > I'm using Ceph with RBD to provide network-transparent disk images for KVM- > based virtual servers. The last two days, I've been hunting some weird elusive > bug where data in the virtual machines would be corrupted in weird ways. It > usually manifests in files having some random data - usually zeroes - at the > start before the actual contents that should be in there start. I definitely want to figure out what's going on with this. A few questions: Are you using rbd caching? If so, what settings? In either case, does the corruption still occur if you switch caching on/off? There are different I/O paths here, and this might tell us if the problem is on the client side. Another thing to try is turning off sparse reads on the osd by setting filestore fiemap threshold = 0 > To track this down, I wrote a simple io tester. It does the following: > > - Create 1 Megabyte of random data > - Calculate the SHA256 hash of that data > - Write the data to a file on the harddisk, in a given directory, using the > hash as the filename > - Repeat until the disk is full > - Delete the last file (because it is very likely to be incompletely written) > - Read and delete all the files just written while checking that their sha256 > sums are equal to their filenames > > When running this io tester in a VM that uses a qcow2 file on a local harddisk > for its virtual disk, no errors are found. When the same VM is running using > rbd, the io tester finds on average about one corruption every 200 Megabytes, > reproducably. > > (As in an interesting aside, the io tester also prints how long it took to > read or write 100 MB, and it turns out reading the data back in again is about > three times slower than writing them in the first place...) > > Ceph is version 0.47.2. Qemu KVM is 1.0, compiled with the spec file from > http://pkgs.fedoraproject.org/gitweb/?p=qemu.git;a=summary > (And compiled after ceph 0.47.2 was installed on that machine, so it would use > the correct headers...) > Both the Ceph cluster and the KVM host machines are running on Fedora 16, with > a fairly recent 3.3.x kernel. Those versions should all work. > The ceph cluster uses btrf for the osd's data dirs. The journal is on a tmpfs. > (This is not a production setup - luckily.) > The virtual machine is using ext4 as its filesystem. > There were no obvious other problems with either the ceph cluster or the KVM > host machines. Were there any nodes with osds restarted during the test runs? I wonder if it's a problem with losing the tmpfs journal. As Oliver suggested, switching the osd data dir filesystem might help too. > I have attached a copy of the ceph.conf in use, in case it might be helpful. > > This is a huge problem, and any help in tracking it down would be much > appreciated. Agreed, and I'm happy to help. Josh > Regards, > > Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 19:48 ` Josh Durgin @ 2012-06-07 21:36 ` Guido Winkelmann 2012-06-07 22:13 ` Tommi Virtanen 2012-06-08 12:55 ` Guido Winkelmann 1 sibling, 1 reply; 30+ messages in thread From: Guido Winkelmann @ 2012-06-07 21:36 UTC (permalink / raw) To: Josh Durgin; +Cc: ceph-devel@vger.kernel.org On Thursday 07 June 2012 12:48:05 Josh Durgin wrote: > On 06/07/2012 11:04 AM, Guido Winkelmann wrote: > > Hi, > > > > I'm using Ceph with RBD to provide network-transparent disk images for > > KVM- > > based virtual servers. The last two days, I've been hunting some weird > > elusive bug where data in the virtual machines would be corrupted in > > weird ways. It usually manifests in files having some random data - > > usually zeroes - at the start before the actual contents that should be > > in there start. > > I definitely want to figure out what's going on with this. > A few questions: > > Are you using rbd caching? If so, what settings? I'm not using rbd caching, and I wasn't planning on even trying before I have a much better understanding of how it affects VM migration. > In either case, does the corruption still occur if you > switch caching on/off? There are different I/O paths here, > and this might tell us if the problem is on the client side. > > Another thing to try is turning off sparse reads on the osd by setting > filestore fiemap threshold = 0 Okay, I will try these things tomorrow. [...] > > The ceph cluster uses btrf for the osd's data dirs. The journal is on a > > tmpfs. (This is not a production setup - luckily.) > > The virtual machine is using ext4 as its filesystem. > > There were no obvious other problems with either the ceph cluster or the > > KVM host machines. > > Were there any nodes with osds restarted during the test runs? I wonder > if it's a problem with losing the tmpfs journal. No, from the point when the rbd volume was created, all nodes were online all the time. No nodes were added or removed. > As Oliver suggested, switching the osd data dir filesystem might help > too. Again, I'll try that tomorrow. BTW, I could use some advice on how to go about that. Right I would stop one osd process (not the whole machine), reformat and remount its btrfs devices as XFS, delete the journal, restart the osd, wait until the cluster is healthy again, repeat for all the osds in the cluster. Is that sufficient? Oh, one other thing I just thought of: The rbd volume in question was created as a copy, using the rbd cp command, from a template volume. I cannot recall seeing any corruption while using the original volume (which was created using rbd import). Maybe the bug only bites volumes that have been created as copies of other volumes? I'll have to do more tests along those lines as well... Regards, Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 21:36 ` Guido Winkelmann @ 2012-06-07 22:13 ` Tommi Virtanen 0 siblings, 0 replies; 30+ messages in thread From: Tommi Virtanen @ 2012-06-07 22:13 UTC (permalink / raw) To: Guido Winkelmann; +Cc: Josh Durgin, ceph-devel@vger.kernel.org On Thu, Jun 7, 2012 at 2:36 PM, Guido Winkelmann <guido-ceph@thisisnotatest.de> wrote: > Again, I'll try that tomorrow. BTW, I could use some advice on how to go about > that. Right I would stop one osd process (not the whole machine), reformat and > remount its btrfs devices as XFS, delete the journal, restart the osd, wait > until the cluster is healthy again, repeat for all the osds in the cluster. Is > that sufficient? Before restarting the osd, you need to do a ceph-osd --mkfs. Otherwise, yeah that looks good. > The rbd volume in question was created as a copy, using the rbd cp command, > from a template volume. I cannot recall seeing any corruption while using the > original volume (which was created using rbd import). Maybe the bug only bites > volumes that have been created as copies of other volumes? I'll have to do > more tests along those lines as well... Hmm. There should be no difference between the end results of rbd cp and rbd import. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-07 19:48 ` Josh Durgin 2012-06-07 21:36 ` Guido Winkelmann @ 2012-06-08 12:55 ` Guido Winkelmann 2012-06-08 13:08 ` Guido Winkelmann 2012-06-08 13:36 ` Oliver Francke 1 sibling, 2 replies; 30+ messages in thread From: Guido Winkelmann @ 2012-06-08 12:55 UTC (permalink / raw) To: Josh Durgin; +Cc: ceph-devel Am Donnerstag, 7. Juni 2012, 12:48:05 schrieben Sie: > On 06/07/2012 11:04 AM, Guido Winkelmann wrote: > > Hi, > > > > I'm using Ceph with RBD to provide network-transparent disk images for > > KVM- > > based virtual servers. The last two days, I've been hunting some weird > > elusive bug where data in the virtual machines would be corrupted in > > weird ways. It usually manifests in files having some random data - > > usually zeroes - at the start before the actual contents that should be > > in there start. > > I definitely want to figure out what's going on with this. > A few questions: > > Are you using rbd caching? If so, what settings? > > In either case, does the corruption still occur if you > switch caching on/off? There are different I/O paths here, > and this might tell us if the problem is on the client side. Okay, I've tried enabling rbd caching now, and so far, the problem appears to be gone. I am using libvirt for starting and managing the virtual machines, and what I did was change the <source> element for the virtual disk from <source protocol='rbd' name='rbd/name_of_image'> to <source protocol='rbd' name='rbd/name_of_image:rbd_cache=true'> and then restart the VM. (I found that in one of your mails on this list; there does not appear to be any proper documentation on this...) The iotester does not find any corruptions with these settings. The VM ist still horribly broken, but that's probably lingering filesystem damage from yesterday. I'll try with a fresh image next. I did not change anything else in the setup. In particular, the OSDs still use btrfs. One of the OSD has been restarted, though. I will run another test with a VM without rbd caching, to make sure it wasn't by random chance restarting that one osd that made the real difference. Enabling btrfs did not appear to make any difference wrt performance, but that's probably because my tests mostly create sustained sequential IO, for which caches are generally not very helpful. Enabling rbd caching is not a solution I particularly like, for two reasons: 1. In my setup, migrating VMs from one host to another is a normal part of operation, and I still don't know ho to prevent data corruption (in the form of silently lost writes) when combining rbd caching and migration. 2. I'm not really looking into speeding up single VM, I'm really more interested in just how many VMs I can run before performance starts degrading for everyone, and I don't think rbd caching will help with that. Regards, Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-08 12:55 ` Guido Winkelmann @ 2012-06-08 13:08 ` Guido Winkelmann 2012-06-08 13:36 ` Oliver Francke 1 sibling, 0 replies; 30+ messages in thread From: Guido Winkelmann @ 2012-06-08 13:08 UTC (permalink / raw) To: Josh Durgin; +Cc: ceph-devel Am Freitag, 8. Juni 2012, 14:55:44 schrieb Guido Winkelmann: > I did not change anything else in the setup. In particular, the OSDs still > use btrfs. One of the OSD has been restarted, though. I will run another > test with a VM without rbd caching, to make sure it wasn't by random chance > restarting that one osd that made the real difference. Did that now, the problem is still reproducable: Start a VM normally -> Lots and lots of data corruption (and it seems to be getting worse...) Start a VM with rbd caching -> No data corruption Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-08 12:55 ` Guido Winkelmann 2012-06-08 13:08 ` Guido Winkelmann @ 2012-06-08 13:36 ` Oliver Francke 2012-06-08 13:55 ` Sage Weil 1 sibling, 1 reply; 30+ messages in thread From: Oliver Francke @ 2012-06-08 13:36 UTC (permalink / raw) To: Guido Winkelmann; +Cc: Josh Durgin, ceph-devel Hi Guido, yeah, there is something weird going on. I just started to establish some test-VM's. Freshly imported from running *.qcow2 images. Kernel panic with INIT, seg-faults and other "funny" stuff. Just added the rbd_cache=true in my config, voila. All is fast-n-up-n-running... All my testing was done with cache enabled... Since our errors all came from rbd_writeback from former ceph-versions... Josh? Sage? Help?! Oliver. On 06/08/2012 02:55 PM, Guido Winkelmann wrote: > Am Donnerstag, 7. Juni 2012, 12:48:05 schrieben Sie: >> On 06/07/2012 11:04 AM, Guido Winkelmann wrote: >>> Hi, >>> >>> I'm using Ceph with RBD to provide network-transparent disk images for >>> KVM- >>> based virtual servers. The last two days, I've been hunting some weird >>> elusive bug where data in the virtual machines would be corrupted in >>> weird ways. It usually manifests in files having some random data - >>> usually zeroes - at the start before the actual contents that should be >>> in there start. >> I definitely want to figure out what's going on with this. >> A few questions: >> >> Are you using rbd caching? If so, what settings? >> >> In either case, does the corruption still occur if you >> switch caching on/off? There are different I/O paths here, >> and this might tell us if the problem is on the client side. > Okay, I've tried enabling rbd caching now, and so far, the problem appears to > be gone. > > I am using libvirt for starting and managing the virtual machines, and what I > did was change the<source> element for the virtual disk from > > <source protocol='rbd' name='rbd/name_of_image'> > > to > > <source protocol='rbd' name='rbd/name_of_image:rbd_cache=true'> > > and then restart the VM. > (I found that in one of your mails on this list; there does not appear to be > any proper documentation on this...) > > The iotester does not find any corruptions with these settings. > > The VM ist still horribly broken, but that's probably lingering filesystem > damage from yesterday. I'll try with a fresh image next. > > I did not change anything else in the setup. In particular, the OSDs still use > btrfs. One of the OSD has been restarted, though. I will run another test with > a VM without rbd caching, to make sure it wasn't by random chance restarting > that one osd that made the real difference. > > Enabling btrfs did not appear to make any difference wrt performance, but > that's probably because my tests mostly create sustained sequential IO, for > which caches are generally not very helpful. > > Enabling rbd caching is not a solution I particularly like, for two reasons: > > 1. In my setup, migrating VMs from one host to another is a normal part of > operation, and I still don't know ho to prevent data corruption (in the form > of silently lost writes) when combining rbd caching and migration. > > 2. I'm not really looking into speeding up single VM, I'm really more > interested in just how many VMs I can run before performance starts degrading > for everyone, and I don't think rbd caching will help with that. > > Regards, > Guido > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Oliver Francke filoo GmbH Moltkestraße 25a 33330 Gütersloh HRB4355 AG Gütersloh Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-08 13:36 ` Oliver Francke @ 2012-06-08 13:55 ` Sage Weil 2012-06-08 14:50 ` Josh Durgin 2012-06-11 15:50 ` Guido Winkelmann 0 siblings, 2 replies; 30+ messages in thread From: Sage Weil @ 2012-06-08 13:55 UTC (permalink / raw) To: Oliver Francke; +Cc: Guido Winkelmann, Josh Durgin, ceph-devel [-- Attachment #1: Type: TEXT/PLAIN, Size: 4530 bytes --] On Fri, 8 Jun 2012, Oliver Francke wrote: > Hi Guido, > > yeah, there is something weird going on. I just started to establish some > test-VM's. Freshly imported from running *.qcow2 images. > Kernel panic with INIT, seg-faults and other "funny" stuff. > > Just added the rbd_cache=true in my config, voila. All is > fast-n-up-n-running... > All my testing was done with cache enabled... Since our errors all came from > rbd_writeback from former ceph-versions... Are you guys able to reproduce the corruption with 'debug osd = 20' and 'debug ms = 1'? Ideally we'd like to: - reproduce from a fresh vm, with osd logs - identify the bad file - map that file to a block offset (see http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) - use that to identify the badness in the log I suspect the cache is just masking the problem because it submits fewer IOs... sage > > Josh? Sage? Help?! > > Oliver. > > On 06/08/2012 02:55 PM, Guido Winkelmann wrote: > > Am Donnerstag, 7. Juni 2012, 12:48:05 schrieben Sie: > > > On 06/07/2012 11:04 AM, Guido Winkelmann wrote: > > > > Hi, > > > > > > > > I'm using Ceph with RBD to provide network-transparent disk images for > > > > KVM- > > > > based virtual servers. The last two days, I've been hunting some weird > > > > elusive bug where data in the virtual machines would be corrupted in > > > > weird ways. It usually manifests in files having some random data - > > > > usually zeroes - at the start before the actual contents that should be > > > > in there start. > > > I definitely want to figure out what's going on with this. > > > A few questions: > > > > > > Are you using rbd caching? If so, what settings? > > > > > > In either case, does the corruption still occur if you > > > switch caching on/off? There are different I/O paths here, > > > and this might tell us if the problem is on the client side. > > Okay, I've tried enabling rbd caching now, and so far, the problem appears > > to > > be gone. > > > > I am using libvirt for starting and managing the virtual machines, and what > > I > > did was change the<source> element for the virtual disk from > > > > <source protocol='rbd' name='rbd/name_of_image'> > > > > to > > > > <source protocol='rbd' name='rbd/name_of_image:rbd_cache=true'> > > > > and then restart the VM. > > (I found that in one of your mails on this list; there does not appear to be > > any proper documentation on this...) > > > > The iotester does not find any corruptions with these settings. > > > > The VM ist still horribly broken, but that's probably lingering filesystem > > damage from yesterday. I'll try with a fresh image next. > > > > I did not change anything else in the setup. In particular, the OSDs still > > use > > btrfs. One of the OSD has been restarted, though. I will run another test > > with > > a VM without rbd caching, to make sure it wasn't by random chance restarting > > that one osd that made the real difference. > > > > Enabling btrfs did not appear to make any difference wrt performance, but > > that's probably because my tests mostly create sustained sequential IO, for > > which caches are generally not very helpful. > > > > Enabling rbd caching is not a solution I particularly like, for two reasons: > > > > 1. In my setup, migrating VMs from one host to another is a normal part of > > operation, and I still don't know ho to prevent data corruption (in the form > > of silently lost writes) when combining rbd caching and migration. > > > > 2. I'm not really looking into speeding up single VM, I'm really more > > interested in just how many VMs I can run before performance starts > > degrading > > for everyone, and I don't think rbd caching will help with that. > > > > Regards, > > Guido > > > > -- > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > > Oliver Francke > > filoo GmbH > Moltkestraße 25a > 33330 Gütersloh > HRB4355 AG Gütersloh > > Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz > > Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-08 13:55 ` Sage Weil @ 2012-06-08 14:50 ` Josh Durgin 2012-06-08 15:39 ` Oliver Francke 2012-06-08 17:15 ` Guido Winkelmann 2012-06-11 15:50 ` Guido Winkelmann 1 sibling, 2 replies; 30+ messages in thread From: Josh Durgin @ 2012-06-08 14:50 UTC (permalink / raw) To: Sage Weil; +Cc: Oliver Francke, Guido Winkelmann, ceph-devel On 06/08/2012 06:55 AM, Sage Weil wrote: > On Fri, 8 Jun 2012, Oliver Francke wrote: >> Hi Guido, >> >> yeah, there is something weird going on. I just started to establish some >> test-VM's. Freshly imported from running *.qcow2 images. >> Kernel panic with INIT, seg-faults and other "funny" stuff. >> >> Just added the rbd_cache=true in my config, voila. All is >> fast-n-up-n-running... >> All my testing was done with cache enabled... Since our errors all came from >> rbd_writeback from former ceph-versions... > > Are you guys able to reproduce the corruption with 'debug osd = 20' and > 'debug ms = 1'? Ideally we'd like to: > > - reproduce from a fresh vm, with osd logs > - identify the bad file > - map that file to a block offset (see > http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) > - use that to identify the badness in the log > > I suspect the cache is just masking the problem because it submits fewer > IOs... The cache also doesn't do sparse reads. Is it still reproducible with a fresh vm when you set filestore_fiemap_threshold = 0 for the osds, and run without rbd caching? Josh > sage > > >> >> Josh? Sage? Help?! >> >> Oliver. >> >> On 06/08/2012 02:55 PM, Guido Winkelmann wrote: >>> Am Donnerstag, 7. Juni 2012, 12:48:05 schrieben Sie: >>>> On 06/07/2012 11:04 AM, Guido Winkelmann wrote: >>>>> Hi, >>>>> >>>>> I'm using Ceph with RBD to provide network-transparent disk images for >>>>> KVM- >>>>> based virtual servers. The last two days, I've been hunting some weird >>>>> elusive bug where data in the virtual machines would be corrupted in >>>>> weird ways. It usually manifests in files having some random data - >>>>> usually zeroes - at the start before the actual contents that should be >>>>> in there start. >>>> I definitely want to figure out what's going on with this. >>>> A few questions: >>>> >>>> Are you using rbd caching? If so, what settings? >>>> >>>> In either case, does the corruption still occur if you >>>> switch caching on/off? There are different I/O paths here, >>>> and this might tell us if the problem is on the client side. >>> Okay, I've tried enabling rbd caching now, and so far, the problem appears >>> to >>> be gone. >>> >>> I am using libvirt for starting and managing the virtual machines, and what >>> I >>> did was change the<source> element for the virtual disk from >>> >>> <source protocol='rbd' name='rbd/name_of_image'> >>> >>> to >>> >>> <source protocol='rbd' name='rbd/name_of_image:rbd_cache=true'> >>> >>> and then restart the VM. >>> (I found that in one of your mails on this list; there does not appear to be >>> any proper documentation on this...) >>> >>> The iotester does not find any corruptions with these settings. >>> >>> The VM ist still horribly broken, but that's probably lingering filesystem >>> damage from yesterday. I'll try with a fresh image next. >>> >>> I did not change anything else in the setup. In particular, the OSDs still >>> use >>> btrfs. One of the OSD has been restarted, though. I will run another test >>> with >>> a VM without rbd caching, to make sure it wasn't by random chance restarting >>> that one osd that made the real difference. >>> >>> Enabling btrfs did not appear to make any difference wrt performance, but >>> that's probably because my tests mostly create sustained sequential IO, for >>> which caches are generally not very helpful. >>> >>> Enabling rbd caching is not a solution I particularly like, for two reasons: >>> >>> 1. In my setup, migrating VMs from one host to another is a normal part of >>> operation, and I still don't know ho to prevent data corruption (in the form >>> of silently lost writes) when combining rbd caching and migration. >>> >>> 2. I'm not really looking into speeding up single VM, I'm really more >>> interested in just how many VMs I can run before performance starts >>> degrading >>> for everyone, and I don't think rbd caching will help with that. >>> >>> Regards, >>> Guido >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> -- >> >> Oliver Francke >> >> filoo GmbH >> Moltkestraße 25a >> 33330 Gütersloh >> HRB4355 AG Gütersloh >> >> Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz >> >> Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh >> >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-08 14:50 ` Josh Durgin @ 2012-06-08 15:39 ` Oliver Francke 2012-06-08 17:15 ` Guido Winkelmann 1 sibling, 0 replies; 30+ messages in thread From: Oliver Francke @ 2012-06-08 15:39 UTC (permalink / raw) To: Josh Durgin; +Cc: Sage Weil, Guido Winkelmann, ceph-devel Well then, quite busy, too with some other stuff, but... On 06/08/2012 04:50 PM, Josh Durgin wrote: > On 06/08/2012 06:55 AM, Sage Weil wrote: >> On Fri, 8 Jun 2012, Oliver Francke wrote: >>> Hi Guido, >>> >>> yeah, there is something weird going on. I just started to establish >>> some >>> test-VM's. Freshly imported from running *.qcow2 images. >>> Kernel panic with INIT, seg-faults and other "funny" stuff. >>> >>> Just added the rbd_cache=true in my config, voila. All is >>> fast-n-up-n-running... >>> All my testing was done with cache enabled... Since our errors all >>> came from >>> rbd_writeback from former ceph-versions... >> >> Are you guys able to reproduce the corruption with 'debug osd = 20' and >> 'debug ms = 1'? Ideally we'd like to: >> >> - reproduce from a fresh vm, with osd logs >> - identify the bad file >> - map that file to a block offset (see >> http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) >> - use that to identify the badness in the log a logfile with debugging is available at our local store... >> >> I suspect the cache is just masking the problem because it submits fewer >> IOs... > > The cache also doesn't do sparse reads. Is it still reproducible with > a fresh vm when you set filestore_fiemap_threshold = 0 for the osds, > and run without rbd caching? restarted OSDs with this setting, but without rbd_cache I still get errors. *sigh* Oliver. > > Josh > >> sage >> >> >>> >>> Josh? Sage? Help?! >>> >>> Oliver. >>> >>> On 06/08/2012 02:55 PM, Guido Winkelmann wrote: >>>> Am Donnerstag, 7. Juni 2012, 12:48:05 schrieben Sie: >>>>> On 06/07/2012 11:04 AM, Guido Winkelmann wrote: >>>>>> Hi, >>>>>> >>>>>> I'm using Ceph with RBD to provide network-transparent disk >>>>>> images for >>>>>> KVM- >>>>>> based virtual servers. The last two days, I've been hunting some >>>>>> weird >>>>>> elusive bug where data in the virtual machines would be corrupted in >>>>>> weird ways. It usually manifests in files having some random data - >>>>>> usually zeroes - at the start before the actual contents that >>>>>> should be >>>>>> in there start. >>>>> I definitely want to figure out what's going on with this. >>>>> A few questions: >>>>> >>>>> Are you using rbd caching? If so, what settings? >>>>> >>>>> In either case, does the corruption still occur if you >>>>> switch caching on/off? There are different I/O paths here, >>>>> and this might tell us if the problem is on the client side. >>>> Okay, I've tried enabling rbd caching now, and so far, the problem >>>> appears >>>> to >>>> be gone. >>>> >>>> I am using libvirt for starting and managing the virtual machines, >>>> and what >>>> I >>>> did was change the<source> element for the virtual disk from >>>> >>>> <source protocol='rbd' name='rbd/name_of_image'> >>>> >>>> to >>>> >>>> <source protocol='rbd' name='rbd/name_of_image:rbd_cache=true'> >>>> >>>> and then restart the VM. >>>> (I found that in one of your mails on this list; there does not >>>> appear to be >>>> any proper documentation on this...) >>>> >>>> The iotester does not find any corruptions with these settings. >>>> >>>> The VM ist still horribly broken, but that's probably lingering >>>> filesystem >>>> damage from yesterday. I'll try with a fresh image next. >>>> >>>> I did not change anything else in the setup. In particular, the >>>> OSDs still >>>> use >>>> btrfs. One of the OSD has been restarted, though. I will run >>>> another test >>>> with >>>> a VM without rbd caching, to make sure it wasn't by random chance >>>> restarting >>>> that one osd that made the real difference. >>>> >>>> Enabling btrfs did not appear to make any difference wrt >>>> performance, but >>>> that's probably because my tests mostly create sustained sequential >>>> IO, for >>>> which caches are generally not very helpful. >>>> >>>> Enabling rbd caching is not a solution I particularly like, for two >>>> reasons: >>>> >>>> 1. In my setup, migrating VMs from one host to another is a normal >>>> part of >>>> operation, and I still don't know ho to prevent data corruption (in >>>> the form >>>> of silently lost writes) when combining rbd caching and migration. >>>> >>>> 2. I'm not really looking into speeding up single VM, I'm really more >>>> interested in just how many VMs I can run before performance starts >>>> degrading >>>> for everyone, and I don't think rbd caching will help with that. >>>> >>>> Regards, >>>> Guido >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe >>>> ceph-devel" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >>> -- >>> >>> Oliver Francke >>> >>> filoo GmbH >>> Moltkestraße 25a >>> 33330 Gütersloh >>> HRB4355 AG Gütersloh >>> >>> Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz >>> >>> Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe >>> ceph-devel" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> > -- Oliver Francke filoo GmbH Moltkestraße 25a 33330 Gütersloh HRB4355 AG Gütersloh Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-08 14:50 ` Josh Durgin 2012-06-08 15:39 ` Oliver Francke @ 2012-06-08 17:15 ` Guido Winkelmann 2012-06-10 3:04 ` Sage Weil 1 sibling, 1 reply; 30+ messages in thread From: Guido Winkelmann @ 2012-06-08 17:15 UTC (permalink / raw) To: Josh Durgin; +Cc: Sage Weil, Oliver Francke, ceph-devel Am Freitag, 8. Juni 2012, 07:50:36 schrieb Josh Durgin: > On 06/08/2012 06:55 AM, Sage Weil wrote: > > On Fri, 8 Jun 2012, Oliver Francke wrote: > >> Hi Guido, > >> > >> yeah, there is something weird going on. I just started to establish some > >> test-VM's. Freshly imported from running *.qcow2 images. > >> Kernel panic with INIT, seg-faults and other "funny" stuff. > >> > >> Just added the rbd_cache=true in my config, voila. All is > >> fast-n-up-n-running... > >> All my testing was done with cache enabled... Since our errors all came > >> from rbd_writeback from former ceph-versions... > > > > Are you guys able to reproduce the corruption with 'debug osd = 20' and > > > > 'debug ms = 1'? Ideally we'd like to: > > - reproduce from a fresh vm, with osd logs > > - identify the bad file > > - map that file to a block offset (see > > > > http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) > > > > - use that to identify the badness in the log > > > > I suspect the cache is just masking the problem because it submits fewer > > IOs... > > The cache also doesn't do sparse reads. Is it still reproducible with > a fresh vm when you set filestore_fiemap_threshold = 0 for the osds, > and run without rbd caching? I have set filestore_fiemap_threshold = 0 on all osds and restarted them. The problem is still there, and so bad I cannot even run this fiemap utility that Sage posted. I guess I should have tried booting the VM from a livecd instead... Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-08 17:15 ` Guido Winkelmann @ 2012-06-10 3:04 ` Sage Weil 2012-06-10 3:07 ` Sage Weil 2012-06-11 14:15 ` Guido Winkelmann 0 siblings, 2 replies; 30+ messages in thread From: Sage Weil @ 2012-06-10 3:04 UTC (permalink / raw) To: Guido Winkelmann; +Cc: Josh Durgin, Sage Weil, Oliver Francke, ceph-devel On Fri, 8 Jun 2012, Guido Winkelmann wrote: > Am Freitag, 8. Juni 2012, 07:50:36 schrieb Josh Durgin: > > On 06/08/2012 06:55 AM, Sage Weil wrote: > > > On Fri, 8 Jun 2012, Oliver Francke wrote: > > >> Hi Guido, > > >> > > >> yeah, there is something weird going on. I just started to establish some > > >> test-VM's. Freshly imported from running *.qcow2 images. > > >> Kernel panic with INIT, seg-faults and other "funny" stuff. > > >> > > >> Just added the rbd_cache=true in my config, voila. All is > > >> fast-n-up-n-running... > > >> All my testing was done with cache enabled... Since our errors all came > > >> from rbd_writeback from former ceph-versions... > > > > > > Are you guys able to reproduce the corruption with 'debug osd = 20' and > > > > > > 'debug ms = 1'? Ideally we'd like to: > > > - reproduce from a fresh vm, with osd logs > > > - identify the bad file > > > - map that file to a block offset (see > > > > > > http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) > > > > > > - use that to identify the badness in the log > > > > > > I suspect the cache is just masking the problem because it submits fewer > > > IOs... > > > > The cache also doesn't do sparse reads. Is it still reproducible with > > a fresh vm when you set filestore_fiemap_threshold = 0 for the osds, > > and run without rbd caching? > > I have set filestore_fiemap_threshold = 0 on all osds and restarted them. The > problem is still there, and so bad I cannot even run this fiemap utility that > Sage posted. I guess I should have tried booting the VM from a livecd > instead... Whoops, filestore fiemap threshold = 0 doesn't turn it off, but filestore fiemap = false will. (Or filestore fiemap threshold = 1000000000 would too.) Can you try again with the above? Thanks! sage ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-10 3:04 ` Sage Weil @ 2012-06-10 3:07 ` Sage Weil 2012-06-11 14:15 ` Guido Winkelmann 1 sibling, 0 replies; 30+ messages in thread From: Sage Weil @ 2012-06-10 3:07 UTC (permalink / raw) To: Sage Weil; +Cc: Guido Winkelmann, Josh Durgin, Oliver Francke, ceph-devel On Sat, 9 Jun 2012, Sage Weil wrote: > On Fri, 8 Jun 2012, Guido Winkelmann wrote: > > Am Freitag, 8. Juni 2012, 07:50:36 schrieb Josh Durgin: > > > On 06/08/2012 06:55 AM, Sage Weil wrote: > > > > On Fri, 8 Jun 2012, Oliver Francke wrote: > > > >> Hi Guido, > > > >> > > > >> yeah, there is something weird going on. I just started to establish some > > > >> test-VM's. Freshly imported from running *.qcow2 images. > > > >> Kernel panic with INIT, seg-faults and other "funny" stuff. > > > >> > > > >> Just added the rbd_cache=true in my config, voila. All is > > > >> fast-n-up-n-running... > > > >> All my testing was done with cache enabled... Since our errors all came > > > >> from rbd_writeback from former ceph-versions... > > > > > > > > Are you guys able to reproduce the corruption with 'debug osd = 20' and > > > > > > > > 'debug ms = 1'? Ideally we'd like to: > > > > - reproduce from a fresh vm, with osd logs > > > > - identify the bad file > > > > - map that file to a block offset (see > > > > > > > > http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) > > > > > > > > - use that to identify the badness in the log > > > > > > > > I suspect the cache is just masking the problem because it submits fewer > > > > IOs... > > > > > > The cache also doesn't do sparse reads. Is it still reproducible with > > > a fresh vm when you set filestore_fiemap_threshold = 0 for the osds, > > > and run without rbd caching? > > > > I have set filestore_fiemap_threshold = 0 on all osds and restarted them. The > > problem is still there, and so bad I cannot even run this fiemap utility that > > Sage posted. I guess I should have tried booting the VM from a livecd > > instead... > > Whoops, > > filestore fiemap threshold = 0 > > doesn't turn it off, but > > filestore fiemap = false > > will. (Or filestore fiemap threshold = 1000000000 would too.) Can you > try again with the above? BTW, I opened http://tracker.newdream.net/issues/2535 with (I think) all the relevant info so far. sage ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-10 3:04 ` Sage Weil 2012-06-10 3:07 ` Sage Weil @ 2012-06-11 14:15 ` Guido Winkelmann 1 sibling, 0 replies; 30+ messages in thread From: Guido Winkelmann @ 2012-06-11 14:15 UTC (permalink / raw) To: Sage Weil; +Cc: Josh Durgin, Oliver Francke, ceph-devel Am Samstag, 9. Juni 2012, 20:04:20 schrieb Sage Weil: > On Fri, 8 Jun 2012, Guido Winkelmann wrote: > > Am Freitag, 8. Juni 2012, 07:50:36 schrieb Josh Durgin: > > > On 06/08/2012 06:55 AM, Sage Weil wrote: > > > > On Fri, 8 Jun 2012, Oliver Francke wrote: > > > >> Hi Guido, > > > >> > > > >> yeah, there is something weird going on. I just started to establish > > > >> some > > > >> test-VM's. Freshly imported from running *.qcow2 images. > > > >> Kernel panic with INIT, seg-faults and other "funny" stuff. > > > >> > > > >> Just added the rbd_cache=true in my config, voila. All is > > > >> fast-n-up-n-running... > > > >> All my testing was done with cache enabled... Since our errors all > > > >> came > > > >> from rbd_writeback from former ceph-versions... > > > > > > > > Are you guys able to reproduce the corruption with 'debug osd = 20' > > > > and > > > > > > > > 'debug ms = 1'? Ideally we'd like to: > > > > - reproduce from a fresh vm, with osd logs > > > > - identify the bad file > > > > - map that file to a block offset (see > > > > > > > > http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) > > > > > > > > - use that to identify the badness in the log > > > > > > > > I suspect the cache is just masking the problem because it submits > > > > fewer > > > > IOs... > > > > > > The cache also doesn't do sparse reads. Is it still reproducible with > > > a fresh vm when you set filestore_fiemap_threshold = 0 for the osds, > > > and run without rbd caching? > > > > I have set filestore_fiemap_threshold = 0 on all osds and restarted them. > > The problem is still there, and so bad I cannot even run this fiemap > > utility that Sage posted. I guess I should have tried booting the VM from > > a livecd instead... > > Whoops, > > filestore fiemap threshold = 0 > > doesn't turn it off, but > > filestore fiemap = false Okay, I changed "filestore fiemap threshold = 0" to "filestore fiemap = false" under [osd]. So far, the problem does not seem to resurface. Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-08 13:55 ` Sage Weil 2012-06-08 14:50 ` Josh Durgin @ 2012-06-11 15:50 ` Guido Winkelmann 2012-06-11 16:30 ` Sage Weil 1 sibling, 1 reply; 30+ messages in thread From: Guido Winkelmann @ 2012-06-11 15:50 UTC (permalink / raw) To: Sage Weil; +Cc: Oliver Francke, Josh Durgin, ceph-devel Am Freitag, 8. Juni 2012, 06:55:19 schrieb Sage Weil: > On Fri, 8 Jun 2012, Oliver Francke wrote: > > Hi Guido, > > > > yeah, there is something weird going on. I just started to establish some > > test-VM's. Freshly imported from running *.qcow2 images. > > Kernel panic with INIT, seg-faults and other "funny" stuff. > > > > Just added the rbd_cache=true in my config, voila. All is > > fast-n-up-n-running... > > All my testing was done with cache enabled... Since our errors all came > > from rbd_writeback from former ceph-versions... > > Are you guys able to reproduce the corruption with 'debug osd = 20' and > 'debug ms = 1'? Ideally we'd like to: > > - reproduce from a fresh vm, with osd logs > - identify the bad file > - map that file to a block offset (see > http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) > - use that to identify the badness in the log > > I suspect the cache is just masking the problem because it submits fewer > IOs... Okay, I added 'debug osd = 20' and 'debug ms = 1' under [global] and 'filestore fiemap = false' under [osd] and started a new VM. That worked nicely, and the iotester found no corruptions. Then I removed 'filestore fiemap = false' from the config, restarted all osds and ran the iotester again. Output is as follows: testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date Mon Jun 11 17:34:44 CEST 2012 Wrote 100 MiB of data in 1943 milliseconds Wrote 100 MiB of data in 1858 milliseconds Wrote 100 MiB of data in 2213 milliseconds Wrote 100 MiB of data in 3441 milliseconds Wrote 100 MiB of data in 2705 milliseconds Wrote 100 MiB of data in 1778 milliseconds Wrote 100 MiB of data in 1974 milliseconds Wrote 100 MiB of data in 2780 milliseconds Wrote 100 MiB of data in 1961 milliseconds Wrote 100 MiB of data in 2366 milliseconds Wrote 100 MiB of data in 1886 milliseconds Wrote 100 MiB of data in 3589 milliseconds Wrote 100 MiB of data in 1973 milliseconds Wrote 100 MiB of data in 2506 milliseconds Wrote 100 MiB of data in 1937 milliseconds Wrote 100 MiB of data in 3404 milliseconds Wrote 100 MiB of data in 1990 milliseconds Wrote 100 MiB of data in 3713 milliseconds Read 100 MiB of data in 4856 milliseconds Digest wrong for file "/var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa" Mon Jun 11 17:35:34 CEST 2012 testserver-rbd11 iotester # ~/fiemap /var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa File /var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa has 1 extents: # Logical Physical Length Flags 0: 0000000000000000 00000000a8200000 0000000000100000 000 I looked into the file in question, and it started with zero-bytes from the start until position 0xbff, even though it was supposed to all random data. I have included timestamps in the hopes they might make it easier to find the related entries in the logs. So what do I do now? The logs are very large and complex, and I don't understand most of what's in there. I don't even know which OSD served that particular block/object. Regards, Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-11 15:50 ` Guido Winkelmann @ 2012-06-11 16:30 ` Sage Weil 2012-06-11 17:07 ` Guido Winkelmann 2012-06-12 12:31 ` Guido Winkelmann 0 siblings, 2 replies; 30+ messages in thread From: Sage Weil @ 2012-06-11 16:30 UTC (permalink / raw) To: Guido Winkelmann; +Cc: Sage Weil, Oliver Francke, Josh Durgin, ceph-devel On Mon, 11 Jun 2012, Guido Winkelmann wrote: > Am Freitag, 8. Juni 2012, 06:55:19 schrieb Sage Weil: > > On Fri, 8 Jun 2012, Oliver Francke wrote: > > > Hi Guido, > > > > > > yeah, there is something weird going on. I just started to establish some > > > test-VM's. Freshly imported from running *.qcow2 images. > > > Kernel panic with INIT, seg-faults and other "funny" stuff. > > > > > > Just added the rbd_cache=true in my config, voila. All is > > > fast-n-up-n-running... > > > All my testing was done with cache enabled... Since our errors all came > > > from rbd_writeback from former ceph-versions... > > > > Are you guys able to reproduce the corruption with 'debug osd = 20' and > > 'debug ms = 1'? Ideally we'd like to: > > > > - reproduce from a fresh vm, with osd logs > > - identify the bad file > > - map that file to a block offset (see > > http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) > > - use that to identify the badness in the log > > > > I suspect the cache is just masking the problem because it submits fewer > > IOs... > > Okay, I added 'debug osd = 20' and 'debug ms = 1' under [global] and > 'filestore fiemap = false' under [osd] and started a new VM. That worked > nicely, and the iotester found no corruptions. Then I removed 'filestore > fiemap = false' from the config, restarted all osds and ran the iotester > again. Output is as follows: > > testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date > Mon Jun 11 17:34:44 CEST 2012 > Wrote 100 MiB of data in 1943 milliseconds > Wrote 100 MiB of data in 1858 milliseconds > Wrote 100 MiB of data in 2213 milliseconds > Wrote 100 MiB of data in 3441 milliseconds > Wrote 100 MiB of data in 2705 milliseconds > Wrote 100 MiB of data in 1778 milliseconds > Wrote 100 MiB of data in 1974 milliseconds > Wrote 100 MiB of data in 2780 milliseconds > Wrote 100 MiB of data in 1961 milliseconds > Wrote 100 MiB of data in 2366 milliseconds > Wrote 100 MiB of data in 1886 milliseconds > Wrote 100 MiB of data in 3589 milliseconds > Wrote 100 MiB of data in 1973 milliseconds > Wrote 100 MiB of data in 2506 milliseconds > Wrote 100 MiB of data in 1937 milliseconds > Wrote 100 MiB of data in 3404 milliseconds > Wrote 100 MiB of data in 1990 milliseconds > Wrote 100 MiB of data in 3713 milliseconds > Read 100 MiB of data in 4856 milliseconds > Digest wrong for file "/var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa" > Mon Jun 11 17:35:34 CEST 2012 > testserver-rbd11 iotester # ~/fiemap > /var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa > File /var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa has 1 extents: > # Logical Physical Length Flags > 0: 0000000000000000 00000000a8200000 0000000000100000 000 > > I looked into the file in question, and it started with zero-bytes from the > start until position 0xbff, even though it was supposed to all random data. > > I have included timestamps in the hopes they might make it easier to find the > related entries in the logs. > > So what do I do now? The logs are very large and complex, and I don't > understand most of what's in there. I don't even know which OSD served that > particular block/object. If you can reproduce it with 'debug filestore = 20' too, that will be better, as it will tell us what the FIEMAP ioctl is returning. Also, if you can attach/post the contents of the object itself (rados -p rbd get rb.0.1.0000000002a0 /tmp/foo) we can make sure the object has the right data (and the sparse-read operation that librbd is doing is the culprit). As for the log: First, map the offset to an rbd block. For example, taking the 'Physical' value of 00000000a8200000 from above: $ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) )) 0000000002a0 Then figure out what the object name prefix is: $ rbd info <imagename> | grep prefix block_name_prefix: rb.0.1 Then add the block number, 0000000002a0 to that, e.g. rb.0.1.0000000002a0. Then map that back to an osd with $ ceph osd map rbd rb.0.1.0000000002a0 osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65 (2.5) -> up [0,2] acting [0,2] You'll see the osd ids listed in brackets after 'active'. We want the first one, 0 in my example. The log from that OSD is what we need. Thanks! sage ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-11 16:30 ` Sage Weil @ 2012-06-11 17:07 ` Guido Winkelmann 2012-06-11 17:12 ` Sage Weil 2012-06-11 17:29 ` Josh Durgin 2012-06-12 12:31 ` Guido Winkelmann 1 sibling, 2 replies; 30+ messages in thread From: Guido Winkelmann @ 2012-06-11 17:07 UTC (permalink / raw) To: Sage Weil; +Cc: Oliver Francke, Josh Durgin, ceph-devel Am Montag, 11. Juni 2012, 09:30:42 schrieb Sage Weil: > On Mon, 11 Jun 2012, Guido Winkelmann wrote: > > Am Freitag, 8. Juni 2012, 06:55:19 schrieb Sage Weil: > > > On Fri, 8 Jun 2012, Oliver Francke wrote: > > > Are you guys able to reproduce the corruption with 'debug osd = 20' and > > > > > > 'debug ms = 1'? Ideally we'd like to: > > > - reproduce from a fresh vm, with osd logs > > > - identify the bad file > > > - map that file to a block offset (see > > > > > > http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) > > > > > > - use that to identify the badness in the log > > > > > > I suspect the cache is just masking the problem because it submits fewer > > > IOs... > > > > Okay, I added 'debug osd = 20' and 'debug ms = 1' under [global] and > > 'filestore fiemap = false' under [osd] and started a new VM. That worked > > nicely, and the iotester found no corruptions. Then I removed 'filestore > > fiemap = false' from the config, restarted all osds and ran the iotester > > again. Output is as follows: > > > > testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date > > Mon Jun 11 17:34:44 CEST 2012 > > Wrote 100 MiB of data in 1943 milliseconds > > Wrote 100 MiB of data in 1858 milliseconds > > Wrote 100 MiB of data in 2213 milliseconds > > Wrote 100 MiB of data in 3441 milliseconds > > Wrote 100 MiB of data in 2705 milliseconds > > Wrote 100 MiB of data in 1778 milliseconds > > Wrote 100 MiB of data in 1974 milliseconds > > Wrote 100 MiB of data in 2780 milliseconds > > Wrote 100 MiB of data in 1961 milliseconds > > Wrote 100 MiB of data in 2366 milliseconds > > Wrote 100 MiB of data in 1886 milliseconds > > Wrote 100 MiB of data in 3589 milliseconds > > Wrote 100 MiB of data in 1973 milliseconds > > Wrote 100 MiB of data in 2506 milliseconds > > Wrote 100 MiB of data in 1937 milliseconds > > Wrote 100 MiB of data in 3404 milliseconds > > Wrote 100 MiB of data in 1990 milliseconds > > Wrote 100 MiB of data in 3713 milliseconds > > Read 100 MiB of data in 4856 milliseconds > > Digest wrong for file > > "/var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa" Mon Jun 11 > > 17:35:34 CEST 2012 > > testserver-rbd11 iotester # ~/fiemap > > /var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa > > File /var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa has 1 extents: > > # Logical Physical Length Flags > > 0: 0000000000000000 00000000a8200000 0000000000100000 000 > > > > I looked into the file in question, and it started with zero-bytes from > > the > > start until position 0xbff, even though it was supposed to all random > > data. > > > > I have included timestamps in the hopes they might make it easier to find > > the related entries in the logs. > > > > So what do I do now? The logs are very large and complex, and I don't > > understand most of what's in there. I don't even know which OSD served > > that > > particular block/object. > > If you can reproduce it with 'debug filestore = 20' too, that will be > better, as it will tell us what the FIEMAP ioctl is returning. Also, if > you can attach/post the contents of the object itself (rados -p rbd get > rb.0.1.0000000002a0 /tmp/foo) we can make sure the object has the right > data (and the sparse-read operation that librbd is doing is the culprit). Um. Maybe... That's the problem with using random data, I can't just look at it and recognize it. I guess tomorrow I'll slap something together to see if I can find any 1 Meg-range of data in there that matches the expect checksum. > > As for the log: > > First, map the offset to an rbd block. For example, taking the 'Physical' > value of 00000000a8200000 from above: > > $ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) )) > 0000000002a0 > > Then figure out what the object name prefix is: > > $ rbd info <imagename> | grep prefix > block_name_prefix: rb.0.1 > > Then add the block number, 0000000002a0 to that, e.g. rb.0.1.0000000002a0. > > Then map that back to an osd with > > $ ceph osd map rbd rb.0.1.0000000002a0 > osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65 > (2.5) -> up [0,2] acting [0,2] > > You'll see the osd ids listed in brackets after 'active'. We want the > first one, 0 in my example. The log from that OSD is what we need. I'm getting osdmap e89 pool 'rbd' (2) object 'rb.0.13.0000000002a0' -> pg 2.aca5eccb (2.4b) -> up [1,2] acting [1,2] from that command, so I guess it's osd.1 then. Do you have somewhere I can upload the log? It is 1.1 GiB in size. Bzip2 gets it down to 53 MiB, but that's still too large to be sent to a mailing list... Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-11 17:07 ` Guido Winkelmann @ 2012-06-11 17:12 ` Sage Weil 2012-06-11 17:29 ` Josh Durgin 1 sibling, 0 replies; 30+ messages in thread From: Sage Weil @ 2012-06-11 17:12 UTC (permalink / raw) To: Guido Winkelmann; +Cc: Oliver Francke, Josh Durgin, ceph-devel On Mon, 11 Jun 2012, Guido Winkelmann wrote: > Am Montag, 11. Juni 2012, 09:30:42 schrieb Sage Weil: > > On Mon, 11 Jun 2012, Guido Winkelmann wrote: > > > Am Freitag, 8. Juni 2012, 06:55:19 schrieb Sage Weil: > > > > On Fri, 8 Jun 2012, Oliver Francke wrote: > > > > > Are you guys able to reproduce the corruption with 'debug osd = 20' and > > > > > > > > 'debug ms = 1'? Ideally we'd like to: > > > > - reproduce from a fresh vm, with osd logs > > > > - identify the bad file > > > > - map that file to a block offset (see > > > > > > > > http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) > > > > > > > > - use that to identify the badness in the log > > > > > > > > I suspect the cache is just masking the problem because it submits fewer > > > > IOs... > > > > > > Okay, I added 'debug osd = 20' and 'debug ms = 1' under [global] and > > > 'filestore fiemap = false' under [osd] and started a new VM. That worked > > > nicely, and the iotester found no corruptions. Then I removed 'filestore > > > fiemap = false' from the config, restarted all osds and ran the iotester > > > again. Output is as follows: > > > > > > testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date > > > Mon Jun 11 17:34:44 CEST 2012 > > > Wrote 100 MiB of data in 1943 milliseconds > > > Wrote 100 MiB of data in 1858 milliseconds > > > Wrote 100 MiB of data in 2213 milliseconds > > > Wrote 100 MiB of data in 3441 milliseconds > > > Wrote 100 MiB of data in 2705 milliseconds > > > Wrote 100 MiB of data in 1778 milliseconds > > > Wrote 100 MiB of data in 1974 milliseconds > > > Wrote 100 MiB of data in 2780 milliseconds > > > Wrote 100 MiB of data in 1961 milliseconds > > > Wrote 100 MiB of data in 2366 milliseconds > > > Wrote 100 MiB of data in 1886 milliseconds > > > Wrote 100 MiB of data in 3589 milliseconds > > > Wrote 100 MiB of data in 1973 milliseconds > > > Wrote 100 MiB of data in 2506 milliseconds > > > Wrote 100 MiB of data in 1937 milliseconds > > > Wrote 100 MiB of data in 3404 milliseconds > > > Wrote 100 MiB of data in 1990 milliseconds > > > Wrote 100 MiB of data in 3713 milliseconds > > > Read 100 MiB of data in 4856 milliseconds > > > Digest wrong for file > > > "/var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa" Mon Jun 11 > > > 17:35:34 CEST 2012 > > > testserver-rbd11 iotester # ~/fiemap > > > /var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa > > > File /var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa has 1 extents: > > > # Logical Physical Length Flags > > > 0: 0000000000000000 00000000a8200000 0000000000100000 000 > > > > > > I looked into the file in question, and it started with zero-bytes from > > > the > > > start until position 0xbff, even though it was supposed to all random > > > data. > > > > > > I have included timestamps in the hopes they might make it easier to find > > > the related entries in the logs. > > > > > > So what do I do now? The logs are very large and complex, and I don't > > > understand most of what's in there. I don't even know which OSD served > > > that > > > particular block/object. > > > > If you can reproduce it with 'debug filestore = 20' too, that will be > > better, as it will tell us what the FIEMAP ioctl is returning. Also, if > > you can attach/post the contents of the object itself (rados -p rbd get > > rb.0.1.0000000002a0 /tmp/foo) we can make sure the object has the right > > data (and the sparse-read operation that librbd is doing is the culprit). > > Um. Maybe... That's the problem with using random data, I can't just look at > it and recognize it. I guess tomorrow I'll slap something together to see if I > can find any 1 Meg-range of data in there that matches the expect checksum. The process below will identify the object in question.. > > > > As for the log: > > > > First, map the offset to an rbd block. For example, taking the 'Physical' > > value of 00000000a8200000 from above: > > > > $ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) )) > > 0000000002a0 > > > > Then figure out what the object name prefix is: > > > > $ rbd info <imagename> | grep prefix > > block_name_prefix: rb.0.1 > > > > Then add the block number, 0000000002a0 to that, e.g. rb.0.1.0000000002a0. > > > > Then map that back to an osd with > > > > $ ceph osd map rbd rb.0.1.0000000002a0 > > osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65 > > (2.5) -> up [0,2] acting [0,2] > > > > You'll see the osd ids listed in brackets after 'active'. We want the > > first one, 0 in my example. The log from that OSD is what we need. > > I'm getting > > osdmap e89 pool 'rbd' (2) object 'rb.0.13.0000000002a0' -> pg 2.aca5eccb > (2.4b) -> up [1,2] acting [1,2] > > from that command, so I guess it's osd.1 then. > Do you have somewhere I can upload the log? It is 1.1 GiB in size. Bzip2 > gets it down to 53 MiB, but that's still too large to be sent to a > mailing list... Yeah, but it'll be more useful if its generated with 'debug filestore = 20'... :) sage ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-11 17:07 ` Guido Winkelmann 2012-06-11 17:12 ` Sage Weil @ 2012-06-11 17:29 ` Josh Durgin 1 sibling, 0 replies; 30+ messages in thread From: Josh Durgin @ 2012-06-11 17:29 UTC (permalink / raw) To: Guido Winkelmann; +Cc: Sage Weil, Oliver Francke, ceph-devel On 06/11/2012 10:07 AM, Guido Winkelmann wrote: > Am Montag, 11. Juni 2012, 09:30:42 schrieb Sage Weil: >> On Mon, 11 Jun 2012, Guido Winkelmann wrote: >>> Am Freitag, 8. Juni 2012, 06:55:19 schrieb Sage Weil: >>>> On Fri, 8 Jun 2012, Oliver Francke wrote: > >>>> Are you guys able to reproduce the corruption with 'debug osd = 20' and >>>> >>>> 'debug ms = 1'? Ideally we'd like to: >>>> - reproduce from a fresh vm, with osd logs >>>> - identify the bad file >>>> - map that file to a block offset (see >>>> >>>> http://ceph.com/qa/fiemap.[ch], linux_fiemap.h) >>>> >>>> - use that to identify the badness in the log >>>> >>>> I suspect the cache is just masking the problem because it submits fewer >>>> IOs... >>> >>> Okay, I added 'debug osd = 20' and 'debug ms = 1' under [global] and >>> 'filestore fiemap = false' under [osd] and started a new VM. That worked >>> nicely, and the iotester found no corruptions. Then I removed 'filestore >>> fiemap = false' from the config, restarted all osds and ran the iotester >>> again. Output is as follows: >>> >>> testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date >>> Mon Jun 11 17:34:44 CEST 2012 >>> Wrote 100 MiB of data in 1943 milliseconds >>> Wrote 100 MiB of data in 1858 milliseconds >>> Wrote 100 MiB of data in 2213 milliseconds >>> Wrote 100 MiB of data in 3441 milliseconds >>> Wrote 100 MiB of data in 2705 milliseconds >>> Wrote 100 MiB of data in 1778 milliseconds >>> Wrote 100 MiB of data in 1974 milliseconds >>> Wrote 100 MiB of data in 2780 milliseconds >>> Wrote 100 MiB of data in 1961 milliseconds >>> Wrote 100 MiB of data in 2366 milliseconds >>> Wrote 100 MiB of data in 1886 milliseconds >>> Wrote 100 MiB of data in 3589 milliseconds >>> Wrote 100 MiB of data in 1973 milliseconds >>> Wrote 100 MiB of data in 2506 milliseconds >>> Wrote 100 MiB of data in 1937 milliseconds >>> Wrote 100 MiB of data in 3404 milliseconds >>> Wrote 100 MiB of data in 1990 milliseconds >>> Wrote 100 MiB of data in 3713 milliseconds >>> Read 100 MiB of data in 4856 milliseconds >>> Digest wrong for file >>> "/var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa" Mon Jun 11 >>> 17:35:34 CEST 2012 >>> testserver-rbd11 iotester # ~/fiemap >>> /var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa >>> File /var/iotest/b45e1a59f26830fb98af1ba9ed1360106f5580aa has 1 extents: >>> # Logical Physical Length Flags >>> 0: 0000000000000000 00000000a8200000 0000000000100000 000 >>> >>> I looked into the file in question, and it started with zero-bytes from >>> the >>> start until position 0xbff, even though it was supposed to all random >>> data. >>> >>> I have included timestamps in the hopes they might make it easier to find >>> the related entries in the logs. >>> >>> So what do I do now? The logs are very large and complex, and I don't >>> understand most of what's in there. I don't even know which OSD served >>> that >>> particular block/object. >> >> If you can reproduce it with 'debug filestore = 20' too, that will be >> better, as it will tell us what the FIEMAP ioctl is returning. Also, if >> you can attach/post the contents of the object itself (rados -p rbd get >> rb.0.1.0000000002a0 /tmp/foo) we can make sure the object has the right >> data (and the sparse-read operation that librbd is doing is the culprit). > > Um. Maybe... That's the problem with using random data, I can't just look at > it and recognize it. I guess tomorrow I'll slap something together to see if I > can find any 1 Meg-range of data in there that matches the expect checksum. > >> >> As for the log: >> >> First, map the offset to an rbd block. For example, taking the 'Physical' >> value of 00000000a8200000 from above: >> >> $ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) )) >> 0000000002a0 >> >> Then figure out what the object name prefix is: >> >> $ rbd info<imagename> | grep prefix >> block_name_prefix: rb.0.1 >> >> Then add the block number, 0000000002a0 to that, e.g. rb.0.1.0000000002a0. >> >> Then map that back to an osd with >> >> $ ceph osd map rbd rb.0.1.0000000002a0 >> osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65 >> (2.5) -> up [0,2] acting [0,2] >> >> You'll see the osd ids listed in brackets after 'active'. We want the >> first one, 0 in my example. The log from that OSD is what we need. > > I'm getting > > osdmap e89 pool 'rbd' (2) object 'rb.0.13.0000000002a0' -> pg 2.aca5eccb > (2.4b) -> up [1,2] acting [1,2] > > from that command, so I guess it's osd.1 then. > Do you have somewhere I can upload the log? It is 1.1 GiB in size. Bzip2 gets > it down to 53 MiB, but that's still too large to be sent to a mailing list... You can attach it to the tracker: http://tracker.newdream.net/issues/2535 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-11 16:30 ` Sage Weil 2012-06-11 17:07 ` Guido Winkelmann @ 2012-06-12 12:31 ` Guido Winkelmann 2012-06-15 12:14 ` Stefan Majer 1 sibling, 1 reply; 30+ messages in thread From: Guido Winkelmann @ 2012-06-12 12:31 UTC (permalink / raw) To: Sage Weil; +Cc: Oliver Francke, Josh Durgin, ceph-devel Am Montag, 11. Juni 2012, 09:30:42 schrieb Sage Weil: > If you can reproduce it with 'debug filestore = 20' too, that will be > better, as it will tell us what the FIEMAP ioctl is returning. I ran another testrun with 'debug filestore = 20'. > Also, if > you can attach/post the contents of the object itself (rados -p rbd get > rb.0.1.0000000002a0 /tmp/foo) we can make sure the object has the right > data (and the sparse-read operation that librbd is doing is the culprit). I tried that, with the block name that the steps further below gave me: rados -p rbd get rb.0.13.00000000045a block When I looked into the block, it looked like a bunch of temp files from the portage system with padding in between, although it should be random data... I think I got the wrong block after all... Here's what I did: Run the iotester again: testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date Tue Jun 12 13:51:58 CEST 2012 Wrote 100 MiB of data in 2004 milliseconds [snip lots of irrelevant lines] Wrote 100 MiB of data in 2537 milliseconds Read 100 MiB of data in 3794 milliseconds Read 100 MiB of data in 10150 milliseconds Digest wrong for file "/var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e" Tue Jun 12 13:55:00 CEST 2012 Run the fiemap tool on that file: testserver-rbd11 ~ # ./fiemap /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e File /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e has 1 extents: # Logical Physical Length Flags 0: 0000000000000000 0000000116900000 0000000000100000 0001 > As for the log: > > First, map the offset to an rbd block. For example, taking the 'Physical' > value of 00000000a8200000 from above: > > $ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) )) > 0000000002a0 That gave me $ printf "%012x\n" $((0x0000000116900000 / (4096*1024) )) 00000000045a > Then figure out what the object name prefix is: > > $ rbd info <imagename> | grep prefix > block_name_prefix: rb.0.1 Result: block_name_prefix: rb.0.13 > Then add the block number, 0000000002a0 to that, e.g. rb.0.1.0000000002a0. Result: rb.0.13.00000000045a > Then map that back to an osd with > > $ ceph osd map rbd rb.0.1.0000000002a0 > osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65 > (2.5) -> up [0,2] acting [0,2] That gives me [root@storage1 ~]# ceph osd map rbd rb.0.13.00000000045a 2> /dev/null osdmap e101 pool 'rbd' (2) object 'rb.0.13.00000000045a' -> pg 2.80b039fb (2.7b) -> up [2,1] acting [2,1] > You'll see the osd ids listed in brackets after 'active'. We want the > first one, 0 in my example. The log from that OSD is what we need. Okay, i'm attaching the compressed log for osd.2 and the compressed block to the issue report in the redmine. Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-12 12:31 ` Guido Winkelmann @ 2012-06-15 12:14 ` Stefan Majer 2012-06-15 15:38 ` Josh Durgin 0 siblings, 1 reply; 30+ messages in thread From: Stefan Majer @ 2012-06-15 12:14 UTC (permalink / raw) To: Guido Winkelmann; +Cc: Sage Weil, Oliver Francke, Josh Durgin, ceph-devel Hi, We had today a catastrophic fs corruption in one of our virtual machines, after fsck ~100MB was inside lost+found :-( So is think we hit the same bug (ceph-0.45.2, sparse rbd images) Is there any progress on this topic, or any hint how to help on this would be helpful. Greetings Stefan Majer On Tue, Jun 12, 2012 at 2:31 PM, Guido Winkelmann <guido-ceph@thisisnotatest.de> wrote: > Am Montag, 11. Juni 2012, 09:30:42 schrieb Sage Weil: >> If you can reproduce it with 'debug filestore = 20' too, that will be >> better, as it will tell us what the FIEMAP ioctl is returning. > > I ran another testrun with 'debug filestore = 20'. > >> Also, if >> you can attach/post the contents of the object itself (rados -p rbd get >> rb.0.1.0000000002a0 /tmp/foo) we can make sure the object has the right >> data (and the sparse-read operation that librbd is doing is the culprit). > > I tried that, with the block name that the steps further below gave me: > > rados -p rbd get rb.0.13.00000000045a block > > When I looked into the block, it looked like a bunch of temp files from the > portage system with padding in between, although it should be random data... I > think I got the wrong block after all... > > Here's what I did: > Run the iotester again: > testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date > Tue Jun 12 13:51:58 CEST 2012 > Wrote 100 MiB of data in 2004 milliseconds > [snip lots of irrelevant lines] > Wrote 100 MiB of data in 2537 milliseconds > Read 100 MiB of data in 3794 milliseconds > Read 100 MiB of data in 10150 milliseconds > Digest wrong for file "/var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e" > Tue Jun 12 13:55:00 CEST 2012 > > Run the fiemap tool on that file: > > testserver-rbd11 ~ # ./fiemap > /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e > File /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e has 1 extents: > # Logical Physical Length Flags > 0: 0000000000000000 0000000116900000 0000000000100000 0001 > >> As for the log: >> >> First, map the offset to an rbd block. For example, taking the 'Physical' >> value of 00000000a8200000 from above: >> >> $ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) )) >> 0000000002a0 > > That gave me > > $ printf "%012x\n" $((0x0000000116900000 / (4096*1024) )) > 00000000045a > >> Then figure out what the object name prefix is: >> >> $ rbd info <imagename> | grep prefix >> block_name_prefix: rb.0.1 > > Result: block_name_prefix: rb.0.13 > >> Then add the block number, 0000000002a0 to that, e.g. rb.0.1.0000000002a0. > > Result: rb.0.13.00000000045a > >> Then map that back to an osd with >> >> $ ceph osd map rbd rb.0.1.0000000002a0 >> osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65 >> (2.5) -> up [0,2] acting [0,2] > > That gives me > [root@storage1 ~]# ceph osd map rbd rb.0.13.00000000045a 2> /dev/null > osdmap e101 pool 'rbd' (2) object 'rb.0.13.00000000045a' -> pg 2.80b039fb > (2.7b) -> up [2,1] acting [2,1] > >> You'll see the osd ids listed in brackets after 'active'. We want the >> first one, 0 in my example. The log from that OSD is what we need. > > Okay, i'm attaching the compressed log for osd.2 and the compressed block to > the issue report in the redmine. > > Guido > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Stefan Majer -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-15 12:14 ` Stefan Majer @ 2012-06-15 15:38 ` Josh Durgin 2012-06-15 18:50 ` Josh Durgin 0 siblings, 1 reply; 30+ messages in thread From: Josh Durgin @ 2012-06-15 15:38 UTC (permalink / raw) To: Stefan Majer; +Cc: Guido Winkelmann, Sage Weil, Oliver Francke, ceph-devel Short version: you should set 'filestore fiemap = false' for your osds. I was able to reproduce the crash with all the debugging I needed yesterday via test_librbd_fsx, and the problem looks like a bug in fiemap. Even though we call fsync before each fiemap call, we were getting different results (one bad result, which resulted in the corruption, and the correct result later, with no writes to the file in between). This was on XFS kernel 3.3.1, so I'll be sending a report to the xfs list with the log when I get to the office. I don't know which other versions are affected yet. In the meantime, you should turn fiemap usage off on the osd by setting 'filestore fiemap = false' in your ceph.conf [osd] section. I think we should make that the default in 0.48 as well. Josh On 06/15/2012 05:14 AM, Stefan Majer wrote: > Hi, > > We had today a catastrophic fs corruption in one of our virtual > machines, after fsck ~100MB was inside lost+found :-( > So is think we hit the same bug (ceph-0.45.2, sparse rbd images) > > Is there any progress on this topic, or any hint how to help on this > would be helpful. > > Greetings > Stefan Majer > > On Tue, Jun 12, 2012 at 2:31 PM, Guido Winkelmann > <guido-ceph@thisisnotatest.de> wrote: >> Am Montag, 11. Juni 2012, 09:30:42 schrieb Sage Weil: >>> If you can reproduce it with 'debug filestore = 20' too, that will be >>> better, as it will tell us what the FIEMAP ioctl is returning. >> >> I ran another testrun with 'debug filestore = 20'. >> >>> Also, if >>> you can attach/post the contents of the object itself (rados -p rbd get >>> rb.0.1.0000000002a0 /tmp/foo) we can make sure the object has the right >>> data (and the sparse-read operation that librbd is doing is the culprit). >> >> I tried that, with the block name that the steps further below gave me: >> >> rados -p rbd get rb.0.13.00000000045a block >> >> When I looked into the block, it looked like a bunch of temp files from the >> portage system with padding in between, although it should be random data... I >> think I got the wrong block after all... >> >> Here's what I did: >> Run the iotester again: >> testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date >> Tue Jun 12 13:51:58 CEST 2012 >> Wrote 100 MiB of data in 2004 milliseconds >> [snip lots of irrelevant lines] >> Wrote 100 MiB of data in 2537 milliseconds >> Read 100 MiB of data in 3794 milliseconds >> Read 100 MiB of data in 10150 milliseconds >> Digest wrong for file "/var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e" >> Tue Jun 12 13:55:00 CEST 2012 >> >> Run the fiemap tool on that file: >> >> testserver-rbd11 ~ # ./fiemap >> /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e >> File /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e has 1 extents: >> # Logical Physical Length Flags >> 0: 0000000000000000 0000000116900000 0000000000100000 0001 >> >>> As for the log: >>> >>> First, map the offset to an rbd block. For example, taking the 'Physical' >>> value of 00000000a8200000 from above: >>> >>> $ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) )) >>> 0000000002a0 >> >> That gave me >> >> $ printf "%012x\n" $((0x0000000116900000 / (4096*1024) )) >> 00000000045a >> >>> Then figure out what the object name prefix is: >>> >>> $ rbd info<imagename> | grep prefix >>> block_name_prefix: rb.0.1 >> >> Result: block_name_prefix: rb.0.13 >> >>> Then add the block number, 0000000002a0 to that, e.g. rb.0.1.0000000002a0. >> >> Result: rb.0.13.00000000045a >> >>> Then map that back to an osd with >>> >>> $ ceph osd map rbd rb.0.1.0000000002a0 >>> osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65 >>> (2.5) -> up [0,2] acting [0,2] >> >> That gives me >> [root@storage1 ~]# ceph osd map rbd rb.0.13.00000000045a 2> /dev/null >> osdmap e101 pool 'rbd' (2) object 'rb.0.13.00000000045a' -> pg 2.80b039fb >> (2.7b) -> up [2,1] acting [2,1] >> >>> You'll see the osd ids listed in brackets after 'active'. We want the >>> first one, 0 in my example. The log from that OSD is what we need. >> >> Okay, i'm attaching the compressed log for osd.2 and the compressed block to >> the issue report in the redmine. >> >> Guido ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Random data corruption in VM, possibly caused by rbd 2012-06-15 15:38 ` Josh Durgin @ 2012-06-15 18:50 ` Josh Durgin 0 siblings, 0 replies; 30+ messages in thread From: Josh Durgin @ 2012-06-15 18:50 UTC (permalink / raw) To: Stefan Majer; +Cc: Guido Winkelmann, Sage Weil, Oliver Francke, ceph-devel Since Guido was seeing this problem on btrfs as well, I'm going to try tracking down more precisely where it was introduced. Josh On 06/15/2012 08:38 AM, Josh Durgin wrote: > Short version: you should set 'filestore fiemap = false' for your osds. > > I was able to reproduce the crash with all the debugging I needed > yesterday via test_librbd_fsx, and the problem looks like a bug in > fiemap. Even though we call fsync before each fiemap call, we were > getting different results (one bad result, which resulted in the > corruption, and the correct result later, with no writes to the file > in between). > > This was on XFS kernel 3.3.1, so I'll be sending a report to the xfs > list with the log when I get to the office. I don't know which > other versions are affected yet. > > In the meantime, you should turn fiemap usage off on the osd by setting > 'filestore fiemap = false' in your ceph.conf [osd] section. I think > we should make that the default in 0.48 as well. > > Josh > > On 06/15/2012 05:14 AM, Stefan Majer wrote: >> Hi, >> >> We had today a catastrophic fs corruption in one of our virtual >> machines, after fsck ~100MB was inside lost+found :-( >> So is think we hit the same bug (ceph-0.45.2, sparse rbd images) >> >> Is there any progress on this topic, or any hint how to help on this >> would be helpful. >> >> Greetings >> Stefan Majer >> >> On Tue, Jun 12, 2012 at 2:31 PM, Guido Winkelmann >> <guido-ceph@thisisnotatest.de> wrote: >>> Am Montag, 11. Juni 2012, 09:30:42 schrieb Sage Weil: >>>> If you can reproduce it with 'debug filestore = 20' too, that will be >>>> better, as it will tell us what the FIEMAP ioctl is returning. >>> >>> I ran another testrun with 'debug filestore = 20'. >>> >>>> Also, if >>>> you can attach/post the contents of the object itself (rados -p rbd get >>>> rb.0.1.0000000002a0 /tmp/foo) we can make sure the object has the right >>>> data (and the sparse-read operation that librbd is doing is the >>>> culprit). >>> >>> I tried that, with the block name that the steps further below gave me: >>> >>> rados -p rbd get rb.0.13.00000000045a block >>> >>> When I looked into the block, it looked like a bunch of temp files >>> from the >>> portage system with padding in between, although it should be random >>> data... I >>> think I got the wrong block after all... >>> >>> Here's what I did: >>> Run the iotester again: >>> testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date >>> Tue Jun 12 13:51:58 CEST 2012 >>> Wrote 100 MiB of data in 2004 milliseconds >>> [snip lots of irrelevant lines] >>> Wrote 100 MiB of data in 2537 milliseconds >>> Read 100 MiB of data in 3794 milliseconds >>> Read 100 MiB of data in 10150 milliseconds >>> Digest wrong for file >>> "/var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e" >>> Tue Jun 12 13:55:00 CEST 2012 >>> >>> Run the fiemap tool on that file: >>> >>> testserver-rbd11 ~ # ./fiemap >>> /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e >>> File /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e has 1 extents: >>> # Logical Physical Length Flags >>> 0: 0000000000000000 0000000116900000 0000000000100000 0001 >>> >>>> As for the log: >>>> >>>> First, map the offset to an rbd block. For example, taking the >>>> 'Physical' >>>> value of 00000000a8200000 from above: >>>> >>>> $ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) )) >>>> 0000000002a0 >>> >>> That gave me >>> >>> $ printf "%012x\n" $((0x0000000116900000 / (4096*1024) )) >>> 00000000045a >>> >>>> Then figure out what the object name prefix is: >>>> >>>> $ rbd info<imagename> | grep prefix >>>> block_name_prefix: rb.0.1 >>> >>> Result: block_name_prefix: rb.0.13 >>> >>>> Then add the block number, 0000000002a0 to that, e.g. >>>> rb.0.1.0000000002a0. >>> >>> Result: rb.0.13.00000000045a >>> >>>> Then map that back to an osd with >>>> >>>> $ ceph osd map rbd rb.0.1.0000000002a0 >>>> osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65 >>>> (2.5) -> up [0,2] acting [0,2] >>> >>> That gives me >>> [root@storage1 ~]# ceph osd map rbd rb.0.13.00000000045a 2> /dev/null >>> osdmap e101 pool 'rbd' (2) object 'rb.0.13.00000000045a' -> pg >>> 2.80b039fb >>> (2.7b) -> up [2,1] acting [2,1] >>> >>>> You'll see the osd ids listed in brackets after 'active'. We want the >>>> first one, 0 in my example. The log from that OSD is what we need. >>> >>> Okay, i'm attaching the compressed log for osd.2 and the compressed >>> block to >>> the issue report in the redmine. >>> >>> Guido > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2012-06-15 18:50 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-06-07 18:04 Random data corruption in VM, possibly caused by rbd Guido Winkelmann 2012-06-07 18:18 ` Stefan Priebe 2012-06-07 18:37 ` Guido Winkelmann 2012-06-07 19:54 ` Andrey Korolyov 2012-06-07 21:03 ` Guido Winkelmann 2012-06-07 21:53 ` Marcus Sorensen 2012-06-07 22:12 ` Guido Winkelmann 2012-06-07 18:40 ` Oliver Francke 2012-06-07 19:48 ` Josh Durgin 2012-06-07 21:36 ` Guido Winkelmann 2012-06-07 22:13 ` Tommi Virtanen 2012-06-08 12:55 ` Guido Winkelmann 2012-06-08 13:08 ` Guido Winkelmann 2012-06-08 13:36 ` Oliver Francke 2012-06-08 13:55 ` Sage Weil 2012-06-08 14:50 ` Josh Durgin 2012-06-08 15:39 ` Oliver Francke 2012-06-08 17:15 ` Guido Winkelmann 2012-06-10 3:04 ` Sage Weil 2012-06-10 3:07 ` Sage Weil 2012-06-11 14:15 ` Guido Winkelmann 2012-06-11 15:50 ` Guido Winkelmann 2012-06-11 16:30 ` Sage Weil 2012-06-11 17:07 ` Guido Winkelmann 2012-06-11 17:12 ` Sage Weil 2012-06-11 17:29 ` Josh Durgin 2012-06-12 12:31 ` Guido Winkelmann 2012-06-15 12:14 ` Stefan Majer 2012-06-15 15:38 ` Josh Durgin 2012-06-15 18:50 ` Josh Durgin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.