* VM Corruption on 0.54 when 'client cache = false'
@ 2012-12-03  2:18 Matthew Anderson
  2012-12-03  5:01 ` Yehuda Sadeh
  2012-12-03 10:00 ` Josh Durgin
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Anderson @ 2012-12-03  2:18 UTC (permalink / raw)
  To: 'ceph-devel@vger.kernel.org'

Hi All,

I've run into a corruption bug when the RBD client cache is set to false under QEMU-KVM. With the cache on, everything is fine but write speeds drop considerably: 4KB sequential writes go from 5.1MB/s to 1.8MB/s no matter what size the cache is or whether writethrough is used. With the cache off, I am usually able to boot the virtual machine once after copying a template to RBD using qemu-img. If I shut the VM down completely and boot it up again, the virtual machine no longer sees its partitions correctly and boots into restore mode, where it can't fix itself. The test VM I was using was Windows Server 2012 Standard, and Ceph is set up as a single node.

Ceph version is 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150)

Host setup is -
Dual Intel 5620, 48GB 
4x 480GB SSD attached via the onboard SATA.
Each OSD was set up with a 1GB journal partition and the rest of the space as BTRFS
40Gb InfiniBand + 2x 1GbE
Scientific Linux 6.3 running mainline Kernel 3.6.7 from Elrepo
QEMU-KVM userspace 1.2.0 compiled from source

I was able to find a reference to a previous bug which was resolved by setting "filestore fiemap threshold = 0" and "filestore fiemap = false", but this didn't have any effect on the issue. I have also tried the latest Git version (as of 3 days ago) and the issue still appeared to be there, but I didn't test enough to say conclusively that it is the same bug.

Is anyone able to suggest anything that may help? If you need more information just let me know.

Thanks
-Matt



* Re: VM Corruption on 0.54 when 'client cache = false'
  2012-12-03  2:18 VM Corruption on 0.54 when 'client cache = false' Matthew Anderson
@ 2012-12-03  5:01 ` Yehuda Sadeh
  2012-12-03 10:00 ` Josh Durgin
  1 sibling, 0 replies; 5+ messages in thread
From: Yehuda Sadeh @ 2012-12-03  5:01 UTC (permalink / raw)
  To: Matthew Anderson; +Cc: ceph-devel@vger.kernel.org

On Sun, Dec 2, 2012 at 6:18 PM, Matthew Anderson <matthewa@base3.com.au> wrote:
> Hi All,
>
> I've run into a corruption bug when the RBD client cache is set to false under QEMU-KVM. With the cache on everything is fine but write speeds drop considerably, 4KB sequential goes from 5.1MB/s to 1.8MB/s no matter what size the cache is or if writethrough is used. With the cache off I am usually able to boot the virtual machine once after copying a template to RBD using qemu-img. If I shut the VM down completely and boot it up again the virtual machine no longer sees it's partitions correctly and boots into restore mode where it can't fix itself. The test VM I was using was Windows Server 2012 Standard and Ceph is setup as a single node.
>
> Ceph version is 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150)
>
> Host setup is -
> Dual Intel 5620, 48GB
> 4x 480GB SSD attached via the onboard SATA.
> Each OSD was setup with a 1GB journal partition and the rest of the space as BTRFS
> 40GB Infiniband + 2x 1GBe
> Scientific Linux 6.3 running mainline Kernel 3.6.7 from Elrepo
> QEMU-KVM userspace 1.2.0 compiled from source
>
> I was able to find a reference to a previous bug which was resolved by setting "filestore fiemap threshold = 0" and "filestore fiemap = false" but this didn't have any effect on the issue. I have also tried the latest GIT version (as of 3 days a go) and the issue appeared to be there still but I didn't test enough to say conclusively that the bug is the exact same.
>
> Is anyone able to suggest anything that may help? If you need more information just let me know.
>

Not that it matters, as you've set 'filestore fiemap  = false', but
'filestore fiemap threshold = 0' won't turn off the use of fiemap, but
instead will try to use it with every sparse read (unless 'filestore
fiemap' has been set to false).
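
To make the interaction between the two options concrete, here is a
minimal sketch of the decision as described in this message. It is not
Ceph's code: the function name, its parameters, and the strict
"greater than" comparison are all assumptions made for illustration.

```python
def uses_fiemap(read_len: int, fiemap_enabled: bool, threshold: int) -> bool:
    """Hypothetical model of the two settings; not taken from the Ceph source."""
    if not fiemap_enabled:
        # 'filestore fiemap = false' switches fiemap off regardless of threshold.
        return False
    # Assumed comparison: with threshold = 0 every non-empty read qualifies,
    # so a zero threshold widens the use of fiemap rather than disabling it.
    return read_len > threshold
```

Under this model, uses_fiemap(4096, True, 0) is true, while
uses_fiemap(4096, False, 0) is false, which matches the point that only
'filestore fiemap = false' actually disables it.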

Yehuda


* Re: VM Corruption on 0.54 when 'client cache = false'
  2012-12-03  2:18 VM Corruption on 0.54 when 'client cache = false' Matthew Anderson
  2012-12-03  5:01 ` Yehuda Sadeh
@ 2012-12-03 10:00 ` Josh Durgin
  2012-12-05 13:51   ` Matthew Anderson
  1 sibling, 1 reply; 5+ messages in thread
From: Josh Durgin @ 2012-12-03 10:00 UTC (permalink / raw)
  To: Matthew Anderson; +Cc: 'ceph-devel@vger.kernel.org'

On 2012-12-02 18:18, Matthew Anderson wrote:
> Hi All,
>
> I've run into a corruption bug when the RBD client cache is set to
> false under QEMU-KVM. With the cache on everything is fine but write
> speeds drop considerably, 4KB sequential goes from 5.1MB/s to 1.8MB/s
> no matter what size the cache is or if writethrough is used. With the
> cache off I am usually able to boot the virtual machine once after
> copying a template to RBD using qemu-img. If I shut the VM down
> completely and boot it up again the virtual machine no longer sees
> it's partitions correctly and boots into restore mode where it can't
> fix itself. The test VM I was using was Windows Server 2012 Standard
> and Ceph is setup as a single node.

That disabling caching improves write speed sounds like something
strange is going on. What's the full QEMU/KVM command line and
ceph.conf used when running the VM?

The corruption issue is more serious, and not something I've seen
reported before. Does it occur only with Windows Server 2012 VMs, or
does it happen with a Linux VM as well? More specific debugging
suggestions are below.

> Ceph version is 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150)
>
> Host setup is -
> Dual Intel 5620, 48GB
> 4x 480GB SSD attached via the onboard SATA.
> Each OSD was setup with a 1GB journal partition and the rest of the
> space as BTRFS
> 40GB Infiniband + 2x 1GBe
> Scientific Linux 6.3 running mainline Kernel 3.6.7 from Elrepo
> QEMU-KVM userspace 1.2.0 compiled from source
>
> I was able to find a reference to a previous bug which was resolved
> by setting "filestore fiemap threshold = 0" and "filestore fiemap =
> false" but this didn't have any effect on the issue. I have also tried
> the latest GIT version (as of 3 days a go) and the issue appeared to
> be there still but I didn't test enough to say conclusively that the
> bug is the exact same.

fiemap is off by default since we discovered that issue, so this is a
different bug.

> Is anyone able to suggest anything that may help? If you need more
> information just let me know.

Since the guest can't find its partitions, could you try exporting
the image to a file (rbd export pool/image filename), and then run
gdisk -l on the file? Doing this before booting, and then again after
the corruption occurs and the VM is shut down, might help determine
the nature of the corruption, and which parts of the image are
corrupted.
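
Besides gdisk, the two exported files can be compared directly to see
where they differ. The following is only a rough sketch: the function
name is made up, and the 4 MiB chunk size is an assumption chosen to
line up with what is believed to be the default RBD object size.

```python
CHUNK = 4 * 1024 * 1024  # assumed comparison granularity (4 MiB)


def differing_chunks(path_a, path_b, chunk=CHUNK):
    """Return the byte offsets of the chunks that differ between two files."""
    offsets = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if not a and not b:
                break  # both files are exhausted
            if a != b:
                # Also catches the case where one file is shorter than the other.
                offsets.append(offset)
            offset += chunk
    return offsets
```

Calling differing_chunks("before.img", "after.img") on the export taken
before the first boot and the one taken after the corruption (file names
are placeholders) gives a list of offsets to look at more closely. Note
that a healthy guest also writes to its disk, so differences alone do
not prove corruption; the offsets near the start of the image, where the
partition table lives, are the interesting ones here.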
If you run the VM with 'debug ms = 1', 'debug objectcacher = 20',
'debug librbd = 20', and 'log file = /path/to/file/writeable/by/qemu'
in the [client] section of ceph.conf, we might be able to see what's
happening to the problematic parts of the image.
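
Before starting the VM it may be worth checking that these settings
really ended up in the [client] section. The sketch below does that with
Python's configparser. It is a rough check only: the helper names are
invented, it assumes option names may be written with either spaces or
underscores, and it assumes the file parses as plain INI with every
option under a section header.

```python
import configparser

# The settings suggested above; None means any non-empty value is accepted.
WANTED = {
    "debug ms": "1",
    "debug objectcacher": "20",
    "debug librbd": "20",
    "log file": None,
}


def _norm(key):
    # Assumption: 'debug_ms' and 'debug ms' name the same option.
    return " ".join(key.replace("_", " ").split()).lower()


def missing_client_debug(conf_text):
    """Return the wanted settings that are absent or different in [client]."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    client = {}
    if parser.has_section("client"):
        client = {_norm(k): v.strip() for k, v in parser.items("client")}
    missing = []
    for key, want in WANTED.items():
        have = client.get(key)
        if not have or (want is not None and have != want):
            missing.append(key)
    return missing
```

An empty list from missing_client_debug(open("/etc/ceph/ceph.conf").read())
means all four settings are present; the path shown is the usual
location and may differ on your system.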
If the logs are long, you can attach them to a bug report referring
to this email at http://tracker.newdream.net.

Another thing to try is running 'ceph osd deep-scrub', which will
check for consistency of objects across OSDs, and report problems in
'ceph -s'.

Josh


* RE: VM Corruption on 0.54 when 'client cache = false'
  2012-12-03 10:00 ` Josh Durgin
@ 2012-12-05 13:51   ` Matthew Anderson
  2012-12-05 21:25     ` Josh Durgin
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Anderson @ 2012-12-05 13:51 UTC (permalink / raw)
  To: 'Josh Durgin'; +Cc: 'ceph-devel@vger.kernel.org'

Thanks for getting back to me Josh. 

I've updated to the new 0.55 release and I haven't been able to reproduce the problem. I have a feeling I may have been to blame: when I updated to 0.55, qemu-img segfaulted with a librbd error because there was an old version of the librbd library in another path (which I think was from 0.54). Once I cleaned everything up it worked fine.
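
For anyone who wants to check for the same kind of leftover library, a
small sketch along these lines lists every librbd file found in a set of
directories. The directory list and the function name are assumptions
and will need adjusting for the distribution in use.

```python
import glob
import os

# Assumed search locations; a source build may install elsewhere.
DEFAULT_DIRS = ["/usr/lib", "/usr/lib64", "/usr/local/lib", "/usr/local/lib64"]


def find_library_copies(prefix="librbd.so", dirs=DEFAULT_DIRS):
    """Map each directory to the files in it whose names start with prefix."""
    found = {}
    for d in dirs:
        matches = sorted(glob.glob(os.path.join(d, prefix + "*")))
        if matches:
            found[d] = matches
    return found
```

If find_library_copies() returns more than one directory, two builds of
the library are probably installed side by side, and it is worth
confirming which one qemu actually loads.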

One thing I did notice about the 0.55 release is that 'ceph osd create' no longer accepts arguments and gives '(22) Invalid argument' if you try to specify an OSD number. Running the command without an argument correctly creates an OSD with the next free OSD number. I wasn't sure whether this is a bug or the command has changed for 0.55+ and the documentation hasn't been updated yet (the Add/Remove OSDs page in the wiki references the command with arguments).

Thanks again
-Matt




* Re: VM Corruption on 0.54 when 'client cache = false'
  2012-12-05 13:51   ` Matthew Anderson
@ 2012-12-05 21:25     ` Josh Durgin
  0 siblings, 0 replies; 5+ messages in thread
From: Josh Durgin @ 2012-12-05 21:25 UTC (permalink / raw)
  To: Matthew Anderson; +Cc: 'ceph-devel@vger.kernel.org'

On 12/05/2012 05:51 AM, Matthew Anderson wrote:
> Thanks for getting back to me Josh.
>
> I've updated to the new 0.55 release and I haven't been able to reproduce the problem. I have the feeling I may be to blame for the problem as when I updated to 0.55 qemu-img segfaulted with a librbd error because there was an old version of the librbd library in another path (which I think was from 0.54). Once I cleaned everything up it worked fine.

Good to hear.

>   One thing I didn't notice about the 0.55 release is that 'ceph osd create' no longer accepts arguments and gives '(22) Invalid argument' if you try to specify an OSD number. Running the command without an argument correctly creates an OSD with the next free osd number. I wasn't sure if this was a bug or that the command has changed for 0.55+ and the documentation hasn't been updated yet (Add/Remove OSD's page in the wiki refernces the command with arguments).

This actually changed from accepting an osd id to a uuid back in 0.47,
but 0.55 is the first version to reject a non-uuid. The docs were
updated by 36e7b077a77fa0a6c87289f400391c85dcdb1d42, but seem to
have been accidentally reverted during a reorganization. This is
fixed again now.
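
As an illustration of the difference, the sketch below shows why a bare
OSD number does not pass as a uuid, and how a fresh uuid can be
generated. It uses Python's uuid module, whose parsing rules are not
necessarily the same as Ceph's; the helper name is made up.

```python
import uuid


def looks_like_uuid(arg: str) -> bool:
    """True if Python's uuid module can parse arg as a UUID."""
    try:
        uuid.UUID(arg)
    except ValueError:
        return False
    return True


# A freshly generated random uuid, in the usual 8-4-4-4-12 hex form.
new_uuid = str(uuid.uuid4())
```

Here looks_like_uuid("3") is false, while looks_like_uuid(new_uuid) is
true, which mirrors the behaviour described above: an OSD number is
rejected, and a uuid is what the command now expects when it is given an
argument at all.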

This thread has the rationale for the change from id to uuid:

http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8296

Josh
