Re: Lost partition tables on ide-hd + ahci drive

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: John Snow <jsnow@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>
Cc: QEMU Developers <qemu-devel@nongnu.org>,
	 "open list:Network Block Dev..." <qemu-block@nongnu.org>,
	Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: Lost partition tables on ide-hd + ahci drive
Date: Tue, 14 Feb 2023 13:21:32 -0500	[thread overview]
Message-ID: <CAFn=p-ahLoVd3W2GaFp5EUFq5EOudz+bUkEk5DV+Z07AjHaHtg@mail.gmail.com> (raw)
In-Reply-To: <ad7e1294-f19f-5bea-e891-f6adbe323cd5@proxmox.com>

On Thu, Feb 2, 2023 at 7:08 AM Fiona Ebner <f.ebner@proxmox.com> wrote:
>
> Hi,
> over the years we've got 1-2 dozen reports[0] about suddenly
> missing/corrupted MBR/partition tables. The issue seems to be very rare
> and there was no success in trying to reproduce it yet. I'm asking here
> in the hope that somebody has seen something similar.
>
> The only commonality seems to be the use of an ide-hd drive with ahci bus.
>
> It does seem to happen with both Linux and Windows guests (one of the
> reports even mentions FreeBSD) and backing storages for the VMs include
> ZFS, RBD, LVM-Thin as well as file-based storages.
>
> Relevant part of an example configuration:
>
> >   -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
> >   -drive 'file=/dev/zvol/myzpool/vm-168-disk-0,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' \
> >   -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0' \
>
> The first reports are from before io_uring was used and there are also
> reports with writeback cache mode and discard=on,detect-zeroes=unmap.
>
> Some reports say that the issue occurred under high IO load.
>
> Many reports suspect backups causing the issue. Our backup mechanism
> uses backup_job_create() for each drive and runs the jobs sequentially.
> It uses a custom block driver as the backup target which just forwards
> the writes to the actual target which can be a file or our backup server.
> (If you really want to see the details, apply the patches in [1] and see
> pve-backup.c and block/backup-dump.c).
>
> Of course, the backup job will read sector 0 of the source disk, but I
> really can't see where a stray write would happen, why the issue would
> trigger so rarely or why seemingly only ide-hd+ahci would be affected.
>
> So again, just asking if somebody has seen something similar or has a
> hunch of what the cause might be.
>

Hi Floria;

I'm sorry to say that I haven't worked on the block devices (or
backup) for a little while now, so I am not immediately sure what
might be causing this problem. In general, I advise against using AHCI
in production as better performance (and dev support) can be achieved
through virtio. Still, I am not sure why the combination of AHCI with
backup_job_create() would be corrupting the early sectors of the disk.

Do you have any analysis on how much data gets corrupted? Is it the
first sector only, the first few? Has anyone taken a peek at the
backing storage to see if there are any interesting patterns that can
be observed? (Zeroes, garbage, old data?)

Have any errors or warnings been observed in either the guest or the
host that might offer some clues?

Is there any commonality in the storage format being used? Is it
qcow2? Is it network-backed?

Apologies for the "tier 1" questions.

> [0]: https://bugzilla.proxmox.com/show_bug.cgi?id=2874
> [1]: https://git.proxmox.com/?p=pve-qemu.git;a=tree;f=debian/patches;hb=HEAD
>

next prev parent reply	other threads:[~2023-02-14 18:22 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-02 12:08 Lost partition tables on ide-hd + ahci drive Fiona Ebner
2023-02-14 18:21 ` John Snow [this message]
2023-02-15 10:53   ` Fiona Ebner
2023-02-15 21:47     ` John Snow
2023-02-16  8:58       ` Fiona Ebner
2023-02-16 14:17     ` Mike Maslenkin
2023-02-16 15:25       ` Fiona Ebner
2023-02-16 16:15         ` Mike Maslenkin
2023-02-17 12:25           ` Fiona Ebner
2023-02-17 13:40       ` Fiona Ebner
2023-02-17 21:22         ` Mike Maslenkin
2023-08-23  8:47           ` Fiona Ebner
2023-08-23  9:17             ` Fiona Ebner
2023-08-26 18:07               ` Mike Maslenkin
2023-02-17  9:44     ` Aaron Lauterer
2023-06-14 14:48 ` Simon J. Rowe
2023-06-15  7:04   ` Fiona Ebner
2023-06-15  8:24     ` Simon Rowe
2023-07-27 13:22   ` Simon Rowe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFn=p-ahLoVd3W2GaFp5EUFq5EOudz+bUkEk5DV+Z07AjHaHtg@mail.gmail.com' \
    --to=jsnow@redhat.com \
    --cc=f.ebner@proxmox.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=t.lamprecht@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).