From: Fiona Ebner
To: John Snow
Cc: QEMU Developers, "open list:Network Block Dev...", Thomas Lamprecht, Aaron Lauterer
Subject: Re: Lost partition tables on ide-hd + ahci drive
Date: Thu, 16 Feb 2023 09:58:17 +0100
Message-ID: <439f75e5-dbc6-264c-f82c-ee427a2a489a@proxmox.com>

On 15.02.23 at 22:47, John Snow wrote:
> Hm, I'm not sure I see any pattern that might help. Could be that AHCI
> is just bugged during load, but it's tough to know in what way.

If we ever get a backtrace where the bad write actually goes through
QEMU, I'll let you know. We are considering providing a custom build to
affected users (using GDB hooks leads to too much slowdown in these
performance-critical paths) in the hope of catching it if it triggers
again. We can't really roll it out to all users, because most writes to
sector zero are legitimate after all and most users are not affected.

> What versions of QEMU are in use here? Is there a date on which you
> noticed an increased frequency of these reports?

There were a few reports around the time we rolled out 4.2 and 5.0
(Q2/Q3 of 2020), but the frequency was always very low. AFAICT, there
are about 20-40 reports in total that could be this issue. The earliest
I know of with lost partitions, but not much more information, are forum
threads from 2017/2018.

With 4.2, there was a rework of our backup patches, so naturally, I
suspected that.
Before 4.2, we had extended the backup job to allow using a callback
to handle the writes instead of the BlockDriverState target. But starting
with 4.2, we no longer touch that and instead use a custom driver as the
backup target. That custom driver doesn't even know about the source.
The source is handled by the usual backup job mechanisms. If there were
some general mix-up there, I'd not expect it to work for >99.99% of
backups and only trigger in combination with AHCI, but who knows?

Best Regards,
Fiona