From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 221F5C4707C for ; Wed, 10 Jan 2024 13:40:31 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rNYnu-0005F4-7v; Wed, 10 Jan 2024 08:39:34 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rNYnr-0005EU-Gj for qemu-devel@nongnu.org; Wed, 10 Jan 2024 08:39:33 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rNYnp-00086J-Io for qemu-devel@nongnu.org; Wed, 10 Jan 2024 08:39:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1704893967; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=G5qKq6OjsfBvKbOzh9uZ9+fUz29Dyj/V3yWhtqk4CkI=; b=PX71i+/0+7YqdEHknlF2CuMrWKJYZ+h6XPbCMKKEK5QF3PY76SvG4JtPY1Jt2iUuMb1Xaw 7lwweOq8ekbKoto7/m3g0k77gjaM5frJsYSw1lJTNNnzkkduGmxYVjwcZXYY+q+q++Nif5 M0sxP14HIvnWoM1eOTItl7yL217NYf8= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-358-bmZ711zAP8y6HRvBBwwIZA-1; Wed, 10 Jan 2024 08:39:22 -0500 X-MC-Unique: bmZ711zAP8y6HRvBBwwIZA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0E40B85A596; Wed, 10 Jan 2024 13:39:22 +0000 (UTC) Received: from redhat.com (unknown [10.39.194.222]) by smtp.corp.redhat.com (Postfix) with ESMTPS id EF5E52166B32; Wed, 10 Jan 2024 13:39:20 +0000 (UTC) Date: Wed, 10 Jan 2024 14:39:19 +0100 From: Kevin Wolf To: Ari Sundholm Cc: qemu-devel@nongnu.org, Hanna Reitz , qemu-block@nongnu.org, qemu-stable@nongnu.org, stefanha@redhat.com Subject: Re: [PATCH] block/blklogwrites: Fix a bug when logging "write zeroes" operations. Message-ID: References: <20240109184646.1128475-1-megari@gmx.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240109184646.1128475-1-megari@gmx.com> X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 Received-SPF: pass client-ip=170.10.133.124; envelope-from=kwolf@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -38 X-Spam_score: -3.9 X-Spam_bar: --- X-Spam_report: (-3.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.774, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Am 09.01.2024 um 19:46 hat megari@gmx.com geschrieben: > From: Ari Sundholm > > There is a bug in the blklogwrites driver pertaining to logging "write > zeroes" operations, causing log corruption. This can be easily observed > by setting detect-zeroes to something other than "off" for the driver. > > The issue is caused by a concurrency bug pertaining to the fact that > "write zeroes" operations have to be logged in two parts: first the log > entry metadata, then the zeroed-out region. While the log entry > metadata is being written by bdrv_co_pwritev(), another operation may > begin in the meanwhile and modify the state of the blklogwrites driver. > This is as intended by the coroutine-driven I/O model in QEMU, of > course. > > Unfortunately, this specific scenario is mishandled. A short example: > 1. Initially, in the current operation (#1), the current log sector > number in the driver state is only incremented by the number of sectors > taken by the log entry metadata, after which the log entry metadata is > written. The current operation yields. > 2. Another operation (#2) may start while the log entry metadata is > being written. It uses the current log position as the start offset for > its log entry. This is in the sector right after the operation #1 log > entry metadata, which is bad! > 3. After bdrv_co_pwritev() returns (#1), the current log sector > number is reread from the driver state in order to find out the start > offset for bdrv_co_pwrite_zeroes(). This is an obvious blunder, as the > offset will be the sector right after the (misplaced) operation #2 log > entry, which means that the zeroed-out region begins at the wrong > offset. > 4. As a result of the above, the log is corrupt. > > Fix this by only reading the driver metadata once, computing the > offsets and sizes in one go (including the optional zeroed-out region) > and setting the log sector number to the appropriate value for the next > operation in line. > > Signed-off-by: Ari Sundholm > Cc: qemu-stable@nongnu.org Thanks, applied to the block branch. Note that while this fixes the single threaded case, it is not thread safe and will still break with multiqueue enabled (see the new iothread-vq-mapping option added to virtio-blk very recently). Most block drivers already take a lock when modifying their global state, and it looks like we missed blklogwrites when checking what needs to be prepared for the block layer to be thread safe. So I think we'll want another patch to add a QemuMutex that can be taken while you do the calculations on s->cur_log_sector. But this patch is a good first step because it means that we don't need to keep a lock across an I/O request (just for the sake of completeness, this would have had to be a CoMutex rather than a QemuMutex). Kevin