Date: Tue, 9 May 2023 15:53:32 +0200
From: Kevin Wolf
To: Fiona Ebner
Cc: "open list:Network Block Dev...", Hanna Czenczek, eesposit@redhat.com,
    Paolo Bonzini, QEMU Developers, Thomas Lamprecht
Subject: Re: Question about graph locking preconditions regarding
    qemu_in_main_thread()

On 09.05.2023 at 12:26, Fiona Ebner wrote:
> On 08.05.23 at 12:40, Kevin Wolf wrote:
> > On 05.05.2023 at 11:35, Fiona Ebner wrote:
> >> Hi,
> >> I noticed that the bdrv_graph_co_rd_lock() and bdrv_graph_co_rd_unlock()
> >> functions use qemu_in_main_thread() as a conditional to return early.
> >> What high-level requirements ensure that qemu_in_main_thread() will
> >> evaluate to the same value during locking and unlocking?
> >
> > I think it's actually wrong.
> >
> > There are some conditions under which we don't actually need to do
> > anything for locking: Writing the graph is only possible from the main
> > context while holding the BQL. So if a reader is running in the main
> > context under the BQL and knows that it won't be interrupted until the
> > next writer runs, we don't actually need to do anything.
> >
> > This is the case if the reader code neither has a nested event loop
> > (this is forbidden anyway while you hold the lock) nor is a coroutine
> > (because a writer could run when the coroutine has yielded).
>
> Sorry, I don't understand the first part. If bdrv_writev_vmstate() is
> called, then, because it is a generated coroutine wrapper,
> AIO_WAIT_WHILE()/aio_poll() is used. And that is the case regardless of
> whether the lock is held or not, i.e. there is a nested event loop even
> if the lock is held?

With "lock" you mean the graph lock here, not the BQL, right?

These generated coroutine wrappers are a bit ugly because they behave
differently when called from a coroutine and when called outside of
coroutine context:

* In coroutine context, the caller MUST hold the lock
* Outside of coroutine context, the caller MUST NOT hold the lock

The second requirement comes from AIO_WAIT_WHILE(), like you say. If you
hold the lock and you're not in a coroutine, you simply can't call such
functions. However, as bdrv_graph_co_rdlock() is a coroutine_fn, you
won't usually hold the lock outside of coroutine context anyway. But
it's something to be careful with when you have a writer lock.

> > These conditions are exactly what bdrv_graph_rdlock_main_loop() asserts.
> > They are not fulfilled in bdrv_graph_co_rdlock, which always runs in a
> > coroutine.
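
To make the two cases concrete, a generated wrapper has roughly the
following shape. This is a hand-simplified sketch, not the exact code
emitted by scripts/block-coroutine-wrapper.py; in particular, the
BdrvVmstateCo struct and the body of bdrv_co_writev_vmstate_entry() are
approximations:

/* Sketch only; assumes QEMU's block layer headers are in scope. */
int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
{
    if (qemu_in_coroutine()) {
        /* Already in coroutine context: caller MUST hold the graph lock. */
        return bdrv_co_writev_vmstate(bs, qiov, pos);
    } else {
        /*
         * Outside of coroutine context: caller MUST NOT hold the graph
         * lock, because BDRV_POLL_WHILE() below runs a nested event loop
         * until the spawned coroutine has finished.
         */
        BdrvVmstateCo s = {
            .bs = bs, .qiov = qiov, .pos = pos, .ret = -EINPROGRESS,
        };
        /* bdrv_co_writev_vmstate_entry() (approximated here) runs
         * bdrv_co_writev_vmstate() with the graph lock taken and
         * stores the result in s.ret. */
        Coroutine *co = qemu_coroutine_create(bdrv_co_writev_vmstate_entry, &s);
        bdrv_coroutine_enter(bs, co);
        BDRV_POLL_WHILE(bs, s.ret == -EINPROGRESS);
        return s.ret;
    }
}

The BDRV_POLL_WHILE() in the else branch is exactly the nested event
loop that makes calling such a wrapper while holding the lock outside
of coroutine context impossible.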
> Thank you for the explanation!
>
> >> In the output below, the boolean value after the backtraces of
> >> bdrv_graph_co_rd(un)lock is the value of qemu_in_main_thread().
> >>
> >> AFAICT, what happened below is:
> >> 1. QMP function is executed in the main thread and drops the BQL.
> >> 2. bdrv_co_writev_vmstate_entry is called, increasing the reader count,
> >>    because qemu_in_main_thread() is false.
> >> 3. A vCPU thread issued a write, not increasing the reader count,
> >>    because qemu_in_main_thread() is true.
> >> 4. The write is finished in the main thread, decreasing the reader
> >>    count, because qemu_in_main_thread() is false.
> >
> > This one is weird. Why don't we hold the BQL there?
> >
> > Ah, I actually have an idea: Maybe it is executed by the nested event
> > loop in bdrv_writev_vmstate(). This nested event loop now runs without
> > the BQL, too, and it can call any sort of callback in QEMU that really
> > expects to be called with the BQL. Scary!
> >
> > If this is what happens, you actually have your proof that not holding
> > the BQL can cause problems even if there is no other thread active that
> > could interfere.
> >
> > Can you try to confirm this in gdb? In case you haven't debugged
> > coroutines much yet: if you have this state in gdb for a running
> > instance, you can repeat 'fin' until you reach the next coroutine
> > switch, and then you should be able to get the stack trace to see who
> > entered the coroutine.
>
> Thank you for the guidance! Hope I did it correctly, but AFAICS, you
> are spot-on :)

Yes, this looks like what I suspected. Thanks for confirming!

Kevin
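
For reference, the fast path questioned at the top of this thread has
roughly the following shape. This is a heavily simplified sketch, not
the actual block/graph-lock.c source; reader_count_inc() and
reader_count_dec() are hypothetical stand-ins for the real reader-count
bookkeeping:

/* Sketch only; not the actual QEMU source. */
void coroutine_fn bdrv_graph_co_rdlock(void)
{
    if (qemu_in_main_thread()) {
        return;                  /* fast path: reader count untouched */
    }
    reader_count_inc();          /* hypothetical helper */
}

void coroutine_fn bdrv_graph_co_rdunlock(void)
{
    if (qemu_in_main_thread()) {
        return;
    }
    reader_count_dec();          /* hypothetical helper */
}

If qemu_in_main_thread() evaluates differently at lock and unlock time,
as in steps 3 and 4 of the trace above, one side takes the fast path
while the other does not, and the reader count ends up unbalanced.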