From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.1 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30994C433ED for ; Fri, 9 Apr 2021 09:53:02 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B84926102A for ; Fri, 9 Apr 2021 09:53:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B84926102A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:51218 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lUnoy-0006DD-Id for qemu-devel@archiver.kernel.org; Fri, 09 Apr 2021 05:53:00 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:39018) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lUno5-0005Rk-4B for qemu-devel@nongnu.org; Fri, 09 Apr 2021 05:52:05 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:38806) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lUno2-00032d-1z for qemu-devel@nongnu.org; Fri, 09 Apr 2021 05:52:04 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1617961920; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=axJYE2TZSFVxsQ3wBEWFmhDHoMQ/L/pSkZ2xR9P1brg=; b=YkQiLXj1ELznkpb2fmAZd2hO0FEBj8cS/J03xfKZwydWWl4NHZaqVFmDfrmm7ROfb+KFt9 lABeeYmZ0yVXmhUBbwnNmEaiGlV4Br+OLi0pXIkLABDScgaT4y4pIPO/yN798qPlRCLHzq 7dE1XxrmKVbp5jh2tR/iKkG02NrfM+Y= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-352-0JBEh7CaNsyXqZ9K33MATg-1; Fri, 09 Apr 2021 05:51:59 -0400 X-MC-Unique: 0JBEh7CaNsyXqZ9K33MATg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1435310053E8; Fri, 9 Apr 2021 09:51:58 +0000 (UTC) Received: from dresden.str.redhat.com (ovpn-114-67.ams2.redhat.com [10.36.114.67]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 05BD219C66; Fri, 9 Apr 2021 09:51:56 +0000 (UTC) Subject: Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete To: Vladimir Sementsov-Ogievskiy , John Snow , qemu-block@nongnu.org References: <20210408162039.242670-1-mreitz@redhat.com> <20210408162039.242670-2-mreitz@redhat.com> <505ba75a-996b-0c65-0c49-add50e55e3ce@virtuozzo.com> <66c60724-d3b5-383b-7a19-9e9498e1c132@redhat.com> From: Max Reitz Message-ID: Date: Fri, 9 Apr 2021 11:51:55 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0 MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=mreitz@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=170.10.133.124; envelope-from=mreitz@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, NICE_REPLY_A=-0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" On 08.04.21 19:26, Vladimir Sementsov-Ogievskiy wrote: > 08.04.2021 20:04, John Snow wrote: >> On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote: >>> job-complete command is async. Can we instead just add a boolean like >>> job->completion_requested, and set it if job-complete called in >>> STANDBY state, and on job_resume job_complete will be called >>> automatically if this boolean is true? >> >> job_complete has a synchronous setup, though -- we lose out on a lot >> of synchronous error checking in that circumstance. > > yes, that's a problem.. > >> >> I was not able to audit it to determine that it'd be safe to attempt >> that setup during a drained section -- I imagine it won't work and >> will fail, though. >> >> So I thought we'd have to signal completion and run the setup *later*, >> but what do we do if we get an error then? Does the entire job fail? >> Do we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED" ?) Is it >> recoverable? >> > > Isn't it possible even now, that after successful job-complete job still > fails and we report BLOCK_JOB_COMPLETED with error? > > And actually, how much benefit user get from the fact that job-complete > may fail? > > We can make job-complete a simple always-success boolean flag setter > like job-pause. I wanted to say the following: But job-pause does always succeed, in contrast to block-job-complete. block-job-complete is more akin to job-finalize, which too is a synchronous operation. But when I wrote that last sentence, I asked myself whether what mirror_complete() does isn’t actually a remnant of what we had to do when we didn’t have job-finalize yet. Shouldn’t that all be in mirror_exit_common()? What’s the advantage of opening the backing chain or putting blockers on the to-replace node in block-job-complete? Aren’t that all graph-changing operation, basically, i.e. stuff that should be done in job-finalize? If we move everything to mirror_exit_common(), all that remains to do is basically set some should_complete flag (could even be part of the Job struct), and then the whole problem disappears. Thoughts? Max > And actual completion will be done in background, when possible. And if > it fail, job just fails, like it does for any background io error. And > user have to check error/success status of final BLOCK_JOB_COMPLETED > anyway. >