From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:43064) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCpWa-0004aL-RG for qemu-devel@nongnu.org; Wed, 08 Jul 2015 09:36:33 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZCpWX-000601-Hn for qemu-devel@nongnu.org; Wed, 08 Jul 2015 09:36:32 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44638) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCpWX-0005zo-A9 for qemu-devel@nongnu.org; Wed, 08 Jul 2015 09:36:29 -0400 Date: Wed, 8 Jul 2015 14:36:26 +0100 From: Stefan Hajnoczi Message-ID: <20150708133626.GC20502@stefanha-thinkpad.redhat.com> References: <1436192669-10062-1-git-send-email-stefanha@redhat.com> <1436192669-10062-6-git-send-email-stefanha@redhat.com> <20150707073245.GD9515@ad.nay.redhat.com> <20150707125949.GA28673@stefanha-thinkpad.redhat.com> <20150708015924.GF10382@ad.nay.redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ctP54qlpMx3WjD+/" Content-Disposition: inline In-Reply-To: <20150708015924.GF10382@ad.nay.redhat.com> Subject: Re: [Qemu-devel] [PATCH v2 05/10] block: add block job transactions List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Fam Zheng Cc: Kevin Wolf , Jeff Cody , qemu-devel@nongnu.org, Max Reitz , vsementsov@parallels.com, John Snow --ctP54qlpMx3WjD+/ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jul 08, 2015 at 09:59:24AM +0800, Fam Zheng wrote: > On Tue, 07/07 13:59, Stefan Hajnoczi wrote: > > On Tue, Jul 07, 2015 at 03:32:45PM +0800, Fam Zheng wrote: > > > On Mon, 07/06 15:24, Stefan Hajnoczi wrote: > > > > +/** > > > > + * block_job_txn_add_job: > > > > + * @txn: The transaction (may be NULL) > > > > + * @job: Job to add to the transaction > > > > + * > > > > + * Add @job to the transaction. The @job must not already be in a= transaction. > > > > + * The block job driver must call block_job_txn_prepare_to_complet= e() before > > >=20 > > > s/block_job_txn_prepare_to_complete/block_job_txn_job_done/ > > >=20 > > > Reading this for a second time I start to feel it too complicated for= the good. > > >=20 > > > I have another idea: in block_job_completed, check if other jobs have= failed, > > > and call this job driver's (imaginary) "abort()" callback accordingly= ; if all > > > jobs has succeeded, call a "commit" callback during last block_job_co= mpleted. > > >=20 > > > Does that make sense? > >=20 > > I think you've skipped the hard part: immediate cancellation. If a job > > is cancelled by the user or a job fails, then all other jobs are > > cancelled immediately. > >=20 > > Immediate cancellation has the problem that jobs could be running in any > > AioContext, so you need to handle concurrency. That's where the > > locking, juggling AioContexts, and interaction between blockjobs comes > > in. >=20 > OK, let me try again: >=20 > The idea is intercepting job->cb so we can handle jobs completely in > block_job_completed (which is in main loop), rather than the coroutines t= hat > can be any AioContext. >=20 > 1) If a job is cancelled or failed, it goes to block_job_completed immedi= ately, > with block_job_is_cancelled() =3D=3D true. In this case, we call > block_job_cancel_sync on all other jobs, and then call "abort()" callback= s to > reclaim any bitmaps, then emit QMP events. Some other jobs may have alrea= dy > completed before this point, but it's not a problem because we always def= er the > actual completions (abort/commit and QMP) altogether. >=20 > 2) If there is no job failed or canceled, in the last block_job_completed= , we > call "commit()" to abdicate bitmaps, and emit the QMP events. >=20 > This would still require BlockJobTxn to track the block jobs in a group, = but > hopefully it could reduce the complexity of interactions between block jo= bs. >=20 > I can prototype it if this isn't missing anything obvious. Yes, please try it. It's half-way between what John originally did and what I did. It might be the simplest solution. Be careful with the final piece of code used to complete jobs from block_job_defer_to_main_loop(). It runs from a BH in the main loop after the coroutine has terminated. In the fail/cancel case you might need to protect against race conditions - especially if two jobs finish in the same event loop iteration. I didn't handle that since block_job_txn_job_done() is called while the coroutine is still alive. --ctP54qlpMx3WjD+/ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJVnSdaAAoJEJykq7OBq3PI5NEH/ReidaKSERzr8qSQ5WkQhnid m6ikE4fr1sMK1TYhgJn0ZkHs5V1ZNtD6V4u7GmWCdb27+SMRMW3MeUsBtBxx47PY CKRx87dVB17eZ5Na6x7jMYozn/8mG7fSj/H9LiGQm7OeBmEjUXCHvvDc/4DjCzLq hMgAM5xqlcOW6W3bD2YgrN9KHbh2GgY1ifEaedJh7YiMVdblJ3wh+IPmXlfKIlG1 C8meK2enT37OnhSf6am6QTJrppBiiP6awB8sxf98K+2iijRmz+Rj0mu7neB2phbE gTkXTHs0W5Pr8ZehqkkRQpHPD7DYTQcNB54ckaKmYnmj+SOQn+3CYsdGhI6VlfI= =QIHQ -----END PGP SIGNATURE----- --ctP54qlpMx3WjD+/--