From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <20180323034356.72130-2-haoqf@linux.vnet.ibm.com>
References: <20180323034356.72130-1-haoqf@linux.vnet.ibm.com> <20180323034356.72130-2-haoqf@linux.vnet.ibm.com>
From: Stefan Hajnoczi
Date: Fri, 23 Mar 2018 10:04:06 +0000
Content-Type: text/plain; charset="UTF-8"
Subject: Re: [Qemu-devel] [PATCH v2 1/1] iotests: fix test case 185
To: QingFeng Hao
Cc: qemu block, Kevin Wolf, Fam Zheng, Jeff Cody, Cornelia Huck, qemu-devel, Christian Borntraeger, Stefan Hajnoczi

On Fri, Mar 23, 2018 at 3:43 AM, QingFeng Hao wrote:
> Test case 185 has failed since commit 4486e89c219 ("vl: introduce vm_shutdown()").
> The newly introduced function vm_shutdown() calls bdrv_drain_all(), which is
> later called again by bdrv_close_all(). bdrv_drain_all() resumes the jobs, so
> the speed is doubled and the offset is doubled as well.
> Some jobs' statuses are changed too.
>
> The fix is to not resume jobs that have already yielded, and to change
> 185.out accordingly.
>
> Suggested-by: Stefan Hajnoczi
> Signed-off-by: QingFeng Hao
> ---
>  blockjob.c                 | 10 +++++++++-
>  include/block/blockjob.h   |  5 +++++
>  tests/qemu-iotests/185.out | 11 +++++++++--

If drain no longer forces the block job to iterate, shouldn't the test
output remain the same?  (That means the test is fixed by the QEMU patch.)
>  3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/blockjob.c b/blockjob.c
> index ef3ed69ff1..fa9838ac97 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -206,11 +206,16 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
>
>  static void block_job_pause(BlockJob *job)
>  {
> -    job->pause_count++;
> +    if (!job->yielded) {
> +        job->pause_count++;
> +    }

The pause cannot be ignored; this change introduces a bug.

Pause is not a synchronous operation that stops the job immediately.
Pause just records that the job needs to be paused.  When the job runs
again (e.g. from a timer callback or fd handler) it eventually reaches
block_job_pause_point(), where it really pauses.

The bug in this patch is:

1. The job has a timer pending.
2. block_job_pause() is called during drain.
3. The timer fires during drain, but now the job doesn't know it needs
   to pause, so it continues running!

Instead, block_job_pause() should remain unmodified and
block_job_resume() should be extended:

  static void block_job_resume(BlockJob *job)
  {
      assert(job->pause_count > 0);
      job->pause_count--;
      if (job->pause_count) {
          return;
      }
 +    if (job_yielded_before_pause_and_is_still_yielded) {
          block_job_enter(job);
 +    }
  }

This handles the case I mentioned above, where the yield ends before the
pause ends (therefore resume must enter the job!).

To make this a little clearer, there are two cases to consider:

Case 1:
1. Job yields
2. Pause
3. Job is entered from timer/fd callback
4. Resume (enter job? yes)

Case 2:
1. Job yields
2. Pause
3. Resume (enter job? no)
4. Job is entered from timer/fd callback

Stefan