From: John Snow <jsnow@redhat.com>
To: Stefan Hajnoczi, Paolo Bonzini, Kevin Wolf
References: <20190809201333.29033-1-jsnow@redhat.com>
 <154bc276-d782-443f-3db6-38d87992d609@redhat.com>
 <20190910081942.GA23976@stefanha-x1.localdomain>
 <9bf835d7-8bfa-feba-c2f7-acd6cda4a81e@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH] block/backup: install notifier during creation
Date: Wed, 18 Sep 2019 16:31:02 -0400
Message-ID: <0abc4992-9322-010a-118b-62e79cbc5b58@redhat.com>
In-Reply-To: <9bf835d7-8bfa-feba-c2f7-acd6cda4a81e@redhat.com>
Cc: Max Reitz, Vladimir Sementsov-Ogievskiy, qemu-devel@nongnu.org,
 qemu-block@nongnu.org, qemu-stable@nongnu.org

On 9/10/19 9:23 AM, John Snow wrote:
>
>
> On 9/10/19 4:19 AM, Stefan Hajnoczi wrote:
>> On Wed, Aug 21, 2019 at 04:01:52PM -0400, John Snow wrote:
>>>
>>>
>>> On 8/21/19 10:41 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> 09.08.2019 23:13, John Snow wrote:
>>>>> Backup jobs may yield prior to installing their handler, because of the
>>>>> job_co_entry shim which guarantees that a job won't begin work until
>>>>> we are ready to start an entire transaction.
>>>>>
>>>>> Unfortunately, this makes proving correctness about transactional
>>>>> points-in-time for backup hard to reason about. Make it explicitly clear
>>>>> by moving the handler registration to creation time, and changing the
>>>>> write notifier to a no-op until the job is started.
>>>>>
>>>>> Reported-by: Vladimir Sementsov-Ogievskiy
>>>>> Signed-off-by: John Snow
>>>>> ---
>>>>>  block/backup.c     | 32 +++++++++++++++++++++++---------
>>>>>  include/qemu/job.h |  5 +++++
>>>>>  job.c              |  2 +-
>>>>>  3 files changed, 29 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>> index 07d751aea4..4df5b95415 100644
>>>>> --- a/block/backup.c
>>>>> +++ b/block/backup.c
>>>>> @@ -344,6 +344,13 @@ static int coroutine_fn backup_before_write_notify(
>>>>>      assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
>>>>>      assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));
>>>>>
>>>>> +    /* The handler is installed at creation time; the actual point-in-time
>>>>> +     * starts at job_start(). Transactions guarantee those two points are
>>>>> +     * the same point in time. */
>>>>> +    if (!job_started(&job->common.job)) {
>>>>> +        return 0;
>>>>> +    }
>>>>
>>>> Hmm, sorry if it is a stupid question, I'm not good in multiprocessing and in
>>>> Qemu iothreads..
>>>>
>>>> job_started just reads job->co. If bs runs in iothread, and therefore write-notifier
>>>> is in iothread, when job_start is called from main thread.. Is it guaranteed that
>>>> write-notifier will see job->co variable change early enough to not miss guest write?
>>>> Should not job->co be volatile for example or something like this?
>>>>
>>>> If not think about this patch looks good for me.
>>>>
>>>
>>> You know, it's a really good question.
>>> So good, in fact, that I have no idea.
>>>
>>> ¯\_(ツ)_/¯
>>>
>>> I'm fairly certain that IO will not come in until the .clean phase of a
>>> qmp_transaction, because bdrv_drained_begin(bs) is called during
>>> .prepare, and we activate the handler (by starting the job) in .commit.
>>> We do not end the drained section until .clean.
>>>
>>> I'm not fully clear on what threading guarantees we have otherwise,
>>> though; is it possible that "Thread A" would somehow lift the bdrv_drain
>>> on an IO thread ("Thread B") and, after that, "Thread B" would somehow
>>> still be able to see an outdated version of job->co that was set by
>>> "Thread A"?
>>>
>>> I doubt it; but I can't prove it.
>>
>> In the qmp_backup() case (not qmp_transaction()) there is:
>>
>>   void qmp_drive_backup(DriveBackup *arg, Error **errp)
>>   {
>>       BlockJob *job;
>>
>>       job = do_drive_backup(arg, NULL, errp);
>>       if (job) {
>>           job_start(&job->job);
>>       }
>>   }
>>
>> job_start() is called without any thread synchronization, which is
>> usually fine because the coroutine doesn't run until job_start() calls
>> aio_co_enter().
>>
>> Now that the before write notifier has been installed early, there is
>> indeed a race between job_start() and the write notifier accessing
>> job->co from an IOThread.
>>
>> The before write notifier might see job->co != NULL before job_start()
>> has finished. This could lead to issues if job_*() APIs are invoked by
>> the write notifier and access an in-between job state.
>>
>
> I see. I think in this case, as long as it sees != NULL, that the
> notifier is actually safe to run. I agree that this might be confusing
> to verify and could bite us in the future. The worry we had, too, is
> more the opposite: will it see NULL for too long? We want to make sure
> that it is registering as true *before the first yield*.
>
>> A safer approach is to set a BackupBlockJob variable at the beginning of
>> backup_run() and check it from the before write notifier.
>>
>
> That's too late, for reasons below.
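
As an aside on the visibility question above: the pattern that would make
the answer unambiguous is a release store when the job starts, paired with
an acquire load in the notifier. Below is a minimal, self-contained C11
sketch of that pattern -- hypothetical names, raw pthreads and stdatomic.h
rather than QEMU's own atomics helpers -- only to illustrate the ordering
being discussed, not what the patch or QEMU actually does.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical stand-in for the job; only the "started" flag matters. */
    typedef struct {
        atomic_bool started;    /* plays the role of "job->co != NULL" */
        unsigned long tracked;  /* writes seen after the point in time */
    } FakeBackupJob;

    static FakeBackupJob job;

    /* Plays the role of the before-write notifier: a no-op until started. */
    static void before_write_notify(void)
    {
        /* The acquire load pairs with the release store in job_start(), so
         * once we observe started == true we also observe everything the
         * main thread wrote before setting it. */
        if (!atomic_load_explicit(&job.started, memory_order_acquire)) {
            return;
        }
        job.tracked++;
    }

    /* Plays the role of job_start() running in the main thread. */
    static void job_start(void)
    {
        atomic_store_explicit(&job.started, true, memory_order_release);
    }

    static void *iothread_fn(void *opaque)
    {
        (void)opaque;
        for (int i = 0; i < 1000000; i++) {
            before_write_notify();  /* "guest writes" arriving in an IOThread */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t iothread;
        pthread_create(&iothread, NULL, iothread_fn, NULL);
        job_start();
        pthread_join(iothread, NULL);
        /* The count varies from run to run; writes issued before the flag is
         * observed are deliberately ignored, mirroring the no-op behaviour. */
        printf("writes tracked after start: %lu\n", job.tracked);
        return 0;
    }

(Compiles with: cc -std=c11 -pthread sketch.c. The same idea could be
expressed with QEMU's atomics helpers instead of raw stdatomic.h.)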
>
>> That said, I don't understand the benefit of this patch and IMO it makes
>> the code harder to understand because now we need to think about the
>> created but not started state too.
>>
>> Stefan
>>
>
> It's always possible I've hyped myself up into believing there's a
> problem where there isn't one, but the fear is this:
>
> The point in time from a QMP transaction covers the job creation and the
> job start, but when we start the job it will actually yield before we
> get to backup_run -- and there is no guarantee that the handler will get
> installed synchronously, so the point in time ends before the handler
> activates.
>

i.e., the handler might get installed AFTER the critical region of a
transaction. We could drop initial writes if we were unlucky. (I think.)

> The yield occurs in job_co_entry as an intentional feature of forcing a
> yield and pause point at run time -- so it's harder to write a job that
> accidentally hogs the thread during initialization.
>
> This is an attempt to get the handler installed earlier to ensure the
> point in time stays synchronized with creation time to provide a
> stronger transactional guarantee.
>

Squeaky wheel gets the grease. Any comment?
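
For what it's worth, the ordering argument above can be made concrete with
a toy, single-threaded model of the two orderings -- hypothetical code, not
QEMU's, with the drained section and backup_run() reduced to comments. The
only thing it shows is that a guest write landing between the end of the
drained section and the notifier's installation is invisible to the
backup's point in time, whereas a notifier that exists from creation time
(and is merely a no-op until start) cannot miss it.

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of a backup job: a before-write hook either preserves the
     * old data (copy-before-write) or the write slips through unseen. */
    typedef struct {
        bool notifier_installed;
        bool started;
        int preserved;  /* writes intercepted by the notifier */
        int missed;     /* writes that bypassed it            */
    } ToyBackup;

    static void guest_write(ToyBackup *job)
    {
        if (job->notifier_installed && job->started) {
            job->preserved++;
        } else {
            job->missed++;
        }
    }

    /* Pre-patch ordering: the notifier is only installed once the job
     * coroutine finally reaches backup_run(), which may happen after the
     * transaction's drained section has already ended. */
    static void old_ordering(ToyBackup *job)
    {
        /* .prepare: bdrv_drained_begin(), no guest writes possible  */
        job->started = true;            /* .commit: job_start()      */
        /* .clean: bdrv_drained_end(), guest I/O resumes             */
        guest_write(job);               /* arrives before backup_run() */
        job->notifier_installed = true; /* backup_run() finally runs    */
        guest_write(job);
    }

    /* Patched ordering: the notifier exists from creation time and is a
     * no-op until the job is started inside the drained section. */
    static void new_ordering(ToyBackup *job)
    {
        job->notifier_installed = true; /* creation time              */
        /* .prepare: drained section begins                           */
        job->started = true;            /* .commit: job_start()       */
        /* .clean: drained section ends, guest I/O resumes            */
        guest_write(job);
        guest_write(job);
    }

    int main(void)
    {
        ToyBackup before = {0}, after = {0};
        old_ordering(&before);
        new_ordering(&after);
        printf("old: preserved=%d missed=%d\n", before.preserved, before.missed);
        printf("new: preserved=%d missed=%d\n", after.preserved, after.missed);
        return 0;
    }

It prints "old: preserved=1 missed=1" versus "new: preserved=2 missed=0",
which is exactly the window being worried about in this thread.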