* Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes
       [not found] ` <20190329193224.GP5081@localhost.localdomain>
@ 2019-04-01  7:21 ` Vladimir Sementsov-Ogievskiy
  2019-04-03 16:41   ` Max Reitz
  0 siblings, 1 reply; 3+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-01  7:21 UTC
To: Kevin Wolf
Cc: Max Reitz, qemu-devel@nongnu.org, qemu-block@nongnu.org, Denis Lunev,
    fam@euphon.net, eblake@redhat.com, jsnow@redhat.com

29.03.2019 22:32, Kevin Wolf wrote:
> On 29.03.2019 at 19:00, Vladimir Sementsov-Ogievskiy wrote:
>> 29.03.2019 20:58, Vladimir Sementsov-Ogievskiy wrote:
>>> 29.03.2019 20:44, Max Reitz wrote:
>>>> On 29.03.19 18:40, Kevin Wolf wrote:
>>>>> On 29.03.2019 at 18:30, Max Reitz wrote:
>>>>>> On 29.03.19 18:24, Kevin Wolf wrote:
>>>>>>> On 29.03.2019 at 18:15, Max Reitz wrote:
>>>>>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>>>>>> loosening permissions. However file-locking operations may fail even
>>>>>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>>>>>
>>>>>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>>>>>> permission update commit and abort.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>> ---
>>>>>>>>>  block/file-posix.c | 12 ++++++++++++
>>>>>>>>>  1 file changed, 12 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>>>>>> --- a/block/file-posix.c
>>>>>>>>> +++ b/block/file-posix.c
>>>>>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>>>>>      switch (op) {
>>>>>>>>>      case RAW_PL_PREPARE:
>>>>>>>>> +        if ((s->perm | new_perm) == s->perm &&
>>>>>>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>>>>>>>> +        {
>>>>>>>>> +            /*
>>>>>>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>>>>>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>>>>>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>>>>>> +             * can't prevent such errors by any check here. And we ignore them
>>>>>>>>> +             * anyway in ABORT and COMMIT.
>>>>>>>>> +             */
>>>>>>>>> +            return 0;
>>>>>>>>> +        }
>>>>>>>>>          ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>>>>>                                     ~s->shared_perm | ~new_shared,
>>>>>>>>>                                     false, errp);
>>>>>>>>
>>>>>>>> Help me understand the exact issue, please. I understand that there are
>>>>>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>>>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>>>>>> not fail.
>>>>>>>>
>>>>>>>> However, if the whole effect really would be to loosen permissions,
>>>>>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>>>>>> @unlock is passed as false, so no bytes will be unlocked. And if
>>>>>>>> permissions are just loosened (as your condition checks), it should not
>>>>>>>> lock any bytes.
>>>>>>>>
>>>>>>>> So why does it attempt to lock any bytes in the first place? There must be
>>>>>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>>>>>>> and s->locked_shared_perm. How does that occur?
>>>>>>>
>>>>>>> I suppose raw_check_lock_bytes() is what is failing, not
>>>>>>> raw_apply_lock_bytes().
>>>>>>
>>>>>> Hm, maybe in Vladimir's case, but not in e.g.
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
>>>>>
>>>>> This is reported against 3.0, which didn't avoid re-locking permissions
>>>>> that we already hold, so there raw_apply_lock_bytes() can still fail.
>>>>
>>>> That makes sense. Which leaves the question why Vladimir still seems to
>>>> see the error there...?
>>>>
>>>
>>> I'm sorry :(. I'm trying to fix a bug based on 2.10, and now I see that it is already fixed
>>> upstream. I don't have a reproducer, only old coredumps.
>>>
>>> So, now it looks like we don't need this patch, as on permission loosening file-posix
>>> doesn't call any FS APIs, yes?
>>>
>>
>>
>> Ah, you mentioned that raw_check_lock_bytes is still buggy.
>
> I haven't tried it out, but from looking at the code it seems so. Maybe
> you can reproduce on master just to be sure?
>

I don't have a reproducer :(

-- 
Best regards,
Vladimir
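The early-return condition in the quoted hunk can be read as a pure bitmask predicate: the update is "loosening only" when no permission bit is newly taken and no previously shared bit stops being shared. The following is a minimal standalone sketch of that test, not QEMU code. The function name `perm_update_only_loosens` and the `PERM_*` constants are hypothetical stand-ins chosen for illustration; they are not QEMU's `BLK_PERM_*` definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative permission bits (hypothetical, not QEMU's BLK_PERM_* values). */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_RESIZE          = 1u << 2,
};

/*
 * Return true when going from (old_perm, old_shared) to
 * (new_perm, new_shared) only loosens restrictions:
 *   - new_perm is a subset of old_perm (nothing new is taken), and
 *   - old_shared is a subset of new_shared (nothing stops being shared).
 * These are the same two comparisons as in the quoted hunk, with
 * s->perm / s->shared_perm replaced by explicit arguments.
 */
static inline bool perm_update_only_loosens(uint64_t old_perm,
                                            uint64_t old_shared,
                                            uint64_t new_perm,
                                            uint64_t new_shared)
{
    return (old_perm | new_perm) == old_perm &&
           (old_shared & new_shared) == old_shared;
}
```

Dropping `PERM_WRITE` while additionally sharing it satisfies the predicate; taking `PERM_WRITE` anew, or ceasing to share a bit, does not.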
* Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes
  2019-04-01  7:21 ` [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes Vladimir Sementsov-Ogievskiy
@ 2019-04-03 16:41   ` Max Reitz
  2019-04-03 16:56     ` Max Reitz
  0 siblings, 1 reply; 3+ messages in thread
From: Max Reitz @ 2019-04-03 16:41 UTC
To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, Denis Lunev,
    fam@euphon.net, eblake@redhat.com, jsnow@redhat.com

On 01.04.19 09:21, Vladimir Sementsov-Ogievskiy wrote:
> 29.03.2019 22:32, Kevin Wolf wrote:

[...]

>> I haven't tried it out, but from looking at the code it seems so. Maybe
>> you can reproduce on master just to be sure?
>>
>
> I don't have a reproducer :(

I have one, but it only breaks before
2996ffad3acabe890fbb4f84a069cdc325a68108:

First, set up an NFS mount on /mnt/nfs.
Second:

$ qemu-img create -f qcow2 /mnt/nfs/foo.qcow2 64M
Formatting '/mnt/nfs/foo.qcow2', fmt=qcow2 size=67108864
cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ (sleep 5; echo "{'execute':'qmp_capabilities'}"; \
   echo "{'execute':'blockdev-del','arguments':{'node-name':'fmt'}}";
   echo "{'execute':'quit'}") \
  | x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
    -blockdev node-name=proto,driver=file,filename=/mnt/nfs/foo.qcow2 \
    -blockdev node-name=fmt,driver=qcow2,file=proto
{"QMP": {"version": {"qemu": {"micro": 90, "minor": 0, "major": 3},
"package": "v3.1.0-rc0-71-ga883d6a0bc"}, "capabilities": []}}

Before the sleep is done, stop the service on the NFS host:

$ systemctl stop nfs-service

Once the sleep has run out (you get a {"return": {}} over QMP), start
the service again:

$ systemctl start nfs-service

And then this happens:

Unexpected error in raw_apply_lock_bytes() at block/file-posix.c:705:
Failed to lock byte 100
[1]    30486 done       ( sleep 5; echo "{'execute':'qmp_capabilities'}"; echo ; echo ; ) |
       30487 abort (core dumped)  x86_64-softmmu/qemu-system-x86_64 -qmp stdio -blockdev  -blockdev

It works fine after 2996ffad3acabe890fbb4f84a069cdc325a68108.

Max
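The "Failed to lock byte 100" abort above is a failure to take a lock on a single byte of the image file. As a rough illustration of what such a per-byte lock looks like at the POSIX level, here is a minimal sketch using plain `fcntl(F_SETLK)`. It is not taken from QEMU: the helper names `lock_one_byte` and `unlock_one_byte` and the `LOCK_PERM_BASE` offset are assumptions for this example (the offset is simply chosen to match the byte number in the error message), and the real driver may use a different locking primitive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed offset, chosen to match "byte 100" in the error message above. */
#define LOCK_PERM_BASE 100

/* Apply one non-blocking lock operation (F_RDLCK or F_UNLCK) to one byte.
 * Returns 0 on success or a negative errno value on failure. */
static inline int set_byte_lock(int fd, off_t offset, short type)
{
    struct flock fl = {
        .l_type   = type,
        .l_whence = SEEK_SET,
        .l_start  = offset,
        .l_len    = 1,
    };

    if (fcntl(fd, F_SETLK, &fl) == -1) {
        return -errno;
    }
    return 0;
}

/* Take a shared (read) lock on the byte at @offset. */
static inline int lock_one_byte(int fd, off_t offset)
{
    return set_byte_lock(fd, offset, F_RDLCK);
}

/* Release any lock this process holds on the byte at @offset. */
static inline int unlock_one_byte(int fd, off_t offset)
{
    return set_byte_lock(fd, offset, F_UNLCK);
}
```

On a local file system both calls normally succeed, even when the byte lies beyond end-of-file. The point of the reproducer is that on NFS the same request depends on the server, so it can fail for reasons unrelated to permissions, and a caller that passes `&error_abort` turns that failure into an abort.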
* Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes
  2019-04-03 16:41   ` Max Reitz
@ 2019-04-03 16:56     ` Max Reitz
  0 siblings, 0 replies; 3+ messages in thread
From: Max Reitz @ 2019-04-03 16:56 UTC
To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, Denis Lunev,
    fam@euphon.net, eblake@redhat.com, jsnow@redhat.com

On 03.04.19 18:41, Max Reitz wrote:
> On 01.04.19 09:21, Vladimir Sementsov-Ogievskiy wrote:

[...]

>> I don't have a reproducer :(
>
> I have one, but it only breaks before
> 2996ffad3acabe890fbb4f84a069cdc325a68108:

[...]

> It works fine after 2996ffad3acabe890fbb4f84a069cdc325a68108.

Now I have a reproducer that breaks before this patch here and works
afterwards:

You just need two parents and delete one of them, so some permissions
stay taken.
So, we can do this:

$ (echo "{'execute':'qmp_capabilities'}"; \
   echo "{'execute':'nbd-server-start', 'arguments':{'addr':{'type':'inet', 'data':{'host':'0.0.0.0','port':'10809'}}}}"; \
   echo "{'execute':'nbd-server-add', 'arguments':{'device':'proto'}}"; \
   sleep 5; \
   echo "{'execute':'nbd-server-stop'}"; \
   echo "{'execute':'quit'}") \
  | x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
    -blockdev node-name=proto,driver=file,filename=/mnt/nfs/foo.img \
    -device virtio-blk,drive=proto
{"QMP": {"version": {"qemu": {"micro": 91, "minor": 1, "major": 3},
"package": "v4.0.0-rc1-74-g38e694fcc9"}, "capabilities": ["oob"]}}
{"return": {}}
{"return": {}}
{"return": {}}

Then immediately this on the NFS host:

$ sudo systemctl stop nfs-server; sleep 6; \
  sudo systemctl start nfs-server

And this happens on the client:

Unexpected error in raw_check_lock_bytes() at block/file-posix.c:775:
Failed to get "consistent read" lock
[1]    21289 done       ( echo "{'execute':'qmp_capabilities'}"; echo ; echo ; sleep 5; echo ; echo ; |
       21290 abort (core dumped)  x86_64-softmmu/qemu-system-x86_64 -qmp stdio -blockdev  -device

No issues after 696aaaed579ac5bf5fa336216909b46d3d8f07a8 (this patch
here).

Max
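The "two parents, delete one of them" setup can be pictured with a small model: a node's effective permissions are the union of what its parents require, and its effective shared set is the intersection of what they each tolerate. The sketch below is only an illustration under that assumption. The `struct parent_perm` type, the `cumulative_perm` helper and the `PERM_*` bits are hypothetical names for this example, not QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative permission bits (hypothetical, not QEMU's BLK_PERM_* values). */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_RESIZE          = 1u << 2,
    PERM_ALL             = PERM_CONSISTENT_READ | PERM_WRITE | PERM_RESIZE,
};

/* What one parent requires (perm) and what it lets others do (shared). */
struct parent_perm {
    uint64_t perm;
    uint64_t shared;
};

/*
 * Combine the first @n parents: required permissions are OR-ed together,
 * shared permissions are AND-ed together (starting from "everything").
 */
static inline void cumulative_perm(const struct parent_perm *parents, size_t n,
                                   uint64_t *perm, uint64_t *shared)
{
    uint64_t p = 0;
    uint64_t s = PERM_ALL;

    for (size_t i = 0; i < n; i++) {
        p |= parents[i].perm;
        s &= parents[i].shared;
    }
    *perm = p;
    *shared = s;
}
```

With a first parent requiring read and write and a second one that refuses to share resize, removing the second parent only widens the shared set, so the update is a pure loosening. The required set is still non-empty afterwards, though, which is the case where a lock check would still have something to look at.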