* Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes
[not found] ` <20190329193224.GP5081@localhost.localdomain>
@ 2019-04-01 7:21 ` Vladimir Sementsov-Ogievskiy
2019-04-03 16:41 ` Max Reitz
0 siblings, 1 reply; 3+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-01 7:21 UTC (permalink / raw)
To: Kevin Wolf
Cc: Max Reitz, qemu-devel@nongnu.org, qemu-block@nongnu.org,
Denis Lunev, fam@euphon.net, eblake@redhat.com, jsnow@redhat.com
29.03.2019 22:32, Kevin Wolf wrote:
> Am 29.03.2019 um 19:00 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 29.03.2019 20:58, Vladimir Sementsov-Ogievskiy wrote:
>>> 29.03.2019 20:44, Max Reitz wrote:
>>>> On 29.03.19 18:40, Kevin Wolf wrote:
>>>>> Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
>>>>>> On 29.03.19 18:24, Kevin Wolf wrote:
>>>>>>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>>>>>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>>>>>> loosening permissions. However file-locking operations may fail even
>>>>>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>>>>>
>>>>>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>>>>>> permission update commit and abort.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>> ---
>>>>>>>>> block/file-posix.c | 12 ++++++++++++
>>>>>>>>> 1 file changed, 12 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>>>>>> --- a/block/file-posix.c
>>>>>>>>> +++ b/block/file-posix.c
>>>>>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>>>>> switch (op) {
>>>>>>>>> case RAW_PL_PREPARE:
>>>>>>>>> + if ((s->perm | new_perm) == s->perm &&
>>>>>>>>> + (s->shared_perm & new_shared) == s->shared_perm)
>>>>>>>>> + {
>>>>>>>>> + /*
>>>>>>>>> + * We are going to unlock bytes, it should not fail. If it fail due
>>>>>>>>> + * to some fs-dependent permission-unrelated reasons (which occurs
>>>>>>>>> + * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>>>>>> + * can't prevent such errors by any check here. And we ignore them
>>>>>>>>> + * anyway in ABORT and COMMIT.
>>>>>>>>> + */
>>>>>>>>> + return 0;
>>>>>>>>> + }
>>>>>>>>> ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>>>>> ~s->shared_perm | ~new_shared,
>>>>>>>>> false, errp);
>>>>>>>>
>>>>>>>> Help me understand the exact issue, please. I understand that there are
>>>>>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>>>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>>>>>> not fail.
>>>>>>>>
>>>>>>>> However, if the whole effect really would be to loosen permissions,
>>>>>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>>>>>> @unlock is passed as false, so no bytes will be unlocked. And if
>>>>>>>> permissions are just loosened (as your condition checks), it should not
>>>>>>>> lock any bytes.
>>>>>>>>
>>>>>>>> So why does it attempt lock any bytes in the first place? There must be
>>>>>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>>>>>>> and s->locked_shared_perm. How does that occur?
>>>>>>>
>>>>>>> I suppose raw_check_lock_bytes() is what is failing, not
>>>>>>> raw_apply_lock_bytes().
>>>>>>
>>>>>> Hm, maybe in Vladimir's case, but not in e.g.
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
>>>>>
>>>>> This is reported against 3.0, which didn't avoid re-locking permissions
>>>>> that we already hold, so there raw_apply_lock_bytes() can still fail.
>>>>
>>>> That makes sense. Which leaves the question why Vladimir still seems to
>>>> see the error there...?
>>>>
>>>
>>> I'm sorry :(. I'm trying to fix bug based on 2.10, and now I see that is already fixed
>>> upstream. I don't have a reproducer, only old coredumps.
>>>
>>> So, now it looks like we don't need this patch, as on permission loosening file-posix
>>> don't call any FS apis, yes?
>>>
>>
>>
>> Ah, you mentioned, that raw_check_lock_bytes is still buggy.
>
> I haven't tried it out, but from looking at the code it seems so. Maybe
> you can reproduce on master just to be sure?
>
I don't have a reproducer :(
--
Best regards,
Vladimir
* Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes
2019-04-01 7:21 ` [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes Vladimir Sementsov-Ogievskiy
@ 2019-04-03 16:41 ` Max Reitz
2019-04-03 16:56 ` Max Reitz
0 siblings, 1 reply; 3+ messages in thread
From: Max Reitz @ 2019-04-03 16:41 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, Denis Lunev,
fam@euphon.net, eblake@redhat.com, jsnow@redhat.com
On 01.04.19 09:21, Vladimir Sementsov-Ogievskiy wrote:
> 29.03.2019 22:32, Kevin Wolf wrote:
>> Am 29.03.2019 um 19:00 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> 29.03.2019 20:58, Vladimir Sementsov-Ogievskiy wrote:
>>>> 29.03.2019 20:44, Max Reitz wrote:
>>>>> On 29.03.19 18:40, Kevin Wolf wrote:
>>>>>> Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
>>>>>>> On 29.03.19 18:24, Kevin Wolf wrote:
>>>>>>>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>>>>>>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>>>>>>> loosening permissions. However file-locking operations may fail even
>>>>>>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>>>>>>
>>>>>>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>>>>>>> permission update commit and abort.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>>> ---
>>>>>>>>>> block/file-posix.c | 12 ++++++++++++
>>>>>>>>>> 1 file changed, 12 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>>>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>>>>>>> --- a/block/file-posix.c
>>>>>>>>>> +++ b/block/file-posix.c
>>>>>>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>>>>>> switch (op) {
>>>>>>>>>> case RAW_PL_PREPARE:
>>>>>>>>>> + if ((s->perm | new_perm) == s->perm &&
>>>>>>>>>> + (s->shared_perm & new_shared) == s->shared_perm)
>>>>>>>>>> + {
>>>>>>>>>> + /*
>>>>>>>>>> + * We are going to unlock bytes, it should not fail. If it fail due
>>>>>>>>>> + * to some fs-dependent permission-unrelated reasons (which occurs
>>>>>>>>>> + * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>>>>>>> + * can't prevent such errors by any check here. And we ignore them
>>>>>>>>>> + * anyway in ABORT and COMMIT.
>>>>>>>>>> + */
>>>>>>>>>> + return 0;
>>>>>>>>>> + }
>>>>>>>>>> ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>>>>>> ~s->shared_perm | ~new_shared,
>>>>>>>>>> false, errp);
>>>>>>>>>
>>>>>>>>> Help me understand the exact issue, please. I understand that there are
>>>>>>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>>>>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>>>>>>> not fail.
>>>>>>>>>
>>>>>>>>> However, if the whole effect really would be to loosen permissions,
>>>>>>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>>>>>>> @unlock is passed as false, so no bytes will be unlocked. And if
>>>>>>>>> permissions are just loosened (as your condition checks), it should not
>>>>>>>>> lock any bytes.
>>>>>>>>>
>>>>>>>>> So why does it attempt lock any bytes in the first place? There must be
>>>>>>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>>>>>>>> and s->locked_shared_perm. How does that occur?
>>>>>>>>
>>>>>>>> I suppose raw_check_lock_bytes() is what is failing, not
>>>>>>>> raw_apply_lock_bytes().
>>>>>>>
>>>>>>> Hm, maybe in Vladimir's case, but not in e.g.
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
>>>>>>
>>>>>> This is reported against 3.0, which didn't avoid re-locking permissions
>>>>>> that we already hold, so there raw_apply_lock_bytes() can still fail.
>>>>>
>>>>> That makes sense. Which leaves the question why Vladimir still seems to
>>>>> see the error there...?
>>>>>
>>>>
>>>> I'm sorry :(. I'm trying to fix bug based on 2.10, and now I see that is already fixed
>>>> upstream. I don't have a reproducer, only old coredumps.
>>>>
>>>> So, now it looks like we don't need this patch, as on permission loosening file-posix
>>>> don't call any FS apis, yes?
>>>>
>>>
>>>
>>> Ah, you mentioned, that raw_check_lock_bytes is still buggy.
>>
>> I haven't tried it out, but from looking at the code it seems so. Maybe
>> you can reproduce on master just to be sure?
>>
>
> I don't have a reproducer :(
I have one, but it only breaks before
2996ffad3acabe890fbb4f84a069cdc325a68108:
First, set up an NFS mount on /mnt/nfs. Second:
$ qemu-img create -f qcow2 /mnt/nfs/foo.qcow2 64M
Formatting '/mnt/nfs/foo.qcow2', fmt=qcow2 size=67108864
cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ (sleep 5; echo "{'execute':'qmp_capabilities'}"; \
echo "{'execute':'blockdev-del','arguments':{'node-name':'fmt'}}";
echo "{'execute':'quit'}") \
| x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
-blockdev node-name=proto,driver=file,filename=/mnt/nfs/foo.qcow2 \
-blockdev node-name=fmt,driver=qcow2,file=proto
{"QMP": {"version": {"qemu": {"micro": 90, "minor": 0, "major": 3},
"package": "v3.1.0-rc0-71-ga883d6a0bc"}, "capabilities": []}}
Before the sleep is done, stop the service on the NFS host:
$ systemctl stop nfs-service
Once the sleep has run out (you get a {"return": {}} over QMP), start
the service again:
$ systemctl start nfs-service
And then this happens:
Unexpected error in raw_apply_lock_bytes() at block/file-posix.c:705:
Failed to lock byte 100
[1] 30486 done ( sleep 5; echo
"{'execute':'qmp_capabilities'}"; echo ; echo ; ) |
30487 abort (core dumped) x86_64-softmmu/qemu-system-x86_64 -qmp
stdio -blockdev -blockdev
It works fine after 2996ffad3acabe890fbb4f84a069cdc325a68108.
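For readers unfamiliar with the mechanism behind "Failed to lock byte 100", here is a minimal sketch of locking a single byte of a file with fcntl(). It is an illustration under assumptions, not the code in block/file-posix.c: the helper names (set_byte_lock, lock_byte, unlock_byte, demo_lock_cycle) are made up for this example, it uses the plain POSIX F_SETLK command (the real helper may use a different fcntl command), and offset 100 is taken only from the error message above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: apply a lock of the given type to one byte. */
static int set_byte_lock(int fd, off_t offset, short type)
{
    struct flock fl = {
        .l_type = type,        /* F_RDLCK, F_WRLCK or F_UNLCK */
        .l_whence = SEEK_SET,  /* offset is absolute */
        .l_start = offset,
        .l_len = 1,            /* exactly one byte */
    };

    /* Non-blocking: returns -1 with errno set if the lock cannot be taken. */
    return fcntl(fd, F_SETLK, &fl);
}

/* Take a shared (read) or exclusive (write) lock on a single byte. */
static int lock_byte(int fd, off_t offset, bool exclusive)
{
    return set_byte_lock(fd, offset, exclusive ? F_WRLCK : F_RDLCK);
}

/* Release whatever lock this process holds on a single byte. */
static int unlock_byte(int fd, off_t offset)
{
    return set_byte_lock(fd, offset, F_UNLCK);
}

/* Lock and unlock byte 100 of a scratch file; returns 0 on success. */
static int demo_lock_cycle(void)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }

    /* Locking past the end of the file is permitted. */
    int ret = lock_byte(fd, 100, false);
    if (ret == 0) {
        ret = unlock_byte(fd, 100);
    }

    close(fd);
    unlink(path);
    return ret;
}
```

On a local filesystem demo_lock_cycle() is expected to succeed. On NFS the same fcntl() call has to reach the server's lock service, which is presumably why stopping that service in the steps above makes the lock attempt fail for reasons unrelated to permissions.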
Max
* Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes
2019-04-03 16:41 ` Max Reitz
@ 2019-04-03 16:56 ` Max Reitz
0 siblings, 0 replies; 3+ messages in thread
From: Max Reitz @ 2019-04-03 16:56 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, Denis Lunev,
fam@euphon.net, eblake@redhat.com, jsnow@redhat.com
On 03.04.19 18:41, Max Reitz wrote:
> On 01.04.19 09:21, Vladimir Sementsov-Ogievskiy wrote:
>> 29.03.2019 22:32, Kevin Wolf wrote:
>>> Am 29.03.2019 um 19:00 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 29.03.2019 20:58, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 29.03.2019 20:44, Max Reitz wrote:
>>>>>> On 29.03.19 18:40, Kevin Wolf wrote:
>>>>>>> Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
>>>>>>>> On 29.03.19 18:24, Kevin Wolf wrote:
>>>>>>>>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>>>>>>>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>>>>>>>> loosening permissions. However file-locking operations may fail even
>>>>>>>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>>>>>>>
>>>>>>>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>>>>>>>> permission update commit and abort.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>>>> ---
>>>>>>>>>>> block/file-posix.c | 12 ++++++++++++
>>>>>>>>>>> 1 file changed, 12 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>>>>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>>>>>>>> --- a/block/file-posix.c
>>>>>>>>>>> +++ b/block/file-posix.c
>>>>>>>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>>>>>>> switch (op) {
>>>>>>>>>>> case RAW_PL_PREPARE:
>>>>>>>>>>> + if ((s->perm | new_perm) == s->perm &&
>>>>>>>>>>> + (s->shared_perm & new_shared) == s->shared_perm)
>>>>>>>>>>> + {
>>>>>>>>>>> + /*
>>>>>>>>>>> + * We are going to unlock bytes, it should not fail. If it fail due
>>>>>>>>>>> + * to some fs-dependent permission-unrelated reasons (which occurs
>>>>>>>>>>> + * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>>>>>>>> + * can't prevent such errors by any check here. And we ignore them
>>>>>>>>>>> + * anyway in ABORT and COMMIT.
>>>>>>>>>>> + */
>>>>>>>>>>> + return 0;
>>>>>>>>>>> + }
>>>>>>>>>>> ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>>>>>>> ~s->shared_perm | ~new_shared,
>>>>>>>>>>> false, errp);
>>>>>>>>>>
>>>>>>>>>> Help me understand the exact issue, please. I understand that there are
>>>>>>>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>>>>>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>>>>>>>> not fail.
>>>>>>>>>>
>>>>>>>>>> However, if the whole effect really would be to loosen permissions,
>>>>>>>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>>>>>>>> @unlock is passed as false, so no bytes will be unlocked. And if
>>>>>>>>>> permissions are just loosened (as your condition checks), it should not
>>>>>>>>>> lock any bytes.
>>>>>>>>>>
>>>>>>>>>> So why does it attempt lock any bytes in the first place? There must be
>>>>>>>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>>>>>>>>> and s->locked_shared_perm. How does that occur?
>>>>>>>>>
>>>>>>>>> I suppose raw_check_lock_bytes() is what is failing, not
>>>>>>>>> raw_apply_lock_bytes().
>>>>>>>>
>>>>>>>> Hm, maybe in Vladimir's case, but not in e.g.
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
>>>>>>>
>>>>>>> This is reported against 3.0, which didn't avoid re-locking permissions
>>>>>>> that we already hold, so there raw_apply_lock_bytes() can still fail.
>>>>>>
>>>>>> That makes sense. Which leaves the question why Vladimir still seems to
>>>>>> see the error there...?
>>>>>>
>>>>>
>>>>> I'm sorry :(. I'm trying to fix bug based on 2.10, and now I see that is already fixed
>>>>> upstream. I don't have a reproducer, only old coredumps.
>>>>>
>>>>> So, now it looks like we don't need this patch, as on permission loosening file-posix
>>>>> don't call any FS apis, yes?
>>>>>
>>>>
>>>>
>>>> Ah, you mentioned, that raw_check_lock_bytes is still buggy.
>>>
>>> I haven't tried it out, but from looking at the code it seems so. Maybe
>>> you can reproduce on master just to be sure?
>>>
>>
>> I don't have a reproducer :(
>
> I have one, but it only breaks before
> 2996ffad3acabe890fbb4f84a069cdc325a68108:
>
> First, set up an NFS mount on /mnt/nfs. Second:
>
> $ qemu-img create -f qcow2 /mnt/nfs/foo.qcow2 64M
> Formatting '/mnt/nfs/foo.qcow2', fmt=qcow2 size=67108864
> cluster_size=65536 lazy_refcounts=off refcount_bits=16
> $ (sleep 5; echo "{'execute':'qmp_capabilities'}"; \
> echo "{'execute':'blockdev-del','arguments':{'node-name':'fmt'}}";
> echo "{'execute':'quit'}") \
> | x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
> -blockdev node-name=proto,driver=file,filename=/mnt/nfs/foo.qcow2 \
> -blockdev node-name=fmt,driver=qcow2,file=proto
> {"QMP": {"version": {"qemu": {"micro": 90, "minor": 0, "major": 3},
> "package": "v3.1.0-rc0-71-ga883d6a0bc"}, "capabilities": []}}
>
> Before the sleep is done, stop the service on the NFS host:
>
> $ systemctl stop nfs-service
>
> Once the sleep has run out (you get a {"return": {}} over QMP), start
> the service again:
>
> $ systemctl start nfs-service
>
> And then this happens:
>
> Unexpected error in raw_apply_lock_bytes() at block/file-posix.c:705:
> Failed to lock byte 100
> [1] 30486 done ( sleep 5; echo
> "{'execute':'qmp_capabilities'}"; echo ; echo ; ) |
> 30487 abort (core dumped) x86_64-softmmu/qemu-system-x86_64 -qmp
> stdio -blockdev -blockdev
>
> It works fine after 2996ffad3acabe890fbb4f84a069cdc325a68108.
Now I have a reproducer that breaks before this patch here and works
afterwards: you just need two parents and then delete one of them, so
that some permissions stay taken.
So, we can do this:
$ (echo "{'execute':'qmp_capabilities'}"; \
echo "{'execute':'nbd-server-start',
'arguments':{'addr':{'type':'inet',
'data':{'host':'0.0.0.0','port':'10809'}}}}"; \
echo "{'execute':'nbd-server-add',
'arguments':{'device':'proto'}}"; \
sleep 5; \
echo "{'execute':'nbd-server-stop'}"; \
echo "{'execute':'quit'}") \
| x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
-blockdev node-name=proto,driver=file,filename=/mnt/nfs/foo.img \
-device virtio-blk,drive=proto
{"QMP": {"version": {"qemu": {"micro": 91, "minor": 1, "major": 3},
"package": "v4.0.0-rc1-74-g38e694fcc9"}, "capabilities": ["oob"]}}
{"return": {}}
{"return": {}}
{"return": {}}
Then immediately run this on the NFS host:
$ sudo systemctl stop nfs-server; sleep 6; \
sudo systemctl start nfs-server
And this happens on the client:
Unexpected error in raw_check_lock_bytes() at block/file-posix.c:775:
Failed to get "consistent read" lock
[1] 21289 done ( echo
"{'execute':'qmp_capabilities'}"; echo ; echo ; sleep 5; echo ; echo ; |
21290 abort (core dumped) x86_64-softmmu/qemu-system-x86_64 -qmp
stdio -blockdev -device
No issues after 696aaaed579ac5bf5fa336216909b46d3d8f07a8 (this patch here).
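The condition the patch adds to RAW_PL_PREPARE can be sketched in isolation as follows. This is only an illustration: PermState and perm_update_only_loosens() are hypothetical names standing in for the driver state and the inline check, and the bit values used when calling it are arbitrary rather than QEMU's real permission constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two bitmask fields the patch reads. */
typedef struct {
    uint64_t perm;         /* permissions currently taken */
    uint64_t shared_perm;  /* permissions currently shared with others */
} PermState;

/*
 * Returns true when the update only loosens permissions:
 *  - new_perm adds no bit that is not already set in perm, and
 *  - new_shared clears no bit that is currently set in shared_perm.
 * This mirrors the expression in the quoted diff.
 */
static bool perm_update_only_loosens(const PermState *s,
                                     uint64_t new_perm,
                                     uint64_t new_shared)
{
    return (s->perm | new_perm) == s->perm &&
           (s->shared_perm & new_shared) == s->shared_perm;
}
```

With two parents, removing one of them leaves the other parent's permissions in place, so new_perm stays a non-empty subset of perm. In that case the check returns true and, with the patch, PREPARE returns early instead of calling into the filesystem.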
Max