* [PATCH 0/2] Fix test-char reference counting bug
@ 2020-12-15 22:41 Eduardo Habkost
  2020-12-15 22:41 ` [PATCH 1/2] test-char: Destroy chardev correctly at char_file_test_internal() Eduardo Habkost
  2020-12-15 22:41 ` [PATCH 2/2] qom: Assert that objects being destroyed have no parent Eduardo Habkost
  0 siblings, 2 replies; 10+ messages in thread
From: Eduardo Habkost @ 2020-12-15 22:41 UTC (permalink / raw)
  To: Paolo Bonzini, Marc-André Lureau, qemu-devel
  Cc: Daniel P. Berrangé, Eduardo Habkost,
	Philippe Mathieu-Daudé
This series addresses a bug that seems to be the cause of the
following crash, that is reported by Patchew and other CI systems
once in a while:
  Running test test-char
  Unexpected error in object_property_try_add() at ../qom/object.c:1220: attempt to add duplicate property 'serial-id' to object (type 'container')
  ERROR test-char - too few tests run (expected 38, got 9)
  make: *** [run-test-86] Error 1
This is what seems to be happening:
- char_file_test_internal() creates chr using qemu_chardev_new().
- qemu_chardev_new() automatically assigns ID, adds
  chardev to the QOM tree.
- char_file_test_internal() does _not_ own the reference
  to the created object.
- char_file_test_internal() incorrectly calls object_unref().
- object is freed but, but /containers now has a dangling
  pointer.
- char_serial_test() creates a chardev with ID "serial-id", and
  it ends up being allocated at the same address as the old
  object.
- char_serial_test() correctly calls object_unparent().
- object_property_del_child() looks for the right child property
  in the hashtable, finds the dangling pointer with the same
  address, removes the wrong property, leaves a dangling
  "serial-id" property.
- New object is created by char_serial_test() with ID "serial-id".
- object_property_try_add_child() will fail because of the
  dangling "serial-id" property.
Eduardo Habkost (2):
  test-char: Destroy chardev correctly at char_file_test_internal()
  qom: Assert that objects being destroyed have no parent
 qom/object.c      | 1 +
 tests/test-char.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
-- 
2.28.0
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [PATCH 1/2] test-char: Destroy chardev correctly at char_file_test_internal()
  2020-12-15 22:41 [PATCH 0/2] Fix test-char reference counting bug Eduardo Habkost
@ 2020-12-15 22:41 ` Eduardo Habkost
  2020-12-16  7:45   ` Marc-André Lureau
  2020-12-16 16:50   ` Alex Bennée
  2020-12-15 22:41 ` [PATCH 2/2] qom: Assert that objects being destroyed have no parent Eduardo Habkost
  1 sibling, 2 replies; 10+ messages in thread
From: Eduardo Habkost @ 2020-12-15 22:41 UTC (permalink / raw)
  To: Paolo Bonzini, Marc-André Lureau, qemu-devel
  Cc: Daniel P. Berrangé, Eduardo Habkost,
	Philippe Mathieu-Daudé
commit 1e419ee68fa5 ("chardev: generate an internal id when none
given") changed the reference ownership semantics of
qemu_chardev_new(NULL, ...): now all chardevs created using
qemu_chardev_new() are added to the /chardevs QOM container, and
the caller does not own a reference to the newly created object.
However, the code at char_file_test_internal() had not been
updated and was calling object_unref() on a chardev object it
didn't own.  This makes the chardev be destroyed, but leaves a
dangling pointer in the /chardev container children list, and
seems to be the cause of the following char_serial_test() crash:
  Unexpected error in object_property_try_add() at ../qom/object.c:1220: \
      attempt to add duplicate property 'serial-id' to object (type 'container')
  ERROR test-char - too few tests run (expected 38, got 9)
Update the code to use object_unparent() at the end of
char_file_test_internal(), to make sure the chardev will be
correctly removed from the QOM tree.
Fixes: 1e419ee68fa5 ("chardev: generate an internal id when none given")
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
 tests/test-char.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/test-char.c b/tests/test-char.c
index 953e0d1c1f..06102977b6 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -1298,7 +1298,7 @@ static void char_file_test_internal(Chardev *ext_chr, const char *filepath)
     g_assert(strncmp(contents, "hello!", 6) == 0);
 
     if (!ext_chr) {
-        object_unref(OBJECT(chr));
+        object_unparent(OBJECT(chr));
         g_unlink(out);
     }
     g_free(contents);
-- 
2.28.0
^ permalink raw reply related	[flat|nested] 10+ messages in thread
* [PATCH 2/2] qom: Assert that objects being destroyed have no parent
  2020-12-15 22:41 [PATCH 0/2] Fix test-char reference counting bug Eduardo Habkost
  2020-12-15 22:41 ` [PATCH 1/2] test-char: Destroy chardev correctly at char_file_test_internal() Eduardo Habkost
@ 2020-12-15 22:41 ` Eduardo Habkost
  2020-12-16  7:53   ` Marc-André Lureau
  2020-12-16 16:52   ` Alex Bennée
  1 sibling, 2 replies; 10+ messages in thread
From: Eduardo Habkost @ 2020-12-15 22:41 UTC (permalink / raw)
  To: Paolo Bonzini, Marc-André Lureau, qemu-devel
  Cc: Daniel P. Berrangé, Eduardo Habkost,
	Philippe Mathieu-Daudé
QOM reference counting bugs are often hard to detect, but there's
one kind of bug that's easier: if we are freeing an object but is
still attached to a parent, it means the reference count is wrong
(because the parent always hold a reference to their children).
Add an assertion to make sure we detect those cases.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
 qom/object.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/qom/object.c b/qom/object.c
index f2ae6e6b2a..5cfed6d7c6 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -685,6 +685,7 @@ static void object_finalize(void *data)
     object_deinit(obj, ti);
 
     g_assert(obj->ref == 0);
+    g_assert(obj->parent == NULL);
     if (obj->free) {
         obj->free(obj);
     }
-- 
2.28.0
^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] test-char: Destroy chardev correctly at char_file_test_internal()
  2020-12-15 22:41 ` [PATCH 1/2] test-char: Destroy chardev correctly at char_file_test_internal() Eduardo Habkost
@ 2020-12-16  7:45   ` Marc-André Lureau
  2020-12-16 16:50   ` Alex Bennée
  1 sibling, 0 replies; 10+ messages in thread
From: Marc-André Lureau @ 2020-12-16  7:45 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Paolo Bonzini, Daniel P. Berrangé, qemu-devel,
	Philippe Mathieu-Daudé
[-- Attachment #1: Type: text/plain, Size: 1930 bytes --]
On Wed, Dec 16, 2020 at 2:41 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
> commit 1e419ee68fa5 ("chardev: generate an internal id when none
> given") changed the reference ownership semantics of
> qemu_chardev_new(NULL, ...): now all chardevs created using
> qemu_chardev_new() are added to the /chardevs QOM container, and
> the caller does not own a reference to the newly created object.
>
> However, the code at char_file_test_internal() had not been
> updated and was calling object_unref() on a chardev object it
> didn't own.  This makes the chardev be destroyed, but leaves a
> dangling pointer in the /chardev container children list, and
> seems to be the cause of the following char_serial_test() crash:
>
>   Unexpected error in object_property_try_add() at ../qom/object.c:1220: \
>       attempt to add duplicate property 'serial-id' to object (type
> 'container')
>   ERROR test-char - too few tests run (expected 38, got 9)
>
> Update the code to use object_unparent() at the end of
> char_file_test_internal(), to make sure the chardev will be
> correctly removed from the QOM tree.
>
> Fixes: 1e419ee68fa5 ("chardev: generate an internal id when none given")
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>
nice catch
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
>  tests/test-char.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/test-char.c b/tests/test-char.c
> index 953e0d1c1f..06102977b6 100644
> --- a/tests/test-char.c
> +++ b/tests/test-char.c
> @@ -1298,7 +1298,7 @@ static void char_file_test_internal(Chardev
> *ext_chr, const char *filepath)
>      g_assert(strncmp(contents, "hello!", 6) == 0);
>
>      if (!ext_chr) {
> -        object_unref(OBJECT(chr));
> +        object_unparent(OBJECT(chr));
>          g_unlink(out);
>      }
>      g_free(contents);
> --
> 2.28.0
>
>
[-- Attachment #2: Type: text/html, Size: 2706 bytes --]
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] qom: Assert that objects being destroyed have no parent
  2020-12-15 22:41 ` [PATCH 2/2] qom: Assert that objects being destroyed have no parent Eduardo Habkost
@ 2020-12-16  7:53   ` Marc-André Lureau
  2020-12-16  9:55     ` Daniel P. Berrangé
  2020-12-16 13:31     ` Paolo Bonzini
  2020-12-16 16:52   ` Alex Bennée
  1 sibling, 2 replies; 10+ messages in thread
From: Marc-André Lureau @ 2020-12-16  7:53 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Paolo Bonzini, Daniel P. Berrangé, qemu-devel,
	Philippe Mathieu-Daudé
[-- Attachment #1: Type: text/plain, Size: 1224 bytes --]
Hi
On Wed, Dec 16, 2020 at 2:41 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
> QOM reference counting bugs are often hard to detect, but there's
> one kind of bug that's easier: if we are freeing an object but is
> still attached to a parent, it means the reference count is wrong
> (because the parent always hold a reference to their children).
>
> Add an assertion to make sure we detect those cases.
>
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>
On the principle, I fully agree. But the risk is high to introduce
regression if objects are manipulated in strange ways.
I remember I wanted object_unref() to automatically remove itself from the
parent when the last ref is dropped. I think there were similar concerns.
Maybe with --enable-qom-debug ? (removing the -cast)
---
>  qom/object.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/qom/object.c b/qom/object.c
> index f2ae6e6b2a..5cfed6d7c6 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -685,6 +685,7 @@ static void object_finalize(void *data)
>      object_deinit(obj, ti);
>
>      g_assert(obj->ref == 0);
> +    g_assert(obj->parent == NULL);
>      if (obj->free) {
>          obj->free(obj);
>      }
> --
> 2.28.0
>
>
[-- Attachment #2: Type: text/html, Size: 2039 bytes --]
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] qom: Assert that objects being destroyed have no parent
  2020-12-16  7:53   ` Marc-André Lureau
@ 2020-12-16  9:55     ` Daniel P. Berrangé
  2020-12-16 13:31     ` Paolo Bonzini
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel P. Berrangé @ 2020-12-16  9:55 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé, Eduardo Habkost,
	qemu-devel
On Wed, Dec 16, 2020 at 11:53:06AM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Dec 16, 2020 at 2:41 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > QOM reference counting bugs are often hard to detect, but there's
> > one kind of bug that's easier: if we are freeing an object but is
> > still attached to a parent, it means the reference count is wrong
> > (because the parent always hold a reference to their children).
> >
> > Add an assertion to make sure we detect those cases.
> >
> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> >
> 
> On the principle, I fully agree. But the risk is high to introduce
> regression if objects are manipulated in strange ways.
Isn't the point that we're broken already. We have a QOM instance
in the tree which has a zero reference count and has been freed.
As soon as something touches that object in the tree, we're liable
to crash & burn touching free'd memory. So it seems the choices are
between crash fast where we see the problem, or crash eventually
at a place where we can't easily trace back to the root cause.
> I remember I wanted object_unref() to automatically remove itself from the
> parent when the last ref is dropped. I think there were similar concerns.
Automatically removing itself would be hiding the bug in whatever
code has mistakenly removed a reference it didn't own.
> 
> Maybe with --enable-qom-debug ? (removing the -cast)
> 
> ---
> >  qom/object.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/qom/object.c b/qom/object.c
> > index f2ae6e6b2a..5cfed6d7c6 100644
> > --- a/qom/object.c
> > +++ b/qom/object.c
> > @@ -685,6 +685,7 @@ static void object_finalize(void *data)
> >      object_deinit(obj, ti);
> >
> >      g_assert(obj->ref == 0);
> > +    g_assert(obj->parent == NULL);
> >      if (obj->free) {
> >          obj->free(obj);
> >      }
> > --
> > 2.28.0
> >
> >
Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] qom: Assert that objects being destroyed have no parent
  2020-12-16  7:53   ` Marc-André Lureau
  2020-12-16  9:55     ` Daniel P. Berrangé
@ 2020-12-16 13:31     ` Paolo Bonzini
  2020-12-16 16:15       ` Alex Bennée
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2020-12-16 13:31 UTC (permalink / raw)
  To: Marc-André Lureau, Eduardo Habkost
  Cc: Daniel P. Berrangé, qemu-devel, Philippe Mathieu-Daudé
On 16/12/20 08:53, Marc-André Lureau wrote:
> 
> On the principle, I fully agree. But the risk is high to introduce 
> regression if objects are manipulated in strange ways.
> 
> I remember I wanted object_unref() to automatically remove itself from 
> the parent when the last ref is dropped. I think there were similar 
> concerns.
unref and unparent are two very different operations; the former means 
*I* am done with this object, the latter means *QEMU* is done with this 
object (even though there may be a few reference held, e.g. on the call 
stack or by RCU).  Since object_unparent operates on global state, you 
can even call object_unparent if you don't own yourself a reference to 
the object and just got the pointer from the caller.
While unref is a "mechanical" operation of dropping a reference and 
possibly freeing the object, unparent is an active operation that 
includes for example dropping reference cycles or in general detaching 
from other places that are known to hold references to this object.
This is not a concept that is specific to QEMU, I think I read somewhere 
that LibreOffice's UI library does something similar, calling it "dispose".
Paolo
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] qom: Assert that objects being destroyed have no parent
  2020-12-16 13:31     ` Paolo Bonzini
@ 2020-12-16 16:15       ` Alex Bennée
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Bennée @ 2020-12-16 16:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Marc-André Lureau, Daniel P. Berrangé,
	Eduardo Habkost, Philippe Mathieu-Daudé
Paolo Bonzini <pbonzini@redhat.com> writes:
> On 16/12/20 08:53, Marc-André Lureau wrote:
>> 
>> On the principle, I fully agree. But the risk is high to introduce 
>> regression if objects are manipulated in strange ways.
>> 
>> I remember I wanted object_unref() to automatically remove itself from 
>> the parent when the last ref is dropped. I think there were similar 
>> concerns.
>
> unref and unparent are two very different operations; the former means 
> *I* am done with this object, the latter means *QEMU* is done with this 
> object (even though there may be a few reference held, e.g. on the call 
> stack or by RCU).  Since object_unparent operates on global state, you 
> can even call object_unparent if you don't own yourself a reference to 
> the object and just got the pointer from the caller.
>
> While unref is a "mechanical" operation of dropping a reference and 
> possibly freeing the object, unparent is an active operation that 
> includes for example dropping reference cycles or in general detaching 
> from other places that are known to hold references to this object.
This all sounds like good material for a QOM object lifetime section of
docs/devel/qom.rst
>
> This is not a concept that is specific to QEMU, I think I read somewhere 
> that LibreOffice's UI library does something similar, calling it "dispose".
>
> Paolo
-- 
Alex Bennée
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] test-char: Destroy chardev correctly at char_file_test_internal()
  2020-12-15 22:41 ` [PATCH 1/2] test-char: Destroy chardev correctly at char_file_test_internal() Eduardo Habkost
  2020-12-16  7:45   ` Marc-André Lureau
@ 2020-12-16 16:50   ` Alex Bennée
  1 sibling, 0 replies; 10+ messages in thread
From: Alex Bennée @ 2020-12-16 16:50 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: qemu-devel, Paolo Bonzini, Daniel P. Berrangé,
	Philippe Mathieu-Daudé, Marc-André Lureau
Eduardo Habkost <ehabkost@redhat.com> writes:
> commit 1e419ee68fa5 ("chardev: generate an internal id when none
> given") changed the reference ownership semantics of
> qemu_chardev_new(NULL, ...): now all chardevs created using
> qemu_chardev_new() are added to the /chardevs QOM container, and
> the caller does not own a reference to the newly created object.
>
> However, the code at char_file_test_internal() had not been
> updated and was calling object_unref() on a chardev object it
> didn't own.  This makes the chardev be destroyed, but leaves a
> dangling pointer in the /chardev container children list, and
> seems to be the cause of the following char_serial_test() crash:
>
>   Unexpected error in object_property_try_add() at ../qom/object.c:1220: \
>       attempt to add duplicate property 'serial-id' to object (type 'container')
>   ERROR test-char - too few tests run (expected 38, got 9)
>
> Update the code to use object_unparent() at the end of
> char_file_test_internal(), to make sure the chardev will be
> correctly removed from the QOM tree.
>
> Fixes: 1e419ee68fa5 ("chardev: generate an internal id when none given")
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Previously I could get a failure within a 100 runs so I think the soak
test is good ;-)
  Results summary:
  0: 1000 times (100.00%), avg time 2.256 (0.00 varience/0.00 deviation)
  Ran command 1000 times, 1000 passes
> ---
>  tests/test-char.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/test-char.c b/tests/test-char.c
> index 953e0d1c1f..06102977b6 100644
> --- a/tests/test-char.c
> +++ b/tests/test-char.c
> @@ -1298,7 +1298,7 @@ static void char_file_test_internal(Chardev *ext_chr, const char *filepath)
>      g_assert(strncmp(contents, "hello!", 6) == 0);
>  
>      if (!ext_chr) {
> -        object_unref(OBJECT(chr));
> +        object_unparent(OBJECT(chr));
>          g_unlink(out);
>      }
>      g_free(contents);
-- 
Alex Bennée
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] qom: Assert that objects being destroyed have no parent
  2020-12-15 22:41 ` [PATCH 2/2] qom: Assert that objects being destroyed have no parent Eduardo Habkost
  2020-12-16  7:53   ` Marc-André Lureau
@ 2020-12-16 16:52   ` Alex Bennée
  1 sibling, 0 replies; 10+ messages in thread
From: Alex Bennée @ 2020-12-16 16:52 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: qemu-devel, Paolo Bonzini, Daniel P. Berrangé,
	Philippe Mathieu-Daudé, Marc-André Lureau
Eduardo Habkost <ehabkost@redhat.com> writes:
> QOM reference counting bugs are often hard to detect, but there's
> one kind of bug that's easier: if we are freeing an object but is
> still attached to a parent, it means the reference count is wrong
> (because the parent always hold a reference to their children).
>
> Add an assertion to make sure we detect those cases.
>
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
I'm happy with an assert and crash early approach if the alternative it
scratching your head when things crash in weird ways. I'll of course
defer to the maintainer but from me:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  qom/object.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/qom/object.c b/qom/object.c
> index f2ae6e6b2a..5cfed6d7c6 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -685,6 +685,7 @@ static void object_finalize(void *data)
>      object_deinit(obj, ti);
>  
>      g_assert(obj->ref == 0);
> +    g_assert(obj->parent == NULL);
>      if (obj->free) {
>          obj->free(obj);
>      }
-- 
Alex Bennée
^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-12-16 17:15 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-12-15 22:41 [PATCH 0/2] Fix test-char reference counting bug Eduardo Habkost
2020-12-15 22:41 ` [PATCH 1/2] test-char: Destroy chardev correctly at char_file_test_internal() Eduardo Habkost
2020-12-16  7:45   ` Marc-André Lureau
2020-12-16 16:50   ` Alex Bennée
2020-12-15 22:41 ` [PATCH 2/2] qom: Assert that objects being destroyed have no parent Eduardo Habkost
2020-12-16  7:53   ` Marc-André Lureau
2020-12-16  9:55     ` Daniel P. Berrangé
2020-12-16 13:31     ` Paolo Bonzini
2020-12-16 16:15       ` Alex Bennée
2020-12-16 16:52   ` Alex Bennée
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).