Re: Question about (or bug in?) the kobject implementation

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: Question about (or bug in?) the kobject implementation
@ 2004-02-28  4:02 Alan Stern
  2004-02-28  7:38 ` Michael Frank
  2004-03-03 21:44 ` Greg KH
  0 siblings, 2 replies; 10+ messages in thread
From: Alan Stern @ 2004-02-28  4:02 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

On Fri, 27 Feb 2004, Greg KH wrote:

> Seriously, once kobject_del() is called, you can't safely call
> kobject_get() anymore on that object.
> 
> If you can think of a way we can implement this in the code to prevent
> people from doing this, please send a patch.  We've been getting by
> without such a "safeguard" so far...

The problem is unsolvable.  Let me explain...

We're actually discussing two different questions here.

    A.	Is it okay to call kobject_add() after calling kobject_del() -- 
	this was my original question.

    B.	Can we prevent people from doing kobject_get() after the kobject's
	refcount has dropped to 0?

Your earlier response amounted to saying that A isn't good because it
might cause B to happen; once kobject_del() has returned it's possible
that the refcount is 0.  But this begs the real question.  Suppose we
_know_ that the refcount isn't 0, say because earlier we did an unmatched
kobject_get().  Under those circumstances should it be legal to call
kobject_add() after calling kobject_del()?  This is question A'.

Question B can be divided into two subcases.

    B1.	The code calling kobject_get() knows that the kobject hasn't been
	deallocated yet.  For example, it might be the cleanup routine 
	itself calling kobject_get().

Such a thing is legal in Java, but you probably don't want to sanction 
such pranks in the driver model.  So let's forget about B1.  Your big 
concern really seems to be:

    B2.	Everything else; the code calling kobject_get() doesn't know 
	whether the kobject has been deallocated.

This really is a programming error.  It means that kobject_get() has been 
passed a possibly stale pointer.  Ipso facto, the call to kobject_put() 
that decremented the refcount to 0 was made too early, while there were 
still active pointers to the kobject floating around.

It's impossible to prevent people from making programming errors or
dereferencing stale pointers.  It doesn't matter _what_ code you put in
kobject_get() -- it will crash when given a pointer to a kobject whose
cleanup routine has already run and deallocated the storage.

The best you can do is call people's attention to such errors and fail the
operation gracefully whenever possible (i.e., when it doesn't generate an
addressing error).  My personal choice would be to change kobject_get() as
follows:

struct kobject * kobject_get(struct kobject * kobj)
{
	if (kobj) {
		if (atomic_read(&kobj->refcount) == 0) {
			WARN_ON(1);
			return NULL;
		}
		atomic_inc(&kobj->refcount);
	}
	return kobj;
}

I think that's about the best you can do.

And what's the answer to A'?

Alan Stern

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question about (or bug in?) the kobject implementation
  2004-02-28  4:02 Question about (or bug in?) the kobject implementation Alan Stern
@ 2004-02-28  7:38 ` Michael Frank
  2004-02-28 17:09   ` Alan Stern
  2004-03-03 21:44 ` Greg KH
  1 sibling, 1 reply; 10+ messages in thread
From: Michael Frank @ 2004-02-28  7:38 UTC (permalink / raw)
  To: Alan Stern, Greg KH; +Cc: linux-kernel

On Fri, 27 Feb 2004 23:02:34 -0500 (EST), Alan Stern <stern@rowland.harvard.edu> wrote:

> On Fri, 27 Feb 2004, Greg KH wrote:
>
>> Seriously, once kobject_del() is called, you can't safely call
>> kobject_get() anymore on that object.
>>
>> If you can think of a way we can implement this in the code to prevent
>> people from doing this, please send a patch.  We've been getting by
>> without such a "safeguard" so far...
>
> The problem is unsolvable.  Let me explain...
>
> We're actually discussing two different questions here.
>
>     A.	Is it okay to call kobject_add() after calling kobject_del() --
> 	this was my original question.
>
>     B.	Can we prevent people from doing kobject_get() after the kobject's
> 	refcount has dropped to 0?
>
> Your earlier response amounted to saying that A isn't good because it
> might cause B to happen; once kobject_del() has returned it's possible
> that the refcount is 0.  But this begs the real question.  Suppose we
> _know_ that the refcount isn't 0, say because earlier we did an unmatched
> kobject_get().  Under those circumstances should it be legal to call
> kobject_add() after calling kobject_del()?  This is question A'.
>
>
> Question B can be divided into two subcases.
>
>     B1.	The code calling kobject_get() knows that the kobject hasn't been
> 	deallocated yet.  For example, it might be the cleanup routine
> 	itself calling kobject_get().
>
> Such a thing is legal in Java, but you probably don't want to sanction
> such pranks in the driver model.  So let's forget about B1.  Your big
> concern really seems to be:
>
>     B2.	Everything else; the code calling kobject_get() doesn't know
> 	whether the kobject has been deallocated.
>
> This really is a programming error.  It means that kobject_get() has been
> passed a possibly stale pointer.  Ipso facto, the call to kobject_put()
> that decremented the refcount to 0 was made too early, while there were
> still active pointers to the kobject floating around.
>
> It's impossible to prevent people from making programming errors or
> dereferencing stale pointers.  It doesn't matter _what_ code you put in
> kobject_get() -- it will crash when given a pointer to a kobject whose
> cleanup routine has already run and deallocated the storage.
>
> The best you can do is call people's attention to such errors and fail the
> operation gracefully whenever possible (i.e., when it doesn't generate an
> addressing error).  My personal choice would be to change kobject_get() as
> follows:
>
> struct kobject * kobject_get(struct kobject * kobj)
> {
> 	if (kobj) {
> 		if (atomic_read(&kobj->refcount) == 0) {
> 			WARN_ON(1);
> 			return NULL;
> 		}
> 		atomic_inc(&kobj->refcount);
> 	}
> 	return kobj;
> }
>
> I think that's about the best you can do.

This is too ugly :-(

> And what's the answer to A'?

The weakness is really in that the refcount is stored dynamically.

What about a new struct to hold the pointer to the kobj and it's refcount:

struct kobjectref {
	struct kobject *kobj;
	int refcount;
};
...

struct kobjectref rkobj;

Using refkobj eliminates all problems as the pointer to the refcount can't
be invalid.

refcount would be removed from struct kobject.

Regards
Michael

List of files in 2.6.3 containing kobject, so it is not that bad to change it.

./arch/i386/kernel/edd.c
./arch/ppc64/kernel/vio.c
./Documentation/firmware_class/firmware_sample_firmware_class.c
./drivers/acorn/block/fd1772.c
./drivers/acpi/scan.c
./drivers/base/base.h
./drivers/base/bus.c
./drivers/base/class.c
./drivers/base/core.c
./drivers/base/cpu.c
./drivers/base/driver.c
./drivers/base/firmware.c
./drivers/base/firmware_class.c
./drivers/base/map.c
./drivers/base/sys.c
./drivers/block/amiflop.c
./drivers/block/as-iosched.c
./drivers/block/ataflop.c
./drivers/block/deadline-iosched.c
./drivers/block/elevator.c
./drivers/block/floppy98.c
./drivers/block/floppy.c
./drivers/block/genhd.c
./drivers/block/ll_rw_blk.c
./drivers/block/z2ram.c
./drivers/char/tty_io.c
./drivers/cpufreq/cpufreq.c
./drivers/i2c/chips/eeprom.c
./drivers/ide/ide-probe.c
./drivers/ieee1394/amdtp.c
./drivers/ieee1394/dv1394.c
./drivers/ieee1394/raw1394.c
./drivers/ieee1394/video1394.c
./drivers/md/md.c
./drivers/pci/hotplug/acpiphp.h
./drivers/pci/hotplug/pci_hotplug_core.c
./drivers/pci/hotplug/pci_hotplug.h
./drivers/pci/pci-driver.c
./drivers/pci/pci-sysfs.c
./drivers/scsi/qla2xxx/qla_os.c
./drivers/scsi/sd.c
./drivers/scsi/sg.c
./drivers/scsi/sim710.c
./drivers/scsi/st.c
./drivers/usb/serial/usb-serial.c
./drivers/usb/serial/usb-serial.h
./drivers/video/aty/radeon_base.c
./drivers/zorro/zorro-sysfs.c
./fs/block_dev.c
./fs/char_dev.c
./fs/partitions/check.c
./fs/sysfs/bin.c
./fs/sysfs/dir.c
./fs/sysfs/file.c
./fs/sysfs/group.c
./fs/sysfs/symlink.c
./fs/sysfs/sysfs.h
./include/acpi/acpi_bus.h
./include/linux/blkdev.h
./include/linux/cdev.h
./include/linux/cpufreq.h
./include/linux/device.h
./include/linux/elevator.h
./include/linux/fs.h
./include/linux/genhd.h
./include/linux/kobject.h
./include/linux/kobj_map.h
./include/linux/sysdev.h
./include/linux/sysfs.h
./kernel/power/main.c
./lib/kobject.c
./net/core/net-sysfs.c
./scripts/kconfig/gconf.c

--end

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question about (or bug in?) the kobject implementation
  2004-02-28  7:38 ` Michael Frank
@ 2004-02-28 17:09   ` Alan Stern
  0 siblings, 0 replies; 10+ messages in thread
From: Alan Stern @ 2004-02-28 17:09 UTC (permalink / raw)
  To: Michael Frank; +Cc: Greg KH, linux-kernel

On Sat, 28 Feb 2004, Michael Frank wrote:

> On Fri, 27 Feb 2004 23:02:34 -0500 (EST), Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > This really is a programming error.  It means that kobject_get() has been
> > passed a possibly stale pointer.  Ipso facto, the call to kobject_put()
> > that decremented the refcount to 0 was made too early, while there were
> > still active pointers to the kobject floating around.
> >
> > It's impossible to prevent people from making programming errors or
> > dereferencing stale pointers.  It doesn't matter _what_ code you put in
> > kobject_get() -- it will crash when given a pointer to a kobject whose
> > cleanup routine has already run and deallocated the storage.
> >
> > The best you can do is call people's attention to such errors and fail the
> > operation gracefully whenever possible (i.e., when it doesn't generate an
> > addressing error).  My personal choice would be to change kobject_get() as
> > follows:
> >
> > struct kobject * kobject_get(struct kobject * kobj)
> > {
> > 	if (kobj) {
> > 		if (atomic_read(&kobj->refcount) == 0) {
> > 			WARN_ON(1);
> > 			return NULL;
> > 		}
> > 		atomic_inc(&kobj->refcount);
> > 	}
> > 	return kobj;
> > }
> >
> > I think that's about the best you can do.
> 
> This is too ugly :-(

It's cleaner than your proposal below.  It's not so different from the
code that's there now.  And it does what that code _ought_ to do, namely, 
return a NULL pointer when the kobject is no longer available.

> > And what's the answer to A'?
> 
> The weakness is really in that the refcount is stored dynamically.
> 
> What about a new struct to hold the pointer to the kobj and it's refcount:
> 
> struct kobjectref {
> 	struct kobject *kobj;
> 	int refcount;
> };
> ...
> 
> struct kobjectref rkobj;

Since kobjects are allocated dynamically, you will have to allocate 
kobjectrefs dynamically as well.

> Using refkobj eliminates all problems as the pointer to the refcount can't
> be invalid.

Only until you deallocate the kobjectref.  And when you do, you then face
exactly the same set of problems: pointers to the kobjectref will become
stale.  If you don't ever deallocate kobjectrefs then you have a memory
leak.  So this proposal doesn't solve anything, it just adds an extra
layer of indirection.

Alan Stern


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question about (or bug in?) the kobject implementation
  2004-02-28  4:02 Question about (or bug in?) the kobject implementation Alan Stern
  2004-02-28  7:38 ` Michael Frank
@ 2004-03-03 21:44 ` Greg KH
  2004-03-03 22:11   ` Alan Stern
  1 sibling, 1 reply; 10+ messages in thread
From: Greg KH @ 2004-03-03 21:44 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-kernel

On Fri, Feb 27, 2004 at 11:02:34PM -0500, Alan Stern wrote:
> On Fri, 27 Feb 2004, Greg KH wrote:
> 
> > Seriously, once kobject_del() is called, you can't safely call
> > kobject_get() anymore on that object.
> > 
> > If you can think of a way we can implement this in the code to prevent
> > people from doing this, please send a patch.  We've been getting by
> > without such a "safeguard" so far...
> 
> The problem is unsolvable.  Let me explain...
> 
> We're actually discussing two different questions here.
> 
>     A.	Is it okay to call kobject_add() after calling kobject_del() -- 
> 	this was my original question.

No, this is not ok.  It might happen to work, but it is not valid.

>     B.	Can we prevent people from doing kobject_get() after the kobject's
> 	refcount has dropped to 0?

By saying, "you can not call kobject_get() on a object that you know is
released with kobject_del()".  If you already have a valid reference,
you can always call kobject_get().  But once you call kobject_del() that
pointer you passed should not be passed to kobject_get() as it may now
be gone.

Does that help?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question about (or bug in?) the kobject implementation
  2004-03-03 21:44 ` Greg KH
@ 2004-03-03 22:11   ` Alan Stern
  2004-03-03 22:16     ` Greg KH
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2004-03-03 22:11 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

On Wed, 3 Mar 2004, Greg KH wrote:

> On Fri, Feb 27, 2004 at 11:02:34PM -0500, Alan Stern wrote:
> > We're actually discussing two different questions here.
> > 
> >     A.	Is it okay to call kobject_add() after calling kobject_del() -- 
> > 	this was my original question.
> 
> No, this is not ok.  It might happen to work, but it is not valid.

I want to understand _why_ it is not valid.  Can you explain please?

>From what you said earlier, I got the impression that calling _add() after 
_del() is illegal because it runs the risk that the refcount may be 0 and 
the object may be gone.  But if you have a separate valid reference, that 
can't happen.  Would it be legal then, or is there more to it?


> >     B.	Can we prevent people from doing kobject_get() after the kobject's
> > 	refcount has dropped to 0?
> 
> By saying, "you can not call kobject_get() on a object that you know is
> released with kobject_del()".  If you already have a valid reference,
> you can always call kobject_get().  But once you call kobject_del() that
> pointer you passed should not be passed to kobject_get() as it may now
> be gone.
> 
> Does that help?
> 
> thanks,
> 
> greg k-h

Alan Stern


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question about (or bug in?) the kobject implementation
  2004-03-03 22:11   ` Alan Stern
@ 2004-03-03 22:16     ` Greg KH
  0 siblings, 0 replies; 10+ messages in thread
From: Greg KH @ 2004-03-03 22:16 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-kernel

On Wed, Mar 03, 2004 at 05:11:02PM -0500, Alan Stern wrote:
> On Wed, 3 Mar 2004, Greg KH wrote:
> 
> > On Fri, Feb 27, 2004 at 11:02:34PM -0500, Alan Stern wrote:
> > > We're actually discussing two different questions here.
> > > 
> > >     A.	Is it okay to call kobject_add() after calling kobject_del() -- 
> > > 	this was my original question.
> > 
> > No, this is not ok.  It might happen to work, but it is not valid.
> 
> I want to understand _why_ it is not valid.  Can you explain please?
> 
> From what you said earlier, I got the impression that calling _add() after 
> _del() is illegal because it runs the risk that the refcount may be 0 and 
> the object may be gone.

Yes, that is the risk.

> But if you have a separate valid reference, that can't happen.  Would
> it be legal then, or is there more to it?

Hm, it probably would work, hence the current working USB code :)
But I really don't want to "special case" anything here.  So it's easier
to say, "just don't do that".

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Question about (or bug in?) the kobject implementation
@ 2004-02-25 15:05 Alan Stern
  2004-02-27 19:48 ` Greg KH
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2004-02-25 15:05 UTC (permalink / raw)
  To: Kernel development list

Is it supposed to be legal to repeatedly call kobject_add() and 
kobject_del() for the same kobject?  That is, is

	kobject_add(&kobj);
	...
	kobject_del(&kobj);
	...
	kobject_add(&kobj);
	...
	kobject_del(&kobj);

supposed to work?  The API doesn't forbid it, and there's no apparent 
reason why it should be illegal.

I ask because the current implementation is set up in such a way that
doing this will mess up the reference counting for the kobject's parent.  
The problem is that the parent's refcount is increased each time
kobject_add() is called, but it is only decremented in kobject_cleanup(),
not in kobject_del().  Thus, the statements above will leave the parent's
refcount permanently increased by 1, potentially causing a memory leak.

Why would anyone want to do this, you ask?  Well the USB subsystem does it 
already.  Each USB device can have several configurations, only one of 
which is active at any time.  Corresponding to each configuration is a set 
of struct devices, and they (together with their embedded kobjects) are 
allocated and initialized when the USB device is first detected.  The 
struct devices are add()'ed and del()'ed as configurations are activated 
and deactivated, leading to just the sort of call sequence shown above.

Alan Stern

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question about (or bug in?) the kobject implementation
  2004-02-25 15:05 Alan Stern
@ 2004-02-27 19:48 ` Greg KH
  2004-02-27 20:06   ` Alan Stern
  0 siblings, 1 reply; 10+ messages in thread
From: Greg KH @ 2004-02-27 19:48 UTC (permalink / raw)
  To: Alan Stern; +Cc: Kernel development list

On Wed, Feb 25, 2004 at 10:05:37AM -0500, Alan Stern wrote:
> Is it supposed to be legal to repeatedly call kobject_add() and 
> kobject_del() for the same kobject?  That is, is
> 
> 	kobject_add(&kobj);
> 	...
> 	kobject_del(&kobj);
> 	...
> 	kobject_add(&kobj);
> 	...
> 	kobject_del(&kobj);
> 
> supposed to work?

No.

> The API doesn't forbid it, and there's no apparent reason why it
> should be illegal.

We prevent race conditions in kobject_put() by saying "Don't do that!"
:)

Seriously, once kobject_del() is called, you can't safely call
kobject_get() anymore on that object.

If you can think of a way we can implement this in the code to prevent
people from doing this, please send a patch.  We've been getting by
without such a "safeguard" so far...

> Why would anyone want to do this, you ask?  Well the USB subsystem does it 
> already.  Each USB device can have several configurations, only one of 
> which is active at any time.  Corresponding to each configuration is a set 
> of struct devices, and they (together with their embedded kobjects) are 
> allocated and initialized when the USB device is first detected.  The 
> struct devices are add()'ed and del()'ed as configurations are activated 
> and deactivated, leading to just the sort of call sequence shown above.

Then we need to fix this.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question about (or bug in?) the kobject implementation
  2004-02-27 19:48 ` Greg KH
@ 2004-02-27 20:06   ` Alan Stern
  2004-02-27 20:17     ` Greg KH
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2004-02-27 20:06 UTC (permalink / raw)
  To: Greg KH; +Cc: Kernel development list

On Fri, 27 Feb 2004, Greg KH wrote:

> On Wed, Feb 25, 2004 at 10:05:37AM -0500, Alan Stern wrote:
> > Is it supposed to be legal to repeatedly call kobject_add() and 
> > kobject_del() for the same kobject?  That is, is
> > 
> > 	kobject_add(&kobj);
> > 	...
> > 	kobject_del(&kobj);
> > 	...
> > 	kobject_add(&kobj);
> > 	...
> > 	kobject_del(&kobj);
> > 
> > supposed to work?
> 
> No.
> 
> > The API doesn't forbid it, and there's no apparent reason why it
> > should be illegal.
> 
> We prevent race conditions in kobject_put() by saying "Don't do that!"
> :)
> 
> Seriously, once kobject_del() is called, you can't safely call
> kobject_get() anymore on that object.

Are you worried about the possibility of the refcount dropping to 0 and 
the cleanup starting but then kobject_get() increasing the refcount again?
Or is there some other problem?

> If you can think of a way we can implement this in the code to prevent
> people from doing this, please send a patch.  We've been getting by
> without such a "safeguard" so far...

Maybe I could if I knew more clearly the exact issue in question.

> > Why would anyone want to do this, you ask?  Well the USB subsystem does it 
> > already.  Each USB device can have several configurations, only one of 
> > which is active at any time.  Corresponding to each configuration is a set 
> > of struct devices, and they (together with their embedded kobjects) are 
> > allocated and initialized when the USB device is first detected.  The 
> > struct devices are add()'ed and del()'ed as configurations are activated 
> > and deactivated, leading to just the sort of call sequence shown above.
> 
> Then we need to fix this.

As it happens, I have a patch to do that. :-)  I'll send it in a separate 
message.

Alan Stern


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Question about (or bug in?) the kobject implementation
  2004-02-27 20:06   ` Alan Stern
@ 2004-02-27 20:17     ` Greg KH
  0 siblings, 0 replies; 10+ messages in thread
From: Greg KH @ 2004-02-27 20:17 UTC (permalink / raw)
  To: Alan Stern; +Cc: Kernel development list

On Fri, Feb 27, 2004 at 03:06:32PM -0500, Alan Stern wrote:
> On Fri, 27 Feb 2004, Greg KH wrote:
> > Seriously, once kobject_del() is called, you can't safely call
> > kobject_get() anymore on that object.
> 
> Are you worried about the possibility of the refcount dropping to 0 and 
> the cleanup starting but then kobject_get() increasing the refcount again?

Exactly, that is where the problem could happen.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2004-03-03 22:17 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-02-28  4:02 Question about (or bug in?) the kobject implementation Alan Stern
2004-02-28  7:38 ` Michael Frank
2004-02-28 17:09   ` Alan Stern
2004-03-03 21:44 ` Greg KH
2004-03-03 22:11   ` Alan Stern
2004-03-03 22:16     ` Greg KH
  -- strict thread matches above, loose matches on Subject: below --
2004-02-25 15:05 Alan Stern
2004-02-27 19:48 ` Greg KH
2004-02-27 20:06   ` Alan Stern
2004-02-27 20:17     ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox