public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Fixing the main programmer thinko with the device model
@ 2008-03-24 15:39 James Bottomley
  2008-03-24 17:58 ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2008-03-24 15:39 UTC (permalink / raw)
  To: Greg KH, Kay Sievers, Van De Ven, Arjan, Al Viro; +Cc: linux-kernel

Having just spent the weekend tracking two separate driver model
problems through SCSI, I believe the biggest trap everyone falls into
with the driver model (well, OK, at least with SCSI) is to try to defer
a callback to the device ->release routine without realising that
somewhere along the callback path we're going to drop a reference to the
device.

You can do this very inadvertently:  One developer didn't realise
bsg_unregister_queue() released a ref, and another didn't realise that
transport_destroy_device() held one.

The real problem is that it's fantastically easy to do this ... it's not
at all clear which of the cleanup routines actually release references
unless you dig down into them and it's very difficult to detect because
all that happens is that devices don't get released when they should,
which isn't something we ever warn about.

So, what I was wondering is:  is there any way we can reliably detect
and warn when someone does this.  Could something like lockdep (although
I can't really see how dynamic detection will work because the device
->release routine is never called) or a static code analysis tool like
sparse be modified to detect the unreleaseable references?

James



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fixing the main programmer thinko with the device model
  2008-03-24 15:39 Fixing the main programmer thinko with the device model James Bottomley
@ 2008-03-24 17:58 ` Greg KH
  2008-03-24 18:08   ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2008-03-24 17:58 UTC (permalink / raw)
  To: James Bottomley; +Cc: Kay Sievers, Van De Ven, Arjan, Al Viro, linux-kernel

On Mon, Mar 24, 2008 at 10:39:48AM -0500, James Bottomley wrote:
> Having just spent the weekend tracking two separate driver model
> problems through SCSI, I believe the biggest trap everyone falls into
> with the driver model (well, OK, at least with SCSI) is to try to defer
> a callback to the device ->release routine without realising that
> somewhere along the callback path we're going to drop a reference to the
> device.
> 
> You can do this very inadvertently:  One developer didn't realise
> bsg_unregister_queue() released a ref, and another didn't realise that
> transport_destroy_device() held one.
> 
> The real problem is that it's fantastically easy to do this ... it's not
> at all clear which of the cleanup routines actually release references
> unless you dig down into them and it's very difficult to detect because
> all that happens is that devices don't get released when they should,
> which isn't something we ever warn about.

Sounds like a documentation issue for how the scsi layer is using the
driver model more than anything else.  None of the other busses seem to
have these kinds of issues that I can see, is it just because of your
complex usage model?

> So, what I was wondering is:  is there any way we can reliably detect
> and warn when someone does this.

Warn that a device did not get released when the programmer thought it
should yet they forgot to call the correct function to have that happen?
That seems a bit difficult :)

Also note that the scsi layer usage of multiple refcounted objects
within the same structure might be causing some of these issues, that's
a bug in how the scsi layer has implmented things much more so than how
the driver core is implemented, right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fixing the main programmer thinko with the device model
  2008-03-24 17:58 ` Greg KH
@ 2008-03-24 18:08   ` James Bottomley
  2008-03-25  9:57     ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2008-03-24 18:08 UTC (permalink / raw)
  To: Greg KH; +Cc: Kay Sievers, Van De Ven, Arjan, Al Viro, linux-kernel

On Mon, 2008-03-24 at 10:58 -0700, Greg KH wrote:
> On Mon, Mar 24, 2008 at 10:39:48AM -0500, James Bottomley wrote:
> > Having just spent the weekend tracking two separate driver model
> > problems through SCSI, I believe the biggest trap everyone falls into
> > with the driver model (well, OK, at least with SCSI) is to try to defer
> > a callback to the device ->release routine without realising that
> > somewhere along the callback path we're going to drop a reference to the
> > device.
> > 
> > You can do this very inadvertently:  One developer didn't realise
> > bsg_unregister_queue() released a ref, and another didn't realise that
> > transport_destroy_device() held one.
> > 
> > The real problem is that it's fantastically easy to do this ... it's not
> > at all clear which of the cleanup routines actually release references
> > unless you dig down into them and it's very difficult to detect because
> > all that happens is that devices don't get released when they should,
> > which isn't something we ever warn about.
> 
> Sounds like a documentation issue for how the scsi layer is using the
> driver model more than anything else.  None of the other busses seem to
> have these kinds of issues that I can see, is it just because of your
> complex usage model?

Possibly ... although the bsg routines aren't actually SCSI ones,
they're block.  The transport class destroy function also actually
picked up an implicit reference because of the class device rework you
just did.

> > So, what I was wondering is:  is there any way we can reliably detect
> > and warn when someone does this.
> 
> Warn that a device did not get released when the programmer thought it
> should yet they forgot to call the correct function to have that happen?
> That seems a bit difficult :)

Yes, that's why I think it has to be discovered via static analysis.

> Also note that the scsi layer usage of multiple refcounted objects
> within the same structure might be causing some of these issues, that's
> a bug in how the scsi layer has implmented things much more so than how
> the driver core is implemented, right?

That's true, but irrelevant (and also soon to be untrue if we get rid of
the scsi_device class as you and Kay keep requesting).  The two calls
release references on the actual embedded generic device, it's nothing
to do with entangled lifetime rules.

James



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fixing the main programmer thinko with the device model
  2008-03-24 18:08   ` James Bottomley
@ 2008-03-25  9:57     ` Andi Kleen
  2008-03-26  4:16       ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2008-03-25  9:57 UTC (permalink / raw)
  To: James Bottomley
  Cc: Greg KH, Kay Sievers, Van De Ven, Arjan, Al Viro, linux-kernel

James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> 
> That's true, but irrelevant (and also soon to be untrue if we get rid of
> the scsi_device class as you and Kay keep requesting).  The two calls
> release references on the actual embedded generic device, it's nothing
> to do with entangled lifetime rules.

<heretic thought>Has anybody ever considered just doing away with 
the problematic and bug prone and tricky reference counts for kobjects 
and switch to a simple garbage collector for them?

-Andi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fixing the main programmer thinko with the device model
  2008-03-25  9:57     ` Andi Kleen
@ 2008-03-26  4:16       ` Greg KH
  2008-03-26  5:07         ` Kay Sievers
  0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2008-03-26  4:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: James Bottomley, Kay Sievers, Van De Ven, Arjan, Al Viro,
	linux-kernel

On Tue, Mar 25, 2008 at 10:57:32AM +0100, Andi Kleen wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > 
> > That's true, but irrelevant (and also soon to be untrue if we get rid of
> > the scsi_device class as you and Kay keep requesting).  The two calls
> > release references on the actual embedded generic device, it's nothing
> > to do with entangled lifetime rules.
> 
> <heretic thought>Has anybody ever considered just doing away with 
> the problematic and bug prone and tricky reference counts for kobjects 
> and switch to a simple garbage collector for them?

Sure, I have no objection to that.  It's just that the reference count
"issue" really doesn't seem to be one on sanely designed busses :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fixing the main programmer thinko with the device model
  2008-03-26  4:16       ` Greg KH
@ 2008-03-26  5:07         ` Kay Sievers
  2008-03-26  5:23           ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: Kay Sievers @ 2008-03-26  5:07 UTC (permalink / raw)
  To: Greg KH
  Cc: Andi Kleen, James Bottomley, Van De Ven, Arjan, Al Viro,
	linux-kernel

On Tue, 2008-03-25 at 21:16 -0700, Greg KH wrote:
> On Tue, Mar 25, 2008 at 10:57:32AM +0100, Andi Kleen wrote:
> > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > > 
> > > That's true, but irrelevant (and also soon to be untrue if we get rid of
> > > the scsi_device class as you and Kay keep requesting).  The two calls
> > > release references on the actual embedded generic device, it's nothing
> > > to do with entangled lifetime rules.
> > 
> > <heretic thought>Has anybody ever considered just doing away with 
> > the problematic and bug prone and tricky reference counts for kobjects 
> > and switch to a simple garbage collector for them?
> 
> Sure, I have no objection to that.  It's just that the reference count
> "issue" really doesn't seem to be one on sanely designed busses :)

Hmm, what is a "simple garbage collector" here? How could one determine
"reachability" of objects, means: at what point of time do objects
actually become "garbage"? How could one trace in our current kernel
code who still accesses an object, without doing refcounts?

Kay


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fixing the main programmer thinko with the device model
  2008-03-26  5:07         ` Kay Sievers
@ 2008-03-26  5:23           ` Greg KH
  0 siblings, 0 replies; 7+ messages in thread
From: Greg KH @ 2008-03-26  5:23 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Andi Kleen, James Bottomley, Van De Ven, Arjan, Al Viro,
	linux-kernel

On Wed, Mar 26, 2008 at 06:07:49AM +0100, Kay Sievers wrote:
> On Tue, 2008-03-25 at 21:16 -0700, Greg KH wrote:
> > On Tue, Mar 25, 2008 at 10:57:32AM +0100, Andi Kleen wrote:
> > > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > > > 
> > > > That's true, but irrelevant (and also soon to be untrue if we get rid of
> > > > the scsi_device class as you and Kay keep requesting).  The two calls
> > > > release references on the actual embedded generic device, it's nothing
> > > > to do with entangled lifetime rules.
> > > 
> > > <heretic thought>Has anybody ever considered just doing away with 
> > > the problematic and bug prone and tricky reference counts for kobjects 
> > > and switch to a simple garbage collector for them?
> > 
> > Sure, I have no objection to that.  It's just that the reference count
> > "issue" really doesn't seem to be one on sanely designed busses :)
> 
> Hmm, what is a "???simple garbage collector" here? How could one determine
> "reachability" of objects, means: at what point of time do objects
> actually become "garbage"? How could one trace in our current kernel
> code who still accesses an object, without doing refcounts?

I think in the end, it would be the same thing, so we should stick with
our current code, as that's exactly what the current kobject model now
implements :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-03-26  5:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-24 15:39 Fixing the main programmer thinko with the device model James Bottomley
2008-03-24 17:58 ` Greg KH
2008-03-24 18:08   ` James Bottomley
2008-03-25  9:57     ` Andi Kleen
2008-03-26  4:16       ` Greg KH
2008-03-26  5:07         ` Kay Sievers
2008-03-26  5:23           ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox