Sysfs and suicidal attributes

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Sysfs and suicidal attributes
@ 2007-07-08 14:31 Alan Stern
  2007-07-09 15:26 ` Cornelia Huck
  2007-07-10  5:09 ` Tejun Heo
  0 siblings, 2 replies; 18+ messages in thread
From: Alan Stern @ 2007-07-08 14:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cornelia Huck, Kernel development list

Tejun:

You remember the problem we faced with "suicidal" sysfs attributes and 
the device_schedule_callback() routine that fixes it?

Well, it turns out that this approach may confict with suspend/resume
processing.  In brief, it's not a good idea to unregister devices while
a suspend is in progress, but on the other hand it's not a good idea to
block keventd waiting until the suspend is over.

There are some ways around this such as using a different workqueue, 
one that could safely be blocked during the suspend.  But I had another 
thought.

I haven't paid much attention to your sysfs updates.  Is it still true
that calls to show/store methods are mutually exclusive with attribute
unregistration?  Assuming the answer is Yes, would it be possible to
bypass that mutual exclusion in certain explicit cases?  Here's what I
have in mind:

	The user writes to an attribute file.

	The sysfs core calls the attribute's store method.

	The method tells the sysfs core to pretend that the call
	temporarily doesn't exist, or has completed, or something
	like that.

	The method safely unregisters the attribute file, with no
	mutual exclusion problems and no deadlock.  Of course, the
	unregistration will still block until all _other_ method
	calls for this attribute are complete.

	The method tells the sysfs core to stop pretending and
	go back to its normal state.

	The method returns, and the sysfs core takes whatever actions
	are needed to fully release the attribute file.

The idea is that there could be a way to allow unregistration while a 
method is still running, if the method specifically requests it.  If we 
could do this then device_schedule_callback() would be unnecessary.

What do you think?

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-08 14:31 Sysfs and suicidal attributes Alan Stern
@ 2007-07-09 15:26 ` Cornelia Huck
  2007-07-09 22:28   ` Alan Stern
  2007-07-10  5:09 ` Tejun Heo
  1 sibling, 1 reply; 18+ messages in thread
From: Cornelia Huck @ 2007-07-09 15:26 UTC (permalink / raw)
  To: Alan Stern; +Cc: Tejun Heo, Kernel development list

On Sun, 8 Jul 2007 10:31:21 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> There are some ways around this such as using a different workqueue, 
> one that could safely be blocked during the suspend.

This would be the simplest solution, I think.

> 	The user writes to an attribute file.
> 
> 	The sysfs core calls the attribute's store method.
> 
> 	The method tells the sysfs core to pretend that the call
> 	temporarily doesn't exist, or has completed, or something
> 	like that.

Or add an ignore-for-now-flag to the buffer? It would need to be
processed in a second pass.

> 	The method safely unregisters the attribute file, with no
> 	mutual exclusion problems and no deadlock.  Of course, the
> 	unregistration will still block until all _other_ method
> 	calls for this attribute are complete.
> 
> 	The method tells the sysfs core to stop pretending and
> 	go back to its normal state.
> 
> 	The method returns, and the sysfs core takes whatever actions
> 	are needed to fully release the attribute file.

It would need to retain a reference to the buffer collection so it can
get rid of the formerly-ignored buffer.

> The idea is that there could be a way to allow unregistration while a 
> method is still running, if the method specifically requests it.  If we 
> could do this then device_schedule_callback() would be unnecessary.
> 
> What do you think?

I don't think that this is easy to get correct. Another workqueue looks
like a solution which is easy to get right, even if it may not seem so
nice.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-09 15:26 ` Cornelia Huck
@ 2007-07-09 22:28   ` Alan Stern
  0 siblings, 0 replies; 18+ messages in thread
From: Alan Stern @ 2007-07-09 22:28 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Tejun Heo, Kernel development list

On Mon, 9 Jul 2007, Cornelia Huck wrote:

> On Sun, 8 Jul 2007 10:31:21 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> > There are some ways around this such as using a different workqueue, 
> > one that could safely be blocked during the suspend.
> 
> This would be the simplest solution, I think.

I have to agree.

Alan Stern


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-08 14:31 Sysfs and suicidal attributes Alan Stern
  2007-07-09 15:26 ` Cornelia Huck
@ 2007-07-10  5:09 ` Tejun Heo
  2007-07-10  8:28   ` Cornelia Huck
  2007-07-10 14:41   ` Alan Stern
  1 sibling, 2 replies; 18+ messages in thread
From: Tejun Heo @ 2007-07-10  5:09 UTC (permalink / raw)
  To: Alan Stern; +Cc: Cornelia Huck, Kernel development list

Hello, Alan.

Alan Stern wrote:
> You remember the problem we faced with "suicidal" sysfs attributes and 
> the device_schedule_callback() routine that fixes it?

Sure.

> Well, it turns out that this approach may confict with suspend/resume
> processing.  In brief, it's not a good idea to unregister devices while
> a suspend is in progress, but on the other hand it's not a good idea to
> block keventd waiting until the suspend is over.

Is it because keventd isn't frozen during suspend?  I'm afraid we might
be trying to solve the problem at the wrong layer.  Device
addition/removal should be _locked out_ during suspend/resume.
Depending on executing thread being frozen is more subtle and fragile
and freezing might go away altogether.

IMHO, what's needed is a giant rwlock protecting the whole device
addition/removal while suspend is in progress (addition/removal during
resume should be safe as long as we put the new devices on the already
processed list).  The lock can probably replace dpm_sem and dpm_list_sem.

It's probably necessary to use trylock from addition/removal to avoid
deadlock.  Drivers which support hotplugging need to rescan/revalidate
the bus during resume anyway so failing addition/removal during suspend
shouldn't be a problem.

> I haven't paid much attention to your sysfs updates.  Is it still true
> that calls to show/store methods are mutually exclusive with attribute
> unregistration?  Assuming the answer is Yes, would it be possible to
> bypass that mutual exclusion in certain explicit cases?  Here's what I
> have in mind:
> 
> 	The user writes to an attribute file.
> 
> 	The sysfs core calls the attribute's store method.
> 
> 	The method tells the sysfs core to pretend that the call
> 	temporarily doesn't exist, or has completed, or something
> 	like that.
> 
> 	The method safely unregisters the attribute file, with no
> 	mutual exclusion problems and no deadlock.  Of course, the
> 	unregistration will still block until all _other_ method
> 	calls for this attribute are complete.
> 
> 	The method tells the sysfs core to stop pretending and
> 	go back to its normal state.
> 
> 	The method returns, and the sysfs core takes whatever actions
> 	are needed to fully release the attribute file.
> 
> The idea is that there could be a way to allow unregistration while a 
> method is still running, if the method specifically requests it.  If we 
> could do this then device_schedule_callback() would be unnecessary.

Regardless of the suspend problem, I like the above idea.  Actually, I
was thinking about doing exactly that for suicidal nodes when I first
found out the problem.  All that's needed is an owner field which allows
the owning thread to reenter (what was the name for this type of mutex?
 BSD folks like it).  It's probably better to put some restrictions to
allow only the specific case tho.

I like it because it shifts complexity from the drivers into driver
core.  IOW, the driver model is kinder to drivers that way - the driver
writer doesn't have to care whether something is suicidal or not - and I
think that's the way we should be headed although we're not good at it yet.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10  5:09 ` Tejun Heo
@ 2007-07-10  8:28   ` Cornelia Huck
  2007-07-10  8:49     ` Tejun Heo
  2007-07-10 14:41   ` Alan Stern
  1 sibling, 1 reply; 18+ messages in thread
From: Cornelia Huck @ 2007-07-10  8:28 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Alan Stern, Kernel development list

On Tue, 10 Jul 2007 14:09:41 +0900,
Tejun Heo <htejun@gmail.com> wrote:

> > I haven't paid much attention to your sysfs updates.  Is it still true
> > that calls to show/store methods are mutually exclusive with attribute
> > unregistration?  Assuming the answer is Yes, would it be possible to
> > bypass that mutual exclusion in certain explicit cases?  Here's what I
> > have in mind:
> > 
> > 	The user writes to an attribute file.
> > 
> > 	The sysfs core calls the attribute's store method.
> > 
> > 	The method tells the sysfs core to pretend that the call
> > 	temporarily doesn't exist, or has completed, or something
> > 	like that.
> > 
> > 	The method safely unregisters the attribute file, with no
> > 	mutual exclusion problems and no deadlock.  Of course, the
> > 	unregistration will still block until all _other_ method
> > 	calls for this attribute are complete.
> > 
> > 	The method tells the sysfs core to stop pretending and
> > 	go back to its normal state.
> > 
> > 	The method returns, and the sysfs core takes whatever actions
> > 	are needed to fully release the attribute file.
> > 
> > The idea is that there could be a way to allow unregistration while a 
> > method is still running, if the method specifically requests it.  If we 
> > could do this then device_schedule_callback() would be unnecessary.
> 
> Regardless of the suspend problem, I like the above idea.  Actually, I
> was thinking about doing exactly that for suicidal nodes when I first
> found out the problem.  All that's needed is an owner field which allows
> the owning thread to reenter (what was the name for this type of mutex?
>  BSD folks like it).  It's probably better to put some restrictions to
> allow only the specific case tho.
> 
> I like it because it shifts complexity from the drivers into driver
> core.  IOW, the driver model is kinder to drivers that way - the driver
> writer doesn't have to care whether something is suicidal or not - and I
> think that's the way we should be headed although we're not good at it yet.

It might be that I misunderstand the idea, but doesn't the device
driver writer need to consider whether the attribute is suicidal as
well? (If it is the method that needs to call the sysfs core.)

A general immediate disconnect of the buffers (which will be handled in
a second pass) would be great, but doesn't sound easy.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10  8:28   ` Cornelia Huck
@ 2007-07-10  8:49     ` Tejun Heo
  2007-07-10  9:04       ` Cornelia Huck
  0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2007-07-10  8:49 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Alan Stern, Kernel development list

Cornelia Huck wrote:
>> I like it because it shifts complexity from the drivers into driver
>> core.  IOW, the driver model is kinder to drivers that way - the driver
>> writer doesn't have to care whether something is suicidal or not - and I
>> think that's the way we should be headed although we're not good at it yet.
> 
> It might be that I misunderstand the idea, but doesn't the device
> driver writer need to consider whether the attribute is suicidal as
> well? (If it is the method that needs to call the sysfs core.)

What I was trying to say was that suicide and murder could be done the
same way from the driver's POV or am I misunderstanding?

> A general immediate disconnect of the buffers (which will be handled in
> a second pass) would be great, but doesn't sound easy.

I haven't thought too hard about actual implementation but it's pretty
specific case.  If doing things in generic manner is difficult, there
are plenty of shortcuts to choose from, I think.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10  8:49     ` Tejun Heo
@ 2007-07-10  9:04       ` Cornelia Huck
  2007-07-10  9:13         ` Tejun Heo
  0 siblings, 1 reply; 18+ messages in thread
From: Cornelia Huck @ 2007-07-10  9:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Alan Stern, Kernel development list

On Tue, 10 Jul 2007 17:49:37 +0900,
Tejun Heo <htejun@gmail.com> wrote:

> What I was trying to say was that suicide and murder could be done the
> same way from the driver's POV or am I misunderstanding?

Do you mean a device unregistering itself from its attribute vs. a
device unregistering another device from its attribute?

> 
> > A general immediate disconnect of the buffers (which will be handled in
> > a second pass) would be great, but doesn't sound easy.
> 
> I haven't thought too hard about actual implementation but it's pretty
> specific case.  If doing things in generic manner is difficult, there
> are plenty of shortcuts to choose from, I think.

The "second pass" approach where the store method calls the sysfs core
or sets a flag or whatever sounds doable, but I'm not sure how general
we can get. Maybe for all store methods that just trigger an action.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10  9:04       ` Cornelia Huck
@ 2007-07-10  9:13         ` Tejun Heo
  2007-07-10 10:50           ` Cornelia Huck
  2007-07-10 14:42           ` Alan Stern
  0 siblings, 2 replies; 18+ messages in thread
From: Tejun Heo @ 2007-07-10  9:13 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Alan Stern, Kernel development list

Cornelia Huck wrote:
> On Tue, 10 Jul 2007 17:49:37 +0900,
> Tejun Heo <htejun@gmail.com> wrote:
> 
>> What I was trying to say was that suicide and murder could be done the
>> same way from the driver's POV or am I misunderstanding?
> 
> Do you mean a device unregistering itself from its attribute vs. a
> device unregistering another device from its attribute?

More like "device unregistering itself from its attribute" vs. "whatever
else".

>>> A general immediate disconnect of the buffers (which will be handled in
>>> a second pass) would be great, but doesn't sound easy.
>> I haven't thought too hard about actual implementation but it's pretty
>> specific case.  If doing things in generic manner is difficult, there
>> are plenty of shortcuts to choose from, I think.
> 
> The "second pass" approach where the store method calls the sysfs core
> or sets a flag or whatever sounds doable, but I'm not sure how general
> we can get. Maybe for all store methods that just trigger an action.

Hmm... I'll give it a shot in a few days.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10  9:13         ` Tejun Heo
@ 2007-07-10 10:50           ` Cornelia Huck
  2007-07-10 14:42           ` Alan Stern
  1 sibling, 0 replies; 18+ messages in thread
From: Cornelia Huck @ 2007-07-10 10:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Alan Stern, Kernel development list

On Tue, 10 Jul 2007 18:13:43 +0900,
Tejun Heo <htejun@gmail.com> wrote:

> More like "device unregistering itself from its attribute" vs. "whatever
> else".

Ah, ok. I don't think we can make those the same, but "trigger an
action and return" vs. "whatever else" may be possible.

> 
> >>> A general immediate disconnect of the buffers (which will be handled in
> >>> a second pass) would be great, but doesn't sound easy.
> >> I haven't thought too hard about actual implementation but it's pretty
> >> specific case.  If doing things in generic manner is difficult, there
> >> are plenty of shortcuts to choose from, I think.
> > 
> > The "second pass" approach where the store method calls the sysfs core
> > or sets a flag or whatever sounds doable, but I'm not sure how general
> > we can get. Maybe for all store methods that just trigger an action.
> 
> Hmm... I'll give it a shot in a few days.

Cool. I'd try myself, but I'm currently a bit short on time :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10  5:09 ` Tejun Heo
  2007-07-10  8:28   ` Cornelia Huck
@ 2007-07-10 14:41   ` Alan Stern
  2007-07-10 15:11     ` Tejun Heo
  1 sibling, 1 reply; 18+ messages in thread
From: Alan Stern @ 2007-07-10 14:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cornelia Huck, Kernel development list

On Tue, 10 Jul 2007, Tejun Heo wrote:

> > Well, it turns out that this approach may confict with suspend/resume
> > processing.  In brief, it's not a good idea to unregister devices while
> > a suspend is in progress, but on the other hand it's not a good idea to
> > block keventd waiting until the suspend is over.
> 
> Is it because keventd isn't frozen during suspend?  I'm afraid we might
> be trying to solve the problem at the wrong layer.  Device
> addition/removal should be _locked out_ during suspend/resume.
> Depending on executing thread being frozen is more subtle and fragile
> and freezing might go away altogether.

There has been a very intense thread recently on LKML and linux-pm 
discussing this and several other aspects of the freezer.

You have to be very cautious about locking out device addition/removal.  
Quite often the thread calling the driver core owns the parent device's 
semaphore, which is needed for suspending the parent.  If that thread 
were blocked, the suspend would deadlock.

Instead we should broadcast a notification to all threads which might
try to add a device, telling them not to do it until the suspend is
over.  After that notification is sent and current additions have
completed, the driver core can simply fail further calls to
device_add().

> IMHO, what's needed is a giant rwlock protecting the whole device
> addition/removal while suspend is in progress (addition/removal during
> resume should be safe as long as we put the new devices on the already
> processed list).  The lock can probably replace dpm_sem and dpm_list_sem.

Addition during resume is _not_ safe.  The new device can easily get
added to the already-processed list in the wrong position, so that the
next time the system is suspended the PM core tries to suspend the new
device's parent before suspending the new device.

> It's probably necessary to use trylock from addition/removal to avoid
> deadlock.  Drivers which support hotplugging need to rescan/revalidate
> the bus during resume anyway so failing addition/removal during suspend
> shouldn't be a problem.

I had envisioned using SRCU to synchronize existing device additions
with the beginning of a suspend.  down_read_trylock should work just as
well (aside from cacheline bouncing).

Device removal isn't as much of a problem as addition.  We could allow
it with no difficulty.  Failing it isn't really an option because
device_del() returns void.  And anyway, my idea was to have the PM core
acquire all the device semaphores at the start of the suspend -- this
would automatically block any attempted removal until the semaphore was
released.

> > I haven't paid much attention to your sysfs updates.  Is it still true
> > that calls to show/store methods are mutually exclusive with attribute
> > unregistration?  Assuming the answer is Yes, would it be possible to
> > bypass that mutual exclusion in certain explicit cases?  Here's what I
> > have in mind:
> > 
> > 	The user writes to an attribute file.
> > 
> > 	The sysfs core calls the attribute's store method.
> > 
> > 	The method tells the sysfs core to pretend that the call
> > 	temporarily doesn't exist, or has completed, or something
> > 	like that.
> > 
> > 	The method safely unregisters the attribute file, with no
> > 	mutual exclusion problems and no deadlock.  Of course, the
> > 	unregistration will still block until all _other_ method
> > 	calls for this attribute are complete.
> > 
> > 	The method tells the sysfs core to stop pretending and
> > 	go back to its normal state.
> > 
> > 	The method returns, and the sysfs core takes whatever actions
> > 	are needed to fully release the attribute file.
> > 
> > The idea is that there could be a way to allow unregistration while a 
> > method is still running, if the method specifically requests it.  If we 
> > could do this then device_schedule_callback() would be unnecessary.
> 
> Regardless of the suspend problem, I like the above idea.  Actually, I
> was thinking about doing exactly that for suicidal nodes when I first
> found out the problem.  All that's needed is an owner field which allows
> the owning thread to reenter (what was the name for this type of mutex?
>  BSD folks like it).  It's probably better to put some restrictions to
> allow only the specific case tho.

What with all the sysfs changes, I wouldn't know where to begin
looking.  The idea occurred to me mainly because I was considering
blocking all access to sysfs during a suspend -- the problem being that
a suspend is itself triggered by a write to a sysfs attribute!  
Somehow that write would have to wait for all _other_ existing accesses
to complete and block new ones.

> I like it because it shifts complexity from the drivers into driver
> core.  IOW, the driver model is kinder to drivers that way - the driver
> writer doesn't have to care whether something is suicidal or not - and I
> think that's the way we should be headed although we're not good at it yet.

It doesn't shift all the complexity.  The method still has to tell the 
sysfs core that it's going to unregister its attribute and that the 
unregistration has finished.  But this is simpler than the callback 
method we use now.

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10  9:13         ` Tejun Heo
  2007-07-10 10:50           ` Cornelia Huck
@ 2007-07-10 14:42           ` Alan Stern
  1 sibling, 0 replies; 18+ messages in thread
From: Alan Stern @ 2007-07-10 14:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cornelia Huck, Kernel development list

On Tue, 10 Jul 2007, Tejun Heo wrote:

> >>> A general immediate disconnect of the buffers (which will be handled in
> >>> a second pass) would be great, but doesn't sound easy.
> >> I haven't thought too hard about actual implementation but it's pretty
> >> specific case.  If doing things in generic manner is difficult, there
> >> are plenty of shortcuts to choose from, I think.
> > 
> > The "second pass" approach where the store method calls the sysfs core
> > or sets a flag or whatever sounds doable, but I'm not sure how general
> > we can get. Maybe for all store methods that just trigger an action.
> 
> Hmm... I'll give it a shot in a few days.

How would that differ from the callback technique we use now?

Alan Stern


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10 14:41   ` Alan Stern
@ 2007-07-10 15:11     ` Tejun Heo
  2007-07-10 17:18       ` Alan Stern
  0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2007-07-10 15:11 UTC (permalink / raw)
  To: Alan Stern; +Cc: Cornelia Huck, Kernel development list

Hello,

Alan Stern wrote:
> Instead we should broadcast a notification to all threads which might
> try to add a device, telling them not to do it until the suspend is
> over.  After that notification is sent and current additions have
> completed, the driver core can simply fail further calls to
> device_add().
> 
>> IMHO, what's needed is a giant rwlock protecting the whole device
>> addition/removal while suspend is in progress (addition/removal during
>> resume should be safe as long as we put the new devices on the already
>> processed list).  The lock can probably replace dpm_sem and dpm_list_sem.

Sans notification, the plan proposed in the first paragraph and rwlock
with down_read_trylock() for addition/removal is pretty similar.

> Addition during resume is _not_ safe.  The new device can easily get
> added to the already-processed list in the wrong position, so that the
> next time the system is suspended the PM core tries to suspend the new
> device's parent before suspending the new device.

When a bus device is being woken up and rescanning the bus, all
prerequisites for the bus should have been resumed already.  As long as
the newly added device is placed after the bus device itself, things
should be safe.  Well, that's the theory at least.

>> It's probably necessary to use trylock from addition/removal to avoid
>> deadlock.  Drivers which support hotplugging need to rescan/revalidate
>> the bus during resume anyway so failing addition/removal during suspend
>> shouldn't be a problem.
> 
> I had envisioned using SRCU to synchronize existing device additions
> with the beginning of a suspend.  down_read_trylock should work just as
> well (aside from cacheline bouncing).

How high performance device addition/removal should be?  Wouldn't rwlock
be enough?

> Device removal isn't as much of a problem as addition.  We could allow
> it with no difficulty.  Failing it isn't really an option because
> device_del() returns void.  And anyway, my idea was to have the PM core
> acquire all the device semaphores at the start of the suspend -- this
> would automatically block any attempted removal until the semaphore was
> released.

Hmmm... locking all device mutexes sounds scary to me but I tend to be a
chicken and lockdep is great, so it might just work.  :-)

>> Regardless of the suspend problem, I like the above idea.  Actually, I
>> was thinking about doing exactly that for suicidal nodes when I first
>> found out the problem.  All that's needed is an owner field which allows
>> the owning thread to reenter (what was the name for this type of mutex?
>>  BSD folks like it).  It's probably better to put some restrictions to
>> allow only the specific case tho.
> 
> What with all the sysfs changes, I wouldn't know where to begin
> looking.  The idea occurred to me mainly because I was considering
> blocking all access to sysfs during a suspend -- the problem being that
> a suspend is itself triggered by a write to a sysfs attribute!  
> Somehow that write would have to wait for all _other_ existing accesses
> to complete and block new ones.

Why is that necessary?  What happens if there's some long-running
read/write operation using bin attr?

>> I like it because it shifts complexity from the drivers into driver
>> core.  IOW, the driver model is kinder to drivers that way - the driver
>> writer doesn't have to care whether something is suicidal or not - and I
>> think that's the way we should be headed although we're not good at it yet.
> 
> It doesn't shift all the complexity.  The method still has to tell the 
> sysfs core that it's going to unregister its attribute and that the 
> unregistration has finished.  But this is simpler than the callback 
> method we use now.

Right, it's nothing major.  I just dislike such limitation is visible
from individual drivers and "echo 1 > /sys/whatever/kill-yourself"
returns before the suicide is actually complete.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10 15:11     ` Tejun Heo
@ 2007-07-10 17:18       ` Alan Stern
  2007-07-10 18:08         ` Tejun Heo
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Stern @ 2007-07-10 17:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cornelia Huck, Kernel development list

On Wed, 11 Jul 2007, Tejun Heo wrote:

> > Addition during resume is _not_ safe.  The new device can easily get
> > added to the already-processed list in the wrong position, so that the
> > next time the system is suspended the PM core tries to suspend the new
> > device's parent before suspending the new device.
> 
> When a bus device is being woken up and rescanning the bus, all
> prerequisites for the bus should have been resumed already.  As long as
> the newly added device is placed after the bus device itself, things
> should be safe.  Well, that's the theory at least.

What actually happened in practice was that a device send a wakeup
request and hence was resumed without the PM core's knowledge.  As a
result it was still on the "suspended" list when its new child was
added to the end of the "resumed" list.  Then when the PM core got
around to the parent, the parent was placed after the child.  Since the 
"resumed" list is iterated in reverse order for suspending, you see the 
problem.

If the PM core had been holding the parent's semaphore this wouldn't 
have occurred.  But it does illustrate the pitfalls of adding a device 
during resume.

> > I had envisioned using SRCU to synchronize existing device additions
> > with the beginning of a suspend.  down_read_trylock should work just as
> > well (aside from cacheline bouncing).
> 
> How high performance device addition/removal should be?  Wouldn't rwlock
> be enough?

rwlock is a type of spinlock, right?  Hence it can't sleep and can't be
used for device addition/removal.  Also rwlocks aren't fair and are
subject to livelock.

> > Device removal isn't as much of a problem as addition.  We could allow
> > it with no difficulty.  Failing it isn't really an option because
> > device_del() returns void.  And anyway, my idea was to have the PM core
> > acquire all the device semaphores at the start of the suspend -- this
> > would automatically block any attempted removal until the semaphore was
> > released.
> 
> Hmmm... locking all device mutexes sounds scary to me but I tend to be a
> chicken and lockdep is great, so it might just work.  :-)

These aren't mutexes; they are semaphores (although they are used as 
mutexes).  I speculate that part of the reason they have never been 
converted to mutexes is because of potential lockdep issues.

The order of locking isn't a problem.  The list itself defines the 
correct order.

> > What with all the sysfs changes, I wouldn't know where to begin
> > looking.  The idea occurred to me mainly because I was considering
> > blocking all access to sysfs during a suspend -- the problem being that
> > a suspend is itself triggered by a write to a sysfs attribute!  
> > Somehow that write would have to wait for all _other_ existing accesses
> > to complete and block new ones.
> 
> Why is that necessary?  What happens if there's some long-running
> read/write operation using bin attr?

That is certainly a possible problem.  I don't know if there are any 
such long-running operations.

Is it necessary?  No.  But it would make things easier if suspend 
handling could be localized in one place (the sysfs core) instead of 
spread out among a bunch of attribute methods.

Using a freezable workqueue for the callbacks would also localize the 
suspend handling.

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10 17:18       ` Alan Stern
@ 2007-07-10 18:08         ` Tejun Heo
  2007-07-10 19:31           ` Alan Stern
  0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2007-07-10 18:08 UTC (permalink / raw)
  To: Alan Stern; +Cc: Cornelia Huck, Kernel development list

Hello,

Alan Stern wrote:
>> When a bus device is being woken up and rescanning the bus, all
>> prerequisites for the bus should have been resumed already.  As long as
>> the newly added device is placed after the bus device itself, things
>> should be safe.  Well, that's the theory at least.
> 
> What actually happened in practice was that a device send a wakeup
> request and hence was resumed without the PM core's knowledge.  As a
> result it was still on the "suspended" list when its new child was
> added to the end of the "resumed" list.  Then when the PM core got
> around to the parent, the parent was placed after the child.  Since the 
> "resumed" list is iterated in reverse order for suspending, you see the 
> problem.
> 
> If the PM core had been holding the parent's semaphore this wouldn't 
> have occurred.  But it does illustrate the pitfalls of adding a device 
> during resume.

Oh I see.  Considering such cases, I guess what's needed is allowing
"addition of devices hanging off of resumed or resuming bus during
system resume".

>>> I had envisioned using SRCU to synchronize existing device additions
>>> with the beginning of a suspend.  down_read_trylock should work just as
>>> well (aside from cacheline bouncing).
>> How high performance device addition/removal should be?  Wouldn't rwlock
>> be enough?
> 
> rwlock is a type of spinlock, right?  Hence it can't sleep and can't be
> used for device addition/removal.  Also rwlocks aren't fair and are
> subject to livelock.

rwsem of course.  Sorry about the confusion.  Fairness doesn't matter
here.  The synchronization is suspend vs. addition with priority on
suspend (suspend does down_write, removal does down_read_trylock).

>> Hmmm... locking all device mutexes sounds scary to me but I tend to be a
>> chicken and lockdep is great, so it might just work.  :-)
> 
> These aren't mutexes; they are semaphores (although they are used as 
> mutexes).  I speculate that part of the reason they have never been 
> converted to mutexes is because of potential lockdep issues.
> 
> The order of locking isn't a problem.  The list itself defines the 
> correct order.

I'm still not a big fan of locking (or downing) all device semaphores
when the same effect can be achieved with a rwsem.  One thing I like
about rwsem approach is that it can never cause a deadlock because one
side uses trylock exclusively.

If you lock all the device sems, failing addition becomes more difficult
(can be done but then you need something like the rwsem anyway).
Without special handling of dev->sem, when a driver screws up (try to
add a child device from ->suspend), it will deadlock while suspend is in
progress, which isn't pretty.

Please also note that currently dev->sem does not protect against adding
children.  It says it does in the comment on the definition of the field
but it's never acquired during device_add().

>> Why is that necessary?  What happens if there's some long-running
>> read/write operation using bin attr?
> 
> That is certainly a possible problem.  I don't know if there are any 
> such long-running operations.
> 
> Is it necessary?  No.  But it would make things easier if suspend 
> handling could be localized in one place (the sysfs core) instead of 
> spread out among a bunch of attribute methods.

Yeah, I'm all for centralizing things.  I just didn't see what could be
simplified in LLDs by freezing sysfs access.  Can you point me to any
example node which can benefit from such thing?

> Using a freezable workqueue for the callbacks would also localize the 
> suspend handling.

But the freezable workqueue thing isn't really necessary if any of the
discussed solutions is implemented, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10 18:08         ` Tejun Heo
@ 2007-07-10 19:31           ` Alan Stern
  2007-07-11  5:04             ` Tejun Heo
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Stern @ 2007-07-10 19:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cornelia Huck, Kernel development list

On Wed, 11 Jul 2007, Tejun Heo wrote:

> >> How high performance device addition/removal should be?  Wouldn't rwlock
> >> be enough?
> > 
> > rwlock is a type of spinlock, right?  Hence it can't sleep and can't be
> > used for device addition/removal.  Also rwlocks aren't fair and are
> > subject to livelock.
> 
> rwsem of course.  Sorry about the confusion.  Fairness doesn't matter
> here.  The synchronization is suspend vs. addition with priority on
> suspend (suspend does down_write, removal does down_read_trylock).

Right.  I think device addition/removal is infrequent enough that 
either SRCU or and rwsem could be used, without any significant 
slowdown.

> I'm still not a big fan of locking (or downing) all device semaphores
> when the same effect can be achieved with a rwsem.  One thing I like
> about rwsem approach is that it can never cause a deadlock because one
> side uses trylock exclusively.

But an rwsem _doesn't_ obtain the same effect.  The major difference is
that holding all the device semaphores blocks binding of drivers, which
we need to prevent during suspend.  A minor difference is that it won't
prevent adding a new device, since you don't need to hold any existing
semaphores before calling device_add().

> If you lock all the device sems, failing addition becomes more difficult
> (can be done but then you need something like the rwsem anyway).

That was the idea: First acquire the rwsem write lock so that all
existing calls to device_add will have finished and new ones will fail,
and then acquire all the semaphores.

The purpose of holding all the semaphores is not to prevent device
addition.  More like the other way around: In order to be sure of
holding _all_ the semaphores, we have to prevent new devices from being
added.

> Without special handling of dev->sem, when a driver screws up (try to
> add a child device from ->suspend), it will deadlock while suspend is in
> progress, which isn't pretty.

No it won't.  If it calls device_add from within its suspend method,
the down_read_trylock will fail and the call will return immediately.  
With a WARN_ON to let people know which driver tried to do device_add
at an illegal time.

> Please also note that currently dev->sem does not protect against adding
> children.  It says it does in the comment on the definition of the field
> but it's never acquired during device_add().

That's why we need the rwsem.

> Yeah, I'm all for centralizing things.  I just didn't see what could be
> simplified in LLDs by freezing sysfs access.  Can you point me to any
> example node which can benefit from such thing?

For USB devices, writes to the /sys/.../bConfigurationValue node will
call usb_set_configuration(), which registers child devices.  We don't
want these calls to fail because a suspend is underway, since suspends 
are supposed to be transparent to userspace.  So they need to block 
until the suspend is over.

Access (either read or write) to any sysfs node which actually touches 
the device hardware will fail while the device is suspended.  Again, 
for transparency such accesses should block.

> > Using a freezable workqueue for the callbacks would also localize the 
> > suspend handling.
> 
> But the freezable workqueue thing isn't really necessary if any of the
> discussed solutions is implemented, right?

It isn't necessary if the callbacks aren't needed (i.e., if we give
methods a way to unregister their attribute directly).  But if we do
keep the callbacks, then for safety's sake their workqueue should be
freezable.

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-10 19:31           ` Alan Stern
@ 2007-07-11  5:04             ` Tejun Heo
  2007-07-11 12:36               ` Cornelia Huck
  2007-07-11 15:04               ` Alan Stern
  0 siblings, 2 replies; 18+ messages in thread
From: Tejun Heo @ 2007-07-11  5:04 UTC (permalink / raw)
  To: Alan Stern; +Cc: Cornelia Huck, Kernel development list

Alan Stern wrote:
>> Please also note that currently dev->sem does not protect against adding
>> children.  It says it does in the comment on the definition of the field
>> but it's never acquired during device_add().
> 
> That's why we need the rwsem.

Alright, there's our confusion.  I thought you were gonna use dev->sem
to protect new device addition && driver binding.  We can use the same
rwsem directly for binding protection too but I guess there's no big
difference one way or the other.

>> Yeah, I'm all for centralizing things.  I just didn't see what could be
>> simplified in LLDs by freezing sysfs access.  Can you point me to any
>> example node which can benefit from such thing?
> 
> For USB devices, writes to the /sys/.../bConfigurationValue node will
> call usb_set_configuration(), which registers child devices.  We don't
> want these calls to fail because a suspend is underway, since suspends 
> are supposed to be transparent to userspace.  So they need to block 
> until the suspend is over.
> 
> Access (either read or write) to any sysfs node which actually touches 
> the device hardware will fail while the device is suspended.  Again, 
> for transparency such accesses should block.

Thanks for enlightening me.  Probably what can be done is blocking
regular file sysfs nodes automatically and make it optional (optionally
enable or disable) for bin attrs.

>>> Using a freezable workqueue for the callbacks would also localize the 
>>> suspend handling.
>> But the freezable workqueue thing isn't really necessary if any of the
>> discussed solutions is implemented, right?
> 
> It isn't necessary if the callbacks aren't needed (i.e., if we give
> methods a way to unregister their attribute directly).  But if we do
> keep the callbacks, then for safety's sake their workqueue should be
> freezable.

I see.  Let's see how the direct suicide thing works out.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-11  5:04             ` Tejun Heo
@ 2007-07-11 12:36               ` Cornelia Huck
  2007-07-11 15:04               ` Alan Stern
  1 sibling, 0 replies; 18+ messages in thread
From: Cornelia Huck @ 2007-07-11 12:36 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Alan Stern, Kernel development list

On Wed, 11 Jul 2007 14:04:44 +0900,
Tejun Heo <htejun@gmail.com> wrote:

> Thanks for enlightening me.  Probably what can be done is blocking
> regular file sysfs nodes automatically and make it optional (optionally
> enable or disable) for bin attrs.

Perhaps all files should block by default, but have a knob to disable
blocking for special cases (or an attribute flag like ATTR_NOFREEZE?)

> >>> Using a freezable workqueue for the callbacks would also localize the 
> >>> suspend handling.
> >> But the freezable workqueue thing isn't really necessary if any of the
> >> discussed solutions is implemented, right?
> > 
> > It isn't necessary if the callbacks aren't needed (i.e., if we give
> > methods a way to unregister their attribute directly).  But if we do
> > keep the callbacks, then for safety's sake their workqueue should be
> > freezable.

I agree.

> I see.  Let's see how the direct suicide thing works out.

If it can be done nicely, I'd prefer it over the workqueue solution -
but not if we end up with hard-to-understand code.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Sysfs and suicidal attributes
  2007-07-11  5:04             ` Tejun Heo
  2007-07-11 12:36               ` Cornelia Huck
@ 2007-07-11 15:04               ` Alan Stern
  1 sibling, 0 replies; 18+ messages in thread
From: Alan Stern @ 2007-07-11 15:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cornelia Huck, Kernel development list

On Wed, 11 Jul 2007, Tejun Heo wrote:

> Alright, there's our confusion.  I thought you were gonna use dev->sem
> to protect new device addition && driver binding.  We can use the same
> rwsem directly for binding protection too but I guess there's no big
> difference one way or the other.

The rwsem should not be used for binding protection.  If we did trylock
then binding at the wrong time would fail, which would be bad.  If we
did down_read then we would probably block while holding a device
semaphore, which also would be bad.

> Thanks for enlightening me.  Probably what can be done is blocking
> regular file sysfs nodes automatically and make it optional (optionally
> enable or disable) for bin attrs.

Maybe.  At the moment I don't see any reason to treat binary attributes 
different from text.

It would also be good to find out whether there are any long-running 
sysfs callbacks.

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2007-07-11 15:04 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-08 14:31 Sysfs and suicidal attributes Alan Stern
2007-07-09 15:26 ` Cornelia Huck
2007-07-09 22:28   ` Alan Stern
2007-07-10  5:09 ` Tejun Heo
2007-07-10  8:28   ` Cornelia Huck
2007-07-10  8:49     ` Tejun Heo
2007-07-10  9:04       ` Cornelia Huck
2007-07-10  9:13         ` Tejun Heo
2007-07-10 10:50           ` Cornelia Huck
2007-07-10 14:42           ` Alan Stern
2007-07-10 14:41   ` Alan Stern
2007-07-10 15:11     ` Tejun Heo
2007-07-10 17:18       ` Alan Stern
2007-07-10 18:08         ` Tejun Heo
2007-07-10 19:31           ` Alan Stern
2007-07-11  5:04             ` Tejun Heo
2007-07-11 12:36               ` Cornelia Huck
2007-07-11 15:04               ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox