* Sysfs attributes racing with unregistration @ 2012-01-04 16:52 Alan Stern 2012-01-04 17:18 ` Tejun Heo 0 siblings, 1 reply; 25+ messages in thread From: Alan Stern @ 2012-01-04 16:52 UTC (permalink / raw) To: Tejun Heo; +Cc: Kernel development list Tejun: Can you explain the current situation regarding access to sysfs attributes and possible races with kobject removal? I have two questions in particular: What happens if one thread calls an attribute's show or store method concurrently with another thread unregistering the underlying kobject? What happens if a thread continues to hold an open fd reference to a sysfs attribute file after the kobject is unregistered, and then tries to read or write that fd? If there are any guarantees about what happens in these situations, I can't find them in the kernel source. And of course, if you can think of any other matters related to this topic, please mention them. Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Sysfs attributes racing with unregistration
  2012-01-04 16:52 Sysfs attributes racing with unregistration Alan Stern
@ 2012-01-04 17:18 ` Tejun Heo
  2012-01-04 18:13   ` Eric W. Biederman
  2012-01-04 18:13   ` Sysfs attributes racing with unregistration Alan Stern
  0 siblings, 2 replies; 25+ messages in thread
From: Tejun Heo @ 2012-01-04 17:18 UTC (permalink / raw)
  To: Alan Stern
  Cc: Kernel development list, Eric Biederman, Greg Kroah-Hartman, Kay Sievers

Hello, Alan.

On Wed, Jan 04, 2012 at 11:52:20AM -0500, Alan Stern wrote:
> Can you explain the current situation regarding access to sysfs
> attributes and possible races with kobject removal?  I have two
> questions in particular:

Heh, I haven't looked at sysfs code seriously for years now and my
memory sucks to begin with, so please take whatever I say with a
gigantic grain of salt.  Eric has been looking at sysfs a lot lately
so he probably can answer these best.  Adding him, Greg and Kay - hi!
guys.

> 	What happens if one thread calls an attribute's show or
> 	store method concurrently with another thread unregistering
> 	the underlying kobject?

sysfs nodes have two reference counts - one for object lifespan and
the other for active usage.  The latter is called active and acquired
and released using sysfs_get/put_active().  Any callback invocation
should be performed while holding an active reference.  On removal,
sysfs_deactivate() marks the active reference count for deactivation
so that no new active reference is given out and waits for the
in-flight ones to drain.  IOW, removal makes sure new invocations of
callbacks fail and waits for in-progress ones to finish before
proceeding with removal.

> 	What happens if a thread continues to hold an open fd
> 	reference to a sysfs attribute file after the kobject is
> 	unregistered, and then tries to read or write that fd?

Active reference is held only for the duration of each callback
invocation.  Userland can't prolong the existence of active reference.
The duration of callback execution is the only deciding factor.

Someone (I think Eric, right?) was trying to generalize the semantics
to vfs layer so that severance/revocation capability is generally
available.  IIRC, it didn't get through tho.

Thanks.

--
tejun
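[Editorial sketch] The active-reference scheme described in the message
above can be modelled in userspace C.  This is only a rough model under
stated assumptions, not the kernel implementation: struct node, the
node_*() helpers and DEACTIVATED_BIAS are hypothetical names invented
for illustration, and the wait is a simple yield loop rather than
whatever sleeping primitive sysfs actually uses.  The idea shown is a
single counter that goes negative once removal starts, so new callers
are refused while in-flight ones can still drop their references.

```c
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias: adding it makes any small non-negative count negative. */
#define DEACTIVATED_BIAS INT_MIN

/* Hypothetical stand-in for a sysfs node; only the "active" count is modelled. */
struct node {
	atomic_int active;	/* >= 0: live, value = callbacks in flight */
};

static void node_init(struct node *n)
{
	atomic_init(&n->active, 0);
}

/* Take an active reference; fails once deactivation has started. */
static bool node_get_active(struct node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0) {
		/* On failure the CAS reloads v, and the sign is re-checked. */
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	}
	return false;
}

/* Drop an active reference taken with node_get_active(). */
static void node_put_active(struct node *n)
{
	atomic_fetch_sub(&n->active, 1);
}

/* Start removal: from now on node_get_active() fails. */
static void node_begin_deactivate(struct node *n)
{
	atomic_fetch_add(&n->active, DEACTIVATED_BIAS);
}

/* True once removal has started and no callback is in flight. */
static bool node_drained(struct node *n)
{
	return atomic_load(&n->active) == DEACTIVATED_BIAS;
}

/* Wait for in-flight callbacks to finish (busy-yield, for brevity). */
static void node_wait_drained(struct node *n)
{
	while (!node_drained(n))
		sched_yield();
}
```

A show or store path in this model would call node_get_active(), return
an error if that fails, run the callback, and then call
node_put_active().  The removal path would call node_begin_deactivate()
followed by node_wait_drained().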
* Re: Sysfs attributes racing with unregistration
  2012-01-04 17:18 ` Tejun Heo
@ 2012-01-04 18:13   ` Eric W. Biederman
  2012-01-04 19:41     ` Alan Stern
  2012-01-04 18:13   ` Sysfs attributes racing with unregistration Alan Stern
  1 sibling, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2012-01-04 18:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Stern, Kernel development list, Greg Kroah-Hartman, Kay Sievers

Tejun Heo <tj@kernel.org> writes:

> Hello, Alan.
>
> On Wed, Jan 04, 2012 at 11:52:20AM -0500, Alan Stern wrote:
>> Can you explain the current situation regarding access to sysfs
>> attributes and possible races with kobject removal?  I have two
>> questions in particular:
>
> Heh, I haven't looked at sysfs code seriously for years now and my
> memory sucks to begin with, so please take whatever I say with a
> gigantic grain of salt.  Eric has been looking at sysfs a lot lately
> so he probably can answer these best.  Adding him, Greg and Kay - hi!
> guys.
>
>> 	What happens if one thread calls an attribute's show or
>> 	store method concurrently with another thread unregistering
>> 	the underlying kobject?
>
> sysfs nodes have two reference counts - one for object lifespan and
> the other for active usage.  The latter is called active and acquired
> and released using sysfs_get/put_active().  Any callback invocation
> should be performed while holding an active reference.  On removal,
> sysfs_deactivate() marks the active reference count for deactivation
> so that no new active reference is given out and waits for the
> in-flight ones to drain.  IOW, removal makes sure new invocations of
> callbacks fail and waits for in-progress ones to finish before
> proceeding with removal.

Or in simple terms: if the unregister call happens first then we do not
call the show method.  If the show method happens first the unregister
waits until the show method is complete before letting the
unregistration proceed.

Furthermore lockdep models this wait as a reader/writer lock so lockdep
should be able to warn you about deadlocks triggered by waiting for the
unregistration to complete.

>> 	What happens if a thread continues to hold an open fd
>> 	reference to a sysfs attribute file after the kobject is
>> 	unregistered, and then tries to read or write that fd?
>
> Active reference is held only for the duration of each callback
> invocation.  Userland can't prolong the existence of active reference.
> The duration of callback execution is the only deciding factor.

The fd only pins core sysfs data structures in memory.  The fd remains
usable (in the -EIO -EBADF sense of usable) even

> Someone (I think Eric, right?) was trying to generalize the semantics
> to vfs layer so that severance/revocation capability is generally
> available.  IIRC, it didn't get through tho.

Unfortunately I didn't have time to complete the effort of those
patches.  The approach was not fundamentally rejected but it needed a
clear and convincing use case as well as some strong scrutiny.  But
fundamentally finding a way to do that was seen as interesting, if it
could be solved without slowing down the existing cases.

Eric
* Re: Sysfs attributes racing with unregistration 2012-01-04 18:13 ` Eric W. Biederman @ 2012-01-04 19:41 ` Alan Stern 2012-01-05 3:07 ` Eric W. Biederman 0 siblings, 1 reply; 25+ messages in thread From: Alan Stern @ 2012-01-04 19:41 UTC (permalink / raw) To: Eric W. Biederman Cc: Tejun Heo, Kernel development list, Greg Kroah-Hartman, Kay Sievers On Wed, 4 Jan 2012, Eric W. Biederman wrote: > > Someone (I think Eric, right?) was trying to generalize the semantics > > to vfs layer so that severance/revocation capability is generally > > available. IIRC, it didn't get through tho. > > Unfortunately I didn't have time to complete the effort of those > patches. The approach was not fundamentally rejected but it needed a > clear and convincing use case as well as some strong scrutiny. But > fundamentally finding a way to do that was seen as an interesting, > if it could be solved without slowing down the existing cases. Ted Ts'o has been talking about something similar but not the same -- a way to revoke an entire filesystem. For example, see commit 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific kludge to avoid an oops after the disk disappears). The use case for that is obvious and widespread: Somebody yanks out a USB drive without unmounting it first. Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Sysfs attributes racing with unregistration
  2012-01-04 19:41     ` Alan Stern
@ 2012-01-05  3:07       ` Eric W. Biederman
  2012-01-05 15:13         ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Alan Stern
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2012-01-05 3:07 UTC (permalink / raw)
  To: Alan Stern
  Cc: Tejun Heo, Kernel development list, Greg Kroah-Hartman, Kay Sievers

Alan Stern <stern@rowland.harvard.edu> writes:

> On Wed, 4 Jan 2012, Eric W. Biederman wrote:
>
>> > Someone (I think Eric, right?) was trying to generalize the semantics
>> > to vfs layer so that severance/revocation capability is generally
>> > available.  IIRC, it didn't get through tho.
>>
>> Unfortunately I didn't have time to complete the effort of those
>> patches.  The approach was not fundamentally rejected but it needed a
>> clear and convincing use case as well as some strong scrutiny.  But
>> fundamentally finding a way to do that was seen as an interesting,
>> if it could be solved without slowing down the existing cases.
>
> Ted Ts'o has been talking about something similar but not the same -- a
> way to revoke an entire filesystem.  For example, see commit
> 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
> kludge to avoid an oops after the disk disappears).
>
> The use case for that is obvious and widespread: Somebody yanks out a
> USB drive without unmounting it first.

Agreed.  The best I have at the moment is a library that can wrap
filesystem methods to implement the hotplug bits.

Do you know how hard it is to get a remove event up to the filesystem
that sits on top of a block device?

Do you know how hard it is to detect at mount time if a block device
might be hot-pluggable?  We can always use a mount option here and
make userspace figure it out, but being able to have a good default
would be nice.

If it isn't too hard to get the event up from the block device to the
filesystem when the block device is unceremoniously removed I might just
make the time to have hotunplug trigger a filesystem wide revoke on a
filesystem like ext4.

In addition to sysfs we need the same logic in proc, sysctl, and uio.
So it makes sense to move towards a common library that can do all of
the hard bits.

I just noticed that sysctl is currently broken in design, if not in
practice, by having poll methods that will break if you unregister the
sysctls.  Fortunately for the time being we don't have any sysctls where
that case comes up.

Eric
* Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 3:07 ` Eric W. Biederman @ 2012-01-05 15:13 ` Alan Stern 2012-01-05 15:32 ` Tejun Heo ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Alan Stern @ 2012-01-05 15:13 UTC (permalink / raw) To: Eric W. Biederman Cc: Theodore Ts'o, Tejun Heo, Kernel development list, Greg Kroah-Hartman, Kay Sievers On Wed, 4 Jan 2012, Eric W. Biederman wrote: > > Ted Ts'o has been talking about something similar but not the same -- a > > way to revoke an entire filesystem. For example, see commit > > 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific > > kludge to avoid an oops after the disk disappears). > > > > The use case for that is obvious and widespread: Somebody yanks out a > > USB drive without unmounting it first. > > Agreed. The best I have at the moment is a library that can wrap > filesystem methods to implement the hotplug bits. > > Do you know how hard it is to remove event up to the filesystem that > sits on top of a block device? I don't have a clear idea of what's involved (in particular, how to go from a block_device structure to a mounted filesystem). But the place to do it would probably be block/genhd.c:invalidate_partition(). Ted can tell you if there's a better alternative. > Do you know how hard it is to detect at mount time if a block device > might be hot-plugable? We can always use a mount option here and > make userspace figure it out, but being to have a good default would > be nice. I don't think it's possible to tell if a device is hot-unpluggable. For example, the device itself might not be removable from its parent, but the parent might be hot-unpluggable. You'll probably have to assume that every device can potentially be unplugged, one way or another. Also, even devices that aren't hot-unpluggable can fail. The end result should be pretty much the same. 
> If it isn't too hard to get the event up from the block device to the > filesystem when the block device is uncermoniously removed I might just > make the time to have hotunplug trigger a filesystem wide revoke on a > filesystem like ext4. > > In addition to sysfs we need the same logic in proc, sysctl, and uio. > So it makes sense to move towards a common library that can do all of > the hard bits. Ted mentioned the need for a new "device removed" superblock method. Then each filesystem can add its own implementation as people get around to it. Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 15:13 ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Alan Stern @ 2012-01-05 15:32 ` Tejun Heo 2012-01-05 16:03 ` Eric W. Biederman 2012-01-05 15:52 ` Eric W. Biederman 2012-01-05 18:18 ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Greg KH 2 siblings, 1 reply; 25+ messages in thread From: Tejun Heo @ 2012-01-05 15:32 UTC (permalink / raw) To: Alan Stern Cc: Eric W. Biederman, Theodore Ts'o, Kernel development list, Greg Kroah-Hartman, Kay Sievers Hello, On Thu, Jan 05, 2012 at 10:13:31AM -0500, Alan Stern wrote: > I don't have a clear idea of what's involved (in particular, how to go > from a block_device structure to a mounted filesystem). But the place > to do it would probably be block/genhd.c:invalidate_partition(). Ted > can tell you if there's a better alternative. > > > Do you know how hard it is to detect at mount time if a block device > > might be hot-plugable? We can always use a mount option here and > > make userspace figure it out, but being to have a good default would > > be nice. > > I don't think it's possible to tell if a device is hot-unpluggable. > For example, the device itself might not be removable from its parent, > but the parent might be hot-unpluggable. You'll probably have to > assume that every device can potentially be unplugged, one way or > another. > > Also, even devices that aren't hot-unpluggable can fail. The end > result should be pretty much the same. Ummm.... I could be missing something but filesystems need to be able to deal with partial device failures (ie. some block can't be read) and hot-unplug or handling full failure is a logical extension of that. That's how it already works, so I don't really think that is a particularly good application for the revoke mechanism. Thanks. -- tejun ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 15:32         ` Tejun Heo
@ 2012-01-05 16:03           ` Eric W. Biederman
  2012-01-05 16:44             ` Tejun Heo
  2012-01-05 16:47             ` Alan Stern
  0 siblings, 2 replies; 25+ messages in thread
From: Eric W. Biederman @ 2012-01-05 16:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Stern, Theodore Ts'o, Kernel development list, Greg Kroah-Hartman, Kay Sievers

Tejun Heo <tj@kernel.org> writes:

> Hello,
>
> On Thu, Jan 05, 2012 at 10:13:31AM -0500, Alan Stern wrote:
>> I don't have a clear idea of what's involved (in particular, how to go
>> from a block_device structure to a mounted filesystem).  But the place
>> to do it would probably be block/genhd.c:invalidate_partition().  Ted
>> can tell you if there's a better alternative.
>>
>> > Do you know how hard it is to detect at mount time if a block device
>> > might be hot-plugable?  We can always use a mount option here and
>> > make userspace figure it out, but being to have a good default would
>> > be nice.
>>
>> I don't think it's possible to tell if a device is hot-unpluggable.
>> For example, the device itself might not be removable from its parent,
>> but the parent might be hot-unpluggable.  You'll probably have to
>> assume that every device can potentially be unplugged, one way or
>> another.
>>
>> Also, even devices that aren't hot-unpluggable can fail.  The end
>> result should be pretty much the same.
>
> Ummm.... I could be missing something but filesystems need to be able
> to deal with partial device failures (ie. some block can't be read)
> and hot-unplug or handling full failure is a logical extension of
> that.  That's how it already works, so I don't really think that is a
> particularly good application for the revoke mechanism.

Well the choices are really:
a) On a block device hotunplug keep the device and have it simply report
   everything as errors, to the filesystem.  Maybe with a hint to the
   filesystem that something is wrong.
b) Have a filesystem revoke method so that we don't have to keep the
   unplugged block device structure around indefinitely.

It seems clear that we are doing neither (a) nor (b), which results in
periodic and spectacular failures when block devices are unplugged,
because we try to access block devices that no longer exist.

Eric
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 16:03 ` Eric W. Biederman @ 2012-01-05 16:44 ` Tejun Heo 2012-01-05 16:47 ` Alan Stern 1 sibling, 0 replies; 25+ messages in thread From: Tejun Heo @ 2012-01-05 16:44 UTC (permalink / raw) To: Eric W. Biederman Cc: Alan Stern, Theodore Ts'o, Kernel development list, Greg Kroah-Hartman, Kay Sievers Hello, On Thu, Jan 05, 2012 at 08:03:16AM -0800, Eric W. Biederman wrote: > Well the choices are really: > a) On a block device hotunplug keep the device and have it simply report > everything as errors, to the filesystem. Maybe with a hint to the > filesystem that something is wrong. > b) Have a filesystem revoke method so that we don't have to keep the > unplugged block device structure around indefinitely. > > It seems clear that we are neither doing (a) or (b) which results in > periodic and spectacular failures when block devices are unplugged, > because we try and access block devices that no longer exist. We're definitely doing a). If it's not working properly, it's a bug. Thanks. -- tejun ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 16:03 ` Eric W. Biederman 2012-01-05 16:44 ` Tejun Heo @ 2012-01-05 16:47 ` Alan Stern 2012-01-05 17:11 ` Tejun Heo 2012-01-05 18:27 ` Ted Ts'o 1 sibling, 2 replies; 25+ messages in thread From: Alan Stern @ 2012-01-05 16:47 UTC (permalink / raw) To: Eric W. Biederman Cc: Tejun Heo, Theodore Ts'o, Kernel development list, Greg Kroah-Hartman, Kay Sievers On Thu, 5 Jan 2012, Eric W. Biederman wrote: > > Ummm.... I could be missing something but filesystems need to be able > > to deal with partial device failures (ie. some block can't be read) > > and hot-unplug or handling full failure is a logical extension of > > that. That's how it already works, so I don't really think that is a > > particularly good application for the revoke mechanism. > > Well the choices are really: > a) On a block device hotunplug keep the device and have it simply report > everything as errors, to the filesystem. Maybe with a hint to the > filesystem that something is wrong. > b) Have a filesystem revoke method so that we don't have to keep the > unplugged block device structure around indefinitely. When I asked Ted about this, he strongly indicated that he preferred b). > It seems clear that we are neither doing (a) or (b) which results in > periodic and spectacular failures when block devices are unplugged, > because we try and access block devices that no longer exist. Actually we are doing a). But we aren't doing it well enough. One problem (which was reported by a user last spring) is that del_gendisk() calls device_del() for the disk and bdi_unregister() for the disk's backing_dev_info structure. Now, del_gendisk will leave the data structure in memory until the disk's refcount drops to 0, but bdi_unregister ignores refcounts and simply erases the bdi->dev pointer. Once this happens, any attempt to call mark_buffer_dirty() (for example, by ext4_commit_super) will cause an oops. 
Alan Stern
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 16:47 ` Alan Stern @ 2012-01-05 17:11 ` Tejun Heo 2012-01-05 18:27 ` Ted Ts'o 1 sibling, 0 replies; 25+ messages in thread From: Tejun Heo @ 2012-01-05 17:11 UTC (permalink / raw) To: Alan Stern Cc: Eric W. Biederman, Theodore Ts'o, Kernel development list, Greg Kroah-Hartman, Kay Sievers Hello, On Thu, Jan 05, 2012 at 11:47:54AM -0500, Alan Stern wrote: > One problem (which was reported by a user last spring) is that > del_gendisk() calls device_del() for the disk and bdi_unregister() for > the disk's backing_dev_info structure. Now, del_gendisk will leave the > data structure in memory until the disk's refcount drops to 0, but > bdi_unregister ignores refcounts and simply erases the bdi->dev > pointer. Once this happens, any attempt to call mark_buffer_dirty() > (for example, by ext4_commit_super) will cause an oops. Yeah, there were multiple bugs in block device hot-removal path. I got some of them fixed recently but didn't get to the bdi one yet. It's a bug and needs to be fixed regardless of fs revoke support. Thanks. -- tejun ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 16:47             ` Alan Stern
  2012-01-05 17:11               ` Tejun Heo
@ 2012-01-05 18:27               ` Ted Ts'o
  2012-01-05 18:36                 ` Tejun Heo
  2012-01-05 18:38                 ` Christoph Hellwig
  1 sibling, 2 replies; 25+ messages in thread
From: Ted Ts'o @ 2012-01-05 18:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Eric W. Biederman, Tejun Heo, Kernel development list, Greg Kroah-Hartman, Kay Sievers

On Thu, Jan 05, 2012 at 11:47:54AM -0500, Alan Stern wrote:
> > Well the choices are really:
> > a) On a block device hotunplug keep the device and have it simply report
> >    everything as errors, to the filesystem.  Maybe with a hint to the
> >    filesystem that something is wrong.
> > b) Have a filesystem revoke method so that we don't have to keep the
> >    unplugged block device structure around indefinitely.
>
> When I asked Ted about this, he strongly indicated that he preferred
> b).

Ideally, we should do both.  The block device should call a
notification function (probably run out of a workqueue context, to
avoid locking issues) which tells the file system, "the block device
is _gone_ and isn't coming back".  Any attempts to read or write to
the block device should return errors, since there may be writeback
happening in the background while the file system is shutting down
the file system mount.  Once the file system is done, it can call a
function which tells the block device layer that it's OK to release
the block device and its related structures.

In order for the file system to shut down the file system cleanly, it
will need to access VFS-level revoke functionality that replaces file
descriptors with ones that return an error on reads and writes, and
which does the right thing with mmap's[1], etc.

So it's really more of a filesystem force-umount method.  I could
imagine that this could also be used to extend the functionality of
umount(2) so that the MNT_FORCE flag could be used with non-NFS file
systems as well as NFS file systems.

						- Ted

[1] Interesting question: do we convert an mmap region to an anonymous
region and perhaps notify the user out of band this has happened?  Or
do we just make the mapping disappear and nuke the process with a SEGV
if it attempts to access it?
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 18:27 ` Ted Ts'o @ 2012-01-05 18:36 ` Tejun Heo 2012-01-05 19:28 ` Ted Ts'o 2012-01-05 20:43 ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Eric W. Biederman 2012-01-05 18:38 ` Christoph Hellwig 1 sibling, 2 replies; 25+ messages in thread From: Tejun Heo @ 2012-01-05 18:36 UTC (permalink / raw) To: Ted Ts'o, Alan Stern, Eric W. Biederman, Kernel development list, Greg Kroah-Hartman, Kay Sievers Hello, Ted. On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote: > So it's really more of a filesystem force-umount method. I could > imagine that this could also be used to extend the functionality of > umount(2) so that the MNT_FORCE flag could be used with non-NFS file > systems as well as NFS file systems. I think these are two separate mechanisms. Filesystems need to be able to handle IO errors no matter what and underlying device going away is the same situation. There's no reason to mix that with force unmount. That's a separate feature and whether to force unmount filesystem on device removal or permanent failure is a policy decision which belongs to userland - ie. if such behavior is desired, it should be implemented via udev/udisk instead of hard coded logic in kernel. I don't know enough to decide whether such forced unmount is a useful feature tho. It can be neat for development but is there any real necessity for the feature? > [1] Interesting question: do we convert an mmap region to an anonymous > region and perhaps notify the user out of band this has happened? Or > do we just make the mapping disappear and nuke the process with a SEGV > if it attempts to access it? FWIW, I vote for SIGBUS similarly to the way we handle mmap vs. truncate. Thanks. -- tejun ^ permalink raw reply [flat|nested] 25+ messages in thread
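[Editorial sketch] The existing "mmap vs. truncate" behaviour that the
message above refers to can be observed from userspace.  The following
is a minimal demonstration, assuming a POSIX system with a writable
/tmp; touch_after_truncate() and the temporary file name are made up
for this example.  It maps one page of a file, truncates the file to
zero length, and reports whether touching the mapping afterwards
raised SIGBUS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jmp;

/* SIGBUS handler: jump back to the sigsetjmp() point below. */
static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(bus_jmp, 1);
}

/*
 * Returns 1 if reading the mapping after truncation raised SIGBUS,
 * 0 if the read succeeded, and -1 if the setup failed.
 */
static int touch_after_truncate(void)
{
	char path[] = "/tmp/sigbus-demo-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	volatile char *p;
	int fd, result;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* the file lives on via fd only */

	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}
	p[0] = 'x';			/* fine: the page is backed by the file */

	sa.sa_handler = on_sigbus;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old);

	if (ftruncate(fd, 0) < 0) {	/* the mapping now lies beyond EOF */
		sigaction(SIGBUS, &old, NULL);
		munmap((void *)p, page);
		close(fd);
		return -1;
	}

	if (sigsetjmp(bus_jmp, 1) == 0) {
		char c = p[0];		/* expected to fault */
		(void)c;
		result = 0;
	} else {
		result = 1;		/* reached via the SIGBUS handler */
	}

	sigaction(SIGBUS, &old, NULL);
	munmap((void *)p, page);
	close(fd);
	return result;
}
```

Whether a revoked file system should treat its mappings the same way is
exactly the open question in the thread; the sketch only shows the
truncate case that exists today.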
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 18:36                 ` Tejun Heo
@ 2012-01-05 19:28                   ` Ted Ts'o
  2012-01-05 20:52                     ` Tejun Heo
                                       ` (2 more replies)
  2012-01-05 20:43                   ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Eric W. Biederman
  1 sibling, 3 replies; 25+ messages in thread
From: Ted Ts'o @ 2012-01-05 19:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Stern, Eric W. Biederman, Kernel development list, Greg Kroah-Hartman, Kay Sievers

On Thu, Jan 05, 2012 at 10:36:02AM -0800, Tejun Heo wrote:
> Hello, Ted.
>
> On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote:
> > So it's really more of a filesystem force-umount method.  I could
> > imagine that this could also be used to extend the functionality of
> > umount(2) so that the MNT_FORCE flag could be used with non-NFS file
> > systems as well as NFS file systems.
>
> I think these are two separate mechanisms.  Filesystems need to be
> able to handle IO errors no matter what and underlying device going
> away is the same situation.  There's no reason to mix that with force
> unmount.  That's a separate feature and whether to force unmount
> filesystem on device removal or permanent failure is a policy decision
> which belongs to userland - ie. if such behavior is desired, it should
> be implemented via udev/udisk instead of hard coded logic in kernel.

I think it's needless complexity to loop this into userspace.  If the
block device is gone, it's *gone*.  What else could userspace do with
the information that the block device has disappeared?  Right now,
once gone, it's never coming back.  Even if the luser plugs the USB
device back in, it's going to be coming back as a new block device
node.

So we might as well automatically forcibly unmount the file system at
this point.  I can imagine sending an optional notification that such
a thing has happened, perhaps via a netlink socket, but why not have
the kernel do the right thing automatically?

> I don't know enough to decide whether such forced unmount is a useful
> feature tho.  It can be neat for development but is there any real
> necessity for the feature?

Well, if you want to complicate matters by having this go via a
notification up to userspace, and then have the userspace thoughtfully
consider (after looking up all sorts of complex rules stored in XML
files whose schema is documented nowhere but in the source code) that
the file system should go away because the block device has gone away,
the userspace code will then have to send a forced unmount.

The other use case would be a system administrator who doesn't want to
figure out which random shell is still cd'ed into a directory of a
file system he/she wants to unmount; he/she can still force the umount.
(Other Unix systems have had this feature in the past, and the result
is the same as what happens if you are cd'ed into a directory which is
later rmdir'ed.)  It's an ungraceful way of running things, but
sometimes it's the easiest way to go.

						- Ted
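[Editorial sketch] The comparison in the message above, a process whose
current directory is later rmdir'ed, is easy to try.  The program below
is a small demonstration, assuming Linux and a writable /tmp;
cwd_after_rmdir() is a name invented for this example.  It changes into
a fresh directory, removes that directory, and returns the errno that
getcwd() then reports.  The process itself keeps running.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns the errno reported by getcwd() after the current directory
 * was removed, 0 if getcwd() unexpectedly succeeded, -1 if setup failed.
 */
static int cwd_after_rmdir(void)
{
	char tmpl[] = "/tmp/cwd-demo-XXXXXX";
	char buf[PATH_MAX];
	int saved, err;

	/* Remember where we were so the caller's cwd can be restored. */
	saved = open(".", O_RDONLY | O_DIRECTORY);
	if (saved < 0)
		return -1;

	if (!mkdtemp(tmpl)) {
		close(saved);
		return -1;
	}
	if (chdir(tmpl) < 0 || rmdir(tmpl) < 0) {
		if (fchdir(saved) < 0)
			err = -1;	/* nothing more we can do here */
		rmdir(tmpl);
		close(saved);
		return -1;
	}

	/* We are now "cd'ed into" a directory that no longer exists. */
	errno = 0;
	err = getcwd(buf, sizeof(buf)) ? 0 : errno;

	if (fchdir(saved) < 0)
		err = -1;
	close(saved);
	return err;
}
```

On Linux, getcwd(3) documents ENOENT for a current working directory
that has been unlinked, which is the "ungraceful but survivable"
outcome the message alludes to.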
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 19:28                   ` Ted Ts'o
@ 2012-01-05 20:52                     ` Tejun Heo
  2012-01-06  6:25                     ` Alexander E. Patrakov
  2012-01-07 21:01                     ` Revoking filesystems [was Re: Sysfs attributes racing withunregistration] Milton Miller
  2 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-01-05 20:52 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Alan Stern, Eric W. Biederman, Kernel development list, Greg Kroah-Hartman, Kay Sievers

Hello,

On Thu, Jan 05, 2012 at 02:28:22PM -0500, Ted Ts'o wrote:
> > I think these are two separate mechanisms.  Filesystems need to be
> > able to handle IO errors no matter what and underlying device going
> > away is the same situation.  There's no reason to mix that with force
> > unmount.  That's a separate feature and whether to force unmount
> > filesystem on device removal or permanent failure is a policy decision
> > which belongs to userland - ie. if such behavior is desired, it should
> > be implemented via udev/udisk instead of hard coded logic in kernel.
>
> I think it's needless complexity to loop this into userspace.  If the
> block device is gone, it's *gone*.  What else could userspace do with
> this information that block device has disappeared?  Right now, once
> gone, it's never coming back.  Even if the luser plugs the USB device
> back in, it's going to be coming back as a new block device node.
>
> So we might as well automatically forcibly unmount the file system at
> this point.  I can imagine sending an optional notification that such
> a thing has happened, perhaps via a netlink socket, but why not have
> the kernel do the right thing automatically?

* If this was the one method to deal with hotunplug, sure, but it's
  not.  We already have (supposedly) working failure mode for hot
  device removal.

* Any modern linux distro already has all the infrastructure to handle
  this.  You can't handle hotplug without userland provided policies
  and the same mechanism is used for hotunplugging too, *today*.  If
  force umount is decided to be the action to take on block device
  removal, that would be several line changes in userland.  Userland
  is already responsible for taking actions for those events.

* Such automation might look like a good idea now but we really don't
  know how it would end up in the longer run or for different use case
  scenarios.  I think a good example of this is the cdrom driver.  It
  implements tons of automatic behaviors, and then had to be augmented
  with ioctls to turn them on and off as they no longer fit new
  hardware, new userland behavior and changing user expectations.

So, regardless of whether adding revoking is a good idea or not, I
believe that force umount should be a separate thing from internal
block error handling.

> The other use case would be a system administrator who doesn't want to
> figure out which random shell is still cd'ed into a directory of a
> file system he/she wants to unmount, he can still force the umount.
> (Other Unix systems have had this feature in the past, and the result
> is the same as what happens if you are cd'ed into a directory which is
> later rmdir'ed.)  It's an ungraceful way of running things, but
> sometimes it's the easist way to go.

More importantly, I can't really see valid use cases other than
scenarios like the above for using revocation for usual hot unplug.
For most users, it wouldn't matter one way or the other.  It's not
like sync + lazy umount can't achieve (note that all the desktop stuff
knows about "filesystem is going away" and will gracefully step aside)
most of forced umount anyway.  It could be nice to cli aficionados or
grumpy admins but for the vast majority of userbase, it just wouldn't
matter.  Given that, I'm not convinced this is a worthwhile thing to
have.

Thanks.

--
tejun
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 19:28 ` Ted Ts'o 2012-01-05 20:52 ` Tejun Heo @ 2012-01-06 6:25 ` Alexander E. Patrakov 2012-01-07 21:01 ` Revoking filesystems [was Re: Sysfs attributes racing withunregistration] Milton Miller 2 siblings, 0 replies; 25+ messages in thread From: Alexander E. Patrakov @ 2012-01-06 6:25 UTC (permalink / raw) To: linux-kernel Ted Ts'o <tytso@mit.edu> wrote: > On Thu, Jan 05, 2012 at 10:36:02AM -0800, Tejun Heo wrote: > > Hello, Ted. > > > > On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote: > > > So it's really more of a filesystem force-umount method. I could > > > imagine that this could also be used to extend the functionality > > > of umount(2) so that the MNT_FORCE flag could be used with > > > non-NFS file systems as well as NFS file systems. > > > > I think these are two separate mechanisms. Filesystems need to be > > able to handle IO errors no matter what and underlying device going > > away is the same situation. There's no reason to mix that with > > force unmount. That's a separate feature and whether to force > > unmount filesystem on device removal or permanent failure is a > > policy decision which belongs to userland - ie. if such behavior is > > desired, it should be implemented via udev/udisk instead of hard > > coded logic in kernel. > > I think it's needless complexity to loop this into userspace. If the > block device is gone, it's *gone*. What else could userspace do with > this information that block device has disappeared? Right now, once > gone, it's never coming back. Even if the luser plugs the USB device > back in, it's going to be coming back as a new block device node. > > So we might as well automatically forcibly unmount the file system at > this point. I can imagine sending an optional notification that such > a thing has happened, perhaps via a netlink socket, but why not have > the kernel do the right thing automatically? 
+1, but with a different motivation. It just has to be done in the kernel, because the userspace does not have all the needed information to do it properly. Here are some testcases to think of, but, honestly, I have tested only the first one and consider that it is sufficient to prove my point. Testcase 1: lazy unmount in progress. Plug in your USB flash drive, mount it (or let it be automounted, say, in /media/DEVICE), open two shells. In the first one, cd /media/DEVICE, and, after that, in the second one, umount -l /media/DEVICE. Now look at /proc/mounts in the second shell - there is no trace of your flash drive, so how would your userspace guess that /media/DEVICE has to be force-unmounted if you unplug the device now? Testcase 2: mount namespaces. Same issue - are you going to traverse all of /proc/???/mounts files, unscalably? Testcase 3 (unsure): a filesystem bind-mounted several times on different directories. What is the correct order of unmounting? OTOH, I won't be surprised if anyone finds a case that clearly shows that it cannot be done correctly in the kernel, either (and actually want you to think about it). In that case, we are screwed :( Here are some ideas for someone else to investigate if they are a problem: 1) Strange DM mappings on top of the device (LUKS?) 2) Something else mounted in /media/DEVICE/somedir - what to do with it? -- Alexander E. Patrakov ^ permalink raw reply [flat|nested] 25+ messages in thread
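To make Testcase 1 concrete, here is a minimal sketch of the lookup a userspace policy agent would have to do: scan /proc/mounts-style text for every mount point of a given device. The helper names and the sample text are made up for illustration; the only thing assumed about the real file is its fstab-like layout (device, mount point, type, options, ...) with octal escapes such as \040 for a space.

```python
import re
from typing import List


def unescape_mount_field(field: str) -> str:
    """Decode octal escapes (e.g. \\040 for a space) used in /proc/mounts fields."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mountpoints_for_device(mounts_text: str, device: str) -> List[str]:
    """Return every mount point listed for `device` in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[0] == device:
            found.append(unescape_mount_field(fields[1]))
    return found


# Hypothetical sample, shaped like /proc/mounts before the lazy unmount.
SAMPLE = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /media/DEVICE vfat rw,nosuid,nodev 0 0\n"
    "/dev/sdb1 /mnt/my\\040usb vfat rw 0 0\n"
)
```

After `umount -l` the /dev/sdb1 lines are simply absent, so this lookup returns an empty list even though a shell still sits inside the filesystem. The same scan would also have to be repeated per mount namespace, which is the scalability concern raised in Testcase 2.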
* Re: Revoking filesystems [was Re: Sysfs attributes racing withunregistration] 2012-01-05 19:28 ` Ted Ts'o 2012-01-05 20:52 ` Tejun Heo 2012-01-06 6:25 ` Alexander E. Patrakov @ 2012-01-07 21:01 ` Milton Miller 2 siblings, 0 replies; 25+ messages in thread From: Milton Miller @ 2012-01-07 21:01 UTC (permalink / raw) To: Ted Ts'o Cc: linux-kernel, Eric W. Biederman, Tejun Heo, Alexander E. Patrakov [resending with better headers] On Thu Jan 05 2012 about 14:28:28 EST, Ted Ts'o wrote: > On Thu, Jan 05, 2012 at 10:36:02AM -0800, Tejun Heo wrote: > > Hello, Ted. > > > > On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote: > > > So it's really more of a filesystem force-umount method. I could > > > imagine that this could also be used to extend the functionality of > > > umount(2) so that the MNT_FORCE flag could be used with non-NFS file > > > systems as well as NFS file systems. > > > > I think these are two separate mechanisms. Filesystems need to be > > able to handle IO errors no matter what and underlying device going > > away is the same situation. There's no reason to mix that with force > > unmount. That's a separate feature and whether to force unmount > > filesystem on device removal or permanent failure is a policy decision > > which belongs to userland - ie. if such behavior is desired, it should > > be implemented via udev/udisk instead of hard coded logic in kernel. > > I think it's needless complexity to loop this into userspace. If the > block device is gone, it's *gone*. What else could userspace do with > this information that block device has disappeared? Right now, once > gone, it's never coming back. Even if the luser plugs the USB device > back in, it's going to be coming back as a new block device node. While user space has lost the ability to read that fs, there is a lot that can continue to work, especially if the system is not under memory pressure.
First of all, what if the process I really care about is in a chroot on another file system that was mounted under the failed filesystem? I don't want the kernel killing my job and leaving a partial file on some other file system just because some other disk went offline. Second, as long as the file is cached in memory, I might be able to use that cached busybox to shut down my system or mount the USB drive after it comes back at a new location, as long as there isn't memory pressure. > > So we might as well automatically forcibly unmount the file system at > this point. I can imagine sending an optional notification that such > a thing has happened, perhaps via a netlink socket, but why not have > the kernel do the right thing automatically? > > > I don't know enough to decide whether such forced unmount is a useful > > feature tho. It can be neat for development but is there any real > > necessity for the feature? > > Well, if you want to complicate matters by having this go via a > notification up to userspace, and then have the userspace thoughtfully > consider (after looking up all sorts of complex rules stored in XML > files whose schema is documented nowhere but in the source code) that > the file system should go away because the block device has gone away, > the userspace code will then have to send a forced unmount. > > The other use case would be a system administrator who doesn't want to > figure out which random shell is still cd'ed into a directory of a > file system he/she wants to unmount, he can still force the umount. > (Other Unix systems have had this feature in the past, and the result > is the same as what happens if you are cd'ed into a directory which is > later rmdir'ed.) It's an ungraceful way of running things, but > sometimes it's the easiest way to go. > > - Ted I can see something like this as an assist to userspace, but don't forget that mounts are a tree. milton ^ permalink raw reply [flat|nested] 25+ messages in thread
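As a small illustration of the "mounts are a tree" point, the sketch below orders the mount points at or below a failed filesystem so that nested mounts come before their parents. The function name and the example paths are hypothetical, and it deliberately ignores bind mounts and several mounts stacked on the same path, which need more than a path comparison.

```python
import posixpath


def unmount_order(mountpoints, root):
    """Mount points at or below `root`, deepest first (children before parents)."""
    root = posixpath.normpath(root)
    prefix = root.rstrip("/") + "/"
    normalized = [posixpath.normpath(p) for p in mountpoints]
    below = [p for p in normalized if p == root or p.startswith(prefix)]
    # A deeper path has more separators, so sort on that in descending order.
    return sorted(below, key=lambda p: p.count("/"), reverse=True)


# Hypothetical mount table: a chroot job lives on another filesystem
# mounted underneath the USB drive's mount point.
MOUNTS = [
    "/",
    "/home",
    "/media/DEVICE",
    "/media/DEVICE/somedir",
    "/media/DEVICE/somedir/chroot/proc",
]
```

With this table, anything that tears down /media/DEVICE first has to decide what happens to the two mounts below it, which live on devices that have not failed.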
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 18:36 ` Tejun Heo 2012-01-05 19:28 ` Ted Ts'o @ 2012-01-05 20:43 ` Eric W. Biederman 2012-01-05 20:55 ` Tejun Heo 1 sibling, 1 reply; 25+ messages in thread From: Eric W. Biederman @ 2012-01-05 20:43 UTC (permalink / raw) To: Tejun Heo Cc: Ted Ts'o, Alan Stern, Kernel development list, Greg Kroah-Hartman, Kay Sievers Tejun Heo <tj@kernel.org> writes: > Hello, Ted. > > On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote: >> So it's really more of a filesystem force-umount method. I could >> imagine that this could also be used to extend the functionality of >> umount(2) so that the MNT_FORCE flag could be used with non-NFS file >> systems as well as NFS file systems. > > I think these are two separate mechanisms. Filesystems need to be > able to handle IO errors no matter what and underlying device going > away is the same situation. There's no reason to mix that with force > unmount. That's a separate feature and whether to force unmount > filesystem on device removal or permanent failure is a policy decision > which belongs to userland - ie. if such behavior is desired, it should > be implemented via udev/udisk instead of hard coded logic in kernel. > > I don't know enough to decide whether such forced unmount is a useful > feature tho. It can be neat for development but is there any real > necessity for the feature? > >> [1] Interesting question: do we convert an mmap region to an anonymous >> region and perhaps notify the user out of band this has happened? Or >> do we just make the mapping disappear and nuke the process with a SEGV >> if it attempts to access it? > > FWIW, I vote for SIGBUS similarly to the way we handle mmap > vs. truncate. Agreed. SIGBUS is documented as the mapping exists but the backing store has gone away, which seems to describe hotunplug very well. Additionally we already do this for sysfs and it works well. 
So it appears that on a hotunplug it is desirable to wake all poll waiters of a filesystem, invalidate all mmaps, and probably notify all inotify watchers. And in general, scream to userspace that the filesystem is gone and should be left alone. That does require a notification from the block device going away to the filesystem. Tejun, is there an existing mechanism that we can plug into or do we need to implement something new? Ted, we can scream that the filesystem is going away without freeing all of the filesystem data structures. To userspace there would effectively be no difference, but internal to the kernel it should allow us to skip the expensive logic of tracking every time a filesystem method is invoked, so we do not penalize the fast path. If I don't have to provide a zero cost ability to track which filesystem methods are active at any given time, I think I can whip up something that is usable in a couple of days. Eric ^ permalink raw reply [flat|nested] 25+ messages in thread
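The existing mmap vs. truncate behaviour mentioned above can be observed from userspace. The following is a rough sketch rather than a kernel test: it runs a child interpreter that maps one page of a temporary file, truncates the file to zero and then touches the mapping. On Linux the child is expected to be killed by SIGBUS; the helper name `run_child` and the use of a temporary file are choices made for this example only.

```python
import signal
import subprocess
import sys
import textwrap

# Program run in a child process, because the final access kills the process.
CHILD = textwrap.dedent("""
    import mmap, os, tempfile
    fd, path = tempfile.mkstemp()
    os.unlink(path)
    os.ftruncate(fd, mmap.PAGESIZE)   # give the file one page of backing store
    m = mmap.mmap(fd, mmap.PAGESIZE)  # shared, file-backed mapping
    m[0] = 1                          # fine: the page is backed by the file
    os.ftruncate(fd, 0)               # the backing store goes away
    m[0]                              # the mapping still exists; this access faults
""")


def run_child() -> int:
    """Run the child program and return its exit status (negative means a signal)."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode
```

A return value of `-signal.SIGBUS` means the child died on the access after the truncate, which is the behaviour proposed here for mappings of a filesystem whose device has been unplugged.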
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 20:43 ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Eric W. Biederman @ 2012-01-05 20:55 ` Tejun Heo 0 siblings, 0 replies; 25+ messages in thread From: Tejun Heo @ 2012-01-05 20:55 UTC (permalink / raw) To: Eric W. Biederman Cc: Ted Ts'o, Alan Stern, Kernel development list, Greg Kroah-Hartman, Kay Sievers On Thu, Jan 05, 2012 at 12:43:11PM -0800, Eric W. Biederman wrote: > That does require a notification from the block device going away > to the filesystem. Tejun is there an existing mechanism that we > can plug into or do we need to implement something new? Of course, udev. Thanks. -- tejun ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 18:27 ` Ted Ts'o 2012-01-05 18:36 ` Tejun Heo @ 2012-01-05 18:38 ` Christoph Hellwig 1 sibling, 0 replies; 25+ messages in thread From: Christoph Hellwig @ 2012-01-05 18:38 UTC (permalink / raw) To: Ted Ts'o, Alan Stern, Eric W. Biederman, Tejun Heo, Kernel development list, Greg Kroah-Hartman, Kay Sievers On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote: > Ideally, we should do both. The block device should call a > notification function (probably run out of a workqueue context, to > avoid locking issues) which tells the file system, "the block device > is _gone_ and isn't coming back". Any attempts to read or write to > the block device should return errors, since there may be writeback > happening in the background while the file system is shutting down > the file system mount. Once the file system is done, it can call a > function which tells the block device layer that it's OK to release > the block device and its related structures. FYI: we have all the functionality for that available in XFS and would just need to wire it up. It's also triggered if we get a write I/O error for metadata (typically the log), so with a minimal delay we actually provide that behaviour already. > In order for the file system to shut down the file system cleanly, it > will need to access VFS-level revoke functionality that replaces file > descriptors with ones that return an error on reads and writes, and > which do the right thing with mmaps[1], etc. And that part is close to impossible to get right. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 15:13 ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Alan Stern 2012-01-05 15:32 ` Tejun Heo @ 2012-01-05 15:52 ` Eric W. Biederman 2013-01-14 15:11 ` watchdog code anish kumar 2012-01-05 18:18 ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Greg KH 2 siblings, 1 reply; 25+ messages in thread From: Eric W. Biederman @ 2012-01-05 15:52 UTC (permalink / raw) To: Alan Stern Cc: Theodore Ts'o, Tejun Heo, Kernel development list, Greg Kroah-Hartman, Kay Sievers Alan Stern <stern@rowland.harvard.edu> writes: > On Wed, 4 Jan 2012, Eric W. Biederman wrote: > >> > Ted Ts'o has been talking about something similar but not the same -- a >> > way to revoke an entire filesystem. For example, see commit >> > 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific >> > kludge to avoid an oops after the disk disappears). >> > >> > The use case for that is obvious and widespread: Somebody yanks out a >> > USB drive without unmounting it first. >> >> Agreed. The best I have at the moment is a library that can wrap >> filesystem methods to implement the hotplug bits. >> >> Do you know how hard it is to get the remove event up to the filesystem that >> sits on top of a block device? > > I don't have a clear idea of what's involved (in particular, how to go > from a block_device structure to a mounted filesystem). But the place > to do it would probably be block/genhd.c:invalidate_partition(). Ted > can tell you if there's a better alternative. Interesting. That sounds like a good place to look. Thanks. >> Do you know how hard it is to detect at mount time if a block device >> might be hot-pluggable? We can always use a mount option here and >> make userspace figure it out, but being able to have a good default would >> be nice. > > I don't think it's possible to tell if a device is hot-unpluggable.
> For example, the device itself might not be removable from its parent, > but the parent might be hot-unpluggable. You'll probably have to > assume that every device can potentially be unplugged, one way or > another. > > Also, even devices that aren't hot-unpluggable can fail. The end > result should be pretty much the same. True, and ultimately I agree with you. Unfortunately, solving the full general case right now looks like perfection being the enemy of the good. Once the requirement becomes adding the ability to tear down the data structures and remove the modules, we have to track when we are in a filesystem method and add the ability to wait until we are no longer in any filesystem method. That tracking is hard to make free. So implementing it for everyone out of the gate is a hard sell. However, if we can pick those cases where we care more about doing the right thing on hot-unplug than we care about performance, we should be able to go forward with a good enough method now. But since there are performance implications for very common path system calls, it makes sense to make this, for the first pass, something like mount -o sync. Something that you can opt into when it makes sense, but that you don't have to opt into. So the practical options that I see are that we either autodetect block devices that are set up to be hotpluggable or require a mount option. >> If it isn't too hard to get the event up from the block device to the >> filesystem when the block device is unceremoniously removed I might just >> make the time to have hotunplug trigger a filesystem wide revoke on a >> filesystem like ext4. >> >> In addition to sysfs we need the same logic in proc, sysctl, and uio. >> So it makes sense to move towards a common library that can do all of >> the hard bits. > > Ted mentioned the need for a new "device removed" superblock method. > Then each filesystem can add its own implementation as people get > around to it. Yeah.
If we can get the "device removed" aka "revokefs" superblock it isn't too hard to build a library that filesystems can use to wrap their normal filesystem methods and implement revokefs. Eric ^ permalink raw reply [flat|nested] 25+ messages in thread
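The wrapping-library idea can be pictured with a toy userspace model. Everything below is hypothetical (the `RevocableFS` base class, the `revocable` decorator and `ToyFS` are invented for this sketch) and it only models the cheap part: once the "device removed" notification arrives, wrapped methods fail with EIO while the data structures stay in place. It does not track or wait for calls that are already in progress, which is the part described above as hard to make free.

```python
import errno
import functools


class RevocableFS:
    """Hypothetical base class for filesystems that opt into revocation."""

    def __init__(self):
        self.revoked = False

    def revokefs(self):
        # Stand-in for the "device removed" notification: flag the filesystem
        # as gone, but leave its data structures allocated.
        self.revoked = True


def revocable(method):
    """Wrap a filesystem method so that it fails with EIO after revokefs()."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.revoked:
            raise OSError(errno.EIO, "filesystem revoked")
        return method(self, *args, **kwargs)

    return wrapper


class ToyFS(RevocableFS):
    """Toy filesystem with a single wrapped method."""

    def __init__(self):
        super().__init__()
        self.files = {"hello": b"world"}

    @revocable
    def read(self, name):
        return self.files[name]
```

A filesystem opts in per method by adding the decorator, which mirrors the suggestion that this be something a filesystem or mount can choose rather than a cost imposed on every fast path.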
* watchdog code 2012-01-05 15:52 ` Eric W. Biederman @ 2013-01-14 15:11 ` anish kumar 0 siblings, 0 replies; 25+ messages in thread From: anish kumar @ 2013-01-14 15:11 UTC (permalink / raw) To: johlstei; +Cc: Kernel development list From your comments in this thread https://lkml.org/lkml/2011/3/25/723: > The msm watchdog driver is present in kernel only. It does not use the > built-in Linux watchdog api. This is because the primary function of > our watchdog is detecting bus lockups and interrupts being turned off Doesn't the original Linux implementation (kernel/watchdog.c) already cover this? If not, how does this implementation detect bus lockups and interrupts being turned off for a long time? Can this piece of code co-exist with the soft/hard lockup detection in the core kernel? > for long periods of time. We wanted this functionality to be present > regardless of the userspace the kernel is running beneath. Userspace is > free to have its own watchdog implemented in software. What does this mean? Can you elaborate? > Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org> In my personal opinion, we should always acknowledge the code by which this code was inspired :) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration] 2012-01-05 15:13 ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Alan Stern 2012-01-05 15:32 ` Tejun Heo 2012-01-05 15:52 ` Eric W. Biederman @ 2012-01-05 18:18 ` Greg KH 2 siblings, 0 replies; 25+ messages in thread From: Greg KH @ 2012-01-05 18:18 UTC (permalink / raw) To: Alan Stern Cc: Eric W. Biederman, Theodore Ts'o, Tejun Heo, Kernel development list, Kay Sievers On Thu, Jan 05, 2012 at 10:13:31AM -0500, Alan Stern wrote: > On Wed, 4 Jan 2012, Eric W. Biederman wrote: > > > > Ted Ts'o has been talking about something similar but not the same -- a > > > way to revoke an entire filesystem. For example, see commit > > > 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific > > > kludge to avoid an oops after the disk disappears). > > > > > > The use case for that is obvious and widespread: Somebody yanks out a > > > USB drive without unmounting it first. > > > > Agreed. The best I have at the moment is a library that can wrap > > filesystem methods to implement the hotplug bits. > > > > Do you know how hard it is to get the remove event up to the filesystem that > > sits on top of a block device? > > I don't have a clear idea of what's involved (in particular, how to go > from a block_device structure to a mounted filesystem). But the place > to do it would probably be block/genhd.c:invalidate_partition(). Ted > can tell you if there's a better alternative. > > > Do you know how hard it is to detect at mount time if a block device > > might be hot-pluggable? We can always use a mount option here and > > make userspace figure it out, but being able to have a good default would > > be nice. > > I don't think it's possible to tell if a device is hot-unpluggable. > For example, the device itself might not be removable from its parent, > but the parent might be hot-unpluggable.
You'll probably have to > assume that every device can potentially be unplugged, one way or > another. These days, _any_ block device is hot-unpluggable, what with PCI hotplug and the like (running in a virtual machine, etc.). So you always need to assume that any device can go away at any point in time. thanks, greg k-h ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Sysfs attributes racing with unregistration 2012-01-04 17:18 ` Tejun Heo 2012-01-04 18:13 ` Eric W. Biederman @ 2012-01-04 18:13 ` Alan Stern 2012-01-04 18:20 ` Tejun Heo 1 sibling, 1 reply; 25+ messages in thread From: Alan Stern @ 2012-01-04 18:13 UTC (permalink / raw) To: Tejun Heo Cc: Kernel development list, Eric Biederman, Greg Kroah-Hartman, Kay Sievers On Wed, 4 Jan 2012, Tejun Heo wrote: > Hello, Alan. > > On Wed, Jan 04, 2012 at 11:52:20AM -0500, Alan Stern wrote: > > Can you explain the current situation regarding access to sysfs > > attributes and possible races with kobject removal? I have two > > questions in particular: > > Heh, I haven't looked at sysfs code seriously for years now and my > memory sucks to begin with, so please take whatever I say with a > gigantic grain of salt. Eric has been looking at sysfs a lot lately > so he probably can answer these best. Adding him, Greg and Kay - hi! > guys. > > > What happens if one thread calls an attribute's show or > > store method concurrently with another thread unregistering > > the underlying kobject? > > sysfs nodes have two reference counts - one for object lifespan and > the other for active usage. The latter is called active and acquired > and released using sysfs_get/put_active(). Any callback invocation > should be performed while holding an active reference. On removal, > sysfs_deactivate() marks the active reference count for deactivation > so that no new active reference is given out and waits for the > in-flight ones to drain. IOW, removal makes sure new invocations of > callbacks fail and waits for in-progress ones to finish before > proceeding with removal. > > > What happens if a thread continues to hold an open fd > > reference to a sysfs attribute file after the kobject is > > unregistered, and then tries to read or write that fd? > > Active reference is held only for the duration of each callback > invocation. Userland can't prolong the existence of active reference. 
> The duration of callback execution is the only deciding factor. > > Someone (I think Eric, right?) was trying to generalize the semantics > to vfs layer so that severance/revocation capability is generally > available. IIRC, it didn't get through tho. That's great; it's just what I wanted to know. Thanks. Now, looking through the code, I wonder why sysfs_{get,put}_active() and sysfs_deactivate() don't use a real rwsem. Why go to all the effort of imitating one? Is it just to save space? Alan Stern ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Sysfs attributes racing with unregistration 2012-01-04 18:13 ` Sysfs attributes racing with unregistration Alan Stern @ 2012-01-04 18:20 ` Tejun Heo 0 siblings, 0 replies; 25+ messages in thread From: Tejun Heo @ 2012-01-04 18:20 UTC (permalink / raw) To: Alan Stern Cc: Kernel development list, Eric Biederman, Greg Kroah-Hartman, Kay Sievers Hello, On Wed, Jan 04, 2012 at 01:13:41PM -0500, Alan Stern wrote: > Now, looking through the code, I wonder why sysfs_{get,put}_active() > and sysfs_deactivate() don't use a real rwsem. Why go to all the > effort of imitating one? Is it just to save space? Hmmm... maybe there was something which prevented that or maybe I was just being stupid. I don't really remember. Space is a fairly important consideration too. Depending on configuration, there can be a LOT of sysfs_dirents and memory consumption from sysfs has been a real problem. Thanks. -- tejun ^ permalink raw reply [flat|nested] 25+ messages in thread
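The active-reference scheme discussed in this subthread can be modelled in userspace as follows. This is only a sketch of the semantics described in the thread (no new active references after deactivation, and removal waits for in-flight callbacks to drain); the class and method names are invented, and it uses a lock plus condition variable where the kernel code is described as imitating an rwsem with a more compact representation.

```python
import threading


class ActiveRef:
    """Toy model of a per-node active reference count."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0           # callbacks currently in flight
        self._deactivated = False  # set once removal has started

    def get_active(self) -> bool:
        """Take an active reference; fails once removal has started."""
        with self._cond:
            if self._deactivated:
                return False
            self._active += 1
            return True

    def put_active(self) -> None:
        """Drop an active reference and wake a waiting remover if it was the last."""
        with self._cond:
            self._active -= 1
            if self._active == 0:
                self._cond.notify_all()

    def deactivate(self) -> None:
        """Refuse new active references, then wait for in-flight ones to drain."""
        with self._cond:
            self._deactivated = True
            while self._active:
                self._cond.wait()


def demo():
    """One callback is in flight while another thread removes the node."""
    ref = ActiveRef()
    events = []
    assert ref.get_active()  # a show/store callback begins

    def remover():
        ref.deactivate()     # blocks until the callback drops its reference
        events.append("removed")

    thread = threading.Thread(target=remover)
    thread.start()
    # Spin until removal has started; from then on new callers are refused.
    while ref.get_active():
        ref.put_active()
    events.append("callback done")
    ref.put_active()         # the in-flight callback finishes
    thread.join()
    return events
```

In this model an open file descriptor holds no active reference between calls, so a read or write issued after removal simply fails at `get_active()`, matching the answer given to the second question in the thread.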