* [PATCH] devfs v196 available @ 2001-11-03 17:47 Richard Gooch 2001-11-16 22:56 ` Roman Zippel 0 siblings, 1 reply; 9+ messages in thread From: Richard Gooch @ 2001-11-03 17:47 UTC (permalink / raw) To: linux-kernel, devfs-announce-list Hi, all. Version 196 of my devfs patch is now available from: http://www.atnf.csiro.au/~rgooch/linux/kernel-patches.html The devfs FAQ is also available here. Patch directly available from: ftp://ftp.??.kernel.org/pub/linux/kernel/people/rgooch/v2.4/devfs-patch-current.gz AND: ftp://ftp.atnf.csiro.au/pub/people/rgooch/linux/kernel-patches/v2.4/devfs-patch-current.gz This is against 2.4.14-pre7. Highlights of this release: - Fixed race in <devfsd_ioctl> when setting event mask Thanks to Kari Hurtta <hurtta@leija.mh.fmi.fi> - Avoid deadlock in <devfs_follow_link> by using temporary buffer Regards, Richard.... Permanent: rgooch@atnf.csiro.au Current: rgooch@ras.ucalgary.ca ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] devfs v196 available 2001-11-03 17:47 [PATCH] devfs v196 available Richard Gooch @ 2001-11-16 22:56 ` Roman Zippel 2001-11-17 21:16 ` Richard Gooch 2001-11-18 17:33 ` Tony Hoyle 0 siblings, 2 replies; 9+ messages in thread From: Roman Zippel @ 2001-11-16 22:56 UTC (permalink / raw) To: Richard Gooch; +Cc: linux-kernel Hi, On Sat, 3 Nov 2001, Richard Gooch wrote: > Hi, all. Version 196 of my devfs patch is now available from: Some small comments: - You should consider to integrate devfs into the dcache (check e.g. ramfs), currently you duplicate lots of infrastructure, which you get for free (and faster) from the dcache. It would save you also lots of locking. - you should delay the path generation in try_modload to the read function, then you're not limited here in the pathname length and don't have to abuse the stack. - you should use "%.*s" if you want to print a dentry name (no need to copy it). - you should do something about the recursive calls, it's an invitation to abuse them. - symlink/slave handling of tapes/disk/cdroms is maybe better done in userspace. - uid/gid in devfsd_buf_entry is 16 bit - devfs is not a database! E.g. devfs has no business to store the char/block device ops table. So devfs_get_ops is wrong here, the same for devfs_[gs]et_info. devfs has to stay optional and storing/retrieving certain device data should not be done in different ways, this is only asking for trouble because of subtle differences. The problem so far is that we have no [bc]dev_t, where this info should be stored, but this hopefully changes early 2.5. So the path to get e.g. to the ops table should be "kdev_t -> [bc]dev_t -> ops" and not "kdev_t -> search whole devfs tree -> no ops" (because you missed manual dev nodes). - the simple event mechanism looks prone to DOS attacks (even if all races are gone). Events are too easily delayed or even dropped. This makes the events unreliable and unsuitable for any serious use. - in _devfs_make_parent_for_leaf you shouldn't simply return if _devfs_append_entry fails, because someone else might have created the directory since _devfs_descend. Especially the first point is very important, this was even suggested by Linus already more than 3 years ago. devfs could be a very thin layer, but right now it's far bigger than it had to be. bye, Roman ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] devfs v196 available 2001-11-16 22:56 ` Roman Zippel @ 2001-11-17 21:16 ` Richard Gooch 2001-11-18 12:00 ` Xavier Bestel 2001-11-18 16:00 ` Roman Zippel 2001-11-18 17:33 ` Tony Hoyle 1 sibling, 2 replies; 9+ messages in thread From: Richard Gooch @ 2001-11-17 21:16 UTC (permalink / raw) To: Roman Zippel; +Cc: linux-kernel Roman Zippel writes: > On Sat, 3 Nov 2001, Richard Gooch wrote: > > > Hi, all. Version 196 of my devfs patch is now available from: > > Some small comments: > - You should consider to integrate devfs into the dcache (check e.g. > ramfs), currently you duplicate lots of infrastructure, which you get > for free (and faster) from the dcache. It would save you also lots of > locking. That's something that I've already said I'm planning, but that's a 2.5 thing, and I'm waiting for the dcache to be split from the icache. Even if the current, horribly bloated, struct inode is trimmed down by removing that union, it will still be too big for my liking. Hence I want the dcache/icache split. > - you should delay the path generation in try_modload to the read > function, then you're not limited here in the pathname length and don't > have to abuse the stack. Yep. Now that I've got the refcounting, I can do the thing I'm doing with other events (a for loop to increment the refcounts before returning) for the lookup event as well. In fact, it will allow me to clean up the devfsd_read() code as well (I never liked that test for the event type). > - you should use "%.*s" if you want to print a dentry name (no need to > copy it). Yep, good idea. > - you should do something about the recursive calls, it's an > invitation to abuse them. That's why I've tagged them as such. I've tried to keep their stack usage low. Switching to a non-recursive algorithm has it's own problems: it's a lot harder to understand, and it's much harder to get right in the first place. Non-recursive algorithms are more fragile. > - symlink/slave handling of tapes/disk/cdroms is maybe better done > in userspace. No, because you want to be able to mount your root FS using "root=/dev/cdroms/cdrom0", for example. > - uid/gid in devfsd_buf_entry is 16 bit Actually, it's uid_t and gid_t. Just like struct inode uses. When these are increased to 32 bits, devfs will automatically be 32 bits for these as well. > - devfs is not a database! E.g. devfs has no business to store the > char/block device ops table. So devfs_get_ops is wrong here, the > same for devfs_[gs]et_info. devfs has to stay optional and > storing/retrieving certain device data should not be done in > different ways, this is only asking for trouble because of subtle > differences. The problem so far is that we have no [bc]dev_t, > where this info should be stored, but this hopefully changes early > 2.5. So the path to get e.g. to the ops table should be "kdev_t -> > [bc]dev_t -> ops" and not "kdev_t -> search whole devfs tree -> no > ops" (because you missed manual dev nodes). Again, you're talking about 2.5 stuff. If and when we get a decent block device structure, I'll look at using that. It's questionable whether we'll ever get a generic char device structure, since they're all so different. So for char devices, the ops pointer should be kept inside the devfs entry. As for devfs_get_ops(), that has to stay until you can change over all the SGI IA64 drivers, which (ab)use it heavily. API change-> 2.5. > - the simple event mechanism looks prone to DOS attacks (even if all > races are gone). Events are too easily delayed or even > dropped. This makes the events unreliable and unsuitable for any > serious use. I know about the dropped events: people with some ISDN cards get them, as did I when I was testing my sd-many patch. That's been on my list of things to fix, once the locking and refcounting was working. It was planned to be a v1.1 fix:-) > - in _devfs_make_parent_for_leaf you shouldn't simply return if > _devfs_append_entry fails, because someone else might have created > the directory since _devfs_descend. Yes, I know. That too was planned to be a v1.1 improvement. > Especially the first point is very important, this was even > suggested by Linus already more than 3 years ago. devfs could be a > very thin layer, but right now it's far bigger than it had to be. Yeah, on more than one occasion I've stated that I'd love to rip out all the tree management code in devfs. I'm waiting for VFS changes to make that palatable. Don't think for a minute that v1.x (where x is small) is my final vision of devfs. v1.x was supposed to be the race-free version which did it's own tree management (v0.x being the racy, development version prior to v1.0). And v2.0 is supposed to be the lightweight-let-the-dcache-do-the-work version. I hope that 2.5 will see further VFS improvements. Regards, Richard.... Permanent: rgooch@atnf.csiro.au Current: rgooch@ras.ucalgary.ca ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] devfs v196 available 2001-11-17 21:16 ` Richard Gooch @ 2001-11-18 12:00 ` Xavier Bestel 2001-11-18 16:00 ` Roman Zippel 1 sibling, 0 replies; 9+ messages in thread From: Xavier Bestel @ 2001-11-18 12:00 UTC (permalink / raw) To: Richard Gooch; +Cc: Roman Zippel, Linux Kernel Mailing List le sam 17-11-2001 à 22:16, Richard Gooch a écrit : > Roman Zippel writes: > > - symlink/slave handling of tapes/disk/cdroms is maybe better done > > in userspace. > > No, because you want to be able to mount your root FS using > "root=/dev/cdroms/cdrom0", for example. "userspace" should probably be the "extended initrd" stuff we'll have in 2.5. Xav ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] devfs v196 available 2001-11-17 21:16 ` Richard Gooch 2001-11-18 12:00 ` Xavier Bestel @ 2001-11-18 16:00 ` Roman Zippel 1 sibling, 0 replies; 9+ messages in thread From: Roman Zippel @ 2001-11-18 16:00 UTC (permalink / raw) To: Richard Gooch; +Cc: linux-kernel Hi, Richard Gooch wrote: > That's something that I've already said I'm planning, but that's a 2.5 > thing, and I'm waiting for the dcache to be split from the icache. > Even if the current, horribly bloated, struct inode is trimmed down by > removing that union, it will still be too big for my liking. Hence I > want the dcache/icache split. Why not do it the other way around? Such a change would require changes to all filesystems. If devfs would already make better use of existing infrastructure, it would be far easier to take the needs of devfs into account, instead of trying to fit it in later. A nice side effect would be people could help with the conversion. > > - you should do something about the recursive calls, it's an > > invitation to abuse them. > > That's why I've tagged them as such. I've tried to keep their stack > usage low. Switching to a non-recursive algorithm has it's own > problems: it's a lot harder to understand, and it's much harder to get > right in the first place. Non-recursive algorithms are more fragile. _devfs_walk_path: There is already nicely working code in fs/namei.c. unregister: you can easily go back with the parent pointer. find_by_dev: this is just wrong (see below) > > - symlink/slave handling of tapes/disk/cdroms is maybe better done > > in userspace. > > No, because you want to be able to mount your root FS using > "root=/dev/cdroms/cdrom0", for example. That's about the only use of it at boot time and can be achieved easier. > Again, you're talking about 2.5 stuff. If and when we get a decent > block device structure, I'll look at using that. The correct interface in 2.4 is get_blkfops. devfs has to use it like everyone else. > It's questionable > whether we'll ever get a generic char device structure, since they're > all so different. So for char devices, the ops pointer should be kept > inside the devfs entry. No, find_by_dev is just wrong, scanning a whole tree for a specific information is broken. Timing is absolutely unpredictable and prone to abuse. The current implementation can even return wrong information. The correct place to store char device info is fs/char_dev.c. > As for devfs_get_ops(), that has to stay until you can change over all > the SGI IA64 drivers, which (ab)use it heavily. API change-> 2.5. If it depends on devfs, it's simply broken and must be fixed, that has nothing to do with the API. > Yeah, on more than one occasion I've stated that I'd love to rip out > all the tree management code in devfs. I'm waiting for VFS changes to > make that palatable. Do the changes now and you can push the responsibility to VFS, but the last you should do is to maintain bloat. bye, Roman ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] devfs v196 available 2001-11-16 22:56 ` Roman Zippel 2001-11-17 21:16 ` Richard Gooch @ 2001-11-18 17:33 ` Tony Hoyle 2001-11-23 4:06 ` Richard Gooch 1 sibling, 1 reply; 9+ messages in thread From: Tony Hoyle @ 2001-11-18 17:33 UTC (permalink / raw) To: linux-kernel On Fri, 16 Nov 2001 22:59:05 +0000, Roman Zippel wrote: > Hi, > > On Sat, 3 Nov 2001, Richard Gooch wrote: > >> Hi, all. Version 196 of my devfs patch is now available from: > I noticed with the 2.4.15pre kernels it doesn't support my LS-120 floppy - there are no devfs calls in ide-floppy.c. How hard would it be to add these, or is it a 2.5 issue? Tony ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] devfs v196 available 2001-11-18 17:33 ` Tony Hoyle @ 2001-11-23 4:06 ` Richard Gooch 0 siblings, 0 replies; 9+ messages in thread From: Richard Gooch @ 2001-11-23 4:06 UTC (permalink / raw) To: Tony Hoyle; +Cc: linux-kernel Tony Hoyle writes: > On Fri, 16 Nov 2001 22:59:05 +0000, Roman Zippel wrote: > > > Hi, > > > > On Sat, 3 Nov 2001, Richard Gooch wrote: > > > >> Hi, all. Version 196 of my devfs patch is now available from: > > > I noticed with the 2.4.15pre kernels it doesn't support my LS-120 > floppy - there are no devfs calls in ide-floppy.c. How hard would > it be to add these, or is it a 2.5 issue? A kind person is shipping me a (partly working) LS-120 drive. I should finally be able to play with this myself. This issue will get resolved in due course. Regards, Richard.... Permanent: rgooch@atnf.csiro.au Current: rgooch@ras.ucalgary.ca ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <200111201819.fAKIJpp10565@vindaloo.ras.ucalgary.ca>]
* Re: [PATCH] devfs v196 available [not found] <200111201819.fAKIJpp10565@vindaloo.ras.ucalgary.ca> @ 2001-11-21 13:10 ` Roman Zippel 2001-11-21 18:56 ` Richard Gooch 0 siblings, 1 reply; 9+ messages in thread From: Roman Zippel @ 2001-11-21 13:10 UTC (permalink / raw) To: Richard Gooch; +Cc: linux-kernel Hi, On Tue, 20 Nov 2001, Richard Gooch wrote: > Delayed events are harmless, since devfs ensures correct ordering. It's not about ordering, timing is currently unpredictable, anything timing sensitive has to be very careful to touch anything in devfs. > After consideration, I've decided to dynamically grow the event buffer > as required, and free up space as it's not needed. You should use a slab cache and acknowledge events as soon as they are finished. Right now all processes are waiting until the devfsd is completely finished and restarted at the same time. This is currently limited by dropping events, with a dynamic event queue this can become a problem. > Since devfsd has to > wait for a module load to complete, it's not a good idea to block > waiting for free space in the event buffer (potential deadlock). What do you mean? bye, Roman ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] devfs v196 available 2001-11-21 13:10 ` Roman Zippel @ 2001-11-21 18:56 ` Richard Gooch 0 siblings, 0 replies; 9+ messages in thread From: Richard Gooch @ 2001-11-21 18:56 UTC (permalink / raw) To: Roman Zippel; +Cc: linux-kernel Roman Zippel writes: > Hi, > > On Tue, 20 Nov 2001, Richard Gooch wrote: > > > Delayed events are harmless, since devfs ensures correct ordering. > > It's not about ordering, timing is currently unpredictable, anything > timing sensitive has to be very careful to touch anything in devfs. Are you talking about user-space or kernel-space? I assume you mean user-space. And it's only a problem if you have module autoloading enabled, *and* you are attempting a lookup of a non-existant inode. I'd argue that if you have applications which are timing sensitive, you shouldn't be autoloading modules anyway, you should have them statically loaded or built into the kernel. Loading a module remains a heavyweight operation. > > After consideration, I've decided to dynamically grow the event buffer > > as required, and free up space as it's not needed. > > You should use a slab cache and acknowledge events as soon as they > are finished. Right now all processes are waiting until the devfsd > is completely finished and restarted at the same time. This is > currently limited by dropping events, with a dynamic event queue > this can become a problem. I'm concerned about the overhead of allocating an entry. With the current scheme, the common case is that the buffer doesn't overflow, and hence it's usually "good enough". In addition, it's very fast, since it takes few instructions to "allocate" an entry (grab a lock, test and update some pointers). If the slab cache is fast enough on allocations (for the common case), then it has clear advantages, since it makes cleaning up very easy, and hence it's easier to wake up waiters as events are processed. Changing to a wake-per-event-processed scheme will require some careful consideration, though, since the current devfs+devfsd behviour ensures there are few surprises (everyone waits until devfsd has finished, so you know the system is in a "finished" state before providing access). I'm not going to be rushed into changing it until I'm satisfied that the new behaviour is reasonable. This is definately not a v1.1 thing. > > Since devfsd has to > > wait for a module load to complete, it's not a good idea to block > > waiting for free space in the event buffer (potential deadlock). > > What do you mean? One thought I had was to have event generators wait if the queue is full, until devfsd consumes (at least) one event. But that's no good if devfsd is responsible for generating those events (i.e. REGISTER events when devsfd loads a module). Since devsfd can't process new events while waiting for the module to finish loading, in effect devfsd would have to wait on itself: deadlock. I tried to come up with ways around this deadlock, but none of them were pretty, so I abandoned that approach (fortunately before cutting any code:-). So a dynamic buffer (possibly a slab cache) is probably the only reasonable solution. Regards, Richard.... Permanent: rgooch@atnf.csiro.au Current: rgooch@ras.ucalgary.ca ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2001-11-23 4:06 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-11-03 17:47 [PATCH] devfs v196 available Richard Gooch
2001-11-16 22:56 ` Roman Zippel
2001-11-17 21:16 ` Richard Gooch
2001-11-18 12:00 ` Xavier Bestel
2001-11-18 16:00 ` Roman Zippel
2001-11-18 17:33 ` Tony Hoyle
2001-11-23 4:06 ` Richard Gooch
[not found] <200111201819.fAKIJpp10565@vindaloo.ras.ucalgary.ca>
2001-11-21 13:10 ` Roman Zippel
2001-11-21 18:56 ` Richard Gooch
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox