From: Jeff Garzik <jgarzik@mandrakesoft.com>
To: Patrick Mochel <mochelp@infinity.powertie.org>
Cc: linux-kernel@vger.kernel.org, Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFC] New Driver Model for 2.5
Date: Thu, 18 Oct 2001 02:23:36 -0400
Message-ID: <3BCE7568.1DAB9FF0@mandrakesoft.com>
In-Reply-To: <Pine.LNX.4.21.0110171617460.15653-100000@marty.infinity.powertie.org>
Patrick Mochel wrote:
>
> One July afternoon, while hacking on the pm_dev layer for the purpose of
> system-wide power management support, I decided that I was quite tired of
> trying to make this layer look like a tree and feel like a tree, but not
> have any real integration with the actual device drivers.
>
> I had read the accounts of what the goals were for 2.5. And, after some
> conversations with Linus and the (gasp) ACPI guys, I realized that I had a
> good chunk of the infrastructural code written; it was a matter of working
> out a few crucial details and massaging it in nicely.
>
> I have had the chance this week (after moving and vacationing) to update
> the (read: write some) documentation for it. I will not go into details,
> and will let the document speak for itself.
>
> With any luck, this should go into the early stages of 2.5, and allow a
> significant cleanup of many drivers. Such a model will also allow for neat
> tricks like full device power management support, and Plug and Play
> capabilities.
>
> In order to support the new driver model, I have written a small in-memory
> filesystem, called ddfs, to export a unified interface to userland. It is
> mentioned in the doc, and is pretty self-explanatory. More information
> will be available soon.
>
> There is code available for the model and ddfs at:
>
> http://kernel.org/pub/linux/kernel/people/mochel/device/
>
> but there are some fairly large caveats concerning it.
>
> First, I feel comfortable with the device layer code and the ddfs
> code. However, the PCI code is still a work in progress; I am still working
> out some of the finer details concerning it.
>
> Next is the environment under which I developed it all. It was on an ia32
> box, with only PCI support, and using ACPI. The latter didn't have too
> much of an effect on the development, but there are a few items explicitly
> inspired by it.
>
> I am hoping that both the PCI code and the structure in general can be
> further improved based on the input of the driver maintainers.
>
> This model is not final, and may be way off from what most people actually
> want. It has gotten tentative blessing from all those that have seen it,
> though they number but a few. It's definitely not the only solution...
>
> That said, enjoy; and have at it.
>
> -pat
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The (New) Linux Kernel Driver Model
>
> Version 0.01
>
> 17 October 2001
>
> Overview
> ~~~~~~~~
>
> This driver model is a unification of all the disparate driver models
> that are currently in the kernel. It is intended to augment the
> bus-specific drivers for bridges and devices by consolidating a set of data
> and operations into globally accessible data structures.
>
> Current driver models implement some sort of tree-like structure (sometimes
> just a list) for the devices they control. But, there is no linkage between
> the different bus types.
>
> A common data structure can provide this linkage with little overhead: when a
> bus driver discovers a particular device, it can insert it into the global
> tree as well as its local tree. In fact, the local tree becomes just a subset
> of the global tree.
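A minimal userspace sketch of that double insertion, assuming a hand-rolled circular list in place of the kernel's list_head helpers and flattening both "trees" to lists for brevity. The names dev_node, bus_register_device() and global_devices are hypothetical; the point is only that each device carries one link per structure, so the bus-local list is a subset of the global one.

```c
#include <assert.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static int list_count(const struct list_node *head)
{
	int n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Hypothetical device carrying one link for each structure it belongs to. */
struct dev_node {
	struct list_node global_link;	/* position in the global device list */
	struct list_node bus_link;	/* position in its bus's local list */
	const char *bus_id;
};

struct bus {
	struct list_node devices;	/* the bus-local "tree" (here a list) */
};

static struct list_node global_devices;

/* Called by a bus driver on discovery: one insertion into each list. */
static void bus_register_device(struct bus *bus, struct dev_node *dev)
{
	list_add_tail(&dev->global_link, &global_devices);
	list_add_tail(&dev->bus_link, &bus->devices);
}
```

Registering two PCI devices and one device on another bus leaves two entries on the PCI list, one on the other bus's list, and three on the global list.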
>
> Common data fields can also be moved out of the local bus models into the
> global model. Some of the manipulation of these fields can also be
> consolidated. Most likely, the manipulation functions will become a set
> of helper functions, which the bus drivers wrap to add any bus-specific
> handling.
>
> The common device and bridge interface currently reflects the goals of the
> modern PC: namely the ability to do seamless Plug and Play, power management,
> and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) assures
> us that any device in the system may fit any of these criteria.)
>
> In reality, not every bus will be able to support such operations. But, most
> buses will support a majority of those operations, and all future buses will.
> In other words, a bus that doesn't support an operation is the exception,
> instead of the other way around.
>
> Drivers
> ~~~~~~~
>
> The callbacks for bridges and devices are intended to be singular for a
> particular type of bus. For each type of bus that has support compiled into the
> kernel, there should be one statically allocated structure with the
> appropriate callbacks that every device (or bridge) of that type shares.
>
> Each bus layer should implement the callbacks for these drivers. It then
> forwards the calls on to the device-specific callbacks. This means that
> device-specific drivers must still implement callbacks for each operation;
> however, those callbacks are not called from the top-level driver layer.
>
> This does add another layer of indirection for calling one of these functions,
> but there are benefits that are believed to outweigh this slowdown.
>
> First, it prevents device-specific drivers from having to know about the
> global device layer. This speeds up integration time incredibly. It also
> allows drivers to be more portable across kernel versions. Note that the
> former was intentional, the latter is an added bonus.
>
> Second, this added indirection allows the bus to perform any additional logic
> necessary for its child devices. A bus layer may add additional information to
> the call, or translate it into something meaningful for its children.
>
> This could be done in the driver, but if it happens for every object of a
> particular type, it is best done at a higher level.
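A sketch of that forwarding, under assumed names: pci_bus_driver stands for the single statically allocated struct device_driver shared by every PCI device, struct pci_driver is a stand-in for the device-specific callbacks, and only suspend is shown. The bus callback recovers the enclosing struct pci_dev, gets a chance to add bus logic, and then forwards; the device-specific callback never sees struct device.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int u32;

struct device;

/* Generic operations, as in the proposal's struct device_driver (subset). */
struct device_driver {
	int (*suspend)(struct device *dev, u32 state);
};

struct device {
	struct device_driver *driver;
	u32 current_state;
};

struct pci_dev;

/* Hypothetical device-specific callbacks, expressed in PCI terms. */
struct pci_driver {
	int (*suspend)(struct pci_dev *pdev, u32 state);
};

struct pci_dev {
	unsigned int devfn;
	struct pci_driver *drv;	/* device-specific driver, if bound */
	struct device device;	/* embedded at the end, as proposed */
};

#define to_pci_dev(d) \
	((struct pci_dev *)((char *)(d) - offsetof(struct pci_dev, device)))

/* Bus-level callback: the only one the global layer ever calls. */
static int pci_bus_suspend(struct device *dev, u32 state)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	int err = 0;

	/* Bus-specific work (e.g. saving config space) could go here. */
	if (pdev->drv && pdev->drv->suspend)
		err = pdev->drv->suspend(pdev, state);
	if (!err)
		dev->current_state = state;
	return err;
}

/* One statically allocated driver shared by every PCI device. */
static struct device_driver pci_bus_driver = {
	.suspend = pci_bus_suspend,
};

/* Example device-specific driver that knows nothing about struct device. */
static int eepro_suspend_calls;

static int eepro_suspend(struct pci_dev *pdev, u32 state)
{
	(void)pdev;
	(void)state;
	eepro_suspend_calls++;
	return 0;
}

static struct pci_driver eepro_driver = { .suspend = eepro_suspend };
```

The extra indirection is the single function-pointer hop through pci_bus_suspend(); in exchange, the current_state bookkeeping is written once per bus rather than once per driver.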
>
> Recap
> ~~~~~
>
> Instances of devices and bridges are allocated dynamically as the system
> discovers their existence. Their fields describe the individual object.
> Drivers - in the global sense - are statically allocated and singular for a
> particular type of bus. They describe a set of operations that every type of
> bus could implement, the implementation following the bus's semantics.
>
> Downstream Access
> ~~~~~~~~~~~~~~~~~
>
> Common data fields have been moved out of individual bus layers into a common
> data structure. But, these fields must still be accessed by the bus layers,
> and sometimes by the device-specific drivers.
>
> Other bus layers are encouraged to do what has been done for the PCI layer.
> struct pci_dev now looks like this:
>
> struct pci_dev {
> 	...
>
> 	struct device device;
> };
>
> Note first that it is embedded in struct pci_dev rather than allocated
> separately. This means only one allocation on device discovery. Note also
> that it is at the _end_ of struct pci_dev. This is to make people think about
> what they're doing when switching between the bus driver and the global
> driver, and to guard against mindless casts between the two.
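To illustrate why the placement defeats a plain cast, here is a small sketch; the vendor and device_id members are stand-ins for the elided fields, and to_pci_dev() is a hypothetical helper name. Because struct device is not the first member, its address differs from that of the enclosing struct pci_dev, so the conversion must subtract the member's offset explicitly.

```c
#include <assert.h>
#include <stddef.h>

struct device {
	char bus_id[16];
};

struct pci_dev {
	unsigned short vendor, device_id;	/* stand-ins for the elided fields */
	struct device device;			/* embedded, last member */
};

/*
 * (struct pci_dev *)dev would be wrong here: the two addresses differ.
 * Step back from the embedded member to the start of the container.
 */
static struct pci_dev *to_pci_dev(struct device *dev)
{
	return (struct pci_dev *)((char *)dev - offsetof(struct pci_dev, device));
}
```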
>
> The PCI bus layer freely accesses the fields of struct device. It knows about
> the structure of struct pci_dev, and it should know the structure of struct
> device. PCI drivers that have been converted generally do not touch the fields
> of struct device. More precisely, device-specific drivers should not touch
> fields of struct device unless there is a strong, compelling reason to do so.
>
> This abstraction prevents unnecessary pain during transitional phases.
> If a field is renamed or removed, every downstream driver that uses it
> will break. On the other hand, if only the bus layer (and not the device
> layer) accesses struct device, only the bus layers need to change.
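One way to get that insulation, sketched with hypothetical accessor names: the bus layer exports small wrappers, so a device-specific driver never names a field of struct device. If driver_data were later renamed or moved, only these two functions would need to change.

```c
#include <assert.h>
#include <stddef.h>

struct device {
	void *driver_data;
};

struct pci_dev {
	int irq;
	struct device device;
};

/* Bus-layer accessors: the only code that touches struct device's fields. */
static inline void pcidev_set_drvdata(struct pci_dev *pdev, void *data)
{
	pdev->device.driver_data = data;
}

static inline void *pcidev_get_drvdata(struct pci_dev *pdev)
{
	return pdev->device.driver_data;
}

/* Device-specific driver state; the driver only ever sees struct pci_dev. */
struct eepro_priv {
	int rx_ring_size;
};
```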
>
> User Interface
> ~~~~~~~~~~~~~~
>
> By virtue of having a complete hierarchical view of all the devices in the
> system, exporting a complete hierarchical view to userspace becomes relatively
> easy. Whenever a device is inserted into the tree, a file or directory can be
> created for it.
>
> In this model, a directory is created for each bridge and each device. When it
> is created, it is populated with a set of default files, first at the global
> layer, then at the bus layer. The device layer may then add its own files.
>
> These files export data about the driver and can be used to modify the
> behavior of the driver or even of the device.
>
> For example, at the global layer, a file named 'status' is created for each
> device. When read, it reports to the user the name of the device, its bus ID,
> its current power state, and the name of the driver it is using.
>
> Writing to this file gives the user control over the device. For example,
> writing "suspend 3" to it places the device into power state 3. In effect,
> writing to this file gives the user access to the operations defined in
> struct device_driver.
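A sketch of how such a write might be parsed and dispatched. The proposal gives only the "suspend 3" example, so the `<verb> [state]` grammar, the restriction to "suspend" and "resume", and the name status_write() are assumptions; the sketch uses sscanf() to stay runnable in userspace, where kernel code would use its own string helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int u32;

struct device;

struct device_driver {
	int (*suspend)(struct device *dev, u32 state);
	int (*resume)(struct device *dev);
};

struct device {
	struct device_driver *driver;
	u32 current_state;
};

/* Parse a command written to the 'status' file and call the driver. */
static int status_write(struct device *dev, const char *buf)
{
	char verb[16];
	unsigned int state = 0;
	int n = sscanf(buf, "%15s %u", verb, &state);

	if (n < 1 || !dev->driver)
		return -EINVAL;
	if (strcmp(verb, "suspend") == 0 && n == 2 && dev->driver->suspend)
		return dev->driver->suspend(dev, state);
	if (strcmp(verb, "resume") == 0 && dev->driver->resume)
		return dev->driver->resume(dev);
	return -EINVAL;
}

/* Example driver callbacks that only record the power state. */
static int demo_suspend(struct device *dev, u32 state)
{
	dev->current_state = state;
	return 0;
}

static int demo_resume(struct device *dev)
{
	dev->current_state = 0;
	return 0;
}

static struct device_driver demo_driver = {
	.suspend = demo_suspend,
	.resume = demo_resume,
};
```

Unknown verbs and a "suspend" without a state are rejected with -EINVAL rather than guessed at.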
>
> The PCI layer also adds default files. For devices, it adds a "resource" file
> and a "wake" file. The former reports the BAR information for the device; the
> latter reports the wake capabilities of the device.
>
> The device layer could also add files for device-specific data reporting and
> control.
>
> The dentry to the device's directory is kept in struct device. It also keeps a
> linked list of all the files in the directory, with pointers to their read and
> write callbacks. This allows the driver layer to maintain full control of its
> destiny. If it wants to override the default behavior of a file, or simply
> remove it, it can easily do so. (It is assumed that the files added upstream
> will always be a known quantity.)
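A sketch of such a per-directory file list, leaving ddfs itself out entirely. The names struct dev_file, dev_file_add(), dev_file_find() and dev_file_remove() are hypothetical, and a singly linked list stands in for list_head. Each entry pairs a name with read and write callbacks, so a lower layer can swap a callback on an entry it finds or unlink the entry altogether.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct device;

/* One file in a device's directory, with its callbacks. */
struct dev_file {
	const char *name;
	int (*read)(struct device *dev, char *buf, size_t len);
	int (*write)(struct device *dev, const char *buf);
	struct dev_file *next;
};

struct device {
	struct dev_file *files;	/* head of the directory's file list */
};

static void dev_file_add(struct device *dev, struct dev_file *f)
{
	f->next = dev->files;
	dev->files = f;
}

/* Look up a file by name, e.g. to override its read or write callback. */
static struct dev_file *dev_file_find(struct device *dev, const char *name)
{
	for (struct dev_file *f = dev->files; f; f = f->next)
		if (strcmp(f->name, name) == 0)
			return f;
	return NULL;
}

/* Unlink a file by name; returns 0 on success, -1 if it is not present. */
static int dev_file_remove(struct device *dev, const char *name)
{
	for (struct dev_file **pp = &dev->files; *pp; pp = &(*pp)->next) {
		if (strcmp((*pp)->name, name) == 0) {
			*pp = (*pp)->next;
			return 0;
		}
	}
	return -1;
}
```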
>
> These features were initially implemented using procfs. However, after one
> conversation with Linus, a new filesystem - ddfs - was created to implement
> these features. It is an in-memory filesystem, based heavily on ramfs,
> though it uses procfs as inspiration for its callback functionality.
>
> Device Structures
> ~~~~~~~~~~~~~~~~~
>
> struct device {
> 	struct list_head bus_list;
> 	struct io_bus *parent;
> 	struct io_bus *subordinate;
>
> 	char name[DEVICE_NAME_SIZE];
> 	char bus_id[BUS_ID_SIZE];
>
> 	struct dentry *dentry;
> 	struct list_head files;
>
> 	struct semaphore lock;
>
> 	struct device_driver *driver;
> 	void *driver_data;
> 	void *platform_data;
>
> 	u32 current_state;
> 	unsigned char *saved_state;
> };
>
> bus_list:
> List of all devices on a particular bus; i.e. the device's siblings
>
> parent:
> The parent bridge for the device.
>
> subordinate:
> If the device is a bridge itself, this points to the struct io_bus that is
> created for it.
>
> name:
> Human readable (descriptive) name of device. E.g. "Intel EEPro 100"
>
> bus_id:
> Parsable (yet ASCII) bus id. E.g. "00:04.00" (PCI Bus 0, Device 4, Function
> 0). It is necessary to have a searchable bus id for each device; making it
> ASCII allows us to use it for its directory name without translating it.
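For PCI, such an id can be produced from the bus number and the devfn byte, and parsed back when searching. A sketch, assuming a BUS_ID_SIZE of 16 (the proposal gives no value), the two-hex-digit function field used in the "00:04.00" example, and hypothetical helper names; the slot/function split mirrors the kernel's PCI_SLOT() and PCI_FUNC() macros.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUS_ID_SIZE 16	/* assumed value */

/* devfn packs a 5-bit device (slot) number and a 3-bit function number. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* buf must hold at least BUS_ID_SIZE bytes. */
static void pci_format_bus_id(char *buf, unsigned int bus, unsigned int devfn)
{
	snprintf(buf, BUS_ID_SIZE, "%02x:%02x.%02x",
		 bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
}

/* The inverse, for searching by id. Returns 0 on success, -1 on bad input. */
static int pci_parse_bus_id(const char *id, unsigned int *bus, unsigned int *devfn)
{
	unsigned int slot, func;

	if (sscanf(id, "%2x:%2x.%2x", bus, &slot, &func) != 3 ||
	    slot > 0x1f || func > 7)
		return -1;
	*devfn = (slot << 3) | func;
	return 0;
}
```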
>
> dentry:
> Pointer to the device's ddfs directory.
>
> files:
> Linked list of all the files in the device's ddfs directory.
>
> lock:
> Driver specific lock.
>
> driver:
> Pointer to a struct device_driver, the common operations for each device. See
> next section.
>
> driver_data:
> Private data for the driver.
> Much like the PCI implementation of this field, this allows device-specific
> drivers to keep a pointer to device-specific data.
>
> platform_data:
> Data that the platform (firmware) provides about the device.
> For example, the ACPI BIOS or EFI may have additional information about the
> device that is not directly mappable to any existing kernel data structure.
> It also allows the platform driver (e.g. ACPI) to pass information to a driver
> without the driver having to have explicit knowledge of (atrocities like) ACPI.
>
> current_state:
> Current power state of the device. For PCI and other modern devices, this is
> 0-3, though it's not necessarily limited to those values.
>
> saved_state:
> Pointer to driver-specific set of saved state.
> Having it here allows modules to be unloaded on system suspend and reloaded
> on resume and maintain state across transitions.
> It also allows generic drivers to maintain state across system state
> transitions.
> (I've implemented a generic PCI driver for devices that don't have a
> device-specific driver. Instead of managing some vector of saved state
> for each device the generic driver supports, it can simply store it here.)
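A sketch of a generic driver using saved_state, assuming the state worth keeping is a copy of the 64-byte standard PCI configuration header, modelled by a hypothetical config array standing in for the hardware. Userspace malloc()/free() replace kernel allocation. Because the buffer hangs off struct device rather than off the driver, it survives the driver being unloaded between save and restore.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int u32;

#define CONFIG_BYTES 64	/* size of the standard PCI config header */

struct device {
	unsigned char config[CONFIG_BYTES];	/* stand-in for the hardware */
	unsigned char *saved_state;
};

static int generic_save_state(struct device *dev, u32 state)
{
	(void)state;	/* this sketch saves the same data for every state */
	if (!dev->saved_state) {
		dev->saved_state = malloc(CONFIG_BYTES);
		if (!dev->saved_state)
			return -ENOMEM;
	}
	memcpy(dev->saved_state, dev->config, CONFIG_BYTES);
	return 0;
}

static int generic_restore_state(struct device *dev)
{
	if (!dev->saved_state)
		return -EINVAL;
	memcpy(dev->config, dev->saved_state, CONFIG_BYTES);
	free(dev->saved_state);
	dev->saved_state = NULL;
	return 0;
}
```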
>
> struct device_driver {
> 	int (*probe) (struct device *dev);
> 	int (*remove) (struct device *dev);
>
> 	int (*init) (struct device *dev);
> 	int (*shutdown) (struct device *dev);
>
> 	int (*save_state) (struct device *dev, u32 state);
> 	int (*restore_state) (struct device *dev);
>
> 	int (*suspend) (struct device *dev, u32 state);
> 	int (*resume) (struct device *dev);
> };
>
> probe:
> Check for device existence and associate driver with it.
>
> remove:
> Dissociate driver from device. Releases device so that it could be used by
> another driver. Also, if it is a hotplug device (hotplug PCI, Cardbus), an
> ejection event could take place here.
>
> init:
> Initialise the device - allocate resources, irqs, etc.
>
> shutdown:
> "De-initialise" the device - release resources, free memory, etc.
>
> save_state:
> Save current device state before entering suspend state.
>
> restore_state:
> Restore device state, after coming back from suspend state.
>
> suspend:
> Physically enter suspend state.
>
> resume:
> Physically leave suspend state and re-initialise hardware.
>
> Initially, the probe/remove sequence followed the PCI semantics exactly, but
> it has since been broken up into a four-stage process: probe(), remove(),
> init(), and shutdown().
>
> While this is not strictly necessary in every environment, breaking them up so
> each routine does only one thing makes sense.
>
> Hot-pluggable devices may also benefit from this model, especially ones that
> can be subjected to surprise removals: only the remove function would be
> called, and the driver could easily know whether there was still hardware
> there to shut down.
>
> Drivers that control failing or buggy hardware may also benefit, by allowing
> the user to trigger a removal of the driver from userspace without trying to
> shut down the device.
>
> In each case that remove() is called without a shutdown(), it's important to
> note that resources will still need to be freed; it's only the hardware that
> cannot be assumed to be present.
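A sketch of a remove() that copes with both paths, assuming a hypothetical hw_present flag that the bus layer clears on surprise removal, an initialised flag set between init() and shutdown(), and a counter standing in for register accesses. Software resources are released on every path; the hardware is touched only if it is still there. Note that remove() ends up repeating part of shutdown()'s work, which is the overlap questioned in the reply.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct device {
	bool hw_present;	/* cleared by the bus layer on surprise removal */
	bool initialised;	/* true between init() and shutdown() */
	void *driver_data;	/* e.g. a descriptor ring allocated by init() */
};

static int hw_accesses;	/* counts simulated register accesses */

static void release_resources(struct device *dev)
{
	free(dev->driver_data);	/* free(NULL) is a no-op */
	dev->driver_data = NULL;
	dev->initialised = false;
}

static int demo_init(struct device *dev)
{
	dev->driver_data = malloc(256);
	if (!dev->driver_data)
		return -ENOMEM;
	hw_accesses++;		/* e.g. enable the device */
	dev->initialised = true;
	return 0;
}

static int demo_shutdown(struct device *dev)
{
	hw_accesses++;		/* e.g. stop DMA and mask interrupts */
	release_resources(dev);
	return 0;
}

static int demo_remove(struct device *dev)
{
	if (dev->initialised) {
		/* shutdown() was skipped: quiesce the hardware only if present... */
		if (dev->hw_present)
			hw_accesses++;
		/* ...but always free the software resources. */
		release_resources(dev);
	}
	return 0;
}
```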
So, remove() might be called without a shutdown(), and then asked to
perform the duties normally performed by shutdown()? That sounds like
API dain bramage. :)
Your proposal sounds OK; my one objection is to separating probe/remove
further into init/shutdown. Can you give real-life cases where this
will be useful? I don't see it causing much except headache.
The preferred way of doing things (IMHO) is to do some simple sanity
checking of the h/w device at probe time, and then perform lots of
initialization and such at device/interface open time. You ideally want
a device driver lifecycle to look like this:
	probe:
		register interface
		sanity check h/w to make sure it's there and alive
		stop DMA/interrupts/etc., just in case
		start timer to powerdown h/w in N seconds
	dev_open:
		wake up device, if necessary
		init device
	dev_close:
		stop DMA/interrupts/etc.
		start timer to powerdown h/w in N seconds
With that in mind, init -really- happens at device open, and in
addition is driven more by normal user interaction via standard
APIs than by the PCI and PM subsystems.
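The lifecycle above, sketched as a tiny state machine under stated assumptions: struct nic is hypothetical, a tick counter stands in for the kernel timer, POWERDOWN_TICKS stands in for "N seconds", and open cancelling a pending powerdown is an assumption the list leaves implicit. Probe only checks and quiesces; open powers up and does the real init; close quiesces and re-arms the powerdown.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define POWERDOWN_TICKS 5	/* "N seconds", expressed in ticks */

struct nic {
	bool registered;
	bool powered;
	bool initialised;
	bool dma_running;
	int powerdown_in;	/* ticks until powerdown; 0 = timer not armed */
};

static int nic_probe(struct nic *n, bool hw_alive)
{
	n->registered = true;		/* register interface */
	if (!hw_alive) {		/* sanity check h/w */
		n->registered = false;	/* undo registration on failure */
		return -ENODEV;
	}
	n->powered = true;		/* assumed: device is powered when found */
	n->dma_running = false;		/* stop DMA/interrupts, just in case */
	n->powerdown_in = POWERDOWN_TICKS;	/* start powerdown timer */
	return 0;
}

static void nic_open(struct nic *n)
{
	n->powerdown_in = 0;		/* cancel any pending powerdown */
	n->powered = true;		/* wake up device, if necessary */
	n->initialised = true;		/* full init happens here, not at probe */
	n->dma_running = true;
}

static void nic_close(struct nic *n)
{
	n->dma_running = false;		/* stop DMA/interrupts */
	n->initialised = false;
	n->powerdown_in = POWERDOWN_TICKS;	/* start powerdown timer */
}

/* Timer tick: power the device down when the countdown expires. */
static void nic_tick(struct nic *n)
{
	if (n->powerdown_in > 0 && --n->powerdown_in == 0)
		n->powered = false;
}
```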
--
Jeff Garzik | "Mind if I drive?" -Sam
Building 1024 | "Not if you don't mind me clawing at the dash
MandrakeSoft | and shrieking like a cheerleader." -Max