From: Jeff Garzik <jgarzik@mandrakesoft.com>
To: Patrick Mochel <mochelp@infinity.powertie.org>
Cc: linux-kernel@vger.kernel.org, Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFC] New Driver Model for 2.5
Date: Thu, 18 Oct 2001 02:23:36 -0400
Message-ID: <3BCE7568.1DAB9FF0@mandrakesoft.com>
In-Reply-To: <Pine.LNX.4.21.0110171617460.15653-100000@marty.infinity.powertie.org>
Patrick Mochel wrote:
>
> One July afternoon, while hacking on the pm_dev layer for the purpose of
> system-wide power management support, I decided that I was quite tired of
> trying to make this layer look like a tree and feel like a tree, but not
> have any real integration with the actual device drivers.
>
> I had read the accounts of what the goals were for 2.5. And, after some
> conversations with Linus and the (gasp) ACPI guys, I realized that I had a
> good chunk of the infrastructural code written; it was a matter of working
> out a few crucial details and massaging it in nicely.
>
> I have had the chance this week (after moving and vacationing) to update
> the (read: write some) documentation for it. I will not go into details,
> and will let the document speak for itself.
>
> With all luck, this should go into the early stages of 2.5, and allow a
> significant cleanup of many drivers. Such a model will also allow for neat
> tricks like full device power management support, and Plug N Play
> capabilities.
>
> In order to support the new driver model, I have written a small in-memory
> filesystem, called ddfs, to export a unified interface to userland. It is
> mentioned in the doc, and is pretty self-explanatory. More information
> will be available soon.
>
> There is code available for the model and ddfs at:
>
> http://kernel.org/pub/linux/kernel/people/mochel/device/
>
> but there are some fairly large caveats concerning it.
>
> First, I feel comfortable with the device layer code and the ddfs
> code. Though, the PCI code is still work in progress. I am still working
> out some of the finer details concerning it.
>
> Next is the environment under which I developed it all. It was on an ia32
> box, with only PCI support, and using ACPI. The latter didn't have too
> much of an effect on the development, but there are a few items explicitly
> inspired by it.
>
> I am hoping both the PCI code and the structure in general can be
> further improved based on the input of the driver maintainers.
>
> This model is not final, and may be way off from what most people actually
> want. It has gotten tentative blessing from all those that have seen it,
> though they number but a few. It's definitely not the only solution...
>
> That said, enjoy; and have at it.
>
> -pat
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The (New) Linux Kernel Driver Model
>
> Version 0.01
>
> 17 October 2001
>
> Overview
> ~~~~~~~~
>
> This driver model is a unification of all the disparate driver models
> that are currently in the kernel. It is intended to augment the
> bus-specific drivers for bridges and devices by consolidating a set of data
> and operations into globally accessible data structures.
>
> Current driver models implement some sort of tree-like structure (sometimes
> just a list) for the devices they control. But, there is no linkage between
> the different bus types.
>
> A common data structure can provide this linkage with little overhead: when a
> bus driver discovers a particular device, it can insert it into the global
> tree as well as its local tree. In fact, the local tree becomes just a subset
> of the global tree.
>
> Common data fields can also be moved out of the local bus models into the
> global model. Some of the manipulation of these fields can also be
> consolidated. Most likely, manipulation functions will become a set
> of helper functions, which the bus drivers wrap around to include any
> bus-specific items.
>
> The common device and bridge interface currently reflects the goals of the
> modern PC: namely the ability to do seamless Plug and Play, power management,
> and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) ensures
> us that any device in the system may fit any of these criteria.)
>
> In reality, not every bus will be able to support such operations. But, most
> buses will support a majority of those operations, and all future buses will.
> In other words, a bus that doesn't support an operation is the exception,
> instead of the other way around.
>
> Drivers
> ~~~~~~~
>
> The callbacks for bridges and devices are intended to be singular for a
> particular type of bus. For each type of bus that has support compiled in the
> kernel, there should be one statically allocated structure with the
> appropriate callbacks that each device (or bridge) of that type shares.
>
> Each bus layer should implement the callbacks for these drivers. It then
> forwards the calls on to the device-specific callbacks. This means that
> device-specific drivers must still implement callbacks for each operation.
> But, they are not called from the top level driver layer.
>
> This does add another layer of indirection for calling one of these functions,
> but there are benefits that are believed to outweigh this slowdown.
>
> First, it prevents device-specific drivers from having to know about the
> global device layer. This speeds up integration time incredibly. It also
> allows drivers to be more portable across kernel versions. Note that the
> former was intentional, the latter is an added bonus.
>
> Second, this added indirection allows the bus to perform any additional logic
> necessary for its child devices. A bus layer may add additional information to
> the call, or translate it into something meaningful for its children.
>
> This could be done in the driver, but if it happens for every object of a
> particular type, it is best done at a higher level.
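
A minimal sketch of the forwarding described above, as I read it. The "foo" bus, the widget driver and the pointer arithmetic used to get from struct device back to the bus object are all assumptions made for illustration; none of them come from the posted code. The structures are cut down to the fields the example needs.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int u32;

struct device;

/* Global operations: one statically allocated instance per bus type. */
struct device_driver {
    int (*suspend)(struct device *dev, u32 state);
};

struct device {
    struct device_driver *driver;
    u32 current_state;
};

/* Hypothetical bus layer; "foo" is not a real bus. */
struct foo_dev;

struct foo_driver {
    int (*suspend)(struct foo_dev *fdev, u32 state);
};

struct foo_dev {
    struct foo_driver *driver;  /* device-specific callbacks */
    struct device device;       /* embedded global object */
};

/* Bus-level callback: does the bus-wide work once, then forwards. */
static int foo_bus_suspend(struct device *dev, u32 state)
{
    struct foo_dev *fdev;
    int err = 0;

    assert(dev != NULL);
    fdev = (struct foo_dev *)((char *)dev - offsetof(struct foo_dev, device));

    if (fdev->driver && fdev->driver->suspend)
        err = fdev->driver->suspend(fdev, state);
    if (!err)
        dev->current_state = state;
    return err;
}

/* The single shared structure for every device on the foo bus. */
static struct device_driver foo_bus_driver = {
    .suspend = foo_bus_suspend,
};

/* Example device-specific driver: it only records the requested state
 * and never looks at struct device. */
static u32 widget_last_state;

static int widget_suspend(struct foo_dev *fdev, u32 state)
{
    (void)fdev;
    widget_last_state = state;
    return 0;
}

static struct foo_driver widget_driver = {
    .suspend = widget_suspend,
};
```

The top level only ever calls dev->driver->suspend(); the widget driver is reached through the bus callback, which is where the extra indirection sits.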
>
> Recap
> ~~~~~
>
> Instances of devices and bridges are allocated dynamically as the system
> discovers their existence. Their fields describe the individual object.
> Drivers - in the global sense - are statically allocated and singular for a
> particular type of bus. They describe a set of operations that every type of
> bus could implement, the implementation following the bus's semantics.
>
> Downstream Access
> ~~~~~~~~~~~~~~~~~
>
> Common data fields have been moved out of individual bus layers into a common
> data structure. But, these fields must still be accessed by the bus layers,
> and sometimes by the device-specific drivers.
>
> Other bus layers are encouraged to do what has been done for the PCI layer.
> struct pci_dev now looks like this:
>
> struct pci_dev {
> ...
>
> struct device device;
> };
>
> Note first that it is statically allocated. This means only one allocation on
> device discovery. Note also that it is at the _end_ of struct pci_dev. This is
> to make people think about what they're doing when switching between the bus
> driver and the global driver; and to guard against mindless casts between
> the two.
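
For illustration, here is roughly what the conversion looks like when the embedded member is not first in the structure. The helper name to_pci_dev and the trimmed-down field lists are assumptions for this sketch, not taken from the posted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real structures carry many more fields. */
struct device {
    const char *name;
    unsigned int current_state;
};

struct pci_dev {
    unsigned short vendor;
    unsigned short device_id;
    struct device device;   /* embedded, last member, as in the proposal */
};

/* Recover the enclosing pci_dev from a pointer to its embedded device.
 * A plain cast would be wrong here because the member is not at offset
 * zero, so the offset is subtracted explicitly. */
static struct pci_dev *to_pci_dev(struct device *dev)
{
    assert(dev != NULL);
    return (struct pci_dev *)((char *)dev - offsetof(struct pci_dev, device));
}
```

Because the member sits at a non-zero offset, (struct pci_dev *)dev yields a wrong pointer, which I take to be the point of putting it at the end.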
>
> The PCI bus layer freely accesses the fields of struct device. It knows about
> the structure of struct pci_dev, and it should know the structure of struct
> device. PCI devices that have been converted generally do not touch the fields
> of struct device. More precisely, device-specific drivers should not touch
> fields of struct device unless there is a strong compelling reason to do so.
>
> This abstraction prevents unnecessary pain during transitional phases.
> If a field is renamed or removed, then every downstream driver that touches
> it will break. On the other hand, if only the bus layer (and not the device
> layer) accesses struct device, it is only the bus layer that needs to change.
>
> User Interface
> ~~~~~~~~~~~~~~
>
> By virtue of having a complete hierarchical view of all the devices in the
> system, exporting a complete hierarchical view to userspace becomes relatively
> easy. Whenever a device is inserted into the tree, a file or directory can be
> created for it.
>
> In this model, a directory is created for each bridge and each device. When it
> is created, it is populated with a set of default files, first at the global
> layer, then at the bus layer. The device layer may then add its own files.
>
> These files export data about the driver and can be used to modify behavior of
> the driver or even device.
>
> For example, at the global layer, a file named 'status' is created for each
> device. When read, it reports to the user the name of the device, its bus ID,
> its current power state, and the name of the driver it's using.
>
> By writing to this file, you can have control over the device. By writing
> "suspend 3" to this file, one could place the device into power state "3".
> Basically, by writing to this file, the user has access to the operations
> defined in struct device_driver.
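
A rough sketch of how a write to 'status' might be dispatched, assuming a plain "command [argument]" text format. Only "suspend 3" appears in the document; the "resume" command, the handler name status_write and the demo driver are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int u32;

struct device;

struct device_driver {
    int (*suspend)(struct device *dev, u32 state);
    int (*resume)(struct device *dev);
};

struct device {
    struct device_driver *driver;
    u32 current_state;
};

/* Hypothetical write handler for the 'status' file. Returns 0 on success,
 * -1 for an unrecognised command or a missing callback. */
static int status_write(struct device *dev, const char *buf)
{
    char cmd[16];
    u32 state = 0;
    int n;

    assert(dev != NULL && buf != NULL);
    n = sscanf(buf, "%15s %u", cmd, &state);
    if (n < 1 || !dev->driver)
        return -1;

    if (strcmp(cmd, "suspend") == 0 && n == 2 && dev->driver->suspend)
        return dev->driver->suspend(dev, state);
    if (strcmp(cmd, "resume") == 0 && dev->driver->resume)
        return dev->driver->resume(dev);
    return -1;
}

/* Demo driver that only tracks the power state. */
static int demo_suspend(struct device *dev, u32 state)
{
    dev->current_state = state;
    return 0;
}

static int demo_resume(struct device *dev)
{
    dev->current_state = 0;
    return 0;
}

static struct device_driver demo_driver = {
    .suspend = demo_suspend,
    .resume  = demo_resume,
};
```

Each accepted command maps onto one operation in struct device_driver; anything else is rejected rather than guessed at.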
>
> The PCI layer also adds default files. For devices, it adds a "resource" file
> and a "wake" file. The former reports the BAR information for the device; the
> latter reports the wake capabilities of the device.
>
> The device layer could also add files for device-specific data reporting and
> control.
>
> The dentry to the device's directory is kept in struct device. It also keeps a
> linked list of all the files in the directory, with pointers to their read and
> write callbacks. This allows the driver layer to maintain full control of its
> destiny. If it desired to override the default behavior of a file, or simply
> remove it, it could easily do so. (It is assumed that the files added upstream
> will always be a known quantity.)
>
> These features were initially implemented using procfs. However, after one
> conversation with Linus, a new filesystem - ddfs - was created to implement
> these features. It is an in-memory filesystem, based heavily off of ramfs,
> though it uses procfs as inspiration for its callback functionality.
>
> Device Structures
> ~~~~~~~~~~~~~~~~~
>
> struct device {
> struct list_head bus_list;
> struct io_bus *parent;
> struct io_bus *subordinate;
>
> char name[DEVICE_NAME_SIZE];
> char bus_id[BUS_ID_SIZE];
>
> struct dentry *dentry;
> struct list_head files;
>
> struct semaphore lock;
>
> struct device_driver *driver;
> void *driver_data;
> void *platform_data;
>
> u32 current_state;
> unsigned char *saved_state;
> };
>
> bus_list:
> List of all devices on a particular bus; i.e. the device's siblings
>
> parent:
> The parent bridge for the device.
>
> subordinate:
> If the device is a bridge itself, this points to the struct io_bus that is
> created for it.
>
> name:
> Human readable (descriptive) name of device. E.g. "Intel EEPro 100"
>
> bus_id:
> Parsable (yet ASCII) bus id. E.g. "00:04.00" (PCI Bus 0, Device 4, Function
> 0). It is necessary to have a searchable bus id for each device; making it
> ASCII allows us to use it for its directory name without translating it.
>
> dentry:
> Pointer to driver's ddfs directory.
>
> files:
> Linked list of all the files that a driver has in its ddfs directory.
>
> lock:
> Driver specific lock.
>
> driver:
> Pointer to a struct device_driver, the common operations for each device. See
> next section.
>
> driver_data:
> Private data for the driver.
> Much like the PCI implementation of this field, this allows device-specific
> drivers to keep a pointer to device-specific data.
>
> platform_data:
> Data that the platform (firmware) provides about the device.
> For example, the ACPI BIOS or EFI may have additional information about the
> device that is not directly mappable to any existing kernel data structure.
> It also allows the platform driver (e.g. ACPI) to pass data to a driver without the driver
> having to have explicit knowledge of (atrocities like) ACPI.
>
> current_state:
> Current power state of the device. For PCI and other modern devices, this is
> 0-3, though it's not necessarily limited to those values.
>
> saved_state:
> Pointer to driver-specific set of saved state.
> Having it here allows modules to be unloaded on system suspend and reloaded
> on resume and maintain state across transitions.
> It also allows generic drivers to maintain state across system state
> transitions.
> (I've implemented a generic PCI driver for devices that don't have a
> device-specific driver. Instead of managing some vector of saved state
> for each device the generic driver supports, it can simply store it here.)
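
On the bus_id field described above: a small sketch of producing an ID like "00:04.00" from bus, device and function numbers. The helper name, the BUS_ID_SIZE value and the choice of two hex digits per component are assumptions; the document only gives the one example string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUS_ID_SIZE 16  /* assumed; the proposal does not state a value */

/* Hypothetical helper: render bus/device/function as "bb:dd.ff".
 * Returns 0 on success, -1 if the result did not fit in the buffer. */
static int pci_format_bus_id(char *buf, size_t len, unsigned int bus,
                             unsigned int dev, unsigned int fn)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "%02x:%02x.%02x", bus, dev, fn);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The result is plain ASCII with no '/' in it, so it can be used directly as the ddfs directory name for the device.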
>
> struct device_driver {
> int (*probe) (struct device *dev);
> int (*remove) (struct device *dev);
>
> int (*init) (struct device *dev);
> int (*shutdown) (struct device *dev);
>
> int (*save_state) (struct device *dev, u32 state);
> int (*restore_state)(struct device *dev);
>
> int (*suspend) (struct device *dev, u32 state);
> int (*resume) (struct device *dev);
> };
>
> probe:
> Check for device existence and associate driver with it.
>
> remove:
> Dissociate driver from device. Releases device so that it could be used by
> another driver. Also, if it is a hotplug device (hotplug PCI, Cardbus), an
> ejection event could take place here.
>
> init:
> Initialise the device - allocate resources, irqs, etc.
>
> shutdown:
> "De-initialise" the device - release resources, free memory, etc.
>
> save_state:
> Save current device state before entering suspend state.
>
> restore_state:
> Restore device state, after coming back from suspend state.
>
> suspend:
> Physically enter suspend state.
>
> resume:
> Physically leave suspend state and re-initialise hardware.
>
> Initially, the probe/remove sequence followed the PCI semantics exactly, but
> has since been broken up into a four-stage process: probe(), remove(),
> init(), and shutdown().
>
> While it's not entirely necessary in all environments, breaking them up so
> each routine does only one thing makes sense.
>
> Hot-pluggable devices may also benefit from this model, especially ones that
> can be subjected to surprise removals - only the remove function would be
> called, and the driver could easily know if there was still hardware there
> to shut down.
>
> Drivers that are controlling failing, or buggy, hardware may benefit as well:
> the user can trigger a removal of the driver from userspace without trying to
> shut down the device.
>
> In each case that remove() is called without a shutdown(), it's important to
> note that resources will still need to be freed; it's only the hardware that
> cannot be assumed to be present.

So, remove() might be called without a shutdown(), and then asked to
perform the duties normally performed by shutdown()? That sounds like
API dain bramage. :)

Your proposal sounds OK; my one objection is separating probe/remove
further into init/shutdown. Can you give real-life cases where this
will be useful? I don't see it causing much except headache.

The preferred way of doing things (IMHO) is to do some simple sanity
checking of the h/w device at probe time, and then perform lots of
initialization and such at device/interface open time. You ideally want
a device driver lifecycle to look like

    probe:
        register interface
        sanity check h/w to make sure it's there and alive
        stop DMA/interrupts/etc., just in case
        start timer to powerdown h/w in N seconds

    dev_open:
        wake up device, if necessary
        init device

    dev_close:
        stop DMA/interrupts/etc.
        start timer to powerdown h/w in N seconds

With that in mind, init -really- happens at device open, and in
addition is driven more through normal user interaction via standard
APIs than by the PCI and PM subsystems.
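
To make the lifecycle above concrete, here is a toy model of it in C. It is only a sketch: the struct nic flags stand in for real hardware state, and every name in it is hypothetical. A real driver would poke registers and use kernel timers instead of setting booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state, standing in for real hardware. */
struct nic {
    bool registered;
    bool powered;
    bool dma_running;
    bool initialised;
    bool powerdown_armed;
};

/* Shared tail of probe and close: stop activity, arm the powerdown timer. */
static void nic_quiesce(struct nic *n)
{
    n->dma_running = false;
    n->powerdown_armed = true;
}

/* What the timer would do when it fires N seconds later. */
static void nic_powerdown_timer(struct nic *n)
{
    if (n->powerdown_armed) {
        n->powered = false;
        n->initialised = false;
        n->powerdown_armed = false;
    }
}

static int nic_probe(struct nic *n, bool hw_alive)
{
    assert(n != NULL);
    n->registered = true;           /* register interface */
    if (!hw_alive) {                /* sanity check failed: back out */
        n->registered = false;
        return -1;
    }
    n->powered = true;
    nic_quiesce(n);                 /* stop DMA/irqs, arm powerdown timer */
    return 0;
}

static int nic_open(struct nic *n)
{
    if (!n->registered)
        return -1;
    n->powerdown_armed = false;     /* cancel any pending powerdown */
    n->powered = true;              /* wake up device, if necessary */
    n->initialised = true;          /* init device */
    n->dma_running = true;
    return 0;
}

static void nic_close(struct nic *n)
{
    nic_quiesce(n);
}
```

Note that all the real initialisation sits in nic_open(); probe does little more than check that the hardware is there and leave it quiet.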
--
Jeff Garzik | "Mind if I drive?" -Sam
Building 1024 | "Not if you don't mind me clawing at the dash
MandrakeSoft | and shrieking like a cheerleader." -Max