public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Roland Dreier <roland@topspin.com>
To: "Nguyen, Tom L" <tom.l.nguyen@intel.com>
Cc: "long" <tlnguyen@snoqualmie.dp.intel.com>, <ak@muc.de>,
	<akpm@osdl.org>, <greg@kroah.com>, <jgarzik@pobox.com>,
	<linux-kernel@vger.kernel.org>, <zwane@linuxpower.ca>,
	<eli@mellanox.co.il>
Subject: Re: [PATCH]2.6.7 MSI-X Update
Date: Wed, 23 Jun 2004 18:42:57 -0700	[thread overview]
Message-ID: <52y8mdd93y.fsf@topspin.com> (raw)
In-Reply-To: <C7AB9DA4D0B1F344BF2489FA165E502405843FF0@orsmsx404.amr.corp.intel.com> (Tom L. Nguyen's message of "Wed, 23 Jun 2004 15:03:19 -0700")

    Long> It was my initial thought. However, below two reasons
    Long> convince me that the patch should enforce the rules to
    Long> ensure that the software device driver behaves properly.

    Long> -The software driver is responsible for asking how many
    Long> vectors it actually needs to serivce its hardward needs. Why
    Long> does it asks for more than what it actually uses?

I could imagine hardware where the driver does not know exactly how
many vectors it will use until it starts up.  As a hypothetical
example, imagine some storage networking host adapter that supports an
interrupt vector per storage target.  The driver does not know how
many vectors it will actually use until it has logged into the storage
fabric; in fact, the driver may want to keep some vectors "in reserve"
in case a new target is added to the fabric later.

I think it would be better to preserve maximum flexibility for devices
and drivers, and not mandate that every allocated MSI-X vector is
always used.

    Long> -Assuming I add pci_disable_msix() API, then there are two
    Long> consequences for this approach. It would be a problem for
    Long> the kernel to determine whether the MSI vectors are unhooked
    Long> cleanly from their corresponding driver ISR before freeing
    Long> these MSI vectors for reuse purpose on other devices. If
    Long> there is an error in the driver, it may result an unexpected
    Long> behavior. Second, for example if a driver asks for 10 MSI
    Long> vector for the first time and decides to call
    Long> pci_disable_msix() to free up 5. If the driver switches
    Long> interrupt mode back and forth, the next MSI request by
    Long> calling pci_enable_msix() will result 5 given instead of 10.

It seems in the code right now you are able to tell if any MSI-X
vectors are hooked, since you wait for the last vector to be unhooked
to disable MSI-X.  I would just have it be a WARN_ON() (or maybe
BUG_ON()) if a driver calls pci_disable_msix() without calling
free_irq for all its MSI-X vectors.

Right now there is an issue if a driver is unloaded without freeing
all its IRQs -- the device will be left in MSI-X mode and can not be
recovered without rebooting.

Also, drivers have a problem in their error paths right now with
freeing MSI-X resources.  For example, suppose a driver successfully
requests 4 MSI-X vectors.  request_irq() is a function call that can
fail, for example if the kernel can't allocate memory.  What should
the driver do if its second (out of 4) request_irq() call fails?
There doesn't seem to be any way for it to proceed without leaking
MSI-X resources.

Similarly, with the API as it stands in your patch, a driver must be
very careful not to take any action that may fail in between calling
pci_enable_msix() and actually calling request_irq(), or otherwise the
only way to avoid leaking MSI-X resources is to take the very risky
step of calling request_irq() on an error path.  This doesn't fit very
well with the structure of lots of device drivers, for example Intel's
very own e1000 driver, which wait until the device is actually opened
to call request_irq().

For your second point, I would have pci_disable_msix() always free all
MSI-X vectors that have been allocated... the only parameter that I
expect it would take is a struct pci_dev *.

 - Roland

  reply	other threads:[~2004-06-24  1:56 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-06-23 22:03 [PATCH]2.6.7 MSI-X Update Nguyen, Tom L
2004-06-24  1:42 ` Roland Dreier [this message]
2004-06-24  6:37   ` Zwane Mwaikambo
2004-06-24  7:27     ` Roland Dreier
  -- strict thread matches above, loose matches on Subject: below --
2004-07-19 15:36 Nguyen, Tom L
2004-07-15 15:46 Nguyen, Tom L
2004-07-13 22:02 long
2004-07-15  2:14 ` [PATCH]2.6.7 " Roland Dreier
2004-07-18  1:25 ` Roland Dreier
2004-06-26  1:21 long
2004-06-26  0:30 ` [PATCH]2.6.7 " Roland Dreier
2004-06-26  1:38 ` Roland Dreier
2004-06-26  8:27   ` Christoph Hellwig
2004-06-26 17:30     ` Roland Dreier
2004-06-24 16:29 Nguyen, Tom L
2004-06-23 16:58 Nguyen, Tom L
2004-06-23 16:49 Nguyen, Tom L
2004-06-22 23:06 Nguyen, Tom L
2004-06-22 21:48 long
2004-06-22 23:57 ` Roland Dreier
2004-06-23  0:26   ` Jeff Garzik
2004-06-23  1:18     ` Roland Dreier
2004-06-23  3:45 ` Roland Dreier
2004-06-23 16:50 ` Roland Dreier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52y8mdd93y.fsf@topspin.com \
    --to=roland@topspin.com \
    --cc=ak@muc.de \
    --cc=akpm@osdl.org \
    --cc=eli@mellanox.co.il \
    --cc=greg@kroah.com \
    --cc=jgarzik@pobox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tlnguyen@snoqualmie.dp.intel.com \
    --cc=tom.l.nguyen@intel.com \
    --cc=zwane@linuxpower.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox