Re: [PATCH] Suspend and Resume Support for Intel IOMMU

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@elte.hu>
To: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	David Woodhouse <dwmw2@infradead.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	'iommu@lists.linux-foundation.org',
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Suspend and Resume Support for Intel IOMMU
Date: Wed, 18 Feb 2009 18:44:18 +0100	[thread overview]
Message-ID: <20090218174418.GA13325@elte.hu> (raw)
In-Reply-To: <20090218165633.GA20167@linux-os.sc.intel.com>


* Fenghua Yu <fenghua.yu@intel.com> wrote:

> Current Intel IOMMU does not support suspend and resume. In S3 
> event, kernel crashes when IOMMU is enabled. The attached 
> patch implements the suspend and resume feature for Intel 
> IOMMU. It hooks to kernel suspend and resume interface. When 
> suspend happens, it saves necessary hardware registers. When 
> resume happens it restores the registers and restarts IOMMU.

hm, i think the patch is broken in its current form, and not 
only for the unclean structure reason Linus pointed out:

> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> 
> ---
> 
>  drivers/pci/intel-iommu.c   |  164 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h |    1 
>  include/linux/iommu.h       |    3 
>  3 files changed, 168 insertions(+)
> 
> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index c842438..83a206e 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -282,6 +282,7 @@ int dmar_disabled = 1;
>  static int __initdata dmar_map_gfx = 1;
>  static int dmar_forcedac;
>  static int intel_iommu_strict;
> +static int vtd_enabled;
>  
>  #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
>  static DEFINE_SPINLOCK(device_domain_lock);
> @@ -2763,6 +2764,7 @@ int __init intel_iommu_init(void)
>  	init_timer(&unmap_timer);
>  	force_iommu = 1;
>  	dma_ops = &intel_dma_ops;
> +	vtd_enabled = 1;
>  
>  	register_iommu(&intel_iommu_ops);
>  
> @@ -3168,3 +3170,165 @@ static void __devinit quirk_cantiga_iommu(struct pci_dev *dev)
>  }
>  
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2a40, quirk_iommu_rwbf);
> +
> +#ifdef CONFIG_PM
> +static int init_iommu_hw(void)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +	int ret, unit = 0;
> +
> +	for_each_drhd_unit(drhd) {
> +		if (drhd->ignored)
> +			continue;
> +
> +		iommu = drhd->iommu;
> +		if (dmar_enable_qi(iommu)) {
> +			/*
> +			 * Queued Invalidate not enabled, use Register Based
> +			 * Invalidate
> +			 */
> +			iommu->flush.flush_context = __iommu_flush_context;
> +			iommu->flush.flush_iotlb = __iommu_flush_iotlb;
> +			printk(KERN_INFO "IOMMU 0x%Lx: using Register based "
> +			       "invalidation\n",
> +			       (unsigned long long)drhd->reg_base_addr);

[ small nit: do we really need a KERN_INFO printout for ever 
  s2ram cycle? ]

> +		} else {
> +			iommu->flush.flush_context = qi_flush_context;
> +			iommu->flush.flush_iotlb = qi_flush_iotlb;
> +			printk(KERN_INFO "IOMMU 0x%Lx: using Queued "
> +			       "invalidation\n",
> +			       (unsigned long long)drhd->reg_base_addr);
> +		}
> +	}
> +
> +	/*
> +	 * for each drhd
> +	 *   enable fault log
> +	 *   global invalidate context cache
> +	 *   global invalidate iotlb
> +	 *   enable translation
> +	 */
> +	for_each_drhd_unit(drhd) {
> +		if (drhd->ignored)
> +			continue;
> +		iommu = drhd->iommu;
> +		sprintf(iommu->name, "dmar%d", unit++);
> +
> +		iommu_flush_write_buffer(iommu);
> +
> +		ret = dmar_set_interrupt(iommu);
> +		if (ret)
> +			goto error;
> +

hm, if this fails (due to irq space shortage for example) then 
we do not clean up properly - is that intended? We have all 
units so far enabled and configured and the iommus registered.

In fact, this whole sequence seems buggy, as there's no 
free_irq() pair - nor should there be any.

Using dmar_set_interrupt() is plain buggy here - the irq wont go 
away during a S3 event. So we leak IRQs at every S3 event and 
eventually we'll lock up and run into this sequence, and will 
hang. The point in time when this happens depends on NR_IRQS.

> +		iommu_set_root_entry(iommu);
> +
> +		iommu->flush.flush_context(iommu, 0, 0, 0, DMA_CCMD_GLOBAL_INVL,
> +					   0);
> +		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH,
> +					 0);
> +		iommu_disable_protect_mem_regions(iommu);
> +
> +	}
> +
> +	return 0;
> +error:
> +	return ret;
> +}

Also, the whole sequence shares a lot of code with init_dmars(). 
Shouldnt there be a common helper function that does the 
initialization?

> +
> +static void iommu_flush_all(void)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +
> +	for_each_drhd_unit(drhd) {
> +		iommu = drhd->iommu;
> +		iommu->flush.flush_context(iommu, 0, 0, 0, DMA_CCMD_GLOBAL_INVL,
> +					   0);
> +		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH,
> +					 0);
> +	}
> +}

Hm, doesnt this loop body miss a drhd->ignored check?

> +
> +static u32 iommu_state[MAX_IOMMUS][MAX_IOMMU_REGS];
> +static int iommu_suspend(struct sys_device *dev, pm_message_t state)
> +{

[ style nit: please always separate variables from function 
  prototypes, by at least one newline. ]

> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +	int    i = 0;
> +
> +	if (!vtd_enabled)
> +		return 0;
> +
> +	iommu_flush_all();
> +
> +	for_each_drhd_unit(drhd) {
> +		iommu = drhd->iommu;
> +
> +		iommu_state[i][DMAR_FECTL_REG] =
> +			(u32) readl(iommu->reg + DMAR_FECTL_REG);
> +		iommu_state[i][DMAR_FEDATA_REG] =
> +			(u32) readl(iommu->reg + DMAR_FEDATA_REG);
> +		iommu_state[i][DMAR_FEADDR_REG] =
> +			(u32) readl(iommu->reg + DMAR_FEADDR_REG);
> +		iommu_state[i][DMAR_FEUADDR_REG] =
> +			(u32) readl(iommu->reg + DMAR_FEUADDR_REG);
> +		i++;
> +	}
> +	return 0;
> +}

This loop body too appears to miss a drhd->ignored check. Or is 
it not needed for some reason?

If the check is needed here then it would be cleaner if the 
for_each_drhd_unit() iterator automatically included a check for 
drhd->ignored, that way it cannot be missed.

> +static int iommu_resume(struct sys_device *dev)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +	int i = 0;
> +
> +	if (!vtd_enabled)
> +		return 0;
> +
> +	iommu_flush_all();
> +
> +	if (init_iommu_hw())
> +		panic("IOMMU setup failed, DMAR can not start!\n");
> +
> +	for_each_drhd_unit(drhd) {
> +		iommu = drhd->iommu;
> +
> +		writel((u32) iommu_state[i][DMAR_FECTL_REG],
> +			iommu->reg + DMAR_FECTL_REG);
> +		writel((u32) iommu_state[i][DMAR_FEDATA_REG],
> +			iommu->reg + DMAR_FEDATA_REG);
> +		writel((u32) iommu_state[i][DMAR_FEADDR_REG],
> +			iommu->reg + DMAR_FEADDR_REG);
> +		writel((u32) iommu_state[i][DMAR_FEUADDR_REG],
> +			iommu->reg + DMAR_FEUADDR_REG);
> +		iommu_enable_translation(iommu);
> +		i++;
> +	}
> +	return 0;
> +}
> +
> +static struct sysdev_class iommu_sysclass = {
> +	.name		= "iommu",
> +	.resume		= iommu_resume,
> +	.suspend	= iommu_suspend,
> +};
> +
> +static struct sys_device device_iommu = {
> +	.id	= 0,

no need to initialize .id to zero, it's a static variable.

> +	.cls	= &iommu_sysclass,
> +};
> +
> +static int __init init_iommu_sysfs(void)
> +{
> +	int error;
> +
> +	error = sysdev_class_register(&iommu_sysclass);
> +	if (!error)
> +		error = sysdev_register(&device_iommu);
> +	return error;
> +}
> +device_initcall(init_iommu_sysfs);
> +
> +#endif	/* CONFIG_PM */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index c4f6c10..5bc2da7 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -23,6 +23,7 @@
>  #define _INTEL_IOMMU_H_
>  
>  #include <linux/types.h>
> +#include <linux/sysdev.h>
>  #include <linux/iova.h>
>  #include <linux/io.h>
>  #include <linux/dma_remapping.h>

small cleanliness nit: since the intel-iommu.h header clearly 
does not need the sysdev.h include, it would be cleaner to 
include it in intel-iommu.c only.

Header files should not add 'convenience' headers, they should 
strictly only include types they need themselves for their own 
type and method definitions.

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 8a7bfb1..01942ca 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -22,6 +22,9 @@
>  #define IOMMU_READ	(1)
>  #define IOMMU_WRITE	(2)

[ small style nit: the parenthesis looks superfluous. ]

>  
> + #define MAX_IOMMUS	32
> + #define MAX_IOMMU_REGS	0xc0
> +
>  struct device;
>  
>  struct iommu_domain {

	Ingo

     prev parent reply	other threads:[~2009-02-18 17:44 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-18 16:56 [PATCH] Suspend and Resume Support for Intel IOMMU Fenghua Yu
2009-02-18 17:12 ` Linus Torvalds
2009-02-18 17:19   ` Ingo Molnar
2009-02-18 17:28     ` Yu, Fenghua
2009-03-25 16:06       ` David Woodhouse
2009-02-18 17:44 ` Ingo Molnar [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090218174418.GA13325@elte.hu \
    --to=mingo@elte.hu \
    --cc='iommu@lists.linux-foundation.org' \
    --cc=dwmw2@infradead.org \
    --cc=fenghua.yu@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sfr@canb.auug.org.au \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox