public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops
       [not found] <1211178689-3507-1-git-send-email-fujita.tomonori@lab.ntt.co.jp>
@ 2008-05-22 10:43 ` Amit Shah
  2008-05-25  7:20   ` Muli Ben-Yehuda
  0 siblings, 1 reply; 10+ messages in thread
From: Amit Shah @ 2008-05-22 10:43 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: linux-kernel, akpm, muli, alexisb, andi, kvm, avi

On Monday 19 May 2008 12:01:27 FUJITA Tomonori wrote:
> This is an updated version of the patchset to add per-device
> dma_mapping_ops support for CONFIG_X86_64 like POWER architecture
> does:
>
> http://lkml.org/lkml/2008/5/13/36
>
> This is against 2.6.26-rc2-mm1 (the changes since the v1 are pretty
> trivial like dropping the change for v850 arch).
>
> This enables us to cleanly fix the Calgary IOMMU issue that some
> devices are not behind the IOMMU [1].
>
> I think that per-device dma_mapping_ops support would be also helpful
> for KVM people to support PCI passthrough but Andi thinks that this
> makes it difficult to support the PCI passthrough (see the above
> thread). So I CC'ed this to KVM camp. Comments are appreciated.

I think what's more useful is a chain with a properly defined order or 
hierarchy (based on what Muli suggested last time we discussed this

http://lkml.org/lkml/2007/11/12/44 )

The suggested order was (in calling order):
pvdma->hardare->nommu/swiotlb

The discussion in the thread pointed to above has details as to why.

> A pointer to dma_mapping_ops to struct dev_archdata is added. If the
> pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If
> it's NULL, the system-wide dma_ops pointer is used as before.


> If it's useful for KVM people, I plan to implement a mechanism to
> register a hook called when a new pci (or dma capable) device is

OK; this sounds helpful. the hook can make a hypercall and confirm with the 
host kernel if the device in question is an assigned physical device. If yes, 
we replace the dma_ops. Though, the original intent of having stackable ops 
is that we might want to go through the swiotlb in the guest even for an 
assigned device if the guest dma addresses are not in the addressable range 
of the guest chipset.

> created (it works with hot plugging). It enables IOMMUs to set up an
> appropriate dma_mapping_ops per device.

>From what we've discussed so far, it looks like stackable dma ops will 
definitely be needed. Does this patchset provide something that stacking 
won't?

> [1] http://lkml.org/lkml/2008/5/8/423

Amit.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops
  2008-05-22 10:43 ` [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops Amit Shah
@ 2008-05-25  7:20   ` Muli Ben-Yehuda
  2008-05-26  4:09     ` Amit Shah
  0 siblings, 1 reply; 10+ messages in thread
From: Muli Ben-Yehuda @ 2008-05-25  7:20 UTC (permalink / raw)
  To: Amit Shah; +Cc: FUJITA Tomonori, linux-kernel, akpm, alexisb, andi, kvm, avi

On Thu, May 22, 2008 at 04:13:02PM +0530, Amit Shah wrote:

> OK; this sounds helpful. the hook can make a hypercall and confirm
> with the host kernel if the device in question is an assigned
> physical device. If yes, we replace the dma_ops. Though, the
> original intent of having stackable ops is that we might want to go
> through the swiotlb in the guest even for an assigned device if the
> guest dma addresses are not in the addressable range of the guest
> chipset.
> 
> > created (it works with hot plugging). It enables IOMMUs to set up an
> > appropriate dma_mapping_ops per device.
> 
> From what we've discussed so far, it looks like stackable dma ops will 
> definitely be needed. Does this patchset provide something that stacking 
> won't?

Yes---this patchset let's you have a per-device dma-ops, whereas with
stackable you only get global dma-ops. I think it's clear we need
both, and I think per-device dma-ops are the first thing that's
needed. Stacking can then be introduced on a per-device basis.

Cheers,
Muli


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops
  2008-05-25  7:20   ` Muli Ben-Yehuda
@ 2008-05-26  4:09     ` Amit Shah
  2008-05-26  6:11       ` FUJITA Tomonori
  0 siblings, 1 reply; 10+ messages in thread
From: Amit Shah @ 2008-05-26  4:09 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: FUJITA Tomonori, linux-kernel, akpm, alexisb, andi, kvm, avi

On Sunday 25 May 2008 12:50:11 Muli Ben-Yehuda wrote:
> On Thu, May 22, 2008 at 04:13:02PM +0530, Amit Shah wrote:
> > OK; this sounds helpful. the hook can make a hypercall and confirm
> > with the host kernel if the device in question is an assigned
> > physical device. If yes, we replace the dma_ops. Though, the
> > original intent of having stackable ops is that we might want to go
> > through the swiotlb in the guest even for an assigned device if the
> > guest dma addresses are not in the addressable range of the guest
> > chipset.
> >
> > > created (it works with hot plugging). It enables IOMMUs to set up an
> > > appropriate dma_mapping_ops per device.
> >
> > From what we've discussed so far, it looks like stackable dma ops will
> > definitely be needed. Does this patchset provide something that stacking
> > won't?
>
> Yes---this patchset let's you have a per-device dma-ops, whereas with
> stackable you only get global dma-ops. I think it's clear we need
> both, and I think per-device dma-ops are the first thing that's
> needed. Stacking can then be introduced on a per-device basis.

When we would want stacking, we'll want it globally and not per-device, isn't 
it? Or at least for devices on a particular bus.

When an IOMMU driver registers itself, it should tell which devices it's 
interested in (each device behind a bus or by enumerating each device it 
cares for). This should take care of all the scenarios and we won't have the 
need for per-device dma_ops.

For something like pvdma, we can walk through the list of pci devices and make 
a hypercall for each of them to get this information and have the pvdma 
version of dma_ops registered for that device. This sounds like it's 
per-device dma_ops, but it's not -- internally, the dma operations walk 
through each of the IOMMUs registered and call them in sequence.

Does this work?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops
  2008-05-26  4:09     ` Amit Shah
@ 2008-05-26  6:11       ` FUJITA Tomonori
  2008-05-26 16:44         ` Amit Shah
  0 siblings, 1 reply; 10+ messages in thread
From: FUJITA Tomonori @ 2008-05-26  6:11 UTC (permalink / raw)
  To: amit.shah
  Cc: muli, fujita.tomonori, linux-kernel, akpm, alexisb, andi, kvm,
	avi

On Mon, 26 May 2008 09:39:20 +0530
Amit Shah <amit.shah@qumranet.com> wrote:

> On Sunday 25 May 2008 12:50:11 Muli Ben-Yehuda wrote:
> > On Thu, May 22, 2008 at 04:13:02PM +0530, Amit Shah wrote:
> > > OK; this sounds helpful. the hook can make a hypercall and confirm
> > > with the host kernel if the device in question is an assigned
> > > physical device. If yes, we replace the dma_ops. Though, the
> > > original intent of having stackable ops is that we might want to go
> > > through the swiotlb in the guest even for an assigned device if the
> > > guest dma addresses are not in the addressable range of the guest
> > > chipset.
> > >
> > > > created (it works with hot plugging). It enables IOMMUs to set up an
> > > > appropriate dma_mapping_ops per device.
> > >
> > > From what we've discussed so far, it looks like stackable dma ops will
> > > definitely be needed. Does this patchset provide something that stacking
> > > won't?
> >
> > Yes---this patchset let's you have a per-device dma-ops, whereas with
> > stackable you only get global dma-ops. I think it's clear we need
> > both, and I think per-device dma-ops are the first thing that's
> > needed. Stacking can then be introduced on a per-device basis.
> 
> When we would want stacking, we'll want it globally and not per-device, isn't 
> it? Or at least for devices on a particular bus.
> 
> When an IOMMU driver registers itself, it should tell which devices it's 
> interested in (each device behind a bus or by enumerating each device it 
> cares for). This should take care of all the scenarios and we won't have the 
> need for per-device dma_ops.

Well, without per-device dma_ops, IOMMUs could live. But it's pretty
hacky. Every time a dma operation is called, IOMMUs need to figure out
how a device should be handled.

If IOMMUs can set dma_ops for the device when a new device is created,
IOMMUs don't care anything any more. That's much clean. That's What
the POWER architecture does.


> For something like pvdma, we can walk through the list of pci devices and make 
> a hypercall for each of them to get this information and have the pvdma 
> version of dma_ops registered for that device. This sounds like it's 
> per-device dma_ops, but it's not -- internally, the dma operations walk 
> through each of the IOMMUs registered and call them in sequence.

As Muli poinsted out, For pvdma, you need stacking per-device
dma_ops. With per-device dma_ops, you don't need hack like adding
is_pv_device hook in dma_ops. You can set your dma_ops to only pci
devices that you are interested.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops
  2008-05-26  6:11       ` FUJITA Tomonori
@ 2008-05-26 16:44         ` Amit Shah
  2008-05-26 23:50           ` FUJITA Tomonori
  0 siblings, 1 reply; 10+ messages in thread
From: Amit Shah @ 2008-05-26 16:44 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: muli, linux-kernel, akpm, alexisb, andi, kvm, avi

On Monday 26 May 2008 11:41:52 FUJITA Tomonori wrote:
> On Mon, 26 May 2008 09:39:20 +0530
>
> Amit Shah <amit.shah@qumranet.com> wrote:
> > On Sunday 25 May 2008 12:50:11 Muli Ben-Yehuda wrote:
> > > On Thu, May 22, 2008 at 04:13:02PM +0530, Amit Shah wrote:
> > > > OK; this sounds helpful. the hook can make a hypercall and confirm
> > > > with the host kernel if the device in question is an assigned
> > > > physical device. If yes, we replace the dma_ops. Though, the
> > > > original intent of having stackable ops is that we might want to go
> > > > through the swiotlb in the guest even for an assigned device if the
> > > > guest dma addresses are not in the addressable range of the guest
> > > > chipset.
> > > >
> > > > > created (it works with hot plugging). It enables IOMMUs to set up
> > > > > an appropriate dma_mapping_ops per device.
> > > >
> > > > From what we've discussed so far, it looks like stackable dma ops
> > > > will definitely be needed. Does this patchset provide something that
> > > > stacking won't?
> > >
> > > Yes---this patchset let's you have a per-device dma-ops, whereas with
> > > stackable you only get global dma-ops. I think it's clear we need
> > > both, and I think per-device dma-ops are the first thing that's
> > > needed. Stacking can then be introduced on a per-device basis.
> >
> > When we would want stacking, we'll want it globally and not per-device,
> > isn't it? Or at least for devices on a particular bus.
> >
> > When an IOMMU driver registers itself, it should tell which devices it's
> > interested in (each device behind a bus or by enumerating each device it
> > cares for). This should take care of all the scenarios and we won't have
> > the need for per-device dma_ops.
>
> Well, without per-device dma_ops, IOMMUs could live. But it's pretty
> hacky. Every time a dma operation is called, IOMMUs need to figure out
> how a device should be handled.

What if this information could be hidden behind (a slightly complicated) 
get_dma_ops()? Also, each of the operations in dma_ops will see if there's 
something else down the stack that might be interested in the current device.

My contention is that we are going to need stackable ops, and a full-fledged 
stackable implementation is going to solve this problem as well. However, 
this current implementation of per-device dma_ops looks like a really simple 
and non-intrusive solution to one problem, that of getting rid of some 
overheads in the IOMMU code.

> If IOMMUs can set dma_ops for the device when a new device is created,
> IOMMUs don't care anything any more. That's much clean. That's What
> the POWER architecture does.
>
> > For something like pvdma, we can walk through the list of pci devices and
> > make a hypercall for each of them to get this information and have the
> > pvdma version of dma_ops registered for that device. This sounds like
> > it's per-device dma_ops, but it's not -- internally, the dma operations
> > walk through each of the IOMMUs registered and call them in sequence.
>
> As Muli poinsted out, For pvdma, you need stacking per-device
> dma_ops. With per-device dma_ops, you don't need hack like adding
> is_pv_device hook in dma_ops. You can set your dma_ops to only pci
> devices that you are interested.

The hack was added only because there's no stackable dma api we have now. 
Sure, per-device dma_ops is going to solve this problem and I like it. I'm 
only saying we're also going to need stacking ops and in effect, per-device 
dma_ops would just be replaced by them once we get the complete solution.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops
  2008-05-26 16:44         ` Amit Shah
@ 2008-05-26 23:50           ` FUJITA Tomonori
  2008-05-27  4:53             ` Amit Shah
  0 siblings, 1 reply; 10+ messages in thread
From: FUJITA Tomonori @ 2008-05-26 23:50 UTC (permalink / raw)
  To: amit.shah
  Cc: fujita.tomonori, muli, linux-kernel, akpm, alexisb, andi, kvm,
	avi

On Mon, 26 May 2008 22:14:34 +0530
Amit Shah <amit.shah@qumranet.com> wrote:

> On Monday 26 May 2008 11:41:52 FUJITA Tomonori wrote:
> > On Mon, 26 May 2008 09:39:20 +0530
> >
> > Amit Shah <amit.shah@qumranet.com> wrote:
> > > On Sunday 25 May 2008 12:50:11 Muli Ben-Yehuda wrote:
> > > > On Thu, May 22, 2008 at 04:13:02PM +0530, Amit Shah wrote:
> > > > > OK; this sounds helpful. the hook can make a hypercall and confirm
> > > > > with the host kernel if the device in question is an assigned
> > > > > physical device. If yes, we replace the dma_ops. Though, the
> > > > > original intent of having stackable ops is that we might want to go
> > > > > through the swiotlb in the guest even for an assigned device if the
> > > > > guest dma addresses are not in the addressable range of the guest
> > > > > chipset.
> > > > >
> > > > > > created (it works with hot plugging). It enables IOMMUs to set up
> > > > > > an appropriate dma_mapping_ops per device.
> > > > >
> > > > > From what we've discussed so far, it looks like stackable dma ops
> > > > > will definitely be needed. Does this patchset provide something that
> > > > > stacking won't?
> > > >
> > > > Yes---this patchset let's you have a per-device dma-ops, whereas with
> > > > stackable you only get global dma-ops. I think it's clear we need
> > > > both, and I think per-device dma-ops are the first thing that's
> > > > needed. Stacking can then be introduced on a per-device basis.
> > >
> > > When we would want stacking, we'll want it globally and not per-device,
> > > isn't it? Or at least for devices on a particular bus.
> > >
> > > When an IOMMU driver registers itself, it should tell which devices it's
> > > interested in (each device behind a bus or by enumerating each device it
> > > cares for). This should take care of all the scenarios and we won't have
> > > the need for per-device dma_ops.
> >
> > Well, without per-device dma_ops, IOMMUs could live. But it's pretty
> > hacky. Every time a dma operation is called, IOMMUs need to figure out
> > how a device should be handled.
> 
> What if this information could be hidden behind (a slightly complicated) 
> get_dma_ops()? Also, each of the operations in dma_ops will see if there's 
> something else down the stack that might be interested in the current device.

dma_ops can't do anything since only IOMMUs know what to do against a
device.

Whatever you implement in dma_ops, without per-device dma_ops, IOMMUs
need to figure out what to do a device every time a dma operation is
called.


> My contention is that we are going to need stackable ops, and a full-fledged 
> stackable implementation is going to solve this problem as well. However, 
> this current implementation of per-device dma_ops looks like a really simple 
> and non-intrusive solution to one problem, that of getting rid of some 
> overheads in the IOMMU code.

I don't think that stackable ops solve the problem that some IOMMUs have.


> > If IOMMUs can set dma_ops for the device when a new device is created,
> > IOMMUs don't care anything any more. That's much clean. That's What
> > the POWER architecture does.
> >
> > > For something like pvdma, we can walk through the list of pci devices and
> > > make a hypercall for each of them to get this information and have the
> > > pvdma version of dma_ops registered for that device. This sounds like
> > > it's per-device dma_ops, but it's not -- internally, the dma operations
> > > walk through each of the IOMMUs registered and call them in sequence.
> >
> > As Muli poinsted out, For pvdma, you need stacking per-device
> > dma_ops. With per-device dma_ops, you don't need hack like adding
> > is_pv_device hook in dma_ops. You can set your dma_ops to only pci
> > devices that you are interested.
> 
> The hack was added only because there's no stackable dma api we have now. 
> Sure, per-device dma_ops is going to solve this problem and I like it. I'm 
> only saying we're also going to need stacking ops and in effect, per-device 
> dma_ops would just be replaced by them once we get the complete solution.

Again, stackable ops can't cleanly solve the problem that per-device
dma_ops tries to solve. For example, you stack dma_ops like
pvdma->hardare->nommu/swiotlb. How can pvdma_ops know if pvdma_ops
needs to handle a device or not? pvdma_ops needs to skip some devices
and handle some. per-device dma_ops enables us not to stack pvdma_ops
for devices that pvdma_ops are not instrested in. That's much clean.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops
  2008-05-26 23:50           ` FUJITA Tomonori
@ 2008-05-27  4:53             ` Amit Shah
  2008-05-27  5:24               ` FUJITA Tomonori
  0 siblings, 1 reply; 10+ messages in thread
From: Amit Shah @ 2008-05-27  4:53 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: muli, linux-kernel, akpm, alexisb, andi, kvm, avi

On Tuesday 27 May 2008 05:20:54 FUJITA Tomonori wrote:
> On Mon, 26 May 2008 22:14:34 +0530
>
> Amit Shah <amit.shah@qumranet.com> wrote:
> > On Monday 26 May 2008 11:41:52 FUJITA Tomonori wrote:
> > > On Mon, 26 May 2008 09:39:20 +0530
> > >
> > > Amit Shah <amit.shah@qumranet.com> wrote:
> > > > On Sunday 25 May 2008 12:50:11 Muli Ben-Yehuda wrote:
> > > > > On Thu, May 22, 2008 at 04:13:02PM +0530, Amit Shah wrote:
> > > > > > OK; this sounds helpful. the hook can make a hypercall and
> > > > > > confirm with the host kernel if the device in question is an
> > > > > > assigned physical device. If yes, we replace the dma_ops. Though,
> > > > > > the original intent of having stackable ops is that we might want
> > > > > > to go through the swiotlb in the guest even for an assigned
> > > > > > device if the guest dma addresses are not in the addressable
> > > > > > range of the guest chipset.
> > > > > >
> > > > > > > created (it works with hot plugging). It enables IOMMUs to set
> > > > > > > up an appropriate dma_mapping_ops per device.
> > > > > >
> > > > > > From what we've discussed so far, it looks like stackable dma ops
> > > > > > will definitely be needed. Does this patchset provide something
> > > > > > that stacking won't?
> > > > >
> > > > > Yes---this patchset let's you have a per-device dma-ops, whereas
> > > > > with stackable you only get global dma-ops. I think it's clear we
> > > > > need both, and I think per-device dma-ops are the first thing
> > > > > that's needed. Stacking can then be introduced on a per-device
> > > > > basis.
> > > >
> > > > When we would want stacking, we'll want it globally and not
> > > > per-device, isn't it? Or at least for devices on a particular bus.
> > > >
> > > > When an IOMMU driver registers itself, it should tell which devices
> > > > it's interested in (each device behind a bus or by enumerating each
> > > > device it cares for). This should take care of all the scenarios and
> > > > we won't have the need for per-device dma_ops.
> > >
> > > Well, without per-device dma_ops, IOMMUs could live. But it's pretty
> > > hacky. Every time a dma operation is called, IOMMUs need to figure out
> > > how a device should be handled.
> >
> > What if this information could be hidden behind (a slightly complicated)
> > get_dma_ops()? Also, each of the operations in dma_ops will see if
> > there's something else down the stack that might be interested in the
> > current device.
>
> dma_ops can't do anything since only IOMMUs know what to do against a
> device.

Instead of each device calling a function to check which IOMMU is right, I am 
suggesting each IOMMU come in and tell which devices it is interested in.

> Again, stackable ops can't cleanly solve the problem that per-device
> dma_ops tries to solve. For example, you stack dma_ops like
> pvdma->hardare->nommu/swiotlb. How can pvdma_ops know if pvdma_ops
> needs to handle a device or not? pvdma_ops needs to skip some devices
> and handle some. per-device dma_ops enables us not to stack pvdma_ops
> for devices that pvdma_ops are not instrested in. That's much clean.

OK; how about this:

An example with per-device dma_ops and stacking will look like this:

pvdma->hardware->nommu/swiotlb
    ^             ^
     |              |
 e1000     rtl8139

And this scheme is going to suit everyone, agreed?

This is simple and doesn't need too many changes all around.

I was suggesting something more than this that can handle cases like an iommu 
wanting to have each device behind a bus to pass through it (it's still 
possible, but needs a per-device walk). Also, in the scenario depicted above, 
each device will start by pointing to the first iommu in the chain (pvdma in 
this case) and the iommu will then determine if that device needs to be 
passed via its translations.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops
  2008-05-27  4:53             ` Amit Shah
@ 2008-05-27  5:24               ` FUJITA Tomonori
  2008-05-27  5:54                 ` Amit Shah
  0 siblings, 1 reply; 10+ messages in thread
From: FUJITA Tomonori @ 2008-05-27  5:24 UTC (permalink / raw)
  To: amit.shah
  Cc: fujita.tomonori, muli, linux-kernel, akpm, alexisb, andi, kvm,
	avi

On Tue, 27 May 2008 10:23:21 +0530
Amit Shah <amit.shah@qumranet.com> wrote:

> On Tuesday 27 May 2008 05:20:54 FUJITA Tomonori wrote:
> > On Mon, 26 May 2008 22:14:34 +0530
> >
> > Amit Shah <amit.shah@qumranet.com> wrote:
> > > On Monday 26 May 2008 11:41:52 FUJITA Tomonori wrote:
> > > > On Mon, 26 May 2008 09:39:20 +0530
> > > >
> > > > Amit Shah <amit.shah@qumranet.com> wrote:
> > > > > On Sunday 25 May 2008 12:50:11 Muli Ben-Yehuda wrote:
> > > > > > On Thu, May 22, 2008 at 04:13:02PM +0530, Amit Shah wrote:
> > > > > > > OK; this sounds helpful. the hook can make a hypercall and
> > > > > > > confirm with the host kernel if the device in question is an
> > > > > > > assigned physical device. If yes, we replace the dma_ops. Though,
> > > > > > > the original intent of having stackable ops is that we might want
> > > > > > > to go through the swiotlb in the guest even for an assigned
> > > > > > > device if the guest dma addresses are not in the addressable
> > > > > > > range of the guest chipset.
> > > > > > >
> > > > > > > > created (it works with hot plugging). It enables IOMMUs to set
> > > > > > > > up an appropriate dma_mapping_ops per device.
> > > > > > >
> > > > > > > From what we've discussed so far, it looks like stackable dma ops
> > > > > > > will definitely be needed. Does this patchset provide something
> > > > > > > that stacking won't?
> > > > > >
> > > > > > Yes---this patchset let's you have a per-device dma-ops, whereas
> > > > > > with stackable you only get global dma-ops. I think it's clear we
> > > > > > need both, and I think per-device dma-ops are the first thing
> > > > > > that's needed. Stacking can then be introduced on a per-device
> > > > > > basis.
> > > > >
> > > > > When we would want stacking, we'll want it globally and not
> > > > > per-device, isn't it? Or at least for devices on a particular bus.
> > > > >
> > > > > When an IOMMU driver registers itself, it should tell which devices
> > > > > it's interested in (each device behind a bus or by enumerating each
> > > > > device it cares for). This should take care of all the scenarios and
> > > > > we won't have the need for per-device dma_ops.
> > > >
> > > > Well, without per-device dma_ops, IOMMUs could live. But it's pretty
> > > > hacky. Every time a dma operation is called, IOMMUs need to figure out
> > > > how a device should be handled.
> > >
> > > What if this information could be hidden behind (a slightly complicated)
> > > get_dma_ops()? Also, each of the operations in dma_ops will see if
> > > there's something else down the stack that might be interested in the
> > > current device.
> >
> > dma_ops can't do anything since only IOMMUs know what to do against a
> > device.
> 
> Instead of each device calling a function to check which IOMMU is right, I am 
> suggesting each IOMMU come in and tell which devices it is interested in.

It means that you need to register IOMMU information per
device. That's same to per-device dma_ops.

Or It means you need put devices (an IOMMU is interested in) to a
list. Every time dma operation is called, you check the list to see
who is interested in a device. That's not clean (not effective too).

 
> > Again, stackable ops can't cleanly solve the problem that per-device
> > dma_ops tries to solve. For example, you stack dma_ops like
> > pvdma->hardare->nommu/swiotlb. How can pvdma_ops know if pvdma_ops
> > needs to handle a device or not? pvdma_ops needs to skip some devices
> > and handle some. per-device dma_ops enables us not to stack pvdma_ops
> > for devices that pvdma_ops are not instrested in. That's much clean.
> 
> OK; how about this:
> 
> An example with per-device dma_ops and stacking will look like this:
> 
> pvdma->hardware->nommu/swiotlb
>     ^             ^
>      |              |
>  e1000     rtl8139
> 
> And this scheme is going to suit everyone, agreed?
> 
> This is simple and doesn't need too many changes all around.

Sorry, I'm not sure what this picture represents.

BTW, without pvdma, there is no need to hardware->nommu/swiotlb
stacking for IOMMUs like Calgary. Per-device dma_ops wor for them.


> I was suggesting something more than this that can handle cases like an iommu 
> wanting to have each device behind a bus to pass through it (it's still 
> possible, but needs a per-device walk). Also, in the scenario depicted above, 
> each device will start by pointing to the first iommu in the chain (pvdma in 
> this case) and the iommu will then determine if that device needs to be 
> passed via its translations.

No, IOMMUs doesn't need to do that. We need to put a stacking
mechanism in dma-mapping.h. A stacking mechanism should not be visible
to IOMMUs.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops
  2008-05-27  5:24               ` FUJITA Tomonori
@ 2008-05-27  5:54                 ` Amit Shah
  2008-05-28 10:19                   ` FUJITA Tomonori
  0 siblings, 1 reply; 10+ messages in thread
From: Amit Shah @ 2008-05-27  5:54 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: muli, linux-kernel, akpm, alexisb, andi, kvm, avi

On Tuesday 27 May 2008 10:54:06 FUJITA Tomonori wrote:

> > An example with per-device dma_ops and stacking will look like this:
> >
> > pvdma->hardware->nommu/swiotlb
> >     ^             ^
> >
> >  e1000     rtl8139
> >
> > And this scheme is going to suit everyone, agreed?
> >
> > This is simple and doesn't need too many changes all around.
>
> Sorry, I'm not sure what this picture represents.

It meant to show just e1000 needs to go through the pvdma translations. 
rtl8139 goes via the other iommus. e1000 also goes through the other iommus 
(mainly if it's going to be the swiotlb that a guest might need).

> BTW, without pvdma, there is no need to hardware->nommu/swiotlb
> stacking for IOMMUs like Calgary. Per-device dma_ops wor for them.

Hmm, ok. Then this argument doesn't count.

> > I was suggesting something more than this that can handle cases like an
> > iommu wanting to have each device behind a bus to pass through it (it's
> > still possible, but needs a per-device walk). Also, in the scenario
> > depicted above, each device will start by pointing to the first iommu in
> > the chain (pvdma in this case) and the iommu will then determine if that
> > device needs to be passed via its translations.
>
> No, IOMMUs doesn't need to do that. We need to put a stacking
> mechanism in dma-mapping.h. A stacking mechanism should not be visible
> to IOMMUs.

OK; then just per-device dma_ops will work and for the pvdma case, we'll have 
to have the stacking. Since this is a special case, any kind of generic APIs 
shouldn't be needed as well.

What is the plan with this patch then? When do you plan to ask for mainline 
merging?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops
  2008-05-27  5:54                 ` Amit Shah
@ 2008-05-28 10:19                   ` FUJITA Tomonori
  0 siblings, 0 replies; 10+ messages in thread
From: FUJITA Tomonori @ 2008-05-28 10:19 UTC (permalink / raw)
  To: amit.shah
  Cc: fujita.tomonori, muli, linux-kernel, akpm, alexisb, andi, kvm,
	avi

On Tue, 27 May 2008 11:24:08 +0530
Amit Shah <amit.shah@qumranet.com> wrote:

> On Tuesday 27 May 2008 10:54:06 FUJITA Tomonori wrote:
> 
> > > An example with per-device dma_ops and stacking will look like this:
> > >
> > > pvdma->hardware->nommu/swiotlb
> > >     ^             ^
> > >
> > >  e1000     rtl8139
> > >
> > > And this scheme is going to suit everyone, agreed?
> > >
> > > This is simple and doesn't need too many changes all around.
> >
> > Sorry, I'm not sure what this picture represents.
> 
> It meant to show just e1000 needs to go through the pvdma translations. 
> rtl8139 goes via the other iommus. e1000 also goes through the other iommus 
> (mainly if it's going to be the swiotlb that a guest might need).
> 
> > BTW, without pvdma, there is no need to hardware->nommu/swiotlb
> > stacking for IOMMUs like Calgary. Per-device dma_ops wor for them.
> 
> Hmm, ok. Then this argument doesn't count.
> 
> > > I was suggesting something more than this that can handle cases like an
> > > iommu wanting to have each device behind a bus to pass through it (it's
> > > still possible, but needs a per-device walk). Also, in the scenario
> > > depicted above, each device will start by pointing to the first iommu in
> > > the chain (pvdma in this case) and the iommu will then determine if that
> > > device needs to be passed via its translations.
> >
> > No, IOMMUs doesn't need to do that. We need to put a stacking
> > mechanism in dma-mapping.h. A stacking mechanism should not be visible
> > to IOMMUs.
> 
> OK; then just per-device dma_ops will work and for the pvdma case, we'll have 
> to have the stacking. Since this is a special case, any kind of generic APIs 
> shouldn't be needed as well.

I'm not sure what you mean exactly, but there might be other people
who want to use the dma_ops stacking though I'm not sure yet how they
use it:

http://lkml.org/lkml/2008/5/15/79


> What is the plan with this patch then? When do you plan to ask for mainline 
> merging?

Andrew already put this patchset in -mm. Unless someone comes with a
new reason against this patchset, it will be merged, I think.

BTW, Andrew, really sorry about several compile bugs due to the first
patch (changing dma_mapping_error) in this patchset. And thanks a lot
for fixing them.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-05-28 10:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1211178689-3507-1-git-send-email-fujita.tomonori@lab.ntt.co.jp>
2008-05-22 10:43 ` [PATCH v2 -mm 0/2] x86: per-device dma_mapping_ops Amit Shah
2008-05-25  7:20   ` Muli Ben-Yehuda
2008-05-26  4:09     ` Amit Shah
2008-05-26  6:11       ` FUJITA Tomonori
2008-05-26 16:44         ` Amit Shah
2008-05-26 23:50           ` FUJITA Tomonori
2008-05-27  4:53             ` Amit Shah
2008-05-27  5:24               ` FUJITA Tomonori
2008-05-27  5:54                 ` Amit Shah
2008-05-28 10:19                   ` FUJITA Tomonori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox