All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Chris Wright <chrisw@sous-sol.org>
Cc: Ian.Campbell@eu.citrix.com, jeremy@goop.org,
	xen-devel@lists.xensource.com, joerg.roedel@amd.com,
	fujita.tomonori@lab.ntt.co.jp, iommu@lists.linux-foundation.org,
	dwmw2@infradead.org, alex.williamson@hp.com
Subject: Re: [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for tracking multiple software IO TLBs.
Date: Tue, 19 Jan 2010 12:46:54 -0500	[thread overview]
Message-ID: <20100119174654.GQ11986@phenom.dumpdata.com> (raw)
In-Reply-To: <20100115013308.GB6021@sequoia.sous-sol.org>

. snip..
> You can move the comments to a kerneldoc section for proper
> documentation.
> 
> /**
>  * struct swiotlb_engine - short desc...
>  * @name:	Name of the engine...
> etc

<nods>.
.. snip ..
> > +	char *end;
> 
> Isn't this still global to swiotlb, not specific to the backend impl.?

Yes and no. Without the start/end, the "is_swiotlb_buffer"
would not be able to determine if the passed in address is within the
SWIOTLB buffer.

> 
> > +	/*
> > +	 * The number of IO TLB blocks (in groups of 64) betweeen start and
> > +	 * end.  This is command line adjustable via setup_io_tlb_npages.
> > +	 */
> > +	unsigned long nslabs;
> 
> Same here.
> 
That one can be put back (make it part of lib/swiotlb.c)
> > +
> > +	/*
> > +	 * When the IOMMU overflows we return a fallback buffer.
> > +	 * This sets the size.
> > +	 */
> > +	unsigned long overflow;
> > +
> > +	void *overflow_buffer;
> 
> And here.
Ditto.
..snip ..
> > +	 * Is the DMA (Bus) address within our bounce buffer (start and end).
> > +	 */
> > +	int (*is_swiotlb_buffer)(struct swiotlb_engine *, dma_addr_t dev_addr,
> > +				 phys_addr_t phys);
> > +
> 
> Why is this implementation specific?

In the current implementation, they both use the physical address and
do a simple check:

	return paddr >= virt_to_phys(io_tlb_start) &&
		paddr < virt_to_phys(io_tlb_end);

That for virtualized environments where a PCI device is passed in would
work too.

Unfortunately the problem is when we provide a channel of communication
with another domain and we end up doing DMA on behalf of another guest.
The short description of the problem is that a page of memory is shared
with another domain and the mapping in our domain is correct (bus->physical)
the other way (virt->physical->bus) is incorrect for the duration of this page
being shared. Hence we need to verify that the page is local to our
domain, and for that we need the bus address to verify that the
addr ==  physical->bus(bus->physical(addr)) where addr is the bus
address (dma_addr_t). If it is not local (shared with another domain)
we MUST not consider it as a SWIOTLB buffer as that can lead to
panics and possible corruptions. The trick here is that the phys->virt
address can fall within the SWIOTLB buffer for pages that are
shared with another domain and we need the DMA address to do an extra check.

The long description of the problem is:

You are the domain doing some DMA on behalf of another domain. The
simple example is you are servicing a block device to the other guests.
One way to implement this is to present a one page ring buffer where
both domains move the producer and consumer indexes around. Once you get
a request (READ/WRITE), you use the virtualized channels to "share" that page
into your domain. For this you have a buffer (2MB or bigger) wherein for
pages that shared in to you, you over-write the phys->bus mapping.
That means that the phys->phys translation is altered for the duration
of this request being out-standing. Once it is completed, the phys->bus
translation is restored.

Here is a little diagram of what happens when a page is shared (and lets
assume that we have a situation where virt #1 == virt #2, which means
that phys #1 == phys #2).

(domain 2) virt#1->phys#1---\
                             +- bus #1
(domain 3) virt#2->phys#2 ---/

(phys#1 points to bus #1, and phys#2 points to bus #1 too).

The converse of the above picture is not true:

      /---> phys #1-> virt #1. (domain 2).
bus#1 +
      \---> phys #2-> virt #2. (domain 3).

phys #1 != phys #2 and hence virt #1 != virt #2.

When a page is not shared:

(domain 2) virt #1->phys #1--> bus #1
(domain 3) virt #2->phys #2--> bus #2

bus #1 -> phys #1 -> virt #1 (domain 2)
bus #2 -> phys #2 -> virt #2 (domain 3)

The bus #1 != bus #2, but phys #1 could be same as phys #2 (since
there are just PFNs). And virt #1 == virt #2.

The reason for these is that each domain has its own memory layout where
the memory starts at pfn 0, not at some higher number. So each domain
sees the physical address identically, but the bus address MUST point
to different areas (except when sharing) otherwise one domain would
over-write another domain, ouch.

Furthermore when a domain is allocated, the pages for the domain are not
guaranteed to be linearly contiguous so we can't guarantee that phys == bus.

So to guard against the situation in which phys #1 ->virt comes out with
an address that looks to be within our SWIOTLB buffer we need to do the
extra check:

addr ==  physical->bus(bus->physical(addr)) where addr is the bus
address

And for scenarios where this is not true (page belongs to another
domain), that page is not in the SWIOTLB (even thought the virtual and
physical address point to it).

> > +	/*
> > +	 * Is the DMA (Bus) address reachable by the PCI device?.
> > +	 */
> > +	bool (*dma_capable)(struct device *, dma_addr_t, phys_addr_t, size_t);

I mentioned in the previous explanation that when a domain is allocated,
the pages are not guaranteed to be linearly contiguous.

For bare-metal that is not the case and 'dma_capable' just checks
the device DMA mask against the bus address.

For virtualized environment we do need to check if the pages are linearly
contiguous for the size request.

For that we need the physical address to iterate over them doing the
phys->bus#1 translation and checking whether the  (phys+1)->bus#2
bus#1 == bus#2 + 1.

  reply	other threads:[~2010-01-19 17:46 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-14 23:00 [RFC SWIOTLB-0.2] Konrad Rzeszutek Wilk
2010-01-14 23:00 ` [PATCH 01/15] [swiotlb] fix: Update 'setup_io_tlb_npages' to accept both arguments in either order Konrad Rzeszutek Wilk
2010-01-14 23:00   ` [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for tracking multiple software IO TLBs Konrad Rzeszutek Wilk
2010-01-14 23:00     ` [PATCH 03/15] [swiotlb] Add swiotlb_register_engine function Konrad Rzeszutek Wilk
2010-01-14 23:00       ` [PATCH 04/15] [swiotlb] Search and replace s/io_tlb/iommu_sw->/ Konrad Rzeszutek Wilk
2010-01-14 23:00         ` [PATCH 05/15] [swiotlb] Respect the io_tlb_nslabs argument value Konrad Rzeszutek Wilk
2010-01-14 23:00           ` [PATCH 06/15] [swiotlb] In 'swiotlb_init' take advantage of the default swiotlb_engine support Konrad Rzeszutek Wilk
2010-01-14 23:00             ` [PATCH 07/15] [swiotlb] In 'swiotlb_free' check iommu_sw pointer Konrad Rzeszutek Wilk
2010-01-14 23:00               ` [PATCH 08/15] [swiotlb] Add 'is_swiotlb_buffer' to the swiotlb_ops function decleration Konrad Rzeszutek Wilk
2010-01-14 23:00                 ` [PATCH 09/15] [swiotlb] Add 'dma_capable' to the swiotlb_ops structure Konrad Rzeszutek Wilk
2010-01-14 23:00                   ` [PATCH 10/15] [swiotlb] Replace the [phys, bus]->virt and virt->[bus, phys] functions with iommu_sw calls Konrad Rzeszutek Wilk
2010-01-14 23:01                     ` [PATCH 11/15] [swiotlb] Replace late_alloc with iommu_sw->priv usage Konrad Rzeszutek Wilk
2010-01-14 23:01                       ` [PATCH 12/15] [swiotlb] Remove un-used static declerations obsoleted by iommu_sw Konrad Rzeszutek Wilk
2010-01-14 23:01                         ` [PATCH 13/15] [swiotlb] Make io_tlb_nslabs visible outside lib/swiotlb.c and rename it Konrad Rzeszutek Wilk
2010-01-14 23:01                           ` [PATCH 14/15] [swiotlb] Move initialization (swiotlb_init) and its friends in swiotlb-default.c Konrad Rzeszutek Wilk
2010-01-14 23:01                             ` [PATCH 15/15] [swiotlb] Take advantage of iommu_sw->name and add %s to printk's Konrad Rzeszutek Wilk
2010-01-15  2:14                             ` [PATCH 14/15] [swiotlb] Move initialization (swiotlb_init) and its friends in swiotlb-default.c Chris Wright
2010-01-19 17:45                               ` Konrad Rzeszutek Wilk
2010-01-19 18:55                                 ` Chris Wright
2010-01-15  2:02               ` [PATCH 07/15] [swiotlb] In 'swiotlb_free' check iommu_sw pointer Chris Wright
2010-01-19 17:45                 ` Konrad Rzeszutek Wilk
2010-01-19 18:23                   ` Chris Wright
2010-01-15  1:57             ` [PATCH 06/15] [swiotlb] In 'swiotlb_init' take advantage of the default swiotlb_engine support Chris Wright
2010-01-19 17:45               ` Konrad Rzeszutek Wilk
2010-01-15  1:47           ` [PATCH 05/15] [swiotlb] Respect the io_tlb_nslabs argument value Chris Wright
2010-01-15  1:43         ` [PATCH 04/15] [swiotlb] Search and replace s/io_tlb/iommu_sw->/ Chris Wright
2010-01-19 17:45           ` Konrad Rzeszutek Wilk
2010-01-15  1:41       ` [PATCH 03/15] [swiotlb] Add swiotlb_register_engine function Chris Wright
2010-01-19 17:25         ` Konrad Rzeszutek Wilk
2010-01-15  1:33     ` [PATCH 02/15] [swiotlb] Add swiotlb_engine structure for tracking multiple software IO TLBs Chris Wright
2010-01-19 17:46       ` Konrad Rzeszutek Wilk [this message]
2010-01-19 18:43         ` Chris Wright
2010-01-22  1:51     ` FUJITA Tomonori
2010-01-26 16:20       ` Konrad Rzeszutek Wilk
2010-02-03  2:04         ` FUJITA Tomonori
2010-02-03 17:08           ` [RFC SWIOTLB-0.4] Konrad Rzeszutek Wilk
2010-02-03 17:08             ` [PATCH 01/11] [swiotlb] fix: Update 'setup_io_tlb_npages' to accept both arguments in either order Konrad Rzeszutek Wilk
2010-02-03 17:08               ` [PATCH 02/11] [swiotlb] Make 'setup_io_tlb_npages' accept new 'swiotlb=' syntax Konrad Rzeszutek Wilk
2010-02-03 17:08                 ` [PATCH 03/11] [swiotlb] Normalize the swiotlb_init_* function's naming syntax Konrad Rzeszutek Wilk
2010-02-03 17:08                   ` [PATCH 04/11] [swiotlb] Make printk's use same prefix and include dev_err when possible Konrad Rzeszutek Wilk
2010-02-03 17:08                     ` [PATCH 05/11] [swiotlb] Make internal bookkeeping functions have 'do_' prefix Konrad Rzeszutek Wilk
2010-02-03 17:08                       ` [PATCH 06/11] [swiotlb] do_map_single: abstract out swiotlb_virt_to_bus calls out Konrad Rzeszutek Wilk
2010-02-03 17:08                         ` [PATCH 07/11] [swiotlb] Fix checkpatch warnings Konrad Rzeszutek Wilk
2010-02-03 17:08                           ` [PATCH 08/11] [swiotlb] Re-order the function declerations Konrad Rzeszutek Wilk
2010-02-03 17:08                             ` [PATCH 09/11] [swiotlb] Make swiotlb bookkeeping functions visible in the header file Konrad Rzeszutek Wilk
2010-02-03 17:08                               ` [PATCH 10/11] [swiotlb] Rename swiotlb.c to swiotlb-core.c Konrad Rzeszutek Wilk
2010-02-03 17:08                                 ` [PATCH 11/11] [swiotlb] move dma_ops functions to swiotlb.c Konrad Rzeszutek Wilk
2010-02-04  0:17             ` [RFC SWIOTLB-0.4] FUJITA Tomonori
2010-02-04  3:07               ` Konrad Rzeszutek Wilk
2010-02-16 23:37                 ` Konrad Rzeszutek Wilk
2010-01-15  1:22   ` [PATCH 01/15] [swiotlb] fix: Update 'setup_io_tlb_npages' to accept both arguments in either order Chris Wright
2010-01-19 17:47     ` Konrad Rzeszutek Wilk
2010-01-19 19:00       ` Chris Wright
2010-01-19 19:39         ` Konrad Rzeszutek Wilk
2010-01-15  2:25 ` [RFC SWIOTLB-0.2] Chris Wright
2010-01-19 18:20   ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100119174654.GQ11986@phenom.dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=Ian.Campbell@eu.citrix.com \
    --cc=alex.williamson@hp.com \
    --cc=chrisw@sous-sol.org \
    --cc=dwmw2@infradead.org \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jeremy@goop.org \
    --cc=joerg.roedel@amd.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.