[Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2)

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2)
@ 2011-09-05  4:34 David Gibson
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA David Gibson
                   ` (9 more replies)
  0 siblings, 10 replies; 42+ messages in thread
From: David Gibson @ 2011-09-05  4:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, kraxel, mst, joerg.roedel, agraf, avi, eduard.munteanu,
	rth

A while back, Eduard - Gabriel Munteanu send a series of patches
implementing support for emulating the AMD IOMMU in conjunction with
qemu emulated PCI devices.  A revised patch series added support for
the Intel IOMMU, and I also send a revised version of this series
which added support for the hypervisor mediated IOMMU on the pseries
machine.

Richard Henderson also weighed in on the discussion, and there's still
a fair bit to be thrashed out in terms of exactly how to set up an
IOMMU / DMA translation subsystem.

However, really only 2 or 3 patches in any of these series have
contained anything interesting.  The rest of the series has been
converting existing PCI emulated devices to use the new DMA interface
which worked through the IOMMU translation, whatever it was.  While we
keep working out what we want for the guts of the IOMMU support, these
device conversion patches keep bitrotting against updates to the
various device implementations themselves.

Really, regardless of whether we're actually implementing IOMMU
translation, it makes sense that qemu code should distinguish between
when it is really operating in CPU physical addresses and when it is
operating in bus or DMA addresses which might have some kind of
translation into physical addresses.

This series, therefore, begins the conversion of existing PCI device
emulation code to use new (stub) pci dma access functions.  These are
for now, just defined to be untranslated cpu physical memory accesses,
as before, but has two advantages:

   * It becomes obvious where the code is working with dma addresses,
     so it's easier to grep for what might be affected by an IOMMU or
     other bus address translation.

   * The new stubs take the PCIDevice *, from which any of the various
     suggested IOMMU interfaces should be able to locate the correct
     IOMMU translation context.

This series only converts the easy cases so far.  That is simple
direct DMA access from device code:
cpu_physical_memory_{read,write}(), ld*_phys() and st*_phys().  It
doesn't handle devices which use the scatter/gather code (just ide and
UHCI, so far).  I plan to address that later, but I have some details
still to work out.

Anthony, please apply.

[Changes since v1]
 - Cleanup up some macro namespace pollution
 - Fixed some bugs where a few DMA accesses had been missed
 - Removed ohci from the series - non-PCI versions of it make it not a
   suitable easy case; it will need to be fixed up in a later pass.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
@ 2011-09-05  4:34 ` David Gibson
  2011-10-02 10:25   ` Michael S. Tsirkin
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 2/9] rtl8139: Use PCI DMA stub functions David Gibson
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: David Gibson @ 2011-09-05  4:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, kraxel, mst, joerg.roedel, agraf, avi, eduard.munteanu,
	rth

This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
present, these are just stubs which perform directly cpu physical memory
accesses.

Using these stubs, however, distinguishes PCI device DMA transactions from
other accesses to physical memory, which will allow PCI IOMMU support to
be added in one place, rather than updating every PCI driver at that time.

That is, it allows us to update individual PCI drivers to support an IOMMU
without having yet determined the details of how the IOMMU emulation will
operate.  This will let us remove the most bitrot-sensitive part of an
IOMMU patch in advance.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 dma.h    |    2 ++
 hw/pci.c |   31 +++++++++++++++++++++++++++++++
 hw/pci.h |   33 +++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/dma.h b/dma.h
index a6db5ba..06e91cb 100644
--- a/dma.h
+++ b/dma.h
@@ -15,6 +15,8 @@
 #include "hw/hw.h"
 #include "block.h"
 
+typedef target_phys_addr_t dma_addr_t;
+
 typedef struct {
     target_phys_addr_t base;
     target_phys_addr_t len;
diff --git a/hw/pci.c b/hw/pci.c
index 1cdcbb7..0be7611 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -2211,3 +2211,34 @@ MemoryRegion *pci_address_space(PCIDevice *dev)
 {
     return dev->bus->address_space_mem;
 }
+
+#define PCI_DMA_DEFINE_LDST(_lname, _sname, _bits) \
+    uint##_bits##_t ld##_lname##_pci_dma(PCIDevice *dev, dma_addr_t addr) \
+    { \
+        uint##_bits##_t val; \
+        pci_dma_read(dev, addr, &val, sizeof(val)); \
+        return le##_bits##_to_cpu(val); \
+    } \
+    void st##_sname##_pci_dma(PCIDevice *dev, \
+                              dma_addr_t addr, uint##_bits##_t val) \
+    { \
+        val = cpu_to_le##_bits(val); \
+        pci_dma_write(dev, addr, &val, sizeof(val)); \
+    }
+
+uint8_t ldub_pci_dma(PCIDevice *dev, dma_addr_t addr)
+{
+    uint8_t val;
+
+    pci_dma_read(dev, addr, &val, sizeof(val));
+    return val;
+}
+
+void stb_pci_dma(PCIDevice *dev, dma_addr_t addr, uint8_t val)
+{
+    pci_dma_write(dev, addr, &val, sizeof(val));
+}
+
+PCI_DMA_DEFINE_LDST(uw, w, 16);
+PCI_DMA_DEFINE_LDST(l, l, 32);
+PCI_DMA_DEFINE_LDST(q, q, 64);
diff --git a/hw/pci.h b/hw/pci.h
index 391217e..4426e9d 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -6,6 +6,7 @@
 
 #include "qdev.h"
 #include "memory.h"
+#include "dma.h"
 
 /* PCI includes legacy ISA access.  */
 #include "isa.h"
@@ -492,4 +493,36 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
     return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : PCI_CONFIG_SPACE_SIZE;
 }
 
+/* DMA access functions */
+static inline int pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
+                             void *buf, dma_addr_t len, int is_write)
+{
+    cpu_physical_memory_rw(addr, buf, len, is_write);
+    return 0;
+}
+
+static inline int pci_dma_read(PCIDevice *dev, dma_addr_t addr,
+                               void *buf, dma_addr_t len)
+{
+    return pci_dma_rw(dev, addr, buf, len, 0);
+}
+
+static inline int pci_dma_write(PCIDevice *dev, dma_addr_t addr,
+                                const void *buf, dma_addr_t len)
+{
+    return pci_dma_rw(dev, addr, (void *) buf, len, 1);
+}
+
+#define PCI_DMA_DECLARE_LDST(_lname, _sname, _bits) \
+    uint##_bits##_t ld##_lname##_pci_dma(PCIDevice *dev, dma_addr_t addr); \
+    void st##_sname##_pci_dma(PCIDevice *dev, dma_addr_t addr, \
+                              uint##_bits##_t val);            \
+
+PCI_DMA_DECLARE_LDST(ub, b, 8);
+PCI_DMA_DECLARE_LDST(uw, w, 16);
+PCI_DMA_DECLARE_LDST(l, l, 32);
+PCI_DMA_DECLARE_LDST(q, q, 64);
+
+#undef DECLARE_LDST_DMA
+
 #endif
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 2/9] rtl8139: Use PCI DMA stub functions
  2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA David Gibson
@ 2011-09-05  4:34 ` David Gibson
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 3/9] eepro100: " David Gibson
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-09-05  4:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, kraxel, mst, joerg.roedel, agraf, avi, eduard.munteanu,
	rth

From: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

This updates the rtl8139 device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/rtl8139.c |   98 +++++++++++++++++++++++++++++----------------------------
 1 files changed, 50 insertions(+), 48 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index c5de5b4..af71cce 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -53,6 +53,7 @@
 
 #include "hw.h"
 #include "pci.h"
+#include "dma.h"
 #include "qemu-timer.h"
 #include "net.h"
 #include "loader.h"
@@ -427,9 +428,6 @@ typedef struct RTL8139TallyCounters
 /* Clears all tally counters */
 static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
 
-/* Writes tally counters to specified physical memory address */
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
-
 typedef struct RTL8139State {
     PCIDevice dev;
     uint8_t phys[8]; /* mac address */
@@ -512,6 +510,9 @@ typedef struct RTL8139State {
     int rtl8139_mmio_io_addr_dummy;
 } RTL8139State;
 
+/* Writes tally counters to memory via DMA */
+static void RTL8139TallyCounters_dma_write(RTL8139State *s, dma_addr_t tc_addr);
+
 static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time);
 
 static void prom9346_decode_command(EEprom9346 *eeprom, uint8_t command)
@@ -773,15 +774,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 
             if (size > wrapped)
             {
-                cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                           buf, size-wrapped );
+                pci_dma_write(&s->dev, s->RxBuf + s->RxBufAddr,
+                              buf, size-wrapped);
             }
 
             /* reset buffer pointer */
             s->RxBufAddr = 0;
 
-            cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                       buf + (size-wrapped), wrapped );
+            pci_dma_write(&s->dev, s->RxBuf + s->RxBufAddr,
+                          buf + (size-wrapped), wrapped);
 
             s->RxBufAddr = wrapped;
 
@@ -790,13 +791,13 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
     }
 
     /* non-wrapping path or overwrapping enabled */
-    cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size );
+    pci_dma_write(&s->dev, s->RxBuf + s->RxBufAddr, buf, size);
 
     s->RxBufAddr += size;
 }
 
 #define MIN_BUF_SIZE 60
-static inline target_phys_addr_t rtl8139_addr64(uint32_t low, uint32_t high)
+static inline dma_addr_t rtl8139_addr64(uint32_t low, uint32_t high)
 {
 #if TARGET_PHYS_ADDR_BITS > 32
     return low | ((target_phys_addr_t)high << 32);
@@ -979,7 +980,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 /* w3 high 32bit of Rx buffer ptr */
 
         int descriptor = s->currCPlusRxDesc;
-        target_phys_addr_t cplus_rx_ring_desc;
+        dma_addr_t cplus_rx_ring_desc;
 
         cplus_rx_ring_desc = rtl8139_addr64(s->RxRingAddrLO, s->RxRingAddrHI);
         cplus_rx_ring_desc += 16 * descriptor;
@@ -990,13 +991,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI;
 
-        cpu_physical_memory_read(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        pci_dma_read(&s->dev, cplus_rx_ring_desc, (uint8_t *)&val, 4);
         rxdw0 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        pci_dma_read(&s->dev, cplus_rx_ring_desc+4, (uint8_t *)&val, 4);
         rxdw1 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
+        pci_dma_read(&s->dev, cplus_rx_ring_desc+8, (uint8_t *)&val, 4);
         rxbufLO = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
+        pci_dma_read(&s->dev, cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
         rxbufHI = le32_to_cpu(val);
 
         DPRINTF("+++ C+ mode RX descriptor %d %08x %08x %08x %08x\n",
@@ -1060,16 +1061,16 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
             return size_;
         }
 
-        target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
+        dma_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
         /* receive/copy to target memory */
         if (dot1q_buf) {
-            cpu_physical_memory_write(rx_addr, buf, 2 * ETHER_ADDR_LEN);
-            cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN,
-                buf + 2 * ETHER_ADDR_LEN + VLAN_HLEN,
-                size - 2 * ETHER_ADDR_LEN);
+            pci_dma_write(&s->dev, rx_addr, buf, 2 * ETHER_ADDR_LEN);
+            pci_dma_write(&s->dev, rx_addr + 2 * ETHER_ADDR_LEN,
+                          buf + 2 * ETHER_ADDR_LEN + VLAN_HLEN,
+                          size - 2 * ETHER_ADDR_LEN);
         } else {
-            cpu_physical_memory_write(rx_addr, buf, size);
+            pci_dma_write(&s->dev, rx_addr, buf, size);
         }
 
         if (s->CpCmd & CPlusRxChkSum)
@@ -1079,7 +1080,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         /* write checksum */
         val = cpu_to_le32(crc32(0, buf, size_));
-        cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4);
+        pci_dma_write(&s->dev, rx_addr+size, (uint8_t *)&val, 4);
 
 /* first segment of received packet flag */
 #define CP_RX_STATUS_FS (1<<29)
@@ -1125,9 +1126,9 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         /* update ring data */
         val = cpu_to_le32(rxdw0);
-        cpu_physical_memory_write(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        pci_dma_write(&s->dev, cplus_rx_ring_desc, (uint8_t *)&val, 4);
         val = cpu_to_le32(rxdw1);
-        cpu_physical_memory_write(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        pci_dma_write(&s->dev, cplus_rx_ring_desc+4, (uint8_t *)&val, 4);
 
         /* update tally counter */
         ++s->tally_counters.RxOk;
@@ -1307,50 +1308,51 @@ static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
     counters->TxUndrn = 0;
 }
 
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* tally_counters)
+static void RTL8139TallyCounters_dma_write(RTL8139State *s, dma_addr_t tc_addr)
 {
+    RTL8139TallyCounters *tally_counters = &s->tally_counters;
     uint16_t val16;
     uint32_t val32;
     uint64_t val64;
 
     val64 = cpu_to_le64(tally_counters->TxOk);
-    cpu_physical_memory_write(tc_addr + 0,    (uint8_t *)&val64, 8);
+    pci_dma_write(&s->dev, tc_addr + 0,     (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOk);
-    cpu_physical_memory_write(tc_addr + 8,    (uint8_t *)&val64, 8);
+    pci_dma_write(&s->dev, tc_addr + 8,     (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->TxERR);
-    cpu_physical_memory_write(tc_addr + 16,    (uint8_t *)&val64, 8);
+    pci_dma_write(&s->dev, tc_addr + 16,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxERR);
-    cpu_physical_memory_write(tc_addr + 24,    (uint8_t *)&val32, 4);
+    pci_dma_write(&s->dev, tc_addr + 24,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->MissPkt);
-    cpu_physical_memory_write(tc_addr + 28,    (uint8_t *)&val16, 2);
+    pci_dma_write(&s->dev, tc_addr + 28,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->FAE);
-    cpu_physical_memory_write(tc_addr + 30,    (uint8_t *)&val16, 2);
+    pci_dma_write(&s->dev, tc_addr + 30,    (uint8_t *)&val16, 2);
 
     val32 = cpu_to_le32(tally_counters->Tx1Col);
-    cpu_physical_memory_write(tc_addr + 32,    (uint8_t *)&val32, 4);
+    pci_dma_write(&s->dev, tc_addr + 32,    (uint8_t *)&val32, 4);
 
     val32 = cpu_to_le32(tally_counters->TxMCol);
-    cpu_physical_memory_write(tc_addr + 36,    (uint8_t *)&val32, 4);
+    pci_dma_write(&s->dev, tc_addr + 36,    (uint8_t *)&val32, 4);
 
     val64 = cpu_to_le64(tally_counters->RxOkPhy);
-    cpu_physical_memory_write(tc_addr + 40,    (uint8_t *)&val64, 8);
+    pci_dma_write(&s->dev, tc_addr + 40,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOkBrd);
-    cpu_physical_memory_write(tc_addr + 48,    (uint8_t *)&val64, 8);
+    pci_dma_write(&s->dev, tc_addr + 48,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxOkMul);
-    cpu_physical_memory_write(tc_addr + 56,    (uint8_t *)&val32, 4);
+    pci_dma_write(&s->dev, tc_addr + 56,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->TxAbt);
-    cpu_physical_memory_write(tc_addr + 60,    (uint8_t *)&val16, 2);
+    pci_dma_write(&s->dev, tc_addr + 60,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->TxUndrn);
-    cpu_physical_memory_write(tc_addr + 62,    (uint8_t *)&val16, 2);
+    pci_dma_write(&s->dev, tc_addr + 62,    (uint8_t *)&val16, 2);
 }
 
 /* Loads values of tally counters from VM state file */
@@ -1842,7 +1844,7 @@ static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
     DPRINTF("+++ transmit reading %d bytes from host memory at 0x%08x\n",
         txsize, s->TxAddr[descriptor]);
 
-    cpu_physical_memory_read(s->TxAddr[descriptor], txbuffer, txsize);
+    pci_dma_read(&s->dev, s->TxAddr[descriptor], txbuffer, txsize);
 
     /* Mark descriptor as transferred */
     s->TxStatus[descriptor] |= TxHostOwns;
@@ -1963,8 +1965,7 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     int descriptor = s->currCPlusTxDesc;
 
-    target_phys_addr_t cplus_tx_ring_desc =
-        rtl8139_addr64(s->TxAddr[0], s->TxAddr[1]);
+    dma_addr_t cplus_tx_ring_desc = rtl8139_addr64(s->TxAddr[0], s->TxAddr[1]);
 
     /* Normal priority ring */
     cplus_tx_ring_desc += 16 * descriptor;
@@ -1975,13 +1976,13 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     uint32_t val, txdw0,txdw1,txbufLO,txbufHI;
 
-    cpu_physical_memory_read(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    pci_dma_read(&s->dev, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     txdw0 = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
+    pci_dma_read(&s->dev, cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
     txdw1 = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
+    pci_dma_read(&s->dev, cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
     txbufLO = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
+    pci_dma_read(&s->dev, cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
     txbufHI = le32_to_cpu(val);
 
     DPRINTF("+++ C+ mode TX descriptor %d %08x %08x %08x %08x\n", descriptor,
@@ -2047,7 +2048,7 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
     }
 
     int txsize = txdw0 & CP_TX_BUFFER_SIZE_MASK;
-    target_phys_addr_t tx_addr = rtl8139_addr64(txbufLO, txbufHI);
+    dma_addr_t tx_addr = rtl8139_addr64(txbufLO, txbufHI);
 
     /* make sure we have enough space to assemble the packet */
     if (!s->cplus_txbuffer)
@@ -2089,7 +2090,8 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
         TARGET_FMT_plx" to offset %d\n", txsize, tx_addr,
         s->cplus_txbuffer_offset);
 
-    cpu_physical_memory_read(tx_addr, s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
+    pci_dma_read(&s->dev, tx_addr,
+                 s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
     s->cplus_txbuffer_offset += txsize;
 
     /* seek to next Rx descriptor */
@@ -2116,7 +2118,7 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     /* update ring data */
     val = cpu_to_le32(txdw0);
-    cpu_physical_memory_write(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    pci_dma_write(&s->dev, cplus_tx_ring_desc, (uint8_t *)&val, 4);
 
     /* Now decide if descriptor being processed is holding the last segment of packet */
     if (txdw0 & CP_TX_LS)
@@ -2475,7 +2477,7 @@ static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32
             target_phys_addr_t tc_addr = rtl8139_addr64(s->TxStatus[0] & ~0x3f, s->TxStatus[1]);
 
             /* dump tally counters to specified memory location */
-            RTL8139TallyCounters_physical_memory_write( tc_addr, &s->tally_counters);
+            RTL8139TallyCounters_dma_write(s, tc_addr);
 
             /* mark dump completed */
             s->TxStatus[0] &= ~0x8;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 3/9] eepro100: Use PCI DMA stub functions
  2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA David Gibson
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 2/9] rtl8139: Use PCI DMA stub functions David Gibson
@ 2011-09-05  4:34 ` David Gibson
  2011-10-02 11:45   ` Michael S. Tsirkin
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 4/9] ac97: " David Gibson
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: David Gibson @ 2011-09-05  4:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, kraxel, mst, joerg.roedel, agraf, avi, eduard.munteanu,
	rth

From: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

This updates the eepro100 device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/eepro100.c |  113 +++++++++++++++++++++------------------------------------
 1 files changed, 41 insertions(+), 72 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 4e3c52f..9f57371 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -46,6 +46,7 @@
 #include "net.h"
 #include "eeprom93xx.h"
 #include "sysemu.h"
+#include "dma.h"
 
 /* QEMU sends frames smaller than 60 bytes to ethernet nics.
  * Such frames are rejected by real nics and their emulations.
@@ -315,38 +316,6 @@ static const uint16_t eepro100_mdi_mask[] = {
     0xffff, 0xffff, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
 };
 
-/* Read a 16 bit little endian value from physical memory. */
-static uint16_t e100_ldw_le_phys(target_phys_addr_t addr)
-{
-    /* Load 16 bit (little endian) word from emulated hardware. */
-    uint16_t val;
-    cpu_physical_memory_read(addr, &val, sizeof(val));
-    return le16_to_cpu(val);
-}
-
-/* Read a 32 bit little endian value from physical memory. */
-static uint32_t e100_ldl_le_phys(target_phys_addr_t addr)
-{
-    /* Load 32 bit (little endian) word from emulated hardware. */
-    uint32_t val;
-    cpu_physical_memory_read(addr, &val, sizeof(val));
-    return le32_to_cpu(val);
-}
-
-/* Write a 16 bit little endian value to physical memory. */
-static void e100_stw_le_phys(target_phys_addr_t addr, uint16_t val)
-{
-    val = cpu_to_le16(val);
-    cpu_physical_memory_write(addr, &val, sizeof(val));
-}
-
-/* Write a 32 bit little endian value to physical memory. */
-static void e100_stl_le_phys(target_phys_addr_t addr, uint32_t val)
-{
-    val = cpu_to_le32(val);
-    cpu_physical_memory_write(addr, &val, sizeof(val));
-}
-
 #define POLYNOMIAL 0x04c11db6
 
 /* From FreeBSD */
@@ -744,21 +713,22 @@ static void dump_statistics(EEPRO100State * s)
      * values which really matter.
      * Number of data should check configuration!!!
      */
-    cpu_physical_memory_write(s->statsaddr, &s->statistics, s->stats_size);
-    e100_stl_le_phys(s->statsaddr + 0, s->statistics.tx_good_frames);
-    e100_stl_le_phys(s->statsaddr + 36, s->statistics.rx_good_frames);
-    e100_stl_le_phys(s->statsaddr + 48, s->statistics.rx_resource_errors);
-    e100_stl_le_phys(s->statsaddr + 60, s->statistics.rx_short_frame_errors);
+    pci_dma_write(&s->dev, s->statsaddr,
+                  (uint8_t *) & s->statistics, s->stats_size);
+    stl_pci_dma(&s->dev, s->statsaddr + 0, s->statistics.tx_good_frames);
+    stl_pci_dma(&s->dev, s->statsaddr + 36, s->statistics.rx_good_frames);
+    stl_pci_dma(&s->dev, s->statsaddr + 48, s->statistics.rx_resource_errors);
+    stl_pci_dma(&s->dev, s->statsaddr + 60, s->statistics.rx_short_frame_errors);
 #if 0
-    e100_stw_le_phys(s->statsaddr + 76, s->statistics.xmt_tco_frames);
-    e100_stw_le_phys(s->statsaddr + 78, s->statistics.rcv_tco_frames);
+    stw_pci_dma(&s->dev, s->statsaddr + 76, s->statistics.xmt_tco_frames);
+    stw_pci_dma(&s->dev, s->statsaddr + 78, s->statistics.rcv_tco_frames);
     missing("CU dump statistical counters");
 #endif
 }
 
 static void read_cb(EEPRO100State *s)
 {
-    cpu_physical_memory_read(s->cb_address, &s->tx, sizeof(s->tx));
+    pci_dma_read(&s->dev, s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
     s->tx.status = le16_to_cpu(s->tx.status);
     s->tx.command = le16_to_cpu(s->tx.command);
     s->tx.link = le32_to_cpu(s->tx.link);
@@ -788,18 +758,17 @@ static void tx_command(EEPRO100State *s)
     }
     assert(tcb_bytes <= sizeof(buf));
     while (size < tcb_bytes) {
-        uint32_t tx_buffer_address = e100_ldl_le_phys(tbd_address);
-        uint16_t tx_buffer_size = e100_ldw_le_phys(tbd_address + 4);
+        uint32_t tx_buffer_address = ldl_pci_dma(&s->dev, tbd_address);
+        uint16_t tx_buffer_size = lduw_pci_dma(&s->dev, tbd_address + 4);
 #if 0
-        uint16_t tx_buffer_el = e100_ldw_le_phys(tbd_address + 6);
+        uint16_t tx_buffer_el = lduw_pci_dma(&s->dev, tbd_address + 6);
 #endif
         tbd_address += 8;
         TRACE(RXTX, logout
             ("TBD (simplified mode): buffer address 0x%08x, size 0x%04x\n",
              tx_buffer_address, tx_buffer_size));
         tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-        cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                 tx_buffer_size);
+        pci_dma_read(&s->dev, tx_buffer_address, &buf[size], tx_buffer_size);
         size += tx_buffer_size;
     }
     if (tbd_array == 0xffffffff) {
@@ -810,16 +779,16 @@ static void tx_command(EEPRO100State *s)
         if (s->has_extended_tcb_support && !(s->configuration[6] & BIT(4))) {
             /* Extended Flexible TCB. */
             for (; tbd_count < 2; tbd_count++) {
-                uint32_t tx_buffer_address = e100_ldl_le_phys(tbd_address);
-                uint16_t tx_buffer_size = e100_ldw_le_phys(tbd_address + 4);
-                uint16_t tx_buffer_el = e100_ldw_le_phys(tbd_address + 6);
+                uint32_t tx_buffer_address = ldl_pci_dma(&s->dev, tbd_address);
+                uint16_t tx_buffer_size = lduw_pci_dma(&s->dev, tbd_address + 4);
+                uint16_t tx_buffer_el = lduw_pci_dma(&s->dev, tbd_address + 6);
                 tbd_address += 8;
                 TRACE(RXTX, logout
                     ("TBD (extended flexible mode): buffer address 0x%08x, size 0x%04x\n",
                      tx_buffer_address, tx_buffer_size));
                 tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-                cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                         tx_buffer_size);
+                pci_dma_read(&s->dev, tx_buffer_address,
+                             &buf[size], tx_buffer_size);
                 size += tx_buffer_size;
                 if (tx_buffer_el & 1) {
                     break;
@@ -828,16 +797,16 @@ static void tx_command(EEPRO100State *s)
         }
         tbd_address = tbd_array;
         for (; tbd_count < s->tx.tbd_count; tbd_count++) {
-            uint32_t tx_buffer_address = e100_ldl_le_phys(tbd_address);
-            uint16_t tx_buffer_size = e100_ldw_le_phys(tbd_address + 4);
-            uint16_t tx_buffer_el = e100_ldw_le_phys(tbd_address + 6);
+            uint32_t tx_buffer_address = ldl_pci_dma(&s->dev, tbd_address);
+            uint16_t tx_buffer_size = lduw_pci_dma(&s->dev, tbd_address + 4);
+            uint16_t tx_buffer_el = lduw_pci_dma(&s->dev, tbd_address + 6);
             tbd_address += 8;
             TRACE(RXTX, logout
                 ("TBD (flexible mode): buffer address 0x%08x, size 0x%04x\n",
                  tx_buffer_address, tx_buffer_size));
             tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-            cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                     tx_buffer_size);
+            pci_dma_read(&s->dev, tx_buffer_address,
+                         &buf[size], tx_buffer_size);
             size += tx_buffer_size;
             if (tx_buffer_el & 1) {
                 break;
@@ -862,7 +831,7 @@ static void set_multicast_list(EEPRO100State *s)
     TRACE(OTHER, logout("multicast list, multicast count = %u\n", multicast_count));
     for (i = 0; i < multicast_count; i += 6) {
         uint8_t multicast_addr[6];
-        cpu_physical_memory_read(s->cb_address + 10 + i, multicast_addr, 6);
+        pci_dma_read(&s->dev, s->cb_address + 10 + i, multicast_addr, 6);
         TRACE(OTHER, logout("multicast entry %s\n", nic_dump(multicast_addr, 6)));
         unsigned mcast_idx = compute_mcast_idx(multicast_addr);
         assert(mcast_idx < 64);
@@ -896,12 +865,12 @@ static void action_command(EEPRO100State *s)
             /* Do nothing. */
             break;
         case CmdIASetup:
-            cpu_physical_memory_read(s->cb_address + 8, &s->conf.macaddr.a[0], 6);
+            pci_dma_read(&s->dev, s->cb_address + 8, &s->conf.macaddr.a[0], 6);
             TRACE(OTHER, logout("macaddr: %s\n", nic_dump(&s->conf.macaddr.a[0], 6)));
             break;
         case CmdConfigure:
-            cpu_physical_memory_read(s->cb_address + 8, &s->configuration[0],
-                                     sizeof(s->configuration));
+            pci_dma_read(&s->dev, s->cb_address + 8,
+                         &s->configuration[0], sizeof(s->configuration));
             TRACE(OTHER, logout("configuration: %s\n",
                                 nic_dump(&s->configuration[0], 16)));
             TRACE(OTHER, logout("configuration: %s\n",
@@ -938,7 +907,7 @@ static void action_command(EEPRO100State *s)
             break;
         }
         /* Write new status. */
-        e100_stw_le_phys(s->cb_address, s->tx.status | ok_status | STATUS_C);
+        stw_pci_dma(&s->dev, s->cb_address, s->tx.status | ok_status | STATUS_C);
         if (bit_i) {
             /* CU completed action. */
             eepro100_cx_interrupt(s);
@@ -1005,7 +974,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats)\n", val));
         dump_statistics(s);
-        e100_stl_le_phys(s->statsaddr + s->stats_size, 0xa005);
+        stl_pci_dma(&s->dev, s->statsaddr + s->stats_size, 0xa005);
         break;
     case CU_CMD_BASE:
         /* Load CU base. */
@@ -1016,7 +985,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump and reset statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats and reset)\n", val));
         dump_statistics(s);
-        e100_stl_le_phys(s->statsaddr + s->stats_size, 0xa007);
+        stl_pci_dma(&s->dev, s->statsaddr + s->stats_size, 0xa007);
         memset(&s->statistics, 0, sizeof(s->statistics));
         break;
     case CU_SRESUME:
@@ -1310,10 +1279,10 @@ static void eepro100_write_port(EEPRO100State *s)
     case PORT_SELFTEST:
         TRACE(OTHER, logout("selftest address=0x%08x\n", address));
         eepro100_selftest_t data;
-        cpu_physical_memory_read(address, &data, sizeof(data));
+        pci_dma_read(&s->dev, address, (uint8_t *) & data, sizeof(data));
         data.st_sign = 0xffffffff;
         data.st_result = 0;
-        cpu_physical_memory_write(address, &data, sizeof(data));
+        pci_dma_write(&s->dev, address, (uint8_t *) & data, sizeof(data));
         break;
     case PORT_SELECTIVE_RESET:
         TRACE(OTHER, logout("selective reset, selftest address=0x%08x\n", address));
@@ -1729,8 +1698,8 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     /* !!! */
     eepro100_rx_t rx;
-    cpu_physical_memory_read(s->ru_base + s->ru_offset, &rx,
-                             sizeof(eepro100_rx_t));
+    pci_dma_read(&s->dev, s->ru_base + s->ru_offset,
+                 (uint8_t *) &rx, sizeof(eepro100_rx_t));
     uint16_t rfd_command = le16_to_cpu(rx.command);
     uint16_t rfd_size = le16_to_cpu(rx.size);
 
@@ -1746,10 +1715,10 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
 #endif
     TRACE(OTHER, logout("command 0x%04x, link 0x%08x, addr 0x%08x, size %u\n",
           rfd_command, rx.link, rx.rx_buf_addr, rfd_size));
-    e100_stw_le_phys(s->ru_base + s->ru_offset +
-                     offsetof(eepro100_rx_t, status), rfd_status);
-    e100_stw_le_phys(s->ru_base + s->ru_offset +
-                     offsetof(eepro100_rx_t, count), size);
+    stw_pci_dma(&s->dev, s->ru_base + s->ru_offset +
+                offsetof(eepro100_rx_t, status), rfd_status);
+    stw_pci_dma(&s->dev, s->ru_base + s->ru_offset +
+                offsetof(eepro100_rx_t, count), size);
     /* Early receive interrupt not supported. */
 #if 0
     eepro100_er_interrupt(s);
@@ -1763,8 +1732,8 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
 #if 0
     assert(!(s->configuration[17] & BIT(0)));
 #endif
-    cpu_physical_memory_write(s->ru_base + s->ru_offset +
-                              sizeof(eepro100_rx_t), buf, size);
+    pci_dma_write(&s->dev, s->ru_base + s->ru_offset +
+                  sizeof(eepro100_rx_t), buf, size);
     s->statistics.rx_good_frames++;
     eepro100_fr_interrupt(s);
     s->ru_offset = le32_to_cpu(rx.link);
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 4/9] ac97: Use PCI DMA stub functions
  2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
                   ` (2 preceding siblings ...)
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 3/9] eepro100: " David Gibson
@ 2011-09-05  4:34 ` David Gibson
  2011-09-05  4:35 ` [Qemu-devel] [PATCH 5/9] es1370: " David Gibson
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-09-05  4:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, kraxel, mst, joerg.roedel, agraf, avi, eduard.munteanu,
	rth

From: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

This updates the ac97 device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ac97.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index 541d9a4..14bdbcd 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -18,6 +18,7 @@
 #include "audiodev.h"
 #include "audio/audio.h"
 #include "pci.h"
+#include "dma.h"
 
 enum {
     AC97_Reset                     = 0x00,
@@ -224,7 +225,7 @@ static void fetch_bd (AC97LinkState *s, AC97BusMasterRegs *r)
 {
     uint8_t b[8];
 
-    cpu_physical_memory_read (r->bdbar + r->civ * 8, b, 8);
+    pci_dma_read (&s->dev, r->bdbar + r->civ * 8, b, 8);
     r->bd_valid = 1;
     r->bd.addr = le32_to_cpu (*(uint32_t *) &b[0]) & ~3;
     r->bd.ctl_len = le32_to_cpu (*(uint32_t *) &b[4]);
@@ -973,7 +974,7 @@ static int write_audio (AC97LinkState *s, AC97BusMasterRegs *r,
     while (temp) {
         int copied;
         to_copy = audio_MIN (temp, sizeof (tmpbuf));
-        cpu_physical_memory_read (addr, tmpbuf, to_copy);
+        pci_dma_read (&s->dev, addr, tmpbuf, to_copy);
         copied = AUD_write (s->voice_po, tmpbuf, to_copy);
         dolog ("write_audio max=%x to_copy=%x copied=%x\n",
                max, to_copy, copied);
@@ -1054,7 +1055,7 @@ static int read_audio (AC97LinkState *s, AC97BusMasterRegs *r,
             *stop = 1;
             break;
         }
-        cpu_physical_memory_write (addr, tmpbuf, acquired);
+        pci_dma_write (&s->dev, addr, tmpbuf, acquired);
         temp -= acquired;
         addr += acquired;
         nread += acquired;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 5/9] es1370: Use PCI DMA stub functions
  2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
                   ` (3 preceding siblings ...)
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 4/9] ac97: " David Gibson
@ 2011-09-05  4:35 ` David Gibson
  2011-09-05  4:35 ` [Qemu-devel] [PATCH 6/9] e1000: " David Gibson
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-09-05  4:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, kraxel, mst, joerg.roedel, agraf, avi, eduard.munteanu,
	rth

From: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

This updates the es1370 device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/es1370.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/es1370.c b/hw/es1370.c
index a9387d1..fbbb94f 100644
--- a/hw/es1370.c
+++ b/hw/es1370.c
@@ -30,6 +30,7 @@
 #include "audiodev.h"
 #include "audio/audio.h"
 #include "pci.h"
+#include "dma.h"
 
 /* Missing stuff:
    SCTRL_P[12](END|ST)INC
@@ -802,7 +803,7 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
             if (!acquired)
                 break;
 
-            cpu_physical_memory_write (addr, tmpbuf, acquired);
+            pci_dma_write (&s->dev, addr, tmpbuf, acquired);
 
             temp -= acquired;
             addr += acquired;
@@ -816,7 +817,7 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
             int copied, to_copy;
 
             to_copy = audio_MIN ((size_t) temp, sizeof (tmpbuf));
-            cpu_physical_memory_read (addr, tmpbuf, to_copy);
+            pci_dma_read (&s->dev, addr, tmpbuf, to_copy);
             copied = AUD_write (voice, tmpbuf, to_copy);
             if (!copied)
                 break;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 6/9] e1000: Use PCI DMA stub functions
  2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
                   ` (4 preceding siblings ...)
  2011-09-05  4:35 ` [Qemu-devel] [PATCH 5/9] es1370: " David Gibson
@ 2011-09-05  4:35 ` David Gibson
  2011-09-05  4:35 ` [Qemu-devel] [PATCH 7/9] lsi53c895a: " David Gibson
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-09-05  4:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, kraxel, mst, joerg.roedel, agraf, avi, eduard.munteanu,
	rth

From: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

This updates the e1000 device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/e1000.c |   29 +++++++++++++++--------------
 1 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index a6d12c5..b782b49 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -31,6 +31,7 @@
 #include "net/checksum.h"
 #include "loader.h"
 #include "sysemu.h"
+#include "dma.h"
 
 #include "e1000_hw.h"
 
@@ -465,7 +466,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
             bytes = split_size;
             if (tp->size + bytes > msh)
                 bytes = msh - tp->size;
-            cpu_physical_memory_read(addr, tp->data + tp->size, bytes);
+            pci_dma_read(&s->dev, addr, tp->data + tp->size, bytes);
             if ((sz = tp->size + bytes) >= hdr && tp->size < hdr)
                 memmove(tp->header, tp->data, hdr);
             tp->size = sz;
@@ -480,7 +481,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
         // context descriptor TSE is not set, while data descriptor TSE is set
         DBGOUT(TXERR, "TCP segmentaion Error\n");
     } else {
-        cpu_physical_memory_read(addr, tp->data + tp->size, split_size);
+        pci_dma_read(&s->dev, addr, tp->data + tp->size, split_size);
         tp->size += split_size;
     }
 
@@ -496,7 +497,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
 }
 
 static uint32_t
-txdesc_writeback(target_phys_addr_t base, struct e1000_tx_desc *dp)
+txdesc_writeback(E1000State *s, dma_addr_t base, struct e1000_tx_desc *dp)
 {
     uint32_t txd_upper, txd_lower = le32_to_cpu(dp->lower.data);
 
@@ -505,8 +506,8 @@ txdesc_writeback(target_phys_addr_t base, struct e1000_tx_desc *dp)
     txd_upper = (le32_to_cpu(dp->upper.data) | E1000_TXD_STAT_DD) &
                 ~(E1000_TXD_STAT_EC | E1000_TXD_STAT_LC | E1000_TXD_STAT_TU);
     dp->upper.data = cpu_to_le32(txd_upper);
-    cpu_physical_memory_write(base + ((char *)&dp->upper - (char *)dp),
-                              (void *)&dp->upper, sizeof(dp->upper));
+    pci_dma_write(&s->dev, base + ((char *)&dp->upper - (char *)dp),
+                  (void *)&dp->upper, sizeof(dp->upper));
     return E1000_ICR_TXDW;
 }
 
@@ -521,7 +522,7 @@ static uint64_t tx_desc_base(E1000State *s)
 static void
 start_xmit(E1000State *s)
 {
-    target_phys_addr_t base;
+    dma_addr_t base;
     struct e1000_tx_desc desc;
     uint32_t tdh_start = s->mac_reg[TDH], cause = E1000_ICS_TXQE;
 
@@ -533,14 +534,14 @@ start_xmit(E1000State *s)
     while (s->mac_reg[TDH] != s->mac_reg[TDT]) {
         base = tx_desc_base(s) +
                sizeof(struct e1000_tx_desc) * s->mac_reg[TDH];
-        cpu_physical_memory_read(base, (void *)&desc, sizeof(desc));
+        pci_dma_read(&s->dev, base, (void *)&desc, sizeof(desc));
 
         DBGOUT(TX, "index %d: %p : %x %x\n", s->mac_reg[TDH],
                (void *)(intptr_t)desc.buffer_addr, desc.lower.data,
                desc.upper.data);
 
         process_tx_desc(s, &desc);
-        cause |= txdesc_writeback(base, &desc);
+        cause |= txdesc_writeback(s, base, &desc);
 
         if (++s->mac_reg[TDH] * sizeof(desc) >= s->mac_reg[TDLEN])
             s->mac_reg[TDH] = 0;
@@ -668,7 +669,7 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
 {
     E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
     struct e1000_rx_desc desc;
-    target_phys_addr_t base;
+    dma_addr_t base;
     unsigned int n, rdt;
     uint32_t rdh_start;
     uint16_t vlan_special = 0;
@@ -713,7 +714,7 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
             desc_size = s->rxbuf_size;
         }
         base = rx_desc_base(s) + sizeof(desc) * s->mac_reg[RDH];
-        cpu_physical_memory_read(base, (void *)&desc, sizeof(desc));
+        pci_dma_read(&s->dev, base, (void *)&desc, sizeof(desc));
         desc.special = vlan_special;
         desc.status |= (vlan_status | E1000_RXD_STAT_DD);
         if (desc.buffer_addr) {
@@ -722,9 +723,9 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
                 if (copy_size > s->rxbuf_size) {
                     copy_size = s->rxbuf_size;
                 }
-                cpu_physical_memory_write(le64_to_cpu(desc.buffer_addr),
-                                          (void *)(buf + desc_offset + vlan_offset),
-                                          copy_size);
+                pci_dma_write(&s->dev, le64_to_cpu(desc.buffer_addr),
+                                 (void *)(buf + desc_offset + vlan_offset),
+                                 copy_size);
             }
             desc_offset += desc_size;
             desc.length = cpu_to_le16(desc_size);
@@ -738,7 +739,7 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
         } else { // as per intel docs; skip descriptors with null buf addr
             DBGOUT(RX, "Null RX descriptor!!\n");
         }
-        cpu_physical_memory_write(base, (void *)&desc, sizeof(desc));
+        pci_dma_write(&s->dev, base, (void *)&desc, sizeof(desc));
 
         if (++s->mac_reg[RDH] * sizeof(desc) >= s->mac_reg[RDLEN])
             s->mac_reg[RDH] = 0;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 7/9] lsi53c895a: Use PCI DMA stub functions
  2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
                   ` (5 preceding siblings ...)
  2011-09-05  4:35 ` [Qemu-devel] [PATCH 6/9] e1000: " David Gibson
@ 2011-09-05  4:35 ` David Gibson
  2011-09-05  4:35 ` [Qemu-devel] [PATCH 8/9] pcnet-pci: " David Gibson
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-09-05  4:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, kraxel, mst, joerg.roedel, agraf, avi, eduard.munteanu,
	rth

From: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

This updates the lsi53c895a device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/lsi53c895a.c |   30 ++++++++++++++----------------
 1 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 1643a63..f1f6800 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -16,6 +16,7 @@
 #include "pci.h"
 #include "scsi.h"
 #include "block_int.h"
+#include "dma.h"
 
 //#define DEBUG_LSI
 //#define DEBUG_LSI_REG
@@ -391,10 +392,7 @@ static inline uint32_t read_dword(LSIState *s, uint32_t addr)
 {
     uint32_t buf;
 
-    /* XXX: an optimization here used to fast-path the read from scripts
-     * memory.  But that bypasses any iommu.
-     */
-    cpu_physical_memory_read(addr, (uint8_t *)&buf, 4);
+    pci_dma_read(&s->dev, addr, (uint8_t *)&buf, 4);
     return cpu_to_le32(buf);
 }
 
@@ -533,7 +531,7 @@ static void lsi_bad_selection(LSIState *s, uint32_t id)
 static void lsi_do_dma(LSIState *s, int out)
 {
     uint32_t count, id;
-    target_phys_addr_t addr;
+    dma_addr_t addr;
     SCSIDevice *dev;
 
     assert(s->current);
@@ -572,9 +570,9 @@ static void lsi_do_dma(LSIState *s, int out)
     }
     /* ??? Set SFBR to first data byte.  */
     if (out) {
-        cpu_physical_memory_read(addr, s->current->dma_buf, count);
+        pci_dma_read(&s->dev, addr, s->current->dma_buf, count);
     } else {
-        cpu_physical_memory_write(addr, s->current->dma_buf, count);
+        pci_dma_write(&s->dev, addr, s->current->dma_buf, count);
     }
     s->current->dma_len -= count;
     if (s->current->dma_len == 0) {
@@ -767,7 +765,7 @@ static void lsi_do_command(LSIState *s)
     DPRINTF("Send command len=%d\n", s->dbc);
     if (s->dbc > 16)
         s->dbc = 16;
-    cpu_physical_memory_read(s->dnad, buf, s->dbc);
+    pci_dma_read(&s->dev, s->dnad, buf, s->dbc);
     s->sfbr = buf[0];
     s->command_complete = 0;
 
@@ -818,7 +816,7 @@ static void lsi_do_status(LSIState *s)
     s->dbc = 1;
     status = s->status;
     s->sfbr = status;
-    cpu_physical_memory_write(s->dnad, &status, 1);
+    pci_dma_write(&s->dev, s->dnad, &status, 1);
     lsi_set_phase(s, PHASE_MI);
     s->msg_action = 1;
     lsi_add_msg_byte(s, 0); /* COMMAND COMPLETE */
@@ -832,7 +830,7 @@ static void lsi_do_msgin(LSIState *s)
     len = s->msg_len;
     if (len > s->dbc)
         len = s->dbc;
-    cpu_physical_memory_write(s->dnad, s->msg, len);
+    pci_dma_write(&s->dev, s->dnad, s->msg, len);
     /* Linux drivers rely on the last byte being in the SIDL.  */
     s->sidl = s->msg[len - 1];
     s->msg_len -= len;
@@ -864,7 +862,7 @@ static void lsi_do_msgin(LSIState *s)
 static uint8_t lsi_get_msgbyte(LSIState *s)
 {
     uint8_t data;
-    cpu_physical_memory_read(s->dnad, &data, 1);
+    pci_dma_read(&s->dev, s->dnad, &data, 1);
     s->dnad++;
     s->dbc--;
     return data;
@@ -1019,8 +1017,8 @@ static void lsi_memcpy(LSIState *s, uint32_t dest, uint32_t src, int count)
     DPRINTF("memcpy dest 0x%08x src 0x%08x count %d\n", dest, src, count);
     while (count) {
         n = (count > LSI_BUF_SIZE) ? LSI_BUF_SIZE : count;
-        cpu_physical_memory_read(src, buf, n);
-        cpu_physical_memory_write(dest, buf, n);
+        pci_dma_read(&s->dev, src, buf, n);
+        pci_dma_write(&s->dev, dest, buf, n);
         src += n;
         dest += n;
         count -= n;
@@ -1088,7 +1086,7 @@ again:
 
             /* 32-bit Table indirect */
             offset = sxt24(addr);
-            cpu_physical_memory_read(s->dsa + offset, (uint8_t *)buf, 8);
+            pci_dma_read(&s->dev, s->dsa + offset, (uint8_t *)buf, 8);
             /* byte count is stored in bits 0:23 only */
             s->dbc = cpu_to_le32(buf[0]) & 0xffffff;
             s->rbc = s->dbc;
@@ -1447,7 +1445,7 @@ again:
             n = (insn & 7);
             reg = (insn >> 16) & 0xff;
             if (insn & (1 << 24)) {
-                cpu_physical_memory_read(addr, data, n);
+                pci_dma_read(&s->dev, addr, data, n);
                 DPRINTF("Load reg 0x%x size %d addr 0x%08x = %08x\n", reg, n,
                         addr, *(int *)data);
                 for (i = 0; i < n; i++) {
@@ -1458,7 +1456,7 @@ again:
                 for (i = 0; i < n; i++) {
                     data[i] = lsi_reg_readb(s, reg + i);
                 }
-                cpu_physical_memory_write(addr, data, n);
+                pci_dma_write(&s->dev, addr, data, n);
             }
         }
     }
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 8/9] pcnet-pci: Use PCI DMA stub functions
  2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
                   ` (6 preceding siblings ...)
  2011-09-05  4:35 ` [Qemu-devel] [PATCH 7/9] lsi53c895a: " David Gibson
@ 2011-09-05  4:35 ` David Gibson
  2011-09-05  4:35 ` [Qemu-devel] [PATCH 9/9] intel-hda: " David Gibson
  2011-09-16  6:58 ` [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
  9 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-09-05  4:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, kraxel, mst, joerg.roedel, agraf, avi, eduard.munteanu,
	rth

From: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

This updates the pcnet-pci device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/pcnet-pci.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
index 51e1320..40403a9 100644
--- a/hw/pcnet-pci.c
+++ b/hw/pcnet-pci.c
@@ -31,6 +31,7 @@
 #include "net.h"
 #include "loader.h"
 #include "qemu-timer.h"
+#include "dma.h"
 
 #include "pcnet.h"
 
@@ -230,13 +231,13 @@ static const MemoryRegionOps pcnet_mmio_ops = {
 static void pci_physical_memory_write(void *dma_opaque, target_phys_addr_t addr,
                                       uint8_t *buf, int len, int do_bswap)
 {
-    cpu_physical_memory_write(addr, buf, len);
+    pci_dma_write(dma_opaque, addr, buf, len);
 }
 
 static void pci_physical_memory_read(void *dma_opaque, target_phys_addr_t addr,
                                      uint8_t *buf, int len, int do_bswap)
 {
-    cpu_physical_memory_read(addr, buf, len);
+    pci_dma_read(dma_opaque, addr, buf, len);
 }
 
 static void pci_pcnet_cleanup(VLANClientState *nc)
@@ -302,6 +303,7 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
     s->irq = pci_dev->irq[0];
     s->phys_mem_read = pci_physical_memory_read;
     s->phys_mem_write = pci_physical_memory_write;
+    s->dma_opaque = pci_dev;
 
     if (!pci_dev->qdev.hotplugged) {
         static int loaded = 0;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 9/9] intel-hda: Use PCI DMA stub functions
  2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
                   ` (7 preceding siblings ...)
  2011-09-05  4:35 ` [Qemu-devel] [PATCH 8/9] pcnet-pci: " David Gibson
@ 2011-09-05  4:35 ` David Gibson
  2011-09-16  6:58 ` [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
  9 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-09-05  4:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, kraxel, mst, joerg.roedel, agraf, avi, eduard.munteanu,
	rth

This updates the intel-hda device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/intel-hda.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/intel-hda.c b/hw/intel-hda.c
index 4272204..94937cf 100644
--- a/hw/intel-hda.c
+++ b/hw/intel-hda.c
@@ -24,6 +24,7 @@
 #include "audiodev.h"
 #include "intel-hda.h"
 #include "intel-hda-defs.h"
+#include "dma.h"
 
 /* --------------------------------------------------------------------- */
 /* hda bus                                                               */
@@ -328,7 +329,7 @@ static void intel_hda_corb_run(IntelHDAState *d)
 
         rp = (d->corb_rp + 1) & 0xff;
         addr = intel_hda_addr(d->corb_lbase, d->corb_ubase);
-        verb = ldl_le_phys(addr + 4*rp);
+        verb = ldl_pci_dma(&d->pci, addr + 4*rp);
         d->corb_rp = rp;
 
         dprint(d, 2, "%s: [rp 0x%x] verb 0x%08x\n", __FUNCTION__, rp, verb);
@@ -360,8 +361,8 @@ static void intel_hda_response(HDACodecDevice *dev, bool solicited, uint32_t res
     ex = (solicited ? 0 : (1 << 4)) | dev->cad;
     wp = (d->rirb_wp + 1) & 0xff;
     addr = intel_hda_addr(d->rirb_lbase, d->rirb_ubase);
-    stl_le_phys(addr + 8*wp, response);
-    stl_le_phys(addr + 8*wp + 4, ex);
+    stl_pci_dma(&d->pci, addr + 8*wp, response);
+    stl_pci_dma(&d->pci, addr + 8*wp + 4, ex);
     d->rirb_wp = wp;
 
     dprint(d, 2, "%s: [wp 0x%x] response 0x%x, extra 0x%x\n",
@@ -425,8 +426,7 @@ static bool intel_hda_xfer(HDACodecDevice *dev, uint32_t stnr, bool output,
         dprint(d, 3, "dma: entry %d, pos %d/%d, copy %d\n",
                st->be, st->bp, st->bpl[st->be].len, copy);
 
-        cpu_physical_memory_rw(st->bpl[st->be].addr + st->bp,
-                               buf, copy, !output);
+        pci_dma_rw(&d->pci, st->bpl[st->be].addr + st->bp, buf, copy, !output);
         st->lpib += copy;
         st->bp += copy;
         buf += copy;
@@ -448,7 +448,7 @@ static bool intel_hda_xfer(HDACodecDevice *dev, uint32_t stnr, bool output,
     }
     if (d->dp_lbase & 0x01) {
         addr = intel_hda_addr(d->dp_lbase & ~0x01, d->dp_ubase);
-        stl_le_phys(addr + 8*s, st->lpib);
+        stl_pci_dma(&d->pci, addr + 8*s, st->lpib);
     }
     dprint(d, 3, "dma: --\n");
 
@@ -470,7 +470,7 @@ static void intel_hda_parse_bdl(IntelHDAState *d, IntelHDAStream *st)
     g_free(st->bpl);
     st->bpl = g_malloc(sizeof(bpl) * st->bentries);
     for (i = 0; i < st->bentries; i++, addr += 16) {
-        cpu_physical_memory_read(addr, buf, 16);
+        pci_dma_read(&d->pci, addr, buf, 16);
         st->bpl[i].addr  = le64_to_cpu(*(uint64_t *)buf);
         st->bpl[i].len   = le32_to_cpu(*(uint32_t *)(buf + 8));
         st->bpl[i].flags = le32_to_cpu(*(uint32_t *)(buf + 12));
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2)
  2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
                   ` (8 preceding siblings ...)
  2011-09-05  4:35 ` [Qemu-devel] [PATCH 9/9] intel-hda: " David Gibson
@ 2011-09-16  6:58 ` David Gibson
  9 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-09-16  6:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, mst, joerg.roedel, agraf, avi, eduard.munteanu, rth,
	kraxel

So.. there were no serious objections to the last version of this
series, AFAICT, but no response at all to this one.

Anthony, please apply?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA David Gibson
@ 2011-10-02 10:25   ` Michael S. Tsirkin
  2011-10-02 10:29     ` Avi Kivity
  2011-10-03 13:10     ` Anthony Liguori
  0 siblings, 2 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-02 10:25 UTC (permalink / raw)
  To: David Gibson
  Cc: aliguori, kraxel, joerg.roedel, qemu-devel, agraf, avi,
	eduard.munteanu, rth

On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
> This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
> present, these are just stubs which perform directly cpu physical memory
> accesses.
> 
> Using these stubs, however, distinguishes PCI device DMA transactions from
> other accesses to physical memory, which will allow PCI IOMMU support to
> be added in one place, rather than updating every PCI driver at that time.
> 
> That is, it allows us to update individual PCI drivers to support an IOMMU
> without having yet determined the details of how the IOMMU emulation will
> operate.  This will let us remove the most bitrot-sensitive part of an
> IOMMU patch in advance.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

So something I just thought about:

all wrappers now go through cpu_physical_memory_rw.
This is a problem as e.g. virtio assumes that
accesses such as stw are atomic. cpu_physical_memory_rw
is a memcpy which makes no such guarantees.

> ---
>  dma.h    |    2 ++
>  hw/pci.c |   31 +++++++++++++++++++++++++++++++
>  hw/pci.h |   33 +++++++++++++++++++++++++++++++++
>  3 files changed, 66 insertions(+), 0 deletions(-)
> 
> diff --git a/dma.h b/dma.h
> index a6db5ba..06e91cb 100644
> --- a/dma.h
> +++ b/dma.h
> @@ -15,6 +15,8 @@
>  #include "hw/hw.h"
>  #include "block.h"
>  
> +typedef target_phys_addr_t dma_addr_t;
> +
>  typedef struct {
>      target_phys_addr_t base;
>      target_phys_addr_t len;
> diff --git a/hw/pci.c b/hw/pci.c
> index 1cdcbb7..0be7611 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -2211,3 +2211,34 @@ MemoryRegion *pci_address_space(PCIDevice *dev)
>  {
>      return dev->bus->address_space_mem;
>  }
> +
> +#define PCI_DMA_DEFINE_LDST(_lname, _sname, _bits) \
> +    uint##_bits##_t ld##_lname##_pci_dma(PCIDevice *dev, dma_addr_t addr) \
> +    { \
> +        uint##_bits##_t val; \
> +        pci_dma_read(dev, addr, &val, sizeof(val)); \
> +        return le##_bits##_to_cpu(val); \
> +    } \
> +    void st##_sname##_pci_dma(PCIDevice *dev, \
> +                              dma_addr_t addr, uint##_bits##_t val) \
> +    { \
> +        val = cpu_to_le##_bits(val); \
> +        pci_dma_write(dev, addr, &val, sizeof(val)); \
> +    }
> +

I am still not 100% positive why do we do the LE conversions here.
st4_phys and friends don't seem to do it ...
Has something to do with the fact we pass a value as an array?
Probably worth a comment.

> +uint8_t ldub_pci_dma(PCIDevice *dev, dma_addr_t addr)
> +{
> +    uint8_t val;
> +
> +    pci_dma_read(dev, addr, &val, sizeof(val));
> +    return val;
> +}
> +
> +void stb_pci_dma(PCIDevice *dev, dma_addr_t addr, uint8_t val)
> +{
> +    pci_dma_write(dev, addr, &val, sizeof(val));
> +}
> +

pci_ XXX would be better names?

> +PCI_DMA_DEFINE_LDST(uw, w, 16);
> +PCI_DMA_DEFINE_LDST(l, l, 32);
> +PCI_DMA_DEFINE_LDST(q, q, 64);
> diff --git a/hw/pci.h b/hw/pci.h
> index 391217e..4426e9d 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -6,6 +6,7 @@
>  
>  #include "qdev.h"
>  #include "memory.h"
> +#include "dma.h"
>  
>  /* PCI includes legacy ISA access.  */
>  #include "isa.h"
> @@ -492,4 +493,36 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
>      return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : PCI_CONFIG_SPACE_SIZE;
>  }
>  
> +/* DMA access functions */
> +static inline int pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
> +                             void *buf, dma_addr_t len, int is_write)
> +{
> +    cpu_physical_memory_rw(addr, buf, len, is_write);
> +    return 0;
> +}
> +
> +static inline int pci_dma_read(PCIDevice *dev, dma_addr_t addr,
> +                               void *buf, dma_addr_t len)
> +{
> +    return pci_dma_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline int pci_dma_write(PCIDevice *dev, dma_addr_t addr,
> +                                const void *buf, dma_addr_t len)
> +{
> +    return pci_dma_rw(dev, addr, (void *) buf, len, 1);
> +}
> +
> +#define PCI_DMA_DECLARE_LDST(_lname, _sname, _bits) \
> +    uint##_bits##_t ld##_lname##_pci_dma(PCIDevice *dev, dma_addr_t addr); \
> +    void st##_sname##_pci_dma(PCIDevice *dev, dma_addr_t addr, \
> +                              uint##_bits##_t val);            \
> +
> +PCI_DMA_DECLARE_LDST(ub, b, 8);
> +PCI_DMA_DECLARE_LDST(uw, w, 16);
> +PCI_DMA_DECLARE_LDST(l, l, 32);
> +PCI_DMA_DECLARE_LDST(q, q, 64);
> +
> +#undef DECLARE_LDST_DMA
> +

I think macros should just create stX_phys/ldX_phys calls
directly, in the .h file. This will also make it clearer what is going on,
with less levels of indirection.



>  #endif
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 10:25   ` Michael S. Tsirkin
@ 2011-10-02 10:29     ` Avi Kivity
  2011-10-02 10:52       ` Michael S. Tsirkin
  2011-10-03 13:10     ` Anthony Liguori
  1 sibling, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2011-10-02 10:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: aliguori, joerg.roedel, qemu-devel, agraf, kraxel,
	eduard.munteanu, David Gibson, rth

On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote:
> On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
> >  This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
> >  present, these are just stubs which perform directly cpu physical memory
> >  accesses.
> >
> >  Using these stubs, however, distinguishes PCI device DMA transactions from
> >  other accesses to physical memory, which will allow PCI IOMMU support to
> >  be added in one place, rather than updating every PCI driver at that time.
> >
> >  That is, it allows us to update individual PCI drivers to support an IOMMU
> >  without having yet determined the details of how the IOMMU emulation will
> >  operate.  This will let us remove the most bitrot-sensitive part of an
> >  IOMMU patch in advance.
> >
> >  Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
>
> So something I just thought about:
>
> all wrappers now go through cpu_physical_memory_rw.
> This is a problem as e.g. virtio assumes that
> accesses such as stw are atomic. cpu_physical_memory_rw
> is a memcpy which makes no such guarantees.
>

Let's change cpu_physical_memory_rw() to provide that guarantee for 
aligned two and four byte accesses.  Having separate paths just for that 
is not maintainable.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 10:29     ` Avi Kivity
@ 2011-10-02 10:52       ` Michael S. Tsirkin
  2011-10-02 10:58         ` Avi Kivity
  2011-10-12  3:11         ` David Gibson
  0 siblings, 2 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-02 10:52 UTC (permalink / raw)
  To: Avi Kivity
  Cc: aliguori, joerg.roedel, qemu-devel, agraf, kraxel,
	eduard.munteanu, David Gibson, rth

On Sun, Oct 02, 2011 at 12:29:08PM +0200, Avi Kivity wrote:
> On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote:
> >On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
> >>  This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
> >>  present, these are just stubs which perform directly cpu physical memory
> >>  accesses.
> >>
> >>  Using these stubs, however, distinguishes PCI device DMA transactions from
> >>  other accesses to physical memory, which will allow PCI IOMMU support to
> >>  be added in one place, rather than updating every PCI driver at that time.
> >>
> >>  That is, it allows us to update individual PCI drivers to support an IOMMU
> >>  without having yet determined the details of how the IOMMU emulation will
> >>  operate.  This will let us remove the most bitrot-sensitive part of an
> >>  IOMMU patch in advance.
> >>
> >>  Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> >
> >So something I just thought about:
> >
> >all wrappers now go through cpu_physical_memory_rw.
> >This is a problem as e.g. virtio assumes that
> >accesses such as stw are atomic. cpu_physical_memory_rw
> >is a memcpy which makes no such guarantees.
> >
> 
> Let's change cpu_physical_memory_rw() to provide that guarantee for
> aligned two and four byte accesses.  Having separate paths just for
> that is not maintainable.

Well, we also have stX_phys convert to target native endian-ness
(nop for KVM but not necessarily for qemu).

So if we do what you suggest, this patch will become more correct, but
it would still need to duplicate the endian-ness work.

For that reason, I think calling stX_phys and friends from pci
makes more sense - we get more simple inline wrappers
but that code duplication worries me much less than tricky
endian-ness hidden within a macro.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 10:52       ` Michael S. Tsirkin
@ 2011-10-02 10:58         ` Avi Kivity
  2011-10-02 11:17           ` Michael S. Tsirkin
  2011-10-12  3:11         ` David Gibson
  1 sibling, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2011-10-02 10:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: aliguori, joerg.roedel, qemu-devel, agraf, kraxel,
	eduard.munteanu, David Gibson, rth

On 10/02/2011 12:52 PM, Michael S. Tsirkin wrote:
> On Sun, Oct 02, 2011 at 12:29:08PM +0200, Avi Kivity wrote:
> >  On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote:
> >  >On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
> >  >>   This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
> >  >>   present, these are just stubs which perform directly cpu physical memory
> >  >>   accesses.
> >  >>
> >  >>   Using these stubs, however, distinguishes PCI device DMA transactions from
> >  >>   other accesses to physical memory, which will allow PCI IOMMU support to
> >  >>   be added in one place, rather than updating every PCI driver at that time.
> >  >>
> >  >>   That is, it allows us to update individual PCI drivers to support an IOMMU
> >  >>   without having yet determined the details of how the IOMMU emulation will
> >  >>   operate.  This will let us remove the most bitrot-sensitive part of an
> >  >>   IOMMU patch in advance.
> >  >>
> >  >>   Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> >  >
> >  >So something I just thought about:
> >  >
> >  >all wrappers now go through cpu_physical_memory_rw.
> >  >This is a problem as e.g. virtio assumes that
> >  >accesses such as stw are atomic. cpu_physical_memory_rw
> >  >is a memcpy which makes no such guarantees.
> >  >
> >
> >  Let's change cpu_physical_memory_rw() to provide that guarantee for
> >  aligned two and four byte accesses.  Having separate paths just for
> >  that is not maintainable.
>
> Well, we also have stX_phys convert to target native endian-ness
> (nop for KVM but not necessarily for qemu).
>
> So if we do what you suggest, this patch will become more correct, but
> it would still need to duplicate the endian-ness work.
>
> For that reason, I think calling stX_phys and friends from pci
> makes more sense - we get more simple inline wrappers
> but that code duplication worries me much less than tricky
> endian-ness hidden within a macro.
>

Good point.  Though this is really a virtio specific issue since other 
devices have explicit endianness (not guest dependent).

I think endian conversion is best made explicit in virtio (like e1000 
does explicit conversions to little endian).

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 10:58         ` Avi Kivity
@ 2011-10-02 11:17           ` Michael S. Tsirkin
  2011-10-02 11:28             ` Alexander Graf
  2011-10-02 12:01             ` Avi Kivity
  0 siblings, 2 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-02 11:17 UTC (permalink / raw)
  To: Avi Kivity
  Cc: aliguori, joerg.roedel, qemu-devel, agraf, kraxel,
	eduard.munteanu, David Gibson, rth

On Sun, Oct 02, 2011 at 12:58:35PM +0200, Avi Kivity wrote:
> On 10/02/2011 12:52 PM, Michael S. Tsirkin wrote:
> >On Sun, Oct 02, 2011 at 12:29:08PM +0200, Avi Kivity wrote:
> >>  On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote:
> >>  >On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
> >>  >>   This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
> >>  >>   present, these are just stubs which perform directly cpu physical memory
> >>  >>   accesses.
> >>  >>
> >>  >>   Using these stubs, however, distinguishes PCI device DMA transactions from
> >>  >>   other accesses to physical memory, which will allow PCI IOMMU support to
> >>  >>   be added in one place, rather than updating every PCI driver at that time.
> >>  >>
> >>  >>   That is, it allows us to update individual PCI drivers to support an IOMMU
> >>  >>   without having yet determined the details of how the IOMMU emulation will
> >>  >>   operate.  This will let us remove the most bitrot-sensitive part of an
> >>  >>   IOMMU patch in advance.
> >>  >>
> >>  >>   Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> >>  >
> >>  >So something I just thought about:
> >>  >
> >>  >all wrappers now go through cpu_physical_memory_rw.
> >>  >This is a problem as e.g. virtio assumes that
> >>  >accesses such as stw are atomic. cpu_physical_memory_rw
> >>  >is a memcpy which makes no such guarantees.
> >>  >
> >>
> >>  Let's change cpu_physical_memory_rw() to provide that guarantee for
> >>  aligned two and four byte accesses.  Having separate paths just for
> >>  that is not maintainable.
> >
> >Well, we also have stX_phys convert to target native endian-ness
> >(nop for KVM but not necessarily for qemu).
> >
> >So if we do what you suggest, this patch will become more correct, but
> >it would still need to duplicate the endian-ness work.
> >
> >For that reason, I think calling stX_phys and friends from pci
> >makes more sense - we get more simple inline wrappers
> >but that code duplication worries me much less than tricky
> >endian-ness hidden within a macro.
> >
> 
> Good point.  Though this is really a virtio specific issue since
> other devices have explicit endianness (not guest dependent).

Hmm, not entirely virtio specific, some devices use stX macros to do the
conversion.  E.g. stw_be_phys and stl_le_phys are used in several
places.

> I think endian conversion is best made explicit in virtio (like
> e1000 does explicit conversions to little endian).

That's certainly possible. Though it's hard to see why duplicating e.g.

static void e100_stw_le_phys(target_phys_addr_t addr, uint16_t val)
{
    val = cpu_to_le16(val);
    cpu_physical_memory_write(addr, &val, sizeof(val));
}

is a better idea than a central utility that does this.
Maybe the address is not guaranteed to be aligned in the e100
case.


> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 11:17           ` Michael S. Tsirkin
@ 2011-10-02 11:28             ` Alexander Graf
  2011-10-02 11:43               ` Michael S. Tsirkin
  2011-10-02 12:01             ` Avi Kivity
  1 sibling, 1 reply; 42+ messages in thread
From: Alexander Graf @ 2011-10-02 11:28 UTC (permalink / raw)
  To: Michael S.Tsirkin
  Cc: aliguori, kraxel, joerg.roedel, qemu-devel, Avi Kivity,
	eduard.munteanu, David Gibson, rth


On 02.10.2011, at 13:17, Michael S. Tsirkin wrote:

> On Sun, Oct 02, 2011 at 12:58:35PM +0200, Avi Kivity wrote:
>> On 10/02/2011 12:52 PM, Michael S. Tsirkin wrote:
>>> On Sun, Oct 02, 2011 at 12:29:08PM +0200, Avi Kivity wrote:
>>>> On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote:
>>>>> On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
>>>>>>  This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
>>>>>>  present, these are just stubs which perform directly cpu physical memory
>>>>>>  accesses.
>>>>>> 
>>>>>>  Using these stubs, however, distinguishes PCI device DMA transactions from
>>>>>>  other accesses to physical memory, which will allow PCI IOMMU support to
>>>>>>  be added in one place, rather than updating every PCI driver at that time.
>>>>>> 
>>>>>>  That is, it allows us to update individual PCI drivers to support an IOMMU
>>>>>>  without having yet determined the details of how the IOMMU emulation will
>>>>>>  operate.  This will let us remove the most bitrot-sensitive part of an
>>>>>>  IOMMU patch in advance.
>>>>>> 
>>>>>>  Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
>>>>> 
>>>>> So something I just thought about:
>>>>> 
>>>>> all wrappers now go through cpu_physical_memory_rw.
>>>>> This is a problem as e.g. virtio assumes that
>>>>> accesses such as stw are atomic. cpu_physical_memory_rw
>>>>> is a memcpy which makes no such guarantees.
>>>>> 
>>>> 
>>>> Let's change cpu_physical_memory_rw() to provide that guarantee for
>>>> aligned two and four byte accesses.  Having separate paths just for
>>>> that is not maintainable.
>>> 
>>> Well, we also have stX_phys convert to target native endian-ness
>>> (nop for KVM but not necessarily for qemu).
>>> 
>>> So if we do what you suggest, this patch will become more correct, but
>>> it would still need to duplicate the endian-ness work.
>>> 
>>> For that reason, I think calling stX_phys and friends from pci
>>> makes more sense - we get more simple inline wrappers
>>> but that code duplication worries me much less than tricky
>>> endian-ness hidden within a macro.
>>> 
>> 
>> Good point.  Though this is really a virtio specific issue since
>> other devices have explicit endianness (not guest dependent).
> 
> Hmm, not entirely virtio specific, some devices use stX macros to do the
> conversion.  E.g. stw_be_phys and stl_le_phys are used in several
> places.

Yes, explicit endianness. Virtio is the only device type we support in QEMU that changes its endianness depending on the guest CPU. All other devices are independent of the guest CPU we're targeting.


Alex

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 11:28             ` Alexander Graf
@ 2011-10-02 11:43               ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-02 11:43 UTC (permalink / raw)
  To: Alexander Graf
  Cc: aliguori, kraxel, joerg.roedel, qemu-devel, Avi Kivity,
	eduard.munteanu, David Gibson, rth

On Sun, Oct 02, 2011 at 01:28:37PM +0200, Alexander Graf wrote:
> >> Good point.  Though this is really a virtio specific issue since
> >> other devices have explicit endianness (not guest dependent).
> > 
> > Hmm, not entirely virtio specific, some devices use stX macros to do the
> > conversion.  E.g. stw_be_phys and stl_le_phys are used in several
> > places.
> 
> Yes, explicit endianness. Virtio is the only device type we support in QEMU that changes its endianness depending on the guest CPU. All other devices are independent of the guest CPU we're targeting.
> 
> 
> Alex

True I think, for pci devices. And virtio bypasses the iommu
anyway, so we don't need to worry about it. But the point is that it
makes sense to support endian-ness handling in the core.

-- 
MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 3/9] eepro100: Use PCI DMA stub functions
  2011-09-05  4:34 ` [Qemu-devel] [PATCH 3/9] eepro100: " David Gibson
@ 2011-10-02 11:45   ` Michael S. Tsirkin
  2011-10-14  2:15     ` David Gibson
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-02 11:45 UTC (permalink / raw)
  To: David Gibson
  Cc: aliguori, kraxel, joerg.roedel, qemu-devel, agraf, avi,
	eduard.munteanu, rth

On Mon, Sep 05, 2011 at 02:34:58PM +1000, David Gibson wrote:
> From: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> 
> This updates the eepro100 device emulation to use the explicit PCI DMA
> functions, instead of directly calling physical memory access functions.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>  hw/eepro100.c |  113 +++++++++++++++++++++------------------------------------
>  1 files changed, 41 insertions(+), 72 deletions(-)
> 
> diff --git a/hw/eepro100.c b/hw/eepro100.c
> index 4e3c52f..9f57371 100644
> --- a/hw/eepro100.c
> +++ b/hw/eepro100.c
> @@ -46,6 +46,7 @@
>  #include "net.h"
>  #include "eeprom93xx.h"
>  #include "sysemu.h"
> +#include "dma.h"
>  
>  /* QEMU sends frames smaller than 60 bytes to ethernet nics.
>   * Such frames are rejected by real nics and their emulations.
> @@ -315,38 +316,6 @@ static const uint16_t eepro100_mdi_mask[] = {
>      0xffff, 0xffff, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
>  };
>  
> -/* Read a 16 bit little endian value from physical memory. */
> -static uint16_t e100_ldw_le_phys(target_phys_addr_t addr)
> -{
> -    /* Load 16 bit (little endian) word from emulated hardware. */
> -    uint16_t val;
> -    cpu_physical_memory_read(addr, &val, sizeof(val));
> -    return le16_to_cpu(val);
> -}
> -
> -/* Read a 32 bit little endian value from physical memory. */
> -static uint32_t e100_ldl_le_phys(target_phys_addr_t addr)
> -{
> -    /* Load 32 bit (little endian) word from emulated hardware. */
> -    uint32_t val;
> -    cpu_physical_memory_read(addr, &val, sizeof(val));
> -    return le32_to_cpu(val);
> -}
> -
> -/* Write a 16 bit little endian value to physical memory. */
> -static void e100_stw_le_phys(target_phys_addr_t addr, uint16_t val)
> -{
> -    val = cpu_to_le16(val);
> -    cpu_physical_memory_write(addr, &val, sizeof(val));
> -}
> -
> -/* Write a 32 bit little endian value to physical memory. */
> -static void e100_stl_le_phys(target_phys_addr_t addr, uint32_t val)
> -{
> -    val = cpu_to_le32(val);
> -    cpu_physical_memory_write(addr, &val, sizeof(val));
> -}
> -
>  #define POLYNOMIAL 0x04c11db6
>  
>  /* From FreeBSD */
> @@ -744,21 +713,22 @@ static void dump_statistics(EEPRO100State * s)
>       * values which really matter.
>       * Number of data should check configuration!!!
>       */
> -    cpu_physical_memory_write(s->statsaddr, &s->statistics, s->stats_size);
> -    e100_stl_le_phys(s->statsaddr + 0, s->statistics.tx_good_frames);
> -    e100_stl_le_phys(s->statsaddr + 36, s->statistics.rx_good_frames);
> -    e100_stl_le_phys(s->statsaddr + 48, s->statistics.rx_resource_errors);
> -    e100_stl_le_phys(s->statsaddr + 60, s->statistics.rx_short_frame_errors);
> +    pci_dma_write(&s->dev, s->statsaddr,
> +                  (uint8_t *) & s->statistics, s->stats_size);
> +    stl_pci_dma(&s->dev, s->statsaddr + 0, s->statistics.tx_good_frames);
> +    stl_pci_dma(&s->dev, s->statsaddr + 36, s->statistics.rx_good_frames);
> +    stl_pci_dma(&s->dev, s->statsaddr + 48, s->statistics.rx_resource_errors);
> +    stl_pci_dma(&s->dev, s->statsaddr + 60, s->statistics.rx_short_frame_errors);

At least old stl macros assumed an aligned address.
Not sure it's still the case but for e100 address might
not be aligned I think.

>  #if 0
> -    e100_stw_le_phys(s->statsaddr + 76, s->statistics.xmt_tco_frames);
> -    e100_stw_le_phys(s->statsaddr + 78, s->statistics.rcv_tco_frames);
> +    stw_pci_dma(&s->dev, s->statsaddr + 76, s->statistics.xmt_tco_frames);
> +    stw_pci_dma(&s->dev, s->statsaddr + 78, s->statistics.rcv_tco_frames);
>      missing("CU dump statistical counters");
>  #endif
>  }
>  
>  static void read_cb(EEPRO100State *s)
>  {
> -    cpu_physical_memory_read(s->cb_address, &s->tx, sizeof(s->tx));
> +    pci_dma_read(&s->dev, s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
>      s->tx.status = le16_to_cpu(s->tx.status);
>      s->tx.command = le16_to_cpu(s->tx.command);
>      s->tx.link = le32_to_cpu(s->tx.link);
> @@ -788,18 +758,17 @@ static void tx_command(EEPRO100State *s)
>      }
>      assert(tcb_bytes <= sizeof(buf));
>      while (size < tcb_bytes) {
> -        uint32_t tx_buffer_address = e100_ldl_le_phys(tbd_address);
> -        uint16_t tx_buffer_size = e100_ldw_le_phys(tbd_address + 4);
> +        uint32_t tx_buffer_address = ldl_pci_dma(&s->dev, tbd_address);
> +        uint16_t tx_buffer_size = lduw_pci_dma(&s->dev, tbd_address + 4);
>  #if 0
> -        uint16_t tx_buffer_el = e100_ldw_le_phys(tbd_address + 6);
> +        uint16_t tx_buffer_el = lduw_pci_dma(&s->dev, tbd_address + 6);
>  #endif
>          tbd_address += 8;
>          TRACE(RXTX, logout
>              ("TBD (simplified mode): buffer address 0x%08x, size 0x%04x\n",
>               tx_buffer_address, tx_buffer_size));
>          tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
> -        cpu_physical_memory_read(tx_buffer_address, &buf[size],
> -                                 tx_buffer_size);
> +        pci_dma_read(&s->dev, tx_buffer_address, &buf[size], tx_buffer_size);
>          size += tx_buffer_size;
>      }
>      if (tbd_array == 0xffffffff) {
> @@ -810,16 +779,16 @@ static void tx_command(EEPRO100State *s)
>          if (s->has_extended_tcb_support && !(s->configuration[6] & BIT(4))) {
>              /* Extended Flexible TCB. */
>              for (; tbd_count < 2; tbd_count++) {
> -                uint32_t tx_buffer_address = e100_ldl_le_phys(tbd_address);
> -                uint16_t tx_buffer_size = e100_ldw_le_phys(tbd_address + 4);
> -                uint16_t tx_buffer_el = e100_ldw_le_phys(tbd_address + 6);
> +                uint32_t tx_buffer_address = ldl_pci_dma(&s->dev, tbd_address);
> +                uint16_t tx_buffer_size = lduw_pci_dma(&s->dev, tbd_address + 4);
> +                uint16_t tx_buffer_el = lduw_pci_dma(&s->dev, tbd_address + 6);
>                  tbd_address += 8;
>                  TRACE(RXTX, logout
>                      ("TBD (extended flexible mode): buffer address 0x%08x, size 0x%04x\n",
>                       tx_buffer_address, tx_buffer_size));
>                  tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
> -                cpu_physical_memory_read(tx_buffer_address, &buf[size],
> -                                         tx_buffer_size);
> +                pci_dma_read(&s->dev, tx_buffer_address,
> +                             &buf[size], tx_buffer_size);
>                  size += tx_buffer_size;
>                  if (tx_buffer_el & 1) {
>                      break;
> @@ -828,16 +797,16 @@ static void tx_command(EEPRO100State *s)
>          }
>          tbd_address = tbd_array;
>          for (; tbd_count < s->tx.tbd_count; tbd_count++) {
> -            uint32_t tx_buffer_address = e100_ldl_le_phys(tbd_address);
> -            uint16_t tx_buffer_size = e100_ldw_le_phys(tbd_address + 4);
> -            uint16_t tx_buffer_el = e100_ldw_le_phys(tbd_address + 6);
> +            uint32_t tx_buffer_address = ldl_pci_dma(&s->dev, tbd_address);
> +            uint16_t tx_buffer_size = lduw_pci_dma(&s->dev, tbd_address + 4);
> +            uint16_t tx_buffer_el = lduw_pci_dma(&s->dev, tbd_address + 6);
>              tbd_address += 8;
>              TRACE(RXTX, logout
>                  ("TBD (flexible mode): buffer address 0x%08x, size 0x%04x\n",
>                   tx_buffer_address, tx_buffer_size));
>              tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
> -            cpu_physical_memory_read(tx_buffer_address, &buf[size],
> -                                     tx_buffer_size);
> +            pci_dma_read(&s->dev, tx_buffer_address,
> +                         &buf[size], tx_buffer_size);
>              size += tx_buffer_size;
>              if (tx_buffer_el & 1) {
>                  break;
> @@ -862,7 +831,7 @@ static void set_multicast_list(EEPRO100State *s)
>      TRACE(OTHER, logout("multicast list, multicast count = %u\n", multicast_count));
>      for (i = 0; i < multicast_count; i += 6) {
>          uint8_t multicast_addr[6];
> -        cpu_physical_memory_read(s->cb_address + 10 + i, multicast_addr, 6);
> +        pci_dma_read(&s->dev, s->cb_address + 10 + i, multicast_addr, 6);
>          TRACE(OTHER, logout("multicast entry %s\n", nic_dump(multicast_addr, 6)));
>          unsigned mcast_idx = compute_mcast_idx(multicast_addr);
>          assert(mcast_idx < 64);
> @@ -896,12 +865,12 @@ static void action_command(EEPRO100State *s)
>              /* Do nothing. */
>              break;
>          case CmdIASetup:
> -            cpu_physical_memory_read(s->cb_address + 8, &s->conf.macaddr.a[0], 6);
> +            pci_dma_read(&s->dev, s->cb_address + 8, &s->conf.macaddr.a[0], 6);
>              TRACE(OTHER, logout("macaddr: %s\n", nic_dump(&s->conf.macaddr.a[0], 6)));
>              break;
>          case CmdConfigure:
> -            cpu_physical_memory_read(s->cb_address + 8, &s->configuration[0],
> -                                     sizeof(s->configuration));
> +            pci_dma_read(&s->dev, s->cb_address + 8,
> +                         &s->configuration[0], sizeof(s->configuration));
>              TRACE(OTHER, logout("configuration: %s\n",
>                                  nic_dump(&s->configuration[0], 16)));
>              TRACE(OTHER, logout("configuration: %s\n",
> @@ -938,7 +907,7 @@ static void action_command(EEPRO100State *s)
>              break;
>          }
>          /* Write new status. */
> -        e100_stw_le_phys(s->cb_address, s->tx.status | ok_status | STATUS_C);
> +        stw_pci_dma(&s->dev, s->cb_address, s->tx.status | ok_status | STATUS_C);
>          if (bit_i) {
>              /* CU completed action. */
>              eepro100_cx_interrupt(s);
> @@ -1005,7 +974,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
>          /* Dump statistical counters. */
>          TRACE(OTHER, logout("val=0x%02x (dump stats)\n", val));
>          dump_statistics(s);
> -        e100_stl_le_phys(s->statsaddr + s->stats_size, 0xa005);
> +        stl_pci_dma(&s->dev, s->statsaddr + s->stats_size, 0xa005);
>          break;
>      case CU_CMD_BASE:
>          /* Load CU base. */
> @@ -1016,7 +985,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
>          /* Dump and reset statistical counters. */
>          TRACE(OTHER, logout("val=0x%02x (dump stats and reset)\n", val));
>          dump_statistics(s);
> -        e100_stl_le_phys(s->statsaddr + s->stats_size, 0xa007);
> +        stl_pci_dma(&s->dev, s->statsaddr + s->stats_size, 0xa007);
>          memset(&s->statistics, 0, sizeof(s->statistics));
>          break;
>      case CU_SRESUME:
> @@ -1310,10 +1279,10 @@ static void eepro100_write_port(EEPRO100State *s)
>      case PORT_SELFTEST:
>          TRACE(OTHER, logout("selftest address=0x%08x\n", address));
>          eepro100_selftest_t data;
> -        cpu_physical_memory_read(address, &data, sizeof(data));
> +        pci_dma_read(&s->dev, address, (uint8_t *) & data, sizeof(data));
>          data.st_sign = 0xffffffff;
>          data.st_result = 0;
> -        cpu_physical_memory_write(address, &data, sizeof(data));
> +        pci_dma_write(&s->dev, address, (uint8_t *) & data, sizeof(data));
>          break;
>      case PORT_SELECTIVE_RESET:
>          TRACE(OTHER, logout("selective reset, selftest address=0x%08x\n", address));
> @@ -1729,8 +1698,8 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
>      }
>      /* !!! */
>      eepro100_rx_t rx;
> -    cpu_physical_memory_read(s->ru_base + s->ru_offset, &rx,
> -                             sizeof(eepro100_rx_t));
> +    pci_dma_read(&s->dev, s->ru_base + s->ru_offset,
> +                 (uint8_t *) &rx, sizeof(eepro100_rx_t));
>      uint16_t rfd_command = le16_to_cpu(rx.command);
>      uint16_t rfd_size = le16_to_cpu(rx.size);
>  
> @@ -1746,10 +1715,10 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
>  #endif
>      TRACE(OTHER, logout("command 0x%04x, link 0x%08x, addr 0x%08x, size %u\n",
>            rfd_command, rx.link, rx.rx_buf_addr, rfd_size));
> -    e100_stw_le_phys(s->ru_base + s->ru_offset +
> -                     offsetof(eepro100_rx_t, status), rfd_status);
> -    e100_stw_le_phys(s->ru_base + s->ru_offset +
> -                     offsetof(eepro100_rx_t, count), size);
> +    stw_pci_dma(&s->dev, s->ru_base + s->ru_offset +
> +                offsetof(eepro100_rx_t, status), rfd_status);
> +    stw_pci_dma(&s->dev, s->ru_base + s->ru_offset +
> +                offsetof(eepro100_rx_t, count), size);
>      /* Early receive interrupt not supported. */
>  #if 0
>      eepro100_er_interrupt(s);
> @@ -1763,8 +1732,8 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
>  #if 0
>      assert(!(s->configuration[17] & BIT(0)));
>  #endif
> -    cpu_physical_memory_write(s->ru_base + s->ru_offset +
> -                              sizeof(eepro100_rx_t), buf, size);
> +    pci_dma_write(&s->dev, s->ru_base + s->ru_offset +
> +                  sizeof(eepro100_rx_t), buf, size);
>      s->statistics.rx_good_frames++;
>      eepro100_fr_interrupt(s);
>      s->ru_offset = le32_to_cpu(rx.link);
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 11:17           ` Michael S. Tsirkin
  2011-10-02 11:28             ` Alexander Graf
@ 2011-10-02 12:01             ` Avi Kivity
  2011-10-02 12:14               ` Michael S. Tsirkin
  1 sibling, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2011-10-02 12:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: aliguori, joerg.roedel, qemu-devel, agraf, kraxel,
	eduard.munteanu, David Gibson, rth

On 10/02/2011 01:17 PM, Michael S. Tsirkin wrote:
> On Sun, Oct 02, 2011 at 12:58:35PM +0200, Avi Kivity wrote:
> >  On 10/02/2011 12:52 PM, Michael S. Tsirkin wrote:
> >  >On Sun, Oct 02, 2011 at 12:29:08PM +0200, Avi Kivity wrote:
> >  >>   On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote:
> >  >>   >On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
> >  >>   >>    This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
> >  >>   >>    present, these are just stubs which perform directly cpu physical memory
> >  >>   >>    accesses.
> >  >>   >>
> >  >>   >>    Using these stubs, however, distinguishes PCI device DMA transactions from
> >  >>   >>    other accesses to physical memory, which will allow PCI IOMMU support to
> >  >>   >>    be added in one place, rather than updating every PCI driver at that time.
> >  >>   >>
> >  >>   >>    That is, it allows us to update individual PCI drivers to support an IOMMU
> >  >>   >>    without having yet determined the details of how the IOMMU emulation will
> >  >>   >>    operate.  This will let us remove the most bitrot-sensitive part of an
> >  >>   >>    IOMMU patch in advance.
> >  >>   >>
> >  >>   >>    Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> >  >>   >
> >  >>   >So something I just thought about:
> >  >>   >
> >  >>   >all wrappers now go through cpu_physical_memory_rw.
> >  >>   >This is a problem as e.g. virtio assumes that
> >  >>   >accesses such as stw are atomic. cpu_physical_memory_rw
> >  >>   >is a memcpy which makes no such guarantees.
> >  >>   >
> >  >>
> >  >>   Let's change cpu_physical_memory_rw() to provide that guarantee for
> >  >>   aligned two and four byte accesses.  Having separate paths just for
> >  >>   that is not maintainable.
> >  >
> >  >Well, we also have stX_phys convert to target native endian-ness
> >  >(nop for KVM but not necessarily for qemu).
> >  >
> >  >So if we do what you suggest, this patch will become more correct, but
> >  >it would still need to duplicate the endian-ness work.
> >  >
> >  >For that reason, I think calling stX_phys and friends from pci
> >  >makes more sense - we get more simple inline wrappers
> >  >but that code duplication worries me much less than tricky
> >  >endian-ness hidden within a macro.
> >  >
> >
> >  Good point.  Though this is really a virtio specific issue since
> >  other devices have explicit endianness (not guest dependent).
>
> Hmm, not entirely virtio specific, some devices use stX macros to do the
> conversion.  E.g. stw_be_phys and stl_le_phys are used in several
> places.

These are fine - explicit endianness.

> >  I think endian conversion is best made explicit in virtio (like
> >  e1000 does explicit conversions to little endian).
>
> That's certainly possible. Though it's hard to see why duplicating e.g.
>
> static void e100_stw_le_phys(target_phys_addr_t addr, uint16_t val)
> {
>      val = cpu_to_le16(val);
>      cpu_physical_memory_write(addr,&val, sizeof(val));
> }
>
> is a better idea than a central utility that does this.
> Maybe the address is not guaranteed to be aligned in the e100
> case.

The general case is dma'ing a structure, not a single field.  That 
doesn't mean we shouldn't have a helper.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 12:01             ` Avi Kivity
@ 2011-10-02 12:14               ` Michael S. Tsirkin
  2011-10-02 12:26                 ` Avi Kivity
                                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-02 12:14 UTC (permalink / raw)
  To: Avi Kivity
  Cc: aliguori, joerg.roedel, qemu-devel, agraf, kraxel,
	eduard.munteanu, David Gibson, rth

On Sun, Oct 02, 2011 at 02:01:10PM +0200, Avi Kivity wrote:
> On 10/02/2011 01:17 PM, Michael S. Tsirkin wrote:
> >On Sun, Oct 02, 2011 at 12:58:35PM +0200, Avi Kivity wrote:
> >>  On 10/02/2011 12:52 PM, Michael S. Tsirkin wrote:
> >>  >On Sun, Oct 02, 2011 at 12:29:08PM +0200, Avi Kivity wrote:
> >>  >>   On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote:
> >>  >>   >On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
> >>  >>   >>    This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
> >>  >>   >>    present, these are just stubs which perform directly cpu physical memory
> >>  >>   >>    accesses.
> >>  >>   >>
> >>  >>   >>    Using these stubs, however, distinguishes PCI device DMA transactions from
> >>  >>   >>    other accesses to physical memory, which will allow PCI IOMMU support to
> >>  >>   >>    be added in one place, rather than updating every PCI driver at that time.
> >>  >>   >>
> >>  >>   >>    That is, it allows us to update individual PCI drivers to support an IOMMU
> >>  >>   >>    without having yet determined the details of how the IOMMU emulation will
> >>  >>   >>    operate.  This will let us remove the most bitrot-sensitive part of an
> >>  >>   >>    IOMMU patch in advance.
> >>  >>   >>
> >>  >>   >>    Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> >>  >>   >
> >>  >>   >So something I just thought about:
> >>  >>   >
> >>  >>   >all wrappers now go through cpu_physical_memory_rw.
> >>  >>   >This is a problem as e.g. virtio assumes that
> >>  >>   >accesses such as stw are atomic. cpu_physical_memory_rw
> >>  >>   >is a memcpy which makes no such guarantees.
> >>  >>   >
> >>  >>
> >>  >>   Let's change cpu_physical_memory_rw() to provide that guarantee for
> >>  >>   aligned two and four byte accesses.  Having separate paths just for
> >>  >>   that is not maintainable.
> >>  >
> >>  >Well, we also have stX_phys convert to target native endian-ness
> >>  >(nop for KVM but not necessarily for qemu).
> >>  >
> >>  >So if we do what you suggest, this patch will become more correct, but
> >>  >it would still need to duplicate the endian-ness work.
> >>  >
> >>  >For that reason, I think calling stX_phys and friends from pci
> >>  >makes more sense - we get more simple inline wrappers
> >>  >but that code duplication worries me much less than tricky
> >>  >endian-ness hidden within a macro.
> >>  >
> >>
> >>  Good point.  Though this is really a virtio specific issue since
> >>  other devices have explicit endianness (not guest dependent).
> >
> >Hmm, not entirely virtio specific, some devices use stX macros to do the
> >conversion.  E.g. stw_be_phys and stl_le_phys are used in several
> >places.
> 
> These are fine - explicit endianness.

Right. So changing these to e.g. stl_dma and assuming
LE is default seems like a step backwards.

> >>  I think endian conversion is best made explicit in virtio (like
> >>  e1000 does explicit conversions to little endian).
> >
> >That's certainly possible. Though it's hard to see why duplicating e.g.
> >
> >static void e100_stw_le_phys(target_phys_addr_t addr, uint16_t val)
> >{
> >     val = cpu_to_le16(val);
> >     cpu_physical_memory_write(addr,&val, sizeof(val));
> >}
> >
> >is a better idea than a central utility that does this.
> >Maybe the address is not guaranteed to be aligned in the e100
> >case.
> 
> The general case is dma'ing a structure, not a single field.  That
> doesn't mean we shouldn't have a helper.
> 
> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 12:14               ` Michael S. Tsirkin
@ 2011-10-02 12:26                 ` Avi Kivity
  2011-10-03 13:17                 ` Anthony Liguori
  2011-10-12  3:07                 ` David Gibson
  2 siblings, 0 replies; 42+ messages in thread
From: Avi Kivity @ 2011-10-02 12:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: aliguori, joerg.roedel, qemu-devel, agraf, kraxel,
	eduard.munteanu, David Gibson, rth

On 10/02/2011 02:14 PM, Michael S. Tsirkin wrote:
> >
> >  These are fine - explicit endianness.
>
> Right. So changing these to e.g. stl_dma and assuming
> LE is default seems like a step backwards.

Agree.  "l" implies a word with some endianness, not "4 unstructured bytes".


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 10:25   ` Michael S. Tsirkin
  2011-10-02 10:29     ` Avi Kivity
@ 2011-10-03 13:10     ` Anthony Liguori
  1 sibling, 0 replies; 42+ messages in thread
From: Anthony Liguori @ 2011-10-03 13:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kraxel, joerg.roedel, agraf, qemu-devel, avi, eduard.munteanu,
	David Gibson, rth

On 10/02/2011 05:25 AM, Michael S. Tsirkin wrote:
> On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
>> This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
>> present, these are just stubs which perform directly cpu physical memory
>> accesses.
>>
>> Using these stubs, however, distinguishes PCI device DMA transactions from
>> other accesses to physical memory, which will allow PCI IOMMU support to
>> be added in one place, rather than updating every PCI driver at that time.
>>
>> That is, it allows us to update individual PCI drivers to support an IOMMU
>> without having yet determined the details of how the IOMMU emulation will
>> operate.  This will let us remove the most bitrot-sensitive part of an
>> IOMMU patch in advance.
>>
>> Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
>
> So something I just thought about:
>
> all wrappers now go through cpu_physical_memory_rw.
> This is a problem as e.g. virtio assumes that
> accesses such as stw are atomic. cpu_physical_memory_rw
> is a memcpy which makes no such guarantees.

That's why I asked for David to add map/unmap functions to the API.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 12:14               ` Michael S. Tsirkin
  2011-10-02 12:26                 ` Avi Kivity
@ 2011-10-03 13:17                 ` Anthony Liguori
  2011-10-12  3:09                   ` David Gibson
  2011-10-14  2:14                   ` David Gibson
  2011-10-12  3:07                 ` David Gibson
  2 siblings, 2 replies; 42+ messages in thread
From: Anthony Liguori @ 2011-10-03 13:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kraxel, joerg.roedel, qemu-devel, agraf, Avi Kivity,
	eduard.munteanu, David Gibson, rth

On 10/02/2011 07:14 AM, Michael S. Tsirkin wrote:
> On Sun, Oct 02, 2011 at 02:01:10PM +0200, Avi Kivity wrote:
>>> Hmm, not entirely virtio specific, some devices use stX macros to do the
>>> conversion.  E.g. stw_be_phys and stl_le_phys are used in several
>>> places.
>>
>> These are fine - explicit endianness.
>
> Right. So changing these to e.g. stl_dma and assuming
> LE is default seems like a step backwards.

We're generalizing too much.

In general, the device model doesn't need atomic access functions.  That's 
because device model RAM access is not coherent with CPU RAM access.

Virtio is a very, very special case.  virtio requires coherent RAM access.

What we really ought to do is have a variant of the map API that allows for an 
invalidation callback.  That would allow us to map the ring for longer periods 
of time and then we could access the memory directly.

That would not only solve this problem but also improve overall performance.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 12:14               ` Michael S. Tsirkin
  2011-10-02 12:26                 ` Avi Kivity
  2011-10-03 13:17                 ` Anthony Liguori
@ 2011-10-12  3:07                 ` David Gibson
  2011-10-12  7:22                   ` Michael S. Tsirkin
  2 siblings, 1 reply; 42+ messages in thread
From: David Gibson @ 2011-10-12  3:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: aliguori, kraxel, joerg.roedel, agraf, qemu-devel, Avi Kivity,
	eduard.munteanu, rth

On Sun, Oct 02, 2011 at 02:14:28PM +0200, Michael S. Tsirkin wrote:
> On Sun, Oct 02, 2011 at 02:01:10PM +0200, Avi Kivity wrote:
> > On 10/02/2011 01:17 PM, Michael S. Tsirkin wrote:
> > >On Sun, Oct 02, 2011 at 12:58:35PM +0200, Avi Kivity wrote:
> > >>  On 10/02/2011 12:52 PM, Michael S. Tsirkin wrote:
> > >>  >On Sun, Oct 02, 2011 at 12:29:08PM +0200, Avi Kivity wrote:
> > >>  >>   On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote:
> > >>  >>   >On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
> > >>  >>   >>    This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
> > >>  >>   >>    present, these are just stubs which perform directly cpu physical memory
> > >>  >>   >>    accesses.
> > >>  >>   >>
> > >>  >>   >>    Using these stubs, however, distinguishes PCI device DMA transactions from
> > >>  >>   >>    other accesses to physical memory, which will allow PCI IOMMU support to
> > >>  >>   >>    be added in one place, rather than updating every PCI driver at that time.
> > >>  >>   >>
> > >>  >>   >>    That is, it allows us to update individual PCI drivers to support an IOMMU
> > >>  >>   >>    without having yet determined the details of how the IOMMU emulation will
> > >>  >>   >>    operate.  This will let us remove the most bitrot-sensitive part of an
> > >>  >>   >>    IOMMU patch in advance.
> > >>  >>   >>
> > >>  >>   >>    Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> > >>  >>   >
> > >>  >>   >So something I just thought about:
> > >>  >>   >
> > >>  >>   >all wrappers now go through cpu_physical_memory_rw.
> > >>  >>   >This is a problem as e.g. virtio assumes that
> > >>  >>   >accesses such as stw are atomic. cpu_physical_memory_rw
> > >>  >>   >is a memcpy which makes no such guarantees.
> > >>  >>   >
> > >>  >>
> > >>  >>   Let's change cpu_physical_memory_rw() to provide that guarantee for
> > >>  >>   aligned two and four byte accesses.  Having separate paths just for
> > >>  >>   that is not maintainable.
> > >>  >
> > >>  >Well, we also have stX_phys convert to target native endian-ness
> > >>  >(nop for KVM but not necessarily for qemu).
> > >>  >
> > >>  >So if we do what you suggest, this patch will become more correct, but
> > >>  >it would still need to duplicate the endian-ness work.
> > >>  >
> > >>  >For that reason, I think calling stX_phys and friends from pci
> > >>  >makes more sense - we get more simple inline wrappers
> > >>  >but that code duplication worries me much less than tricky
> > >>  >endian-ness hidden within a macro.
> > >>  >
> > >>
> > >>  Good point.  Though this is really a virtio specific issue since
> > >>  other devices have explicit endianness (not guest dependent).
> > >
> > >Hmm, not entirely virtio specific, some devices use stX macros to do the
> > >conversion.  E.g. stw_be_phys and stl_le_phys are used in several
> > >places.
> > 
> > These are fine - explicit endianness.
> 
> Right. So changing these to e.g. stl_dma and assuming
> LE is default seems like a step backwards.

Um.. why?  PCI is defined by the spec to be LE, so I don't see that we
need explicit endianness versions for PCI helpers.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-03 13:17                 ` Anthony Liguori
@ 2011-10-12  3:09                   ` David Gibson
  2011-10-12  9:12                     ` Michael S. Tsirkin
  2011-10-14  2:14                   ` David Gibson
  1 sibling, 1 reply; 42+ messages in thread
From: David Gibson @ 2011-10-12  3:09 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: kraxel, Michael S. Tsirkin, joerg.roedel, qemu-devel, agraf,
	Avi Kivity, eduard.munteanu, rth

On Mon, Oct 03, 2011 at 08:17:05AM -0500, Anthony Liguori wrote:
> On 10/02/2011 07:14 AM, Michael S. Tsirkin wrote:
> >On Sun, Oct 02, 2011 at 02:01:10PM +0200, Avi Kivity wrote:
> >>>Hmm, not entirely virtio specific, some devices use stX macros to do the
> >>>conversion.  E.g. stw_be_phys and stl_le_phys are used in several
> >>>places.
> >>
> >>These are fine - explicit endianness.
> >
> >Right. So changing these to e.g. stl_dma and assuming
> >LE is default seems like a step backwards.
> 
> We're generalizing too much.
> 
> In general, the device model doesn't need atomic access functions.
> That's because device model RAM access is not coherent with CPU RAM
> access.
> 
> Virtio is a very, very special case.  virtio requires coherent RAM
> access.

Right, but it should only need that for the actual rings in the virtio
core.  I was expecting that those would remain as direct physical
memory accesses - precisely because virtio is special - rather than
accesses through any kind of DMA interface.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-02 10:52       ` Michael S. Tsirkin
  2011-10-02 10:58         ` Avi Kivity
@ 2011-10-12  3:11         ` David Gibson
  2011-10-12  8:44           ` Michael S. Tsirkin
  1 sibling, 1 reply; 42+ messages in thread
From: David Gibson @ 2011-10-12  3:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: aliguori, kraxel, joerg.roedel, agraf, qemu-devel, Avi Kivity,
	eduard.munteanu, rth

On Sun, Oct 02, 2011 at 12:52:39PM +0200, Michael S. Tsirkin wrote:
> On Sun, Oct 02, 2011 at 12:29:08PM +0200, Avi Kivity wrote:
> > On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote:
> > >On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
> > >>  This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
> > >>  present, these are just stubs which perform directly cpu physical memory
> > >>  accesses.
> > >>
> > >>  Using these stubs, however, distinguishes PCI device DMA transactions from
> > >>  other accesses to physical memory, which will allow PCI IOMMU support to
> > >>  be added in one place, rather than updating every PCI driver at that time.
> > >>
> > >>  That is, it allows us to update individual PCI drivers to support an IOMMU
> > >>  without having yet determined the details of how the IOMMU emulation will
> > >>  operate.  This will let us remove the most bitrot-sensitive part of an
> > >>  IOMMU patch in advance.
> > >>
> > >>  Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> > >
> > >So something I just thought about:
> > >
> > >all wrappers now go through cpu_physical_memory_rw.
> > >This is a problem as e.g. virtio assumes that
> > >accesses such as stw are atomic. cpu_physical_memory_rw
> > >is a memcpy which makes no such guarantees.
> > >
> > 
> > Let's change cpu_physical_memory_rw() to provide that guarantee for
> > aligned two and four byte accesses.  Having separate paths just for
> > that is not maintainable.
> 
> Well, we also have stX_phys convert to target native endian-ness
> (nop for KVM but not necessarily for qemu).

Yes.. as do the stX_pci_dma() helpers.  They assume LE, rather than
having two variants, because PCI is an LE spec, and all normal PCI
devices work in LE.  If we need to model some perverse BE PCI device,
it can reswap itself.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-12  3:07                 ` David Gibson
@ 2011-10-12  7:22                   ` Michael S. Tsirkin
  2011-10-12 15:43                     ` David Gibson
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-12  7:22 UTC (permalink / raw)
  To: Avi Kivity, qemu-devel, joerg.roedel, aliguori, rth, agraf,
	eduard.munteanu, kraxel

On Wed, Oct 12, 2011 at 02:07:46PM +1100, David Gibson wrote:
> Um.. why?  PCI is defined by the spec to be LE, so I don't see that we
> need explicit endianness versions for PCI helpers.

LE in the spec only applies to structures defined by the spec,
that is pci configuration and msix tables in device memory.

-- 
MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-12  3:11         ` David Gibson
@ 2011-10-12  8:44           ` Michael S. Tsirkin
  2011-10-12  9:08             ` Gerd Hoffmann
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-12  8:44 UTC (permalink / raw)
  To: Avi Kivity, qemu-devel, joerg.roedel, aliguori, rth, agraf,
	eduard.munteanu, kraxel

On Wed, Oct 12, 2011 at 02:11:37PM +1100, David Gibson wrote:
> On Sun, Oct 02, 2011 at 12:52:39PM +0200, Michael S. Tsirkin wrote:
> > On Sun, Oct 02, 2011 at 12:29:08PM +0200, Avi Kivity wrote:
> > > On 10/02/2011 12:25 PM, Michael S. Tsirkin wrote:
> > > >On Mon, Sep 05, 2011 at 02:34:56PM +1000, David Gibson wrote:
> > > >>  This patch adds functions to pci.[ch] to perform PCI DMA operations.  At
> > > >>  present, these are just stubs which perform directly cpu physical memory
> > > >>  accesses.
> > > >>
> > > >>  Using these stubs, however, distinguishes PCI device DMA transactions from
> > > >>  other accesses to physical memory, which will allow PCI IOMMU support to
> > > >>  be added in one place, rather than updating every PCI driver at that time.
> > > >>
> > > >>  That is, it allows us to update individual PCI drivers to support an IOMMU
> > > >>  without having yet determined the details of how the IOMMU emulation will
> > > >>  operate.  This will let us remove the most bitrot-sensitive part of an
> > > >>  IOMMU patch in advance.
> > > >>
> > > >>  Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> > > >
> > > >So something I just thought about:
> > > >
> > > >all wrappers now go through cpu_physical_memory_rw.
> > > >This is a problem as e.g. virtio assumes that
> > > >accesses such as stw are atomic. cpu_physical_memory_rw
> > > >is a memcpy which makes no such guarantees.
> > > >
> > > 
> > > Let's change cpu_physical_memory_rw() to provide that guarantee for
> > > aligned two and four byte accesses.  Having separate paths just for
> > > that is not maintainable.
> > 
> > Well, we also have stX_phys convert to target native endian-ness
> > (nop for KVM but not necessarily for qemu).
> 
> Yes.. as do the stX_pci_dma() helpers.  They assume LE, rather than
> having two variants, because PCI is an LE spec, and all normal PCI
> devices work in LE.

IMO, not really. PCI devices do DMA any way they like.  LE is
probably more common because both ARM and x86 processors are LE.

>  If we need to model some perverse BE PCI device,
> it can reswap itself.

An explicit API for this would be cleaner.

-- 
MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-12  8:44           ` Michael S. Tsirkin
@ 2011-10-12  9:08             ` Gerd Hoffmann
  0 siblings, 0 replies; 42+ messages in thread
From: Gerd Hoffmann @ 2011-10-12  9:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: aliguori, joerg.roedel, agraf, qemu-devel, Avi Kivity,
	eduard.munteanu, rth

   Hi,

>> Yes.. as do the stX_pci_dma() helpers.  They assume LE, rather than
>> having two variants, because PCI is an LE spec, and all normal PCI
>> devices work in LE.
>
> IMO, not really. PCI devices do DMA any way they like.  LE is
> probably more common because both ARM and x86 processors are LE.

Also having _le_ in the function name makes explicitly clear that the 
functions read/write little endian values and byteswaps if needed, which 
makes the code more readable.  I'd suggest to add it even if there is no 
need for a _be_ companion as devices needing that are rare.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-12  3:09                   ` David Gibson
@ 2011-10-12  9:12                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-12  9:12 UTC (permalink / raw)
  To: Anthony Liguori, Avi Kivity, qemu-devel, joerg.roedel, rth, agraf,
	eduard.munteanu, kraxel

On Wed, Oct 12, 2011 at 02:09:26PM +1100, David Gibson wrote:
> On Mon, Oct 03, 2011 at 08:17:05AM -0500, Anthony Liguori wrote:
> > On 10/02/2011 07:14 AM, Michael S. Tsirkin wrote:
> > >On Sun, Oct 02, 2011 at 02:01:10PM +0200, Avi Kivity wrote:
> > >>>Hmm, not entirely virtio specific, some devices use stX macros to do the
> > >>>conversion.  E.g. stw_be_phys and stl_le_phys are used in several
> > >>>places.
> > >>
> > >>These are fine - explicit endianness.
> > >
> > >Right. So changing these to e.g. stl_dma and assuming
> > >LE is default seems like a step backwards.
> > 
> > We're generalizing too much.
> > 
> > In general, the device model doesn't need atomic access functions.

Anthony, are you sure? PCI both provides atomic operations for devices (likely
uncommon). PCI express spec strongly recommends at least dword update
granularity for both reads and writes.
Some guests might depend on this.

> > That's because device model RAM access is not coherent with CPU RAM
> > access.
> > Virtio is a very, very special case.  virtio requires coherent RAM
> > access.

E.g., e1000 driver seems to allocate its rings in coherent memory.
Required? Your guess is as good as mine. It seems to work fine
ATM without these guarantees.

> Right, but it should only need that for the actual rings in the virtio
> core.  I was expecting that those would remain as direct physical
> memory accesses - precisely because virtio is special - rather than
> accesses through any kind of DMA interface.

At the moment, yes. Further, that was just an example I know about.
How about msi/msix? We don't want to
split these writes as that would confuse the APIC.

> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-12  7:22                   ` Michael S. Tsirkin
@ 2011-10-12 15:43                     ` David Gibson
  2011-10-12 15:45                       ` David Gibson
  0 siblings, 1 reply; 42+ messages in thread
From: David Gibson @ 2011-10-12 15:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: aliguori, kraxel, joerg.roedel, agraf, qemu-devel, Avi Kivity,
	eduard.munteanu, rth

On Wed, Oct 12, 2011 at 09:22:01AM +0200, Michael S. Tsirkin wrote:
> On Wed, Oct 12, 2011 at 02:07:46PM +1100, David Gibson wrote:
> > Um.. why?  PCI is defined by the spec to be LE, so I don't see that we
> > need explicit endianness versions for PCI helpers.
> 
> LE in the spec only applies to structures defined by the spec,
> that is pci configuration and msix tables in device memory.

Well, true.  But when was the last time you saw a PCI device with BE
registers?  I think it's a rare enough case that the individual device
models can reswap themselves.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-12 15:43                     ` David Gibson
@ 2011-10-12 15:45                       ` David Gibson
  0 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-10-12 15:45 UTC (permalink / raw)
  To: Michael S. Tsirkin, Avi Kivity, qemu-devel, joerg.roedel,
	aliguori, rth, agraf, eduard.munteanu, kraxel

On Thu, Oct 13, 2011 at 02:43:06AM +1100, David Gibson wrote:
> On Wed, Oct 12, 2011 at 09:22:01AM +0200, Michael S. Tsirkin wrote:
> > On Wed, Oct 12, 2011 at 02:07:46PM +1100, David Gibson wrote:
> > > Um.. why?  PCI is defined by the spec to be LE, so I don't see that we
> > > need explicit endianness versions for PCI helpers.
> > 
> > LE in the spec only applies to structures defined by the spec,
> > that is pci configuration and msix tables in device memory.
> 
> Well, true.  But when was the last time you saw a PCI device with BE
> registers?  I think it's a rare enough case that the individual device
> models can reswap themselves.

s/BE registers/BE DMA accessed structures/

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-03 13:17                 ` Anthony Liguori
  2011-10-12  3:09                   ` David Gibson
@ 2011-10-14  2:14                   ` David Gibson
  2011-10-16 12:50                     ` Michael S. Tsirkin
  2011-10-16 13:15                     ` Avi Kivity
  1 sibling, 2 replies; 42+ messages in thread
From: David Gibson @ 2011-10-14  2:14 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: kraxel, Michael S. Tsirkin, joerg.roedel, qemu-devel, agraf,
	Avi Kivity, eduard.munteanu, rth

On Mon, Oct 03, 2011 at 08:17:05AM -0500, Anthony Liguori wrote:
> On 10/02/2011 07:14 AM, Michael S. Tsirkin wrote:
> >On Sun, Oct 02, 2011 at 02:01:10PM +0200, Avi Kivity wrote:
> >>>Hmm, not entirely virtio specific, some devices use stX macros to do the
> >>>conversion.  E.g. stw_be_phys and stl_le_phys are used in several
> >>>places.
> >>
> >>These are fine - explicit endianness.
> >
> >Right. So changing these to e.g. stl_dma and assuming
> >LE is default seems like a step backwards.
> 
> We're generalizing too much.
> 
> In general, the device model doesn't need atomic access functions.
> That's because device model RAM access is not coherent with CPU RAM
> access.

Ok, so the next spin of these patches will have explicit LE and BE
versions of the accessors by popular demand.  I'm still using
cpu_physical_memory_rw() as the backend though, because I can't see a
case where a device could safely _require_ an emulated DMA access to
be atomic.

> Virtio is a very, very special case.  virtio requires coherent RAM access.

Right.  Virtio's access to memory is *not* emulated PCI DMA, it's
god-like hypervisor access to guest system memory.  It should
correctly bypass any IOMMU, and so should remain as
cpu_physical_memory_rw() or the atomic accessors, rather than being
converted to this new API.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 3/9] eepro100: Use PCI DMA stub functions
  2011-10-02 11:45   ` Michael S. Tsirkin
@ 2011-10-14  2:15     ` David Gibson
  0 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-10-14  2:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: aliguori, kraxel, joerg.roedel, qemu-devel, agraf, avi,
	eduard.munteanu, rth

On Sun, Oct 02, 2011 at 01:45:28PM +0200, Michael S. Tsirkin wrote:
> On Mon, Sep 05, 2011 at 02:34:58PM +1000, David Gibson wrote:
> > From: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > @@ -744,21 +713,22 @@ static void dump_statistics(EEPRO100State * s)
> >       * values which really matter.
> >       * Number of data should check configuration!!!
> >       */
> > -    cpu_physical_memory_write(s->statsaddr, &s->statistics, s->stats_size);
> > -    e100_stl_le_phys(s->statsaddr + 0, s->statistics.tx_good_frames);
> > -    e100_stl_le_phys(s->statsaddr + 36, s->statistics.rx_good_frames);
> > -    e100_stl_le_phys(s->statsaddr + 48, s->statistics.rx_resource_errors);
> > -    e100_stl_le_phys(s->statsaddr + 60, s->statistics.rx_short_frame_errors);
> > +    pci_dma_write(&s->dev, s->statsaddr,
> > +                  (uint8_t *) & s->statistics, s->stats_size);
> > +    stl_pci_dma(&s->dev, s->statsaddr + 0, s->statistics.tx_good_frames);
> > +    stl_pci_dma(&s->dev, s->statsaddr + 36, s->statistics.rx_good_frames);
> > +    stl_pci_dma(&s->dev, s->statsaddr + 48, s->statistics.rx_resource_errors);
> > +    stl_pci_dma(&s->dev, s->statsaddr + 60, s->statistics.rx_short_frame_errors);
> 
> At least old stl macros assumed an aligned address.
> Not sure it's still the case but for e100 address might
> not be aligned I think.

The new macros, being implemented in terms of cpu_physical_memory_rw()
with small lengths, do not require aligned addresses.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-14  2:14                   ` David Gibson
@ 2011-10-16 12:50                     ` Michael S. Tsirkin
  2011-10-16 13:15                     ` Avi Kivity
  1 sibling, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-16 12:50 UTC (permalink / raw)
  To: Anthony Liguori, Avi Kivity, qemu-devel, joerg.roedel, rth, agraf,
	eduard.munteanu, kraxel

On Fri, Oct 14, 2011 at 01:14:07PM +1100, David Gibson wrote:
> On Mon, Oct 03, 2011 at 08:17:05AM -0500, Anthony Liguori wrote:
> > On 10/02/2011 07:14 AM, Michael S. Tsirkin wrote:
> > >On Sun, Oct 02, 2011 at 02:01:10PM +0200, Avi Kivity wrote:
> > >>>Hmm, not entirely virtio specific, some devices use stX macros to do the
> > >>>conversion.  E.g. stw_be_phys and stl_le_phys are used in several
> > >>>places.
> > >>
> > >>These are fine - explicit endianness.
> > >
> > >Right. So changing these to e.g. stl_dma and assuming
> > >LE is default seems like a step backwards.
> > 
> > We're generalizing too much.
> > 
> > In general, the device model doesn't need atomic access functions.
> > That's because device model RAM access is not coherent with CPU RAM
> > access.
> 
> Ok, so the next spin of these patches will have explicit LE and BE
> versions of the accessors by popular demand.  I'm still using
> cpu_physical_memory_rw() as the backend though, because I can't see a
> case where a device could safely _require_ an emulated DMA access to
> be atomic.

You don't?
PCI spec supports atomic operations. It also strongly recommends
not splitting accesses below dword boundary.

> > Virtio is a very, very special case.  virtio requires coherent RAM access.
> 
> Right.  Virtio's access to memory is *not* emulated PCI DMA, it's
> god-like hypervisor access to guest system memory.  It should
> correctly bypass any IOMMU, and so should remain as
> cpu_physical_memory_rw() or the atomic accessors, rather than being
> converted to this new API.
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-14  2:14                   ` David Gibson
  2011-10-16 12:50                     ` Michael S. Tsirkin
@ 2011-10-16 13:15                     ` Avi Kivity
  2011-10-18  1:46                       ` David Gibson
  1 sibling, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2011-10-16 13:15 UTC (permalink / raw)
  To: Anthony Liguori, Michael S. Tsirkin, qemu-devel, joerg.roedel,
	rth, agraf, eduard.munteanu, kraxel

On 10/14/2011 04:14 AM, David Gibson wrote:
> > Virtio is a very, very special case.  virtio requires coherent RAM access.
>
> Right.  Virtio's access to memory is *not* emulated PCI DMA, it's
> god-like hypervisor access to guest system memory.  It should
> correctly bypass any IOMMU, and so should remain as
> cpu_physical_memory_rw() or the atomic accessors, rather than being
> converted to this new API.

virtio should definitely not bypass an iommu.  A guest may assign a
virtio device to nested guests, and would wish it confined by the
emulated iommu.

More generally, a guest sees a virtio device as just another pci device,
and has no way to tell that it bypasses the iommu.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-16 13:15                     ` Avi Kivity
@ 2011-10-18  1:46                       ` David Gibson
  2011-10-19  0:31                         ` Rusty Russell
                                           ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: David Gibson @ 2011-10-18  1:46 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Michael S. Tsirkin, joerg.roedel,
	Paul 'Rusty' Russell, qemu-devel, agraf, kraxel,
	eduard.munteanu, rth

On Sun, Oct 16, 2011 at 03:15:53PM +0200, Avi Kivity wrote:
> On 10/14/2011 04:14 AM, David Gibson wrote:
> > > Virtio is a very, very special case.  virtio requires coherent RAM access.
> >
> > Right.  Virtio's access to memory is *not* emulated PCI DMA, it's
> > god-like hypervisor access to guest system memory.  It should
> > correctly bypass any IOMMU, and so should remain as
> > cpu_physical_memory_rw() or the atomic accessors, rather than being
> > converted to this new API.
> 
> virtio should definitely not bypass an iommu.

So, I just had a chat with Rusty about this.  Perhaps it shouldn't,
but it does.  The spec is in terms of guest physical addresses, not
bus/DMA addresses, and more to the point the Linux driver does *not*
do the necessary dma_map() and unmap operations to treat this as a PCI
DMA.  So like it or not, god-like hypervisor access rather than
emulated PCI DMA is what it does.

>  A guest may assign a
> virtio device to nested guests, and would wish it confined by the
> emulated iommu.

Well, that would be nice, but it can't be done.  It could be fixed,
but it would be an incompatible change so it would need a new feature
bit corresponding changes in the Linux driver to do the dma map/unmap
if it accepts the "respect IOMMU" feature.

> More generally, a guest sees a virtio device as just another pci device,
> and has no way to tell that it bypasses the iommu.

Well, except the fact that the driver knows its a virtio device,
because it's a virtio driver.  It's not like you can write a driver
that uses PCI DMA without knowing the particulars of the device you're
using.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-18  1:46                       ` David Gibson
@ 2011-10-19  0:31                         ` Rusty Russell
  2011-10-19  1:22                         ` Michael S. Tsirkin
  2011-10-19  9:10                         ` Avi Kivity
  2 siblings, 0 replies; 42+ messages in thread
From: Rusty Russell @ 2011-10-19  0:31 UTC (permalink / raw)
  To: David Gibson, Avi Kivity
  Cc: Anthony Liguori, Michael S. Tsirkin, joerg.roedel, qemu-devel,
	agraf, kraxel, eduard.munteanu, rth

On Tue, 18 Oct 2011 12:46:50 +1100, David Gibson <dwg@au1.ibm.com> wrote:
> On Sun, Oct 16, 2011 at 03:15:53PM +0200, Avi Kivity wrote:
> > On 10/14/2011 04:14 AM, David Gibson wrote:
> > > > Virtio is a very, very special case.  virtio requires coherent RAM access.
> > >
> > > Right.  Virtio's access to memory is *not* emulated PCI DMA, it's
> > > god-like hypervisor access to guest system memory.  It should
> > > correctly bypass any IOMMU, and so should remain as
> > > cpu_physical_memory_rw() or the atomic accessors, rather than being
> > > converted to this new API.
> > 
> > virtio should definitely not bypass an iommu.
> 
> So, I just had a chat with Rusty about this.  Perhaps it shouldn't,
> but it does.  The spec is in terms of guest physical addresses, not
> bus/DMA addresses, and more to the point the Linux driver does *not*
> do the necessary dma_map() and unmap operations to treat this as a PCI
> DMA.  So like it or not, god-like hypervisor access rather than
> emulated PCI DMA is what it does.

Yep, it shouldn't but it does.  Can't break it now without a feature
bit, and there's no particular reason to add it until someone really
wants it.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-18  1:46                       ` David Gibson
  2011-10-19  0:31                         ` Rusty Russell
@ 2011-10-19  1:22                         ` Michael S. Tsirkin
  2011-10-19  9:10                         ` Avi Kivity
  2 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2011-10-19  1:22 UTC (permalink / raw)
  To: Avi Kivity, Anthony Liguori, qemu-devel, joerg.roedel, rth, agraf,
	eduard.munteanu, kraxel, Paul 'Rusty' Russell, Jason Wang

On Tue, Oct 18, 2011 at 12:46:50PM +1100, David Gibson wrote:
> On Sun, Oct 16, 2011 at 03:15:53PM +0200, Avi Kivity wrote:
> > On 10/14/2011 04:14 AM, David Gibson wrote:
> > > > Virtio is a very, very special case.  virtio requires coherent RAM access.
> > >
> > > Right.  Virtio's access to memory is *not* emulated PCI DMA, it's
> > > god-like hypervisor access to guest system memory.  It should
> > > correctly bypass any IOMMU, and so should remain as
> > > cpu_physical_memory_rw() or the atomic accessors, rather than being
> > > converted to this new API.
> > 
> > virtio should definitely not bypass an iommu.
> 
> So, I just had a chat with Rusty about this.  Perhaps it shouldn't,
> but it does.  The spec is in terms of guest physical addresses, not
> bus/DMA addresses, and more to the point the Linux driver does *not*
> do the necessary dma_map() and unmap operations to treat this as a PCI
> DMA.  So like it or not, god-like hypervisor access rather than
> emulated PCI DMA is what it does.

Fine, but I'm convinced virtio is not unique in that
it wants atomic accesses.

I just looked at hw/rtl8139.c as one example.
It uses a high bit in a 32 bit register to signal
descriptor ownership. Thus we need to read that
bit first, or read the register atomically.

Current code does cpu_physical_memory_read
which does neither of these things, but
it seems to be a bug. An atomic load would
be the best solution.

-- 
MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-18  1:46                       ` David Gibson
  2011-10-19  0:31                         ` Rusty Russell
  2011-10-19  1:22                         ` Michael S. Tsirkin
@ 2011-10-19  9:10                         ` Avi Kivity
  2011-10-20  2:58                           ` David Gibson
  2 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2011-10-19  9:10 UTC (permalink / raw)
  To: Anthony Liguori, Michael S. Tsirkin, qemu-devel, joerg.roedel,
	rth, agraf, eduard.munteanu, kraxel, Paul 'Rusty' Russell

On 10/18/2011 03:46 AM, David Gibson wrote:
> On Sun, Oct 16, 2011 at 03:15:53PM +0200, Avi Kivity wrote:
> > On 10/14/2011 04:14 AM, David Gibson wrote:
> > > > Virtio is a very, very special case.  virtio requires coherent RAM access.
> > >
> > > Right.  Virtio's access to memory is *not* emulated PCI DMA, it's
> > > god-like hypervisor access to guest system memory.  It should
> > > correctly bypass any IOMMU, and so should remain as
> > > cpu_physical_memory_rw() or the atomic accessors, rather than being
> > > converted to this new API.
> > 
> > virtio should definitely not bypass an iommu.
>
> So, I just had a chat with Rusty about this.  Perhaps it shouldn't,
> but it does.  The spec is in terms of guest physical addresses, not
> bus/DMA addresses, and more to the point the Linux driver does *not*
> do the necessary dma_map() and unmap operations to treat this as a PCI
> DMA.  So like it or not, god-like hypervisor access rather than
> emulated PCI DMA is what it does.

Wow, how did we manage to break virtio in so many different ways?

Is there a way to unbreak it?  On x86 it will continue to work if we
rewrite the spec in terms of pci dma, what about non-x86?

>
> >  A guest may assign a
> > virtio device to nested guests, and would wish it confined by the
> > emulated iommu.
>
> Well, that would be nice, but it can't be done.  It could be fixed,
> but it would be an incompatible change so it would need a new feature
> bit corresponding changes in the Linux driver to do the dma map/unmap
> if it accepts the "respect IOMMU" feature.

Needs to be done IMO.

>
> > More generally, a guest sees a virtio device as just another pci device,
> > and has no way to tell that it bypasses the iommu.
>
> Well, except the fact that the driver knows its a virtio device,
> because it's a virtio driver.  It's not like you can write a driver
> that uses PCI DMA without knowing the particulars of the device you're
> using.

virtio-pci knows it's pci, there's no excuse.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA
  2011-10-19  9:10                         ` Avi Kivity
@ 2011-10-20  2:58                           ` David Gibson
  0 siblings, 0 replies; 42+ messages in thread
From: David Gibson @ 2011-10-20  2:58 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Michael S. Tsirkin, joerg.roedel,
	Paul 'Rusty' Russell, qemu-devel, agraf, kraxel,
	eduard.munteanu, rth

On Wed, Oct 19, 2011 at 11:10:15AM +0200, Avi Kivity wrote:
> On 10/18/2011 03:46 AM, David Gibson wrote:
> > On Sun, Oct 16, 2011 at 03:15:53PM +0200, Avi Kivity wrote:
> > > On 10/14/2011 04:14 AM, David Gibson wrote:
> > > > > Virtio is a very, very special case.  virtio requires coherent RAM access.
> > > >
> > > > Right.  Virtio's access to memory is *not* emulated PCI DMA, it's
> > > > god-like hypervisor access to guest system memory.  It should
> > > > correctly bypass any IOMMU, and so should remain as
> > > > cpu_physical_memory_rw() or the atomic accessors, rather than being
> > > > converted to this new API.
> > > 
> > > virtio should definitely not bypass an iommu.
> >
> > So, I just had a chat with Rusty about this.  Perhaps it shouldn't,
> > but it does.  The spec is in terms of guest physical addresses, not
> > bus/DMA addresses, and more to the point the Linux driver does *not*
> > do the necessary dma_map() and unmap operations to treat this as a PCI
> > DMA.  So like it or not, god-like hypervisor access rather than
> > emulated PCI DMA is what it does.
> 
> Wow, how did we manage to break virtio in so many different ways?
> 
> Is there a way to unbreak it?

Yes, using a feature bit.

>  On x86 it will continue to work if we
> rewrite the spec in terms of pci dma, what about non-x86?

No, anything with a non-optional IOMMU will break horribly.  That's
why we need a feature bit.

> > >  A guest may assign a
> > > virtio device to nested guests, and would wish it confined by the
> > > emulated iommu.
> >
> > Well, that would be nice, but it can't be done.  It could be fixed,
> > but it would be an incompatible change so it would need a new feature
> > bit corresponding changes in the Linux driver to do the dma map/unmap
> > if it accepts the "respect IOMMU" feature.
> 
> Needs to be done IMO.

Well, sure, but my point is that I'm not volunteering for it.  Someone
who actually needs the feature can do the work.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2011-10-20  2:59 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-05  4:34 [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson
2011-09-05  4:34 ` [Qemu-devel] [PATCH 1/9] Add stub functions for PCI device models to do PCI DMA David Gibson
2011-10-02 10:25   ` Michael S. Tsirkin
2011-10-02 10:29     ` Avi Kivity
2011-10-02 10:52       ` Michael S. Tsirkin
2011-10-02 10:58         ` Avi Kivity
2011-10-02 11:17           ` Michael S. Tsirkin
2011-10-02 11:28             ` Alexander Graf
2011-10-02 11:43               ` Michael S. Tsirkin
2011-10-02 12:01             ` Avi Kivity
2011-10-02 12:14               ` Michael S. Tsirkin
2011-10-02 12:26                 ` Avi Kivity
2011-10-03 13:17                 ` Anthony Liguori
2011-10-12  3:09                   ` David Gibson
2011-10-12  9:12                     ` Michael S. Tsirkin
2011-10-14  2:14                   ` David Gibson
2011-10-16 12:50                     ` Michael S. Tsirkin
2011-10-16 13:15                     ` Avi Kivity
2011-10-18  1:46                       ` David Gibson
2011-10-19  0:31                         ` Rusty Russell
2011-10-19  1:22                         ` Michael S. Tsirkin
2011-10-19  9:10                         ` Avi Kivity
2011-10-20  2:58                           ` David Gibson
2011-10-12  3:07                 ` David Gibson
2011-10-12  7:22                   ` Michael S. Tsirkin
2011-10-12 15:43                     ` David Gibson
2011-10-12 15:45                       ` David Gibson
2011-10-12  3:11         ` David Gibson
2011-10-12  8:44           ` Michael S. Tsirkin
2011-10-12  9:08             ` Gerd Hoffmann
2011-10-03 13:10     ` Anthony Liguori
2011-09-05  4:34 ` [Qemu-devel] [PATCH 2/9] rtl8139: Use PCI DMA stub functions David Gibson
2011-09-05  4:34 ` [Qemu-devel] [PATCH 3/9] eepro100: " David Gibson
2011-10-02 11:45   ` Michael S. Tsirkin
2011-10-14  2:15     ` David Gibson
2011-09-05  4:34 ` [Qemu-devel] [PATCH 4/9] ac97: " David Gibson
2011-09-05  4:35 ` [Qemu-devel] [PATCH 5/9] es1370: " David Gibson
2011-09-05  4:35 ` [Qemu-devel] [PATCH 6/9] e1000: " David Gibson
2011-09-05  4:35 ` [Qemu-devel] [PATCH 7/9] lsi53c895a: " David Gibson
2011-09-05  4:35 ` [Qemu-devel] [PATCH 8/9] pcnet-pci: " David Gibson
2011-09-05  4:35 ` [Qemu-devel] [PATCH 9/9] intel-hda: " David Gibson
2011-09-16  6:58 ` [Qemu-devel] [0/9] Preliminary work for IOMMU emulation support; the easy bits (v2) David Gibson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).