qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
@ 2012-11-25  2:02 Liu Ping Fan
  2012-11-25  2:02 ` [Qemu-devel] [PATCH v7 1/7] qom: apply atomic on object's refcount Liu Ping Fan
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: Liu Ping Fan @ 2012-11-25  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, Anthony Liguori,
	Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

v1:
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03312.html

v2:
http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01275.html

v3:
http://lists.nongnu.org/archive/html/qemu-devel/2012-09/msg01474.html

v4:
http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg03857.html

v5:
https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04867.html

v6:
http://lists.gnu.org/archive/html/qemu-devel/2012-11/msg00542.html
--
changes v5->v6:
 Apply fine-grain lock for all address space.
 Introduce separated interface to allow mmio dispatcher called with/without big lock.

changes v6->v7:
 drop wrapper of gcc atomic builtin
  

Liu Ping Fan (7):
  qom: apply atomic on object's refcount
  hotplug: introduce qdev_unplug_complete() to remove device from views
  pci: remove pci device from mem view when unplug
  memory: introduce local lock for address space
  memory: make mmio dispatch able to be out of biglock
  memory: introduce tls context to trace nested mmio request issue
  vcpu: push mmio dispatcher out of big lock

 cpu-common.h      |    3 +
 docs/memory.txt   |    4 +
 exec.c            |  206 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 hw/acpi_piix4.c   |    2 +-
 hw/pci.c          |   13 +++-
 hw/pci.h          |    1 +
 hw/qdev.c         |   26 +++++++
 hw/qdev.h         |    3 +-
 kvm-all.c         |    4 +-
 memory-internal.h |    1 +
 memory.c          |    1 +
 memory.h          |    5 ++
 qom/object.c      |    5 +-
 13 files changed, 250 insertions(+), 24 deletions(-)

-- 
1.7.4.4

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v7 1/7] qom: apply atomic on object's refcount
  2012-11-25  2:02 [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock Liu Ping Fan
@ 2012-11-25  2:02 ` Liu Ping Fan
  2012-11-28 17:16   ` Richard Henderson
  2012-11-25  2:02 ` [Qemu-devel] [PATCH v7 2/7] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Liu Ping Fan @ 2012-11-25  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, Anthony Liguori,
	Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 qom/object.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index e3e9242..1a697b3 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -600,16 +600,15 @@ GSList *object_class_get_list(const char *implements_type,
 
 void object_ref(Object *obj)
 {
-    obj->ref++;
+     __sync_fetch_and_add(&obj->ref, 1);
 }
 
 void object_unref(Object *obj)
 {
     g_assert(obj->ref > 0);
-    obj->ref--;
 
     /* parent always holds a reference to its children */
-    if (obj->ref == 0) {
+    if (__sync_fetch_and_sub(&obj->ref, 1) == 1) {
         object_finalize(obj);
     }
 }
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v7 2/7] hotplug: introduce qdev_unplug_complete() to remove device from views
  2012-11-25  2:02 [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock Liu Ping Fan
  2012-11-25  2:02 ` [Qemu-devel] [PATCH v7 1/7] qom: apply atomic on object's refcount Liu Ping Fan
@ 2012-11-25  2:02 ` Liu Ping Fan
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 3/7] pci: remove pci device from mem view when unplug Liu Ping Fan
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Liu Ping Fan @ 2012-11-25  2:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, Anthony Liguori,
	Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

When device unplug has been ack by guest, we first remove it from memory
to prevent incoming access from dispatcher. Then we isolate it from
device composition tree

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/qdev.c |   26 ++++++++++++++++++++++++++
 hw/qdev.h |    3 ++-
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index 9b9aba3..681e133 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -98,6 +98,14 @@ void qdev_set_parent_bus(DeviceState *dev, BusState *bus)
     bus_add_child(bus, dev);
 }
 
+static void qdev_unset_parent(DeviceState *dev)
+{
+    BusState *b = dev->parent_bus;
+
+    object_unparent(OBJECT(dev));
+    bus_remove_child(b, dev);
+}
+
 /* Create a new device.  This only initializes the device state structure
    and allows properties to be set.  qdev_init should be called to
    initialize the actual device emulation.  */
@@ -187,6 +195,24 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
     dev->alias_required_for_version = required_for_version;
 }
 
+static int qdev_unmap(DeviceState *dev)
+{
+    DeviceClass *dc =  DEVICE_GET_CLASS(dev);
+    if (dc->unmap) {
+        dc->unmap(dev);
+    }
+    return 0;
+}
+
+void qdev_unplug_complete(DeviceState *dev, Error **errp)
+{
+    /* isolate from mem view */
+    qdev_unmap(dev);
+    /* isolate from device tree */
+    qdev_unset_parent(dev);
+    object_unref(OBJECT(dev));
+}
+
 void qdev_unplug(DeviceState *dev, Error **errp)
 {
     DeviceClass *dc = DEVICE_GET_CLASS(dev);
diff --git a/hw/qdev.h b/hw/qdev.h
index c6ac636..71eb9ca 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -47,7 +47,7 @@ typedef struct DeviceClass {
 
     /* callbacks */
     void (*reset)(DeviceState *dev);
-
+    void (*unmap)(DeviceState *dev);
     /* device state */
     const VMStateDescription *vmsd;
 
@@ -160,6 +160,7 @@ void qdev_init_nofail(DeviceState *dev);
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
                                  int required_for_version);
 void qdev_unplug(DeviceState *dev, Error **errp);
+void qdev_unplug_complete(DeviceState *dev, Error **errp);
 void qdev_free(DeviceState *dev);
 int qdev_simple_unplug_cb(DeviceState *dev);
 void qdev_machine_creation_done(void);
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v7 3/7] pci: remove pci device from mem view when unplug
  2012-11-25  2:02 [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock Liu Ping Fan
  2012-11-25  2:02 ` [Qemu-devel] [PATCH v7 1/7] qom: apply atomic on object's refcount Liu Ping Fan
  2012-11-25  2:02 ` [Qemu-devel] [PATCH v7 2/7] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
@ 2012-11-25  2:03 ` Liu Ping Fan
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 4/7] memory: introduce local lock for address space Liu Ping Fan
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Liu Ping Fan @ 2012-11-25  2:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, Anthony Liguori,
	Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/acpi_piix4.c |    2 +-
 hw/pci.c        |   13 ++++++++++++-
 hw/pci.h        |    1 +
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 15275cf..b45a016 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -306,7 +306,7 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
             if (pc->no_hotplug) {
                 slot_free = false;
             } else {
-                qdev_free(qdev);
+                qdev_unplug_complete(qdev, NULL);
             }
         }
     }
diff --git a/hw/pci.c b/hw/pci.c
index 7eeaac0..9ba589e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -869,7 +869,6 @@ static int pci_unregister_device(DeviceState *dev)
     PCIDevice *pci_dev = PCI_DEVICE(dev);
     PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
 
-    pci_unregister_io_regions(pci_dev);
     pci_del_option_rom(pci_dev);
 
     if (pc->exit) {
@@ -880,6 +879,17 @@ static int pci_unregister_device(DeviceState *dev)
     return 0;
 }
 
+static void pci_unmap_device(DeviceState *dev)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
+
+    pci_unregister_io_regions(pci_dev);
+    if (pc->unmap) {
+        pc->unmap(pci_dev);
+    }
+}
+
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
                       uint8_t type, MemoryRegion *memory)
 {
@@ -2105,6 +2115,7 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
     DeviceClass *k = DEVICE_CLASS(klass);
     k->init = pci_qdev_init;
     k->unplug = pci_unplug_device;
+    k->unmap = pci_unmap_device;
     k->exit = pci_unregister_device;
     k->bus_type = TYPE_PCI_BUS;
     k->props = pci_props;
diff --git a/hw/pci.h b/hw/pci.h
index 1f902f5..898cc5e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -154,6 +154,7 @@ typedef struct PCIDeviceClass {
     DeviceClass parent_class;
 
     int (*init)(PCIDevice *dev);
+    void (*unmap)(PCIDevice *dev);
     PCIUnregisterFunc *exit;
     PCIConfigReadFunc *config_read;
     PCIConfigWriteFunc *config_write;
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v7 4/7] memory: introduce local lock for address space
  2012-11-25  2:02 [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock Liu Ping Fan
                   ` (2 preceding siblings ...)
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 3/7] pci: remove pci device from mem view when unplug Liu Ping Fan
@ 2012-11-25  2:03 ` Liu Ping Fan
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 5/7] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Liu Ping Fan @ 2012-11-25  2:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, Anthony Liguori,
	Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

For those address spaces which want to be able out of big lock, they
will be protected by their own local.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 memory.c |    1 +
 memory.h |    3 +++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index 2f68d67..18425fd 100644
--- a/memory.c
+++ b/memory.c
@@ -1535,6 +1535,7 @@ void memory_listener_unregister(MemoryListener *listener)
 void address_space_init(AddressSpace *as, MemoryRegion *root)
 {
     memory_region_transaction_begin();
+    qemu_mutex_init(&as->lock);
     as->root = root;
     as->current_map = g_new(FlatView, 1);
     flatview_init(as->current_map);
diff --git a/memory.h b/memory.h
index 79393f1..13a9e3e 100644
--- a/memory.h
+++ b/memory.h
@@ -22,6 +22,7 @@
 #include "cpu-common.h"
 #include "targphys.h"
 #include "qemu-queue.h"
+#include "qemu-thread.h"
 #include "iorange.h"
 #include "ioport.h"
 #include "int128.h"
@@ -164,6 +165,7 @@ typedef struct AddressSpace AddressSpace;
  */
 struct AddressSpace {
     /* All fields are private. */
+    QemuMutex lock;
     const char *name;
     MemoryRegion *root;
     struct FlatView *current_map;
@@ -801,6 +803,7 @@ void mtree_info(fprintf_function mon_printf, void *f);
  *
  * @as: an uninitialized #AddressSpace
  * @root: a #MemoryRegion that routes addesses for the address space
+ * @lock: if true, the physmap protected by local lock, otherwise big lock
  */
 void address_space_init(AddressSpace *as, MemoryRegion *root);
 
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v7 5/7] memory: make mmio dispatch able to be out of biglock
  2012-11-25  2:02 [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock Liu Ping Fan
                   ` (3 preceding siblings ...)
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 4/7] memory: introduce local lock for address space Liu Ping Fan
@ 2012-11-25  2:03 ` Liu Ping Fan
  2013-05-06 11:21   ` Paolo Bonzini
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 6/7] memory: introduce tls context to trace nested mmio request issue Liu Ping Fan
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Liu Ping Fan @ 2012-11-25  2:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, Anthony Liguori,
	Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Without biglock, we try to protect the mr by increase refcnt.
If we cannot inc refcnt, go backward and resort to biglock.

Another point is memory radix-tree can be flushed by another
thread, so we should get the copy of terminal mr to survive
from such issue.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 docs/memory.txt   |    4 ++
 exec.c            |  154 +++++++++++++++++++++++++++++++++++++++++++++++-----
 memory-internal.h |    1 +
 memory.h          |    2 +
 4 files changed, 146 insertions(+), 15 deletions(-)

diff --git a/docs/memory.txt b/docs/memory.txt
index 5bbee8e..6b3d15a 100644
--- a/docs/memory.txt
+++ b/docs/memory.txt
@@ -170,3 +170,7 @@ various constraints can be supplied to control how these callbacks are called:
  - .old_portio and .old_mmio can be used to ease porting from code using
    cpu_register_io_memory() and register_ioport().  They should not be used
    in new code.
+
+MMIO regions are provided with ->ref() and ->unref() callbacks; This pair callbacks
+are optional. When ref() return non-zero, Both MemoryRegion and its opaque are
+safe to use.
diff --git a/exec.c b/exec.c
index 750008c..fa34ef9 100644
--- a/exec.c
+++ b/exec.c
@@ -2280,7 +2280,7 @@ static void register_multipage(AddressSpaceDispatch *d, MemoryRegionSection *sec
                   section_index);
 }
 
-static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
+static void mem_nop(MemoryListener *listener, MemoryRegionSection *section)
 {
     AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
     MemoryRegionSection now = *section, remain = *section;
@@ -2314,6 +2314,26 @@ static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
     }
 }
 
+static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+
+    if (mr->ops && mr->ops->ref) {
+        mr->ops->ref(mr);
+    }
+    mem_nop(listener, section);
+}
+
+static void mem_del(MemoryListener *listener,
+                            MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+
+    if (mr->ops && mr->ops->unref) {
+        mr->ops->unref(mr);
+    }
+}
+
 void qemu_flush_coalesced_mmio_buffer(void)
 {
     if (kvm_enabled())
@@ -3165,11 +3185,23 @@ static void io_mem_init(void)
 static void mem_begin(MemoryListener *listener)
 {
     AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
+    AddressSpace *as = d->as;
 
+    /* protect the updating process of mrs in memory core agaist readers */
+    qemu_mutex_lock(&as->lock);
     destroy_all_mappings(d);
     d->phys_map.ptr = PHYS_MAP_NODE_NIL;
 }
 
+static void mem_commit(MemoryListener *listener)
+{
+    AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch,
+                listener);
+    AddressSpace *as = d->as;
+
+    qemu_mutex_unlock(&as->lock);
+}
+
 static void core_begin(MemoryListener *listener)
 {
     phys_sections_clear();
@@ -3243,11 +3275,14 @@ void address_space_init_dispatch(AddressSpace *as)
     d->phys_map  = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .is_leaf = 0 };
     d->listener = (MemoryListener) {
         .begin = mem_begin,
+        .commit = mem_commit,
         .region_add = mem_add,
-        .region_nop = mem_add,
+        .region_del = mem_del,
+        .region_nop = mem_nop,
         .priority = 0,
     };
     as->dispatch = d;
+    as->dispatch->as = as;
     memory_listener_register(&d->listener, as);
 }
 
@@ -3345,6 +3380,68 @@ static void invalidate_and_set_dirty(target_phys_addr_t addr,
     xen_modified_memory(addr, length);
 }
 
+static MemoryRegionSection *subpage_get_terminal(subpage_t *mmio,
+    target_phys_addr_t addr)
+{
+    MemoryRegionSection *section;
+    unsigned int idx = SUBPAGE_IDX(addr);
+
+    section = &phys_sections[mmio->sub_section[idx]];
+    return section;
+}
+
+static bool memory_region_section_ref(MemoryRegionSection *mrs)
+{
+    MemoryRegion *mr;
+    bool ret = false;
+
+    mr = mrs->mr;
+    if (mr->ops && mr->ops->ref) {
+        mr->ops->ref(mr);
+        ret = true;
+    }
+    return ret;
+}
+
+static void memory_region_section_unref(MemoryRegionSection *mrs)
+{
+    MemoryRegion *mr;
+
+    mr = mrs->mr;
+    if (mr->ops && mr->ops->unref) {
+        mr->ops->unref(mr);
+    }
+}
+
+static bool memory_region_section_lookup_ref(AddressSpaceDispatch *d,
+    target_phys_addr_t addr, MemoryRegionSection *mrs)
+{
+    MemoryRegionSection *section;
+    bool ret;
+
+    section = phys_page_find(d, addr >> TARGET_PAGE_BITS);
+    if (section->mr->subpage) {
+        section = subpage_get_terminal(section->mr->opaque, addr);
+    }
+    *mrs = *section;
+    ret = memory_region_section_ref(mrs);
+
+    return ret;
+}
+
+static bool address_space_section_lookup_ref(AddressSpace *as,
+    target_phys_addr_t page, MemoryRegionSection *mrs)
+{
+    bool safe_ref;
+    AddressSpaceDispatch *d = as->dispatch;
+
+    qemu_mutex_lock(&as->lock);
+    safe_ref = memory_region_section_lookup_ref(d, page, mrs);
+    qemu_mutex_unlock(&as->lock);
+
+    return safe_ref;
+}
+
 void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
                       int len, bool is_write)
 {
@@ -3353,14 +3450,29 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
     uint8_t *ptr;
     uint32_t val;
     target_phys_addr_t page;
-    MemoryRegionSection *section;
+    bool safe_ref;
+    MemoryRegionSection *section, obj_mrs;
 
     while (len > 0) {
         page = addr & TARGET_PAGE_MASK;
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
-        section = phys_page_find(d, page >> TARGET_PAGE_BITS);
+
+        qemu_mutex_lock(&as->lock);
+        safe_ref = memory_region_section_lookup_ref(d, page, &obj_mrs);
+        qemu_mutex_unlock(&as->lock);
+        if (!safe_ref) {
+            qemu_mutex_lock_iothread();
+            qemu_mutex_lock(&as->lock);
+            /* when 2nd try, mem map can change, need to judge it again */
+            safe_ref = memory_region_section_lookup_ref(d, page, &obj_mrs);
+            qemu_mutex_unlock(&as->lock);
+            if (safe_ref) {
+                qemu_mutex_unlock_iothread();
+            }
+        }
+        section = &obj_mrs;
 
         if (is_write) {
             if (!memory_region_is_ram(section->mr)) {
@@ -3425,9 +3537,13 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
                 qemu_put_ram_ptr(ptr);
             }
         }
+        memory_region_section_unref(&obj_mrs);
         len -= l;
         buf += l;
         addr += l;
+        if (!safe_ref) {
+            qemu_mutex_unlock_iothread();
+        }
     }
 }
 
@@ -3460,18 +3576,19 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
 void cpu_physical_memory_write_rom(target_phys_addr_t addr,
                                    const uint8_t *buf, int len)
 {
-    AddressSpaceDispatch *d = address_space_memory.dispatch;
     int l;
     uint8_t *ptr;
     target_phys_addr_t page;
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, mr_obj;
 
     while (len > 0) {
         page = addr & TARGET_PAGE_MASK;
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
-        section = phys_page_find(d, page >> TARGET_PAGE_BITS);
+        address_space_section_lookup_ref(&address_space_memory,
+                page >> TARGET_PAGE_BITS, &mr_obj);
+        section = &mr_obj;
 
         if (!(memory_region_is_ram(section->mr) ||
               memory_region_is_romd(section->mr))) {
@@ -3489,6 +3606,7 @@ void cpu_physical_memory_write_rom(target_phys_addr_t addr,
         len -= l;
         buf += l;
         addr += l;
+        memory_region_section_unref(&mr_obj);
     }
 }
 
@@ -3550,12 +3668,11 @@ void *address_space_map(AddressSpace *as,
                         target_phys_addr_t *plen,
                         bool is_write)
 {
-    AddressSpaceDispatch *d = as->dispatch;
     target_phys_addr_t len = *plen;
     target_phys_addr_t todo = 0;
     int l;
     target_phys_addr_t page;
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, mr_obj;
     ram_addr_t raddr = RAM_ADDR_MAX;
     ram_addr_t rlen;
     void *ret;
@@ -3565,7 +3682,8 @@ void *address_space_map(AddressSpace *as,
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
-        section = phys_page_find(d, page >> TARGET_PAGE_BITS);
+        address_space_section_lookup_ref(as, page >> TARGET_PAGE_BITS, &mr_obj);
+        section = &mr_obj;
 
         if (!(memory_region_is_ram(section->mr) && !section->readonly)) {
             if (todo || bounce.buffer) {
@@ -3579,6 +3697,7 @@ void *address_space_map(AddressSpace *as,
             }
 
             *plen = l;
+            memory_region_section_unref(&mr_obj);
             return bounce.buffer;
         }
         if (!todo) {
@@ -3589,6 +3708,7 @@ void *address_space_map(AddressSpace *as,
         len -= l;
         addr += l;
         todo += l;
+        memory_region_section_unref(&mr_obj);
     }
     rlen = todo;
     ret = qemu_ram_ptr_length(raddr, &rlen);
@@ -4197,12 +4317,16 @@ bool virtio_is_big_endian(void)
 #ifndef CONFIG_USER_ONLY
 bool cpu_physical_memory_is_io(target_phys_addr_t phys_addr)
 {
-    MemoryRegionSection *section;
+    MemoryRegionSection *section, mr_obj;
+    bool ret;
 
-    section = phys_page_find(address_space_memory.dispatch,
-                             phys_addr >> TARGET_PAGE_BITS);
-
-    return !(memory_region_is_ram(section->mr) ||
+    address_space_section_lookup_ref(&address_space_memory,
+                             phys_addr >> TARGET_PAGE_BITS, &mr_obj);
+    section = &mr_obj;
+    ret = !(memory_region_is_ram(section->mr) ||
              memory_region_is_romd(section->mr));
+    memory_region_section_unref(&mr_obj);
+
+    return ret;
 }
 #endif
diff --git a/memory-internal.h b/memory-internal.h
index b33a99d..ea06bfb 100644
--- a/memory-internal.h
+++ b/memory-internal.h
@@ -38,6 +38,7 @@ struct AddressSpaceDispatch {
      */
     PhysPageEntry phys_map;
     MemoryListener listener;
+    AddressSpace *as;
 };
 
 void address_space_init_dispatch(AddressSpace *as);
diff --git a/memory.h b/memory.h
index 13a9e3e..704d014 100644
--- a/memory.h
+++ b/memory.h
@@ -67,6 +67,8 @@ struct MemoryRegionOps {
                   target_phys_addr_t addr,
                   uint64_t data,
                   unsigned size);
+    void (*ref)(MemoryRegion *mr);
+    void (*unref)(MemoryRegion *mr);
 
     enum device_endian endianness;
     /* Guest-visible constraints: */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v7 6/7] memory: introduce tls context to trace nested mmio request issue
  2012-11-25  2:02 [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock Liu Ping Fan
                   ` (4 preceding siblings ...)
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 5/7] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
@ 2012-11-25  2:03 ` Liu Ping Fan
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 7/7] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
  2012-12-06  7:28 ` [Qemu-devel] [PATCH v7 0/7] push mmio dispatch " liu ping fan
  7 siblings, 0 replies; 35+ messages in thread
From: Liu Ping Fan @ 2012-11-25  2:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, Anthony Liguori,
	Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

After breaking down big lock, nested MMIO request which not targeting
at RAM can cause deadlock issue. Supposing the scene: dev_a,b with
fine-grain locks lockA/B, then ABBA dealock issue can be triggered.
We fix this by tracing and rejecting such request.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 exec.c |   37 ++++++++++++++++++++++++++++++++++++-
 1 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/exec.c b/exec.c
index fa34ef9..7eae54e 100644
--- a/exec.c
+++ b/exec.c
@@ -3442,6 +3442,32 @@ static bool address_space_section_lookup_ref(AddressSpace *as,
     return safe_ref;
 }
 
+typedef struct ThreadContext {
+  unsigned int req_pending;
+} ThreadContext;
+
+static DEFINE_TLS(ThreadContext, thread_context) = {
+    .req_pending = 0
+};
+#define thread_context tls_var(thread_context)
+
+static bool address_space_check_inc_req_pending(MemoryRegionSection *section)
+{
+    bool nested;
+
+    nested = ++thread_context.req_pending > 1 ? true : false;
+    /* To fix, will filter iommu case */
+    if (nested && !memory_region_is_ram(section->mr)) {
+        fprintf(stderr, "waring: nested target not RAM is not support");
+    }
+    return nested;
+}
+
+static void address_space_dec_req_pending(void)
+{
+    thread_context.req_pending--;
+}
+
 void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
                       int len, bool is_write)
 {
@@ -3450,7 +3476,7 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
     uint8_t *ptr;
     uint32_t val;
     target_phys_addr_t page;
-    bool safe_ref;
+    bool safe_ref, nested;
     MemoryRegionSection *section, obj_mrs;
 
     while (len > 0) {
@@ -3462,6 +3488,11 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
         qemu_mutex_lock(&as->lock);
         safe_ref = memory_region_section_lookup_ref(d, page, &obj_mrs);
         qemu_mutex_unlock(&as->lock);
+        nested = address_space_check_inc_req_pending(&obj_mrs);
+        if (nested) {
+            goto skip;
+        }
+
         if (!safe_ref) {
             qemu_mutex_lock_iothread();
             qemu_mutex_lock(&as->lock);
@@ -3477,6 +3508,7 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
         if (is_write) {
             if (!memory_region_is_ram(section->mr)) {
                 target_phys_addr_t addr1;
+
                 addr1 = memory_region_section_addr(section, addr);
                 /* XXX: could force cpu_single_env to NULL to avoid
                    potential bugs */
@@ -3510,6 +3542,7 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
             if (!(memory_region_is_ram(section->mr) ||
                   memory_region_is_romd(section->mr))) {
                 target_phys_addr_t addr1;
+
                 /* I/O case */
                 addr1 = memory_region_section_addr(section, addr);
                 if (l >= 4 && ((addr1 & 3) == 0)) {
@@ -3537,6 +3570,8 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
                 qemu_put_ram_ptr(ptr);
             }
         }
+skip:
+        address_space_dec_req_pending();
         memory_region_section_unref(&obj_mrs);
         len -= l;
         buf += l;
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v7 7/7] vcpu: push mmio dispatcher out of big lock
  2012-11-25  2:02 [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock Liu Ping Fan
                   ` (5 preceding siblings ...)
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 6/7] memory: introduce tls context to trace nested mmio request issue Liu Ping Fan
@ 2012-11-25  2:03 ` Liu Ping Fan
  2012-12-06  7:28 ` [Qemu-devel] [PATCH v7 0/7] push mmio dispatch " liu ping fan
  7 siblings, 0 replies; 35+ messages in thread
From: Liu Ping Fan @ 2012-11-25  2:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, Anthony Liguori,
	Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

To anti the recursive big lock, introduce separate interfaces to allow
address space dispatcher called with/without big lock.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 cpu-common.h |    3 +++
 exec.c       |   21 +++++++++++++++++----
 kvm-all.c    |    4 +++-
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index c0d27af..69c1d7a 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -51,6 +51,9 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 
 void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                             int len, int is_write);
+void cpu_physical_memory_nolock_rw(target_phys_addr_t addr, uint8_t *buf,
+                            int len, int is_write);
+
 static inline void cpu_physical_memory_read(target_phys_addr_t addr,
                                             void *buf, int len)
 {
diff --git a/exec.c b/exec.c
index 7eae54e..9fec600 100644
--- a/exec.c
+++ b/exec.c
@@ -3468,8 +3468,8 @@ static void address_space_dec_req_pending(void)
     thread_context.req_pending--;
 }
 
-void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
-                      int len, bool is_write)
+static void address_space_rw_internal(AddressSpace *as, target_phys_addr_t addr,
+                      uint8_t *buf, int len, bool is_write, bool biglock)
 {
     AddressSpaceDispatch *d = as->dispatch;
     int l;
@@ -3493,7 +3493,7 @@ void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
             goto skip;
         }
 
-        if (!safe_ref) {
+        if (!safe_ref && !biglock) {
             qemu_mutex_lock_iothread();
             qemu_mutex_lock(&as->lock);
             /* when 2nd try, mem map can change, need to judge it again */
@@ -3576,12 +3576,18 @@ skip:
         len -= l;
         buf += l;
         addr += l;
-        if (!safe_ref) {
+        if (!safe_ref && !biglock) {
             qemu_mutex_unlock_iothread();
         }
     }
 }
 
+void address_space_rw(AddressSpace *as, target_phys_addr_t addr, uint8_t *buf,
+                      int len, bool is_write)
+{
+    address_space_rw_internal(as, addr, buf, len, is_write, true);
+}
+
 void address_space_write(AddressSpace *as, target_phys_addr_t addr,
                          const uint8_t *buf, int len)
 {
@@ -3607,6 +3613,13 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
     return address_space_rw(&address_space_memory, addr, buf, len, is_write);
 }
 
+void cpu_physical_memory_nolock_rw(target_phys_addr_t addr, uint8_t *buf,
+                            int len, int is_write)
+{
+    return address_space_rw_internal(&address_space_memory, addr, buf, len,
+           is_write, false);
+}
+
 /* used for ROM loading : can write in RAM and ROM */
 void cpu_physical_memory_write_rom(target_phys_addr_t addr,
                                    const uint8_t *buf, int len)
diff --git a/kvm-all.c b/kvm-all.c
index c2c6909..41261d2 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1573,10 +1573,12 @@ int kvm_cpu_exec(CPUArchState *env)
             break;
         case KVM_EXIT_MMIO:
             DPRINTF("handle_mmio\n");
-            cpu_physical_memory_rw(run->mmio.phys_addr,
+            qemu_mutex_unlock_iothread();
+            cpu_physical_memory_nolock_rw(run->mmio.phys_addr,
                                    run->mmio.data,
                                    run->mmio.len,
                                    run->mmio.is_write);
+            qemu_mutex_lock_iothread();
             ret = 0;
             break;
         case KVM_EXIT_IRQ_WINDOW_OPEN:
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 1/7] qom: apply atomic on object's refcount
  2012-11-25  2:02 ` [Qemu-devel] [PATCH v7 1/7] qom: apply atomic on object's refcount Liu Ping Fan
@ 2012-11-28 17:16   ` Richard Henderson
  2012-11-29  8:35     ` liu ping fan
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2012-11-28 17:16 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

On 11/24/2012 06:02 PM, Liu Ping Fan wrote:
> -    obj->ref--;
>  
>      /* parent always holds a reference to its children */
> -    if (obj->ref == 0) {
> +    if (__sync_fetch_and_sub(&obj->ref, 1) == 1) {

  if (__sync_sub_and_fetch(&obj->ref, 1) == 0)


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 1/7] qom: apply atomic on object's refcount
  2012-11-28 17:16   ` Richard Henderson
@ 2012-11-29  8:35     ` liu ping fan
  0 siblings, 0 replies; 35+ messages in thread
From: liu ping fan @ 2012-11-29  8:35 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

On Thu, Nov 29, 2012 at 1:16 AM, Richard Henderson <rth@twiddle.net> wrote:
> On 11/24/2012 06:02 PM, Liu Ping Fan wrote:
>> -    obj->ref--;
>>
>>      /* parent always holds a reference to its children */
>> -    if (obj->ref == 0) {
>> +    if (__sync_fetch_and_sub(&obj->ref, 1) == 1) {
>
>   if (__sync_sub_and_fetch(&obj->ref, 1) == 0)
>
Applied, thanks
>
> r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2012-11-25  2:02 [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock Liu Ping Fan
                   ` (6 preceding siblings ...)
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 7/7] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
@ 2012-12-06  7:28 ` liu ping fan
  2013-05-02 16:58   ` Jan Kiszka
  7 siblings, 1 reply; 35+ messages in thread
From: liu ping fan @ 2012-12-06  7:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, Avi Kivity,
	Anthony Liguori, Stefan Hajnoczi, Paolo Bonzini

Any suggestion? Or new design idea for this?

Thanks
Pingfan

On Sun, Nov 25, 2012 at 10:02 AM, Liu Ping Fan <qemulist@gmail.com> wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> v1:
> https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03312.html
>
> v2:
> http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01275.html
>
> v3:
> http://lists.nongnu.org/archive/html/qemu-devel/2012-09/msg01474.html
>
> v4:
> http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg03857.html
>
> v5:
> https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04867.html
>
> v6:
> http://lists.gnu.org/archive/html/qemu-devel/2012-11/msg00542.html
> --
> changes v5->v6:
>  Apply fine-grain lock for all address space.
>  Introduce separated interface to allow mmio dispatcher called with/without big lock.
>
> changes v6->v7:
>  drop wrapper of gcc atomic builtin
>
>
> Liu Ping Fan (7):
>   qom: apply atomic on object's refcount
>   hotplug: introduce qdev_unplug_complete() to remove device from views
>   pci: remove pci device from mem view when unplug
>   memory: introduce local lock for address space
>   memory: make mmio dispatch able to be out of biglock
>   memory: introduce tls context to trace nested mmio request issue
>   vcpu: push mmio dispatcher out of big lock
>
>  cpu-common.h      |    3 +
>  docs/memory.txt   |    4 +
>  exec.c            |  206 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>  hw/acpi_piix4.c   |    2 +-
>  hw/pci.c          |   13 +++-
>  hw/pci.h          |    1 +
>  hw/qdev.c         |   26 +++++++
>  hw/qdev.h         |    3 +-
>  kvm-all.c         |    4 +-
>  memory-internal.h |    1 +
>  memory.c          |    1 +
>  memory.h          |    5 ++
>  qom/object.c      |    5 +-
>  13 files changed, 250 insertions(+), 24 deletions(-)
>
> --
> 1.7.4.4
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2012-12-06  7:28 ` [Qemu-devel] [PATCH v7 0/7] push mmio dispatch " liu ping fan
@ 2013-05-02 16:58   ` Jan Kiszka
  2013-05-02 17:14     ` Jan Kiszka
  2013-05-03  7:37     ` liu ping fan
  0 siblings, 2 replies; 35+ messages in thread
From: Jan Kiszka @ 2013-05-02 16:58 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, gleb@redhat.com, Stefan Hajnoczi, Marcelo Tosatti,
	qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini

Hi Pingfan,

On 2012-12-06 08:28, liu ping fan wrote:
> Any suggestion? Or new design idea for this?

Finally... I'm getting back to this. I'm currently trying to make use of
this series, adapting it to my needs (selective BQL-free dispatching of
PIO regions).

Is there a newer version available on your side? This one obviously no
longer applies due to all the code movements in QEMU. But it also seems
to contain some bugs, at least in patch 5 (mixed up page number vs. page
address around for address_space_section_lookup_ref).

Then we should get rid of the ref/unref callbacks. Making a memory
region BQL-free must be as simple as setting a flag or (more likely)
adding a reference to the owning QOM object in the region.
Reimplementing ref/unref in device models over and over again is clearly
a no-go. Maybe I'm currently forgetting a use case where overloading the
reference functions is needed, so please help my memory in that case.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-02 16:58   ` Jan Kiszka
@ 2013-05-02 17:14     ` Jan Kiszka
  2013-05-03  7:37     ` liu ping fan
  1 sibling, 0 replies; 35+ messages in thread
From: Jan Kiszka @ 2013-05-02 17:14 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, gleb@redhat.com, Stefan Hajnoczi, Marcelo Tosatti,
	qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini

On 2013-05-02 18:58, Jan Kiszka wrote:
> Hi Pingfan,
> 
> On 2012-12-06 08:28, liu ping fan wrote:
>> Any suggestion? Or new design idea for this?
> 
> Finally... I'm getting back to this. I'm currently trying to make use of
> this series, adapting it to my needs (selective BQL-free dispatching of
> PIO regions).
> 
> Is there a newer version available on your side? This one obviously no
> longer applies due to all the code movements in QEMU. But it also seems
> to contain some bugs, at least in patch 5 (mixed up page number vs. page
> address around for address_space_section_lookup_ref).

Oh, and the series isn't bisectable between 5 and 7.

Jan

> 
> Then we should get rid of the ref/unref callbacks. Making a memory
> region BQL-free must be as simple as setting a flag or (more likely)
> adding a reference to the owning QOM object in the region.
> Reimplementing ref/unref in device models over and over again is clearly
> a no-go. Maybe I'm currently forgetting a use case where overloading the
> reference functions is needed, so please help my memory in that case.
> 
> Jan
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-02 16:58   ` Jan Kiszka
  2013-05-02 17:14     ` Jan Kiszka
@ 2013-05-03  7:37     ` liu ping fan
  2013-05-03  8:04       ` Jan Kiszka
  1 sibling, 1 reply; 35+ messages in thread
From: liu ping fan @ 2013-05-03  7:37 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Peter Maydell, gleb@redhat.com, Stefan Hajnoczi, Marcelo Tosatti,
	qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini

On Fri, May 3, 2013 at 12:58 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> Hi Pingfan,
>
> On 2012-12-06 08:28, liu ping fan wrote:
>> Any suggestion? Or new design idea for this?
>
> Finally... I'm getting back to this. I'm currently trying to make use of
> this series, adapting it to my needs (selective BQL-free dispatching of
> PIO regions).
>
Glad that you are back :)

> Is there a newer version available on your side? This one obviously no

No, but I can see the code and rebase next week.

> longer applies due to all the code movements in QEMU. But it also seems
> to contain some bugs, at least in patch 5 (mixed up page number vs. page
> address around for address_space_section_lookup_ref).
>
Will pay some time to see it.

> Then we should get rid of the ref/unref callbacks. Making a memory
> region BQL-free must be as simple as setting a flag or (more likely)
> adding a reference to the owning QOM object in the region.
> Reimplementing ref/unref in device models over and over again is clearly
> a no-go. Maybe I'm currently forgetting a use case where overloading the

At the beginning, Avi suggest to enforce mr->opaque to be Device
object, but due to the nested embedded Object, we fail. And finally
Avi suggest ref/unref interface.
>From my point,  we can save lots of reimplementing ref/unref in device
models by telling whether mr->opauque is Object or not.  And leave not
object case to reimplement ref/unref.

> reference functions is needed, so please help my memory in that case.
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-03  7:37     ` liu ping fan
@ 2013-05-03  8:04       ` Jan Kiszka
  2013-05-04  9:47         ` Paolo Bonzini
  2013-05-06  1:57         ` liu ping fan
  0 siblings, 2 replies; 35+ messages in thread
From: Jan Kiszka @ 2013-05-03  8:04 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, gleb@redhat.com, Stefan Hajnoczi, Marcelo Tosatti,
	qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini

On 2013-05-03 09:37, liu ping fan wrote:
> On Fri, May 3, 2013 at 12:58 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> Hi Pingfan,
>>
>> On 2012-12-06 08:28, liu ping fan wrote:
>>> Any suggestion? Or new design idea for this?
>>
>> Finally... I'm getting back to this. I'm currently trying to make use of
>> this series, adapting it to my needs (selective BQL-free dispatching of
>> PIO regions).
>>
> Glad that you are back :)
> 
>> Is there a newer version available on your side? This one obviously no
> 
> No, but I can see the code and rebase next week.

I've already rebased and started to fix/cleanup. Today I will look into
the ref/unref vs. Object topic and try a first unlocked device model. I
can share afterward so that we do not need to do the work twice.

> 
>> longer applies due to all the code movements in QEMU. But it also seems
>> to contain some bugs, at least in patch 5 (mixed up page number vs. page
>> address around for address_space_section_lookup_ref).
>>
> Will pay some time to see it.

Fixed. Specifically subpages were broken. As I've converted portio to
use the memory core dispatcher, I'm getting a lot of them now and
triggered the bugs immediately.

There is still some nesting issue around coalesced MMIO flushing. Need
to look into this today as well. So far I've worked around it by
assuming that nesting is fine if we enter the dispatcher with BQL held.

> 
>> Then we should get rid of the ref/unref callbacks. Making a memory
>> region BQL-free must be as simple as setting a flag or (more likely)
>> adding a reference to the owning QOM object in the region.
>> Reimplementing ref/unref in device models over and over again is clearly
>> a no-go. Maybe I'm currently forgetting a use case where overloading the
> 
> At the beginning, Avi suggest to enforce mr->opaque to be Device
> object, but due to the nested embedded Object, we fail. And finally
> Avi suggest ref/unref interface.
> From my point,  we can save lots of reimplementing ref/unref in device
> models by telling whether mr->opauque is Object or not.  And leave not
> object case to reimplement ref/unref.

We can't change the semantics of opaque as long as old_mmio / old_portio
are around. But we need a flag anyway to indicate if a region is
depending on BQL or not. Adding a separate "Object *owner" to
MemoryRegion can serve both purposes. Then we define something like

void memory_region_set_local_locking(MemoryRegion *mr,
                                     bool local_locking,
                                     Object *owner);

to control the property (if local_locking is true, owner must be
non-NULL, of course). That's quite similar to my old prototype here that
had memory_region_set/clear_global_locking.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-03  8:04       ` Jan Kiszka
@ 2013-05-04  9:47         ` Paolo Bonzini
  2013-05-04 10:42           ` Jan Kiszka
  2013-05-06  1:46           ` liu ping fan
  2013-05-06  1:57         ` liu ping fan
  1 sibling, 2 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-05-04  9:47 UTC (permalink / raw)
  To: qemu-devel, Jan Kiszka, Liu Ping Fan

Il 03/05/2013 10:04, Jan Kiszka ha scritto:
> We can't change the semantics of opaque as long as old_mmio / old_portio
> are around. But we need a flag anyway to indicate if a region is
> depending on BQL or not. Adding a separate "Object *owner" to
> MemoryRegion can serve both purposes. Then we define something like
> 
> void memory_region_set_local_locking(MemoryRegion *mr,
>                                      bool local_locking,
>                                      Object *owner);
> 
> to control the property (if local_locking is true, owner must be
> non-NULL, of course). That's quite similar to my old prototype here that
> had memory_region_set/clear_global_locking.

I think setting the owner can be done separately from enabling local
lock.  For example, memory_region_find could also have a variant that
adds a ref to the owner.  It would be very similar to what Ping Fan is
doing in the virtio-dataplane's HostMem data structure.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-04  9:47         ` Paolo Bonzini
@ 2013-05-04 10:42           ` Jan Kiszka
  2013-05-06  8:07             ` Paolo Bonzini
  2013-05-06  1:46           ` liu ping fan
  1 sibling, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2013-05-04 10:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Liu Ping Fan

[-- Attachment #1: Type: text/plain, Size: 1493 bytes --]

On 2013-05-04 11:47, Paolo Bonzini wrote:
> Il 03/05/2013 10:04, Jan Kiszka ha scritto:
>> We can't change the semantics of opaque as long as old_mmio / old_portio
>> are around. But we need a flag anyway to indicate if a region is
>> depending on BQL or not. Adding a separate "Object *owner" to
>> MemoryRegion can serve both purposes. Then we define something like
>>
>> void memory_region_set_local_locking(MemoryRegion *mr,
>>                                      bool local_locking,
>>                                      Object *owner);
>>
>> to control the property (if local_locking is true, owner must be
>> non-NULL, of course). That's quite similar to my old prototype here that
>> had memory_region_set/clear_global_locking.
> 
> I think setting the owner can be done separately from enabling local
> lock.  For example, memory_region_find could also have a variant that
> adds a ref to the owner.  It would be very similar to what Ping Fan is
> doing in the virtio-dataplane's HostMem data structure.

That's trivial to break up, but I'm not sure if there will be reasonable
scenarios where a region requires reference counting without being able
to work without the BQL. RAM, e.g., should always work BQL-free (once we
have the infrastructure in place).

And memory_region_find should likely always increment a reference if the
target region has an owner. We should convert its users to properly
dereference the region once done with it.

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-04  9:47         ` Paolo Bonzini
  2013-05-04 10:42           ` Jan Kiszka
@ 2013-05-06  1:46           ` liu ping fan
  1 sibling, 0 replies; 35+ messages in thread
From: liu ping fan @ 2013-05-06  1:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Jan Kiszka, qemu-devel

On Sat, May 4, 2013 at 5:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 03/05/2013 10:04, Jan Kiszka ha scritto:
>> We can't change the semantics of opaque as long as old_mmio / old_portio
>> are around. But we need a flag anyway to indicate if a region is
>> depending on BQL or not. Adding a separate "Object *owner" to
>> MemoryRegion can serve both purposes. Then we define something like
>>
>> void memory_region_set_local_locking(MemoryRegion *mr,
>>                                      bool local_locking,
>>                                      Object *owner);
>>
>> to control the property (if local_locking is true, owner must be
>> non-NULL, of course). That's quite similar to my old prototype here that
>> had memory_region_set/clear_global_locking.
>
> I think setting the owner can be done separately from enabling local
> lock.  For example, memory_region_find could also have a variant that
> adds a ref to the owner.  It would be very similar to what Ping Fan is
> doing in the virtio-dataplane's HostMem data structure.
>
See reply in another thread .

> Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-03  8:04       ` Jan Kiszka
  2013-05-04  9:47         ` Paolo Bonzini
@ 2013-05-06  1:57         ` liu ping fan
  1 sibling, 0 replies; 35+ messages in thread
From: liu ping fan @ 2013-05-06  1:57 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Peter Maydell, gleb@redhat.com, Stefan Hajnoczi, Marcelo Tosatti,
	qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini

On Fri, May 3, 2013 at 4:04 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2013-05-03 09:37, liu ping fan wrote:
>> On Fri, May 3, 2013 at 12:58 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> Hi Pingfan,
>>>
>>> On 2012-12-06 08:28, liu ping fan wrote:
>>>> Any suggestion? Or new design idea for this?
>>>
>>> Finally... I'm getting back to this. I'm currently trying to make use of
>>> this series, adapting it to my needs (selective BQL-free dispatching of
>>> PIO regions).
>>>
>> Glad that you are back :)
>>
>>> Is there a newer version available on your side? This one obviously no
>>
>> No, but I can see the code and rebase next week.
>
> I've already rebased and started to fix/cleanup. Today I will look into
> the ref/unref vs. Object topic and try a first unlocked device model. I
> can share afterward so that we do not need to do the work twice.
>
Ok, thanks.
>>
>>> longer applies due to all the code movements in QEMU. But it also seems
>>> to contain some bugs, at least in patch 5 (mixed up page number vs. page
>>> address around for address_space_section_lookup_ref).
>>>
>> Will pay some time to see it.
>
> Fixed. Specifically subpages were broken. As I've converted portio to
> use the memory core dispatcher, I'm getting a lot of them now and
> triggered the bugs immediately.
>
> There is still some nesting issue around coalesced MMIO flushing. Need
> to look into this today as well. So far I've worked around it by
> assuming that nesting is fine if we enter the dispatcher with BQL held.
>
>>
>>> Then we should get rid of the ref/unref callbacks. Making a memory
>>> region BQL-free must be as simple as setting a flag or (more likely)
>>> adding a reference to the owning QOM object in the region.
>>> Reimplementing ref/unref in device models over and over again is clearly
>>> a no-go. Maybe I'm currently forgetting a use case where overloading the
>>
>> At the beginning, Avi suggest to enforce mr->opaque to be Device
>> object, but due to the nested embedded Object, we fail. And finally
>> Avi suggest ref/unref interface.
>> From my point,  we can save lots of reimplementing ref/unref in device
>> models by telling whether mr->opauque is Object or not.  And leave not
>> object case to reimplement ref/unref.
>
> We can't change the semantics of opaque as long as old_mmio / old_portio
> are around. But we need a flag anyway to indicate if a region is
> depending on BQL or not. Adding a separate "Object *owner" to
> MemoryRegion can serve both purposes. Then we define something like
>
> void memory_region_set_local_locking(MemoryRegion *mr,
>                                      bool local_locking,
>                                      Object *owner);
>
I think, owner imply local_locking, so we need not call this function
for each device,
just those we plan to push out of biglock

> to control the property (if local_locking is true, owner must be
> non-NULL, of course). That's quite similar to my old prototype here that
> had memory_region_set/clear_global_locking.
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-04 10:42           ` Jan Kiszka
@ 2013-05-06  8:07             ` Paolo Bonzini
  2013-05-06  8:40               ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-05-06  8:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Liu Ping Fan

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Il 04/05/2013 12:42, Jan Kiszka ha scritto:
> On 2013-05-04 11:47, Paolo Bonzini wrote:
>> Il 03/05/2013 10:04, Jan Kiszka ha scritto:
>>> We can't change the semantics of opaque as long as old_mmio /
>>> old_portio are around. But we need a flag anyway to indicate if
>>> a region is depending on BQL or not. Adding a separate "Object
>>> *owner" to MemoryRegion can serve both purposes. Then we define
>>> something like
>>> 
>>> void memory_region_set_local_locking(MemoryRegion *mr, bool
>>> local_locking, Object *owner);
>>> 
>>> to control the property (if local_locking is true, owner must
>>> be non-NULL, of course). That's quite similar to my old
>>> prototype here that had
>>> memory_region_set/clear_global_locking.
>> 
>> I think setting the owner can be done separately from enabling
>> local lock.  For example, memory_region_find could also have a
>> variant that adds a ref to the owner.  It would be very similar
>> to what Ping Fan is doing in the virtio-dataplane's HostMem data
>> structure.
> 
> That's trivial to break up, but I'm not sure if there will be
> reasonable scenarios where a region requires reference counting
> without being able to work without the BQL. RAM, e.g., should
> always work BQL-free (once we have the infrastructure in place).

I think we need to add an owner to all regions (tedious, but
doable---perhaps even scriptable).  The current code covers
address_space_rw, but memory_region_find remains callable only from
BQL-protected regions.  The caller of memory_region_find needs to be
able to inspect the MemoryRegion, even if it is just to fail on
non-RAM regions.  I would like to switch the dataplane code to use
memory_region_find.

BTW, have you seen
http://lwn.net/SubscriberLink/548909/b6fdd846f1232be6/ ? [*]  Perhaps
we can adopt something like that, it solves the same exact problem
that we have, and it's a well-known solution from the literature.

	[*]  The "subscriber link" mechanism allows an LWN.net
	     subscriber to generate a special URL for a
	     subscription-only article. That URL can then be given to
	     others, who will be able to access the article regardless
	     of whether they are subscribed. This feature is made
	     available as a service to LWN subscribers, and in the hope
	     that they will use it to spread the word about their
	     favorite LWN articles.

> And memory_region_find should likely always increment a reference
> if the target region has an owner. We should convert its users to
> properly dereference the region once done with it.

Yes.  But this is what requires you to have an owner for all regions.

Paolo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJRh2TNAAoJEBvWZb6bTYbyVqsP/3DUCevVyhMU0OsDrdMhtbQO
9fTzQmvhbUo2auEzjhjxvl9YH/2exvymsBH1kW2dM5xsct+MsULKMmpw2wucELfd
9i82fS/TbofTWQI0Qz2iCEn6G40aAJf5GC/eMMUINpYlTL0hhaRibaNl4wtwgM+N
FhlohLM0Dki/dQiF+DOfr2TdeFwuJfBpaDVL3Q7YZ+4TXADnjHglltWBwWk0RYvy
1nzUNqah4WwP3yOSlh53kT40VGCgea3mJaogoBTNz1iYdsi2FEGcRdO8JKQqZDoU
0EzfEfBTmACSjXdFOpnkR81PV19DiinRK/Wcj4RGfJJygHAZcabseueWOYKLox+P
Zkjvr1h8HBkJWAZB1yZn0M++ts6nByFFZt0RKDgR9DEhJbKlf6E/7yH/xclecSMR
UynRjuLIZWmKgs+VrfBE8Sda4Wz8NP8oR6A6rv0t9K+oLI0CAA8bj3fQmGhgldkP
AAIyOcsWP7VnizvaLoxicP20fAqvEBPYJhcO80/kVrdubG9yt6ljdHnIwGpbXrR5
hchYoWYK4SsyPp6YwESG1eVPZ/4GNoK0PIjeJELSmUGwbMfwU/VOaIL0DExXpF5Y
b9yct9CpgEW3PhGxqCNgEsPAIbMSpl1OjaAqAdDb+DGZiYwIWE8Mb3SB5Ilsvco2
ZYVzJH7sXoOB9o5k7DIt
=nA/3
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06  8:07             ` Paolo Bonzini
@ 2013-05-06  8:40               ` Jan Kiszka
  2013-05-06 10:27                 ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2013-05-06  8:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Liu Ping Fan

On 2013-05-06 10:07, Paolo Bonzini wrote:
> Il 04/05/2013 12:42, Jan Kiszka ha scritto:
>> On 2013-05-04 11:47, Paolo Bonzini wrote:
>>> Il 03/05/2013 10:04, Jan Kiszka ha scritto:
>>>> We can't change the semantics of opaque as long as old_mmio /
>>>> old_portio are around. But we need a flag anyway to indicate if
>>>> a region is depending on BQL or not. Adding a separate "Object
>>>> *owner" to MemoryRegion can serve both purposes. Then we define
>>>> something like
>>>>
>>>> void memory_region_set_local_locking(MemoryRegion *mr, bool
>>>> local_locking, Object *owner);
>>>>
>>>> to control the property (if local_locking is true, owner must
>>>> be non-NULL, of course). That's quite similar to my old
>>>> prototype here that had
>>>> memory_region_set/clear_global_locking.
>>>
>>> I think setting the owner can be done separately from enabling
>>> local lock.  For example, memory_region_find could also have a
>>> variant that adds a ref to the owner.  It would be very similar
>>> to what Ping Fan is doing in the virtio-dataplane's HostMem data
>>> structure.
> 
>> That's trivial to break up, but I'm not sure if there will be
>> reasonable scenarios where a region requires reference counting
>> without being able to work without the BQL. RAM, e.g., should
>> always work BQL-free (once we have the infrastructure in place).
> 
> I think we need to add an owner to all regions (tedious, but
> doable---perhaps even scriptable).  The current code covers
> address_space_rw, but memory_region_find remains callable only from
> BQL-protected regions.  The caller of memory_region_find needs to be
> able to inspect the MemoryRegion, even if it is just to fail on
> non-RAM regions.  I would like to switch the dataplane code to use
> memory_region_find.
> 
> BTW, have you seen
> http://lwn.net/SubscriberLink/548909/b6fdd846f1232be6/ ? [*]  Perhaps
> we can adopt something like that, it solves the same exact problem
> that we have, and it's a well-known solution from the literature.

That looks like a more handy wrapper around mutex_trylock + recursive locks.

But the problem is not the locking mechanism. The issue is the
(non-existing) roll-back logic in the device models. That's the tedious
work - if we would like to go that way. But I'm still optimistic we can
avoid it. We may need lock state inspection (mutex_is_locked), playing
with this ATM.

> 
> 	[*]  The "subscriber link" mechanism allows an LWN.net
> 	     subscriber to generate a special URL for a
> 	     subscription-only article. That URL can then be given to
> 	     others, who will be able to access the article regardless
> 	     of whether they are subscribed. This feature is made
> 	     available as a service to LWN subscribers, and in the hope
> 	     that they will use it to spread the word about their
> 	     favorite LWN articles.
> 
>> And memory_region_find should likely always increment a reference
>> if the target region has an owner. We should convert its users to
>> properly dereference the region once done with it.
> 
> Yes.  But this is what requires you to have an owner for all regions.

You don't need an owner for regions that are protect by the BQL (the
majority in the foreseeable future). For those regions, reference
counting can remain a nop, internally. But that's nothing their users
should care about.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06  8:40               ` Jan Kiszka
@ 2013-05-06 10:27                 ` Paolo Bonzini
  2013-05-06 10:56                   ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-05-06 10:27 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Liu Ping Fan

Il 06/05/2013 10:40, Jan Kiszka ha scritto:

>>
>> 	[*]  The "subscriber link" mechanism allows an LWN.net
>> 	     subscriber to generate a special URL for a
>> 	     subscription-only article. That URL can then be given to
>> 	     others, who will be able to access the article regardless
>> 	     of whether they are subscribed. This feature is made
>> 	     available as a service to LWN subscribers, and in the hope
>> 	     that they will use it to spread the word about their
>> 	     favorite LWN articles.
>>
>>> And memory_region_find should likely always increment a reference
>>> if the target region has an owner. We should convert its users to
>>> properly dereference the region once done with it.
>>
>> Yes.  But this is what requires you to have an owner for all regions.
> 
> You don't need an owner for regions that are protect by the BQL (the
> majority in the foreseeable future). For those regions, reference
> counting can remain a nop, internally.

The problem is that even if I/O for a region is supposed to happen
within the BQL, lookup can happen outside the BQL.  Lookup will use the
region even if it is just to discard it:

           VCPU thread (under BQL)              device thread
 --------------------------------------------------------------------------------------
                                                flatview_ref
                                                memory_region_find returns d->mr
                                                memory_region_ref(d->mr) /* nop */
           qdev_free(d)
             object_unparent(d)
               unrealize(d)
                 memory_region_del_subregion(d->mr)
                   FlatView updated, d->mr not in the new view

                                                flatview_unref
                                                  memory_region_unref(d->mr)
                                                    object_unref(d)
                                                      free(d)
                                                if (!d->mr->is_ram) {        /* BAD! */
                                                  memory_region_unref(d->mr) /* nop */
                                                  return error
                                                }


Here, the memory region is dereferenced *before* we know that it is BQL-free
(in fact, exactly to ascertain whether it is BQL-free).

We can hack around it by putting an is_ram field in FlatRange and
MemoryRegionSection, but it is not a solution.  Here is how giving an
owner to all regions fixes it:

           VCPU thread (under BQL)              device thread
 --------------------------------------------------------------------------------------
                                                flatview_ref
                                                memory_region_find returns d->mr
                                                memory_region_ref(d->mr)
                                                  object_ref(d)

           qdev_free(d)
             object_unparent(d)
               unrealize(d)
                 memory_region_del_subregion(d->mr)
                   FlatView updated, d->mr not in the new view

                                                flatview_unref
                                                  memory_region_unref(d->mr)
                                                    object_unref(d) /* still alive! */

                                                if (!d->mr->is_ram) {
                                                  memory_region_unref(d->mr)
                                                    object_unref(d)
                                                      free(d)
                                                  return error
                                                }

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06 10:27                 ` Paolo Bonzini
@ 2013-05-06 10:56                   ` Jan Kiszka
  2013-05-06 10:58                     ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2013-05-06 10:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Liu Ping Fan

On 2013-05-06 12:27, Paolo Bonzini wrote:
> Il 06/05/2013 10:40, Jan Kiszka ha scritto:
> 
>>>
>>> 	[*]  The "subscriber link" mechanism allows an LWN.net
>>> 	     subscriber to generate a special URL for a
>>> 	     subscription-only article. That URL can then be given to
>>> 	     others, who will be able to access the article regardless
>>> 	     of whether they are subscribed. This feature is made
>>> 	     available as a service to LWN subscribers, and in the hope
>>> 	     that they will use it to spread the word about their
>>> 	     favorite LWN articles.
>>>
>>>> And memory_region_find should likely always increment a reference
>>>> if the target region has an owner. We should convert its users to
>>>> properly dereference the region once done with it.
>>>
>>> Yes.  But this is what requires you to have an owner for all regions.
>>
>> You don't need an owner for regions that are protect by the BQL (the
>> majority in the foreseeable future). For those regions, reference
>> counting can remain a nop, internally.
> 
> The problem is that even if I/O for a region is supposed to happen
> within the BQL, lookup can happen outside the BQL.  Lookup will use the
> region even if it is just to discard it:
> 
>            VCPU thread (under BQL)              device thread
>  --------------------------------------------------------------------------------------
>                                                 flatview_ref
>                                                 memory_region_find returns d->mr
>                                                 memory_region_ref(d->mr) /* nop */
>            qdev_free(d)
>              object_unparent(d)
>                unrealize(d)
>                  memory_region_del_subregion(d->mr)
>                    FlatView updated, d->mr not in the new view
> 
>                                                 flatview_unref
>                                                   memory_region_unref(d->mr)
>                                                     object_unref(d)
>                                                       free(d)
>                                                 if (!d->mr->is_ram) {        /* BAD! */
>                                                   memory_region_unref(d->mr) /* nop */
>                                                   return error
>                                                 }
> 
> 
> Here, the memory region is dereferenced *before* we know that it is BQL-free
> (in fact, exactly to ascertain whether it is BQL-free).

Both flatview update and lookup *plus* locking type evaluation (i.e.
memory region dereferencing) always happen under the address space lock.
See Pingfan's patch.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06 10:56                   ` Jan Kiszka
@ 2013-05-06 10:58                     ` Paolo Bonzini
  2013-05-06 11:11                       ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-05-06 10:58 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Liu Ping Fan

Il 06/05/2013 12:56, Jan Kiszka ha scritto:
>> The problem is that even if I/O for a region is supposed to happen
>> within the BQL, lookup can happen outside the BQL.  Lookup will use the
>> region even if it is just to discard it:
>>
>>            VCPU thread (under BQL)              device thread
>>  --------------------------------------------------------------------------------------
>>                                                 flatview_ref
>>                                                 memory_region_find returns d->mr
>>                                                 memory_region_ref(d->mr) /* nop */
>>            qdev_free(d)
>>              object_unparent(d)
>>                unrealize(d)
>>                  memory_region_del_subregion(d->mr)
>>                    FlatView updated, d->mr not in the new view
>>
>>                                                 flatview_unref
>>                                                   memory_region_unref(d->mr)
>>                                                     object_unref(d)
>>                                                       free(d)
>>                                                 if (!d->mr->is_ram) {        /* BAD! */
>>                                                   memory_region_unref(d->mr) /* nop */
>>                                                   return error
>>                                                 }
>>
>>
>> Here, the memory region is dereferenced *before* we know that it is BQL-free
>> (in fact, exactly to ascertain whether it is BQL-free).
> 
> Both flatview update and lookup *plus* locking type evaluation (i.e.
> memory region dereferencing) always happen under the address space lock.
> See Pingfan's patch.

That's true of address_space_rw/map, but I don't think it holds for
memory_region_find.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06 10:58                     ` Paolo Bonzini
@ 2013-05-06 11:11                       ` Jan Kiszka
  2013-05-06 11:28                         ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2013-05-06 11:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Liu Ping Fan

On 2013-05-06 12:58, Paolo Bonzini wrote:
> Il 06/05/2013 12:56, Jan Kiszka ha scritto:
>>> The problem is that even if I/O for a region is supposed to happen
>>> within the BQL, lookup can happen outside the BQL.  Lookup will use the
>>> region even if it is just to discard it:
>>>
>>>            VCPU thread (under BQL)              device thread
>>>  --------------------------------------------------------------------------------------
>>>                                                 flatview_ref
>>>                                                 memory_region_find returns d->mr
>>>                                                 memory_region_ref(d->mr) /* nop */
>>>            qdev_free(d)
>>>              object_unparent(d)
>>>                unrealize(d)
>>>                  memory_region_del_subregion(d->mr)
>>>                    FlatView updated, d->mr not in the new view
>>>
>>>                                                 flatview_unref
>>>                                                   memory_region_unref(d->mr)
>>>                                                     object_unref(d)
>>>                                                       free(d)
>>>                                                 if (!d->mr->is_ram) {        /* BAD! */
>>>                                                   memory_region_unref(d->mr) /* nop */
>>>                                                   return error
>>>                                                 }
>>>
>>>
>>> Here, the memory region is dereferenced *before* we know that it is BQL-free
>>> (in fact, exactly to ascertain whether it is BQL-free).
>>
>> Both flatview update and lookup *plus* locking type evaluation (i.e.
>> memory region dereferencing) always happen under the address space lock.
>> See Pingfan's patch.
> 
> That's true of address_space_rw/map, but I don't think it holds for
> memory_region_find.

It has to, or it would be broken: Either it is called on a region that
supports reference counting and, thus, increments the counter before
returning, or it has to be called with the BQL held.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 5/7] memory: make mmio dispatch able to be out of biglock
  2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 5/7] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
@ 2013-05-06 11:21   ` Paolo Bonzini
  2013-05-06 11:25     ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-05-06 11:21 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, gleb, Jan Kiszka, Marcelo Tosatti, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

Il 25/11/2012 03:03, Liu Ping Fan ha scritto:
> @@ -3550,12 +3668,11 @@ void *address_space_map(AddressSpace *as,
>                          target_phys_addr_t *plen,
>                          bool is_write)
>  {
> -    AddressSpaceDispatch *d = as->dispatch;
>      target_phys_addr_t len = *plen;
>      target_phys_addr_t todo = 0;
>      int l;
>      target_phys_addr_t page;
> -    MemoryRegionSection *section;
> +    MemoryRegionSection *section, mr_obj;
>      ram_addr_t raddr = RAM_ADDR_MAX;
>      ram_addr_t rlen;
>      void *ret;
> @@ -3565,7 +3682,8 @@ void *address_space_map(AddressSpace *as,
>          l = (page + TARGET_PAGE_SIZE) - addr;
>          if (l > len)
>              l = len;
> -        section = phys_page_find(d, page >> TARGET_PAGE_BITS);
> +        address_space_section_lookup_ref(as, page >> TARGET_PAGE_BITS, &mr_obj);
> +        section = &mr_obj;
>  
>          if (!(memory_region_is_ram(section->mr) && !section->readonly)) {
>              if (todo || bounce.buffer) {
> @@ -3579,6 +3697,7 @@ void *address_space_map(AddressSpace *as,
>              }
>  
>              *plen = l;
> +            memory_region_section_unref(&mr_obj);
>              return bounce.buffer;
>          }
>          if (!todo) {
> @@ -3589,6 +3708,7 @@ void *address_space_map(AddressSpace *as,
>          len -= l;
>          addr += l;
>          todo += l;
> +        memory_region_section_unref(&mr_obj);
>      }
>      rlen = todo;
>      ret = qemu_ram_ptr_length(raddr, &rlen);

I think this unref is wrong.  You need to delay it to the
address_space_unmap, and this in turns requires changing the signature
of address_space_map.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 5/7] memory: make mmio dispatch able to be out of biglock
  2013-05-06 11:21   ` Paolo Bonzini
@ 2013-05-06 11:25     ` Jan Kiszka
  2013-05-06 11:30       ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2013-05-06 11:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, gleb@redhat.com, Stefan Hajnoczi, Marcelo Tosatti,
	Liu Ping Fan, qemu-devel@nongnu.org, Anthony Liguori

On 2013-05-06 13:21, Paolo Bonzini wrote:
> Il 25/11/2012 03:03, Liu Ping Fan ha scritto:
>> @@ -3550,12 +3668,11 @@ void *address_space_map(AddressSpace *as,
>>                          target_phys_addr_t *plen,
>>                          bool is_write)
>>  {
>> -    AddressSpaceDispatch *d = as->dispatch;
>>      target_phys_addr_t len = *plen;
>>      target_phys_addr_t todo = 0;
>>      int l;
>>      target_phys_addr_t page;
>> -    MemoryRegionSection *section;
>> +    MemoryRegionSection *section, mr_obj;
>>      ram_addr_t raddr = RAM_ADDR_MAX;
>>      ram_addr_t rlen;
>>      void *ret;
>> @@ -3565,7 +3682,8 @@ void *address_space_map(AddressSpace *as,
>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>          if (l > len)
>>              l = len;
>> -        section = phys_page_find(d, page >> TARGET_PAGE_BITS);
>> +        address_space_section_lookup_ref(as, page >> TARGET_PAGE_BITS, &mr_obj);
>> +        section = &mr_obj;
>>  
>>          if (!(memory_region_is_ram(section->mr) && !section->readonly)) {
>>              if (todo || bounce.buffer) {
>> @@ -3579,6 +3697,7 @@ void *address_space_map(AddressSpace *as,
>>              }
>>  
>>              *plen = l;
>> +            memory_region_section_unref(&mr_obj);
>>              return bounce.buffer;
>>          }
>>          if (!todo) {
>> @@ -3589,6 +3708,7 @@ void *address_space_map(AddressSpace *as,
>>          len -= l;
>>          addr += l;
>>          todo += l;
>> +        memory_region_section_unref(&mr_obj);
>>      }
>>      rlen = todo;
>>      ret = qemu_ram_ptr_length(raddr, &rlen);
> 
> I think this unref is wrong.  You need to delay it to the
> address_space_unmap, and this in turns requires changing the signature
> of address_space_map.

Can't RAMBlock hold a reference to the associated region? Then this
could be retrieved on unmap without bothering the caller.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06 11:11                       ` Jan Kiszka
@ 2013-05-06 11:28                         ` Paolo Bonzini
  2013-05-06 11:39                           ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-05-06 11:28 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Liu Ping Fan

Il 06/05/2013 13:11, Jan Kiszka ha scritto:
> On 2013-05-06 12:58, Paolo Bonzini wrote:
>> Il 06/05/2013 12:56, Jan Kiszka ha scritto:
>>>> The problem is that even if I/O for a region is supposed to happen
>>>> within the BQL, lookup can happen outside the BQL.  Lookup will use the
>>>> region even if it is just to discard it:
>>>>
>>>>            VCPU thread (under BQL)              device thread
>>>>  --------------------------------------------------------------------------------------
>>>>                                                 flatview_ref
>>>>                                                 memory_region_find returns d->mr
>>>>                                                 memory_region_ref(d->mr) /* nop */
>>>>            qdev_free(d)
>>>>              object_unparent(d)
>>>>                unrealize(d)
>>>>                  memory_region_del_subregion(d->mr)
>>>>                    FlatView updated, d->mr not in the new view
>>>>
>>>>                                                 flatview_unref
>>>>                                                   memory_region_unref(d->mr)
>>>>                                                     object_unref(d)
>>>>                                                       free(d)
>>>>                                                 if (!d->mr->is_ram) {        /* BAD! */
>>>>                                                   memory_region_unref(d->mr) /* nop */
>>>>                                                   return error
>>>>                                                 }
>>>>
>>>>
>>>> Here, the memory region is dereferenced *before* we know that it is BQL-free
>>>> (in fact, exactly to ascertain whether it is BQL-free).
>>>
>>> Both flatview update and lookup *plus* locking type evaluation (i.e.
>>> memory region dereferencing) always happen under the address space lock.
>>> See Pingfan's patch.
>>
>> That's true of address_space_rw/map, but I don't think it holds for
>> memory_region_find.
> 
> It has to, or it would be broken: Either it is called on a region that
> supports reference counting

You cannot know that in advance, can you?  The address is decided by the
guest.

> and, thus, increments the counter before
> returning, or it has to be called with the BQL held.

... or we need to support reference counting on all regions, so that the
other possibility is automatically true.

Strictly speaking, only regions that can be unplugged need to support
reference counting.


Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 5/7] memory: make mmio dispatch able to be out of biglock
  2013-05-06 11:25     ` Jan Kiszka
@ 2013-05-06 11:30       ` Paolo Bonzini
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-05-06 11:30 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Peter Maydell, gleb@redhat.com, Stefan Hajnoczi, Marcelo Tosatti,
	Liu Ping Fan, qemu-devel@nongnu.org, Anthony Liguori

Il 06/05/2013 13:25, Jan Kiszka ha scritto:
> On 2013-05-06 13:21, Paolo Bonzini wrote:
>> Il 25/11/2012 03:03, Liu Ping Fan ha scritto:
>>> @@ -3550,12 +3668,11 @@ void *address_space_map(AddressSpace *as,
>>>                          target_phys_addr_t *plen,
>>>                          bool is_write)
>>>  {
>>> -    AddressSpaceDispatch *d = as->dispatch;
>>>      target_phys_addr_t len = *plen;
>>>      target_phys_addr_t todo = 0;
>>>      int l;
>>>      target_phys_addr_t page;
>>> -    MemoryRegionSection *section;
>>> +    MemoryRegionSection *section, mr_obj;
>>>      ram_addr_t raddr = RAM_ADDR_MAX;
>>>      ram_addr_t rlen;
>>>      void *ret;
>>> @@ -3565,7 +3682,8 @@ void *address_space_map(AddressSpace *as,
>>>          l = (page + TARGET_PAGE_SIZE) - addr;
>>>          if (l > len)
>>>              l = len;
>>> -        section = phys_page_find(d, page >> TARGET_PAGE_BITS);
>>> +        address_space_section_lookup_ref(as, page >> TARGET_PAGE_BITS, &mr_obj);
>>> +        section = &mr_obj;
>>>  
>>>          if (!(memory_region_is_ram(section->mr) && !section->readonly)) {
>>>              if (todo || bounce.buffer) {
>>> @@ -3579,6 +3697,7 @@ void *address_space_map(AddressSpace *as,
>>>              }
>>>  
>>>              *plen = l;
>>> +            memory_region_section_unref(&mr_obj);
>>>              return bounce.buffer;
>>>          }
>>>          if (!todo) {
>>> @@ -3589,6 +3708,7 @@ void *address_space_map(AddressSpace *as,
>>>          len -= l;
>>>          addr += l;
>>>          todo += l;
>>> +        memory_region_section_unref(&mr_obj);
>>>      }
>>>      rlen = todo;
>>>      ret = qemu_ram_ptr_length(raddr, &rlen);
>>
>> I think this unref is wrong.  You need to delay it to the
>> address_space_unmap, and this in turns requires changing the signature
>> of address_space_map.
> 
> Can't RAMBlock hold a reference to the associated region? Then this
> could be retrieved on unmap without bothering the caller.

Right you are. :)  In fact, RAMBlock already does have block->mr.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06 11:28                         ` Paolo Bonzini
@ 2013-05-06 11:39                           ` Jan Kiszka
  2013-05-06 11:47                             ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2013-05-06 11:39 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Liu Ping Fan

On 2013-05-06 13:28, Paolo Bonzini wrote:
> Il 06/05/2013 13:11, Jan Kiszka ha scritto:
>> On 2013-05-06 12:58, Paolo Bonzini wrote:
>>> Il 06/05/2013 12:56, Jan Kiszka ha scritto:
>>>>> The problem is that even if I/O for a region is supposed to happen
>>>>> within the BQL, lookup can happen outside the BQL.  Lookup will use the
>>>>> region even if it is just to discard it:
>>>>>
>>>>>            VCPU thread (under BQL)              device thread
>>>>>  --------------------------------------------------------------------------------------
>>>>>                                                 flatview_ref
>>>>>                                                 memory_region_find returns d->mr
>>>>>                                                 memory_region_ref(d->mr) /* nop */
>>>>>            qdev_free(d)
>>>>>              object_unparent(d)
>>>>>                unrealize(d)
>>>>>                  memory_region_del_subregion(d->mr)
>>>>>                    FlatView updated, d->mr not in the new view
>>>>>
>>>>>                                                 flatview_unref
>>>>>                                                   memory_region_unref(d->mr)
>>>>>                                                     object_unref(d)
>>>>>                                                       free(d)
>>>>>                                                 if (!d->mr->is_ram) {        /* BAD! */
>>>>>                                                   memory_region_unref(d->mr) /* nop */
>>>>>                                                   return error
>>>>>                                                 }
>>>>>
>>>>>
>>>>> Here, the memory region is dereferenced *before* we know that it is BQL-free
>>>>> (in fact, exactly to ascertain whether it is BQL-free).
>>>>
>>>> Both flatview update and lookup *plus* locking type evaluation (i.e.
>>>> memory region dereferencing) always happen under the address space lock.
>>>> See Pingfan's patch.
>>>
>>> That's true of address_space_rw/map, but I don't think it holds for
>>> memory_region_find.
>>
>> It has to, or it would be broken: Either it is called on a region that
>> supports reference counting
> 
> You cannot know that in advance, can you?  The address is decided by the
> guest.

Need to help me again to get the context: In which case is this a
hot-path that we want to keep BQL-free? Current users of
memory_region_find appear to be all relatively slow paths, thus are fine
with staying under BQL.

> 
>> and, thus, increments the counter before
>> returning, or it has to be called with the BQL held.
> 
> ... or we need to support reference counting on all regions, so that the
> other possibility is automatically true.
> 
> Strictly speaking, only regions that can be unplugged need to support
> reference counting.

That should make the conversion, if actually required, more bearable.
Having to assign an owner to every region around as a precondition to
introduce a new concept with initially less than a handful of users
would be too much, I suppose.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06 11:39                           ` Jan Kiszka
@ 2013-05-06 11:47                             ` Paolo Bonzini
  2013-05-06 12:06                               ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-05-06 11:47 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Liu Ping Fan

Il 06/05/2013 13:39, Jan Kiszka ha scritto:
> On 2013-05-06 13:28, Paolo Bonzini wrote:
>> Il 06/05/2013 13:11, Jan Kiszka ha scritto:
>>> On 2013-05-06 12:58, Paolo Bonzini wrote:
>>>> Il 06/05/2013 12:56, Jan Kiszka ha scritto:
>>>>>> The problem is that even if I/O for a region is supposed to happen
>>>>>> within the BQL, lookup can happen outside the BQL.  Lookup will use the
>>>>>> region even if it is just to discard it:
>>>>>>
>>>>>>            VCPU thread (under BQL)              device thread
>>>>>>  --------------------------------------------------------------------------------------
>>>>>>                                                 flatview_ref
>>>>>>                                                 memory_region_find returns d->mr
>>>>>>                                                 memory_region_ref(d->mr) /* nop */
>>>>>>            qdev_free(d)
>>>>>>              object_unparent(d)
>>>>>>                unrealize(d)
>>>>>>                  memory_region_del_subregion(d->mr)
>>>>>>                    FlatView updated, d->mr not in the new view
>>>>>>
>>>>>>                                                 flatview_unref
>>>>>>                                                   memory_region_unref(d->mr)
>>>>>>                                                     object_unref(d)
>>>>>>                                                       free(d)
>>>>>>                                                 if (!d->mr->is_ram) {        /* BAD! */
>>>>>>                                                   memory_region_unref(d->mr) /* nop */
>>>>>>                                                   return error
>>>>>>                                                 }
>>>>>>
>>>>>>
>>>>>> Here, the memory region is dereferenced *before* we know that it is BQL-free
>>>>>> (in fact, exactly to ascertain whether it is BQL-free).
>>>>>
>>>>> Both flatview update and lookup *plus* locking type evaluation (i.e.
>>>>> memory region dereferencing) always happen under the address space lock.
>>>>> See Pingfan's patch.
>>>>
>>>> That's true of address_space_rw/map, but I don't think it holds for
>>>> memory_region_find.
>>>
>>> It has to, or it would be broken: Either it is called on a region that
>>> supports reference counting
>>
>> You cannot know that in advance, can you?  The address is decided by the
>> guest.
> 
> Need to help me again to get the context: In which case is this a
> hot-path that we want to keep BQL-free? Current users of
> memory_region_find appear to be all relatively slow paths, thus are fine
> with staying under BQL.

virtio-blk-dataplane is basically redoing memory_region_find with a
separate data structure, exactly so that it can run outside the BQL
before we get BQL-free MMIO dispatch.

I can try to post patches later today that actually use
memory_region_find instead.

>>> and, thus, increments the counter before
>>> returning, or it has to be called with the BQL held.
>>
>> ... or we need to support reference counting on all regions, so that the
>> other possibility is automatically true.
>>
>> Strictly speaking, only regions that can be unplugged need to support
>> reference counting.
> 
> That should make the conversion, if actually required, more bearable.
> Having to assign an owner to every region around as a precondition to
> introduce a new concept with initially less than a handful of users
> would be too much, I suppose.

I agree.  Though I have some hope that it could be scripted, this
exception would make the conversion possible for non-qdevified devices.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06 11:47                             ` Paolo Bonzini
@ 2013-05-06 12:06                               ` Jan Kiszka
  2013-05-06 13:09                                 ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2013-05-06 12:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Liu Ping Fan

On 2013-05-06 13:47, Paolo Bonzini wrote:
> Il 06/05/2013 13:39, Jan Kiszka ha scritto:
>> On 2013-05-06 13:28, Paolo Bonzini wrote:
>>> Il 06/05/2013 13:11, Jan Kiszka ha scritto:
>>>> On 2013-05-06 12:58, Paolo Bonzini wrote:
>>>>> Il 06/05/2013 12:56, Jan Kiszka ha scritto:
>>>>>>> The problem is that even if I/O for a region is supposed to happen
>>>>>>> within the BQL, lookup can happen outside the BQL.  Lookup will use the
>>>>>>> region even if it is just to discard it:
>>>>>>>
>>>>>>>            VCPU thread (under BQL)              device thread
>>>>>>>  --------------------------------------------------------------------------------------
>>>>>>>                                                 flatview_ref
>>>>>>>                                                 memory_region_find returns d->mr
>>>>>>>                                                 memory_region_ref(d->mr) /* nop */
>>>>>>>            qdev_free(d)
>>>>>>>              object_unparent(d)
>>>>>>>                unrealize(d)
>>>>>>>                  memory_region_del_subregion(d->mr)
>>>>>>>                    FlatView updated, d->mr not in the new view
>>>>>>>
>>>>>>>                                                 flatview_unref
>>>>>>>                                                   memory_region_unref(d->mr)
>>>>>>>                                                     object_unref(d)
>>>>>>>                                                       free(d)
>>>>>>>                                                 if (!d->mr->is_ram) {        /* BAD! */
>>>>>>>                                                   memory_region_unref(d->mr) /* nop */
>>>>>>>                                                   return error
>>>>>>>                                                 }
>>>>>>>
>>>>>>>
>>>>>>> Here, the memory region is dereferenced *before* we know that it is BQL-free
>>>>>>> (in fact, exactly to ascertain whether it is BQL-free).
>>>>>>
>>>>>> Both flatview update and lookup *plus* locking type evaluation (i.e.
>>>>>> memory region dereferencing) always happen under the address space lock.
>>>>>> See Pingfan's patch.
>>>>>
>>>>> That's true of address_space_rw/map, but I don't think it holds for
>>>>> memory_region_find.
>>>>
>>>> It has to, or it would be broken: Either it is called on a region that
>>>> supports reference counting
>>>
>>> You cannot know that in advance, can you?  The address is decided by the
>>> guest.
>>
>> Need to help me again to get the context: In which case is this a
>> hot-path that we want to keep BQL-free? Current users of
>> memory_region_find appear to be all relatively slow paths, thus are fine
>> with staying under BQL.
> 
> virtio-blk-dataplane is basically redoing memory_region_find with a
> separate data structure, exactly so that it can run outside the BQL
> before we get BQL-free MMIO dispatch.
> 
> I can try to post patches later today that actually use
> memory_region_find instead.

We could define its semantics as follows: return a reference to the
corresponding memory region, provide this is safe. A reference is safe when
 - the region supports BQL-free operation (thus provides an owner to
   apply reference counting on)
 - the caller holds the BQL (check via qemu_mutex_iothread_is_locked()
   - to be implemented)

The latter implies that the BQL is not dropped before returning the
reference, but that's nothing memory_region_find can enforce.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06 12:06                               ` Jan Kiszka
@ 2013-05-06 13:09                                 ` Paolo Bonzini
  2013-05-06 14:05                                   ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-05-06 13:09 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Liu Ping Fan

Il 06/05/2013 14:06, Jan Kiszka ha scritto:
> On 2013-05-06 13:47, Paolo Bonzini wrote:
>> Il 06/05/2013 13:39, Jan Kiszka ha scritto:
>>> On 2013-05-06 13:28, Paolo Bonzini wrote:
>>>> Il 06/05/2013 13:11, Jan Kiszka ha scritto:
>>>>> On 2013-05-06 12:58, Paolo Bonzini wrote:
>>>>>> Il 06/05/2013 12:56, Jan Kiszka ha scritto:
>>>>>>>> The problem is that even if I/O for a region is supposed to happen
>>>>>>>> within the BQL, lookup can happen outside the BQL.  Lookup will use the
>>>>>>>> region even if it is just to discard it:
>>>>>>>>
>>>>>>>>            VCPU thread (under BQL)              device thread
>>>>>>>>  --------------------------------------------------------------------------------------
>>>>>>>>                                                 flatview_ref
>>>>>>>>                                                 memory_region_find returns d->mr
>>>>>>>>                                                 memory_region_ref(d->mr) /* nop */
>>>>>>>>            qdev_free(d)
>>>>>>>>              object_unparent(d)
>>>>>>>>                unrealize(d)
>>>>>>>>                  memory_region_del_subregion(d->mr)
>>>>>>>>                    FlatView updated, d->mr not in the new view
>>>>>>>>
>>>>>>>>                                                 flatview_unref
>>>>>>>>                                                   memory_region_unref(d->mr)
>>>>>>>>                                                     object_unref(d)
>>>>>>>>                                                       free(d)
>>>>>>>>                                                 if (!d->mr->is_ram) {        /* BAD! */
>>>>>>>>                                                   memory_region_unref(d->mr) /* nop */
>>>>>>>>                                                   return error
>>>>>>>>                                                 }
>>>>>>>>
>>>>>>>>
>>>>>>>> Here, the memory region is dereferenced *before* we know that it is BQL-free
>>>>>>>> (in fact, exactly to ascertain whether it is BQL-free).
>>>>>>>
>>>>>>> Both flatview update and lookup *plus* locking type evaluation (i.e.
>>>>>>> memory region dereferencing) always happen under the address space lock.
>>>>>>> See Pingfan's patch.
>>>>>>
>>>>>> That's true of address_space_rw/map, but I don't think it holds for
>>>>>> memory_region_find.
>>>>>
>>>>> It has to, or it would be broken: Either it is called on a region that
>>>>> supports reference counting
>>>>
>>>> You cannot know that in advance, can you?  The address is decided by the
>>>> guest.
>>>
>>> Need to help me again to get the context: In which case is this a
>>> hot-path that we want to keep BQL-free? Current users of
>>> memory_region_find appear to be all relatively slow paths, thus are fine
>>> with staying under BQL.
>>
>> virtio-blk-dataplane is basically redoing memory_region_find with a
>> separate data structure, exactly so that it can run outside the BQL
>> before we get BQL-free MMIO dispatch.
>>
>> I can try to post patches later today that actually use
>> memory_region_find instead.
> 
> We could define its semantics as follows: return a reference to the
> corresponding memory region, provide this is safe. A reference is safe when
>  - the region supports BQL-free operation (thus provides an owner to
>    apply reference counting on)

This doesn't really work.  Regions that are known not to disappear (most
importantly, the main RAM region) also support BQL-free operation, but
have no owner right now.

Also, memory_region_find cannot know if it's returning a valid result,
and the callee cannot check it because the region may have disappeared
already when it is returned.

But I really would be surprised if adding an owner everywhere is so
hard...  let's try that first, it would solve the problem.

>  - the caller holds the BQL (check via qemu_mutex_iothread_is_locked()
>    - to be implemented)
> 
> The latter implies that the BQL is not dropped before returning the
> reference, but that's nothing memory_region_find can enforce.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06 13:09                                 ` Paolo Bonzini
@ 2013-05-06 14:05                                   ` Jan Kiszka
  2013-05-06 14:28                                     ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2013-05-06 14:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Liu Ping Fan

On 2013-05-06 15:09, Paolo Bonzini wrote:
> Il 06/05/2013 14:06, Jan Kiszka ha scritto:
>> On 2013-05-06 13:47, Paolo Bonzini wrote:
>>> Il 06/05/2013 13:39, Jan Kiszka ha scritto:
>>>> On 2013-05-06 13:28, Paolo Bonzini wrote:
>>>>> Il 06/05/2013 13:11, Jan Kiszka ha scritto:
>>>>>> On 2013-05-06 12:58, Paolo Bonzini wrote:
>>>>>>> Il 06/05/2013 12:56, Jan Kiszka ha scritto:
>>>>>>>>> The problem is that even if I/O for a region is supposed to happen
>>>>>>>>> within the BQL, lookup can happen outside the BQL.  Lookup will use the
>>>>>>>>> region even if it is just to discard it:
>>>>>>>>>
>>>>>>>>>            VCPU thread (under BQL)              device thread
>>>>>>>>>  --------------------------------------------------------------------------------------
>>>>>>>>>                                                 flatview_ref
>>>>>>>>>                                                 memory_region_find returns d->mr
>>>>>>>>>                                                 memory_region_ref(d->mr) /* nop */
>>>>>>>>>            qdev_free(d)
>>>>>>>>>              object_unparent(d)
>>>>>>>>>                unrealize(d)
>>>>>>>>>                  memory_region_del_subregion(d->mr)
>>>>>>>>>                    FlatView updated, d->mr not in the new view
>>>>>>>>>
>>>>>>>>>                                                 flatview_unref
>>>>>>>>>                                                   memory_region_unref(d->mr)
>>>>>>>>>                                                     object_unref(d)
>>>>>>>>>                                                       free(d)
>>>>>>>>>                                                 if (!d->mr->is_ram) {        /* BAD! */
>>>>>>>>>                                                   memory_region_unref(d->mr) /* nop */
>>>>>>>>>                                                   return error
>>>>>>>>>                                                 }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here, the memory region is dereferenced *before* we know that it is BQL-free
>>>>>>>>> (in fact, exactly to ascertain whether it is BQL-free).
>>>>>>>>
>>>>>>>> Both flatview update and lookup *plus* locking type evaluation (i.e.
>>>>>>>> memory region dereferencing) always happen under the address space lock.
>>>>>>>> See Pingfan's patch.
>>>>>>>
>>>>>>> That's true of address_space_rw/map, but I don't think it holds for
>>>>>>> memory_region_find.
>>>>>>
>>>>>> It has to, or it would be broken: Either it is called on a region that
>>>>>> supports reference counting
>>>>>
>>>>> You cannot know that in advance, can you?  The address is decided by the
>>>>> guest.
>>>>
>>>> Need to help me again to get the context: In which case is this a
>>>> hot-path that we want to keep BQL-free? Current users of
>>>> memory_region_find appear to be all relatively slow paths, thus are fine
>>>> with staying under BQL.
>>>
>>> virtio-blk-dataplane is basically redoing memory_region_find with a
>>> separate data structure, exactly so that it can run outside the BQL
>>> before we get BQL-free MMIO dispatch.
>>>
>>> I can try to post patches later today that actually use
>>> memory_region_find instead.
>>
>> We could define its semantics as follows: return a reference to the
>> corresponding memory region, provide this is safe. A reference is safe when
>>  - the region supports BQL-free operation (thus provides an owner to
>>    apply reference counting on)
> 
> This doesn't really work.  Regions that are known not to disappear (most
> importantly, the main RAM region) also support BQL-free operation, but
> have no owner right now.

Those few are much easier to convert than a full set of PCI and other
hot-pluggable device, that's my point.

> 
> Also, memory_region_find cannot know if it's returning a valid result,
> and the callee cannot check it because the region may have disappeared
> already when it is returned.

Again, we hold the address space lock while checking the conditions. If
a region does not supports BQL-free mode and BQL is not held, we have an
error and return NULL (or bail out with a runtime error).

> 
> But I really would be surprised if adding an owner everywhere is so
> hard...  let's try that first, it would solve the problem.

If we can avoid it, that would only help the process. If we can't, ok.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock
  2013-05-06 14:05                                   ` Jan Kiszka
@ 2013-05-06 14:28                                     ` Paolo Bonzini
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-05-06 14:28 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: qemu-devel, Liu Ping Fan

Il 06/05/2013 16:05, Jan Kiszka ha scritto:
>> Also, memory_region_find cannot know if it's returning a valid result,
>> and the callee cannot check it because the region may have disappeared
>> already when it is returned.
> 
> Again, we hold the address space lock while checking the conditions. If
> a region does not supports BQL-free mode and BQL is not held, we have an
> error and return NULL (or bail out with a runtime error).

I've now posted my patches (which are really complementary to Ping
Fan's), and there's no address space lock.  (here is a lock, but the
critical section is literally a handful of instructions and everything
is done with reference counting.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2013-05-06 14:28 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-25  2:02 [Qemu-devel] [PATCH v7 0/7] push mmio dispatch out of big lock Liu Ping Fan
2012-11-25  2:02 ` [Qemu-devel] [PATCH v7 1/7] qom: apply atomic on object's refcount Liu Ping Fan
2012-11-28 17:16   ` Richard Henderson
2012-11-29  8:35     ` liu ping fan
2012-11-25  2:02 ` [Qemu-devel] [PATCH v7 2/7] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 3/7] pci: remove pci device from mem view when unplug Liu Ping Fan
2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 4/7] memory: introduce local lock for address space Liu Ping Fan
2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 5/7] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
2013-05-06 11:21   ` Paolo Bonzini
2013-05-06 11:25     ` Jan Kiszka
2013-05-06 11:30       ` Paolo Bonzini
2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 6/7] memory: introduce tls context to trace nested mmio request issue Liu Ping Fan
2012-11-25  2:03 ` [Qemu-devel] [PATCH v7 7/7] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
2012-12-06  7:28 ` [Qemu-devel] [PATCH v7 0/7] push mmio dispatch " liu ping fan
2013-05-02 16:58   ` Jan Kiszka
2013-05-02 17:14     ` Jan Kiszka
2013-05-03  7:37     ` liu ping fan
2013-05-03  8:04       ` Jan Kiszka
2013-05-04  9:47         ` Paolo Bonzini
2013-05-04 10:42           ` Jan Kiszka
2013-05-06  8:07             ` Paolo Bonzini
2013-05-06  8:40               ` Jan Kiszka
2013-05-06 10:27                 ` Paolo Bonzini
2013-05-06 10:56                   ` Jan Kiszka
2013-05-06 10:58                     ` Paolo Bonzini
2013-05-06 11:11                       ` Jan Kiszka
2013-05-06 11:28                         ` Paolo Bonzini
2013-05-06 11:39                           ` Jan Kiszka
2013-05-06 11:47                             ` Paolo Bonzini
2013-05-06 12:06                               ` Jan Kiszka
2013-05-06 13:09                                 ` Paolo Bonzini
2013-05-06 14:05                                   ` Jan Kiszka
2013-05-06 14:28                                     ` Paolo Bonzini
2013-05-06  1:46           ` liu ping fan
2013-05-06  1:57         ` liu ping fan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).