* [Qemu-devel] [RFC PATCH 1/2] ivring: Add a ring-buffer driver on IVShmem
2012-06-05 10:49 [Qemu-devel] [RFC PATCH 0/2] ivring: Add IVRing driver Yoshihiro YUNOMAE
@ 2012-06-05 10:50 ` Yoshihiro YUNOMAE
2012-06-05 10:50 ` [Qemu-devel] [RFC PATCH 2/2] ivring: Add a ring-buffer reader tool Yoshihiro YUNOMAE
2012-06-05 13:01 ` [Qemu-devel] [RFC PATCH 1/2] ivring: Add a ring-buffer driver on IVShmem Yoshihiro YUNOMAE
From: Yoshihiro YUNOMAE @ 2012-06-05 10:50 UTC
To: linux-kernel, Cam Macdonell
Cc: Ohad Ben-Cohen, Grant Likely, Joerg Roedel, Linus Walleij,
Rusty Russell, Borislav Petkov, qemu-devel,
Arnaldo Carvalho de Melo, MyungJoo Ham, systemtap,
Greg Kroah-Hartman, Masami Hiramatsu, yrl.pp-manager.tt,
Akihiro Nagai, Yoshihiro YUNOMAE
This patch adds a ring-buffer driver for the IVShmem device, a virtual PCI RAM
device in QEMU. A guest OS can use the driver as a ring buffer for kernel
logging or tracing, with data recorded by kernel code or by SystemTap.
The ring-buffer driver is implemented very simply. The first 4kB of the shared
memory region holds the control structure of the ring buffer. This region stores
the values needed to manage the ring buffer, such as the bit count and mask of
the whole memory size, the writing position, and the threshold value for
notifying a reader on the host OS. The reader uses this region to find the
writing position. The remaining "total memory size - 4kB" is the usable region
for recording data.
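The header arithmetic can be sketched in user-space C. This is a minimal sketch assuming a power-of-two shared memory size, as the __fls()-based setup in ivring_hdr_init() expects; sketch_fls32() and ring_layout_for() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors IVRING_OFFSET in ivring.h: the first 4kB page holds the header. */
#define SKETCH_IVRING_OFFSET 4096u

/* Hypothetical helper: index of the most significant set bit, like __fls(). */
static uint32_t sketch_fls32(uint32_t word)
{
	uint32_t bit = 0;

	assert(word != 0); /* __fls() is undefined for 0 */
	while (word >>= 1)
		bit++;
	return bit;
}

struct ring_layout {
	uint32_t total_bits; /* log2 of the region size used by the ring */
	uint32_t total_mask; /* turns a 64-bit position into a region offset */
	uint32_t data_bytes; /* usable bytes after the header page */
};

/* Hypothetical helper: derive the header values from the region size. */
static struct ring_layout ring_layout_for(uint32_t shmsize)
{
	struct ring_layout l;

	l.total_bits = sketch_fls32(shmsize);
	/* Unsigned shift avoids shifting a negative int; assumes total_bits < 32. */
	l.total_mask = ~(~0u << l.total_bits);
	l.data_bytes = (1u << l.total_bits) - SKETCH_IVRING_OFFSET;
	return l;
}
```

A 1MB region gives total_bits = 20, total_mask = 0xfffff and 1044480 usable bytes. Because __fls() rounds down, a 1.5MB region is treated as a 1MB ring.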
The driver records data from the start to the end of this writable region and
then wraps around to its start, skipping the control page.
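The wrap-around write can be approximated in user space as follows. This is a sketch under stated assumptions, not the driver code: a plain array stands in for the ioremap()ed region, and struct sketch_ring and sketch_ring_write() are hypothetical names. Positions grow monotonically like hdr->pos, and a write that crosses the end adds an extra 4kB so that the masked position skips the control page. Unlike the patch's `size >> tbits` test, the sketch rejects any record larger than the data area, which keeps the second copy in bounds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_OFFSET 4096u /* header page, as IVRING_OFFSET */

/* Hypothetical stand-in for the shared region plus its header fields. */
struct sketch_ring {
	unsigned char *base; /* start of the whole region (header page first) */
	uint32_t total_bits; /* region size is 1 << total_bits */
	uint64_t pos;        /* monotonically increasing write position */
};

/* Returns bytes written, or -1 if the record cannot fit in the data area. */
static long sketch_ring_write(struct sketch_ring *r, const void *buf, size_t size)
{
	uint32_t ring_size = 1u << r->total_bits;
	uint32_t off = (uint32_t)r->pos & (ring_size - 1);
	const unsigned char *src = buf;

	if (size > ring_size - SKETCH_OFFSET)
		return -1;

	if (off + size >= ring_size) {
		/* Split: fill the tail, then continue right after the header. */
		uint32_t room = ring_size - off;

		memcpy(r->base + off, src, room);
		memcpy(r->base + SKETCH_OFFSET, src + room, size - room);
		/* Skip the header page so pos & mask never points into it. */
		r->pos += size + SKETCH_OFFSET;
	} else {
		memcpy(r->base + off, src, size);
		r->pos += size;
	}
	return (long)size;
}
```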
When the writing position exceeds a threshold value, the driver can notify the
reader by writing to the IVShmem doorbell register with writel(). As of this
patch series, the reader has no function for receiving the notification; this
feature will be used in the near future.
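The notification path reduces to two small computations, sketched below with hypothetical helper names. The doorbell layout (peer ID in bits 31..16, vector in the low byte) follows ivring_write_doorbell(), and the threshold check and reset to ~0 follow ivring_write(). Casting to an unsigned type before the shift is a deliberate choice so that peer IDs of 0x8000 and above do not overflow a signed int.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: value written to the doorbell register. */
static uint32_t sketch_doorbell_value(int peer, int vector)
{
	return ((uint32_t)(peer & 0xffff) << 16) | (uint32_t)(vector & 0xff);
}

/* Hypothetical helper: notify once the write position passes the
 * threshold, then disarm it by resetting to IVRING_INIT_THRESHOLD (~0). */
static bool sketch_should_notify(uint64_t *threshold, uint64_t pos)
{
	if (*threshold < pos) {
		*threshold = ~0ULL;
		return true;
	}
	return false;
}
```

After one notification the threshold stays disarmed until something re-arms it; who re-arms it (presumably the reader) is an assumption, since the reader in this series does not yet do so.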
Writers take a spinlock so that multiple writers on different CPUs do not
corrupt the ring buffer. To remove the spinlock, a lockless ring buffer like
ftrace's, with one ring buffer per CPU, will be implemented in the near future.
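Why the lock is needed can be shown with a user-space analogue: reserving space is a read-modify-write of the shared position, which must not interleave between writers. This is a minimal sketch assuming C11 atomics; struct sketch_spinlock and its helpers are hypothetical, and unlike spin_lock_irqsave() they cannot disable local interrupts.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical busy-waiting lock standing in for the per-channel spinlock. */
struct sketch_spinlock {
	atomic_flag locked;
};

static void sketch_lock(struct sketch_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		; /* spin until the holder releases the flag */
}

static void sketch_unlock(struct sketch_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Reserve 'size' bytes and return where this writer's record starts.
 * Without the lock, two writers could read the same *pos. */
static uint64_t sketch_reserve(struct sketch_spinlock *l, uint64_t *pos,
			       uint64_t size)
{
	uint64_t start;

	sketch_lock(l);
	start = *pos;
	*pos += size;
	sketch_unlock(l);
	return start;
}
```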
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Akihiro Nagai <akihiro.nagai.hw@hitachi.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: linux-kernel@vger.kernel.org
Cc: Cam Macdonell <cam@cs.ualberta.ca>
Cc: qemu-devel@nongnu.org
Cc: systemtap@sourceware.org
---
drivers/Kconfig | 1
drivers/Makefile | 1
drivers/ivshmem/Kconfig | 9 +
drivers/ivshmem/Makefile | 5
drivers/ivshmem/ivring.c | 551 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/ivshmem/ivring.h | 77 ++++++
6 files changed, 644 insertions(+), 0 deletions(-)
create mode 100644 drivers/ivshmem/Kconfig
create mode 100644 drivers/ivshmem/Makefile
create mode 100644 drivers/ivshmem/ivring.c
create mode 100644 drivers/ivshmem/ivring.h
diff --git a/drivers/Kconfig b/drivers/Kconfig
index bfc9186..e01adcd 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -148,4 +148,5 @@ source "drivers/iio/Kconfig"
source "drivers/vme/Kconfig"
+source "drivers/ivshmem/Kconfig"
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 2ba29ff..1ebdd03 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -23,6 +23,7 @@ obj-y += amba/
# really early.
obj-$(CONFIG_DMA_ENGINE) += dma/
+obj-$(CONFIG_IVRING_MANAGER) += ivshmem/
obj-$(CONFIG_VIRTIO) += virtio/
obj-$(CONFIG_XEN) += xen/
diff --git a/drivers/ivshmem/Kconfig b/drivers/ivshmem/Kconfig
new file mode 100644
index 0000000..e84364a
--- /dev/null
+++ b/drivers/ivshmem/Kconfig
@@ -0,0 +1,9 @@
+#
+# IVShmem support drivers
+#
+
+config IVRING_MANAGER
+ tristate "IVRing management driver"
+ help
+ This driver allows IVShmem, a virtual PCI RAM device in QEMU, to be
+ used as a ring buffer for tracing a guest.
diff --git a/drivers/ivshmem/Makefile b/drivers/ivshmem/Makefile
new file mode 100644
index 0000000..e725f8c
--- /dev/null
+++ b/drivers/ivshmem/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for IVShmem drivers
+#
+
+obj-$(CONFIG_IVRING_MANAGER) += ivring.o
diff --git a/drivers/ivshmem/ivring.c b/drivers/ivshmem/ivring.c
new file mode 100644
index 0000000..5cbcfb6
--- /dev/null
+++ b/drivers/ivshmem/ivring.c
@@ -0,0 +1,551 @@
+/*
+ * Ring buffer on IVShmem Driver
+ *
+ * (C) 2012 Hitachi, Ltd.
+ * Written by Hitachi Yokohama Research Laboratory.
+ *
+ * Created by Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ * Akihiro Nagai <akihiro.nagai.hw@hitachi.com>
+ * Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ * based on UIOIVShmem Driver, http://www.gitorious.org/nahanni/guest-code,
+ * (C) 2009 Cam Macdonell <cam@cs.ualberta.ca>
+ * based on Hilscher CIF card driver (C) 2007 Hans J. Koch <hjk@linutronix.de>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#include <linux/bitops.h>
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include "./ivring.h"
+
+
+#define IVSHM_OFFS_INTRMASK 0
+#define IVSHM_OFFS_INTRSTATUS 4
+#define IVSHM_OFFS_IVPOSITION 8
+#define IVSHM_OFFS_DOORBELL 12
+
+#define MSIX_NAMEBUF_SIZE 128
+#define DEFAULT_NR_VECTORS 4
+
+#define IVRING_DEVNAME "ivring"
+
+struct ivring_mem {
+ unsigned long addr;
+ unsigned long size;
+ void __iomem *ioaddr;
+};
+
+struct ivring_info {
+ struct pci_dev *dev;
+ int irq;
+ struct ivring_mem mem[2]; /* 0:control, 1:shmem */
+ struct msix_entry *msix_entries;
+ char (*msix_names)[MSIX_NAMEBUF_SIZE];
+ int nvectors;
+ int posn;
+ struct ivring_hdr *hdr;
+};
+
+#define MAX_IVRING_CHN 16
+
+static struct ivring_info *ivring_channels[MAX_IVRING_CHN];
+static spinlock_t ivring_locks[MAX_IVRING_CHN];
+
+static void ivring_init_locks(void)
+{
+ int i;
+
+ for (i = 0; i < MAX_IVRING_CHN; i++)
+ spin_lock_init(&ivring_locks[i]);
+}
+
+#define ivring_lock(id, flags) \
+ spin_lock_irqsave(&ivring_locks[id], flags)
+
+#define ivring_unlock(id, flags) \
+ spin_unlock_irqrestore(&ivring_locks[id], flags)
+
+/* Device I/O helpers: these do not check that mem[0].ioaddr is mapped */
+static int ivring_read_position(struct ivring_info *info)
+{
+ void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_IVPOSITION;
+ u32 val = readl(addr);
+
+ /* return as a signed value */
+ return (int)val;
+}
+
+/* Note: this operation is destructive. Intr status is cleared after reading */
+static u32 ivring_read_intr(struct ivring_info *info)
+{
+ void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_INTRSTATUS;
+ return readl(addr);
+}
+
+static void ivring_write_intrmask(struct ivring_info *info, u32 mask)
+{
+ void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_INTRMASK;
+ writel(mask, addr);
+}
+
+static void ivring_write_doorbell(struct ivring_info *info, int posn, int vec)
+{
+ u32 door = ((u32)(posn & 0xffff) << 16) | (vec & 0x00ff);
+ void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_DOORBELL;
+ writel(door, addr);
+}
+
+static unsigned long ivring_shmsize(struct ivring_info *info)
+{
+ return info->mem[1].size;
+}
+
+static int ivring_hdr_init(struct ivring_hdr *hdr, u32 shmsize)
+{
+ if (strncmp(hdr->magic, IVRING_MAGIC, 4) == 0) {
+ printk(KERN_INFO "Ring header is already initialized\n");
+ printk(KERN_INFO "reader %d, writer %d, pos %llx\n",
+ (int)hdr->reader, (int)hdr->writer, hdr->pos);
+ if (hdr->version != IVRING_VERSION) {
+ printk(KERN_ERR "Ring version is different! (%d)\n",
+ (int)hdr->version);
+ return -EINVAL;
+ }
+ return 0;
+ }
+ memset(hdr, 0, IVRING_OFFSET);
+ memcpy(hdr->magic, IVRING_MAGIC, 4);
+ hdr->version = IVRING_VERSION;
+ hdr->reader = -1;
+ hdr->writer = -1;
+ hdr->total_bits = __fls(shmsize);
+ hdr->total_mask = ~(~0U << hdr->total_bits);
+ hdr->threshold = IVRING_INIT_THRESHOLD;
+ hdr->pos = IVRING_STARTPOS;
+ return 1;
+}
+
+static void ivring_notify_reader(struct ivring_info *info)
+{
+ if (info->hdr->reader != -1) {
+ pr_debug("Notify update to reader %d\n", info->hdr->reader);
+ ivring_write_doorbell(info, info->hdr->reader, IVRING_VECTOR);
+ }
+}
+
+static int ivring_init_hdr(struct ivring_info *info)
+{
+ int ret;
+
+ if (!info->mem[1].ioaddr) {
+ printk(KERN_ERR "IVRing: IVShmem is not mapped.\n");
+ return -ENODEV;
+ }
+
+ info->hdr = info->mem[1].ioaddr;
+ /* Refuse a ring whose header has an incompatible version */
+ ret = ivring_hdr_init(info->hdr, ivring_shmsize(info));
+ if (ret < 0)
+ return ret;
+
+ info->hdr->writer = info->posn;
+ ivring_notify_reader(info);
+ return 0;
+}
+
+static void ivring_cleanup_hdr(struct ivring_info *info)
+{
+ if (!info->hdr || info->hdr->writer != info->posn)
+ return;
+
+ info->hdr->writer = -1;
+ /* Don't clear pos */
+ ivring_notify_reader(info);
+}
+
+/**
+ * ivring_ready - check whether an IVRing channel is ready
+ * @id: ID of a ring-buffer
+ *
+ * Return true if the ring-buffer specified by @id is usable.
+ *
+ */
+bool ivring_ready(int id)
+{
+ if (id < 0 || id >= MAX_IVRING_CHN || ivring_channels[id] == NULL)
+ return false;
+
+ return true;
+}
+EXPORT_SYMBOL_GPL(ivring_ready);
+
+/**
+ * ivring_write - record data to IVRing
+ * @id: ID of a ring-buffer
+ * @buf: data buffer
+ * @size: data size(byte)
+ *
+ * Copy @size bytes from @buf into the ring-buffer specified by @id.
+ * Return the number of bytes written, or a negative error code.
+ *
+ * Writers are serialized by a spinlock, and a single buffer is shared by
+ * all CPUs. Therefore, DO NOT record data in an SMP environment in this version.
+ *
+ */
+int ivring_write(int id, void *buf, size_t size)
+{
+ struct ivring_info *info;
+ struct ivring_hdr *hdr;
+ unsigned long flags;
+ u32 pos, tbits, room;
+ int ret = 0;
+
+ if (!ivring_ready(id))
+ return -ENOENT;
+
+ ivring_lock(id, flags);
+ info = ivring_channels[id];
+ if (unlikely(info == NULL)) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ hdr = info->hdr;
+ tbits = hdr->total_bits;
+ if (unlikely(size > (1U << tbits) - IVRING_OFFSET)) {
+ /* write-size exceeds the data area of the buffer */
+ ret = -E2BIG;
+ goto out;
+ }
+
+ pos = (u32)hdr->pos & hdr->total_mask;
+
+ if (unlikely((pos + size) >> tbits)) {
+ room = (1 << tbits) - pos;
+ memcpy(ivring_pos_addr(hdr, pos), buf, room);
+ memcpy(ivring_pos_addr(hdr, IVRING_OFFSET), buf + room,
+ size - room);
+ hdr->pos += size + IVRING_OFFSET;
+ } else {
+ memcpy(ivring_pos_addr(hdr, pos), buf, size);
+ hdr->pos += size;
+ }
+
+ /*
+ * Notify the reader if the write position is over the threshold.
+ * The IVRing reader will use this feature in the future.
+ */
+ if (hdr->threshold < hdr->pos) {
+ hdr->threshold = IVRING_INIT_THRESHOLD;
+ ivring_notify_reader(info);
+ }
+ ret = (int)size;
+out:
+ ivring_unlock(id, flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(ivring_write);
+
+/* Channel management functions */
+static int ivring_register_channel(struct ivring_info *info)
+{
+ int i;
+ unsigned long flags;
+
+ for (i = 0; i < MAX_IVRING_CHN; i++) {
+ ivring_lock(i, flags);
+ if (ivring_channels[i] == NULL) {
+ info->posn = ivring_read_position(info);
+ if (ivring_init_hdr(info) < 0) {
+ ivring_unlock(i, flags);
+ return -1;
+ }
+ ivring_channels[i] = info;
+ printk(KERN_INFO "Add ivring id %d, pos %d\n",
+ i, info->posn);
+ ivring_unlock(i, flags);
+ return i;
+ }
+ ivring_unlock(i, flags);
+ }
+
+ return -1;
+}
+
+static void ivring_unregister_channel(struct ivring_info *info)
+{
+ int i;
+ unsigned long flags;
+
+ for (i = 0; i < MAX_IVRING_CHN; i++) {
+ ivring_lock(i, flags);
+ if (ivring_channels[i] == info) {
+ printk(KERN_INFO "Remove ivring id %d, pos %d\n",
+ i, info->posn);
+ ivring_channels[i] = NULL;
+ ivring_cleanup_hdr(info);
+ ivring_unlock(i, flags);
+ break;
+ }
+ ivring_unlock(i, flags);
+ }
+}
+
+/* IVRing interrupt handlers */
+static void ivring_event_handler(int irq, struct ivring_info *info)
+{
+ /* TODO: react to the update notification */
+
+ pr_debug("IVRing: Get an interrupt %d:%d.\n", info->posn, irq);
+}
+
+static irqreturn_t ivring_irq_handler(int irq, void *opaque)
+{
+ struct ivring_info *info = opaque;
+
+ if (ivring_read_intr(info) == 0)
+ return IRQ_NONE;
+
+ ivring_event_handler(irq, info);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t ivring_msix_handler(int irq, void *opaque)
+{
+ struct ivring_info *info = opaque;
+
+ ivring_event_handler(irq, info);
+
+ return IRQ_HANDLED;
+}
+
+static void free_msix_vectors(struct ivring_info *info)
+{
+ int i;
+
+ if (info->irq != -1 || info->nvectors == 0)
+ /* No need to free it */
+ return;
+
+ for (i = 0; i < info->nvectors; i++)
+ free_irq(info->msix_entries[i].vector, info);
+ pci_disable_msix(info->dev);
+ info->nvectors = 0;
+
+ kfree(info->msix_entries);
+ info->msix_entries = NULL;
+ kfree(info->msix_names);
+ info->msix_names = NULL;
+}
+
+/* Setup MSI-X interrupt vectors */
+static int request_msix_vectors(struct ivring_info *info, int nvectors)
+{
+ int i, err;
+
+ info->msix_entries = kzalloc(nvectors * sizeof(*info->msix_entries),
+ GFP_KERNEL);
+ if (info->msix_entries == NULL)
+ return -ENOMEM;
+
+ info->msix_names = kzalloc(nvectors * sizeof(*info->msix_names),
+ GFP_KERNEL);
+ if (info->msix_names == NULL) {
+ kfree(info->msix_entries);
+ info->msix_entries = NULL;
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < nvectors; ++i)
+ info->msix_entries[i].entry = i;
+
+ err = pci_enable_msix(info->dev, info->msix_entries, nvectors);
+ if (err > 0) {
+ nvectors = err; /* msi-x positive error code
+ returns the number available*/
+ err = pci_enable_msix(info->dev, info->msix_entries, nvectors);
+ if (err) {
+ printk(KERN_INFO "no MSI (%d). Back to INTx.\n", err);
+ goto error;
+ }
+ }
+
+ if (err)
+ goto error;
+
+ info->nvectors = nvectors;
+
+ for (i = 0; i < info->nvectors; i++) {
+
+ snprintf(info->msix_names[i], MSIX_NAMEBUF_SIZE,
+ "%s-config", IVRING_DEVNAME);
+
+ err = request_irq(info->msix_entries[i].vector,
+ ivring_msix_handler, 0,
+ info->msix_names[i], info);
+
+ if (err)
+ goto error_free_irq;
+ }
+
+ return 0;
+
+error_free_irq:
+ while (i--)
+ free_irq(info->msix_entries[i].vector, info);
+
+ pci_disable_msix(info->dev);
+ info->nvectors = 0;
+error:
+ kfree(info->msix_entries);
+ info->msix_entries = NULL;
+ kfree(info->msix_names);
+ info->msix_names = NULL;
+ return err;
+
+}
+
+static int __devinit ivring_pci_probe(struct pci_dev *dev,
+ const struct pci_device_id *id)
+{
+ struct ivring_info *info;
+ int ret;
+
+ info = kzalloc(sizeof(struct ivring_info), GFP_KERNEL);
+ if (!info)
+ return -ENOMEM;
+
+ if (pci_enable_device(dev)) {
+ printk(KERN_ERR "IVRing: Failed to probe device.\n");
+ goto out_free;
+ }
+
+ if (pci_request_regions(dev, IVRING_DEVNAME)) {
+ printk(KERN_ERR "IVRing: Failed (disable).\n");
+ goto out_disable;
+ }
+
+ /* Init control memory region */
+ info->mem[0].addr = pci_resource_start(dev, 0);
+ if (!info->mem[0].addr) {
+ printk(KERN_ERR "IVRing: Failed (release).\n");
+ goto out_release;
+ }
+
+ info->mem[0].size = pci_resource_len(dev, 0);
+ info->mem[0].ioaddr = pci_ioremap_bar(dev, 0);
+ if (!info->mem[0].ioaddr) {
+ printk(KERN_ERR "IVRing: Failed (release).\n");
+ goto out_release;
+ }
+
+ /* Init shmem region */
+ info->mem[1].addr = pci_resource_start(dev, 2);
+ if (!info->mem[1].addr) {
+ printk(KERN_INFO "failed to get addr\n");
+ printk(KERN_ERR "IVRing: Failed (unmap).\n");
+ goto out_unmap;
+ }
+
+ info->mem[1].size = pci_resource_len(dev, 2);
+ info->mem[1].ioaddr = ioremap_cache(info->mem[1].addr,
+ info->mem[1].size);
+ if (!info->mem[1].ioaddr) {
+ printk(KERN_INFO "failed to map addr\n");
+ printk(KERN_ERR "IVRing: Failed (unmap).\n");
+ goto out_unmap;
+ }
+
+ info->dev = dev;
+
+ /* Init interrupt vectors */
+ if (request_msix_vectors(info, DEFAULT_NR_VECTORS) != 0) {
+ info->irq = dev->irq;
+ ret = request_irq(info->irq, ivring_irq_handler, IRQF_SHARED,
+ IVRING_DEVNAME, info);
+ if (ret < 0) {
+ printk(KERN_ERR "IVRing: Failed (unmap2).\n");
+ goto out_unmap2;
+ }
+
+ printk(KERN_INFO "Regular IRQs enabled\n");
+ ivring_write_intrmask(info, 0xffffffff);
+ } else {
+ printk(KERN_INFO "MSI-X enabled\n");
+ info->irq = -1;
+ ivring_write_intrmask(info, 0xffffffff);
+ }
+
+ pci_set_drvdata(dev, info);
+
+ ivring_register_channel(info);
+
+ return 0;
+
+out_unmap2:
+ iounmap(info->mem[1].ioaddr);
+out_unmap:
+ iounmap(info->mem[0].ioaddr);
+out_release:
+ pci_release_regions(dev);
+out_disable:
+ pci_disable_device(dev);
+out_free:
+ kfree(info);
+ return -ENODEV;
+}
+
+static void ivring_pci_remove(struct pci_dev *dev)
+{
+ struct ivring_info *info = pci_get_drvdata(dev);
+
+ ivring_unregister_channel(info);
+
+ if (info->irq != -1)
+ free_irq(info->irq, info);
+ else
+ free_msix_vectors(info);
+
+ iounmap(info->mem[1].ioaddr);
+ iounmap(info->mem[0].ioaddr);
+ pci_release_regions(dev);
+ pci_disable_device(dev);
+ pci_set_drvdata(dev, NULL);
+
+ kfree(info);
+}
+
+static struct pci_device_id ivring_pci_ids[] __devinitdata = {
+ {
+ .vendor = 0x1af4,
+ .device = 0x1110,
+ .subvendor = PCI_ANY_ID,
+ .subdevice = PCI_ANY_ID,
+ },
+ { 0, }
+};
+
+static struct pci_driver ivring_pci_driver = {
+ .name = "ivring",
+ .id_table = ivring_pci_ids,
+ .probe = ivring_pci_probe,
+ .remove = ivring_pci_remove,
+};
+
+static int __init ivring_init_module(void)
+{
+ ivring_init_locks();
+ return pci_register_driver(&ivring_pci_driver);
+}
+
+static void __exit ivring_exit_module(void)
+{
+ pci_unregister_driver(&ivring_pci_driver);
+}
+
+module_init(ivring_init_module);
+module_exit(ivring_exit_module);
+
+MODULE_DEVICE_TABLE(pci, ivring_pci_ids);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Hitachi, Ltd.");
diff --git a/drivers/ivshmem/ivring.h b/drivers/ivshmem/ivring.h
new file mode 100644
index 0000000..2fe9b46
--- /dev/null
+++ b/drivers/ivshmem/ivring.h
@@ -0,0 +1,77 @@
+#ifndef __IVRING_H__
+#define __IVRING_H__
+
+/* ivshmem ring buffer header */
+#ifdef __KERNEL__
+#include <linux/device.h>
+#include <linux/module.h>
+#else
+#include <stdbool.h>
+#include <stdint.h>
+typedef int32_t s32;
+typedef uint32_t u32;
+typedef uint64_t u64;
+#endif
+
+/* control structure of ivshmem ring buffer */
+struct ivring_hdr {
+ char magic[4]; /* Magic ID */
+ u32 version; /* IVRing version number */
+ s32 reader; /* reader ID */
+ s32 writer; /* writer ID */
+ u32 total_mask; /* bit mask of whole memory size */
+ u32 total_bits; /* bits of whole memory size */
+ u64 pos; /* writing position */
+ u64 threshold; /* threshold value for notification */
+};
+
+#define IVRING_MAGIC "RING"
+#define IVRING_VERSION 1
+#define IVRING_OFFSET 4096 /* This page is for the header */
+#define IVRING_VECTOR 0 /* Doorbell Number */
+#define IVRING_STARTPOS IVRING_OFFSET
+#define IVRING_INIT_THRESHOLD (~0ULL)
+#define IVRING_READ_MARGIN 4096
+
+static inline void *ivring_start_addr(struct ivring_hdr *hdr)
+{
+ return (char *)hdr + IVRING_OFFSET;
+}
+
+static inline void *ivring_end_addr(struct ivring_hdr *hdr)
+{
+ return (char *)hdr + (1 << hdr->total_bits);
+}
+
+static inline void *ivring_pos_addr(struct ivring_hdr *hdr, u32 pos)
+{
+ return (char *)hdr + pos;
+}
+
+static inline void *ivring_pos64_addr(struct ivring_hdr *hdr, u64 pos)
+{
+ u32 pos32;
+ pos32 = (u32)pos & hdr->total_mask;
+ return ivring_pos_addr(hdr, pos32);
+}
+
+static inline bool ivring_verify_pos(struct ivring_hdr *hdr, u32 pos)
+{
+ if (pos < IVRING_OFFSET ||
+ pos >= (1 << hdr->total_bits))
+ return false;
+ return true;
+}
+
+#ifdef __KERNEL__
+/* Kernel ringbuffer(writer) APIs */
+
+/* Get an IVRing channel */
+extern bool ivring_ready(int id);
+
+/* Record data to IVRing */
+extern int ivring_write(int id, void *buf, size_t size);
+
+#endif
+
+#endif /* __IVRING_H__ */
* [Qemu-devel] [RFC PATCH 2/2] ivring: Add a ring-buffer reader tool
From: Yoshihiro YUNOMAE @ 2012-06-05 10:50 UTC
To: linux-kernel, Cam Macdonell
Cc: Ohad Ben-Cohen, Grant Likely, Joerg Roedel, Linus Walleij,
Rusty Russell, Borislav Petkov, qemu-devel,
Arnaldo Carvalho de Melo, MyungJoo Ham, systemtap,
Greg Kroah-Hartman, Masami Hiramatsu, yrl.pp-manager.tt,
Akihiro Nagai, Yoshihiro YUNOMAE
This patch adds a reader tool for IVRing. The tool runs on the host OS and
reads data written by a guest. It accesses the ring buffer through POSIX shared
memory, so data is read without copying memory between the guest and the host.
To read the data written by a guest, the -s option must name the same shared
memory object that IVShmem uses.
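Attaching to the shared memory can be sketched as below. This is a check-only variant under stated assumptions: the real reader initializes a missing header instead of failing, and sketch_attach() and struct sketch_ivring_hdr are hypothetical names whose layout copies struct ivring_hdr. In ivring_reader the fd comes from shm_open(); any file descriptor backed by the same memory works here.

```c
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Same field layout as struct ivring_hdr in drivers/ivshmem/ivring.h. */
struct sketch_ivring_hdr {
	char magic[4];
	uint32_t version;
	int32_t reader;
	int32_t writer;
	uint32_t total_mask;
	uint32_t total_bits;
	uint64_t pos;
	uint64_t threshold;
};

/* Hypothetical helper: map the whole object and check the IVRing header.
 * Returns the mapping (and its size), or NULL on failure. */
static struct sketch_ivring_hdr *sketch_attach(int fd, size_t *size_out)
{
	struct stat st;
	struct sketch_ivring_hdr *hdr;
	void *p;

	if (fstat(fd, &st) < 0 || st.st_size < 4096)
		return NULL;
	p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
		 MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) /* mmap() reports errors as MAP_FAILED, not NULL */
		return NULL;
	hdr = p;
	if (memcmp(hdr->magic, "RING", 4) != 0 || hdr->version != 1) {
		munmap(p, (size_t)st.st_size);
		return NULL;
	}
	*size_out = (size_t)st.st_size;
	return hdr;
}
```

Because the mapping is MAP_SHARED, the reader sees the guest's writes to hdr->pos and the data area directly, which is what makes the zero-copy read possible.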
The following options are available:
-f: output log file
-h: show usage
-m: shared memory size in MB
-s: shared memory object path
-N: number of log files
-S: log file size in MB
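The -m value goes through parse_size() in ivring_reader.c, which treats a bare number as megabytes and also accepts an M or G suffix. A sketch of that rule follows; sketch_parse_size() is a hypothetical name, and it returns 0 on an unknown suffix where the tool instead prints an error and calls exit(1).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: size in bytes, or 0 for an unrecognised suffix. */
static uint64_t sketch_parse_size(const char *arg)
{
	char *end;
	uint64_t value = strtoull(arg, &end, 10);

	switch (*end) {
	case '\0': case 'M': case 'm':
		return value << 20; /* bare numbers are megabytes */
	case 'G': case 'g':
		return value << 30;
	default:
		return 0;
	}
}
```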
Example:
./ivring_reader -m 2 -f /tmp/log.txt -S 10 -N 2 -s /ivshmem
In this case, the output is rotated between two log files, /tmp/log.txt.0 and
/tmp/log.txt.1, of about 10MB each.
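The rotation in this example follows rotate_log(): each new file is named "<base>.<k>" with k cycling through 0..N-1, and a rotation is triggered once the bytes written since the last rotation exceed the -S limit. The check runs only after the ring has been drained, so a file can grow somewhat past 10MB. The helper names below are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: n-th output name, cycling through num files. */
static int sketch_rotated_name(char *out, size_t outlen, const char *base,
			       int n, int num)
{
	return snprintf(out, outlen, "%s.%d", base, n % num);
}

/* Hypothetical helper: rotate once the written bytes exceed the limit. */
static int sketch_should_rotate(long long written, long long limit_bytes)
{
	return limit_bytes > 0 && written > limit_bytes;
}
```

With -N 2, the third file reuses /tmp/log.txt.0, which the tool reopens with O_TRUNC, so only the two most recent chunks are kept.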
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Akihiro Nagai <akihiro.nagai.hw@hitachi.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: Cam Macdonell <cam@cs.ualberta.ca>
Cc: qemu-devel@nongnu.org
Cc: systemtap@sourceware.org
---
tools/Makefile | 1
tools/ivshmem/Makefile | 19 ++
tools/ivshmem/ivring_reader.c | 516 +++++++++++++++++++++++++++++++++++++++++
tools/ivshmem/ivring_reader.h | 15 +
tools/ivshmem/pr_msg.c | 125 ++++++++++
tools/ivshmem/pr_msg.h | 19 ++
6 files changed, 695 insertions(+), 0 deletions(-)
create mode 100644 tools/ivshmem/Makefile
create mode 100644 tools/ivshmem/ivring_reader.c
create mode 100644 tools/ivshmem/ivring_reader.h
create mode 100644 tools/ivshmem/pr_msg.c
create mode 100644 tools/ivshmem/pr_msg.h
diff --git a/tools/Makefile b/tools/Makefile
index 3ae4394..3edf16a 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -5,6 +5,7 @@ help:
@echo ''
@echo ' cpupower - a tool for all things x86 CPU power'
@echo ' firewire - the userspace part of nosy, an IEEE-1394 traffic sniffer'
+ @echo ' ivshmem - userspace tools for the ivshmem device'
@echo ' lguest - a minimal 32-bit x86 hypervisor'
@echo ' perf - Linux performance measurement and analysis tool'
@echo ' selftests - various kernel selftests'
diff --git a/tools/ivshmem/Makefile b/tools/ivshmem/Makefile
new file mode 100644
index 0000000..287508e
--- /dev/null
+++ b/tools/ivshmem/Makefile
@@ -0,0 +1,19 @@
+CC = gcc
+CFLAGS = -O1 -Wall -Werror -g
+LIBS = -lrt
+
+# makefile to build ivshmem tools
+
+all: ivring_reader
+
+.c.o:
+ $(CC) $(CFLAGS) -c $< -o $@
+
+ivring_reader: ivring_reader.o pr_msg.o
+ $(CC) $(CFLAGS) -o $@ $^ $(LIBS)
+
+install: ivring_reader
+ install ivring_reader /usr/local/bin/
+
+clean:
+ rm -f *.o ivring_reader
diff --git a/tools/ivshmem/ivring_reader.c b/tools/ivshmem/ivring_reader.c
new file mode 100644
index 0000000..d61e9c9
--- /dev/null
+++ b/tools/ivshmem/ivring_reader.c
@@ -0,0 +1,516 @@
+/*
+ * A trace reader for inter-VM shared memory
+ *
+ * (C) 2012 Hitachi, Ltd.
+ * Written by Hitachi Yokohama Research Laboratory.
+ *
+ * Created by Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ * Akihiro Nagai <akihiro.nagai.hw@hitachi.com>
+ * Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ * based on IVShmem Server, http://www.gitorious.org/nahanni/guest-code,
+ * (C) 2009 Cam Macdonell <cam@cs.ualberta.ca>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#include <errno.h>
+#include <string.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/eventfd.h>
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include "../../drivers/ivshmem/ivring.h"
+#include "pr_msg.h"
+#include "ivring_reader.h"
+
+/* default sizes */
+#define DEFAULT_SHM_SIZE (1024*1024)
+#define BUFFER_SIZE 4096
+
+static int global_term;
+static int global_outfd = -1; /* -1: no output file, dump to stdout */
+static char *global_log_basename;
+static ssize_t global_log_rotate_size;
+static int global_log_rotate_num;
+#define log_rotate_mode() (global_log_rotate_size && global_log_rotate_num)
+
+/* Handle SIGTERM/SIGINT/SIGQUIT to exit */
+void term_handler(int sig)
+{
+ global_term = sig;
+ pr_info("Receive an interrupt %d\n", sig);
+}
+
+/* Utilities */
+static void *zalloc(size_t size)
+{
+ void *ret = malloc(size);
+ if (ret)
+ memset(ret, 0, size);
+ else
+ pr_perror("malloc");
+ return ret;
+}
+
+static u32 __fls32(u32 word)
+{
+ int num = 31;
+ if (!(word & (~0ul << 16))) {
+ num -= 16;
+ word <<= 16;
+ }
+ if (!(word & (~0ul << (32-8)))) {
+ num -= 8;
+ word <<= 8;
+ }
+ if (!(word & (~0ul << (32-4)))) {
+ num -= 4;
+ word <<= 4;
+ }
+ if (!(word & (~0ul << (32-2)))) {
+ num -= 2;
+ word <<= 2;
+ }
+ if (!(word & (~0ul << (32-1))))
+ num -= 1;
+ return num;
+}
+
+/* IVRing Header functions */
+int ivring_hdr_init(struct ivring_hdr *hdr, u32 shmsize)
+{
+ if (strncmp(hdr->magic, IVRING_MAGIC, 4) == 0) {
+ pr_debug("Ring header is already initialized\n");
+ pr_debug("reader %d, writer %d, pos %llx\n",
+ (int)hdr->reader, (int)hdr->writer, hdr->pos);
+ if (hdr->version != IVRING_VERSION) {
+ pr_debug("Ring version is different! (%d)\n",
+ (int)hdr->version);
+ return -EINVAL;
+ }
+ return 0;
+ }
+ memset(hdr, 0, IVRING_OFFSET);
+ memcpy(hdr->magic, IVRING_MAGIC, 4);
+ hdr->version = IVRING_VERSION;
+ hdr->reader = -1;
+ hdr->writer = -1;
+ hdr->total_bits = __fls32(shmsize);
+ hdr->total_mask = ~(~0U << hdr->total_bits);
+ hdr->threshold = IVRING_INIT_THRESHOLD;
+ hdr->pos = IVRING_STARTPOS;
+ return 1;
+}
+
+void ivring_hdr_free(struct ivring_hdr *hdr, size_t size)
+{
+ munmap(hdr, size);
+}
+
+struct ivring_hdr *ivring_hdr_new(int shmfd, size_t size)
+{
+ struct ivring_hdr *hdr;
+
+ hdr = mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, shmfd, 0);
+ if (hdr == MAP_FAILED) {
+ pr_perror("mmap");
+ return NULL;
+ }
+
+ if (ivring_hdr_init(hdr, (u32)size) < 0) {
+ munmap(hdr, size);
+ return NULL;
+ }
+
+ return hdr;
+}
+
+static inline u64 fixup_pos64(u64 pos, u32 total_mask)
+{
+ if (((u32)pos & total_mask) < IVRING_OFFSET)
+ pos += IVRING_OFFSET;
+ return pos;
+}
+
+struct ivring_read_ops {
+ ssize_t (*read)(void *data, void *saddr, ssize_t size);
+ ssize_t (*cancel)(void *data, ssize_t size);
+};
+
+/* Ringbuffer Reader */
+ssize_t __ivring_read(struct ivring_user *ivr, struct ivring_read_ops *ops,
+ void *data, ssize_t max_size)
+{
+ struct ivring_hdr *hdr = ivr->hdr;
+ void *saddr, *eaddr, *end, *start;
+ u64 rpos;
+ ssize_t read_size;
+
+ if (!hdr)
+ return -EINVAL;
+
+ if (hdr->pos == ivr->rpos) { /* No data is ready */
+ pr_debug("no data\n");
+ return 0;
+ }
+
+ start = ivring_start_addr(hdr);
+ end = ivring_end_addr(hdr);
+
+ rpos = ivr->rpos;
+ if ((hdr->pos - rpos) >> hdr->total_bits) {
+ /* Writer caught up */
+ rpos = hdr->pos - (1 << hdr->total_bits) + IVRING_READ_MARGIN;
+ rpos = fixup_pos64(rpos, hdr->total_mask);
+ pr_debug("Event drop detected! -- fixup\n");
+ ivr->rpos = rpos;
+ }
+ saddr = ivring_pos64_addr(hdr, rpos);
+
+ rpos = fixup_pos64(rpos + max_size, hdr->total_mask);
+ if (rpos > hdr->pos)
+ rpos = hdr->pos;
+ eaddr = ivring_pos64_addr(hdr, rpos);
+
+ if (saddr < eaddr)
+ read_size = ops->read(data, saddr, eaddr - saddr);
+ else {
+ ssize_t tmp;
+ read_size = ops->read(data, saddr, end - saddr);
+ if (read_size < 0)
+ return read_size;
+ tmp = ops->read(data, start, eaddr - start);
+ if (tmp < 0)
+ return tmp;
+ read_size += tmp;
+ }
+
+ if ((hdr->pos - ivr->rpos) >> hdr->total_bits) {
+ /* Caught up again */
+ pr_debug("Overwrite detected!\n");
+ return ops->cancel(data, read_size);
+ }
+
+ ivr->rpos = rpos;
+ return read_size;
+}
+
+/* Read from ring to memory */
+static ssize_t read_memcpy(void *data, void *saddr, ssize_t size)
+{
+ memcpy(data, saddr, size);
+ return size;
+}
+
+static ssize_t cancel_memcpy(void *data, ssize_t size)
+{
+ return -EAGAIN;
+}
+
+ssize_t ivring_memcpy(struct ivring_user *ivr, void *buf, ssize_t bufsize)
+{
+ ssize_t ret;
+ static struct ivring_read_ops ops = {
+ .read = read_memcpy,
+ .cancel = cancel_memcpy};
+
+ do {
+ ret = __ivring_read(ivr, &ops, buf, bufsize);
+ } while (ret == -EAGAIN);
+ return ret;
+}
+
+/* Read from ring to file */
+static ssize_t read_write_fd(void *data, void *saddr, ssize_t size)
+{
+ int fd = (int)(long)data;
+ return write(fd, saddr, size);
+}
+
+static ssize_t cancel_write_fd(void *data, ssize_t size)
+{
+ int fd = (int)(long)data;
+ lseek(fd, (off_t)-size, SEEK_CUR);
+ return -EAGAIN;
+}
+
+ssize_t ivring_read_fd(struct ivring_user *ivr, int fd, ssize_t blocksize)
+{
+ ssize_t ret;
+ static struct ivring_read_ops ops = {
+ .read = read_write_fd,
+ .cancel = cancel_write_fd};
+
+ do {
+ ret = __ivring_read(ivr, &ops, (void *)(long)fd, blocksize);
+ pr_debug("__ivring_read ret=%zd\n", ret);
+ } while (ret == -EAGAIN);
+ return ret;
+}
+
+int ivring_init_rpos(struct ivring_user *ivr)
+{
+ if (ivr->hdr->pos > ivr->shm_size)
+ ivr->rpos = ivr->hdr->pos - ivr->shm_size + IVRING_READ_MARGIN;
+ else
+ ivr->rpos = IVRING_STARTPOS;
+
+ return 0;
+}
+
+int ivring_init_hdr(struct ivring_user *ivr)
+{
+ struct stat st;
+ int ret;
+
+ if (fstat(ivr->shm_fd, &st) < 0) {
+ ret = -errno;
+ pr_perror("fstat");
+ return ret;
+ }
+
+ if (ivr->shm_size != st.st_size) {
+ pr_debug("Given shmem size isn't correct\n");
+ ivr->shm_size = st.st_size;
+ }
+
+ ivr->hdr = ivring_hdr_new(ivr->shm_fd, ivr->shm_size);
+ if (!ivr->hdr)
+ return -EINVAL;
+
+ return ivring_init_rpos(ivr);
+}
+
+void ivring_init(struct ivring_user *ivr)
+{
+ ivr->rpos = IVRING_STARTPOS;
+ ivr->hdr = NULL;
+ ivr->shm_size = 0;
+ ivr->shm_fd = -1;
+ ivr->shm_obj = NULL;
+}
+
+void ivring_cleanup(struct ivring_user *ivr)
+{
+ /* Unmap Buffer */
+ if (ivr->hdr) {
+ ivring_hdr_free(ivr->hdr, ivr->shm_size);
+ ivr->hdr = NULL;
+ }
+
+ if (ivr->shm_fd != -1) {
+ close(ivr->shm_fd);
+ ivr->shm_fd = -1;
+ }
+}
+
+int open_outfd(const char *out_path)
+{
+ int fd;
+
+ fd = open(out_path, O_CREAT | O_TRUNC | O_RDWR,
+ S_IRUSR | S_IWUSR);
+ if (fd < 0)
+ pr_perror("open(out_fd)");
+
+ return fd;
+}
+
+static int rotate_log(void)
+{
+ static int current_log_no;
+ char *new_outpath;
+ int length;
+
+ if (global_outfd > 0)
+ close(global_outfd);
+
+ if (log_rotate_mode()) {
+ /* prepare filename "log_basename.XXXX" */
+ length = strlen(global_log_basename) + 10;
+ new_outpath = (char *)malloc(sizeof(char) * length);
+ if (!new_outpath) {
+ pr_perror("malloc()");
+ exit(EXIT_FAILURE);
+ }
+ snprintf(new_outpath, length, "%s.%d", global_log_basename,
+ current_log_no++ % global_log_rotate_num);
+ } else
+ new_outpath = strdup(global_log_basename);
+
+ global_outfd = open_outfd(new_outpath);
+ if (global_outfd < 0)
+ exit(EXIT_FAILURE);
+
+ free(new_outpath);
+
+ return global_outfd;
+}
+
+static int ivring_read(struct ivring_user *ivr)
+{
+ char buf[BUFFER_SIZE];
+ ssize_t size;
+ static ssize_t total_size;
+
+ pr_debug("Try to read buffer.\n");
+
+ do {
+ if (global_outfd >= 0) {
+ size = ivring_read_fd(ivr, global_outfd, BUFFER_SIZE);
+ } else {
+ size = ivring_memcpy(ivr, buf, BUFFER_SIZE);
+ /* No output file: dump the raw data to stdout */
+ if (size > 0)
+ fwrite(buf, 1, size, stdout);
+ }
+ if (size < 0) {
+ pr_err("Ring buffer read Error %d\n", (int)size);
+ return (int)size;
+ }
+ total_size += size;
+ } while (size > 0);
+
+ if (log_rotate_mode() && total_size > global_log_rotate_size) {
+ global_outfd = rotate_log();
+ total_size = 0;
+ }
+
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ struct ivring_user *ivr;
+
+ set_pr_mode(PR_MODE_STDIO, 1, "ivtrace_reader");
+
+ ivr = zalloc(sizeof(struct ivring_user));
+ if (!ivr)
+ return -ENOMEM;
+ ivring_init(ivr);
+
+ if (parse_args(argc, argv, ivr) < 0)
+ exit(-1);
+
+ ivr->shm_fd = shm_open(ivr->shm_obj, O_RDWR, S_IRWXU|S_IRWXG|S_IRWXO);
+ if (ivr->shm_fd < 0) {
+ pr_err("ivtrace_reader: could not open shared file\n");
+ exit(-1);
+ }
+
+ if (ivr->hdr == NULL && ivr->shm_fd != -1) {
+ if (ivring_init_hdr(ivr) < 0) {
+ pr_err("ivtrace_reader: could not initialize the ring header\n");
+ exit(-1);
+ }
+ pr_debug("ivring_init_hdr: %p\n", ivr->hdr);
+ }
+
+ /* Setup signal handlers */
+ signal(SIGTERM, term_handler);
+ signal(SIGINT, term_handler);
+ signal(SIGQUIT, term_handler);
+
+ /* Main Loop */
+ while (!global_term) {
+ int ret;
+
+ sleep(1);
+
+ ret = ivring_read(ivr);
+
+ if (ret < 0) {
+ pr_debug("Exit with an error %d\n", ret);
+ goto out;
+ }
+ }
+out:
+ ivring_cleanup(ivr);
+ free(ivr);
+
+ if (global_outfd >= 0)
+ close(global_outfd);
+ if (global_log_basename)
+ free(global_log_basename);
+
+ return 0;
+}
+
+size_t parse_size(const char *arg)
+{
+ uint64_t value;
+ char *ptr;
+
+ value = strtoul(arg, &ptr, 10);
+ switch (*ptr) {
+ case 0: case 'M': case 'm':
+ value <<= 20;
+ break;
+ case 'G': case 'g':
+ value <<= 30;
+ break;
+ default:
+ pr_err("invalid ram size: %s\n", arg);
+ exit(1);
+ }
+ return (size_t)value;
+}
+
+int parse_args(int argc, char **argv, struct ivring_user *ivr)
+{
+ int c;
+
+ ivr->shm_size = DEFAULT_SHM_SIZE;
+
+ while ((c = getopt(argc, argv, "h:m:f:S:N:s:")) != -1) {
+ switch (c) {
+ /* size of shared memory object */
+ case 'm':
+ ivr->shm_size = parse_size(optarg);
+ break;
+ /* output file */
+ case 'f':
+ if (global_log_basename)
+ free(global_log_basename);
+ global_log_basename = strdup(optarg);
+ break;
+ /* log rotation */
+ case 'S':
+ global_log_rotate_size = atoi(optarg) * 1024 * 1024;
+ break;
+ /* number of log files */
+ case 'N':
+ global_log_rotate_num = atoi(optarg);
+ break;
+ /* name of shared memory object */
+ case 's':
+ ivr->shm_obj = optarg;
+ break;
+ case 'h':
+ default:
+ usage(argv[0]);
+ exit(1);
+ }
+ }
+
+ printf("shared object size: %ld (bytes)\n", (long)ivr->shm_size);
+
+ if (ivr->shm_size == 0 || ivr->shm_obj == NULL)
+ return -1;
+
+ if (global_log_basename)
+ global_outfd = rotate_log();
+
+ return 0;
+}
+
+void usage(char const *prg)
+{
+ fprintf(stderr, "usage: %s [-h] [-m <size in MB>] [-f <output>] "
+ "[-S <log size in MB>] [-N <log num>] [-s <shm object>]\n", prg);
+}
diff --git a/tools/ivshmem/ivring_reader.h b/tools/ivshmem/ivring_reader.h
new file mode 100644
index 0000000..10fbf10
--- /dev/null
+++ b/tools/ivshmem/ivring_reader.h
@@ -0,0 +1,15 @@
+#ifndef __IVRING_READER__
+#define __IVRING_READER__
+
+struct ivring_user {
+ size_t shm_size; /* shmem size */
+ struct ivring_hdr *hdr; /* Header */
+ u64 rpos; /* Read position */
+ int shm_fd; /* Shared memory fd */
+ char *shm_obj; /* Shared memory object */
+};
+
+extern void usage(char const *prg);
+extern int parse_args(int argc, char **argv, struct ivring_user *ivr);
+
+#endif
diff --git a/tools/ivshmem/pr_msg.c b/tools/ivshmem/pr_msg.c
new file mode 100644
index 0000000..16347e8
--- /dev/null
+++ b/tools/ivshmem/pr_msg.c
@@ -0,0 +1,125 @@
+#define _GNU_SOURCE
+#include <syslog.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "pr_msg.h"
+
+static void pr_stdout(const char *fmt, ...);
+static void pr_stderr(const char *fmt, ...);
+static void pr_syslog(const char *fmt, ...);
+static void pr_syslog_err(const char *fmt, ...);
+static void pr_file(const char *fmt, ...);
+static void pr_file_err(const char *fmt, ...);
+static void pr_void(const char *fmt, ...);
+
+void (*pr_info)(const char *fmt, ...) = pr_stdout;
+void (*pr_err)(const char *fmt, ...) = pr_stderr;
+void (*pr_debug)(const char *fmt, ...) = pr_void;
+
+static int log_fd;
+char *program;
+
+void set_pr_mode(int mode, int debug, const char *prog)
+{
+ if (program)
+ free(program);
+ program = strdup(prog);
+
+ if (mode == PR_MODE_STDIO) {
+ log_fd = -1;
+ pr_info = pr_stdout;
+ pr_err = pr_stderr;
+ } else if (mode == PR_MODE_SYSLOG) {
+ log_fd = -1;
+ openlog(program, 0, 0);
+ pr_info = pr_syslog;
+ pr_err = pr_syslog_err;
+ } else {
+ log_fd = mode;
+ pr_info = pr_file;
+ pr_err = pr_file_err;
+ }
+ if (debug)
+ pr_debug = pr_info;
+ else
+ pr_debug = pr_void;
+}
+
+#define format_varg(bufp, fmt) \
+ do {va_list ap; va_start(ap, fmt); vasprintf(bufp, fmt, ap); \
+ va_end(ap); } while (0)
+
+static void pr_stdout(const char *fmt, ...)
+{
+ char *buf;
+
+ format_varg(&buf, fmt);
+
+ fprintf(stdout, "%s", buf);
+
+ free(buf);
+}
+
+static void pr_stderr(const char *fmt, ...)
+{
+ char *buf;
+
+ format_varg(&buf, fmt);
+
+ fprintf(stderr, "Error: %s", buf);
+
+ free(buf);
+}
+
+static void pr_syslog(const char *fmt, ...)
+{
+ char *buf;
+
+ format_varg(&buf, fmt);
+
+ syslog(LOG_INFO, "%s", buf);
+
+ free(buf);
+}
+
+static void pr_syslog_err(const char *fmt, ...)
+{
+ char *buf;
+
+ format_varg(&buf, fmt);
+
+ syslog(LOG_ERR, "Error: %s", buf);
+
+ free(buf);
+}
+
+static void pr_file(const char *fmt, ...)
+{
+ char *buf;
+
+ format_varg(&buf, fmt);
+
+ write(log_fd, buf, strlen(buf));
+
+ free(buf);
+}
+
+static void pr_file_err(const char *fmt, ...)
+{
+ char *buf;
+
+ format_varg(&buf, fmt);
+
+ write(log_fd, "Error: ", 7);
+ write(log_fd, buf, strlen(buf));
+
+ free(buf);
+}
+
+static void pr_void(const char *fmt, ...)
+{
+ /* Do nothing */
+}
diff --git a/tools/ivshmem/pr_msg.h b/tools/ivshmem/pr_msg.h
new file mode 100644
index 0000000..c9a6acf
--- /dev/null
+++ b/tools/ivshmem/pr_msg.h
@@ -0,0 +1,19 @@
+#ifndef __PR_MSG__
+#define __PR_MSG__
+
+#include <errno.h>
+#include <string.h>
+
+#define PR_MODE_STDIO 0
+#define PR_MODE_SYSLOG 1
+#define PR_MODE_FD(fd) (fd)
+
+extern void set_pr_mode(int mode, int debug, const char *prog);
+
+extern void (*pr_info)(const char *fmt, ...);
+extern void (*pr_err)(const char *fmt, ...);
+extern void (*pr_debug)(const char *fmt, ...);
+
+#define pr_perror(msg) pr_err("%s: %s\n", msg, strerror(errno))
+
+#endif
* [Qemu-devel] [RFC PATCH 1/2] ivring: Add a ring-buffer driver on IVShmem
2012-06-05 10:49 [Qemu-devel] [RFC PATCH 0/2] ivring: Add IVRing driver Yoshihiro YUNOMAE
2012-06-05 10:50 ` [Qemu-devel] [RFC PATCH 1/2] ivring: Add a ring-buffer driver on IVShmem Yoshihiro YUNOMAE
2012-06-05 10:50 ` [Qemu-devel] [RFC PATCH 2/2] ivring: Add a ring-buffer reader tool Yoshihiro YUNOMAE
@ 2012-06-05 13:01 ` Yoshihiro YUNOMAE
2012-06-05 13:10 ` Borislav Petkov
2 siblings, 1 reply; 9+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-06-05 13:01 UTC (permalink / raw)
To: linux-kernel, Cam Macdonell
Cc: Ohad Ben-Cohen, Grant Likely, Joerg Roedel, Linus Walleij,
Rusty Russell, Borislav Petkov, qemu-devel,
Arnaldo Carvalho de Melo, MyungJoo Ham, systemtap,
Greg Kroah-Hartman, Masami Hiramatsu, yrl.pp-manager.tt,
Akihiro Nagai, Yoshihiro YUNOMAE

This patch adds a ring-buffer driver for the IVShmem device, a virtual RAM
device in QEMU. The driver can be used as a ring-buffer for kernel logging or
tracing of a guest OS, with data recorded from kernel code or by SystemTap.

The driver is implemented very simply. The first 4kB of the shared memory
region holds the control structure of the ring-buffer. This region stores the
values needed to manage the ring-buffer, such as the bit count and mask of the
whole memory size, the writing position, and the threshold for notifying a
reader on the host OS; the reader uses it to find the writing position. The
remaining "total memory size - 4kB" is the usable region for recording data,
and the driver records data from the start to the end of this writable region,
wrapping around when it reaches the end.
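
The layout described above can be modeled in user space. This is a minimal
sketch, not the driver itself: a static array stands in for the IVShmem BAR,
and HDR_SIZE, TOTAL_BITS, pos_to_offset() and ring_write() are hypothetical
names. It assumes a power-of-two region (the driver likewise derives
total_bits with __fls()) and follows the wrap-around arithmetic of
ivring_write(): the 64-bit position is masked to find the offset, and a write
that reaches the end of the region is split and resumed just past the 4kB
header. It rejects any record larger than the data area (total size minus the
header), so the header is never overwritten.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: a 64 KiB region whose first 4 KiB is the header. */
#define HDR_SIZE   4096u
#define TOTAL_BITS 16u
#define TOTAL_SIZE (1u << TOTAL_BITS)
#define TOTAL_MASK (TOTAL_SIZE - 1u)

static unsigned char region[TOTAL_SIZE]; /* stands in for the IVShmem BAR */
static uint64_t pos = HDR_SIZE;          /* free-running write position */

/* Map the free-running position to a byte offset inside the region. */
static uint32_t pos_to_offset(uint64_t p)
{
	return (uint32_t)p & TOTAL_MASK;
}

/*
 * Append size bytes. A record that reaches the end of the region is split:
 * the tail is copied right after the header, and the position also skips
 * HDR_SIZE bytes so the next masked offset lands in the data area again.
 * Returns 0 on success, or -1 if the record is larger than the data area.
 */
static int ring_write(const void *buf, uint32_t size)
{
	const unsigned char *src = buf;
	uint32_t off, room;

	if (size > TOTAL_SIZE - HDR_SIZE)
		return -1;

	off = pos_to_offset(pos);
	assert(off >= HDR_SIZE); /* the offset never points into the header */

	if (off + size >= TOTAL_SIZE) {
		room = TOTAL_SIZE - off;
		memcpy(region + off, src, room);
		memcpy(region + HDR_SIZE, src + room, size - room);
		pos += (uint64_t)size + HDR_SIZE;
	} else {
		memcpy(region + off, src, size);
		pos += size;
	}
	return 0;
}
```

Because the position skips HDR_SIZE bytes on every wrap, masking it always
yields an offset inside the data area, so a reader can locate the write
position from the header's pos and total_mask alone.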

When the amount of written data exceeds the threshold, the driver can notify
the reader to read data by writing the IVShmem doorbell register with writel().
The reader tool in the following patch (2/2) does not yet have any function for
receiving this notification; the feature will be used in the near future.
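
The notification path can be sketched the same way. Based on
ivring_write_doorbell() and the threshold check in ivring_write(), the
doorbell word carries the target peer ID in bits 31..16 and the vector number
in the low byte (as masked by the driver), and the threshold is re-armed to
~0ULL ("never") once it fires. How a reader lowers the threshold is not
defined in this series, so in this sketch the caller simply passes a lowered
value; doorbell_value() and should_notify() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors IVRING_INIT_THRESHOLD: a threshold that can never be exceeded. */
#define INIT_THRESHOLD (~0ULL)

/* Doorbell word: target peer ID in bits 31..16, vector in the low byte. */
static uint32_t doorbell_value(int peer, int vec)
{
	return ((uint32_t)(peer & 0xffff) << 16) | (uint32_t)(vec & 0x00ff);
}

/*
 * Called after a write has advanced pos. Once pos passes the threshold,
 * the threshold is re-armed to "never" and a notification is requested
 * if a reader is attached (reader ID != -1).
 */
static bool should_notify(uint64_t *threshold, uint64_t pos, int reader)
{
	if (*threshold < pos) {
		*threshold = INIT_THRESHOLD;
		return reader != -1;
	}
	return false;
}
```

Re-arming to ~0ULL means a lowered threshold fires at most once until it is
lowered again.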

Because several writers on a multi-CPU system may record data into this
ring-buffer concurrently, a spinlock serializes them. To avoid the spinlock, a
lockless ring-buffer similar to ftrace's, with one ring-buffer per CPU, will be
implemented in the near future.

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Akihiro Nagai <akihiro.nagai.hw@hitachi.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: linux-kernel@vger.kernel.org
Cc: Cam Macdonell <cam@cs.ualberta.ca>
Cc: qemu-devel@nongnu.org
Cc: systemtap@sourceware.org
---
drivers/Kconfig | 1
drivers/Makefile | 1
drivers/ivshmem/Kconfig | 9 +
drivers/ivshmem/Makefile | 5
drivers/ivshmem/ivring.c | 551 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/ivshmem/ivring.h | 77 ++++++
6 files changed, 644 insertions(+), 0 deletions(-)
create mode 100644 drivers/ivshmem/Kconfig
create mode 100644 drivers/ivshmem/Makefile
create mode 100644 drivers/ivshmem/ivring.c
create mode 100644 drivers/ivshmem/ivring.h
diff --git a/drivers/Kconfig b/drivers/Kconfig
index bfc9186..e01adcd 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -148,4 +148,5 @@ source "drivers/iio/Kconfig"
source "drivers/vme/Kconfig"
+source "drivers/ivshmem/Kconfig"
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 2ba29ff..1ebdd03 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -23,6 +23,7 @@ obj-y += amba/
# really early.
obj-$(CONFIG_DMA_ENGINE) += dma/
+obj-$(CONFIG_IVRING_MANAGER) += ivshmem/
obj-$(CONFIG_VIRTIO) += virtio/
obj-$(CONFIG_XEN) += xen/
diff --git a/drivers/ivshmem/Kconfig b/drivers/ivshmem/Kconfig
new file mode 100644
index 0000000..e84364a
--- /dev/null
+++ b/drivers/ivshmem/Kconfig
@@ -0,0 +1,9 @@
+#
+# IVShmem support drivers
+#
+
+config IVRING_MANAGER
+ tristate "IVRing management driver"
+ help
+ This driver allows IVShmem, a virtual PCI RAM device in QEMU, to be
+ used as a ring-buffer for tracing a guest.
diff --git a/drivers/ivshmem/Makefile b/drivers/ivshmem/Makefile
new file mode 100644
index 0000000..e725f8c
--- /dev/null
+++ b/drivers/ivshmem/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for IVShmem drivers
+#
+
+obj-$(CONFIG_IVRING_MANAGER) += ivring.o
diff --git a/drivers/ivshmem/ivring.c b/drivers/ivshmem/ivring.c
new file mode 100644
index 0000000..5cbcfb6
--- /dev/null
+++ b/drivers/ivshmem/ivring.c
@@ -0,0 +1,551 @@
+/*
+ * Ring buffer on IVShmem Driver
+ *
+ * (C) 2012 Hitachi, Ltd.
+ * Written by Hitachi Yokohama Research Laboratory.
+ *
+ * Created by Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ * Akihiro Nagai <akihiro.nagai.hw@hitachi.com>
+ * Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ * based on UIOIVShmem Driver, http://www.gitorious.org/nahanni/guest-code,
+ * (C) 2009 Cam Macdonell <cam@cs.ualberta.ca>
+ * based on Hilscher CIF card driver (C) 2007 Hans J. Koch <hjk@linutronix.de>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#include <linux/bitops.h>
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include "./ivring.h"
+
+
+#define IVSHM_OFFS_INTRMASK 0
+#define IVSHM_OFFS_INTRSTATUS 4
+#define IVSHM_OFFS_IVPOSITION 8
+#define IVSHM_OFFS_DOORBELL 12
+
+#define MSIX_NAMEBUF_SIZE 128
+#define DEFAULT_NR_VECTORS 4
+
+#define IVRING_DEVNAME "ivring"
+
+struct ivring_mem {
+ unsigned long addr;
+ unsigned long size;
+ void __iomem *ioaddr;
+};
+
+struct ivring_info {
+ struct pci_dev *dev;
+ int irq;
+ struct ivring_mem mem[2]; /* 0:control, 1:shmem */
+ struct msix_entry *msix_entries;
+ char (*msix_names)[MSIX_NAMEBUF_SIZE];
+ int nvectors;
+ int posn;
+ struct ivring_hdr *hdr;
+};
+
+#define MAX_IVRING_CHN 16
+
+static struct ivring_info *ivring_channels[MAX_IVRING_CHN];
+static spinlock_t ivring_locks[MAX_IVRING_CHN];
+
+static void ivring_init_locks(void)
+{
+ int i;
+
+ for (i = 0; i < MAX_IVRING_CHN; i++)
+ spin_lock_init(&ivring_locks[i]);
+}
+
+#define ivring_lock(id, flags) \
+ spin_lock_irqsave(&ivring_locks[id], flags)
+
+#define ivring_unlock(id, flags) \
+ spin_unlock_irqrestore(&ivring_locks[id], flags)
+
+/* Device I/O helpers: these do not check that mem[0].ioaddr is mapped */
+static int ivring_read_position(struct ivring_info *info)
+{
+ void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_IVPOSITION;
+ u32 val = readl(addr);
+
+ /* return as a signed value */
+ return (int)val;
+}
+
+/* Note: this operation is destructive. Intr status is cleared after reading */
+static u32 ivring_read_intr(struct ivring_info *info)
+{
+ void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_INTRSTATUS;
+ return readl(addr);
+}
+
+static void ivring_write_intrmask(struct ivring_info *info, u32 mask)
+{
+ void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_INTRMASK;
+ writel(mask, addr);
+}
+
+static void ivring_write_doorbell(struct ivring_info *info, int posn, int vec)
+{
+ u32 door = ((posn & 0xffff) << 16) | (vec & 0x00ff);
+ void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_DOORBELL;
+ writel(door, addr);
+}
+
+static unsigned long ivring_shmsize(struct ivring_info *info)
+{
+ return info->mem[1].size;
+}
+
+static int ivring_hdr_init(struct ivring_hdr *hdr, u32 shmsize)
+{
+ if (strncmp(hdr->magic, IVRING_MAGIC, 4) == 0) {
+ printk(KERN_INFO "Ring header is already initialized\n");
+ printk(KERN_INFO "reader %d, writer %d, pos %llx\n",
+ (int)hdr->reader, (int)hdr->writer, hdr->pos);
+ if (hdr->version != IVRING_VERSION) {
+ printk(KERN_ERR "Ring version is different! (%d)\n",
+ (int)hdr->version);
+ return -EINVAL;
+ }
+ return 0;
+ }
+ memset(hdr, 0, IVRING_OFFSET);
+ memcpy(hdr->magic, IVRING_MAGIC, 4);
+ hdr->version = IVRING_VERSION;
+ hdr->reader = -1;
+ hdr->writer = -1;
+ hdr->total_bits = __fls(shmsize);
+ hdr->total_mask = ~(~0U << hdr->total_bits);
+ hdr->threshold = IVRING_INIT_THRESHOLD;
+ hdr->pos = IVRING_STARTPOS;
+ return 1;
+}
+
+static void ivring_notify_reader(struct ivring_info *info)
+{
+ if (info->hdr->reader != -1) {
+ pr_debug("Notify update to reader %d\n", info->hdr->reader);
+ ivring_write_doorbell(info, info->hdr->reader, IVRING_VECTOR);
+ }
+}
+
+static int ivring_init_hdr(struct ivring_info *info)
+{
+ if (!info->mem[1].ioaddr) {
+ printk(KERN_ERR "IVRing: IVShmem is not mapped.\n");
+ return -1;
+ }
+
+ info->hdr = info->mem[1].ioaddr;
+ ivring_hdr_init(info->hdr, ivring_shmsize(info));
+
+ info->hdr->writer = info->posn;
+ ivring_notify_reader(info);
+ return 0;
+}
+
+static void ivring_cleanup_hdr(struct ivring_info *info)
+{
+ if (!info->hdr || info->hdr->writer != info->posn)
+ return;
+
+ info->hdr->writer = -1;
+ /* Don't clear pos */
+ ivring_notify_reader(info);
+}
+
+/**
+ * ivring_ready - check whether an IVRing channel is ready
+ * @id: ID of a ring-buffer
+ *
+ * Return true if the ring-buffer specified by @id is usable, false otherwise.
+ *
+ */
+bool ivring_ready(int id)
+{
+ if (id < 0 || id >= MAX_IVRING_CHN || ivring_channels[id] == NULL)
+ return false;
+
+ return true;
+}
+EXPORT_SYMBOL_GPL(ivring_ready);
+
+/**
+ * ivring_write - record data to IVRing
+ * @id: ID of a ring-buffer
+ * @buf: data buffer
+ * @size: data size(byte)
+ *
+ * Copy @size bytes from @buf into the ring-buffer specified by @id.
+ * Returns @size on success, -ENOENT if not ready, -E2BIG if too large.
+ *
+ * Writers are serialized by a spinlock, and all CPUs share a single buffer.
+ * Therefore, DO NOT record data in an SMP environment with this version.
+ *
+ */
+int ivring_write(int id, void *buf, size_t size)
+{
+ struct ivring_info *info;
+ struct ivring_hdr *hdr;
+ unsigned long flags;
+ u32 pos, tbits, room;
+ int ret = 0;
+
+ if (!ivring_ready(id))
+ return -ENOENT;
+
+ ivring_lock(id, flags);
+ info = ivring_channels[id];
+ if (unlikely(info == NULL)) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ hdr = info->hdr;
+ tbits = hdr->total_bits;
+ if (unlikely(size > (1U << tbits) - IVRING_OFFSET)) {
+ /* write size exceeds the data area of the buffer */
+ ret = -E2BIG;
+ goto out;
+ }
+
+ pos = (u32)hdr->pos & hdr->total_mask;
+
+ if (unlikely((pos + size) >> tbits)) {
+ room = (1 << tbits) - pos;
+ memcpy(ivring_pos_addr(hdr, pos), buf, room);
+ memcpy(ivring_pos_addr(hdr, IVRING_OFFSET), buf + room,
+ size - room);
+ hdr->pos += size + IVRING_OFFSET;
+ } else {
+ memcpy(ivring_pos_addr(hdr, pos), buf, size);
+ hdr->pos += size;
+ }
+
+ /*
+ * Notify the reader when the write position passes the threshold.
+ * This feature will be used by the IVRing reader.
+ */
+ if (hdr->threshold < hdr->pos) {
+ hdr->threshold = IVRING_INIT_THRESHOLD;
+ ivring_notify_reader(info);
+ }
+ ret = (int)size;
+out:
+ ivring_unlock(id, flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(ivring_write);
+
+/* Channel management functions */
+static int ivring_register_channel(struct ivring_info *info)
+{
+ int i;
+ unsigned long flags;
+
+ for (i = 0; i < MAX_IVRING_CHN; i++) {
+ ivring_lock(i, flags);
+ if (ivring_channels[i] == NULL) {
+ ivring_channels[i] = info;
+ info->posn = ivring_read_position(info);
+ ivring_init_hdr(info);
+ printk(KERN_INFO "Add ivring id %d, pos %d\n",
+ i, info->posn);
+ ivring_unlock(i, flags);
+ return i;
+ }
+ ivring_unlock(i, flags);
+ }
+
+ return -1;
+}
+
+static void ivring_unregister_channel(struct ivring_info *info)
+{
+ int i;
+ unsigned long flags;
+
+ for (i = 0; i < MAX_IVRING_CHN; i++) {
+ ivring_lock(i, flags);
+ if (ivring_channels[i] == info) {
+ printk(KERN_INFO "Remove ivring id %d, pos %d\n",
+ i, info->posn);
+ ivring_channels[i] = NULL;
+ ivring_cleanup_hdr(info);
+ ivring_unlock(i, flags);
+ break;
+ }
+ ivring_unlock(i, flags);
+ }
+}
+
+/* IVRing interrupt handlers */
+static void ivring_event_handler(int irq, struct ivring_info *info)
+{
+ /* TODO: update noticed - take a reaction? */
+
+ pr_debug("IVRing: Get an interrupt %d:%d.\n", info->posn, irq);
+}
+
+static irqreturn_t ivring_irq_handler(int irq, void *opaque)
+{
+ struct ivring_info *info = opaque;
+
+ if (ivring_read_intr(info) == 0)
+ return IRQ_NONE;
+
+ ivring_event_handler(irq, info);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t ivring_msix_handler(int irq, void *opaque)
+{
+ struct ivring_info *info = opaque;
+
+ ivring_event_handler(irq, info);
+
+ return IRQ_HANDLED;
+}
+
+static void free_msix_vectors(struct ivring_info *info)
+{
+ int i;
+
+ if (info->irq != -1 || info->nvectors == 0)
+ /* No need to free it */
+ return;
+
+ for (i = 0; i < info->nvectors; i++)
+ free_irq(info->msix_entries[i].vector, info);
+ pci_disable_msix(info->dev);
+ info->nvectors = 0;
+
+ kfree(info->msix_entries);
+ info->msix_entries = NULL;
+ kfree(info->msix_names);
+ info->msix_names = NULL;
+}
+
+/* Setup MSI-X interrupt vectors */
+static int request_msix_vectors(struct ivring_info *info, int nvectors)
+{
+ int i, err;
+
+ info->msix_entries = kzalloc(nvectors * sizeof(*info->msix_entries),
+ GFP_KERNEL);
+ if (info->msix_entries == NULL)
+ return -ENOMEM;
+
+ info->msix_names = kzalloc(nvectors * sizeof(*info->msix_names),
+ GFP_KERNEL);
+ if (info->msix_names == NULL) {
+ kfree(info->msix_entries);
+ info->msix_entries = NULL;
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < nvectors; ++i)
+ info->msix_entries[i].entry = i;
+
+ err = pci_enable_msix(info->dev, info->msix_entries, nvectors);
+ if (err > 0) {
+ nvectors = err; /* msi-x positive error code
+ returns the number available*/
+ err = pci_enable_msix(info->dev, info->msix_entries, nvectors);
+ if (err) {
+ printk(KERN_INFO "no MSI (%d). Back to INTx.\n", err);
+ goto error;
+ }
+ }
+
+ if (err)
+ goto error;
+
+ info->nvectors = nvectors;
+
+ for (i = 0; i < info->nvectors; i++) {
+
+ snprintf(info->msix_names[i], MSIX_NAMEBUF_SIZE,
+ "%s-config", IVRING_DEVNAME);
+
+ err = request_irq(info->msix_entries[i].vector,
+ ivring_msix_handler, 0,
+ info->msix_names[i], info);
+
+ if (err)
+ goto error_free_irq;
+ }
+
+ return 0;
+
+error_free_irq:
+ while (i--)
+ free_irq(info->msix_entries[i].vector, info);
+
+ pci_disable_msix(info->dev);
+ info->nvectors = 0;
+error:
+ kfree(info->msix_entries);
+ info->msix_entries = NULL;
+ kfree(info->msix_names);
+ info->msix_names = NULL;
+ return err;
+
+}
+
+static int __devinit ivring_pci_probe(struct pci_dev *dev,
+ const struct pci_device_id *id)
+{
+ struct ivring_info *info;
+ int ret;
+
+ info = kzalloc(sizeof(struct ivring_info), GFP_KERNEL);
+ if (!info)
+ return -ENOMEM;
+
+ if (pci_enable_device(dev)) {
+ printk(KERN_ERR "IVRing: Failed to probe device.\n");
+ goto out_free;
+ }
+
+ if (pci_request_regions(dev, IVRING_DEVNAME)) {
+ printk(KERN_ERR "IVRing: Failed (disable).\n");
+ goto out_disable;
+ }
+
+ /* Init control memory region */
+ info->mem[0].addr = pci_resource_start(dev, 0);
+ if (!info->mem[0].addr) {
+ printk(KERN_ERR "IVRing: Failed (release).\n");
+ goto out_release;
+ }
+
+ info->mem[0].size = pci_resource_len(dev, 0);
+ info->mem[0].ioaddr = pci_ioremap_bar(dev, 0);
+ if (!info->mem[0].ioaddr) {
+ printk(KERN_ERR "IVRing: Failed (release).\n");
+ goto out_release;
+ }
+
+ /* Init shmem region */
+ info->mem[1].addr = pci_resource_start(dev, 2);
+ if (!info->mem[1].addr) {
+ printk(KERN_INFO "failed to get addr\n");
+ printk(KERN_ERR "IVRing: Failed (unmap).\n");
+ goto out_unmap;
+ }
+
+ info->mem[1].size = pci_resource_len(dev, 2);
+ info->mem[1].ioaddr = ioremap_cache(info->mem[1].addr,
+ info->mem[1].size);
+ if (!info->mem[1].ioaddr) {
+ printk(KERN_INFO "failed to map addr\n");
+ printk(KERN_ERR "IVRing: Failed (unmap).\n");
+ goto out_unmap;
+ }
+
+ info->dev = dev;
+
+ /* Init interrupt vectors */
+ if (request_msix_vectors(info, DEFAULT_NR_VECTORS) != 0) {
+ info->irq = dev->irq;
+ ret = request_irq(info->irq, ivring_irq_handler, IRQF_SHARED,
+ IVRING_DEVNAME, info);
+ if (ret < 0) {
+ printk(KERN_ERR "IVRing: Failed (unmap2).\n");
+ goto out_unmap2;
+ }
+
+ printk(KERN_INFO "Regular IRQs enabled\n");
+ ivring_write_intrmask(info, 0xffffffff);
+ } else {
+ printk(KERN_INFO "MSI-X enabled\n");
+ info->irq = -1;
+ ivring_write_intrmask(info, 0xffffffff);
+ }
+
+ pci_set_drvdata(dev, info);
+
+ ivring_register_channel(info);
+
+ return 0;
+
+out_unmap2:
+ iounmap(info->mem[1].ioaddr);
+out_unmap:
+ iounmap(info->mem[0].ioaddr);
+out_release:
+ pci_release_regions(dev);
+out_disable:
+ pci_disable_device(dev);
+out_free:
+ kfree(info);
+ return -ENODEV;
+}
+
+static void ivring_pci_remove(struct pci_dev *dev)
+{
+ struct ivring_info *info = pci_get_drvdata(dev);
+
+ ivring_unregister_channel(info);
+
+ if (info->irq != -1)
+ free_irq(info->irq, info);
+ else
+ free_msix_vectors(info);
+
+ iounmap(info->mem[1].ioaddr);
+ iounmap(info->mem[0].ioaddr);
+ pci_release_regions(dev);
+ pci_disable_device(dev);
+ pci_set_drvdata(dev, NULL);
+
+ kfree(info);
+}
+
+static struct pci_device_id ivring_pci_ids[] __devinitdata = {
+ {
+ .vendor = 0x1af4,
+ .device = 0x1110,
+ .subvendor = PCI_ANY_ID,
+ .subdevice = PCI_ANY_ID,
+ },
+ { 0, }
+};
+
+static struct pci_driver ivring_pci_driver = {
+ .name = "ivring",
+ .id_table = ivring_pci_ids,
+ .probe = ivring_pci_probe,
+ .remove = ivring_pci_remove,
+};
+
+static int __init ivring_init_module(void)
+{
+ ivring_init_locks();
+ return pci_register_driver(&ivring_pci_driver);
+}
+
+static void __exit ivring_exit_module(void)
+{
+ pci_unregister_driver(&ivring_pci_driver);
+}
+
+module_init(ivring_init_module);
+module_exit(ivring_exit_module);
+
+MODULE_DEVICE_TABLE(pci, ivring_pci_ids);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Hitachi, Ltd.");
diff --git a/drivers/ivshmem/ivring.h b/drivers/ivshmem/ivring.h
new file mode 100644
index 0000000..2fe9b46
--- /dev/null
+++ b/drivers/ivshmem/ivring.h
@@ -0,0 +1,77 @@
+#ifndef __IVRING_H__
+#define __IVRING_H__
+
+/* ivshmem ring buffer header */
+#ifdef __KERNEL__
+#include <linux/device.h>
+#include <linux/module.h>
+#else
+#include <stdbool.h>
+#include <stdint.h>
+typedef int32_t s32;
+typedef uint32_t u32;
+typedef uint64_t u64;
+#endif
+
+/* control structure of ivshmem ring buffer */
+struct ivring_hdr {
+ char magic[4]; /* Magic ID */
+ u32 version; /* IVRing version number */
+ s32 reader; /* reader ID */
+ s32 writer; /* writer ID */
+ u32 total_mask; /* bit mask of whole memory size */
+ u32 total_bits; /* bits of whole memory size */
+ u64 pos; /* writing position */
+ u64 threshold; /* threshold value for notification */
+};
+
+#define IVRING_MAGIC "RING"
+#define IVRING_VERSION 1
+#define IVRING_OFFSET 4096 /* This page is for the header */
+#define IVRING_VECTOR 0 /* Doorbell Number */
+#define IVRING_STARTPOS IVRING_OFFSET
+#define IVRING_INIT_THRESHOLD (~0ULL)
+#define IVRING_READ_MARGIN 4096
+
+static inline void *ivring_start_addr(struct ivring_hdr *hdr)
+{
+ return (char *)hdr + IVRING_OFFSET;
+}
+
+static inline void *ivring_end_addr(struct ivring_hdr *hdr)
+{
+ return (char *)hdr + (1 << hdr->total_bits);
+}
+
+static inline void *ivring_pos_addr(struct ivring_hdr *hdr, u32 pos)
+{
+ return (char *)hdr + pos;
+}
+
+static inline void *ivring_pos64_addr(struct ivring_hdr *hdr, u64 pos)
+{
+ u32 pos32;
+ pos32 = (u32)pos & hdr->total_mask;
+ return ivring_pos_addr(hdr, pos32);
+}
+
+static inline bool ivring_verify_pos(struct ivring_hdr *hdr, u32 pos)
+{
+ if (pos < IVRING_OFFSET ||
+ pos >= (1 << hdr->total_bits))
+ return false;
+ return true;
+}
+
+#ifdef __KERNEL__
+/* Kernel ringbuffer(writer) APIs */
+
+/* Check whether an IVRing channel is ready */
+extern bool ivring_ready(int id);
+
+/* Record data to IVRing */
+extern int ivring_write(int id, void *buf, size_t size);
+
+#endif
+
+#endif /* __IVRING_H__ */
* Re: [Qemu-devel] [RFC PATCH 1/2] ivring: Add a ring-buffer driver on IVShmem
2012-06-05 13:01 ` [Qemu-devel] [RFC PATCH 1/2] ivring: Add a ring-buffer driver on IVShmem Yoshihiro YUNOMAE
@ 2012-06-05 13:10 ` Borislav Petkov
2012-06-05 14:13 ` Yoshihiro YUNOMAE
2012-06-05 23:03 ` Anthony Liguori
0 siblings, 2 replies; 9+ messages in thread
From: Borislav Petkov @ 2012-06-05 13:10 UTC (permalink / raw)
To: Yoshihiro YUNOMAE
Cc: Ohad Ben-Cohen, Grant Likely, qemu-devel, Greg Kroah-Hartman,
Linus Walleij, Rusty Russell, linux-kernel, Borislav Petkov,
Arnaldo Carvalho de Melo, MyungJoo Ham, systemtap, Joerg Roedel,
Masami Hiramatsu, yrl.pp-manager.tt, Cam Macdonell, Akihiro Nagai
On Tue, Jun 05, 2012 at 10:01:17PM +0900, Yoshihiro YUNOMAE wrote:
> This patch adds a ring-buffer driver for IVShmem device, a virtual RAM device in
> QEMU. This driver can be used as a ring-buffer for kernel logging or tracing of
> a guest OS by recording kernel programing or SystemTap.
>
> This ring-buffer driver is implemented very simple. First 4kB of shared memory
> region is control structure of a ring-buffer. In this region, some values for
> managing the ring-buffer is stored such as bits and mask of whole memory size,
> writing position, threshold value for notification to a reader on a host OS.
> This region is used by the reader to know writing position. Then, "total
> memory size - 4kB" equals to usable memory region for recording data.
> This ring-buffer driver records any data from start to end of the writable
> memory region.
>
> When writing size exceeds a threshold value, this driver can notify a reader
> to read data by using writel(). As this later patch, reader does not have any
> function for receiving the notification. This notification feature will be used
> near the future.
>
> As a writer records data in this ring-buffer, spinlock function is used to
> avoid competing by some writers in multi CPU environment. Not to use spinlock,
> lockless ring-buffer like as ftrace and one ring-buffer one CPU will be
> implemented near the future.
Yet another ring buffer?
We already have an ftrace and perf ring buffer, can't you use one of those?
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
* Re: [Qemu-devel] [RFC PATCH 1/2] ivring: Add a ring-buffer driver on IVShmem
2012-06-05 13:10 ` Borislav Petkov
@ 2012-06-05 14:13 ` Yoshihiro YUNOMAE
2012-06-05 23:03 ` Anthony Liguori
1 sibling, 0 replies; 9+ messages in thread
From: Yoshihiro YUNOMAE @ 2012-06-05 14:13 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ohad Ben-Cohen, Grant Likely, qemu-devel, Greg Kroah-Hartman,
Linus Walleij, Rusty Russell, linux-kernel, Steven Rostedt,
Borislav Petkov, Arnaldo Carvalho de Melo, MyungJoo Ham,
systemtap, Joerg Roedel, Masami Hiramatsu, yrl.pp-manager.tt,
Cam Macdonell, Akihiro Nagai
(2012/06/05 22:10), Borislav Petkov wrote:
> On Tue, Jun 05, 2012 at 10:01:17PM +0900, Yoshihiro YUNOMAE wrote:
>
> Yet another ring buffer?
Yes, unfortunately...
> We already have an ftrace and perf ring buffer, can't you use one of those?
>
No, because they do not support allocating buffers
from a PCI memory device, nor passing the control
structure over it.

However, we do understand your point.
This series is just an RFC, and we'd like to ask who is
interested in guest tracing and how it should be
implemented. For example:
- no new ring buffer; enhance the perf/ftrace ring buffer
  to allow allocating buffers on shared memory.
Other comments are welcome.
Thank you,
--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@hitachi.com
* Re: [Qemu-devel] [RFC PATCH 1/2] ivring: Add a ring-buffer driver on IVShmem
2012-06-05 13:10 ` Borislav Petkov
2012-06-05 14:13 ` Yoshihiro YUNOMAE
@ 2012-06-05 23:03 ` Anthony Liguori
2012-06-05 23:22 ` Greg Kroah-Hartman
1 sibling, 1 reply; 9+ messages in thread
From: Anthony Liguori @ 2012-06-05 23:03 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ohad Ben-Cohen, Arnaldo Carvalho de Melo, Borislav Petkov,
Greg Kroah-Hartman, Yoshihiro YUNOMAE, Rusty Russell, qemu-devel,
linux-kernel, Grant Likely, MyungJoo Ham, Akihiro Nagai,
systemtap, Joerg Roedel, Masami Hiramatsu, yrl.pp-manager.tt,
Cam Macdonell, Linus Walleij
On 06/05/2012 09:10 PM, Borislav Petkov wrote:
> On Tue, Jun 05, 2012 at 10:01:17PM +0900, Yoshihiro YUNOMAE wrote:
>
> Yet another ring buffer?
>
> We already have an ftrace and perf ring buffer, can't you use one of those?
Not to mention virtio :-)
Why not just make a virtio device for this kind of thing?
Regards,
Anthony Liguori
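The write path outlined in the quoted patch description (copy records into the data area, wrap from its end back to its start, notify the host once the written amount crosses the threshold, and serialise concurrent writers with a spinlock) can be sketched in user space as follows. This is an illustration under assumptions, not the ivring code: the buffer is a plain array rather than an IVShmem BAR, a C11 `atomic_flag` stands in for the kernel spinlock, and `notify_host()` is a hypothetical placeholder for the `writel()` doorbell write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 16u   /* hypothetical tiny data area, for illustration only */

struct ring {
    atomic_flag lock;       /* stand-in for the driver's spinlock */
    uint8_t data[RING_SIZE];
    uint64_t write_pos;     /* free-running count of bytes written */
    uint64_t threshold;     /* bytes written between notifications */
    uint64_t last_notified; /* write_pos at the last notification */
    unsigned notifications; /* number of notify_host() calls */
};

/* Hypothetical placeholder for the writel() doorbell to the host reader. */
static void notify_host(struct ring *r)
{
    r->notifications++;
}

/* Copy len bytes into the ring, wrapping from the end of the data area
 * back to its start and overwriting the oldest data. */
static void ring_write(struct ring *r, const void *buf, size_t len)
{
    assert(len <= RING_SIZE); /* larger records are out of scope here */

    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ; /* spin until the current writer releases the lock */

    size_t off = (size_t)(r->write_pos % RING_SIZE);
    size_t first = len < RING_SIZE - off ? len : RING_SIZE - off;
    memcpy(r->data + off, buf, first);                           /* up to the end */
    memcpy(r->data, (const uint8_t *)buf + first, len - first); /* wrapped tail */
    r->write_pos += len;

    /* Notify the reader once enough new data has accumulated. */
    if (r->write_pos - r->last_notified >= r->threshold) {
        r->last_notified = r->write_pos;
        notify_host(r);
    }

    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}
```

For example, with `struct ring r = { .lock = ATOMIC_FLAG_INIT, .threshold = 8 };`, writing 10 bytes and then 8 more wraps the last 2 bytes of the second record to offset 0 and triggers two notifications. The per-CPU lockless design the patch mentions would remove the spin loop; the sketch keeps the single-lock version.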
* Re: [Qemu-devel] [RFC PATCH 1/2] ivring: Add a ring-buffer driver on IVShmem
2012-06-05 23:03 ` Anthony Liguori
@ 2012-06-05 23:22 ` Greg Kroah-Hartman
2012-06-06 14:44 ` Masami Hiramatsu
0 siblings, 1 reply; 9+ messages in thread
From: Greg Kroah-Hartman @ 2012-06-05 23:22 UTC (permalink / raw)
To: Anthony Liguori
Cc: Ohad Ben-Cohen, Arnaldo Carvalho de Melo, Borislav Petkov,
Joerg Roedel, Yoshihiro YUNOMAE, Rusty Russell, qemu-devel,
Borislav Petkov, linux-kernel, Grant Likely, MyungJoo Ham,
systemtap, yrl.pp-manager.tt, Masami Hiramatsu, Akihiro Nagai,
Cam Macdonell, Linus Walleij
On Wed, Jun 06, 2012 at 07:03:06AM +0800, Anthony Liguori wrote:
> On 06/05/2012 09:10 PM, Borislav Petkov wrote:
> >On Tue, Jun 05, 2012 at 10:01:17PM +0900, Yoshihiro YUNOMAE wrote:
> >>[...]
> >
> >Yet another ring buffer?
> >
> >We already have an ftrace and perf ring buffer, can't you use one of those?
>
> Not to mention virtio :-)
>
> Why not just make a virtio device for this kind of thing?
Yeah, that's exactly what I was thinking, why reinvent things again?
greg k-h
* Re: [Qemu-devel] [RFC PATCH 1/2] ivring: Add a ring-buffer driver on IVShmem
2012-06-05 23:22 ` Greg Kroah-Hartman
@ 2012-06-06 14:44 ` Masami Hiramatsu
0 siblings, 0 replies; 9+ messages in thread
From: Masami Hiramatsu @ 2012-06-06 14:44 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Ohad Ben-Cohen, Arnaldo Carvalho de Melo, Borislav Petkov,
systemtap, Joerg Roedel, Yoshihiro YUNOMAE, Rusty Russell,
qemu-devel, Borislav Petkov, linux-kernel, Grant Likely,
MyungJoo Ham, Anthony Liguori, yrl.pp-manager.tt, Akihiro Nagai,
Cam Macdonell, Linus Walleij
(2012/06/06 8:22), Greg Kroah-Hartman wrote:
> On Wed, Jun 06, 2012 at 07:03:06AM +0800, Anthony Liguori wrote:
>> On 06/05/2012 09:10 PM, Borislav Petkov wrote:
>>> On Tue, Jun 05, 2012 at 10:01:17PM +0900, Yoshihiro YUNOMAE wrote:
>>>> [...]
>>>
>>> Yet another ring buffer?
>>>
>>> We already have an ftrace and perf ring buffer, can't you use one of those?
>>
>> Not to mention virtio :-)
>>
>> Why not just make a virtio device for this kind of thing?
>
> Yeah, that's exactly what I was thinking, why reinvent things again?
Agreed. Actually, we regard this as just a proof-of-concept prototype.
Because this device has many restrictions, especially in
scalability (which Yoshihiro will talk about at LinuxCon Japan),
we are considering moving to a virtio-based shmem device.
AFAICS, it seems possible to use it in a virtio-balloon-like way
to pass the actual pages of the guest ring buffer to the host.
The reader can then read the pages directly from QEMU.
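The virtio-balloon-like idea above, where the guest hands the host the page frame numbers (PFNs) of its ring-buffer pages instead of placing the buffer in one contiguous device region, might look roughly like this. It is a hedged user-space simulation only: `guest_ram`, `collect_pfns()` and `host_read_pages()` are hypothetical names, guest RAM is modelled as a flat array, and no real virtqueue is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   64u  /* hypothetical tiny page size, for illustration */
#define GUEST_PAGES 8u

/* Guest physical memory, modelled as a flat array of pages. */
static uint8_t guest_ram[GUEST_PAGES * PAGE_SIZE];

/* Guest side: record the PFN of each (possibly non-contiguous) buffer page,
 * roughly as a balloon-style driver builds a PFN list for the host. */
static size_t collect_pfns(uint8_t *const pages[], size_t n, uint32_t pfns[])
{
    for (size_t i = 0; i < n; i++) {
        size_t off = (size_t)(pages[i] - guest_ram);
        assert(off % PAGE_SIZE == 0 && off < sizeof(guest_ram));
        pfns[i] = (uint32_t)(off / PAGE_SIZE);
    }
    return n;
}

/* Host side: read the buffer pages directly out of guest RAM, in list order. */
static void host_read_pages(const uint32_t pfns[], size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++)
        memcpy(out + i * PAGE_SIZE,
               guest_ram + (size_t)pfns[i] * PAGE_SIZE, PAGE_SIZE);
}
```

Because the host receives a page list rather than one fixed range, the guest can build its buffer from ordinary, non-contiguous pages instead of a single device region.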
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com