* Re: [PATCH] virtio_blk: Add help function to format mass of disks
From: Asias He @ 2012-04-10 13:08 UTC (permalink / raw)
To: Ren Mingxin
Cc: axboe, kvm, linux-scsi, mst, asamymuthupa, linux-kernel,
virtualization, James.Bottomley, tj
In-Reply-To: <1334042885-8330-1-git-send-email-renmx@cn.fujitsu.com>
On 04/10/2012 03:28 PM, Ren Mingxin wrote:
> The current virtio block's naming algorithm just supports 18278
> (26^3 + 26^2 + 26) disks. If there are mass of virtio blocks,
> there will be disks with the same name.
>
> Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, I add
> function "virtblk_name_format()" for virtio block to support mass
> of disks naming.
>
> Signed-off-by: Ren Mingxin<renmx@cn.fujitsu.com>
Make sense to me.
Acked-by: Asias He <asias@redhat.com>
> ---
> drivers/block/virtio_blk.c | 38 ++++++++++++++++++++++++++------------
> 1 files changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index c4a60ba..86516c8 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -374,6 +374,31 @@ static int init_vq(struct virtio_blk *vblk)
> return err;
> }
>
> +static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
> +{
> + const int base = 'z' - 'a' + 1;
> + char *begin = buf + strlen(prefix);
> + char *begin = buf + strlen(prefix);
> + char *end = buf + buflen;
> + char *p;
> + int unit;
> +
> + p = end - 1;
> + *p = '\0';
> + unit = base;
> + do {
> + if (p == begin)
> + return -EINVAL;
> + *--p = 'a' + (index % unit);
> + index = (index / unit) - 1;
> + } while (index>= 0);
> +
> + memmove(begin, p, end - p);
> + memcpy(buf, prefix, strlen(prefix));
> +
> + return 0;
> +}
> +
> static int __devinit virtblk_probe(struct virtio_device *vdev)
> {
> struct virtio_blk *vblk;
> @@ -442,18 +467,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
>
> q->queuedata = vblk;
>
> - if (index< 26) {
> - sprintf(vblk->disk->disk_name, "vd%c", 'a' + index % 26);
> - } else if (index< (26 + 1) * 26) {
> - sprintf(vblk->disk->disk_name, "vd%c%c",
> - 'a' + index / 26 - 1, 'a' + index % 26);
> - } else {
> - const unsigned int m1 = (index / 26 - 1) / 26 - 1;
> - const unsigned int m2 = (index / 26 - 1) % 26;
> - const unsigned int m3 = index % 26;
> - sprintf(vblk->disk->disk_name, "vd%c%c%c",
> - 'a' + m1, 'a' + m2, 'a' + m3);
> - }
> + virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
>
> vblk->disk->major = major;
> vblk->disk->first_minor = index_to_minor(index);
--
Asias
^ permalink raw reply
* Re: [PATCH 0/2] adding tracepoints to vhost
From: Stefan Hajnoczi @ 2012-04-10 12:54 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: kvm, netdev, linux-kernel, virtualization
In-Reply-To: <20120410124250.GB29808@redhat.com>
On Tue, Apr 10, 2012 at 1:42 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Apr 10, 2012 at 12:40:50PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Apr 10, 2012 at 3:58 AM, Jason Wang <jasowang@redhat.com> wrote:
>> > To help in vhost analyzing, the following series adding basic tracepoints to
>> > vhost. Operations of both virtqueues and vhost works were traced in current
>> > implementation, net code were untouched. A top-like satistics displaying script
>> > were introduced to help the troubleshooting.
>> >
>> > TODO:
>> > - net specific tracepoints?
>> >
>> > ---
>> >
>> > Jason Wang (2):
>> > vhost: basic tracepoints
>> > tools: virtio: add a top-like utility for displaying vhost satistics
>> >
>> >
>> > drivers/vhost/trace.h | 153 ++++++++++++++++++++
>> > drivers/vhost/vhost.c | 17 ++
>> > tools/virtio/vhost_stat | 360 +++++++++++++++++++++++++++++++++++++++++++++++
>> > 3 files changed, 528 insertions(+), 2 deletions(-)
>> > create mode 100644 drivers/vhost/trace.h
>> > create mode 100755 tools/virtio/vhost_stat
>>
>> Perhaps this can replace the vhost log feature? I'm not sure if
>> tracepoints support the right data types but it seems like vhost
>> debugging could be done using tracing with less code.
>>
>> Stefan
>
> vhost log is not a debugging tool, it logs memory accesses for
> migration.
Thanks. I totally misunderstood its purpose.
Stefan
^ permalink raw reply
* Re: [PATCH 2/2] virtio-mmio: Devices parameter parsing
From: Pawel Moll @ 2012-04-10 12:53 UTC (permalink / raw)
To: Sasha Levin
Cc: linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org
In-Reply-To: <CA+1xoqfov64Wz9e_G8ZTcXnAatw_VqYxecU5FX3Zrmw_XTL-Cw@mail.gmail.com>
> On Mon, Dec 12, 2011 at 6:57 PM, Pawel Moll <pawel.moll@arm.com> wrote:
> > This patch adds an option to instantiate guest virtio-mmio devices
> > basing on a kernel command line (or module) parameter, for example:
> >
> > virtio_mmio.devices=0x100@0x100b0000:48
> >
> > Signed-off-by: Pawel Moll <pawel.moll@arm.com>
On Mon, 2012-04-09 at 17:32 +0100, Sasha Levin wrote:
> Ping? Was this patch forgotten?
Damn, it has been forgetten indeed. It took a while to get the required
"<level>-param" patch got merged and then it slipped through cracks...
I'll re-test it against v3.4-rc and make sure it gets queued for 3.5
(3.4 is unlikely, especially now when Rusty is honeymooning ;-)
Thanks for the reminder and sorry about the delay!
Paweł
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH 0/2] adding tracepoints to vhost
From: Michael S. Tsirkin @ 2012-04-10 12:42 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <CAJSP0QWA1z6_YQe9hhpG7bCyZQarmU5NXbHsLGTL_zPTk3dOUA@mail.gmail.com>
On Tue, Apr 10, 2012 at 12:40:50PM +0100, Stefan Hajnoczi wrote:
> On Tue, Apr 10, 2012 at 3:58 AM, Jason Wang <jasowang@redhat.com> wrote:
> > To help in vhost analyzing, the following series adding basic tracepoints to
> > vhost. Operations of both virtqueues and vhost works were traced in current
> > implementation, net code were untouched. A top-like satistics displaying script
> > were introduced to help the troubleshooting.
> >
> > TODO:
> > - net specific tracepoints?
> >
> > ---
> >
> > Jason Wang (2):
> > vhost: basic tracepoints
> > tools: virtio: add a top-like utility for displaying vhost satistics
> >
> >
> > drivers/vhost/trace.h | 153 ++++++++++++++++++++
> > drivers/vhost/vhost.c | 17 ++
> > tools/virtio/vhost_stat | 360 +++++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 528 insertions(+), 2 deletions(-)
> > create mode 100644 drivers/vhost/trace.h
> > create mode 100755 tools/virtio/vhost_stat
>
> Perhaps this can replace the vhost log feature? I'm not sure if
> tracepoints support the right data types but it seems like vhost
> debugging could be done using tracing with less code.
>
> Stefan
vhost log is not a debugging tool, it logs memory accesses for
migration.
^ permalink raw reply
* Re: [PATCH 0/2] adding tracepoints to vhost
From: Stefan Hajnoczi @ 2012-04-10 11:40 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, mst, linux-kernel, kvm, virtualization
In-Reply-To: <20120410025327.49693.93562.stgit@amd-6168-8-1.englab.nay.redhat.com>
On Tue, Apr 10, 2012 at 3:58 AM, Jason Wang <jasowang@redhat.com> wrote:
> To help in vhost analyzing, the following series adding basic tracepoints to
> vhost. Operations of both virtqueues and vhost works were traced in current
> implementation, net code were untouched. A top-like satistics displaying script
> were introduced to help the troubleshooting.
>
> TODO:
> - net specific tracepoints?
>
> ---
>
> Jason Wang (2):
> vhost: basic tracepoints
> tools: virtio: add a top-like utility for displaying vhost satistics
>
>
> drivers/vhost/trace.h | 153 ++++++++++++++++++++
> drivers/vhost/vhost.c | 17 ++
> tools/virtio/vhost_stat | 360 +++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 528 insertions(+), 2 deletions(-)
> create mode 100644 drivers/vhost/trace.h
> create mode 100755 tools/virtio/vhost_stat
Perhaps this can replace the vhost log feature? I'm not sure if
tracepoints support the right data types but it seems like vhost
debugging could be done using tracing with less code.
Stefan
^ permalink raw reply
* [PATCH] virtio_blk: Add help function to format mass of disks
From: Ren Mingxin @ 2012-04-10 7:28 UTC (permalink / raw)
To: mst, rusty
Cc: axboe, kvm, linux-scsi, asamymuthupa, linux-kernel,
virtualization, James.Bottomley, Ren Mingxin, tj
The current virtio block's naming algorithm just supports 18278
(26^3 + 26^2 + 26) disks. If there are mass of virtio blocks,
there will be disks with the same name.
Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, I add
function "virtblk_name_format()" for virtio block to support mass
of disks naming.
Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com>
---
drivers/block/virtio_blk.c | 38 ++++++++++++++++++++++++++------------
1 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index c4a60ba..86516c8 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -374,6 +374,31 @@ static int init_vq(struct virtio_blk *vblk)
return err;
}
+static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
+{
+ const int base = 'z' - 'a' + 1;
+ char *begin = buf + strlen(prefix);
+ char *begin = buf + strlen(prefix);
+ char *end = buf + buflen;
+ char *p;
+ int unit;
+
+ p = end - 1;
+ *p = '\0';
+ unit = base;
+ do {
+ if (p == begin)
+ return -EINVAL;
+ *--p = 'a' + (index % unit);
+ index = (index / unit) - 1;
+ } while (index >= 0);
+
+ memmove(begin, p, end - p);
+ memcpy(buf, prefix, strlen(prefix));
+
+ return 0;
+}
+
static int __devinit virtblk_probe(struct virtio_device *vdev)
{
struct virtio_blk *vblk;
@@ -442,18 +467,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
q->queuedata = vblk;
- if (index < 26) {
- sprintf(vblk->disk->disk_name, "vd%c", 'a' + index % 26);
- } else if (index < (26 + 1) * 26) {
- sprintf(vblk->disk->disk_name, "vd%c%c",
- 'a' + index / 26 - 1, 'a' + index % 26);
- } else {
- const unsigned int m1 = (index / 26 - 1) / 26 - 1;
- const unsigned int m2 = (index / 26 - 1) % 26;
- const unsigned int m3 = index % 26;
- sprintf(vblk->disk->disk_name, "vd%c%c%c",
- 'a' + m1, 'a' + m2, 'a' + m3);
- }
+ virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
vblk->disk->major = major;
vblk->disk->first_minor = index_to_minor(index);
--
1.7.1
^ permalink raw reply related
* [PATCH 2/2] tools: virtio: add a top-like utility for displaying vhost satistics
From: Jason Wang @ 2012-04-10 2:58 UTC (permalink / raw)
To: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <20120410025327.49693.93562.stgit@amd-6168-8-1.englab.nay.redhat.com>
This patch adds simple python to display vhost satistics of vhost, the codes
were based on kvm_stat script from qemu. As work function has been recored,
filters could be used to distinguish which kinds of work are being executed or
queued:
vhost statistics
vhost_virtio_get_vq_desc 1460997 219682
vhost_work_execute_start 101248 12842
vhost_work_execute_end 101247 12842
vhost_work_queue_wakeup 101263 12841
vhost_virtio_signal 68452 8659
vhost_work_queue_wakeup(rx_net) 51797 6584
vhost_work_execute_start(rx_net) 51795 6584
vhost_work_queue_coalesce 35737 6571
vhost_work_queue_coalesce(rx_net) 35709 6566
vhost_virtio_update_avail_event 49512 6271
vhost_work_execute_start(tx_kick) 49429 6254
vhost_work_queue_wakeup(tx_kick) 49442 6252
vhost_work_queue_coalesce(tx_kick) 28 5
vhost_work_execute_start(rx_kick) 22 3
vhost_work_queue_wakeup(rx_kick) 22 3
vhost_poll_start 4 0
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
tools/virtio/vhost_stat | 360 +++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 360 insertions(+), 0 deletions(-)
create mode 100755 tools/virtio/vhost_stat
diff --git a/tools/virtio/vhost_stat b/tools/virtio/vhost_stat
new file mode 100755
index 0000000..b730f3b
--- /dev/null
+++ b/tools/virtio/vhost_stat
@@ -0,0 +1,360 @@
+#!/usr/bin/python
+#
+# top-like utility for displaying vhost statistics
+#
+# Copyright 2012 Red Hat, Inc.
+#
+# Modified from kvm_stat from qemu
+#
+# This work is licensed under the terms of the GNU GPL, version 2. See
+# the COPYING file in the top-level directory.
+
+import curses
+import sys, os, time, optparse
+
+work_types = {
+ "handle_rx_kick" : "rx_kick",
+ "handle_tx_kick" : "tx_kick",
+ "handle_rx_net" : "rx_net",
+ "handle_tx_net" : "tx_net",
+ "vhost_attach_cgroups_work": "cg_attach"
+ }
+
+addr = {}
+
+kallsyms = file("/proc/kallsyms").readlines()
+for kallsym in kallsyms:
+ entry = kallsym.split()
+ if entry[2] in work_types.keys():
+ addr["0x%s" % entry[0]] = work_types[entry[2]]
+
+filters = {
+ 'vhost_work_queue_wakeup': ('function', addr),
+ 'vhost_work_queue_coalesce' : ('function', addr),
+ 'vhost_work_execute_start' : ('function', addr),
+ 'vhost_poll_start' : ('function', addr),
+ 'vhost_poll_stop' : ('function', addr),
+}
+
+def invert(d):
+ return dict((x[1], x[0]) for x in d.iteritems())
+
+for f in filters:
+ filters[f] = (filters[f][0], invert(filters[f][1]))
+
+import ctypes, struct, array
+
+libc = ctypes.CDLL('libc.so.6')
+syscall = libc.syscall
+class perf_event_attr(ctypes.Structure):
+ _fields_ = [('type', ctypes.c_uint32),
+ ('size', ctypes.c_uint32),
+ ('config', ctypes.c_uint64),
+ ('sample_freq', ctypes.c_uint64),
+ ('sample_type', ctypes.c_uint64),
+ ('read_format', ctypes.c_uint64),
+ ('flags', ctypes.c_uint64),
+ ('wakeup_events', ctypes.c_uint32),
+ ('bp_type', ctypes.c_uint32),
+ ('bp_addr', ctypes.c_uint64),
+ ('bp_len', ctypes.c_uint64),
+ ]
+def _perf_event_open(attr, pid, cpu, group_fd, flags):
+ return syscall(298, ctypes.pointer(attr), ctypes.c_int(pid),
+ ctypes.c_int(cpu), ctypes.c_int(group_fd),
+ ctypes.c_long(flags))
+
+PERF_TYPE_HARDWARE = 0
+PERF_TYPE_SOFTWARE = 1
+PERF_TYPE_TRACEPOINT = 2
+PERF_TYPE_HW_CACHE = 3
+PERF_TYPE_RAW = 4
+PERF_TYPE_BREAKPOINT = 5
+
+PERF_SAMPLE_IP = 1 << 0
+PERF_SAMPLE_TID = 1 << 1
+PERF_SAMPLE_TIME = 1 << 2
+PERF_SAMPLE_ADDR = 1 << 3
+PERF_SAMPLE_READ = 1 << 4
+PERF_SAMPLE_CALLCHAIN = 1 << 5
+PERF_SAMPLE_ID = 1 << 6
+PERF_SAMPLE_CPU = 1 << 7
+PERF_SAMPLE_PERIOD = 1 << 8
+PERF_SAMPLE_STREAM_ID = 1 << 9
+PERF_SAMPLE_RAW = 1 << 10
+
+PERF_FORMAT_TOTAL_TIME_ENABLED = 1 << 0
+PERF_FORMAT_TOTAL_TIME_RUNNING = 1 << 1
+PERF_FORMAT_ID = 1 << 2
+PERF_FORMAT_GROUP = 1 << 3
+
+import re
+
+sys_tracing = '/sys/kernel/debug/tracing'
+
+class Group(object):
+ def __init__(self, cpu):
+ self.events = []
+ self.group_leader = None
+ self.cpu = cpu
+ def add_event(self, name, event_set, tracepoint, filter = None):
+ self.events.append(Event(group = self,
+ name = name, event_set = event_set,
+ tracepoint = tracepoint, filter = filter))
+ if len(self.events) == 1:
+ self.file = os.fdopen(self.events[0].fd)
+ def read(self):
+ bytes = 8 * (1 + len(self.events))
+ fmt = 'xxxxxxxx' + 'q' * len(self.events)
+ return dict(zip([event.name for event in self.events],
+ struct.unpack(fmt, self.file.read(bytes))))
+
+class Event(object):
+ def __init__(self, group, name, event_set, tracepoint, filter = None):
+ self.name = name
+ attr = perf_event_attr()
+ attr.type = PERF_TYPE_TRACEPOINT
+ attr.size = ctypes.sizeof(attr)
+ id_path = os.path.join(sys_tracing, 'events', event_set,
+ tracepoint, 'id')
+ id = int(file(id_path).read())
+ attr.config = id
+ attr.sample_type = (PERF_SAMPLE_RAW
+ | PERF_SAMPLE_TIME
+ | PERF_SAMPLE_CPU)
+ attr.sample_period = 1
+ attr.read_format = PERF_FORMAT_GROUP
+ group_leader = -1
+ if group.events:
+ group_leader = group.events[0].fd
+ fd = _perf_event_open(attr, -1, group.cpu, group_leader, 0)
+ if fd == -1:
+ raise Exception('perf_event_open failed')
+ if filter:
+ import fcntl
+ fcntl.ioctl(fd, 0x40082406, filter)
+ self.fd = fd
+ def enable(self):
+ import fcntl
+ fcntl.ioctl(self.fd, 0x00002400, 0)
+ def disable(self):
+ import fcntl
+ fcntl.ioctl(self.fd, 0x00002401, 0)
+
+class TracepointProvider(object):
+ def __init__(self):
+ path = os.path.join(sys_tracing, 'events', 'vhost')
+ fields = [f
+ for f in os.listdir(path)
+ if os.path.isdir(os.path.join(path, f))]
+ extra = []
+ for f in fields:
+ if f in filters:
+ subfield, values = filters[f]
+ for name, number in values.iteritems():
+ # kvm_exit(MMIO)
+ extra.append(f + '(' + name + ')')
+ fields += extra
+ self._setup(fields)
+ self.select(fields)
+ def fields(self):
+ return self._fields
+ def _setup(self, _fields):
+ self._fields = _fields
+ cpure = r'cpu([0-9]+)'
+ self.cpus = [int(re.match(cpure, x).group(1))
+ for x in os.listdir('/sys/devices/system/cpu')
+ if re.match(cpure, x)]
+ import resource
+ nfiles = len(self.cpus) * 1000
+ resource.setrlimit(resource.RLIMIT_NOFILE, (nfiles, nfiles))
+ events = []
+ self.group_leaders = []
+ for cpu in self.cpus:
+ group = Group(cpu)
+ for name in _fields:
+ tracepoint = name
+ filter = None
+ # for field like kvm_exit(MMIO)
+ m = re.match(r'(.*)\((.*)\)', name)
+ if m:
+ tracepoint, sub = m.groups()
+ filter = '%s==%s\0' % (filters[tracepoint][0],
+ filters[tracepoint][1][sub])
+ event = group.add_event(name, event_set = 'vhost',
+ tracepoint = tracepoint,
+ filter = filter)
+ self.group_leaders.append(group)
+ def select(self, fields):
+ for group in self.group_leaders:
+ for event in group.events:
+ if event.name in fields:
+ event.enable()
+ else:
+ event.disable()
+ def read(self):
+ from collections import defaultdict
+ ret = defaultdict(int)
+ for group in self.group_leaders:
+ for name, val in group.read().iteritems():
+ ret[name] += val
+ return ret
+
+class Stats:
+ def __init__(self, provider, fields = None):
+ self.provider = provider
+ self.fields_filter = fields
+ self._update()
+ def _update(self):
+ def wanted(key):
+ import re
+ if not self.fields_filter:
+ return True
+ return re.match(self.fields_filter, key) is not None
+ self.values = dict([(key, None)
+ for key in provider.fields()
+ if wanted(key)])
+ self.provider.select(self.values.keys())
+ def set_fields_filter(self, fields_filter):
+ self.fields_filter = fields_filter
+ self._update()
+ def get(self):
+ new = self.provider.read()
+ for key in self.provider.fields():
+ oldval = self.values.get(key, (0, 0))
+ newval = new[key]
+ newdelta = None
+ if oldval is not None:
+ newdelta = newval - oldval[0]
+ self.values[key] = (newval, newdelta)
+ return self.values
+
+if not os.access('/sys/kernel/debug', os.F_OK):
+ print 'Please enable CONFIG_DEBUG_FS in your kernel'
+ sys.exit(1)
+if not os.access('/sys/module/vhost_net', os.F_OK):
+ print 'Please make sure vhost module are loaded'
+ sys.exit(1)
+
+label_width = 40
+number_width = 10
+
+def tui(screen, stats):
+ curses.use_default_colors()
+ curses.noecho()
+ drilldown = False
+ fields_filter = stats.fields_filter
+ def update_drilldown():
+ if not fields_filter:
+ if drilldown:
+ stats.set_fields_filter(None)
+ else:
+ stats.set_fields_filter(r'^[^\(]*$')
+ update_drilldown()
+ def refresh(sleeptime):
+ screen.erase()
+ screen.addstr(0, 0, 'vhost statistics')
+ row = 2
+ s = stats.get()
+ def sortkey(x):
+ if s[x][1]:
+ return (-s[x][1], -s[x][0])
+ else:
+ return (0, -s[x][0])
+ for key in sorted(s.keys(), key = sortkey):
+ if row >= screen.getmaxyx()[0]:
+ break
+ values = s[key]
+ if not values[0] and not values[1]:
+ break
+ col = 1
+ screen.addstr(row, col, key)
+ col += label_width
+ screen.addstr(row, col, '%10d' % (values[0],))
+ col += number_width
+ if values[1] is not None:
+ screen.addstr(row, col, '%8d' % (values[1] / sleeptime,))
+ row += 1
+ screen.refresh()
+
+ sleeptime = 0.25
+ while True:
+ refresh(sleeptime)
+ curses.halfdelay(int(sleeptime * 10))
+ sleeptime = 3
+ try:
+ c = screen.getkey()
+ if c == 'x':
+ drilldown = not drilldown
+ update_drilldown()
+ if c == 'q':
+ break
+ except KeyboardInterrupt:
+ break
+ except curses.error:
+ continue
+
+def batch(stats):
+ s = stats.get()
+ time.sleep(1)
+ s = stats.get()
+ for key in sorted(s.keys()):
+ values = s[key]
+ print '%-22s%10d%10d' % (key, values[0], values[1])
+
+def log(stats):
+ keys = sorted(stats.get().iterkeys())
+ def banner():
+ for k in keys:
+ print '%10s' % k[0:9],
+ print
+ def statline():
+ s = stats.get()
+ for k in keys:
+ print ' %9d' % s[k][1],
+ print
+ line = 0
+ banner_repeat = 20
+ while True:
+ time.sleep(1)
+ if line % banner_repeat == 0:
+ banner()
+ statline()
+ line += 1
+
+options = optparse.OptionParser()
+options.add_option('-1', '--once', '--batch',
+ action = 'store_true',
+ default = False,
+ dest = 'once',
+ help = 'run in batch mode for one second',
+ )
+options.add_option('-l', '--log',
+ action = 'store_true',
+ default = False,
+ dest = 'log',
+ help = 'run in logging mode (like vmstat)',
+ )
+options.add_option('-f', '--fields',
+ action = 'store',
+ default = None,
+ dest = 'fields',
+ help = 'fields to display (regex)',
+ )
+(options, args) = options.parse_args(sys.argv)
+
+try:
+ provider = TracepointProvider()
+except:
+ print "Could not initialize tracepoint"
+ sys.exit(1)
+
+stats = Stats(provider, fields = options.fields)
+
+if options.log:
+ log(stats)
+elif not options.once:
+ import curses.wrapper
+ curses.wrapper(tui, stats)
+else:
+ batch(stats)
^ permalink raw reply related
* [PATCH 1/2] vhost: basic tracepoints
From: Jason Wang @ 2012-04-10 2:58 UTC (permalink / raw)
To: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <20120410025327.49693.93562.stgit@amd-6168-8-1.englab.nay.redhat.com>
To help for the performance optimizations and debugging, this patch tracepoints
for vhost. Pay attention that the tracepoints are only for vhost, net code are
not touched.
Two kinds of activities were traced: virtio and vhost work.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/trace.h | 153 +++++++++++++++++++++++++++++++++++++++++++++++++
drivers/vhost/vhost.c | 17 +++++
2 files changed, 168 insertions(+), 2 deletions(-)
create mode 100644 drivers/vhost/trace.h
diff --git a/drivers/vhost/trace.h b/drivers/vhost/trace.h
new file mode 100644
index 0000000..0423899
--- /dev/null
+++ b/drivers/vhost/trace.h
@@ -0,0 +1,153 @@
+#if !defined(_TRACE_VHOST_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_VHOST_H
+
+#include <linux/tracepoint.h>
+#include "vhost.h"
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM vhost
+
+/*
+ * Tracepoint for updating used flag.
+ */
+TRACE_EVENT(vhost_virtio_update_used_flags,
+ TP_PROTO(struct vhost_virtqueue *vq),
+ TP_ARGS(vq),
+
+ TP_STRUCT__entry(
+ __field(struct vhost_virtqueue *, vq)
+ __field(u16, used_flags)
+ ),
+
+ TP_fast_assign(
+ __entry->vq = vq;
+ __entry->used_flags = vq->used_flags;
+ ),
+
+ TP_printk("vhost update used flag %x to vq %p notify %s",
+ __entry->used_flags, __entry->vq,
+ (__entry->used_flags & VRING_USED_F_NO_NOTIFY) ?
+ "disabled" : "enabled")
+);
+
+/*
+ * Tracepoint for updating avail event.
+ */
+TRACE_EVENT(vhost_virtio_update_avail_event,
+ TP_PROTO(struct vhost_virtqueue *vq),
+ TP_ARGS(vq),
+
+ TP_STRUCT__entry(
+ __field(struct vhost_virtqueue *, vq)
+ __field(u16, avail_idx)
+ ),
+
+ TP_fast_assign(
+ __entry->vq = vq;
+ __entry->avail_idx = vq->avail_idx;
+ ),
+
+ TP_printk("vhost update avail idx %u(%u) for vq %p",
+ __entry->avail_idx, __entry->avail_idx %
+ __entry->vq->num, __entry->vq)
+);
+
+/*
+ * Tracepoint for processing descriptor.
+ */
+TRACE_EVENT(vhost_virtio_get_vq_desc,
+ TP_PROTO(struct vhost_virtqueue *vq, unsigned int index,
+ unsigned out, unsigned int in),
+ TP_ARGS(vq, index, out, in),
+
+ TP_STRUCT__entry(
+ __field(struct vhost_virtqueue *, vq)
+ __field(unsigned int, head)
+ __field(unsigned int, out)
+ __field(unsigned int, in)
+ ),
+
+ TP_fast_assign(
+ __entry->vq = vq;
+ __entry->head = index;
+ __entry->out = out;
+ __entry->in = in;
+ ),
+
+ TP_printk("vhost get vq %p head %u out %u in %u",
+ __entry->vq, __entry->head, __entry->out, __entry->in)
+
+);
+
+/*
+ * Tracepoint for signal guest.
+ */
+TRACE_EVENT(vhost_virtio_signal,
+ TP_PROTO(struct vhost_virtqueue *vq),
+ TP_ARGS(vq),
+
+ TP_STRUCT__entry(
+ __field(struct vhost_virtqueue *, vq)
+ ),
+
+ TP_fast_assign(
+ __entry->vq = vq;
+ ),
+
+ TP_printk("vhost signal vq %p", __entry->vq)
+);
+
+DECLARE_EVENT_CLASS(vhost_work_template,
+ TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
+ TP_ARGS(dev, work),
+
+ TP_STRUCT__entry(
+ __field(struct vhost_dev *, dev)
+ __field(struct vhost_work *, work)
+ __field(void *, function)
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dev;
+ __entry->work = work;
+ __entry->function = work->fn;
+ ),
+
+ TP_printk("%pf for work %p dev %p",
+ __entry->function, __entry->work, __entry->dev)
+);
+
+DEFINE_EVENT(vhost_work_template, vhost_work_queue_wakeup,
+ TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
+ TP_ARGS(dev, work));
+
+DEFINE_EVENT(vhost_work_template, vhost_work_queue_coalesce,
+ TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
+ TP_ARGS(dev, work));
+
+DEFINE_EVENT(vhost_work_template, vhost_poll_start,
+ TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
+ TP_ARGS(dev, work));
+
+DEFINE_EVENT(vhost_work_template, vhost_poll_stop,
+ TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
+ TP_ARGS(dev, work));
+
+DEFINE_EVENT(vhost_work_template, vhost_work_execute_start,
+ TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
+ TP_ARGS(dev, work));
+
+DEFINE_EVENT(vhost_work_template, vhost_work_execute_end,
+ TP_PROTO(struct vhost_dev *dev, struct vhost_work *work),
+ TP_ARGS(dev, work));
+
+#endif /* _TRACE_VHOST_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH ../../drivers/vhost
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
+
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c14c42b..23f8d85 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -31,6 +31,8 @@
#include <linux/if_arp.h>
#include "vhost.h"
+#define CREATE_TRACE_POINTS
+#include "trace.h"
enum {
VHOST_MEMORY_MAX_NREGIONS = 64,
@@ -50,6 +52,7 @@ static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
poll = container_of(pt, struct vhost_poll, table);
poll->wqh = wqh;
add_wait_queue(wqh, &poll->wait);
+ trace_vhost_poll_start(NULL, &poll->work);
}
static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
@@ -101,6 +104,7 @@ void vhost_poll_start(struct vhost_poll *poll, struct file *file)
void vhost_poll_stop(struct vhost_poll *poll)
{
remove_wait_queue(poll->wqh, &poll->wait);
+ trace_vhost_poll_stop(NULL, &poll->work);
}
static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work *work,
@@ -147,7 +151,9 @@ static inline void vhost_work_queue(struct vhost_dev *dev,
list_add_tail(&work->node, &dev->work_list);
work->queue_seq++;
wake_up_process(dev->worker);
- }
+ trace_vhost_work_queue_wakeup(dev, work);
+ } else
+ trace_vhost_work_queue_coalesce(dev, work);
spin_unlock_irqrestore(&dev->work_lock, flags);
}
@@ -221,7 +227,9 @@ static int vhost_worker(void *data)
if (work) {
__set_current_state(TASK_RUNNING);
+ trace_vhost_work_execute_start(dev, work);
work->fn(work);
+ trace_vhost_work_execute_end(dev, work);
} else
schedule();
@@ -1011,6 +1019,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
if (vq->log_ctx)
eventfd_signal(vq->log_ctx, 1);
}
+ trace_vhost_virtio_update_used_flags(vq);
return 0;
}
@@ -1030,6 +1039,7 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
if (vq->log_ctx)
eventfd_signal(vq->log_ctx, 1);
}
+ trace_vhost_virtio_update_avail_event(vq);
return 0;
}
@@ -1319,6 +1329,7 @@ int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
/* Assume notifications from guest are disabled at this point,
* if they aren't we would need to update avail_event index. */
BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
+ trace_vhost_virtio_get_vq_desc(vq, head, *out_num, *in_num);
return head;
}
@@ -1485,8 +1496,10 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
{
/* Signal the Guest tell them we used something up. */
- if (vq->call_ctx && vhost_notify(dev, vq))
+ if (vq->call_ctx && vhost_notify(dev, vq)) {
eventfd_signal(vq->call_ctx, 1);
+ trace_vhost_virtio_signal(vq);
+ }
}
/* And here's the combo meal deal. Supersize me! */
^ permalink raw reply related
* [PATCH 0/2] adding tracepoints to vhost
From: Jason Wang @ 2012-04-10 2:58 UTC (permalink / raw)
To: netdev, virtualization, linux-kernel, kvm, mst
To help in vhost analyzing, the following series adding basic tracepoints to
vhost. Operations of both virtqueues and vhost works were traced in current
implementation, net code were untouched. A top-like satistics displaying script
were introduced to help the troubleshooting.
TODO:
- net specific tracepoints?
---
Jason Wang (2):
vhost: basic tracepoints
tools: virtio: add a top-like utility for displaying vhost satistics
drivers/vhost/trace.h | 153 ++++++++++++++++++++
drivers/vhost/vhost.c | 17 ++
tools/virtio/vhost_stat | 360 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 528 insertions(+), 2 deletions(-)
create mode 100644 drivers/vhost/trace.h
create mode 100755 tools/virtio/vhost_stat
--
Jason Wang
^ permalink raw reply
* Re: [PATCH 2/2] virtio-mmio: Devices parameter parsing
From: Sasha Levin @ 2012-04-09 16:32 UTC (permalink / raw)
To: Pawel Moll; +Cc: linux-kernel, virtualization
In-Reply-To: <1323712627-17353-2-git-send-email-pawel.moll@arm.com>
Ping? Was this patch forgotten?
On Mon, Dec 12, 2011 at 6:57 PM, Pawel Moll <pawel.moll@arm.com> wrote:
> This patch adds an option to instantiate guest virtio-mmio devices
> basing on a kernel command line (or module) parameter, for example:
>
> virtio_mmio.devices=0x100@0x100b0000:48
>
> Signed-off-by: Pawel Moll <pawel.moll@arm.com>
> ---
> Documentation/kernel-parameters.txt | 17 ++++
> drivers/virtio/Kconfig | 11 +++
> drivers/virtio/virtio_mmio.c | 163 +++++++++++++++++++++++++++++++++++
> 3 files changed, 191 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index a0c5c5f..c155d13 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -110,6 +110,7 @@ parameter is applicable:
> USB USB support is enabled.
> USBHID USB Human Interface Device support is enabled.
> V4L Video For Linux support is enabled.
> + VMMIO Driver for memory mapped virtio devices is enabled.
> VGA The VGA console has been enabled.
> VT Virtual terminal support is enabled.
> WDT Watchdog support is enabled.
> @@ -2720,6 +2721,22 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> video= [FB] Frame buffer configuration
> See Documentation/fb/modedb.txt.
>
> + virtio_mmio.device=
> + [VMMIO] Memory mapped virtio (platform) device.
> +
> + <size>@<baseaddr>:<irq>[:<id>]
> + where:
> + <size> := size (can use standard suffixes
> + like K, M and G)
> + <baseaddr> := physical base address
> + <irq> := interrupt number (as passed to
> + request_irq())
> + <id> := (optional) platform device id
> + example:
> + virtio_mmio.device=1K@0x100b0000:48:7
> +
> + Can be used multiple times for multiple devices.
> +
> vga= [BOOT,X86-32] Select a particular video mode
> See Documentation/x86/boot.txt and
> Documentation/svga.txt.
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 1a61939..f38b17a 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -46,4 +46,15 @@ config VIRTIO_BALLOON
>
> If unsure, say N.
>
> +config VIRTIO_MMIO_CMDLINE_DEVICES
> + bool "Memory mapped virtio devices parameter parsing"
> + depends on VIRTIO_MMIO
> + ---help---
> + Allow virtio-mmio devices instantiation via the kernel command line
> + or module parameters. Be aware that using incorrect parameters (base
> + address in particular) can crash your system - you have been warned.
> + See Documentation/kernel-parameters.txt for details.
> +
> + If unsure, say 'N'.
> +
> endmenu
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 7317dc2..eab0007 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -6,6 +6,50 @@
> * This module allows virtio devices to be used over a virtual, memory mapped
> * platform device.
> *
> + * The guest device(s) may be instantiated in one of three equivalent ways:
> + *
> + * 1. Static platform device in board's code, eg.:
> + *
> + * static struct platform_device v2m_virtio_device = {
> + * .name = "virtio-mmio",
> + * .id = -1,
> + * .num_resources = 2,
> + * .resource = (struct resource []) {
> + * {
> + * .start = 0x1001e000,
> + * .end = 0x1001e0ff,
> + * .flags = IORESOURCE_MEM,
> + * }, {
> + * .start = 42 + 32,
> + * .end = 42 + 32,
> + * .flags = IORESOURCE_IRQ,
> + * },
> + * }
> + * };
> + *
> + * 2. Device Tree node, eg.:
> + *
> + * virtio_block@1e000 {
> + * compatible = "virtio,mmio";
> + * reg = <0x1e000 0x100>;
> + * interrupts = <42>;
> + * }
> + *
> + * 3. Kernel module (or command line) parameter. Can be used more than once -
> + * one device will be created for each one. Syntax:
> + *
> + * [virtio_mmio.]device=<size>@<baseaddr>:<irq>[:<id>]
> + * where:
> + * <size> := size (can use standard suffixes like K, M or G)
> + * <baseaddr> := physical base address
> + * <irq> := interrupt number (as passed to request_irq())
> + * <id> := (optional) platform device id
> + * eg.:
> + * virtio_mmio.device=0x100@0x100b0000:48 \
> + * virtio_mmio.device=1K@0x1001e000:74
> + *
> + *
> + *
> * Registers layout (all 32-bit wide):
> *
> * offset d. name description
> @@ -42,6 +86,8 @@
> * See the COPYING file in the top-level directory.
> */
>
> +#define pr_fmt(fmt) "virtio-mmio: " fmt
> +
> #include <linux/highmem.h>
> #include <linux/interrupt.h>
> #include <linux/io.h>
> @@ -443,6 +489,122 @@ static int __devexit virtio_mmio_remove(struct platform_device *pdev)
>
>
>
> +/* Devices list parameter */
> +
> +#if defined(CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES)
> +
> +static struct device vm_cmdline_parent = {
> + .init_name = "virtio-mmio-cmdline",
> +};
> +
> +static int vm_cmdline_parent_registered;
> +static int vm_cmdline_id;
> +
> +static int vm_cmdline_set(const char *device,
> + const struct kernel_param *kp)
> +{
> + int err;
> + struct resource resources[2] = {};
> + char *str;
> + long long int base;
> + int processed, consumed = 0;
> + struct platform_device *pdev;
> +
> + resources[0].flags = IORESOURCE_MEM;
> + resources[1].flags = IORESOURCE_IRQ;
> +
> + resources[0].end = memparse(device, &str) - 1;
> +
> + processed = sscanf(str, "@%lli:%u%n:%d%n",
> + &base, &resources[1].start, &consumed,
> + &vm_cmdline_id, &consumed);
> +
> + if (processed < 2 || processed > 3 || str[consumed])
> + return -EINVAL;
> +
> + resources[0].start = base;
> + resources[0].end += base;
> + resources[1].end = resources[1].start;
> +
> + if (!vm_cmdline_parent_registered) {
> + err = device_register(&vm_cmdline_parent);
> + if (err) {
> + pr_err("Failed to register parent device!\n");
> + return err;
> + }
> + vm_cmdline_parent_registered = 1;
> + }
> +
> + pr_info("Registering device virtio-mmio.%d at 0x%llx-0x%llx, IRQ %d.\n",
> + vm_cmdline_id++,
> + (unsigned long long)resources[0].start,
> + (unsigned long long)resources[0].end,
> + (int)resources[1].start);
> +
> + pdev = platform_device_register_resndata(&vm_cmdline_parent,
> + "virtio-mmio", vm_cmdline_id++,
> + resources, ARRAY_SIZE(resources), NULL, 0);
> + if (IS_ERR(pdev))
> + return PTR_ERR(pdev);
> +
> + return 0;
> +}
> +
> +static int vm_cmdline_get_device(struct device *dev, void *data)
> +{
> + char *buffer = data;
> + unsigned int len = strlen(buffer);
> + struct platform_device *pdev = to_platform_device(dev);
> +
> + snprintf(buffer + len, PAGE_SIZE - len, "0x%llx@0x%llx:%llu:%d\n",
> + pdev->resource[0].end - pdev->resource[0].start + 1ULL,
> + (unsigned long long)pdev->resource[0].start,
> + (unsigned long long)pdev->resource[1].start,
> + pdev->id);
> + return 0;
> +}
> +
> +static int vm_cmdline_get(char *buffer, const struct kernel_param *kp)
> +{
> + buffer[0] = '\0';
> + device_for_each_child(&vm_cmdline_parent, buffer,
> + vm_cmdline_get_device);
> + return strlen(buffer) + 1;
> +}
> +
> +static struct kernel_param_ops vm_cmdline_param_ops = {
> + .set = vm_cmdline_set,
> + .get = vm_cmdline_get,
> +};
> +
> +device_param_cb(device, &vm_cmdline_param_ops, NULL, S_IRUSR);
> +
> +static int vm_unregister_cmdline_device(struct device *dev,
> + void *data)
> +{
> + platform_device_unregister(to_platform_device(dev));
> +
> + return 0;
> +}
> +
> +static void vm_unregister_cmdline_devices(void)
> +{
> + if (vm_cmdline_parent_registered) {
> + device_for_each_child(&vm_cmdline_parent, NULL,
> + vm_unregister_cmdline_device);
> + device_unregister(&vm_cmdline_parent);
> + vm_cmdline_parent_registered = 0;
> + }
> +}
> +
> +#else
> +
> +static void vm_unregister_cmdline_devices(void)
> +{
> +}
> +
> +#endif
> +
> /* Platform driver */
>
> static struct of_device_id virtio_mmio_match[] = {
> @@ -469,6 +631,7 @@ static int __init virtio_mmio_init(void)
> static void __exit virtio_mmio_exit(void)
> {
> platform_driver_unregister(&virtio_mmio_driver);
> + vm_unregister_cmdline_devices();
> }
>
> module_init(virtio_mmio_init);
> --
> 1.7.5.4
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply
* [PATCH] skbuff: struct ubuf_info callback type safety
From: Michael S. Tsirkin @ 2012-04-09 10:24 UTC (permalink / raw)
To: linux-kernel, kvm, virtualization, netdev
Cc: Shirley Ma, Stephen Hemminger, David S. Miller, Eric Dumazet,
Ian Campbell
The skb struct ubuf_info callback gets passed struct ubuf_info
itself, not the arg value as the field name and the function signature
seem to imply. Rename the arg field to ctx to match usage,
add documentation and change the callback argument type
to make usage clear and to have compiler check correctness.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/vhost/net.c | 2 +-
drivers/vhost/vhost.c | 5 ++---
drivers/vhost/vhost.h | 2 +-
include/linux/skbuff.h | 7 ++++---
4 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f0da2c3..1f21d2a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -238,7 +238,7 @@ static void handle_tx(struct vhost_net *net)
vq->heads[vq->upend_idx].len = len;
ubuf->callback = vhost_zerocopy_callback;
- ubuf->arg = vq->ubufs;
+ ubuf->ctx = vq->ubufs;
ubuf->desc = vq->upend_idx;
msg.msg_control = ubuf;
msg.msg_controllen = sizeof(ubuf);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 947f00d..51e4c1e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1598,10 +1598,9 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
kfree(ubufs);
}
-void vhost_zerocopy_callback(void *arg)
+void vhost_zerocopy_callback(struct ubuf_info *ubuf)
{
- struct ubuf_info *ubuf = arg;
- struct vhost_ubuf_ref *ubufs = ubuf->arg;
+ struct vhost_ubuf_ref *ubufs = ubuf->ctx;
struct vhost_virtqueue *vq = ubufs->vq;
/* set len = 1 to mark this desc buffers done DMA */
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8dcf4cc..8de1fd5 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -188,7 +188,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
unsigned int log_num, u64 len);
-void vhost_zerocopy_callback(void *arg);
+void vhost_zerocopy_callback(struct ubuf_info *);
int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq);
#define vq_err(vq, fmt, ...) do { \
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3337027..4c3f138 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -238,11 +238,12 @@ enum {
/*
* The callback notifies userspace to release buffers when skb DMA is done in
* lower device, the skb last reference should be 0 when calling this.
- * The desc is used to track userspace buffer index.
+ * The ctx field is used to track device context.
+ * The desc field is used to track userspace buffer index.
*/
struct ubuf_info {
- void (*callback)(void *);
- void *arg;
+ void (*callback)(struct ubuf_info *);
+ void *ctx;
unsigned long desc;
};
--
1.7.9.111.gf3fb0
^ permalink raw reply related
* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: Michael S. Tsirkin @ 2012-04-09 7:50 UTC (permalink / raw)
To: Ren Mingxin
Cc: Jens Axboe, SCSI, KVM, LKML, VIRTUAL, James Bottomley, Tejun Heo
In-Reply-To: <4F825BE7.80204@cn.fujitsu.com>
On Mon, Apr 09, 2012 at 11:47:51AM +0800, Ren Mingxin wrote:
> On 04/04/2012 04:01 PM, Michael S. Tsirkin wrote:
> >On Mon, Apr 02, 2012 at 12:00:45PM -0700, Tejun Heo wrote:
> >>On Mon, Apr 02, 2012 at 11:56:18AM -0700, James Bottomley wrote:
> >>>So if we're agreed no other devices going forwards should ever use this
> >>>interface, is there any point unifying the interface? No matter how
> >>>many caveats you hedge it round with, putting the API in a central place
> >>>will be a bit like a honey trap for careless bears. It might be safer
> >>>just to leave it buried in the three current drivers.
> >>Yeah, that was my hope but I think it would be easier to enforce to
> >>have a common function which is clearly marked legacy so that new
> >>driver writers can go look for the naming code in the existing ones,
> >>find out they're all using the same function which is marked legacy
> >>and explains what to do for newer drivers.
> >I think I'm not the only one to be confused about the
> >preferred direction here.
> >James, do you agree to the approach above?
> >
> >It would be nice to fix virtio block for 3.4, so
> >how about this:
> >- I'll just apply the original bugfix patch for 3.4 -
> > it only affects virtio
>
> Sorry, about only affects virtio, I'm not very clear here:
> 1) Just duplicate the disk name format function in virtio_blk
> like the original patch: https://lkml.org/lkml/2012/3/28/45
So I'd like to apply this, and we can discuss the
deduplication for 3.5. Please post a version of this that
1. isn't line-wrapped and doesn't have damaged whitespace
so I can run git am on it
2. lists the # of duspported disks correctly as 26^3+(26^2+26)
in the description
Thanks!
> 2) Move the disk name format function into block core like
> this patch series but only affects virtio(not affect mtip32xx).
> Do you mean the 2) one or something else?
>
> >- Ren will repost the refactoring patch on top, and we can
> > keep up the discussion
> >
> >Ren if you agree, can you make this a two patch series please?
> >
>
> Sure.
>
> --
> Thanks,
> Ren
^ permalink raw reply
* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: Michael S. Tsirkin @ 2012-04-09 7:33 UTC (permalink / raw)
To: Ren Mingxin
Cc: Jens Axboe, SCSI, KVM, LKML, VIRTUAL, James Bottomley, Tejun Heo
In-Reply-To: <4F825BE7.80204@cn.fujitsu.com>
On Mon, Apr 09, 2012 at 11:47:51AM +0800, Ren Mingxin wrote:
> On 04/04/2012 04:01 PM, Michael S. Tsirkin wrote:
> >On Mon, Apr 02, 2012 at 12:00:45PM -0700, Tejun Heo wrote:
> >>On Mon, Apr 02, 2012 at 11:56:18AM -0700, James Bottomley wrote:
> >>>So if we're agreed no other devices going forwards should ever use this
> >>>interface, is there any point unifying the interface? No matter how
> >>>many caveats you hedge it round with, putting the API in a central place
> >>>will be a bit like a honey trap for careless bears. It might be safer
> >>>just to leave it buried in the three current drivers.
> >>Yeah, that was my hope but I think it would be easier to enforce to
> >>have a common function which is clearly marked legacy so that new
> >>driver writers can go look for the naming code in the existing ones,
> >>find out they're all using the same function which is marked legacy
> >>and explains what to do for newer drivers.
> >I think I'm not the only one to be confused about the
> >preferred direction here.
> >James, do you agree to the approach above?
> >
> >It would be nice to fix virtio block for 3.4, so
> >how about this:
> >- I'll just apply the original bugfix patch for 3.4 -
> > it only affects virtio
>
> Sorry, about only affects virtio, I'm not very clear here:
> 1) Just duplicate the disk name format function in virtio_blk
> like the original patch: https://lkml.org/lkml/2012/3/28/45
> 2) Move the disk name format function into block core like
> this patch series but only affects virtio(not affect mtip32xx).
> Do you mean the 2) one or something else?
I mean 1) - I'll apply the original patch.
> >- Ren will repost the refactoring patch on top, and we can
> > keep up the discussion
> >
> >Ren if you agree, can you make this a two patch series please?
> >
>
> Sure.
>
> --
> Thanks,
> Ren
^ permalink raw reply
* Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
From: Ren Mingxin @ 2012-04-09 3:47 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jens Axboe, SCSI, KVM, LKML, VIRTUAL, James Bottomley, Tejun Heo
In-Reply-To: <20120404080149.GC22658@redhat.com>
On 04/04/2012 04:01 PM, Michael S. Tsirkin wrote:
> On Mon, Apr 02, 2012 at 12:00:45PM -0700, Tejun Heo wrote:
>> On Mon, Apr 02, 2012 at 11:56:18AM -0700, James Bottomley wrote:
>>> So if we're agreed no other devices going forwards should ever use this
>>> interface, is there any point unifying the interface? No matter how
>>> many caveats you hedge it round with, putting the API in a central place
>>> will be a bit like a honey trap for careless bears. It might be safer
>>> just to leave it buried in the three current drivers.
>> Yeah, that was my hope but I think it would be easier to enforce to
>> have a common function which is clearly marked legacy so that new
>> driver writers can go look for the naming code in the existing ones,
>> find out they're all using the same function which is marked legacy
>> and explains what to do for newer drivers.
> I think I'm not the only one to be confused about the
> preferred direction here.
> James, do you agree to the approach above?
>
> It would be nice to fix virtio block for 3.4, so
> how about this:
> - I'll just apply the original bugfix patch for 3.4 -
> it only affects virtio
Sorry, about only affects virtio, I'm not very clear here:
1) Just duplicate the disk name format function in virtio_blk
like the original patch: https://lkml.org/lkml/2012/3/28/45
2) Move the disk name format function into block core like
this patch series but only affects virtio(not affect mtip32xx).
Do you mean the 2) one or something else?
> - Ren will repost the refactoring patch on top, and we can
> keep up the discussion
>
> Ren if you agree, can you make this a two patch series please?
>
Sure.
--
Thanks,
Ren
^ permalink raw reply
* [PATCH 05/14] kvm tools: Add virtio-mmio support
From: Asias He @ 2012-04-07 23:52 UTC (permalink / raw)
To: Pekka Enberg
Cc: Peter Maydell, kvm, Pawel Moll, virtualization, Cyrill Gorcunov,
Sasha Levin, Ingo Molnar
In-Reply-To: <1333842778-4349-1-git-send-email-asias@redhat.com>
From: Asias He <asias.hejun@gmail.com>
This patch is based on Sasha's 'kvm tools: Add support for virtio-mmio'
patch. ioeventfds support is added which was missing in the previous one.
VQ size/align is still not supported.
It adds support for the new virtio-mmio transport layer added in
3.2-rc1. The purpose of this new layer is to allow virtio to work on
systems which don't necessarily support PCI, such as embedded systems.
To apply the patch on top of the KVM tools tree, you must first pull
Linus' tree on top. Also, CONFIG_VIRTIO_MMIO=y should be set in the
guest kernel.
To easily test it it's recommended to apply Pawel Moll's patch named
'virtio-mmio: Devices parameter parsing' on top, and define the
virtio-mmio device using kernel command line.
LKVM will print a message to help user to figure out how to add kernel
command line to support virtio-mmio.
To instantiate guest virtio-mmio devices using kernel command line (or
module) parameter, e.g.
virtio_mmio.devices=0x200@0xd2000000:5,0x200@0xd2000200:6
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Asias He <asias.hejun@gmail.com>
---
tools/kvm/Makefile | 1 +
tools/kvm/include/kvm/virtio-mmio.h | 58 ++++++++
tools/kvm/virtio/mmio.c | 256 +++++++++++++++++++++++++++++++++++
3 files changed, 315 insertions(+), 0 deletions(-)
create mode 100644 tools/kvm/include/kvm/virtio-mmio.h
create mode 100644 tools/kvm/virtio/mmio.c
diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 38d4788..376990b 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -86,6 +86,7 @@ OBJS += hw/vesa.o
OBJS += hw/pci-shmem.o
OBJS += kvm-ipc.o
OBJS += builtin-sandbox.o
+OBJS += virtio/mmio.o
# Additional ARCH settings for x86
ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \
diff --git a/tools/kvm/include/kvm/virtio-mmio.h b/tools/kvm/include/kvm/virtio-mmio.h
new file mode 100644
index 0000000..e0ede3c
--- /dev/null
+++ b/tools/kvm/include/kvm/virtio-mmio.h
@@ -0,0 +1,58 @@
+#ifndef KVM__VIRTIO_MMIO_H
+#define KVM__VIRTIO_MMIO_H
+
+#include <linux/types.h>
+#include <linux/virtio_mmio.h>
+
+#define VIRTIO_MMIO_MAX_VQ 3
+#define VIRTIO_MMIO_MAX_CONFIG 1
+#define VIRTIO_MMIO_IO_SIZE 0x200
+
+struct kvm;
+
+struct virtio_mmio_ioevent_param {
+ struct virtio_device *vdev;
+ u32 vq;
+};
+
+struct virtio_mmio_hdr {
+ char magic[4];
+ u32 version;
+ u32 device_id;
+ u32 vendor_id;
+ u32 host_features;
+ u32 host_features_sel;
+ u32 reserved_1[2];
+ u32 guest_features;
+ u32 guest_features_sel;
+ u32 guest_page_size;
+ u32 reserved_2;
+ u32 queue_sel;
+ u32 queue_num_max;
+ u32 queue_num;
+ u32 queue_align;
+ u32 queue_pfn;
+ u32 reserved_3[3];
+ u32 queue_notify;
+ u32 reserved_4[3];
+ u32 interrupt_state;
+ u32 interrupt_ack;
+ u32 reserved_5[2];
+ u32 status;
+} __attribute__((packed));
+
+struct virtio_mmio {
+ u32 addr;
+ void *dev;
+ struct kvm *kvm;
+ u8 irq;
+ struct virtio_mmio_hdr hdr;
+ struct virtio_mmio_ioevent_param ioeventfds[VIRTIO_MMIO_MAX_VQ];
+};
+
+int virtio_mmio_signal_vq(struct kvm *kvm, struct virtio_device *vdev, u32 vq);
+int virtio_mmio_signal_config(struct kvm *kvm, struct virtio_device *vdev);
+int virtio_mmio_exit(struct kvm *kvm, struct virtio_device *vdev);
+int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
+ int device_id, int subsys_id, int class);
+#endif
diff --git a/tools/kvm/virtio/mmio.c b/tools/kvm/virtio/mmio.c
new file mode 100644
index 0000000..8319954
--- /dev/null
+++ b/tools/kvm/virtio/mmio.c
@@ -0,0 +1,256 @@
+#include "kvm/virtio-mmio.h"
+#include "kvm/ioeventfd.h"
+#include "kvm/ioport.h"
+#include "kvm/virtio.h"
+#include "kvm/kvm.h"
+#include "kvm/irq.h"
+
+#include <linux/virtio_mmio.h>
+#include <string.h>
+
+static u32 virtio_mmio_io_space_blocks = KVM_VIRTIO_MMIO_AREA;
+
+static u32 virtio_mmio_get_io_space_block(u32 size)
+{
+ u32 block = virtio_mmio_io_space_blocks;
+ virtio_mmio_io_space_blocks += size;
+
+ return block;
+}
+
+static void virtio_mmio_ioevent_callback(struct kvm *kvm, void *param)
+{
+ struct virtio_mmio_ioevent_param *ioeventfd = param;
+ struct virtio_mmio *vmmio = ioeventfd->vdev->virtio;
+
+ ioeventfd->vdev->ops->notify_vq(kvm, vmmio->dev, ioeventfd->vq);
+}
+
+static int virtio_mmio_init_ioeventfd(struct kvm *kvm,
+ struct virtio_device *vdev, u32 vq)
+{
+ struct virtio_mmio *vmmio = vdev->virtio;
+ struct ioevent ioevent;
+ int err;
+
+ vmmio->ioeventfds[vq] = (struct virtio_mmio_ioevent_param) {
+ .vdev = vdev,
+ .vq = vq,
+ };
+
+ ioevent = (struct ioevent) {
+ .io_addr = vmmio->addr + VIRTIO_MMIO_QUEUE_NOTIFY,
+ .io_len = sizeof(u32),
+ .fn = virtio_mmio_ioevent_callback,
+ .fn_ptr = &vmmio->ioeventfds[vq],
+ .datamatch = vq,
+ .fn_kvm = kvm,
+ .fd = eventfd(0, 0),
+ };
+
+ err = ioeventfd__add_event(&ioevent, false);
+ if (err)
+ return err;
+
+ if (vdev->ops->notify_vq_eventfd)
+ vdev->ops->notify_vq_eventfd(kvm, vmmio->dev, vq, ioevent.fd);
+
+ return 0;
+}
+
+int virtio_mmio_signal_vq(struct kvm *kvm, struct virtio_device *vdev, u32 vq)
+{
+ struct virtio_mmio *vmmio = vdev->virtio;
+
+ vmmio->hdr.interrupt_state |= VIRTIO_MMIO_INT_VRING;
+ kvm__irq_trigger(vmmio->kvm, vmmio->irq);
+
+ return 0;
+}
+
+int virtio_mmio_signal_config(struct kvm *kvm, struct virtio_device *vdev)
+{
+ struct virtio_mmio *vmmio = vdev->virtio;
+
+ vmmio->hdr.interrupt_state |= VIRTIO_MMIO_INT_CONFIG;
+ kvm__irq_trigger(vmmio->kvm, vmmio->irq);
+
+ return 0;
+}
+
+static void virtio_mmio_device_specific(u64 addr, u8 *data, u32 len,
+ u8 is_write, struct virtio_device *vdev)
+{
+ struct virtio_mmio *vmmio = vdev->virtio;
+ u32 i;
+
+ for (i = 0; i < len; i++) {
+ if (is_write)
+ vdev->ops->set_config(vmmio->kvm, vmmio->dev,
+ *(u8 *)data + i, addr + i);
+ else
+ data[i] = vdev->ops->get_config(vmmio->kvm,
+ vmmio->dev, addr + i);
+ }
+}
+
+static void virtio_mmio_config_in(u64 addr, void *data, u32 len,
+ struct virtio_device *vdev)
+{
+ struct virtio_mmio *vmmio = vdev->virtio;
+ u32 val = 0;
+
+ switch (addr) {
+ case VIRTIO_MMIO_MAGIC_VALUE:
+ case VIRTIO_MMIO_VERSION:
+ case VIRTIO_MMIO_DEVICE_ID:
+ case VIRTIO_MMIO_VENDOR_ID:
+ case VIRTIO_MMIO_STATUS:
+ case VIRTIO_MMIO_INTERRUPT_STATUS:
+ ioport__write32(data, *(u32 *)(((void *)&vmmio->hdr) + addr));
+ break;
+ case VIRTIO_MMIO_HOST_FEATURES:
+ if (vmmio->hdr.host_features_sel == 0)
+ val = vdev->ops->get_host_features(vmmio->kvm,
+ vmmio->dev);
+ ioport__write32(data, val);
+ break;
+ case VIRTIO_MMIO_QUEUE_PFN:
+ val = vdev->ops->get_pfn_vq(vmmio->kvm, vmmio->dev,
+ vmmio->hdr.queue_sel);
+ ioport__write32(data, val);
+ break;
+ case VIRTIO_MMIO_QUEUE_NUM_MAX:
+ val = vdev->ops->get_size_vq(vmmio->kvm, vmmio->dev,
+ vmmio->hdr.queue_sel);
+ ioport__write32(data, val);
+ break;
+ default:
+ break;
+ }
+}
+
+static void virtio_mmio_config_out(u64 addr, void *data, u32 len,
+ struct virtio_device *vdev)
+{
+ struct virtio_mmio *vmmio = vdev->virtio;
+ u32 val = 0;
+
+ switch (addr) {
+ case VIRTIO_MMIO_HOST_FEATURES_SEL:
+ case VIRTIO_MMIO_GUEST_FEATURES_SEL:
+ case VIRTIO_MMIO_QUEUE_SEL:
+ case VIRTIO_MMIO_STATUS:
+ val = ioport__read32(data);
+ *(u32 *)(((void *)&vmmio->hdr) + addr) = val;
+ break;
+ case VIRTIO_MMIO_GUEST_FEATURES:
+ if (vmmio->hdr.guest_features_sel == 0) {
+ val = ioport__read32(data);
+ vdev->ops->set_guest_features(vmmio->kvm,
+ vmmio->dev, val);
+ }
+ break;
+ case VIRTIO_MMIO_GUEST_PAGE_SIZE:
+ val = ioport__read32(data);
+ vmmio->hdr.guest_page_size = val;
+ /* FIXME: set guest page size */
+ break;
+ case VIRTIO_MMIO_QUEUE_NUM:
+ val = ioport__read32(data);
+ vmmio->hdr.queue_num = val;
+ /* FIXME: set vq size */
+ vdev->ops->set_size_vq(vmmio->kvm, vmmio->dev,
+ vmmio->hdr.queue_sel, val);
+ break;
+ case VIRTIO_MMIO_QUEUE_ALIGN:
+ val = ioport__read32(data);
+ vmmio->hdr.queue_align = val;
+ /* FIXME: set used ring alignment */
+ break;
+ case VIRTIO_MMIO_QUEUE_PFN:
+ val = ioport__read32(data);
+ virtio_mmio_init_ioeventfd(vmmio->kvm, vdev, vmmio->hdr.queue_sel);
+ vdev->ops->init_vq(vmmio->kvm, vmmio->dev,
+ vmmio->hdr.queue_sel, val);
+ break;
+ case VIRTIO_MMIO_QUEUE_NOTIFY:
+ val = ioport__read32(data);
+ vdev->ops->notify_vq(vmmio->kvm, vmmio->dev, val);
+ break;
+ case VIRTIO_MMIO_INTERRUPT_ACK:
+ val = ioport__read32(data);
+ vmmio->hdr.interrupt_state &= ~val;
+ break;
+ default:
+ break;
+ };
+}
+
+static void virtio_mmio_mmio_callback(u64 addr, u8 *data, u32 len,
+ u8 is_write, void *ptr)
+{
+ struct virtio_device *vdev = ptr;
+ struct virtio_mmio *vmmio = vdev->virtio;
+ u32 offset = addr - vmmio->addr;
+
+ if (offset >= VIRTIO_MMIO_CONFIG) {
+ offset -= VIRTIO_MMIO_CONFIG;
+ virtio_mmio_device_specific(offset, data, len, is_write, ptr);
+ return;
+ }
+
+ if (is_write)
+ virtio_mmio_config_out(offset, data, len, ptr);
+ else
+ virtio_mmio_config_in(offset, data, len, ptr);
+}
+
+int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
+ int device_id, int subsys_id, int class)
+{
+ struct virtio_mmio *vmmio = vdev->virtio;
+ u8 device, pin, line;
+
+ vmmio->addr = virtio_mmio_get_io_space_block(VIRTIO_MMIO_IO_SIZE);
+ vmmio->kvm = kvm;
+ vmmio->dev = dev;
+
+ kvm__register_mmio(kvm, vmmio->addr, VIRTIO_MMIO_IO_SIZE,
+ false, virtio_mmio_mmio_callback, vdev);
+
+ vmmio->hdr = (struct virtio_mmio_hdr) {
+ .magic = {'v', 'i', 'r', 't'},
+ .version = 1,
+ .device_id = device_id - 0x1000 + 1,
+ .vendor_id = 0x4d564b4c , /* 'LKVM' */
+ .queue_num_max = 256,
+ };
+
+ if (irq__register_device(subsys_id, &device, &pin, &line) < 0)
+ return -1;
+ vmmio->irq = line;
+
+ /*
+ * Instantiate guest virtio-mmio devices using kernel command line
+ * (or module) parameter, e.g
+ *
+ * virtio_mmio.devices=0x200@0xd2000000:5,0x200@0xd2000200:6
+ */
+ pr_info("virtio-mmio.devices=0x%x@0x%x:%d\n", VIRTIO_MMIO_IO_SIZE, vmmio->addr, line);
+
+ return 0;
+}
+
+int virtio_mmio_exit(struct kvm *kvm, struct virtio_device *vdev)
+{
+ struct virtio_mmio *vmmio = vdev->virtio;
+ int i;
+
+ kvm__deregister_mmio(kvm, vmmio->addr);
+
+ for (i = 0; i < VIRTIO_MMIO_MAX_VQ; i++)
+ ioeventfd__del_event(vmmio->addr + VIRTIO_MMIO_QUEUE_NOTIFY, i);
+
+ return 0;
+}
--
1.7.7.6
^ permalink raw reply related
* [PATCH RESEND 1/1] Drivers: scsi: storvsc: Properly handle errors from the host
From: K. Y. Srinivasan @ 2012-04-05 19:26 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering, jbottomley,
hch, linux-scsi, apw
If the host returns error for pass through commands, deal with them
appropriately. I would like to thank James for patiently helping
me with this patch.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/scsi/storvsc_drv.c | 20 +++++++++++++++-----
1 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 83a1972..528d52b 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -785,12 +785,22 @@ static void storvsc_command_completion(struct storvsc_cmd_request *cmd_request)
/*
* If there is an error; offline the device since all
* error recovery strategies would have already been
- * deployed on the host side.
+ * deployed on the host side. However, if the command
+ * were a pass-through command deal with it appropriately.
*/
- if (vm_srb->srb_status == SRB_STATUS_ERROR)
- scmnd->result = DID_TARGET_FAILURE << 16;
- else
- scmnd->result = vm_srb->scsi_status;
+ scmnd->result = vm_srb->scsi_status;
+
+ if (vm_srb->srb_status == SRB_STATUS_ERROR) {
+ switch (scmnd->cmnd[0]) {
+ case ATA_16:
+ case ATA_12:
+ set_host_byte(scmnd, DID_PASSTHROUGH);
+ break;
+ default:
+ set_host_byte(scmnd, DID_TARGET_FAILURE);
+ }
+ }
+
/*
* If the LUN is invalid; remove the device.
--
1.7.4.1
^ permalink raw reply related
* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Raghavendra K T @ 2012-04-05 10:40 UTC (permalink / raw)
To: Avi Kivity
Cc: KVM, Alan Meadows, Peter Zijlstra, Stefano Stabellini,
the arch/x86 maintainers, LKML, Konrad Rzeszutek Wilk, Andi Kleen,
Srivatsa Vaddagiri, Jeremy Fitzhardinge, H. Peter Anvin,
Attilio Rao, Ingo Molnar, Virtualization, Linus Torvalds,
Xen Devel, Stephan Diestelhorst
In-Reply-To: <4F7D5F5B.2020209@redhat.com>
On 04/05/2012 02:31 PM, Avi Kivity wrote:
> On 04/02/2012 12:51 PM, Raghavendra K T wrote:
>> On 04/01/2012 07:23 PM, Avi Kivity wrote:
>>> On 04/01/2012 04:48 PM, Raghavendra K T wrote:
>>>>>> I have patch something like below in mind to try:
>>>>>>
>>>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>>>> index d3b98b1..5127668 100644
>>>>>> --- a/virt/kvm/kvm_main.c
>>>>>> +++ b/virt/kvm/kvm_main.c
>>>>>> @@ -1608,15 +1608,18 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>>>>> * else and called schedule in __vcpu_run. Hopefully that
>>>>>> * VCPU is holding the lock that we need and will release it.
>>>>>> * We approximate round-robin by starting at the last boosted
>>>>>> VCPU.
>>>>>> + * Priority is given to vcpu that are unhalted.
>>>>>> */
>>>>>> - for (pass = 0; pass< 2&& !yielded; pass++) {
>>>>>> + for (pass = 0; pass< 3&& !yielded; pass++) {
>>>>>> kvm_for_each_vcpu(i, vcpu, kvm) {
>>>>>> struct task_struct *task = NULL;
>>>>>> struct pid *pid;
>>>>>> - if (!pass&& i< last_boosted_vcpu) {
>>>>>> + if (!pass&& !vcpu->pv_unhalted)
>>>>>> + continue;
>>>>>> + else if (pass == 1&& i< last_boosted_vcpu) {
>>>>>> i = last_boosted_vcpu;
>>>>>> continue;
>>>>>> - } else if (pass&& i> last_boosted_vcpu)
>>>>>> + } else if (pass == 2&& i> last_boosted_vcpu)
>>>>>> break;
>>>>>> if (vcpu == me)
>>>>>> continue;
>>>>>>
>>>>>
>>>>> Actually I think this is unneeded. The loops tries to find vcpus
>> that
>>>>> are runnable but not running (vcpu_active(vcpu->wq)), and halted
>> vcpus
>>>>> don't match this condition.
>>>>>
>>
>> Oh! I think I misinterpreted your statement. hmm I got it. you told to
>> remove if (vcpu == me) condition.
>
> No, the entire patch is unneeded. My original comment was:
>
>> from the PLE handler, don't wake up a vcpu that is sleeping because it
> is waiting for a kick
>
> But the PLE handler never wakes up sleeping vcpus anyway.
I agree with you. It is already doing that. But my approach here is
little different.
In 2 classes of vcpus we have (one is subset of another when we try to
do yield_to) viz,
1) runnable and kicked < (subset of) 2) just runnable
what we are trying to do here is targeting 1) first so that we get good
lock progress.
Here was the sequence I was talking.
vcpu1 releases lock->finds that vcpu2 is next candidate ->
kick hypercall -> kvm_pv_kick_cpu_op -> set kicked flag ->
vcpu->kick(vcpu2)
at this point we have vcpu2 waiting for getting scheduled. But
above yield call can wake *anybody*.
I agree this is not serious (rather it is overhead) when there are are
less number of vcpus, But when we have more number of vcpu/vm.. it may
not scale well. My attempt was to fix that.
Let me know if I am completely missing something..
>
> There's still a conflict with PLE in that it may trigger during the spin
> phase and send a random yield_to() somewhere. Maybe it's sufficient to
> tune the PLE timeout to be longer than the spinlock timeout.
>
Ok ... But we also should be cautious that, we may do more halt, though
we are about to get spinlock.
Need more study on this.
^ permalink raw reply
* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Avi Kivity @ 2012-04-05 9:15 UTC (permalink / raw)
To: Thomas Gleixner
Cc: the arch/x86 maintainers, KVM, Konrad Rzeszutek Wilk,
Peter Zijlstra, Stefano Stabellini, Raghavendra K T, LKML,
Andi Kleen, Srivatsa Vaddagiri, Jeremy Fitzhardinge,
H. Peter Anvin, Attilio Rao, Ingo Molnar, Virtualization,
Linus Torvalds, Xen Devel, Stephan Diestelhorst
In-Reply-To: <alpine.LFD.2.02.1204021117560.2542@ionos>
On 04/02/2012 12:26 PM, Thomas Gleixner wrote:
> > One thing about it is that it can give many false positives. Consider a
> > fine-grained spinlock that is being accessed by many threads. That is,
> > the lock is taken and released with high frequency, but there is no
> > contention, because each vcpu is accessing a different instance. So the
> > host scheduler will needlessly delay preemption of vcpus that happen to
> > be holding a lock, even though this gains nothing.
>
> You're talking about per cpu locks, right? I can see the point there,
> but per cpu stuff might be worth annotating to avoid this.
Or any lock which is simply uncontended. Say a single process is
running, the rest of the system is idle. It will take and release many
locks; but it can be preempted at any point by the hypervisor with no
performance loss.
The overhead is arming a timer twice and an extra exit per deferred
context switch. Perhaps not much given that you don't see tons of
context switches on virt workloads, at least without threaded interrupts
(or maybe interrupt threads should override this mechanism, by being
realtime threads).
> > A second issue may happen with a lock that is taken and released with
> > high frequency, with a high hold percentage. The host scheduler may
> > always sample the guest in a held state, leading it to conclude that
> > it's exceeding its timeout when in fact the lock is held for a short
> > time only.
>
> Well, no. You can avoid that.
>
> host VCPU
> spin_lock()
> modify_global_state()
> exit
> check_global_state()
> mark_global_state()
> reschedule vcpu
>
> spin_unlock()
> check_global_state()
> trigger_exit()
>
> So when an exit occures in the locked section, then the host can mark
> the global state to tell the guest to issue a trap on unlock.
Right.
How does this nest? Do we trigger the exit on the outermost unlock?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply
* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Avi Kivity @ 2012-04-05 9:01 UTC (permalink / raw)
To: Raghavendra K T
Cc: KVM, Alan Meadows, Peter Zijlstra, Stefano Stabellini,
the arch/x86 maintainers, LKML, Konrad Rzeszutek Wilk, Andi Kleen,
Srivatsa Vaddagiri, Jeremy Fitzhardinge, H. Peter Anvin,
Attilio Rao, Ingo Molnar, Virtualization, Linus Torvalds,
Xen Devel, Stephan Diestelhorst
In-Reply-To: <4F7976B6.5050000@linux.vnet.ibm.com>
On 04/02/2012 12:51 PM, Raghavendra K T wrote:
> On 04/01/2012 07:23 PM, Avi Kivity wrote:
> > On 04/01/2012 04:48 PM, Raghavendra K T wrote:
> >>>> I have patch something like below in mind to try:
> >>>>
> >>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>>> index d3b98b1..5127668 100644
> >>>> --- a/virt/kvm/kvm_main.c
> >>>> +++ b/virt/kvm/kvm_main.c
> >>>> @@ -1608,15 +1608,18 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> >>>> * else and called schedule in __vcpu_run. Hopefully that
> >>>> * VCPU is holding the lock that we need and will release it.
> >>>> * We approximate round-robin by starting at the last boosted
> >>>> VCPU.
> >>>> + * Priority is given to vcpu that are unhalted.
> >>>> */
> >>>> - for (pass = 0; pass< 2&& !yielded; pass++) {
> >>>> + for (pass = 0; pass< 3&& !yielded; pass++) {
> >>>> kvm_for_each_vcpu(i, vcpu, kvm) {
> >>>> struct task_struct *task = NULL;
> >>>> struct pid *pid;
> >>>> - if (!pass&& i< last_boosted_vcpu) {
> >>>> + if (!pass&& !vcpu->pv_unhalted)
> >>>> + continue;
> >>>> + else if (pass == 1&& i< last_boosted_vcpu) {
> >>>> i = last_boosted_vcpu;
> >>>> continue;
> >>>> - } else if (pass&& i> last_boosted_vcpu)
> >>>> + } else if (pass == 2&& i> last_boosted_vcpu)
> >>>> break;
> >>>> if (vcpu == me)
> >>>> continue;
> >>>>
> >>>
> >>> Actually I think this is unneeded. The loops tries to find vcpus
> that
> >>> are runnable but not running (vcpu_active(vcpu->wq)), and halted
> vcpus
> >>> don't match this condition.
> >>>
>
> Oh! I think I misinterpreted your statement. hmm I got it. you told to
> remove if (vcpu == me) condition.
No, the entire patch is unneeded. My original comment was:
> from the PLE handler, don't wake up a vcpu that is sleeping because it
is waiting for a kick
But the PLE handler never wakes up sleeping vcpus anyway.
There's still a conflict with PLE in that it may trigger during the spin
phase and send a random yield_to() somewhere. Maybe it's sufficient to
tune the PLE timeout to be longer than the spinlock timeout.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply
* Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Raghavendra K T @ 2012-04-05 8:43 UTC (permalink / raw)
To: Avi Kivity, H. Peter Anvin, Ingo Molnar
Cc: the arch/x86 maintainers, KVM, Alan Meadows, Peter Zijlstra,
Stefano Stabellini, LKML, Konrad Rzeszutek Wilk, Andi Kleen,
Jeremy Fitzhardinge, Srivatsa Vaddagiri, Attilio Rao,
Virtualization, Linus Torvalds, Xen Devel, Stephan Diestelhorst
In-Reply-To: <4F785DCF.7020809@redhat.com>
On 04/01/2012 07:23 PM, Avi Kivity wrote:
> On 04/01/2012 04:48 PM, Raghavendra K T wrote:
>>>> I have patch something like below in mind to try:
>>>>
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index d3b98b1..5127668 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -1608,15 +1608,18 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>>>> * else and called schedule in __vcpu_run. Hopefully that
>>>> * VCPU is holding the lock that we need and will release it.
>>>> * We approximate round-robin by starting at the last boosted
>>>> VCPU.
>>>> + * Priority is given to vcpu that are unhalted.
>>>> */
>>>> - for (pass = 0; pass< 2&& !yielded; pass++) {
>>>> + for (pass = 0; pass< 3&& !yielded; pass++) {
>>>> kvm_for_each_vcpu(i, vcpu, kvm) {
>>>> struct task_struct *task = NULL;
>>>> struct pid *pid;
>>>> - if (!pass&& i< last_boosted_vcpu) {
>>>> + if (!pass&& !vcpu->pv_unhalted)
>>>> + continue;
>>>> + else if (pass == 1&& i< last_boosted_vcpu) {
>>>> i = last_boosted_vcpu;
>>>> continue;
>>>> - } else if (pass&& i> last_boosted_vcpu)
>>>> + } else if (pass == 2&& i> last_boosted_vcpu)
>>>> break;
>>>> if (vcpu == me)
>>>> continue;
>>>>
>>>
[...]
> I'm interested in how PLE does vs. your patches, both with PLE enabled
> and disabled.
>
Here is the result taken on PLE machine. Results seem to support all
our assumptions.
Following are the observations from results:
1) There is a huge benefit for Non PLE based configuration.
(base_nople vs pv_ple) (around 90%)
2) ticketlock + kvm patches does go well along with PLE (more benefit
is seen not degradation)
(base_ple vs pv_ple)
3) The ticketlock + kvm patches make behaves almost like PLE enabled
machine (base_ple vs pv_nople)
4) ple handler modification patches seem to give advantage (pv_ple vs
pv_ple_optimized). More study needed
probably with higher M/N ratio Avi pointed.
configurations:
base_nople = 3.3-rc6 with CONFIG_PARAVIRT_SPINLOCK=n - PLE
base_ple = 3.3-rc6 with CONFIG_PARAVIRT_SPINLOCK=n + PLE
pv_ple = 3.3-rc6 with CONFIG_PARAVIRT_SPINLOCK=y + PLE +
ticketlock + kvm patches
pv_nople = 3.3-rc6 with CONFIG_PARAVIRT_SPINLOCK=y - PLE +
ticketlock + kvm patches
pv_ple_optimized = 3.3-rc6 with CONFIG_PARAVIRT_SPINLOCK=y + PLE +
optimizaton patch + ticketlock
+ kvm patches + posted with ple_handler modification (yield to kicked
vcpu).
Machine : IBM xSeries with Intel(R) Xeon(R) X7560 2.27GHz CPU with 32
core, with 8
online cores and 4*64GB RAM
3 guests running with 2GB RAM, 8vCPU.
Results:
-------
case A)
1x: 1 kernbench 2 idle
2x: 1 kernbench 1 while1 hog 1 idle
3x: 1 kernbench 2 while1 hog
Average time taken in sec for kernbench run (std). [ lower the value
better ]
base_nople base_ple pv_ple
pv_nople pv_ple_optimized
1x 72.8284 (89.8757) 70.475 (85.6979) 63.5033 (72.7041)
65.7634 (77.0504) 64.3284 (73.2688)
2x 823.053 (1113.05) 110.971 (132.829) 105.099 (128.738)
139.058 (165.156) 106.268 (129.611)
3x 3244.37 (4707.61) 150.265 (184.766) 138.341 (172.69)
139.106 (163.549) 133.238 (168.388)
Percentage improvement calculation w.r.t base_nople
-------------------------------------------------
base_ple pv_ple pv_nople pv_ple_optimized
1x 3.23143 12.8042 9.70089 11.6713
2x 86.5172 87.2306 83.1046 87.0886
3x 95.3684 95.736 95.7124 95.8933
-------------------
Percentage improvement calculation w.r.t base_ple
-------------------------------------------------
base_nople pv_ple pv_nople pv_ple_optimized
1x -3.3393 9.89244 6.68549 8.72167
2x -641.683 5.29147 -25.3102 4.23804
3x -2059.1 7.93531 7.42621 11.3313
case B)
all 3 guests running kernbench
Average time taken in sec for kernbench run (std). [ lower the value
better ].
Note that std is calculated over 6*3 run average from all 3 guests
given by kernbench
base_nople base_ple pv_ple
pv_nople pv_ple_opt
2886.92 (18.289131) 204.80333 (7.1784039) 200.22517 (10.134804)
202.091 (12.249673) 201.60683 (7.881737)
Percentage improvement calculation w.r.t base_nople
-------------------------------------------------
base_ple pv_ple pv_nople pv_ple_optimized
92.9058 93.0644 93 93.0166
Percentage improvement calculation w.r.t base_ple
-------------------------------------------------
base_nople pv_ple pv_nople pv_ple_optimized
-1309.606 2.2354 1.324 1.5607
I hope the experimental results should convey same message if somebody
does benchmarking.
Also as Ian pointed in the thread, the earlier results from Attilio
and me was to convince that framework is acceptable on native.
Does this convince to consider for it to go to next merge window?
comments /suggestions please...
^ permalink raw reply
* Re: [V6 PATCH] virtio-net: send gratuitous packets when needed
From: Michael S. Tsirkin @ 2012-04-05 6:37 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, qemu-devel, linux-kernel, Amit Shah, virtualization,
davem
In-Reply-To: <4F7D33FF.2060304@redhat.com>
On Thu, Apr 05, 2012 at 01:56:15PM +0800, Jason Wang wrote:
> On 04/04/2012 03:49 PM, Michael S. Tsirkin wrote:
> >On Wed, Mar 28, 2012 at 01:44:28PM +0800, Jason Wang wrote:
> >>As hypervior does not have the knowledge of guest network configuration, it's
> >>better to ask guest to send gratuitous packets when needed.
> >>
> >>Guest tests VIRTIO_NET_S_ANNOUNCE bit during config change interrupt and when it
> >>is set, a workqueue is scheduled to send gratuitous packet through
> >>NETDEV_NOTIFY_PEERS. This feature is negotiated through bit
> >>VIRTIO_NET_F_GUEST_ANNOUNCE.
> >>
> >>Changes from v5:
> >>- notify the chain before acking the link annoucement
> >>- ack the link announcement notification through control vq
> >>
> >>Changes from v4:
> >>- typos
> >>- handle workqueue unconditionally
> >>- move VIRTIO_NET_S_ANNOUNCE to bit 8 to separate rw bits from ro bits
> >>
> >>Changes from v3:
> >>- cancel the workqueue during freeze
> >>
> >>Changes from v2:
> >>- fix the race between unregister_dev() and workqueue
> >>
> >>Signed-off-by: Jason Wang<jasowang@redhat.com>
> >I think this needs some fixes. See below.
> >Thanks.
> >
> >>---
> >> drivers/net/virtio_net.c | 32 +++++++++++++++++++++++++++++++-
> >> include/linux/virtio_net.h | 13 +++++++++++++
> >> 2 files changed, 44 insertions(+), 1 deletions(-)
> >>
> >>diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>index 4880aa8..0f60da7 100644
> >>--- a/drivers/net/virtio_net.c
> >>+++ b/drivers/net/virtio_net.c
> >>@@ -72,6 +72,9 @@ struct virtnet_info {
> >> /* Work struct for refilling if we run low on memory. */
> >> struct delayed_work refill;
> >>
> >>+ /* Work struct for sending gratuitous packets. */
> >A bit confusing: it does not send anything itself.
> >A better comment 'for announcing the existence of device
> >on the network'.
> >
> >>+ struct work_struct announce;
> >>+
> >> /* Chain pages by the private ptr. */
> >> struct page *pages;
> >>
> >>@@ -781,12 +784,30 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> >> return status == VIRTIO_NET_OK;
> >> }
> >>
> >>+static void virtnet_ack_link_announce(struct virtnet_info *vi)
> >>+{
> >>+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_ANNOUNCE,
> >>+ VIRTIO_NET_CTRL_ANNOUNCE_ACK, NULL,
> >>+ 0, 0)) {
> >This can run in parallel with other commands. That's pretty bad -
> >will corrupt the cvq.
> >Take rtnl lock around calls to this function?
>
> It's only used by callbacks of netif_notify_peers(), so rtnl lock
> have already been hold. Add a comment to clarify this?
> >>+ dev_warn(&vi->dev->dev, "Failed to ack link nnounce.\n");
> >announce
>
> Sorry for the typos, would double check in the future.
> >>+ }
> >Better to drop {} around a single statement.
> >
> >>+}
> >>+
> >>+static void announce_work(struct work_struct *work)
> >>+{
> >>+ struct virtnet_info *vi = container_of(work, struct virtnet_info,
> >>+ announce);
> >>+ netif_notify_peers(vi->dev);
> >>+ virtnet_ack_link_announce(vi);
> >>+}
> >>+
> >> static int virtnet_close(struct net_device *dev)
> >> {
> >> struct virtnet_info *vi = netdev_priv(dev);
> >>
> >> /* Make sure refill_work doesn't re-enable napi! */
> >> cancel_delayed_work_sync(&vi->refill);
> >>+ cancel_work_sync(&vi->announce);
> >I think that a config change event can trigger after this point,
> >so cancel here won't be effective. But why do it here?
> >virtnet_remove not a better place? We can do it
> >after remove_vq_common.
>
> Sure.
Except, trying to send a command on cvq after device
has been reset would deadlock. I suspect we'll need
a flag like block has to prevent this.
> >> napi_disable(&vi->napi);
> >>
> >> return 0;
> >>@@ -962,11 +983,17 @@ static void virtnet_update_status(struct virtnet_info *vi)
> >> return;
> >>
> >> /* Ignore unknown (future) status bits */
> >>- v&= VIRTIO_NET_S_LINK_UP;
> >>+ v&= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_ANNOUNCE;
> >The announce bit in vi->status is always clear,
> >so this is IMO confusing. I would do:
> >
> > if (v& VIRTIO_NET_S_ANNOUNCE) {
> > schedule
> > }
> >
> > v&= VIRTIO_NET_S_LINK_UP;
> >
> >and the rest of the logic is not necessary then.
> >
> >Also, this might run extra ack announce commands after
> >VIRTIO_NET_S_ANNOUNCE bit is clear. The spec
> >isn't clear on whether this is legal.
> >
> >It would be very hard to fix this, so let's add a comment
> >stating that it's legal, and clarify the spec
> >in any case.
>
> How about delay the clearing of VIRTIO_NET_S_ANNOUNCE bit in the
> workqueue and protect the updating with
> spinlock_irqsave()/spinloc_irqrestore()? Then later notification
> would be suppressed if the sending is not completed.
Not sure what's meant, just make sure we don't lose
these events.
> >>
> >> if (vi->status == v)
> >> return;
> >>
> >>+ if (v& VIRTIO_NET_S_ANNOUNCE) {
> >>+ v&= ~VIRTIO_NET_S_ANNOUNCE;
> >>+ if (v& VIRTIO_NET_S_LINK_UP)
> >>+ schedule_work(&vi->announce);
> >I think we really want an nrt wq here - if this triggers
> >multiple times there's no good reason to try and run
> >the ack command many times in parallel.
>
> Yes.
> >>+ }
> >>+
> >> vi->status = v;
> >>
> >> if (vi->status& VIRTIO_NET_S_LINK_UP) {
> >
> >>@@ -1076,6 +1103,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> >> goto free;
> >>
> >> INIT_DELAYED_WORK(&vi->refill, refill_work);
> >>+ INIT_WORK(&vi->announce, announce_work);
> >> sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
> >> sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
> >>
> >>@@ -1187,6 +1215,7 @@ static int virtnet_freeze(struct virtio_device *vdev)
> >> virtqueue_disable_cb(vi->svq);
> >> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
> >> virtqueue_disable_cb(vi->cvq);
> >>+ cancel_work_sync(&vi->announce);
> >A config change event can trigger after this point,
> >so cancel here won't be effective.
> >Possibly, we need state like config_enable in the block
> >device.
>
> Probably needed.
> >
> >Also, what exactly will happen on suspend?
> >As we reset, ANNOUCE bit will be clear so -
> >do we forget to announce? Probably not good ...
>
> This problem should be the same as normal suspending/resuming, if
> user want to announce the link they should use arp_notify instead.
> >
> >>
> >> netif_device_detach(vi->dev);
> >> cancel_delayed_work_sync(&vi->refill);
> >>@@ -1233,6 +1262,7 @@ static unsigned int features[] = {
> >> VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
> >> VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
> >> VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
> >>+ VIRTIO_NET_F_GUEST_ANNOUNCE,
> >> };
> >>
> >> static struct virtio_driver virtio_net_driver = {
> >>diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> >>index 970d5a2..383e8a0 100644
> >>--- a/include/linux/virtio_net.h
> >>+++ b/include/linux/virtio_net.h
> >>@@ -49,8 +49,10 @@
> >> #define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
> >> #define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
> >> #define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
> >>+#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can send gratituous packet */
> >Rusty aligned the comments using tabs so let's do it here too?
> >A better comment 'can announce device on the network'.
> >
> >>
> >> #define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
> >>+#define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */
> >why 3 spaces before the value here?
> >
> >>
> >> struct virtio_net_config {
> >> /* The config defining mac address (if VIRTIO_NET_F_MAC) */
> >>@@ -152,4 +154,15 @@ struct virtio_net_ctrl_mac {
> >> #define VIRTIO_NET_CTRL_VLAN_ADD 0
> >> #define VIRTIO_NET_CTRL_VLAN_DEL 1
> >>
> >>+/*
> >>+ * Control link announce acknowledgement
> >>+ *
> >>+ * The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
> >>+ * driver has recevied the notification and device would clear the
> >s/recevied/received/
> >
> >A bit clearer to replace 'and' with ; here.
> >
> >>+ * VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
> >s/filed/field/
> >s/received/receives/
> >
> >>+ * this command.
> >>+ */
> >>+#define VIRTIO_NET_CTRL_ANNOUNCE 3
> >>+ #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
> >>+
> >> #endif /* _LINUX_VIRTIO_NET_H */
> >>
> >>_______________________________________________
> >>Virtualization mailing list
> >>Virtualization@lists.linux-foundation.org
> >>https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: question about napi_disable (was Re: [PATCH] virtio_net: set/cancel work on ndo_open/ndo_stop)
From: Jason Wang @ 2012-04-05 6:32 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: kvm, netdev, virtualization, Amit Shah, David Miller
In-Reply-To: <20120404093229.GA3804@redhat.com>
On 04/04/2012 05:32 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 29, 2011 at 09:12:38PM +1030, Rusty Russell wrote:
>> > Michael S. Tsirkin noticed that we could run the refill work after
>> > ndo_close, which can re-enable napi - we don't disable it until
>> > virtnet_remove. This is clearly wrong, so move the workqueue control
>> > to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).
>> >
>> > One subtle point: virtnet_probe() could simply fail if it couldn't
>> > allocate a receive buffer, but that's less polite in virtnet_open() so
>> > we schedule a refill as we do in the normal receive path if we run out
>> > of memory.
>> >
>> > Signed-off-by: Rusty Russell<rusty@rustcorp.com.au>
> Doh.
> napi_disable does not prevent the following
> napi_schedule, does it?
>
> Can someone confirm that I am not seeing things please?
Looks like napi_disable() does prevent the following scheduling, as
napi_schedule_prep() returns true only when there's an 0 -> 1 transition
of NAPI_STATE_SCHED bit.
^ permalink raw reply
* Re: [V6 PATCH] virtio-net: send gratuitous packets when needed
From: Jason Wang @ 2012-04-05 5:56 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, qemu-devel, linux-kernel, Amit Shah, virtualization,
davem
In-Reply-To: <20120404074937.GA22658@redhat.com>
On 04/04/2012 03:49 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 28, 2012 at 01:44:28PM +0800, Jason Wang wrote:
>> As hypervior does not have the knowledge of guest network configuration, it's
>> better to ask guest to send gratuitous packets when needed.
>>
>> Guest tests VIRTIO_NET_S_ANNOUNCE bit during config change interrupt and when it
>> is set, a workqueue is scheduled to send gratuitous packet through
>> NETDEV_NOTIFY_PEERS. This feature is negotiated through bit
>> VIRTIO_NET_F_GUEST_ANNOUNCE.
>>
>> Changes from v5:
>> - notify the chain before acking the link annoucement
>> - ack the link announcement notification through control vq
>>
>> Changes from v4:
>> - typos
>> - handle workqueue unconditionally
>> - move VIRTIO_NET_S_ANNOUNCE to bit 8 to separate rw bits from ro bits
>>
>> Changes from v3:
>> - cancel the workqueue during freeze
>>
>> Changes from v2:
>> - fix the race between unregister_dev() and workqueue
>>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
> I think this needs some fixes. See below.
> Thanks.
>
>> ---
>> drivers/net/virtio_net.c | 32 +++++++++++++++++++++++++++++++-
>> include/linux/virtio_net.h | 13 +++++++++++++
>> 2 files changed, 44 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 4880aa8..0f60da7 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -72,6 +72,9 @@ struct virtnet_info {
>> /* Work struct for refilling if we run low on memory. */
>> struct delayed_work refill;
>>
>> + /* Work struct for sending gratuitous packets. */
> A bit confusing: it does not send anything itself.
> A better comment 'for announcing the existence of device
> on the network'.
>
>> + struct work_struct announce;
>> +
>> /* Chain pages by the private ptr. */
>> struct page *pages;
>>
>> @@ -781,12 +784,30 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>> return status == VIRTIO_NET_OK;
>> }
>>
>> +static void virtnet_ack_link_announce(struct virtnet_info *vi)
>> +{
>> + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_ANNOUNCE,
>> + VIRTIO_NET_CTRL_ANNOUNCE_ACK, NULL,
>> + 0, 0)) {
> This can run in parallel with other commands. That's pretty bad -
> will corrupt the cvq.
> Take rtnl lock around calls to this function?
It's only used by callbacks of netif_notify_peers(), so rtnl lock have
already been hold. Add a comment to clarify this?
>> + dev_warn(&vi->dev->dev, "Failed to ack link nnounce.\n");
> announce
Sorry for the typos, would double check in the future.
>> + }
> Better to drop {} around a single statement.
>
>> +}
>> +
>> +static void announce_work(struct work_struct *work)
>> +{
>> + struct virtnet_info *vi = container_of(work, struct virtnet_info,
>> + announce);
>> + netif_notify_peers(vi->dev);
>> + virtnet_ack_link_announce(vi);
>> +}
>> +
>> static int virtnet_close(struct net_device *dev)
>> {
>> struct virtnet_info *vi = netdev_priv(dev);
>>
>> /* Make sure refill_work doesn't re-enable napi! */
>> cancel_delayed_work_sync(&vi->refill);
>> + cancel_work_sync(&vi->announce);
> I think that a config change event can trigger after this point,
> so cancel here won't be effective. But why do it here?
> virtnet_remove not a better place? We can do it
> after remove_vq_common.
Sure.
>> napi_disable(&vi->napi);
>>
>> return 0;
>> @@ -962,11 +983,17 @@ static void virtnet_update_status(struct virtnet_info *vi)
>> return;
>>
>> /* Ignore unknown (future) status bits */
>> - v&= VIRTIO_NET_S_LINK_UP;
>> + v&= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_ANNOUNCE;
> The announce bit in vi->status is always clear,
> so this is IMO confusing. I would do:
>
> if (v& VIRTIO_NET_S_ANNOUNCE) {
> schedule
> }
>
> v&= VIRTIO_NET_S_LINK_UP;
>
> and the rest of the logic is not necessary then.
>
> Also, this might run extra ack announce commands after
> VIRTIO_NET_S_ANNOUNCE bit is clear. The spec
> isn't clear on whether this is legal.
>
> It would be very hard to fix this, so let's add a comment
> stating that it's legal, and clarify the spec
> in any case.
How about delay the clearing of VIRTIO_NET_S_ANNOUNCE bit in the
workqueue and protect the updating with
spinlock_irqsave()/spinloc_irqrestore()? Then later notification would
be suppressed if the sending is not completed.
>>
>> if (vi->status == v)
>> return;
>>
>> + if (v& VIRTIO_NET_S_ANNOUNCE) {
>> + v&= ~VIRTIO_NET_S_ANNOUNCE;
>> + if (v& VIRTIO_NET_S_LINK_UP)
>> + schedule_work(&vi->announce);
> I think we really want an nrt wq here - if this triggers
> multiple times there's no good reason to try and run
> the ack command many times in parallel.
Yes.
>> + }
>> +
>> vi->status = v;
>>
>> if (vi->status& VIRTIO_NET_S_LINK_UP) {
>
>> @@ -1076,6 +1103,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>> goto free;
>>
>> INIT_DELAYED_WORK(&vi->refill, refill_work);
>> + INIT_WORK(&vi->announce, announce_work);
>> sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
>> sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
>>
>> @@ -1187,6 +1215,7 @@ static int virtnet_freeze(struct virtio_device *vdev)
>> virtqueue_disable_cb(vi->svq);
>> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
>> virtqueue_disable_cb(vi->cvq);
>> + cancel_work_sync(&vi->announce);
> A config change event can trigger after this point,
> so cancel here won't be effective.
> Possibly, we need state like config_enable in the block
> device.
Probably needed.
>
> Also, what exactly will happen on suspend?
> As we reset, ANNOUCE bit will be clear so -
> do we forget to announce? Probably not good ...
This problem should be the same as normal suspending/resuming, if user
want to announce the link they should use arp_notify instead.
>
>>
>> netif_device_detach(vi->dev);
>> cancel_delayed_work_sync(&vi->refill);
>> @@ -1233,6 +1262,7 @@ static unsigned int features[] = {
>> VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
>> VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
>> VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
>> + VIRTIO_NET_F_GUEST_ANNOUNCE,
>> };
>>
>> static struct virtio_driver virtio_net_driver = {
>> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
>> index 970d5a2..383e8a0 100644
>> --- a/include/linux/virtio_net.h
>> +++ b/include/linux/virtio_net.h
>> @@ -49,8 +49,10 @@
>> #define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
>> #define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
>> #define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
>> +#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can send gratituous packet */
> Rusty aligned the comments using tabs so let's do it here too?
> A better comment 'can announce device on the network'.
>
>>
>> #define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
>> +#define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */
> why 3 spaces before the value here?
>
>>
>> struct virtio_net_config {
>> /* The config defining mac address (if VIRTIO_NET_F_MAC) */
>> @@ -152,4 +154,15 @@ struct virtio_net_ctrl_mac {
>> #define VIRTIO_NET_CTRL_VLAN_ADD 0
>> #define VIRTIO_NET_CTRL_VLAN_DEL 1
>>
>> +/*
>> + * Control link announce acknowledgement
>> + *
>> + * The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
>> + * driver has recevied the notification and device would clear the
> s/recevied/received/
>
> A bit clearer to replace 'and' with ; here.
>
>> + * VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
> s/filed/field/
> s/received/receives/
>
>> + * this command.
>> + */
>> +#define VIRTIO_NET_CTRL_ANNOUNCE 3
>> + #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
>> +
>> #endif /* _LINUX_VIRTIO_NET_H */
>>
>> _______________________________________________
>> Virtualization mailing list
>> Virtualization@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: question about napi_disable (was Re: [PATCH] virtio_net: set/cancel work on ndo_open/ndo_stop)
From: Michael S. Tsirkin @ 2012-04-04 9:47 UTC (permalink / raw)
To: Rusty Russell; +Cc: kvm, netdev, virtualization, Amit Shah, David Miller
In-Reply-To: <20120404093229.GA3804@redhat.com>
On Wed, Apr 04, 2012 at 12:32:29PM +0300, Michael S. Tsirkin wrote:
> On Thu, Dec 29, 2011 at 09:12:38PM +1030, Rusty Russell wrote:
> > Michael S. Tsirkin noticed that we could run the refill work after
> > ndo_close, which can re-enable napi - we don't disable it until
> > virtnet_remove. This is clearly wrong, so move the workqueue control
> > to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).
> >
> > One subtle point: virtnet_probe() could simply fail if it couldn't
> > allocate a receive buffer, but that's less polite in virtnet_open() so
> > we schedule a refill as we do in the normal receive path if we run out
> > of memory.
> >
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> Doh.
> napi_disable does not prevent the following
> napi_schedule, does it?
>
> Can someone confirm that I am not seeing things please?
Yes, I *was* seeing things. After napi_disable, NAPI_STATE_SCHED
is set to napi_schedule does nothing.
Sorry about the noise.
> And this means this hack does not work:
> try_fill_recv can still run in parallel with
> napi, corrupting the vq.
>
> I suspect we need to resurrect a patch that used a
> dedicated flag to avoid this race.
>
> Comments?
>
> > ---
> > drivers/net/virtio_net.c | 17 +++++++++++++----
> > 1 file changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -439,7 +439,13 @@ static int add_recvbuf_mergeable(struct
> > return err;
> > }
> >
> > -/* Returns false if we couldn't fill entirely (OOM). */
> > +/*
> > + * Returns false if we couldn't fill entirely (OOM).
> > + *
> > + * Normally run in the receive path, but can also be run from ndo_open
> > + * before we're receiving packets, or from refill_work which is
> > + * careful to disable receiving (using napi_disable).
> > + */
> > static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
> > {
> > int err;
> > @@ -719,6 +725,10 @@ static int virtnet_open(struct net_devic
> > {
> > struct virtnet_info *vi = netdev_priv(dev);
> >
> > + /* Make sure we have some buffers: if oom use wq. */
> > + if (!try_fill_recv(vi, GFP_KERNEL))
> > + schedule_delayed_work(&vi->refill, 0);
> > +
> > virtnet_napi_enable(vi);
> > return 0;
> > }
> > @@ -772,6 +782,8 @@ static int virtnet_close(struct net_devi
> > {
> > struct virtnet_info *vi = netdev_priv(dev);
> >
> > + /* Make sure refill_work doesn't re-enable napi! */
> > + cancel_delayed_work_sync(&vi->refill);
> > napi_disable(&vi->napi);
> >
> > return 0;
> > @@ -1082,7 +1094,6 @@ static int virtnet_probe(struct virtio_d
> >
> > unregister:
> > unregister_netdev(dev);
> > - cancel_delayed_work_sync(&vi->refill);
> > free_vqs:
> > vdev->config->del_vqs(vdev);
> > free_stats:
> > @@ -1121,9 +1132,7 @@ static void __devexit virtnet_remove(str
> > /* Stop all the virtqueues. */
> > vdev->config->reset(vdev);
> >
> > -
> > unregister_netdev(vi->dev);
> > - cancel_delayed_work_sync(&vi->refill);
> >
> > /* Free unused buffers in both send and recv, if any. */
> > free_unused_bufs(vi);
^ permalink raw reply
* question about napi_disable (was Re: [PATCH] virtio_net: set/cancel work on ndo_open/ndo_stop)
From: Michael S. Tsirkin @ 2012-04-04 9:32 UTC (permalink / raw)
To: Rusty Russell; +Cc: kvm, netdev, virtualization, Amit Shah, David Miller
In-Reply-To: <87pqf7fyld.fsf@rustcorp.com.au>
On Thu, Dec 29, 2011 at 09:12:38PM +1030, Rusty Russell wrote:
> Michael S. Tsirkin noticed that we could run the refill work after
> ndo_close, which can re-enable napi - we don't disable it until
> virtnet_remove. This is clearly wrong, so move the workqueue control
> to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).
>
> One subtle point: virtnet_probe() could simply fail if it couldn't
> allocate a receive buffer, but that's less polite in virtnet_open() so
> we schedule a refill as we do in the normal receive path if we run out
> of memory.
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Doh.
napi_disable does not prevent the following
napi_schedule, does it?
Can someone confirm that I am not seeing things please?
And this means this hack does not work:
try_fill_recv can still run in parallel with
napi, corrupting the vq.
I suspect we need to resurrect a patch that used a
dedicated flag to avoid this race.
Comments?
> ---
> drivers/net/virtio_net.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -439,7 +439,13 @@ static int add_recvbuf_mergeable(struct
> return err;
> }
>
> -/* Returns false if we couldn't fill entirely (OOM). */
> +/*
> + * Returns false if we couldn't fill entirely (OOM).
> + *
> + * Normally run in the receive path, but can also be run from ndo_open
> + * before we're receiving packets, or from refill_work which is
> + * careful to disable receiving (using napi_disable).
> + */
> static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
> {
> int err;
> @@ -719,6 +725,10 @@ static int virtnet_open(struct net_devic
> {
> struct virtnet_info *vi = netdev_priv(dev);
>
> + /* Make sure we have some buffers: if oom use wq. */
> + if (!try_fill_recv(vi, GFP_KERNEL))
> + schedule_delayed_work(&vi->refill, 0);
> +
> virtnet_napi_enable(vi);
> return 0;
> }
> @@ -772,6 +782,8 @@ static int virtnet_close(struct net_devi
> {
> struct virtnet_info *vi = netdev_priv(dev);
>
> + /* Make sure refill_work doesn't re-enable napi! */
> + cancel_delayed_work_sync(&vi->refill);
> napi_disable(&vi->napi);
>
> return 0;
> @@ -1082,7 +1094,6 @@ static int virtnet_probe(struct virtio_d
>
> unregister:
> unregister_netdev(dev);
> - cancel_delayed_work_sync(&vi->refill);
> free_vqs:
> vdev->config->del_vqs(vdev);
> free_stats:
> @@ -1121,9 +1132,7 @@ static void __devexit virtnet_remove(str
> /* Stop all the virtqueues. */
> vdev->config->reset(vdev);
>
> -
> unregister_netdev(vi->dev);
> - cancel_delayed_work_sync(&vi->refill);
>
> /* Free unused buffers in both send and recv, if any. */
> free_unused_bufs(vi);
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox