Linux virtualization list

Linux virtualization list
 help / color / mirror / Atom feed

* Re: BUG: 'list_empty(&vgdev->free_vbufs)' is true!
From: Jiri Slaby @ 2016-11-15  8:55 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: David Airlie, virtualization, Linux kernel mailing list,
	dri-devel, Michael S. Tsirkin
In-Reply-To: <1479199588.32639.16.camel@redhat.com>

On 11/15/2016, 09:46 AM, Gerd Hoffmann wrote:
> On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
>> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
>>> On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
>>>> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
>>>>> Hi,
>>>>>
>>>>> I can relatively easily reproduce this bug:
>>>
>>> How?
>>
>> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
> 
> fbcon?  Or xorg/wayland with terminal app?

Ah, just console, so fbcon. No X server running.

thanks,
-- 
js
suse labs

^ permalink raw reply

* Re: BUG: 'list_empty(&vgdev->free_vbufs)' is true!
From: Gerd Hoffmann @ 2016-11-15  9:05 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: David Airlie, virtualization, Linux kernel mailing list,
	dri-devel, Michael S. Tsirkin
In-Reply-To: <18ae4a55-7c36-b55c-0ce3-03aef0bab1ec@suse.cz>

On Di, 2016-11-15 at 09:55 +0100, Jiri Slaby wrote:
> On 11/15/2016, 09:46 AM, Gerd Hoffmann wrote:
> > On Fr, 2016-11-11 at 17:28 +0100, Jiri Slaby wrote:
> >> On 11/09/2016, 09:01 AM, Gerd Hoffmann wrote:
> >>> On Di, 2016-11-08 at 22:37 +0200, Michael S. Tsirkin wrote:
> >>>> On Mon, Nov 07, 2016 at 09:43:24AM +0100, Jiri Slaby wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I can relatively easily reproduce this bug:
> >>>
> >>> How?
> >>
> >> Run dmesg -w in the qemu window (virtio_gpu) to see a lot of output.
> > 
> > fbcon?  Or xorg/wayland with terminal app?
> 
> Ah, just console, so fbcon. No X server running.

Hmm, /me looks puzzled.  fbcon doesn't do cursor updates, so the cursor
queue can hardly be full and there should be enough buffers even without
allocating 16 extra bufs.  I'll go try reproduce and analyze that one.
The +16 patch submitted nevertheless as temporary stopgap.

cheers,
  Gerd

^ permalink raw reply

* Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
From: Paolo Bonzini @ 2016-11-15  9:57 UTC (permalink / raw)
  To: Michael S. Tsirkin, Namhyung Kim
  Cc: virtio-dev, Tony Luck, Kees Cook, kvm,
	Radim Krčmář, Anton Vorontsov, LKML,
	Steven Rostedt, qemu-devel, Minchan Kim, Anthony Liguori,
	Colin Cross, virtualization, Ingo Molnar
In-Reply-To: <20161115065658-mutt-send-email-mst@kernel.org>



On 15/11/2016 06:06, Michael S. Tsirkin wrote:
> On Tue, Nov 15, 2016 at 01:50:21PM +0900, Namhyung Kim wrote:
>> Hi Michael,
>>
>> On Thu, Nov 10, 2016 at 06:39:55PM +0200, Michael S. Tsirkin wrote:
>>> On Sat, Aug 20, 2016 at 05:07:42PM +0900, Namhyung Kim wrote:
>>>> The virtio pstore driver provides interface to the pstore subsystem so
>>>> that the guest kernel's log/dump message can be saved on the host
>>>> machine.  Users can access the log file directly on the host, or on the
>>>> guest at the next boot using pstore filesystem.  It currently deals with
>>>> kernel log (printk) buffer only, but we can extend it to have other
>>>> information (like ftrace dump) later.
>>>>
>>>> It supports legacy PCI device using single order-2 page buffer.
>>>
>>> Do you mean a legacy virtio device? I don't see why
>>> you would want to support pre-1.0 mode.
>>> If you drop that, you can drop all cpu_to_virtio things
>>> and just use __le accessors.
>>
>> I was thinking about the kvmtools which lacks 1.0 support AFAIK.
> 
> Unless kvmtools wants to be left behind it has to go 1.0.

And it also has to go ACPI.  Is there any reason, apart from kvmtool, to
make a completely new virtio device, with no support in existing guests,
rather than implement ACPI ERST?

Paolo

>>  But
>> I think it'd be better to always use __le type anyway.  Will change.
>>
>>
>>>
>>>> It uses
>>>> two virtqueues - one for (sync) read and another for (async) write.
>>>> Since it cannot wait for write finished, it supports up to 128
>>>> concurrent IO.  The buffer size is configurable now.
>>>>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Anthony Liguori <aliguori@amazon.com>
>>>> Cc: Anton Vorontsov <anton@enomsg.org>
>>>> Cc: Colin Cross <ccross@android.com>
>>>> Cc: Kees Cook <keescook@chromium.org>
>>>> Cc: Tony Luck <tony.luck@intel.com>
>>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>>> Cc: Ingo Molnar <mingo@kernel.org>
>>>> Cc: Minchan Kim <minchan@kernel.org>
>>>> Cc: kvm@vger.kernel.org
>>>> Cc: qemu-devel@nongnu.org
>>>> Cc: virtualization@lists.linux-foundation.org
>>>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>>>> ---
>>>>  drivers/virtio/Kconfig             |  10 +
>>>>  drivers/virtio/Makefile            |   1 +
>>>>  drivers/virtio/virtio_pstore.c     | 417 +++++++++++++++++++++++++++++++++++++
>>>>  include/uapi/linux/Kbuild          |   1 +
>>>>  include/uapi/linux/virtio_ids.h    |   1 +
>>>>  include/uapi/linux/virtio_pstore.h |  74 +++++++
>>>>  6 files changed, 504 insertions(+)
>>>>  create mode 100644 drivers/virtio/virtio_pstore.c
>>>>  create mode 100644 include/uapi/linux/virtio_pstore.h
>>>>
>>>> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
>>>> index 77590320d44c..8f0e6c796c12 100644
>>>> --- a/drivers/virtio/Kconfig
>>>> +++ b/drivers/virtio/Kconfig
>>>> @@ -58,6 +58,16 @@ config VIRTIO_INPUT
>>>>  
>>>>  	 If unsure, say M.
>>>>  
>>>> +config VIRTIO_PSTORE
>>>> +	tristate "Virtio pstore driver"
>>>> +	depends on VIRTIO
>>>> +	depends on PSTORE
>>>> +	---help---
>>>> +	 This driver supports virtio pstore devices to save/restore
>>>> +	 panic and oops messages on the host.
>>>> +
>>>> +	 If unsure, say M.
>>>> +
>>>>   config VIRTIO_MMIO
>>>>  	tristate "Platform bus driver for memory mapped virtio devices"
>>>>  	depends on HAS_IOMEM && HAS_DMA
>>>> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
>>>> index 41e30e3dc842..bee68cb26d48 100644
>>>> --- a/drivers/virtio/Makefile
>>>> +++ b/drivers/virtio/Makefile
>>>> @@ -5,3 +5,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>>>>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>>>>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>>>>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
>>>> +obj-$(CONFIG_VIRTIO_PSTORE) += virtio_pstore.o
>>>> diff --git a/drivers/virtio/virtio_pstore.c b/drivers/virtio/virtio_pstore.c
>>>> new file mode 100644
>>>> index 000000000000..0a63c7db4278
>>>> --- /dev/null
>>>> +++ b/drivers/virtio/virtio_pstore.c
>>>> @@ -0,0 +1,417 @@
>>>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>>> +
>>>> +#include <linux/kernel.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/pstore.h>
>>>> +#include <linux/virtio.h>
>>>> +#include <linux/virtio_config.h>
>>>> +#include <uapi/linux/virtio_ids.h>
>>>> +#include <uapi/linux/virtio_pstore.h>
>>>> +
>>>> +#define VIRT_PSTORE_ORDER    2
>>>> +#define VIRT_PSTORE_BUFSIZE  (4096 << VIRT_PSTORE_ORDER)
>>>> +#define VIRT_PSTORE_NR_REQ   128
>>>> +
>>>> +struct virtio_pstore {
>>>> +	struct virtio_device	*vdev;
>>>> +	struct virtqueue	*vq[2];
>>>
>>> I'd add named fields instead of an array here, vq[0]
>>> vq[1] all over the place is hard to read.
>>
>> Will change.
>>
>>>
>>>> +	struct pstore_info	 pstore;
>>>> +	struct virtio_pstore_req req[VIRT_PSTORE_NR_REQ];
>>>> +	struct virtio_pstore_res res[VIRT_PSTORE_NR_REQ];
>>>> +	unsigned int		 req_id;
>>>> +
>>>> +	/* Waiting for host to ack */
>>>> +	wait_queue_head_t	acked;
>>>> +	int			failed;
>>>> +};
>>>> +
>>>> +#define TYPE_TABLE_ENTRY(_entry)				\
>>>> +	{ PSTORE_TYPE_##_entry, VIRTIO_PSTORE_TYPE_##_entry }
>>>> +
>>>> +struct type_table {
>>>> +	int pstore;
>>>> +	u16 virtio;
>>>> +} type_table[] = {
>>>> +	TYPE_TABLE_ENTRY(DMESG),
>>>> +};
>>>> +
>>>> +#undef TYPE_TABLE_ENTRY
>>>
>>> let's avoid macros for now pls. In fact, I would just open-code this
>>> in to_virtio_type below. We can always change our minds later if
>>> lots of types are added.
>>
>> Yep.
>>
>>>
>>>> +
>>>> +
>>>
>>> single emoty line pls
>>
>> Ok.
>>
>>>
>>>> +static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id type)
>>>> +{
>>>> +	unsigned int i;
>>>> +
>>>> +	for (i = 0; i < ARRAY_SIZE(type_table); i++) {
>>>> +		if (type == type_table[i].pstore)
>>>> +			return cpu_to_virtio16(vps->vdev, type_table[i].virtio);
>>>> +	}
>>>> +
>>>> +	return cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
>>>
>>> This assigns u16 to __virtio type, sparse will warn
>>> if you enable endian-ness checks.
>>> Pls fix that and generally, please make sure this is
>>> clean from sparse warnings.
>>
>> I'll run sparse before sending patch next time.
>>
>>>
>>>> +}
>>>> +
>>>> +static enum pstore_type_id from_virtio_type(struct virtio_pstore *vps, u16 type)
>>>> +{
>>>> +	unsigned int i;
>>>> +
>>>> +	for (i = 0; i < ARRAY_SIZE(type_table); i++) {
>>>> +		if (virtio16_to_cpu(vps->vdev, type) == type_table[i].virtio)
>>>> +			return type_table[i].pstore;
>>>> +	}
>>>> +
>>>> +	return PSTORE_TYPE_UNKNOWN;
>>>> +}
>>>> +
>>>> +static void virtpstore_ack(struct virtqueue *vq)
>>>> +{
>>>> +	struct virtio_pstore *vps = vq->vdev->priv;
>>>> +
>>>> +	wake_up(&vps->acked);
>>>> +}
>>>> +
>>>> +static void virtpstore_check(struct virtqueue *vq)
>>>> +{
>>>> +	struct virtio_pstore *vps = vq->vdev->priv;
>>>> +	struct virtio_pstore_res *res;
>>>> +	unsigned int len;
>>>> +
>>>> +	res = virtqueue_get_buf(vq, &len);
>>>> +	if (res == NULL)
>>>> +		return;
>>>> +
>>>> +	if (virtio32_to_cpu(vq->vdev, res->ret) < 0)
>>>> +		vps->failed = 1;
>>>> +}
>>>> +
>>>> +static void virt_pstore_get_reqs(struct virtio_pstore *vps,
>>>> +				 struct virtio_pstore_req **preq,
>>>> +				 struct virtio_pstore_res **pres)
>>>> +{
>>>> +	unsigned int idx = vps->req_id++ % VIRT_PSTORE_NR_REQ;
>>>> +
>>>> +	*preq = &vps->req[idx];
>>>> +	*pres = &vps->res[idx];
>>>> +
>>>> +	memset(*preq, 0, sizeof(**preq));
>>>> +	memset(*pres, 0, sizeof(**pres));
>>>> +}
>>>> +
>>>> +static int virt_pstore_open(struct pstore_info *psi)
>>>> +{
>>>> +	struct virtio_pstore *vps = psi->data;
>>>> +	struct virtio_pstore_req *req;
>>>> +	struct virtio_pstore_res *res;
>>>> +	struct scatterlist sgo[1], sgi[1];
>>>> +	struct scatterlist *sgs[2] = { sgo, sgi };
>>>> +	unsigned int len;
>>>> +
>>>> +	virt_pstore_get_reqs(vps, &req, &res);
>>>> +
>>>> +	req->cmd = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_CMD_OPEN);
>>>> +
>>>> +	sg_init_one(sgo, req, sizeof(*req));
>>>> +	sg_init_one(sgi, res, sizeof(*res));
>>>> +	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
>>>> +	virtqueue_kick(vps->vq[0]);
>>>> +
>>>> +	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
>>>
>>> Does this block userspace in an uninterruptible wait if
>>> hardware is slow? That's not nice.
>>
>> Yes, but it's not a common operation and I just wanted to make it
>> simple.
>>
>>
>>>
>>>> +	return virtio32_to_cpu(vps->vdev, res->ret);
>>>> +}
>>>> +
>>
>> [SNIP]
>>>> +struct virtio_pstore_fileinfo {
>>>> +	__virtio64		id;
>>>> +	__virtio32		count;
>>>> +	__virtio16		type;
>>>> +	__virtio16		unused;
>>>> +	__virtio32		flags;
>>>> +	__virtio32		len;
>>>> +	__virtio64		time_sec;
>>>> +	__virtio32		time_nsec;
>>>> +	__virtio32		reserved;
>>>> +};
>>>> +
>>>> +struct virtio_pstore_config {
>>>> +	__virtio32		bufsize;
>>>> +};
>>>> +
>>>
>>> What exactly does each field mean? I'm especially
>>> interested in time fields - maintaining a consistent
>>> time between host and guest is not a simple problem.
>>
>> These are required by pstore and will be used to create corresponding
>> files in the pstore filesystem.  The time fields are for mtime and
>> ctime and, I think, it's just a hint for user and doesn't require
>> strict consistency.
> 
> Pls add documentation. I would just drop hints for now.
> 
>>
>> Thanks for your review!
>> Namhyung
>>
>>>
>>>> +#endif /* _LINUX_VIRTIO_PSTORE_H */
>>>> -- 
>>>> 2.9.3
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> 
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield
From: Christian Borntraeger @ 2016-11-15 10:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-s390, kvm, Will Deacon, x86, Heiko Carstens,
	linux-kernel, Nicholas Piggin, Russell King, sparclinux,
	Noam Camus, Catalin Marinas, Martin Schwidefsky, xen-devel,
	virtualization, linuxppc-dev, Ingo Molnar
In-Reply-To: <1477386195-32736-1-git-send-email-borntraeger@de.ibm.com>

On 10/25/2016 11:03 AM, Christian Borntraeger wrote:
> Peter,
> 
> here is v2 with some improved patch descriptions and some fixes. The
> previous version has survived one day of linux-next and I only changed
> small parts.
> So unless there is some other issue, feel free to pull (or to apply
> the patches) to tip/locking.
> 
> The following changes since commit 07d9a380680d1c0eb51ef87ff2eab5c994949e69:
> 
>   Linux 4.9-rc2 (2016-10-23 17:10:14 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git  tags/cpurelax
> 
> for you to fetch changes up to dcc37f9044436438360402714b7544a8e8779b07:
> 
>   processor.h: remove cpu_relax_lowlatency (2016-10-25 09:49:57 +0200)

Ping.

Peter, you had these patches in your
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/
repository, but now the patches are gone. 
Any feedback?



> 
> ----------------------------------------------------------------
> cpu_relax: drop lowlatency, introduce yield
> 
> For spinning loops people do often use barrier() or cpu_relax().
> For most architectures cpu_relax and barrier are the same, but on
> some architectures cpu_relax can add some latency.
> For example on power,sparc64 and arc, cpu_relax can shift the CPU
> towards other hardware threads in an SMT environment.
> On s390 cpu_relax does even more, it uses an hypercall to the
> hypervisor to give up the timeslice.
> In contrast to the SMT yielding this can result in larger latencies.
> In some places this latency is unwanted, so another variant
> "cpu_relax_lowlatency" was introduced. Before this is used in more
> and more places, lets revert the logic and provide a cpu_relax_yield
> that can be called in places where yielding is more important than
> latency. By default this is the same as cpu_relax on all architectures.
> 
> So my proposal boils down to:
> - lowest latency: use barrier() or mb() if necessary
> - low latency: use cpu_relax (e.g. might give up some cpu for the other
>   _hardware_ threads)
> - really give up CPU: use  cpu_relax_yield
> 
> PS: In the long run I would also try to provide for s390 something
> like cpu_relax_yield_to with a cpu number (or just add that to
> cpu_relax_yield), since a yield_to is always better than a yield as
> long as we know the waiter.
> 
> ----------------------------------------------------------------
> Christian Borntraeger (5):
>       processor.h: introduce cpu_relax_yield
>       stop_machine: yield CPU during stop machine
>       s390: make cpu_relax a barrier again
>       processor.h: Remove cpu_relax_lowlatency users
>       processor.h: remove cpu_relax_lowlatency
> 
>  arch/alpha/include/asm/processor.h      | 2 +-
>  arch/arc/include/asm/processor.h        | 4 ++--
>  arch/arm/include/asm/processor.h        | 2 +-
>  arch/arm64/include/asm/processor.h      | 2 +-
>  arch/avr32/include/asm/processor.h      | 2 +-
>  arch/blackfin/include/asm/processor.h   | 2 +-
>  arch/c6x/include/asm/processor.h        | 2 +-
>  arch/cris/include/asm/processor.h       | 2 +-
>  arch/frv/include/asm/processor.h        | 2 +-
>  arch/h8300/include/asm/processor.h      | 2 +-
>  arch/hexagon/include/asm/processor.h    | 2 +-
>  arch/ia64/include/asm/processor.h       | 2 +-
>  arch/m32r/include/asm/processor.h       | 2 +-
>  arch/m68k/include/asm/processor.h       | 2 +-
>  arch/metag/include/asm/processor.h      | 2 +-
>  arch/microblaze/include/asm/processor.h | 2 +-
>  arch/mips/include/asm/processor.h       | 2 +-
>  arch/mn10300/include/asm/processor.h    | 2 +-
>  arch/nios2/include/asm/processor.h      | 2 +-
>  arch/openrisc/include/asm/processor.h   | 2 +-
>  arch/parisc/include/asm/processor.h     | 2 +-
>  arch/powerpc/include/asm/processor.h    | 2 +-
>  arch/s390/include/asm/processor.h       | 4 ++--
>  arch/s390/kernel/processor.c            | 4 ++--
>  arch/score/include/asm/processor.h      | 2 +-
>  arch/sh/include/asm/processor.h         | 2 +-
>  arch/sparc/include/asm/processor_32.h   | 2 +-
>  arch/sparc/include/asm/processor_64.h   | 2 +-
>  arch/tile/include/asm/processor.h       | 2 +-
>  arch/unicore32/include/asm/processor.h  | 2 +-
>  arch/x86/include/asm/processor.h        | 2 +-
>  arch/x86/um/asm/processor.h             | 2 +-
>  arch/xtensa/include/asm/processor.h     | 2 +-
>  drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
>  drivers/vhost/net.c                     | 4 ++--
>  kernel/locking/mcs_spinlock.h           | 4 ++--
>  kernel/locking/mutex.c                  | 4 ++--
>  kernel/locking/osq_lock.c               | 6 +++---
>  kernel/locking/qrwlock.c                | 6 +++---
>  kernel/locking/rwsem-xadd.c             | 4 ++--
>  kernel/stop_machine.c                   | 2 +-
>  lib/lockref.c                           | 2 +-
>  42 files changed, 53 insertions(+), 53 deletions(-)
> 

^ permalink raw reply

* Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
From: Russell King - ARM Linux @ 2016-11-15 12:30 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: linux-arch, linux-s390, kvm, Peter Zijlstra, Will Deacon, x86,
	Heiko Carstens, linux-kernel, Nicholas Piggin, virtualization,
	sparclinux, Noam Camus, Catalin Marinas, Martin Schwidefsky,
	xen-devel, linuxppc-dev, Ingo Molnar
In-Reply-To: <1477386195-32736-2-git-send-email-borntraeger@de.ibm.com>

On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
> For spinning loops people do often use barrier() or cpu_relax().
> For most architectures cpu_relax and barrier are the same, but on
> some architectures cpu_relax can add some latency.
> For example on power,sparc64 and arc, cpu_relax can shift the CPU
> towards other hardware threads in an SMT environment.
> On s390 cpu_relax does even more, it uses an hypercall to the
> hypervisor to give up the timeslice.
> In contrast to the SMT yielding this can result in larger latencies.
> In some places this latency is unwanted, so another variant
> "cpu_relax_lowlatency" was introduced. Before this is used in more
> and more places, lets revert the logic and provide a cpu_relax_yield
> that can be called in places where yielding is more important than
> latency. By default this is the same as cpu_relax on all architectures.

Rather than having to update all these architectures in this way, can't
we put in some linux/*.h header something like:

#ifndef cpu_relax_yield
#define cpu_relax_yield() cpu_relax()
#endif

so only those architectures that need to do something need to be
modified?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply

* Re: [PATCH] vhost/vsock: Remove unused but set variable
From: Stefan Hajnoczi @ 2016-11-15 13:00 UTC (permalink / raw)
  To: Tobias Klauser; +Cc: netdev, virtualization, kvm, Michael S. Tsirkin
In-Reply-To: <20161111132631.25708-1-tklauser@distanz.ch>


[-- Attachment #1.1: Type: text/plain, Size: 470 bytes --]

On Fri, Nov 11, 2016 at 02:26:31PM +0100, Tobias Klauser wrote:
> Remove the unused but set variable vq in vhost_transport_send_pkt() to
> fix the following GCC warning when building with 'W=1':
> 
>   drivers/vhost/vsock.c:198:26: warning: variable ‘vq’ set but not used
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
> ---
>  drivers/vhost/vsock.c | 3 ---
>  1 file changed, 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
From: Christian Borntraeger @ 2016-11-15 13:19 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-arch, linux-s390, kvm, Peter Zijlstra, Will Deacon, x86,
	Heiko Carstens, linux-kernel, Nicholas Piggin, virtualization,
	sparclinux, Noam Camus, Catalin Marinas, Martin Schwidefsky,
	xen-devel, linuxppc-dev, Ingo Molnar
In-Reply-To: <20161115123029.GT1041@n2100.armlinux.org.uk>

On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
> On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
>> For spinning loops people do often use barrier() or cpu_relax().
>> For most architectures cpu_relax and barrier are the same, but on
>> some architectures cpu_relax can add some latency.
>> For example on power,sparc64 and arc, cpu_relax can shift the CPU
>> towards other hardware threads in an SMT environment.
>> On s390 cpu_relax does even more, it uses an hypercall to the
>> hypervisor to give up the timeslice.
>> In contrast to the SMT yielding this can result in larger latencies.
>> In some places this latency is unwanted, so another variant
>> "cpu_relax_lowlatency" was introduced. Before this is used in more
>> and more places, lets revert the logic and provide a cpu_relax_yield
>> that can be called in places where yielding is more important than
>> latency. By default this is the same as cpu_relax on all architectures.
> 
> Rather than having to update all these architectures in this way, can't
> we put in some linux/*.h header something like:
> 
> #ifndef cpu_relax_yield
> #define cpu_relax_yield() cpu_relax()
> #endif
> 
> so only those architectures that need to do something need to be
> modified?

These patches are part of linux-next since a month or so, changing that 
would invalidate all the next testing. If people want that, I can certainly
do that, though.

^ permalink raw reply

* Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
From: Russell King - ARM Linux @ 2016-11-15 13:37 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: linux-arch, linux-s390, kvm, Peter Zijlstra, Will Deacon, x86,
	Heiko Carstens, linux-kernel, Nicholas Piggin, virtualization,
	sparclinux, Noam Camus, Catalin Marinas, Martin Schwidefsky,
	xen-devel, linuxppc-dev, Ingo Molnar
In-Reply-To: <7f7850e4-c7bb-9cc1-2d65-a1555e97988a@de.ibm.com>

On Tue, Nov 15, 2016 at 02:19:53PM +0100, Christian Borntraeger wrote:
> On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
> > On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
> >> For spinning loops people do often use barrier() or cpu_relax().
> >> For most architectures cpu_relax and barrier are the same, but on
> >> some architectures cpu_relax can add some latency.
> >> For example on power,sparc64 and arc, cpu_relax can shift the CPU
> >> towards other hardware threads in an SMT environment.
> >> On s390 cpu_relax does even more, it uses an hypercall to the
> >> hypervisor to give up the timeslice.
> >> In contrast to the SMT yielding this can result in larger latencies.
> >> In some places this latency is unwanted, so another variant
> >> "cpu_relax_lowlatency" was introduced. Before this is used in more
> >> and more places, lets revert the logic and provide a cpu_relax_yield
> >> that can be called in places where yielding is more important than
> >> latency. By default this is the same as cpu_relax on all architectures.
> > 
> > Rather than having to update all these architectures in this way, can't
> > we put in some linux/*.h header something like:
> > 
> > #ifndef cpu_relax_yield
> > #define cpu_relax_yield() cpu_relax()
> > #endif
> > 
> > so only those architectures that need to do something need to be
> > modified?
> 
> These patches are part of linux-next since a month or so, changing that 
> would invalidate all the next testing. If people want that, I can certainly
> do that, though.

It's three weeks since you posted them.  For one of those weeks (the
week you posted them) I was away, and missed them while catching up.
Sorry, but it sometimes takes a while to spot things amongst the
backlog, and normally takes some subsequent activity on the thread to
bring it back into view.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply

* Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
From: Christian Borntraeger @ 2016-11-15 13:52 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-arch, linux-s390, kvm, Peter Zijlstra, Will Deacon, x86,
	Heiko Carstens, linux-kernel, Nicholas Piggin, virtualization,
	sparclinux, Noam Camus, Catalin Marinas, Martin Schwidefsky,
	xen-devel, linuxppc-dev, Ingo Molnar
In-Reply-To: <20161115133758.GV1041@n2100.armlinux.org.uk>

On 11/15/2016 02:37 PM, Russell King - ARM Linux wrote:
> On Tue, Nov 15, 2016 at 02:19:53PM +0100, Christian Borntraeger wrote:
>> On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
>>> On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
>>>> For spinning loops people do often use barrier() or cpu_relax().
>>>> For most architectures cpu_relax and barrier are the same, but on
>>>> some architectures cpu_relax can add some latency.
>>>> For example on power,sparc64 and arc, cpu_relax can shift the CPU
>>>> towards other hardware threads in an SMT environment.
>>>> On s390 cpu_relax does even more, it uses an hypercall to the
>>>> hypervisor to give up the timeslice.
>>>> In contrast to the SMT yielding this can result in larger latencies.
>>>> In some places this latency is unwanted, so another variant
>>>> "cpu_relax_lowlatency" was introduced. Before this is used in more
>>>> and more places, lets revert the logic and provide a cpu_relax_yield
>>>> that can be called in places where yielding is more important than
>>>> latency. By default this is the same as cpu_relax on all architectures.
>>>
>>> Rather than having to update all these architectures in this way, can't
>>> we put in some linux/*.h header something like:
>>>
>>> #ifndef cpu_relax_yield
>>> #define cpu_relax_yield() cpu_relax()
>>> #endif
>>>
>>> so only those architectures that need to do something need to be
>>> modified?
>>
>> These patches are part of linux-next since a month or so, changing that 
>> would invalidate all the next testing. If people want that, I can certainly
>> do that, though.
> 
> It's three weeks since you posted them.  For one of those weeks (the
> week you posted them) I was away, and missed them while catching up.
> Sorry, but it sometimes takes a while to spot things amongst the
> backlog, and normally takes some subsequent activity on the thread to
> bring it back into view.

Absolutely no need to apologize. Thank you for doing the review and the proposal. 
I will do whatever is consensus, but since this looks like tip/locking material
I will wait for Peter or Ingo to decide about the if and how.

Christian

^ permalink raw reply

* Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
From: Michael S. Tsirkin @ 2016-11-15 14:35 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: virtio-dev, Tony Luck, Kees Cook, kvm,
	Radim Krčmář, Anton Vorontsov, LKML,
	Steven Rostedt, qemu-devel, Minchan Kim, Anthony Liguori,
	Colin Cross, Paolo Bonzini, virtualization, Ingo Molnar
In-Reply-To: <20161115055011.GB15992@danjae.aot.lge.com>

On Tue, Nov 15, 2016 at 02:50:11PM +0900, Namhyung Kim wrote:
> On Tue, Nov 15, 2016 at 07:06:28AM +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 15, 2016 at 01:50:21PM +0900, Namhyung Kim wrote:
> > > On Thu, Nov 10, 2016 at 06:39:55PM +0200, Michael S. Tsirkin wrote:
> > > [SNIP]
> > > > > +struct virtio_pstore_fileinfo {
> > > > > +	__virtio64		id;
> > > > > +	__virtio32		count;
> > > > > +	__virtio16		type;
> > > > > +	__virtio16		unused;
> > > > > +	__virtio32		flags;
> > > > > +	__virtio32		len;
> > > > > +	__virtio64		time_sec;
> > > > > +	__virtio32		time_nsec;
> > > > > +	__virtio32		reserved;
> > > > > +};
> > > > > +
> > > > > +struct virtio_pstore_config {
> > > > > +	__virtio32		bufsize;
> > > > > +};
> > > > > +
> > > > 
> > > > What exactly does each field mean? I'm especially
> > > > interested in time fields - maintaining a consistent
> > > > time between host and guest is not a simple problem.
> > > 
> > > These are required by pstore and will be used to create corresponding
> > > files in the pstore filesystem.  The time fields are for mtime and
> > > ctime and, I think, it's just a hint for user and doesn't require
> > > strict consistency.
> > 
> > Pls add documentation. I would just drop hints for now.
> 
> Well, I'll add docmentation.  But I think just dropping might not good
> since they all have host time and it's helpful to know their relative
> difference in guest.
> 
> Thanks,
> Namhyung

If it's part of host/guest ABI it needs to be better defined.
"It's just a hint does not need to be exact" is too vague,
we need to specify what kind of change will or will not
break guests.

-- 
MST

^ permalink raw reply

* Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
From: Namhyung Kim @ 2016-11-15 14:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: virtio-dev, Tony Luck, Kees Cook, kvm,
	Radim Krčmář, Michael S. Tsirkin, LKML,
	Steven Rostedt, qemu-devel, Minchan Kim, Anton Vorontsov,
	Anthony Liguori, Colin Cross, virtualization, Ingo Molnar
In-Reply-To: <f514ac65-3105-a12c-9e10-2d3b1e8a0516@redhat.com>

Hi,

On Tue, Nov 15, 2016 at 10:57:29AM +0100, Paolo Bonzini wrote:
> 
> 
> On 15/11/2016 06:06, Michael S. Tsirkin wrote:
> > On Tue, Nov 15, 2016 at 01:50:21PM +0900, Namhyung Kim wrote:
> >> Hi Michael,
> >>
> >> On Thu, Nov 10, 2016 at 06:39:55PM +0200, Michael S. Tsirkin wrote:
> >>> On Sat, Aug 20, 2016 at 05:07:42PM +0900, Namhyung Kim wrote:
> >>>> The virtio pstore driver provides interface to the pstore subsystem so
> >>>> that the guest kernel's log/dump message can be saved on the host
> >>>> machine.  Users can access the log file directly on the host, or on the
> >>>> guest at the next boot using pstore filesystem.  It currently deals with
> >>>> kernel log (printk) buffer only, but we can extend it to have other
> >>>> information (like ftrace dump) later.
> >>>>
> >>>> It supports legacy PCI device using single order-2 page buffer.
> >>>
> >>> Do you mean a legacy virtio device? I don't see why
> >>> you would want to support pre-1.0 mode.
> >>> If you drop that, you can drop all cpu_to_virtio things
> >>> and just use __le accessors.
> >>
> >> I was thinking about the kvmtools which lacks 1.0 support AFAIK.
> > 
> > Unless kvmtools wants to be left behind it has to go 1.0.
> 
> And it also has to go ACPI.  Is there any reason, apart from kvmtool, to
> make a completely new virtio device, with no support in existing guests,
> rather than implement ACPI ERST?

Well, I know nothing about ACPI.  It looks like a huge spec and I
don't want to dig into it just for this.

What I want is to speed up dumping guest kernel message (especially
for ftrace dump).

Thanks,
Namhyung

^ permalink raw reply

* Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
From: Paolo Bonzini @ 2016-11-15 14:38 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: virtio-dev, Tony Luck, Kees Cook, kvm,
	Radim Krčmář, Michael S. Tsirkin, LKML,
	Steven Rostedt, qemu-devel, Minchan Kim, Anton Vorontsov,
	Anthony Liguori, Colin Cross, virtualization, Ingo Molnar
In-Reply-To: <20161115143629.GA29740@danjae.aot.lge.com>



On 15/11/2016 15:36, Namhyung Kim wrote:
> Hi,
> 
> On Tue, Nov 15, 2016 at 10:57:29AM +0100, Paolo Bonzini wrote:
>>
>>
>> On 15/11/2016 06:06, Michael S. Tsirkin wrote:
>>> On Tue, Nov 15, 2016 at 01:50:21PM +0900, Namhyung Kim wrote:
>>>> Hi Michael,
>>>>
>>>> On Thu, Nov 10, 2016 at 06:39:55PM +0200, Michael S. Tsirkin wrote:
>>>>> On Sat, Aug 20, 2016 at 05:07:42PM +0900, Namhyung Kim wrote:
>>>>>> The virtio pstore driver provides interface to the pstore subsystem so
>>>>>> that the guest kernel's log/dump message can be saved on the host
>>>>>> machine.  Users can access the log file directly on the host, or on the
>>>>>> guest at the next boot using pstore filesystem.  It currently deals with
>>>>>> kernel log (printk) buffer only, but we can extend it to have other
>>>>>> information (like ftrace dump) later.
>>>>>>
>>>>>> It supports legacy PCI device using single order-2 page buffer.
>>>>>
>>>>> Do you mean a legacy virtio device? I don't see why
>>>>> you would want to support pre-1.0 mode.
>>>>> If you drop that, you can drop all cpu_to_virtio things
>>>>> and just use __le accessors.
>>>>
>>>> I was thinking about the kvmtools which lacks 1.0 support AFAIK.
>>>
>>> Unless kvmtools wants to be left behind it has to go 1.0.
>>
>> And it also has to go ACPI.  Is there any reason, apart from kvmtool, to
>> make a completely new virtio device, with no support in existing guests,
>> rather than implement ACPI ERST?
> 
> Well, I know nothing about ACPI.  It looks like a huge spec and I
> don't want to dig into it just for this.

ERST (error record serialization table) is a small subset of the ACPI spec.

Paolo

^ permalink raw reply

* Re: [PATCH 2/3] qemu: Implement virtio-pstore device
From: Michael S. Tsirkin @ 2016-11-15 14:38 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: virtio-dev, Tony Luck, Daniel P . Berrange, Kees Cook, kvm,
	Radim Krčmář, Anton Vorontsov, LKML,
	Steven Rostedt, qemu-devel, Minchan Kim, Anthony Liguori,
	Colin Cross, Paolo Bonzini, virtualization, Ingo Molnar
In-Reply-To: <20161115062336.GA16821@danjae.aot.lge.com>

On Tue, Nov 15, 2016 at 03:23:36PM +0900, Namhyung Kim wrote:
> On Fri, Nov 11, 2016 at 12:50:03AM +0200, Michael S. Tsirkin wrote:
> > On Fri, Sep 16, 2016 at 07:05:47PM +0900, Namhyung Kim wrote:
> > > On Tue, Sep 13, 2016 at 06:57:10PM +0300, Michael S. Tsirkin wrote:
> > > > On Sat, Aug 20, 2016 at 05:07:43PM +0900, Namhyung Kim wrote:
> > > > > +
> > > > > +/* the index should match to the type value */
> > > > > +static const char *virtio_pstore_file_prefix[] = {
> > > > > +    "unknown-",		/* VIRTIO_PSTORE_TYPE_UNKNOWN */
> > > > 
> > > > Is there value in treating everything unexpected as "unknown"
> > > > and rotating them as if they were logs?
> > > > It might be better to treat everything that's not known
> > > > as guest error.
> > > 
> > > I was thinking about the version mismatch between the kernel and qemu.
> > > I'd like to make the device can deal with a new kernel version which
> > > might implement a new pstore message type.  It will be saved as
> > > unknown but the kernel can read it properly later.
> > 
> > Well it'll have a different prefix. E.g. if kernel has
> > two different types they will end up in the same
> > file, hardly what was wanted.
> 
> Right, I think it needs to add 'type' info to the filename for unknown
> type.
> 
> Thanks,
> Namhyung

And that opens all kind of resource management issues as guest
might be able to open a ton of these unexpected types.
So don't try to predict the future, if you add a new type
you add a feature flag. Ignore or error on things you can
not handle.

-- 
MST

^ permalink raw reply

* Re: [PATCH] vhost/scsi: Remove unused but set variable
From: Stefan Hajnoczi @ 2016-11-15 14:40 UTC (permalink / raw)
  To: Tobias Klauser; +Cc: netdev, virtualization, kvm, Michael S. Tsirkin
In-Reply-To: <20161111132710.25804-1-tklauser@distanz.ch>


[-- Attachment #1.1: Type: text/plain, Size: 471 bytes --]

On Fri, Nov 11, 2016 at 02:27:10PM +0100, Tobias Klauser wrote:
> Remove the unused but set variable se_tpg in vhost_scsi_nexus_cb() to
> fix the following GCC warning when building with 'W=1':
> 
>   drivers/vhost/scsi.c:1752:26: warning: variable ‘se_tpg’ set but not used
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
> ---
>  drivers/vhost/scsi.c | 2 --
>  1 file changed, 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v7 06/11] x86, paravirt: Add interface to support kvm/xen vcpu preempted check
From: Peter Zijlstra @ 2016-11-15 15:47 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: kvm, rkrcmar, benh, will.deacon, virtualization, paulus,
	kernellwp, linux-s390, dave, xen-devel-request, x86, mingo,
	xen-devel, paulmck, konrad.wilk, boqun.feng, jgross, linux-kernel,
	David.Laight, mpe, pbonzini, linuxppc-dev
In-Reply-To: <1478077718-37424-7-git-send-email-xinhui.pan@linux.vnet.ibm.com>

On Wed, Nov 02, 2016 at 05:08:33AM -0400, Pan Xinhui wrote:
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 0f400c0..38c3bb7 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -310,6 +310,8 @@ struct pv_lock_ops {
>  
>  	void (*wait)(u8 *ptr, u8 val);
>  	void (*kick)(int cpu);
> +
> +	bool (*vcpu_is_preempted)(int cpu);
>  };

So that ends up with a full function call in the native case. I did
something like the below on top, completely untested, not been near a
compiler etc..

It doesn't get rid of the branch, but at least it avoids the function
call, and hardware should have no trouble predicting a constant
condition.

Also, it looks like you end up not setting vcpu_is_preempted when KVM
doesn't support steal clock, which would end up in an instant NULL
deref. Fixed that too.

---
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -673,6 +673,11 @@ static __always_inline void pv_kick(int
 	PVOP_VCALL1(pv_lock_ops.kick, cpu);
 }
 
+static __always_inline void pv_vcpu_is_prempted(int cpu)
+{
+	PVOP_VCALLEE1(pv_lock_ops.vcpu_is_preempted, cpu);
+}
+
 #endif /* SMP && PARAVIRT_SPINLOCKS */
 
 #ifdef CONFIG_X86_32
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -309,7 +309,7 @@ struct pv_lock_ops {
 	void (*wait)(u8 *ptr, u8 val);
 	void (*kick)(int cpu);
 
-	bool (*vcpu_is_preempted)(int cpu);
+	struct paravirt_callee_save vcpu_is_preempted;
 };
 
 /* This contains all the paravirt structures: we get a convenient
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -32,6 +32,12 @@ static inline void queued_spin_unlock(st
 {
 	pv_queued_spin_unlock(lock);
 }
+
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+	return pv_vcpu_is_preempted(cpu);
+}
 #else
 static inline void queued_spin_unlock(struct qspinlock *lock)
 {
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -26,14 +26,6 @@
 extern struct static_key paravirt_ticketlocks_enabled;
 static __always_inline bool static_key_false(struct static_key *key);
 
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-#define vcpu_is_preempted vcpu_is_preempted
-static inline bool vcpu_is_preempted(int cpu)
-{
-	return pv_lock_ops.vcpu_is_preempted(cpu);
-}
-#endif
-
 #include <asm/qspinlock.h>
 
 /*
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -415,15 +415,6 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
-static bool kvm_vcpu_is_preempted(int cpu)
-{
-	struct kvm_steal_time *src;
-
-	src = &per_cpu(steal_time, cpu);
-
-	return !!src->preempted;
-}
-
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -480,9 +471,6 @@ void __init kvm_guest_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
 		has_steal_clock = 1;
 		pv_time_ops.steal_clock = kvm_steal_clock;
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-		pv_lock_ops.vcpu_is_preempted = kvm_vcpu_is_preempted;
-#endif
 	}
 
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
@@ -604,6 +592,14 @@ static void kvm_wait(u8 *ptr, u8 val)
 	local_irq_restore(flags);
 }
 
+static bool __kvm_vcpu_is_preempted(int cpu)
+{
+	struct kvm_steal_time *src = &per_cpu(steal_time, cpu);
+
+	return !!src->preempted;
+}
+PV_CALLEE_SAVE_REGS_THUNK(__kvm_vcpu_is_preempted);
+
 /*
  * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
  */
@@ -620,6 +616,12 @@ void __init kvm_spinlock_init(void)
 	pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
 	pv_lock_ops.wait = kvm_wait;
 	pv_lock_ops.kick = kvm_kick_cpu;
+	pv_lock_ops.vcpu_is_preempted = PV_CALLEE_SAVE(__native_vcpu_is_preempted);
+
+	if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
+		pv_lock_ops.vcpu_is_preempted =
+			PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);
+	}
 }
 
 static __init int kvm_spinlock_init_jump(void)
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -12,7 +12,6 @@ __visible void __native_queued_spin_unlo
 {
 	native_queued_spin_unlock(lock);
 }
-
 PV_CALLEE_SAVE_REGS_THUNK(__native_queued_spin_unlock);
 
 bool pv_is_native_spin_unlock(void)
@@ -21,9 +20,16 @@ bool pv_is_native_spin_unlock(void)
 		__raw_callee_save___native_queued_spin_unlock;
 }
 
-static bool native_vcpu_is_preempted(int cpu)
+__visible bool __native_vcpu_is_preempted(int cpu)
 {
-	return 0;
+	return false;
+}
+PV_CALLEE_SAVE_REGS_THUNK(__native_vcpu_is_preempted);
+
+bool pv_is_native_vcpu_is_preempted(void)
+{
+	return pv_lock_ops.queued_spin_unlock.func ==
+		__raw_callee_save__native_vcpu_is_preempted;
 }
 
 struct pv_lock_ops pv_lock_ops = {
@@ -32,7 +38,7 @@ struct pv_lock_ops pv_lock_ops = {
 	.queued_spin_unlock = PV_CALLEE_SAVE(__native_queued_spin_unlock),
 	.wait = paravirt_nop,
 	.kick = paravirt_nop,
-	.vcpu_is_preempted = native_vcpu_is_preempted,
+	.vcpu_is_preempted = PV_CALLEE_SAVE(__native_vcpu_is_preempted),
 #endif /* SMP */
 };
 EXPORT_SYMBOL(pv_lock_ops);
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -11,6 +11,7 @@ DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %c
 
 #if defined(CONFIG_PARAVIRT_SPINLOCKS)
 DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%eax)");
+DEF_NATIVE(pv_lock_ops, vcpu_is_preempted, "movl $0, %eax");
 #endif
 
 unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
@@ -26,6 +27,7 @@ unsigned paravirt_patch_ident_64(void *i
 }
 
 extern bool pv_is_native_spin_unlock(void);
+extern bool pv_is_native_vcpu_is_preempted(void);
 
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
 		      unsigned long addr, unsigned len)
@@ -54,6 +56,12 @@ unsigned native_patch(u8 type, u16 clobb
 				end   = end_pv_lock_ops_queued_spin_unlock;
 				goto patch_site;
 			}
+		case PARAVIRT_PATCH(pv_lock_ops.vcpu_is_preempted):
+			if (pv_is_native_vcpu_is_preempted()) {
+				start = start_pv_lock_ops_vcpu_is_preempted;
+				end   = end_pv_lock_ops_vcpu_is_preempted;
+				goto patch_site;
+			}
 #endif
 
 	default:
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -20,6 +20,7 @@ DEF_NATIVE(, mov64, "mov %rdi, %rax");
 
 #if defined(CONFIG_PARAVIRT_SPINLOCKS)
 DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%rdi)");
+DEF_NATIVE(pv_lock_ops, vcpu_is_preempted, "movl $0, rax");
 #endif
 
 unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
@@ -35,6 +36,7 @@ unsigned paravirt_patch_ident_64(void *i
 }
 
 extern bool pv_is_native_spin_unlock(void);
+extern bool pv_is_native_vcpu_is_preempted(void);
 
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
 		      unsigned long addr, unsigned len)
@@ -66,6 +68,12 @@ unsigned native_patch(u8 type, u16 clobb
 				end   = end_pv_lock_ops_queued_spin_unlock;
 				goto patch_site;
 			}
+		case PARAVIRT_PATCH(pv_lock_ops.vcpu_is_preempted):
+			if (pv_is_native_vcpu_is_preempted()) {
+				start = start_pv_lock_ops_vcpu_is_preempted;
+				end   = end_pv_lock_ops_vcpu_is_preempted;
+				goto patch_site;
+			}
 #endif
 
 	default:
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -114,6 +114,8 @@ void xen_uninit_lock_cpu(int cpu)
 	per_cpu(irq_name, cpu) = NULL;
 }
 
+PV_CALLEE_SAVE_REGS_THUNK(xen_vcpu_stolen);
+
 /*
  * Our init of PV spinlocks is split in two init functions due to us
  * using paravirt patching and jump labels patching and having to do
@@ -136,8 +138,7 @@ void __init xen_init_spinlocks(void)
 	pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
 	pv_lock_ops.wait = xen_qlock_wait;
 	pv_lock_ops.kick = xen_qlock_kick;
-
-	pv_lock_ops.vcpu_is_preempted = xen_vcpu_stolen;
+	pv_lock_ops.vcpu_is_preempted = PV_CALLEE_SAVE(xen_vcpu_stolen);
 }
 
 /*

^ permalink raw reply

* Re: virtio_pci irq handling cleanups
From: Christoph Hellwig @ 2016-11-15 20:33 UTC (permalink / raw)
  To: mst; +Cc: linux-kernel, virtualization
In-Reply-To: <1478473783-12589-1-git-send-email-hch@lst.de>

FYI, I've got an updated version of this series that I will send
out together with the affinity patches once a prep patch makes it
to the tip tree.  Feel free to ignore this version for now.

^ permalink raw reply

* Re: [PATCH v7 06/11] x86, paravirt: Add interface to support kvm/xen vcpu preempted check
From: Pan Xinhui @ 2016-11-16  4:19 UTC (permalink / raw)
  To: Peter Zijlstra, Pan Xinhui
  Cc: kvm, rkrcmar, benh, will.deacon, virtualization, paulus,
	kernellwp, linux-s390, dave, xen-devel-request, x86, mingo,
	xen-devel, paulmck, konrad.wilk, boqun.feng, jgross, linux-kernel,
	David.Laight, mpe, pbonzini, linuxppc-dev
In-Reply-To: <20161115154706.GF11311@worktop.programming.kicks-ass.net>



在 2016/11/15 23:47, Peter Zijlstra 写道:
> On Wed, Nov 02, 2016 at 05:08:33AM -0400, Pan Xinhui wrote:
>> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
>> index 0f400c0..38c3bb7 100644
>> --- a/arch/x86/include/asm/paravirt_types.h
>> +++ b/arch/x86/include/asm/paravirt_types.h
>> @@ -310,6 +310,8 @@ struct pv_lock_ops {
>>
>>  	void (*wait)(u8 *ptr, u8 val);
>>  	void (*kick)(int cpu);
>> +
>> +	bool (*vcpu_is_preempted)(int cpu);
>>  };
>
> So that ends up with a full function call in the native case. I did
> something like the below on top, completely untested, not been near a
> compiler etc..
>
Hi, Peter.
	I think we can avoid a function call in a simpler way. How about below

static inline bool vcpu_is_preempted(int cpu)
{
	/* only set in pv case*/
	if (pv_lock_ops.vcpu_is_preempted)
		return pv_lock_ops.vcpu_is_preempted(cpu);
	return false;
}


> It doesn't get rid of the branch, but at least it avoids the function
> call, and hardware should have no trouble predicting a constant
> condition.
>
> Also, it looks like you end up not setting vcpu_is_preempted when KVM
> doesn't support steal clock, which would end up in an instant NULL
> deref. Fixed that too.
>
maybe not true. There is .vcpu_is_preempted = native_vcpu_is_preempted when we define pv_lock_ops.

your patch is a good example for any people who want to add any native/pv function. :)

thanks
xinhui

> ---
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -673,6 +673,11 @@ static __always_inline void pv_kick(int
>  	PVOP_VCALL1(pv_lock_ops.kick, cpu);
>  }
>
> +static __always_inline void pv_vcpu_is_prempted(int cpu)
> +{
> +	PVOP_VCALLEE1(pv_lock_ops.vcpu_is_preempted, cpu);
> +}
> +
>  #endif /* SMP && PARAVIRT_SPINLOCKS */
>
>  #ifdef CONFIG_X86_32
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -309,7 +309,7 @@ struct pv_lock_ops {
>  	void (*wait)(u8 *ptr, u8 val);
>  	void (*kick)(int cpu);
>
> -	bool (*vcpu_is_preempted)(int cpu);
> +	struct paravirt_callee_save vcpu_is_preempted;
>  };
>
>  /* This contains all the paravirt structures: we get a convenient
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -32,6 +32,12 @@ static inline void queued_spin_unlock(st
>  {
>  	pv_queued_spin_unlock(lock);
>  }
> +
> +#define vcpu_is_preempted vcpu_is_preempted
> +static inline bool vcpu_is_preempted(int cpu)
> +{
> +	return pv_vcpu_is_preempted(cpu);
> +}
>  #else
>  static inline void queued_spin_unlock(struct qspinlock *lock)
>  {
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -26,14 +26,6 @@
>  extern struct static_key paravirt_ticketlocks_enabled;
>  static __always_inline bool static_key_false(struct static_key *key);
>
> -#ifdef CONFIG_PARAVIRT_SPINLOCKS
> -#define vcpu_is_preempted vcpu_is_preempted
> -static inline bool vcpu_is_preempted(int cpu)
> -{
> -	return pv_lock_ops.vcpu_is_preempted(cpu);
> -}
> -#endif
> -
>  #include <asm/qspinlock.h>
>
>  /*
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -415,15 +415,6 @@ void kvm_disable_steal_time(void)
>  	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
>  }
>
> -static bool kvm_vcpu_is_preempted(int cpu)
> -{
> -	struct kvm_steal_time *src;
> -
> -	src = &per_cpu(steal_time, cpu);
> -
> -	return !!src->preempted;
> -}
> -
>  #ifdef CONFIG_SMP
>  static void __init kvm_smp_prepare_boot_cpu(void)
>  {
> @@ -480,9 +471,6 @@ void __init kvm_guest_init(void)
>  	if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
>  		has_steal_clock = 1;
>  		pv_time_ops.steal_clock = kvm_steal_clock;
> -#ifdef CONFIG_PARAVIRT_SPINLOCKS
> -		pv_lock_ops.vcpu_is_preempted = kvm_vcpu_is_preempted;
> -#endif
>  	}
>
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> @@ -604,6 +592,14 @@ static void kvm_wait(u8 *ptr, u8 val)
>  	local_irq_restore(flags);
>  }
>
> +static bool __kvm_vcpu_is_preempted(int cpu)
> +{
> +	struct kvm_steal_time *src = &per_cpu(steal_time, cpu);
> +
> +	return !!src->preempted;
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(__kvm_vcpu_is_preempted);
> +
>  /*
>   * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
>   */
> @@ -620,6 +616,12 @@ void __init kvm_spinlock_init(void)
>  	pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
>  	pv_lock_ops.wait = kvm_wait;
>  	pv_lock_ops.kick = kvm_kick_cpu;
> +	pv_lock_ops.vcpu_is_preempted = PV_CALLEE_SAVE(__native_vcpu_is_preempted);
> +
> +	if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
> +		pv_lock_ops.vcpu_is_preempted =
> +			PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);
> +	}
>  }
>
>  static __init int kvm_spinlock_init_jump(void)
> --- a/arch/x86/kernel/paravirt-spinlocks.c
> +++ b/arch/x86/kernel/paravirt-spinlocks.c
> @@ -12,7 +12,6 @@ __visible void __native_queued_spin_unlo
>  {
>  	native_queued_spin_unlock(lock);
>  }
> -
>  PV_CALLEE_SAVE_REGS_THUNK(__native_queued_spin_unlock);
>
>  bool pv_is_native_spin_unlock(void)
> @@ -21,9 +20,16 @@ bool pv_is_native_spin_unlock(void)
>  		__raw_callee_save___native_queued_spin_unlock;
>  }
>
> -static bool native_vcpu_is_preempted(int cpu)
> +__visible bool __native_vcpu_is_preempted(int cpu)
>  {
> -	return 0;
> +	return false;
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(__native_vcpu_is_preempted);
> +
> +bool pv_is_native_vcpu_is_preempted(void)
> +{
> +	return pv_lock_ops.queued_spin_unlock.func ==
> +		__raw_callee_save__native_vcpu_is_preempted;
>  }
>
copy-paste issue...

>  struct pv_lock_ops pv_lock_ops = {
> @@ -32,7 +38,7 @@ struct pv_lock_ops pv_lock_ops = {
>  	.queued_spin_unlock = PV_CALLEE_SAVE(__native_queued_spin_unlock),
>  	.wait = paravirt_nop,
>  	.kick = paravirt_nop,
> -	.vcpu_is_preempted = native_vcpu_is_preempted,
> +	.vcpu_is_preempted = PV_CALLEE_SAVE(__native_vcpu_is_preempted),
>  #endif /* SMP */
>  };
>  EXPORT_SYMBOL(pv_lock_ops);
> --- a/arch/x86/kernel/paravirt_patch_32.c
> +++ b/arch/x86/kernel/paravirt_patch_32.c
> @@ -11,6 +11,7 @@ DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %c
>
>  #if defined(CONFIG_PARAVIRT_SPINLOCKS)
>  DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%eax)");
> +DEF_NATIVE(pv_lock_ops, vcpu_is_preempted, "movl $0, %eax");
>  #endif
>
>  unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
> @@ -26,6 +27,7 @@ unsigned paravirt_patch_ident_64(void *i
>  }
>
>  extern bool pv_is_native_spin_unlock(void);
> +extern bool pv_is_native_vcpu_is_preempted(void);
>
>  unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
>  		      unsigned long addr, unsigned len)
> @@ -54,6 +56,12 @@ unsigned native_patch(u8 type, u16 clobb
>  				end   = end_pv_lock_ops_queued_spin_unlock;
>  				goto patch_site;
>  			}
> +		case PARAVIRT_PATCH(pv_lock_ops.vcpu_is_preempted):
> +			if (pv_is_native_vcpu_is_preempted()) {
> +				start = start_pv_lock_ops_vcpu_is_preempted;
> +				end   = end_pv_lock_ops_vcpu_is_preempted;
> +				goto patch_site;
> +			}
>  #endif
>
>  	default:
> --- a/arch/x86/kernel/paravirt_patch_64.c
> +++ b/arch/x86/kernel/paravirt_patch_64.c
> @@ -20,6 +20,7 @@ DEF_NATIVE(, mov64, "mov %rdi, %rax");
>
>  #if defined(CONFIG_PARAVIRT_SPINLOCKS)
>  DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%rdi)");
> +DEF_NATIVE(pv_lock_ops, vcpu_is_preempted, "movl $0, rax");
>  #endif
>
>  unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
> @@ -35,6 +36,7 @@ unsigned paravirt_patch_ident_64(void *i
>  }
>
>  extern bool pv_is_native_spin_unlock(void);
> +extern bool pv_is_native_vcpu_is_preempted(void);
>
>  unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
>  		      unsigned long addr, unsigned len)
> @@ -66,6 +68,12 @@ unsigned native_patch(u8 type, u16 clobb
>  				end   = end_pv_lock_ops_queued_spin_unlock;
>  				goto patch_site;
>  			}
> +		case PARAVIRT_PATCH(pv_lock_ops.vcpu_is_preempted):
> +			if (pv_is_native_vcpu_is_preempted()) {
> +				start = start_pv_lock_ops_vcpu_is_preempted;
> +				end   = end_pv_lock_ops_vcpu_is_preempted;
> +				goto patch_site;
> +			}
>  #endif
>
>  	default:
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -114,6 +114,8 @@ void xen_uninit_lock_cpu(int cpu)
>  	per_cpu(irq_name, cpu) = NULL;
>  }
>
> +PV_CALLEE_SAVE_REGS_THUNK(xen_vcpu_stolen);
> +
>  /*
>   * Our init of PV spinlocks is split in two init functions due to us
>   * using paravirt patching and jump labels patching and having to do
> @@ -136,8 +138,7 @@ void __init xen_init_spinlocks(void)
>  	pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
>  	pv_lock_ops.wait = xen_qlock_wait;
>  	pv_lock_ops.kick = xen_qlock_kick;
> -
> -	pv_lock_ops.vcpu_is_preempted = xen_vcpu_stolen;
> +	pv_lock_ops.vcpu_is_preempted = PV_CALLEE_SAVE(xen_vcpu_stolen);
>  }
>
>  /*
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
From: Namhyung Kim @ 2016-11-16  7:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: virtio-dev, Tony Luck, Kees Cook, KVM,
	Radim Krčmář, Michael S. Tsirkin, LKML,
	Steven Rostedt, qemu-devel, Minchan Kim, Anton Vorontsov,
	Anthony Liguori, Colin Cross, virtualization, Ingo Molnar
In-Reply-To: <a6fc0acb-6ecc-a184-9e06-e76c7398fc08@redhat.com>

Hi,

On Tue, Nov 15, 2016 at 11:38 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 15/11/2016 15:36, Namhyung Kim wrote:
>> Hi,
>>
>> On Tue, Nov 15, 2016 at 10:57:29AM +0100, Paolo Bonzini wrote:
>>>
>>>
>>> On 15/11/2016 06:06, Michael S. Tsirkin wrote:
>>>> On Tue, Nov 15, 2016 at 01:50:21PM +0900, Namhyung Kim wrote:
>>>>> Hi Michael,
>>>>>
>>>>> On Thu, Nov 10, 2016 at 06:39:55PM +0200, Michael S. Tsirkin wrote:
>>>>>> On Sat, Aug 20, 2016 at 05:07:42PM +0900, Namhyung Kim wrote:
>>>>>>> The virtio pstore driver provides interface to the pstore subsystem so
>>>>>>> that the guest kernel's log/dump message can be saved on the host
>>>>>>> machine.  Users can access the log file directly on the host, or on the
>>>>>>> guest at the next boot using pstore filesystem.  It currently deals with
>>>>>>> kernel log (printk) buffer only, but we can extend it to have other
>>>>>>> information (like ftrace dump) later.
>>>>>>>
>>>>>>> It supports legacy PCI device using single order-2 page buffer.
>>>>>>
>>>>>> Do you mean a legacy virtio device? I don't see why
>>>>>> you would want to support pre-1.0 mode.
>>>>>> If you drop that, you can drop all cpu_to_virtio things
>>>>>> and just use __le accessors.
>>>>>
>>>>> I was thinking about the kvmtools which lacks 1.0 support AFAIK.
>>>>
>>>> Unless kvmtools wants to be left behind it has to go 1.0.
>>>
>>> And it also has to go ACPI.  Is there any reason, apart from kvmtool, to
>>> make a completely new virtio device, with no support in existing guests,
>>> rather than implement ACPI ERST?
>>
>> Well, I know nothing about ACPI.  It looks like a huge spec and I
>> don't want to dig into it just for this.
>
> ERST (error record serialization table) is a small subset of the ACPI spec.

Not sure how independent ERST is from ACPI and other specs.  It looks
like referencing UEFI spec at least.  Btw, is the ERST used for pstore
only (in Linux)?

Also I need to control pstore driver like using bigger buffer,
enabling specific message types and so on if ERST supports.  Is it
possible for ERST to provide such information?

Thanks,
Namhyung

^ permalink raw reply

* RE: [PATCH] crypto: add virtio-crypto driver
From: Gonglei (Arei) @ 2016-11-16 10:09 UTC (permalink / raw)
  To: Gonglei (Arei), qemu-devel@nongnu.org,
	virtio-dev@lists.oasis-open.org,
	virtualization@lists.linux-foundation.org,
	linux-crypto@vger.kernel.org
  Cc: Huangweidong (C), Claudio Fontana, mst@redhat.com, Luonengjun,
	Hanweidong (Randy), Huangpeng (Peter), Xuquan (Quan Xu),
	stefanha@redhat.com, Zhoujian (jay, Euler),
	arei.gonglei@hotmail.com, davem@davemloft.net, Wubin (H),
	herbert@gondor.apana.org.au
In-Reply-To: <1479106074-32036-1-git-send-email-arei.gonglei@huawei.com>

Hi Michael,

	May I should convert all __virtio32/64 to le32/64 in virtio_crypto.h ?


> +#define VIRTIO_CRYPTO_OPCODE(service, op)   (((service) << 8) | (op))
> +
> +struct virtio_crypto_ctrl_header {
> +#define VIRTIO_CRYPTO_CIPHER_CREATE_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x02)
> +#define VIRTIO_CRYPTO_CIPHER_DESTROY_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x03)
> +#define VIRTIO_CRYPTO_HASH_CREATE_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x02)
> +#define VIRTIO_CRYPTO_HASH_DESTROY_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x03)
> +#define VIRTIO_CRYPTO_MAC_CREATE_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x02)
> +#define VIRTIO_CRYPTO_MAC_DESTROY_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x03)
> +#define VIRTIO_CRYPTO_AEAD_CREATE_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
> +#define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION \
> +	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
> +	__virtio32 opcode;
> +	__virtio32 algo;
> +	__virtio32 flag;
> +	/* data virtqueue id */
> +	__virtio32 queue_id;
> +};
> +
> +struct virtio_crypto_cipher_session_para {
> +#define VIRTIO_CRYPTO_NO_CIPHER                 0
> +#define VIRTIO_CRYPTO_CIPHER_ARC4               1
> +#define VIRTIO_CRYPTO_CIPHER_AES_ECB            2
> +#define VIRTIO_CRYPTO_CIPHER_AES_CBC            3
> +#define VIRTIO_CRYPTO_CIPHER_AES_CTR            4
> +#define VIRTIO_CRYPTO_CIPHER_DES_ECB            5
> +#define VIRTIO_CRYPTO_CIPHER_DES_CBC            6
> +#define VIRTIO_CRYPTO_CIPHER_3DES_ECB           7
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CBC           8
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CTR           9
> +#define VIRTIO_CRYPTO_CIPHER_KASUMI_F8          10
> +#define VIRTIO_CRYPTO_CIPHER_SNOW3G_UEA2        11
> +#define VIRTIO_CRYPTO_CIPHER_AES_F8             12
> +#define VIRTIO_CRYPTO_CIPHER_AES_XTS            13
> +#define VIRTIO_CRYPTO_CIPHER_ZUC_EEA3           14
> +	__virtio32 algo;
> +	/* length of key */
> +	__virtio32 keylen;
> +
> +#define VIRTIO_CRYPTO_OP_ENCRYPT  1
> +#define VIRTIO_CRYPTO_OP_DECRYPT  2
> +	/* encrypt or decrypt */
> +	__virtio32 op;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_session_input {
> +	/* Device-writable part */
> +	__virtio64 session_id;
> +	__virtio32 status;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_cipher_session_req {
> +	struct virtio_crypto_cipher_session_para para;
> +};
> +
> +struct virtio_crypto_hash_session_para {
> +#define VIRTIO_CRYPTO_NO_HASH            0
> +#define VIRTIO_CRYPTO_HASH_MD5           1
> +#define VIRTIO_CRYPTO_HASH_SHA1          2
> +#define VIRTIO_CRYPTO_HASH_SHA_224       3
> +#define VIRTIO_CRYPTO_HASH_SHA_256       4
> +#define VIRTIO_CRYPTO_HASH_SHA_384       5
> +#define VIRTIO_CRYPTO_HASH_SHA_512       6
> +#define VIRTIO_CRYPTO_HASH_SHA3_224      7
> +#define VIRTIO_CRYPTO_HASH_SHA3_256      8
> +#define VIRTIO_CRYPTO_HASH_SHA3_384      9
> +#define VIRTIO_CRYPTO_HASH_SHA3_512      10
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE128      11
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE256      12
> +	__virtio32 algo;
> +	/* hash result length */
> +	__virtio32 hash_result_len;
> +};
> +
> +struct virtio_crypto_hash_create_session_req {
> +	struct virtio_crypto_hash_session_para para;
> +};
> +
> +struct virtio_crypto_mac_session_para {
> +#define VIRTIO_CRYPTO_NO_MAC                       0
> +#define VIRTIO_CRYPTO_MAC_HMAC_MD5                 1
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA1                2
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_224             3
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_256             4
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_384             5
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_512             6
> +#define VIRTIO_CRYPTO_MAC_CMAC_3DES                25
> +#define VIRTIO_CRYPTO_MAC_CMAC_AES                 26
> +#define VIRTIO_CRYPTO_MAC_KASUMI_F9                27
> +#define VIRTIO_CRYPTO_MAC_SNOW3G_UIA2              28
> +#define VIRTIO_CRYPTO_MAC_GMAC_AES                 41
> +#define VIRTIO_CRYPTO_MAC_GMAC_TWOFISH             42
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_AES               49
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_KASUMI_F9         50
> +#define VIRTIO_CRYPTO_MAC_XCBC_AES                 53
> +	__virtio32 algo;
> +	/* hash result length */
> +	__virtio32 hash_result_len;
> +	/* length of authenticated key */
> +	__virtio32 auth_key_len;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_mac_create_session_req {
> +	struct virtio_crypto_mac_session_para para;
> +};
> +
> +struct virtio_crypto_aead_session_para {
> +#define VIRTIO_CRYPTO_NO_AEAD     0
> +#define VIRTIO_CRYPTO_AEAD_GCM    1
> +#define VIRTIO_CRYPTO_AEAD_CCM    2
> +#define VIRTIO_CRYPTO_AEAD_CHACHA20_POLY1305  3
> +	__virtio32 algo;
> +	/* length of key */
> +	__virtio32 key_len;
> +	/* hash result length */
> +	__virtio32 hash_result_len;
> +	/* length of the additional authenticated data (AAD) in bytes */
> +	__virtio32 aad_len;
> +	/* encrypt or decrypt, See above VIRTIO_CRYPTO_OP_* */
> +	__virtio32 op;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_aead_create_session_req {
> +	struct virtio_crypto_aead_session_para para;
> +};
> +
> +struct virtio_crypto_alg_chain_session_para {
> +#define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_HASH_THEN_CIPHER  1
> +#define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_CIPHER_THEN_HASH  2
> +	__virtio32 alg_chain_order;
> +/* Plain hash */
> +#define VIRTIO_CRYPTO_SYM_HASH_MODE_PLAIN    1
> +/* Authenticated hash (mac) */
> +#define VIRTIO_CRYPTO_SYM_HASH_MODE_AUTH     2
> +/* Nested hash */
> +#define VIRTIO_CRYPTO_SYM_HASH_MODE_NESTED   3
> +	__virtio32 hash_mode;
> +	struct virtio_crypto_cipher_session_para cipher_param;
> +	union {
> +		struct virtio_crypto_hash_session_para hash_param;
> +		struct virtio_crypto_mac_session_para mac_param;
> +	} u;
> +	/* length of the additional authenticated data (AAD) in bytes */
> +	__virtio32 aad_len;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_alg_chain_session_req {
> +	struct virtio_crypto_alg_chain_session_para para;
> +};
> +
> +struct virtio_crypto_sym_create_session_req {
> +	union {
> +		struct virtio_crypto_cipher_session_req cipher;
> +		struct virtio_crypto_alg_chain_session_req chain;
> +	} u;
> +
> +	/* Device-readable part */
> +
> +/* No operation */
> +#define VIRTIO_CRYPTO_SYM_OP_NONE  0
> +/* Cipher only operation on the data */
> +#define VIRTIO_CRYPTO_SYM_OP_CIPHER  1
> +/*
> + * Chain any cipher with any hash or mac operation. The order
> + * depends on the value of alg_chain_order param
> + */
> +#define VIRTIO_CRYPTO_SYM_OP_ALGORITHM_CHAINING  2
> +	__virtio32 op_type;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_destroy_session_req {
> +	/* Device-readable part */
> +	__virtio64  session_id;
> +};
> +
> +/* The request of the control viritqueue's packet */
> +struct virtio_crypto_op_ctrl_req {
> +	struct virtio_crypto_ctrl_header header;
> +
> +	union {
> +		struct virtio_crypto_sym_create_session_req
> +			sym_create_session;
> +		struct virtio_crypto_hash_create_session_req
> +			hash_create_session;
> +		struct virtio_crypto_mac_create_session_req
> +			mac_create_session;
> +		struct virtio_crypto_aead_create_session_req
> +			aead_create_session;
> +		struct virtio_crypto_destroy_session_req
> +			destroy_session;
> +	} u;
> +};
> +
> +struct virtio_crypto_op_header {
> +#define VIRTIO_CRYPTO_CIPHER_ENCRYPT \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x00)
> +#define VIRTIO_CRYPTO_CIPHER_DECRYPT \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x01)
> +#define VIRTIO_CRYPTO_HASH \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x00)
> +#define VIRTIO_CRYPTO_MAC \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x00)
> +#define VIRTIO_CRYPTO_AEAD_ENCRYPT \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x00)
> +#define VIRTIO_CRYPTO_AEAD_DECRYPT \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x01)
> +	__virtio32 opcode;
> +	/* algo should be service-specific algorithms */
> +	__virtio32 algo;
> +	/* session_id should be service-specific algorithms */
> +	__virtio64 session_id;
> +	/* control flag to control the request */
> +	__virtio32 flag;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_cipher_para {
> +	/*
> +	 * Byte Length of valid IV/Counter
> +	 *
> +	 * For block ciphers in CBC or F8 mode, or for Kasumi in F8 mode, or for
> +	 *   SNOW3G in UEA2 mode, this is the length of the IV (which
> +	 *   must be the same as the block length of the cipher).
> +	 * For block ciphers in CTR mode, this is the length of the counter
> +	 *   (which must be the same as the block length of the cipher).
> +	 * For AES-XTS, this is the 128bit tweak, i, from IEEE Std 1619-2007.
> +	 *
> +	 * The IV/Counter will be updated after every partial cryptographic
> +	 * operation.
> +	 */
> +	__virtio32 iv_len;
> +	/* length of source data */
> +	__virtio32 src_data_len;
> +	/* length of dst data */
> +	__virtio32 dst_data_len;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_hash_para {
> +	/* length of source data */
> +	__virtio32 src_data_len;
> +	/* hash result length */
> +	__virtio32 hash_result_len;
> +};
> +
> +struct virtio_crypto_mac_para {
> +	struct virtio_crypto_hash_para hash;
> +};
> +
> +struct virtio_crypto_aead_para {
> +	/*
> +	 * Byte Length of valid IV data pointed to by the below iv_addr
> +	 * parameter.
> +	 *
> +	 * For GCM mode, this is either 12 (for 96-bit IVs) or 16, in which
> +	 *   case iv_addr points to J0.
> +	 * For CCM mode, this is the length of the nonce, which can be in the
> +	 *   range 7 to 13 inclusive.
> +	 */
> +	__virtio32 iv_len;
> +	/* length of additional auth data */
> +	__virtio32 aad_len;
> +	/* length of source data */
> +	__virtio32 src_data_len;
> +	/* length of dst data */
> +	__virtio32 dst_data_len;
> +};
> +
> +struct virtio_crypto_cipher_data_req {
> +	/* Device-readable part */
> +	struct virtio_crypto_cipher_para para;
> +};
> +
> +struct virtio_crypto_hash_data_req {
> +	/* Device-readable part */
> +	struct virtio_crypto_hash_para para;
> +};
> +
> +struct virtio_crypto_mac_data_req {
> +	/* Device-readable part */
> +	struct virtio_crypto_mac_para para;
> +};
> +
> +struct virtio_crypto_alg_chain_data_para {
> +	__virtio32 iv_len;
> +	/* Length of source data */
> +	__virtio32 src_data_len;
> +	/* Length of destination data */
> +	__virtio32 dst_data_len;
> +	/* Starting point for cipher processing in source data */
> +	__virtio32 cipher_start_src_offset;
> +	/* Length of the source data that the cipher will be computed on */
> +	__virtio32 len_to_cipher;
> +	/* Starting point for hash processing in source data */
> +	__virtio32 hash_start_src_offset;
> +	/* Length of the source data that the hash will be computed on */
> +	__virtio32 len_to_hash;
> +	/* Length of the additional auth data */
> +	__virtio32 aad_len;
> +	/* Length of the hash result */
> +	__virtio32 hash_result_len;
> +	__virtio32 reserved;
> +};
> +
> +struct virtio_crypto_alg_chain_data_req {
> +	/* Device-readable part */
> +	struct virtio_crypto_alg_chain_data_para para;
> +};
> +
> +struct virtio_crypto_sym_data_req {
> +	union {
> +		struct virtio_crypto_cipher_data_req cipher;
> +		struct virtio_crypto_alg_chain_data_req chain;
> +	} u;
> +
> +	/* See above VIRTIO_CRYPTO_SYM_OP_* */
> +	__virtio32 op_type;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_aead_data_req {
> +	/* Device-readable part */
> +	struct virtio_crypto_aead_para para;
> +};
> +
> +/* The request of the data viritqueue's packet */
> +struct virtio_crypto_op_data_req {
> +	struct virtio_crypto_op_header header;
> +
> +	union {
> +		struct virtio_crypto_sym_data_req  sym_req;
> +		struct virtio_crypto_hash_data_req hash_req;
> +		struct virtio_crypto_mac_data_req mac_req;
> +		struct virtio_crypto_aead_data_req aead_req;
> +	} u;
> +};
> +
> +#define VIRTIO_CRYPTO_OK        0
> +#define VIRTIO_CRYPTO_ERR       1
> +#define VIRTIO_CRYPTO_BADMSG    2
> +#define VIRTIO_CRYPTO_NOTSUPP   3
> +#define VIRTIO_CRYPTO_INVSESS   4 /* Invaild session id */
> +
> +/* The accelerator hardware is ready */
> +#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
> +#define VIRTIO_CRYPTO_S_STARTED  (1 << 1)
> +
> +struct virtio_crypto_config {
> +	/* See VIRTIO_CRYPTO_OP_* above */
> +	__virtio32  status;
> +
> +	/*
> +	 * Maximum number of data queue legal values are between 1 and
> 0x8000
> +	 */
> +	__virtio32  max_dataqueues;
> +
> +	/*
> +	 * Specifies the services mask which the devcie support,
> +	 * see VIRTIO_CRYPTO_SERVICE_* above
> +	 */
> +	__virtio32 crypto_services;
> +
> +	/* Detailed algorithms mask */
> +	__virtio32 cipher_algo_l;
> +	__virtio32 cipher_algo_h;
> +	__virtio32 hash_algo;
> +	__virtio32 mac_algo_l;
> +	__virtio32 mac_algo_h;
> +	__virtio32 aead_algo;
> +	/* Maximum length of cipher key */
> +	__virtio32 max_cipher_key_len;
> +	/* Maximum length of authenticated key */
> +	__virtio32 max_auth_key_len;
> +	__virtio32 reserve;
> +	/* Maximum size of each crypto request's content */
> +	__virtio64 max_size;
> +};
> +

^ permalink raw reply

* Re: [PATCH v7 06/11] x86, paravirt: Add interface to support kvm/xen vcpu preempted check
From: Peter Zijlstra @ 2016-11-16 10:23 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: kvm, rkrcmar, benh, will.deacon, virtualization, paulus,
	kernellwp, linux-s390, dave, mpe, x86, mingo, Pan Xinhui,
	xen-devel, paulmck, konrad.wilk, boqun.feng, jgross, linux-kernel,
	David.Laight, xen-devel-request, pbonzini, linuxppc-dev
In-Reply-To: <a6026a0c-9ad8-5025-c616-eb33f96e91ce@linux.vnet.ibm.com>

On Wed, Nov 16, 2016 at 12:19:09PM +0800, Pan Xinhui wrote:
> Hi, Peter.
> 	I think we can avoid a function call in a simpler way. How about below
> 
> static inline bool vcpu_is_preempted(int cpu)
> {
> 	/* only set in pv case*/
> 	if (pv_lock_ops.vcpu_is_preempted)
> 		return pv_lock_ops.vcpu_is_preempted(cpu);
> 	return false;
> }

That is still more expensive. It needs to do an actual load and makes it
hard to predict the branch, you'd have to actually wait for the load to
complete etc.

Also, it generates more code.

Paravirt muck should strive to be as cheap as possible when ran on
native hardware.

^ permalink raw reply

* Re: [PATCH v7 06/11] x86, paravirt: Add interface to support kvm/xen vcpu preempted check
From: Christian Borntraeger @ 2016-11-16 11:29 UTC (permalink / raw)
  To: Peter Zijlstra, Pan Xinhui
  Cc: kvm, rkrcmar, benh, will.deacon, virtualization, paulus,
	kernellwp, linux-s390, dave, mpe, x86, mingo, Pan Xinhui,
	xen-devel, paulmck, konrad.wilk, boqun.feng, jgross, linux-kernel,
	David.Laight, xen-devel-request, pbonzini, linuxppc-dev
In-Reply-To: <20161116102355.GP3142@twins.programming.kicks-ass.net>

On 11/16/2016 11:23 AM, Peter Zijlstra wrote:
> On Wed, Nov 16, 2016 at 12:19:09PM +0800, Pan Xinhui wrote:
>> Hi, Peter.
>> 	I think we can avoid a function call in a simpler way. How about below
>>
>> static inline bool vcpu_is_preempted(int cpu)
>> {
>> 	/* only set in pv case*/
>> 	if (pv_lock_ops.vcpu_is_preempted)
>> 		return pv_lock_ops.vcpu_is_preempted(cpu);
>> 	return false;
>> }
> 
> That is still more expensive. It needs to do an actual load and makes it
> hard to predict the branch, you'd have to actually wait for the load to
> complete etc.

Out of curiosity, why is that hard to predict?
On s390 the branch prediction runs asynchronously ahead of the downstream
pipeline (e.g. search for "IBM z Systems Processor Optimization Primer" page 11).
given enough capacity, I would assume that modern x86 processors would do the same
and be able to predict this is as soon as it becomes hot (and otherwise you would
 not notice the branch miss anyway). Is x86 behaving differently here?

> Also, it generates more code.
> 
> Paravirt muck should strive to be as cheap as possible when ran on
> native hardware.

As I am interested in this series from the s390 point of view, this is 
the only thing that block this series?

Is there a chance to add a static key around the paravirt ops somehow?

^ permalink raw reply

* Re: [PATCH v7 06/11] x86, paravirt: Add interface to support kvm/xen vcpu preempted check
From: Peter Zijlstra @ 2016-11-16 11:43 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: kvm, rkrcmar, benh, will.deacon, virtualization, paulus,
	kernellwp, linux-s390, dave, mpe, x86, mingo, Pan Xinhui,
	xen-devel, paulmck, Pan Xinhui, konrad.wilk, boqun.feng, jgross,
	linux-kernel, David.Laight, xen-devel-request, pbonzini,
	linuxppc-dev
In-Reply-To: <f8b9d1be-98a2-4d99-1ae1-c31c6fe13903@de.ibm.com>

On Wed, Nov 16, 2016 at 12:29:44PM +0100, Christian Borntraeger wrote:
> On 11/16/2016 11:23 AM, Peter Zijlstra wrote:
> > On Wed, Nov 16, 2016 at 12:19:09PM +0800, Pan Xinhui wrote:
> >> Hi, Peter.
> >> 	I think we can avoid a function call in a simpler way. How about below
> >>
> >> static inline bool vcpu_is_preempted(int cpu)
> >> {
> >> 	/* only set in pv case*/
> >> 	if (pv_lock_ops.vcpu_is_preempted)
> >> 		return pv_lock_ops.vcpu_is_preempted(cpu);
> >> 	return false;
> >> }
> > 
> > That is still more expensive. It needs to do an actual load and makes it
> > hard to predict the branch, you'd have to actually wait for the load to
> > complete etc.
> 
> Out of curiosity, why is that hard to predict?
> On s390 the branch prediction runs asynchronously ahead of the downstream
> pipeline (e.g. search for "IBM z Systems Processor Optimization Primer" page 11).
> given enough capacity, I would assume that modern x86 processors would do the same
> and be able to predict this is as soon as it becomes hot (and otherwise you would
>  not notice the branch miss anyway). Is x86 behaving differently here?

Not sure how exactly it works, but it seems to me that an immediate
assignment to the value you're going to compare would leave very little
doubt.

Then again, maybe cores aren't that smart and only look at the
hysterical btb for prediction.

> > Also, it generates more code.
> > 
> > Paravirt muck should strive to be as cheap as possible when ran on
> > native hardware.
> 
> As I am interested in this series from the s390 point of view, this is 
> the only thing that block this series?

Ingo was rewriting the changelog, other than that, no, I can do this on
top. Just spotted this because Ingo and me talked it over.

> Is there a chance to add a static key around the paravirt ops somehow?

More code generation still, replacing the call with an immediate
assignment to the return register is the shortest possible option I
think.

^ permalink raw reply

* Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
From: Paolo Bonzini @ 2016-11-16 12:10 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: virtio-dev, Tony Luck, Kees Cook, KVM,
	Radim Krčmář, Michael S. Tsirkin, LKML,
	Steven Rostedt, qemu-devel, Minchan Kim, Anton Vorontsov,
	Anthony Liguori, Colin Cross, virtualization, Ingo Molnar
In-Reply-To: <CAM9d7cj8CxS77WC-7O7fRX1s0sqTZqJo=ZHseN-hc2jpE8QD2Q@mail.gmail.com>

> Not sure how independent ERST is from ACPI and other specs.  It looks
> like referencing UEFI spec at least.

It is just the format of error records that comes from the UEFI spec
(include/linux/cper.h) but you can ignore it, I think.  It should be
handled by tools on the host side.  For you, the error log address
range contains a CPER header followed by a binary blob.  In practice,
you only need the record length field (bytes 20-23 of the header),
though it may be a good idea to validate the signature at the beginning
of the header.

> Btw, is the ERST used for pstore only (in Linux)?

Yes.  It can store various records, including dmesg and MCE.

There are other examples in QEMU of interfaces with ACPI.  They all use the
DSDT, but the logic is similar.  For example, docs/specs/acpi_mem_hotplug.txt
documents the memory hotplug interface. In all cases, ACPI tables contain small
programs that talk to specialized hardware registers, typically allocated to
hard-coded I/O ports.

In your case, the registers could occupy 16 consecutive I/O ports, like the
following:

     0x00       read/write   operation type (0=write,1=read,2=clear,3=dummy write)

     0x01       read-only    bit 7: if set, operation in progress

                             bit 0-6: operation status, see "Command Status Definition" in
                             the ACPI spec

     0x02       read-only    when read:

                             - read a 64-bit record id from the store to memory,
                               from the address that was last written to 0x08.

                             - if the id is valid and is not the last id in the store,
                               write the next 64-bit record id to the same address

                             - otherwise, write the first record id to the same address,
                               or 0xffffffffffffffff if the store is empty

     0x03                    unused, read as zero

     0x04-0x07  read/write   offset of the error record into the error log address range

     0x08-0x0b  read/write   when read, return number of stored records

                             when written, the written value is a 32-bit memory address,
                             which points to a 64-bit location used to communicate record ids.

     0x0c-0x0f  read/write   when read, always return -1 (together with the "mask" field
                             and READ_REGISTER, this lets ERST instructions return any value!)

                             when written, trigger the pstore operation:

                             - if the current operation is a dummy write, do nothing

                             - if the current operation is a write, write a new record, using
                             the written value as the base of the error log address range.  The
                             length must be parsed from the CPER header.

                             - if the current operation is a clear, read the record id
                             from the memory location that was last written to 0x08 and do the
                             operation.  the value written is ignored.

                             - if the current operation is a read, read the record id from the
                             memory location that was last written to 0x08, using the written
                             value as the base of the error log address range.

In addition, the firmware will need to reserve a few KB of RAM for the error log
address range (I checked a real system and it reserves 8KB).  The first eight
bytes are needed for the record identifier interface, because there's no such
thing as 64-bit I/O ports, and the rest can be used for the actual buffer.

QEMU already has an interface to allocate RAM and patch the address into an
ACPI table (bios_linker_loader_alloc).  Because this interface is actually meant
to load data from QEMU into the firmware (using the "fw_cfg" interface), you
would have to add a dummy 8KB file to fw_cfg using fw_cfg_add_file (for
example "etc/erst-memory"), it can be just full of zeros.

QEMU supports two chipsets, PIIX and ICH9, and the free I/O port ranges are
different.  You could use 0xa20 for ICH9 and 0xae20 for PIIX.

All in all, the contents of the ERST table would not be very different from a
non-virtual system, except that on real hardware the firmware would use SMIs
as the trap mechanism.  You almost have a one-to-one mapping between ERST
actions and registers accesses:

   BEGIN_WRITE_OPERATION                  write value 0 to register at 0x00
   BEGIN_READ_OPERATION                   write value 1 to register at 0x00
   BEGIN_CLEAR_OPERATION                  write value 2 to register at 0x00
   BEGIN_DUMMY_WRITE_OPERATION            write value 3 to register at 0x00
   END_OPERATION                          no-op
   CHECK_BUSY_STATUS                      read register at 0x01 with mask 0x80
   GET_COMMAND_STATUS                     read register at 0x01 with mask 0x7f
   SET_RECORD_OFFSET                      write register at 0x04
   GET_RECORD_COUNT                       read register at 0x08
   EXECUTE_OPERATION                      write ERST memory base + 8 to 0x0c
   GET_ERROR_LOG_ADDRESS_RANGE            read register at 0x0c (with mask = ERST memory base + 8)
   GET_ERROR_LOG_ADDRESS_RANGE_LENGTH     read register at 0x0c (with mask = 8192 - 8 = 8184)
   GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES read register at 0x0c (with mask = 0)

Only the get/set record identifier instructions are a little harder:

   GET_RECORD_IDENTIFIER                  write ERST memory base to register at 0x08
                                          read register at 0x02
                                          read eight bytes at ERST memory base

   SET_RECORD_IDENTIFIER                  write ERST memory base to register at 0x08
                                          write eight bytes at ERST memory base

On top of this, you need to add the APEI UUID (see apei_osc_setup in Linux)
to build_q35_osc_method, and use "-M q35" when you start QEMU.  If you need
more help just ask.  I or others can help you with the ACPI glue, then you
can write the file backend yourself, based on your existing virtio-pstore code.

> Also I need to control pstore driver like using bigger buffer,
> enabling specific message types and so on if ERST supports.  Is it
> possible for ERST to provide such information?

It's the normal pstore driver, same as on a real server.  What exactly do you
need?

Paolo

^ permalink raw reply

* [PATCH 1/1] sched: provide common cpu_relax_yield definition
From: Christian Borntraeger @ 2016-11-16 12:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-s390, kvm, Catalin Marinas, x86, Will Deacon,
	Russell King, Heiko Carstens, linux-kernel, sparclinux,
	Noam Camus, Nicholas Piggin, Martin Schwidefsky, xen-devel,
	virtualization, linuxppc-dev, Ingo Molnar

No need to duplicate the same define everywhere. Since
the only user is stop-machine and the only provider is
s390, we can use a default implementation of cpu_relax_yield
in sched.h.

Suggested-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/alpha/include/asm/processor.h      | 1 -
 arch/arc/include/asm/processor.h        | 3 ---
 arch/arm/include/asm/processor.h        | 2 --
 arch/arm64/include/asm/processor.h      | 2 --
 arch/avr32/include/asm/processor.h      | 1 -
 arch/blackfin/include/asm/processor.h   | 1 -
 arch/c6x/include/asm/processor.h        | 1 -
 arch/cris/include/asm/processor.h       | 1 -
 arch/frv/include/asm/processor.h        | 1 -
 arch/h8300/include/asm/processor.h      | 1 -
 arch/hexagon/include/asm/processor.h    | 1 -
 arch/ia64/include/asm/processor.h       | 1 -
 arch/m32r/include/asm/processor.h       | 1 -
 arch/m68k/include/asm/processor.h       | 1 -
 arch/metag/include/asm/processor.h      | 1 -
 arch/microblaze/include/asm/processor.h | 1 -
 arch/mips/include/asm/processor.h       | 1 -
 arch/mn10300/include/asm/processor.h    | 1 -
 arch/nios2/include/asm/processor.h      | 1 -
 arch/openrisc/include/asm/processor.h   | 1 -
 arch/parisc/include/asm/processor.h     | 1 -
 arch/powerpc/include/asm/processor.h    | 2 --
 arch/s390/include/asm/processor.h       | 1 +
 arch/score/include/asm/processor.h      | 1 -
 arch/sh/include/asm/processor.h         | 1 -
 arch/sparc/include/asm/processor_32.h   | 1 -
 arch/sparc/include/asm/processor_64.h   | 1 -
 arch/tile/include/asm/processor.h       | 2 --
 arch/unicore32/include/asm/processor.h  | 1 -
 arch/x86/include/asm/processor.h        | 2 --
 arch/x86/um/asm/processor.h             | 1 -
 arch/xtensa/include/asm/processor.h     | 1 -
 include/linux/sched.h                   | 4 ++++
 33 files changed, 5 insertions(+), 38 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 31e8dbe..2fec2de 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -58,7 +58,6 @@ unsigned long get_wchan(struct task_struct *p);
   ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp)
 
 #define cpu_relax()	barrier()
-#define cpu_relax_yield() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
 #define ARCH_HAS_PREFETCHW
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index d102a49..6e1242d 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -60,15 +60,12 @@ struct task_struct;
 #ifndef CONFIG_EZNPS_MTM_EXT
 
 #define cpu_relax()		barrier()
-#define cpu_relax_yield()	cpu_relax()
 
 #else
 
 #define cpu_relax()     \
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
-#define cpu_relax_yield()	cpu_relax()
-
 #endif
 
 #define copy_segments(tsk, mm)      do { } while (0)
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 9e71c58b..c3d5fc1 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -82,8 +82,6 @@ unsigned long get_wchan(struct task_struct *p);
 #define cpu_relax()			barrier()
 #endif
 
-#define cpu_relax_yield()  	              cpu_relax()
-
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
 
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 6132f64..747c65a 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -149,8 +149,6 @@ static inline void cpu_relax(void)
 	asm volatile("yield" ::: "memory");
 }
 
-#define cpu_relax_yield()                     cpu_relax()
-
 /* Thread switching */
 extern struct task_struct *cpu_switch_to(struct task_struct *prev,
 					 struct task_struct *next);
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index ee62365..972adcc 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -92,7 +92,6 @@ extern struct avr32_cpuinfo boot_cpu_data;
 #define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
 
 #define cpu_relax()		barrier()
-#define cpu_relax_yield()	cpu_relax()
 #define cpu_sync_pipeline()	asm volatile("sub pc, -2" : : : "memory")
 
 struct cpu_context {
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 57acfb1..85d4af9 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -92,7 +92,6 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax()    	smp_mb()
-#define cpu_relax_yield()      cpu_relax()
 
 /* Get the Silicon Revision of the chip */
 static inline uint32_t __pure bfin_revid(void)
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index 1fd22e7..b9eb3da 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -121,7 +121,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(task)	(task_pt_regs(task)->sp)
 
 #define cpu_relax()		do { } while (0)
-#define cpu_relax_yield()             cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
 
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 1a57841..15b815d 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -63,7 +63,6 @@ static inline void release_thread(struct task_struct *dead_task)
 #define init_stack      (init_thread_union.stack)
 
 #define cpu_relax()     barrier()
-#define cpu_relax_yield() cpu_relax()
 
 void default_idle(void);
 
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index c1e5f2a..ddaeb9c 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -107,7 +107,6 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk)->thread.frame0->sp)
 
 #define cpu_relax() barrier()
-#define cpu_relax_yield() cpu_relax()
 
 /* data cache prefetch */
 #define ARCH_HAS_PREFETCH
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 42d6053..65132d7 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -127,7 +127,6 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax()    barrier()
-#define cpu_relax_yield() cpu_relax()
 
 #define HARD_RESET_NOW() ({		\
 	local_irq_disable();		\
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index 5d694cc..45a8254 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -56,7 +56,6 @@ struct thread_struct {
 }
 
 #define cpu_relax() __vmyield()
-#define cpu_relax_yield() cpu_relax()
 
 /*
  * Decides where the kernel will search for a free chunk of vm space during
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 0c2c3b2..03911a3 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -547,7 +547,6 @@ ia64_eoi (void)
 }
 
 #define cpu_relax()	ia64_hint(ia64_hint_pause)
-#define cpu_relax_yield() cpu_relax()
 
 static inline int
 ia64_get_irr(unsigned int vector)
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index 9b83a13..5767367 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -133,6 +133,5 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)  ((tsk)->thread.sp)
 
 #define cpu_relax()	barrier()
-#define cpu_relax_yield() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index b0d0442..f5f790c 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -156,6 +156,5 @@ unsigned long get_wchan(struct task_struct *p);
 #define task_pt_regs(tsk)	((struct pt_regs *) ((tsk)->thread.esp0))
 
 #define cpu_relax()	barrier()
-#define cpu_relax_yield() cpu_relax()
 
 #endif
diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h
index ee302a6..ec6a490 100644
--- a/arch/metag/include/asm/processor.h
+++ b/arch/metag/include/asm/processor.h
@@ -152,7 +152,6 @@ unsigned long get_wchan(struct task_struct *p);
 #define user_stack_pointer(regs)        ((regs)->ctx.AX[0].U0)
 
 #define cpu_relax()     barrier()
-#define cpu_relax_yield() cpu_relax()
 
 extern void setup_priv(void);
 
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index 08ec1f7..37ef196 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -22,7 +22,6 @@
 extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
-# define cpu_relax_yield() cpu_relax()
 
 #define task_pt_regs(tsk) \
 		(((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 8ea95e7..95b8c47 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -389,7 +389,6 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
 
 #define cpu_relax()	barrier()
-#define cpu_relax_yield() cpu_relax()
 
 /*
  * Return_address is a replacement for __builtin_return_address(count)
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index d11397b..18e17ab 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -69,7 +69,6 @@ extern void print_cpu_info(struct mn10300_cpuinfo *);
 extern void dodgy_tsc(void);
 
 #define cpu_relax() barrier()
-#define cpu_relax_yield() cpu_relax()
 
 /*
  * User space process size: 1.75GB (default).
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index d32c176..3bbbc3d 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -88,7 +88,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)	((tsk)->thread.kregs->sp)
 
 #define cpu_relax()	barrier()
-#define cpu_relax_yield() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index 7f47fc7..a908e6c 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -92,7 +92,6 @@ extern unsigned long thread_saved_pc(struct task_struct *t);
 #define init_stack      (init_thread_union.stack)
 
 #define cpu_relax()     barrier()
-#define cpu_relax_yield() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_OPENRISC_PROCESSOR_H */
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index a4a07f4..ca40741 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -309,7 +309,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)	((tsk)->thread.regs.gr[30])
 
 #define cpu_relax()	barrier()
-#define cpu_relax_yield() cpu_relax()
 
 /*
  * parisc_requires_coherency() is used to identify the combined VIPT/PIPT
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 5684e68..dac83fc 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -404,8 +404,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 #define cpu_relax()	barrier()
 #endif
 
-#define cpu_relax_yield() cpu_relax()
-
 /* Check that a certain kernel stack pointer is valid in task_struct p */
 int validate_sp(unsigned long sp, struct task_struct *p,
                        unsigned long nbytes);
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 17c001a..9eab1cb 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -234,6 +234,7 @@ static inline unsigned short stap(void)
 /*
  * Give up the time slice of the virtual PU.
  */
+#define cpu_relax_yield cpu_relax_yield
 void cpu_relax_yield(void);
 
 #define cpu_relax() barrier()
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index a1e97c0..d9a922d 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -24,7 +24,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define current_text_addr() ({ __label__ _l; _l: &&_l; })
 
 #define cpu_relax()		barrier()
-#define cpu_relax_yield()	cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
 /*
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index 9454ff1..5addd69 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -97,7 +97,6 @@ extern struct sh_cpuinfo cpu_data[];
 
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
-#define cpu_relax_yield() cpu_relax()
 
 void default_idle(void);
 void stop_this_cpu(void *);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index fc32b73..365d4cb 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -119,7 +119,6 @@ extern struct task_struct *last_task_used_math;
 int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
-#define cpu_relax_yield() cpu_relax()
 
 extern void (*sparc_idle)(void);
 
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index 12787df..6448cfc 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -216,7 +216,6 @@ unsigned long get_wchan(struct task_struct *task);
 				     "nop\n\t"				\
 				     ".previous"			\
 				     ::: "memory")
-#define cpu_relax_yield() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
  * UltraSPARC-I will treat these as nops, and UltraSPARC-II has
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index c1c228b..0bc9968 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -264,8 +264,6 @@ static inline void cpu_relax(void)
 	barrier();
 }
 
-#define cpu_relax_yield() cpu_relax()
-
 /* Info on this processor (see fs/proc/cpuinfo.c) */
 struct seq_operations;
 extern const struct seq_operations cpuinfo_op;
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index eeefe7c..4eaa421 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -71,7 +71,6 @@ extern void release_thread(struct task_struct *);
 unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
-#define cpu_relax_yield()		cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7513c99..c84605b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -588,8 +588,6 @@ static __always_inline void cpu_relax(void)
 	rep_nop();
 }
 
-#define cpu_relax_yield() cpu_relax()
-
 /* Stop speculative execution and prefetching of modified code. */
 static inline void sync_core(void)
 {
diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h
index b4bd63b..c77db22 100644
--- a/arch/x86/um/asm/processor.h
+++ b/arch/x86/um/asm/processor.h
@@ -26,7 +26,6 @@ static inline void rep_nop(void)
 }
 
 #define cpu_relax()		rep_nop()
-#define cpu_relax_yield()	cpu_relax()
 
 #define task_pt_regs(t) (&(t)->thread.regs)
 
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index 7d8d6be..86ffcd6 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -206,7 +206,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)		(task_pt_regs(tsk)->areg[1])
 
 #define cpu_relax()  barrier()
-#define cpu_relax_yield() cpu_relax()
 
 /* Special register access. */
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..c1aa3b0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2444,6 +2444,10 @@ static inline void calc_load_enter_idle(void) { }
 static inline void calc_load_exit_idle(void) { }
 #endif /* CONFIG_NO_HZ_COMMON */
 
+#ifndef cpu_relax_yield
+#define cpu_relax_yield() cpu_relax()
+#endif
+
 /*
  * Do not use outside of architecture code which knows its limitations.
  *
-- 
2.5.5

^ permalink raw reply related

* Re: [PATCH 1/1] sched: provide common cpu_relax_yield definition
From: David Hildenbrand @ 2016-11-16 12:34 UTC (permalink / raw)
  To: Christian Borntraeger, Peter Zijlstra
  Cc: linux-arch, linux-s390, kvm, Catalin Marinas, x86, Will Deacon,
	Russell King, Heiko Carstens, linux-kernel, sparclinux,
	Noam Camus, Nicholas Piggin, Martin Schwidefsky, xen-devel,
	virtualization, linuxppc-dev, Ingo Molnar
In-Reply-To: <1479298985-191589-1-git-send-email-borntraeger@de.ibm.com>

Am 16.11.2016 um 13:23 schrieb Christian Borntraeger:
> No need to duplicate the same define everywhere. Since
> the only user is stop-machine and the only provider is
> s390, we can use a default implementation of cpu_relax_yield
> in sched.h.
>
> Suggested-by: Russell King <linux@armlinux.org.uk>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

Looks good to me!

Reviewed-by: David Hildenbrand <david@redhat.com>

David

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox