* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Waiman Long @ 2016-07-11 15:10 UTC
To: Peter Zijlstra
Cc: linux-s390, dave, benh, Pan Xinhui, boqun.feng, will.deacon,
linux-kernel, virtualization, mingo, paulus, mpe, schwidefsky,
pbonzini, paulmck, linuxppc-dev
In-Reply-To: <20160706065255.GH30909@twins.programming.kicks-ass.net>
On 07/06/2016 02:52 AM, Peter Zijlstra wrote:
> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>> changes from v1:
>> a simpler definition of the default vcpu_is_preempted
>> skip machine type check on ppc, and add config. remove dedicated macro.
>> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>> add more comments
>> thanks for Boqun's and Peter's suggestions.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>
>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>>
>> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_owner variants also caused RCU stalls before we applied this patch set.
>>
> Paolo, could you help out with an (x86) KVM interface for this?
>
> Waiman, could you see if you can utilize this to get rid of the
> SPIN_THRESHOLD in qspinlock_paravirt?
That API is certainly useful to make the paravirt spinlock perform
better. However, I am not sure if we can completely get rid of the
SPIN_THRESHOLD at this point. It is not just KVM; the Xen code needs
to be modified as well.
Cheers,
Longman
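A minimal user-space sketch of the idea in the cover letter above, not the code from the series: a spinner keeps spinning on a lock owner only while that owner is running and its vCPU has not been preempted by the host. Only the `vcpu_is_preempted(int cpu)` signature comes from the thread; `preempted[]`, `struct owner_sketch`, `enum spin_result` and `spin_on_owner_sketch()` are hypothetical, and the sketch is single-threaded, so it omits the `READ_ONCE()`/barrier handling real kernel code needs.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4

/*
 * Hypothetical per-CPU "vCPU preempted" flags. In a real guest the
 * hypervisor would supply this; the generic fallback of
 * vcpu_is_preempted() is assumed to simply return false.
 */
static bool preempted[NR_CPUS_SKETCH];

static bool vcpu_is_preempted(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return preempted[cpu];
}

/* Hypothetical snapshot of the lock owner, as seen by the spinner. */
struct owner_sketch {
	bool released;	/* lock has been released */
	bool on_cpu;	/* owner task is currently running */
	int cpu;	/* CPU the owner is running on */
};

enum spin_result {
	SPIN_LOCK_RELEASED,	/* go and try to take the lock */
	SPIN_OWNER_SLEEPING,	/* owner not running: stop spinning */
	SPIN_OWNER_PREEMPTED,	/* owner's vCPU preempted: stop spinning */
	SPIN_BUDGET_EXHAUSTED,	/* spun long enough: stop spinning */
};

/*
 * Spin on the owner in the style of mutex_spin_on_owner(): stop as soon
 * as the owner is no longer running *or* its vCPU has been preempted by
 * the host, since a preempted lock holder cannot release the lock until
 * the host runs it again.
 */
static enum spin_result spin_on_owner_sketch(const struct owner_sketch *owner,
					     int budget)
{
	while (budget-- > 0) {
		if (owner->released)
			return SPIN_LOCK_RELEASED;
		if (!owner->on_cpu)
			return SPIN_OWNER_SLEEPING;
		if (vcpu_is_preempted(owner->cpu))
			return SPIN_OWNER_PREEMPTED;
		/* cpu_relax() would go here in kernel code */
	}
	return SPIN_BUDGET_EXHAUSTED;
}
```

With `preempted[1]` set, a spinner waiting on an owner that runs on CPU 1 gives up at once (`SPIN_OWNER_PREEMPTED`) instead of burning its whole budget.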
* [PATCH 1/1] balloon: check the number of available pages in leak balloon
From: Denis V. Lunev @ 2016-07-11 12:28 UTC
To: virtualization, linux-kernel; +Cc: den, Konstantin Neumoin, Michael S. Tsirkin
From: Konstantin Neumoin <kneumoin@virtuozzo.com>
The balloon has a special mechanism, subscribed to the OOM
notification, that deflates it by a fixed number of pages.
That number is fixed even when the balloon is already fully deflated.
However, leak_balloon() did not expect the number of pages to deflate
to exceed the number of pages taken, and a "BUG" is raised in
balloon_page_dequeue() when the page list is empty.
So, the simplest solution is to check that the number of released
pages is less than or equal to the number of taken pages.
Signed-off-by: Konstantin Neumoin <kneumoin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
---
drivers/virtio/virtio_balloon.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 476c0e3..f6ea8f4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -202,6 +202,8 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
num = min(num, ARRAY_SIZE(vb->pfns));
mutex_lock(&vb->balloon_lock);
+ /* We can't release more pages than taken */
+ num = min(num, (size_t)vb->num_pages);
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
page = balloon_page_dequeue(vb_dev_info);
--
2.1.4
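The effect of the added clamp can be sketched in isolation. This is a hypothetical user-space model, not driver code: `struct balloon_sketch`, `pages_to_leak()` and `PFNS_PER_CALL_SKETCH` (standing in for `ARRAY_SIZE(vb->pfns)`, value assumed) are illustrative names, and the `VIRTIO_BALLOON_PAGES_PER_PAGE` stepping is ignored, i.e. one balloon page per guest page is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for ARRAY_SIZE(vb->pfns); value assumed for illustration. */
#define PFNS_PER_CALL_SKETCH 256

/* Hypothetical stand-in for the relevant part of struct virtio_balloon. */
struct balloon_sketch {
	unsigned int num_pages;	/* pages currently held by the balloon */
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * How many pages one leak_balloon()-style call may release. The second
 * clamp mirrors the one added by the patch: without it, a fixed-size
 * request (e.g. from the OOM notifier) larger than num_pages would keep
 * dequeuing from an already empty page list.
 */
static size_t pages_to_leak(const struct balloon_sketch *vb, size_t num)
{
	num = min_size(num, PFNS_PER_CALL_SKETCH);
	/* We can't release more pages than taken. */
	num = min_size(num, (size_t)vb->num_pages);
	assert(num <= vb->num_pages);
	return num;
}
```

With `num_pages == 0`, a fixed-size deflation request now leaks nothing instead of dequeuing from an empty list.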
* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
From: Jason Wang @ 2016-07-08 9:14 UTC
To: Michael S. Tsirkin, Craig Gallek
Cc: kvm, Eric Dumazet, netdev, LKML, virtualization, brouer,
David Miller
In-Reply-To: <20160708091855-mutt-send-email-mst@redhat.com>
On 2016-07-08 14:19, Michael S. Tsirkin wrote:
> On Wed, Jul 06, 2016 at 01:45:58PM -0400, Craig Gallek wrote:
>> On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang <jasowang@redhat.com> wrote:
>>> Hi all:
>>>
>>> This series tries to switch to use skb array in tun. This is used to
>>> eliminate the spinlock contention between producer and consumer. The
>>> conversion was straightforward: just introduce a tx skb array and use
>>> it instead of sk_receive_queue.
>>
>> I'm seeing the splat below after this series. I'm still wrapping my
>> head around this code, but it appears to be happening because the
>> tun_struct passed into tun_queue_resize is uninitialized.
>> Specifically, iteration over the disabled list_head fails because prev
>> = next = NULL. This seems to happen when a startup script on my test
>> machine changes the queue length. I'll try to figure out what's
>> happening, but if it's obvious to someone else from the stack, please
>> let me know.
> Don't see anything obvious. I'm traveling, will look at it when I'm back
> unless it's fixed by then. Jason, any idea?
>
Looks like Craig has posted a fix to this:
http://patchwork.ozlabs.org/patch/645645/
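The goal of the series, producer and consumer paths that never take the same lock, can be sketched with a small pointer ring modelled loosely on the kernel's `ptr_ring`/`skb_array` design, where a NULL slot marks an empty entry. Everything here (`struct ring_sketch`, `ring_produce()`, `ring_consume()`, the `atomic_flag` spinlocks, the ring size) is a hypothetical user-space illustration built on C11 atomics, not the tun code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE_SKETCH 4

/*
 * Hypothetical pointer ring: a NULL slot means "empty", so the producer
 * only uses the producer index and lock, the consumer only the consumer
 * index and lock, and the two sides never contend on one lock.
 */
struct ring_sketch {
	_Atomic(void *) queue[RING_SIZE_SKETCH];
	size_t producer;
	size_t consumer;
	atomic_flag producer_lock;
	atomic_flag consumer_lock;
};

static void spin_lock_sketch(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin */
}

static void spin_unlock_sketch(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

static void ring_init(struct ring_sketch *r)
{
	for (size_t i = 0; i < RING_SIZE_SKETCH; i++)
		atomic_init(&r->queue[i], NULL);
	r->producer = 0;
	r->consumer = 0;
	atomic_flag_clear(&r->producer_lock);
	atomic_flag_clear(&r->consumer_lock);
}

/* Returns 0 on success, -1 if the ring is full. */
static int ring_produce(struct ring_sketch *r, void *ptr)
{
	int ret = -1;

	assert(ptr != NULL);	/* NULL is reserved for "empty slot" */
	spin_lock_sketch(&r->producer_lock);
	if (atomic_load_explicit(&r->queue[r->producer], memory_order_acquire) == NULL) {
		/* release: the consumer sees *ptr fully written */
		atomic_store_explicit(&r->queue[r->producer], ptr, memory_order_release);
		r->producer = (r->producer + 1) % RING_SIZE_SKETCH;
		ret = 0;
	}
	spin_unlock_sketch(&r->producer_lock);
	return ret;
}

/* Returns the oldest pointer, or NULL if the ring is empty. */
static void *ring_consume(struct ring_sketch *r)
{
	void *ptr;

	spin_lock_sketch(&r->consumer_lock);
	ptr = atomic_load_explicit(&r->queue[r->consumer], memory_order_acquire);
	if (ptr != NULL) {
		/* hand the slot back to the producer */
		atomic_store_explicit(&r->queue[r->consumer], NULL, memory_order_release);
		r->consumer = (r->consumer + 1) % RING_SIZE_SKETCH;
	}
	spin_unlock_sketch(&r->consumer_lock);
	return ptr;
}
```

Because each side inspects only its own slot, one side holding its lock never blocks the other; with four slots, a fifth `ring_produce()` fails with -1 until `ring_consume()` frees a slot.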
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
From: Michael S. Tsirkin @ 2016-07-08 6:19 UTC
To: Craig Gallek
Cc: Eric Dumazet, kvm, netdev, LKML, virtualization, brouer,
David Miller
In-Reply-To: <CAEfhGixKLfLOj11ERgPhpVB+UXbqxrvkQKgQgn19bDjroO_4dQ@mail.gmail.com>
On Wed, Jul 06, 2016 at 01:45:58PM -0400, Craig Gallek wrote:
> On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang <jasowang@redhat.com> wrote:
> > Hi all:
> >
> > This series tries to switch to use skb array in tun. This is used to
> > eliminate the spinlock contention between producer and consumer. The
> > conversion was straightforward: just introduce a tx skb array and use
> > it instead of sk_receive_queue.
>
> I'm seeing the splat below after this series. I'm still wrapping my
> head around this code, but it appears to be happening because the
> tun_struct passed into tun_queue_resize is uninitialized.
> Specifically, iteration over the disabled list_head fails because prev
> = next = NULL. This seems to happen when a startup script on my test
> machine changes the queue length. I'll try to figure out what's
> happening, but if it's obvious to someone else from the stack, please
> let me know.
Uploading your .config might help BTW.
> [ 72.322236] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000010
> [ 72.329993] IP: [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
> [ 72.336032] PGD 7f054f1067 PUD 7ef6f3f067 PMD 0
> [ 72.340616] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
> [ 72.345498] gsmi: Log Shutdown Reason 0x03
> [ 72.349541] Modules linked in: w1_therm wire cdc_acm ehci_pci
> ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
> [ 72.359870] CPU: 12 PID: 7820 Comm: set.ixion-haswe Not tainted
> 4.7.0-dbx-DEV #10
> [ 72.360253] mlx4_en: eth0: Link Up
> [ 72.370618] Hardware name: Intel Grantley,Wellsburg/Ixion_IT_15,
> BIOS 2.50.0 01/21/2016
> [ 72.378525] task: ffff883f2501e8c0 ti: ffff883f3ef08000 task.ti:
> ffff883f3ef08000
> [ 72.385917] RIP: 0010:[<ffffffff8153c1a0>] [<ffffffff8153c1a0>]
> tun_device_event+0x110/0x340
> [ 72.394353] RSP: 0018:ffff883f3ef0bbe8 EFLAGS: 00010202
> [ 72.399599] RAX: fffffffffffffae8 RBX: ffff887ef9883378 RCX: 0000000000000000
> [ 72.406647] RDX: 0000000000000000 RSI: 0000000000000028 RDI: 0000000000000000
> [ 72.413694] RBP: ffff883f3ef0bc58 R08: 0000000000000000 R09: 0000000000000001
> [ 72.420742] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000010
> [ 72.427789] R13: 0000000000000000 R14: 0000000000000001 R15: ffff883f3ef0bd10
> [ 72.434837] FS: 00007fac4e5dd700(0000) GS:ffff883f7f700000(0000)
> knlGS:0000000000000000
> [ 72.442832] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 72.448507] CR2: 0000000000000010 CR3: 0000007ef66ac000 CR4: 00000000001406e0
> [ 72.455555] Stack:
> [ 72.457541] ffff883f3ef0bc18 0000000000000246 000000000000001e
> ffff887ef9880000
> [ 72.464880] ffff883f3ef0bd10 0000000000000000 0000000000000000
> ffffffff00000000
> [ 72.472219] ffff883f3ef0bc58 ffffffff81d24070 00000000fffffff9
> ffffffff81cec7a0
> [ 72.479559] Call Trace:
> [ 72.481986] [<ffffffff810eeb0d>] notifier_call_chain+0x5d/0x80
> [ 72.487844] [<ffffffff816365c0>] ? show_tx_maxrate+0x30/0x30
> [ 72.493520] [<ffffffff810eeb3e>] __raw_notifier_call_chain+0xe/0x10
> [ 72.499801] [<ffffffff810eeb56>] raw_notifier_call_chain+0x16/0x20
> [ 72.506001] [<ffffffff8160eb30>] call_netdevice_notifiers_info+0x40/0x70
> [ 72.512706] [<ffffffff8160ec36>] call_netdevice_notifiers+0x16/0x20
> [ 72.518986] [<ffffffff816365f8>] change_tx_queue_len+0x38/0x80
> [ 72.524838] [<ffffffff816381cf>] netdev_store.isra.5+0xbf/0xd0
> [ 72.530688] [<ffffffff81638330>] tx_queue_len_store+0x50/0x60
> [ 72.536459] [<ffffffff814a6798>] dev_attr_store+0x18/0x30
> [ 72.541888] [<ffffffff812ea3ff>] sysfs_kf_write+0x4f/0x70
> [ 72.547306] [<ffffffff812e9507>] kernfs_fop_write+0x147/0x1d0
> [ 72.553077] [<ffffffff81134a4f>] ? rcu_read_lock_sched_held+0x8f/0xa0
> [ 72.559534] [<ffffffff8125a108>] __vfs_write+0x28/0x120
> [ 72.564781] [<ffffffff8111b137>] ? percpu_down_read+0x57/0x90
> [ 72.570542] [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
> [ 72.576303] [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
> [ 72.582063] [<ffffffff8125bd5e>] vfs_write+0xbe/0x1b0
> [ 72.587138] [<ffffffff8125c092>] SyS_write+0x52/0xa0
> [ 72.592135] [<ffffffff817528a5>] entry_SYSCALL_64_fastpath+0x18/0xa8
> [ 72.598497] Code: 45 31 f6 48 8b 93 78 33 00 00 48 81 c3 78 33 00
> 00 48 39 d3 48 8d 82 e8 fa ff ff 74 25 48 8d b0 40 05 00 00 49 63 d6
> 41 83 c6 01 <49> 89 34 d4 48 8b 90 18 05 00 00 48 39 d3 48 8d 82 e8 fa
> ff ff
> [ 72.617767] RIP [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
> [ 72.623883] RSP <ffff883f3ef0bbe8>
> [ 72.627327] CR2: 0000000000000010
> [ 72.630638] ---[ end trace b0c54137cf861b91 ]---
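Craig's diagnosis quoted above is that a `list_head` whose `prev` and `next` are both NULL is being walked. The hypothetical sketch below re-implements just enough of the kernel list API to show why a head must go through `INIT_LIST_HEAD()` (so that it points to itself) before iteration; `list_head_is_initialized()` and `list_count()` are illustrative helpers, and this is not the fix posted for tun.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal re-implementation of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;	/* an empty list points to itself */
	list->prev = list;
}

/* Hypothetical helper: a zero-filled head has next == prev == NULL. */
static int list_head_is_initialized(const struct list_head *head)
{
	return head->next != NULL && head->prev != NULL;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	assert(list_head_is_initialized(head));
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Count entries the way list_for_each() walks them. On a zeroed head the
 * first step would set pos = NULL and then dereference it, the kind of
 * NULL dereference seen in the splat; this sketch refuses instead.
 */
static int list_count(const struct list_head *head)
{
	int n = 0;

	if (!list_head_is_initialized(head))
		return -1;
	for (const struct list_head *pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```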
* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
From: Michael S. Tsirkin @ 2016-07-08 6:19 UTC
To: Craig Gallek
Cc: Eric Dumazet, kvm, netdev, LKML, virtualization, brouer,
David Miller
In-Reply-To: <CAEfhGixKLfLOj11ERgPhpVB+UXbqxrvkQKgQgn19bDjroO_4dQ@mail.gmail.com>
On Wed, Jul 06, 2016 at 01:45:58PM -0400, Craig Gallek wrote:
> On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang <jasowang@redhat.com> wrote:
> > Hi all:
> >
> > This series tries to switch to use skb array in tun. This is used to
> > eliminate the spinlock contention between producer and consumer. The
> > conversion was straightforward: just introduce a tx skb array and use
> > it instead of sk_receive_queue.
>
> I'm seeing the splat below after this series. I'm still wrapping my
> head around this code, but it appears to be happening because the
> tun_struct passed into tun_queue_resize is uninitialized.
> Specifically, iteration over the disabled list_head fails because prev
> = next = NULL. This seems to happen when a startup script on my test
> machine changes the queue length. I'll try to figure out what's
> happening, but if it's obvious to someone else from the stack, please
> let me know.
Don't see anything obvious. I'm traveling, will look at it when I'm back
unless it's fixed by then. Jason, any idea?
> [... oops trace snipped; identical to the one quoted in the previous message ...]
* Re: [PATCH 0/2] virtio/s390 patches for 4.8
From: Michael S. Tsirkin @ 2016-07-07 16:42 UTC
To: Cornelia Huck; +Cc: virtualization, kvm, linux-s390
In-Reply-To: <1467904077-1887-1-git-send-email-cornelia.huck@de.ibm.com>
On Thu, Jul 07, 2016 at 05:07:55PM +0200, Cornelia Huck wrote:
> Michael,
>
> here are two virtio/s390 patches for 4.8.
>
> First, Jing Liu noticed that she could trigger panics while playing
> around with hvc0 as preferred console but no virtio console: This
> can be fixed by not discarding our early_put_chars after init (as
> the minimal fix).
>
> This made us wonder why we still have that code around when no current
> host code supports the old transport: We have no idea whether this
> still works, and it's probably a good idea to put a deprecation
> message in there to check whether anyone screams.
>
> Patches are against your vhost branch.
thanks!
> Christian Borntraeger (1):
> virtio/s390: keep early_put_chars
>
> Cornelia Huck (1):
> virtio/s390: deprecate old transport
>
> arch/s390/Kconfig | 13 +++++++++++++
> drivers/s390/virtio/Makefile | 6 +++++-
> drivers/s390/virtio/kvm_virtio.c | 4 +++-
> 3 files changed, 21 insertions(+), 2 deletions(-)
>
> --
> 2.6.6
* [PATCH 2/2] virtio/s390: deprecate old transport
From: Cornelia Huck @ 2016-07-07 15:07 UTC
To: mst; +Cc: virtualization, kvm, linux-s390
In-Reply-To: <1467904077-1887-1-git-send-email-cornelia.huck@de.ibm.com>
There have only ever been two host implementations of the old
s390-virtio (pre-ccw) transport: the experimental kuli userspace,
and qemu. As qemu switched its default to ccw with 2.4 (with most
users having used ccw well before that) and removed the old transport
entirely in 2.6, s390-virtio probably hasn't been in active use for
quite some time and is therefore likely to bitrot.
Let's start the slow march towards removing the code by deprecating
it.
Note that this also deprecates the early virtio console code, which
has been causing trouble in the guest without being wired up in any
relevant hypervisor code.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
arch/s390/Kconfig | 13 +++++++++++++
drivers/s390/virtio/Makefile | 6 +++++-
drivers/s390/virtio/kvm_virtio.c | 2 ++
3 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index a8c2590..80fffc2 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -875,4 +875,17 @@ config S390_GUEST
Select this option if you want to run the kernel as a guest under
the KVM hypervisor.
+config S390_GUEST_OLD_TRANSPORT
+ def_bool y
+ prompt "Guest support for old s390 virtio transport (DEPRECATED)"
+ depends on S390_GUEST
+ help
+ Enable this option to add support for the old s390-virtio
+ transport (i.e. virtio devices NOT based on virtio-ccw). This
+ type of virtio devices is only available on the experimental
+ kuli userspace or with old (< 2.6) qemu. If you are running
+ with a modern version of qemu (which supports virtio-ccw since
+ 1.4 and uses it by default since version 2.4), you probably won't
+ need this.
+
endmenu
diff --git a/drivers/s390/virtio/Makefile b/drivers/s390/virtio/Makefile
index 241891a..df40692 100644
--- a/drivers/s390/virtio/Makefile
+++ b/drivers/s390/virtio/Makefile
@@ -6,4 +6,8 @@
# it under the terms of the GNU General Public License (version 2 only)
# as published by the Free Software Foundation.
-obj-$(CONFIG_S390_GUEST) += kvm_virtio.o virtio_ccw.o
+s390-virtio-objs := virtio_ccw.o
+ifdef CONFIG_S390_GUEST_OLD_TRANSPORT
+s390-virtio-objs += kvm_virtio.o
+endif
+obj-$(CONFIG_S390_GUEST) += $(s390-virtio-objs)
diff --git a/drivers/s390/virtio/kvm_virtio.c b/drivers/s390/virtio/kvm_virtio.c
index b0a849f..5e5c11f 100644
--- a/drivers/s390/virtio/kvm_virtio.c
+++ b/drivers/s390/virtio/kvm_virtio.c
@@ -458,6 +458,8 @@ static int __init kvm_devices_init(void)
if (test_devices_support(total_memory_size) < 0)
return -ENODEV;
+ pr_warn("The s390-virtio transport is deprecated. Please switch to a modern host providing virtio-ccw.\n");
+
rc = vmem_add_mapping(total_memory_size, PAGE_SIZE);
if (rc)
return rc;
--
2.6.6
* [PATCH 1/2] virtio/s390: keep early_put_chars
From: Cornelia Huck @ 2016-07-07 15:07 UTC
To: mst; +Cc: linux-s390, kvm, Jing Liu, virtualization
In-Reply-To: <1467904077-1887-1-git-send-email-cornelia.huck@de.ibm.com>
From: Christian Borntraeger <borntraeger@de.ibm.com>
In case the registration of the hvc tty never happens AND the kernel
thinks that hvc0 is the preferred console, we should keep the early
printk function to avoid a kernel panic caused by calling code that
has been discarded after init.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
drivers/s390/virtio/kvm_virtio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/s390/virtio/kvm_virtio.c b/drivers/s390/virtio/kvm_virtio.c
index 1d060fd..b0a849f 100644
--- a/drivers/s390/virtio/kvm_virtio.c
+++ b/drivers/s390/virtio/kvm_virtio.c
@@ -482,7 +482,7 @@ static int __init kvm_devices_init(void)
}
/* code for early console output with virtio_console */
-static __init int early_put_chars(u32 vtermno, const char *buf, int count)
+static int early_put_chars(u32 vtermno, const char *buf, int count)
{
char scratch[17];
unsigned int len = count;
--
2.6.6
* [PATCH 0/2] virtio/s390 patches for 4.8
From: Cornelia Huck @ 2016-07-07 15:07 UTC
To: mst; +Cc: virtualization, kvm, linux-s390
Michael,
here are two virtio/s390 patches for 4.8.
First, Jing Liu noticed that she could trigger panics while playing
around with hvc0 as preferred console but no virtio console: This
can be fixed by not discarding our early_put_chars after init (as
the minimal fix).
This made us wonder why we still have that code around when no current
host code supports the old transport: We have no idea whether this
still works, and it's probably a good idea to put a deprecation
message in there to check whether anyone screams.
Patches are against your vhost branch.
Christian Borntraeger (1):
virtio/s390: keep early_put_chars
Cornelia Huck (1):
virtio/s390: deprecate old transport
arch/s390/Kconfig | 13 +++++++++++++
drivers/s390/virtio/Makefile | 6 +++++-
drivers/s390/virtio/kvm_virtio.c | 4 +++-
3 files changed, 21 insertions(+), 2 deletions(-)
--
2.6.6
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Peter Zijlstra @ 2016-07-07 11:21 UTC
To: Wanpeng Li
Cc: linux-s390, Davidlohr Bueso, benh, kvm, mpe, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, Pan Xinhui,
schwidefsky, Paolo Bonzini, Paul McKenney, linuxppc-dev
In-Reply-To: <20160707094215.GT30921@twins.programming.kicks-ass.net>
On Thu, Jul 07, 2016 at 11:42:15AM +0200, Peter Zijlstra wrote:
> +static void update_steal_time_preempt(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_steal_time *st;
> +
> + if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
> + return;
> +
> + if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
> + &vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
> + return;
> +
> + st = &vcpu->arch.st.steal;
> +
> + st->pad[KVM_ST_PAD_PREEMPT] = 1; /* we've stopped running */
So maybe have this be:
... = kvm_vcpu_running();
That avoids marking the vcpu preempted when we do hlt and such.
> + kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
> + st, sizeof(struct kvm_steal_time));
> +}
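Peter's suggestion can be modelled as a pair of host-side hooks around scheduling: on sched-out the field records whether the vCPU was actually runnable, and on sched-in it is cleared. In this hypothetical sketch, `runnable` stands in for what the suggested `kvm_vcpu_running()` check would report, and `struct vcpu_sketch`, `sketch_sched_out()`, `sketch_sched_in()` and `guest_vcpu_is_preempted()` are illustrative names, not KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest-visible field; only "preempted" is modelled here. */
struct st_preempt_sketch {
	uint32_t preempted;
};

/* Hypothetical vCPU state kept by the host. */
struct vcpu_sketch {
	bool runnable;	/* false once the guest has halted (HLT) and is idle */
	struct st_preempt_sketch st;
};

/*
 * Host scheduled the vCPU thread out. Storing "was runnable" instead of
 * a constant 1 keeps an idle, halted vCPU from being reported as
 * preempted.
 */
static void sketch_sched_out(struct vcpu_sketch *vcpu)
{
	vcpu->st.preempted = vcpu->runnable ? 1 : 0;
}

/* Host scheduled the vCPU thread back in: it is running again. */
static void sketch_sched_in(struct vcpu_sketch *vcpu)
{
	vcpu->st.preempted = 0;
}

/*
 * Guest side: a plain read that does not clear the field, so repeated
 * checks while the vCPU is still scheduled out give the same answer.
 */
static bool guest_vcpu_is_preempted(const struct vcpu_sketch *vcpu)
{
	return vcpu->st.preempted != 0;
}
```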
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Peter Zijlstra @ 2016-07-07 11:15 UTC
To: Wanpeng Li
Cc: linux-s390, Davidlohr Bueso, benh, kvm, mpe, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, Pan Xinhui,
schwidefsky, Paolo Bonzini, Paul McKenney, linuxppc-dev
In-Reply-To: <CANRm+CxKVqrFB-cRPEvmUamZ5Y8cOGe3pyc0LMJy0t9_jo+i0Q@mail.gmail.com>
On Thu, Jul 07, 2016 at 06:27:26PM +0800, Wanpeng Li wrote:
> In addition, I see that Xen's vcpu_runstate_info::state is updated during
> schedule, so I think I can do this similarly through the KVM preemption
> notifier. IIUC, the Xen hypervisor has a VCPUOP_get_runstate_info hypercall
> implementation, so the desired interface can be implemented if they add
> a hypercall call site in domU. I can add a similar hypercall to KVM.
So I suspect Xen has the page it's writing to pinned in memory, so that a
write to it is guaranteed not to fault.
Otherwise I cannot see this working.
That is part of the larger surgery required for KVM steal time to get
'fixed'. Currently that steal time stuff uses kvm_write_guest_cached()
which appears to be able to fault in guest pages.
Or I'm not reading this stuff right; which is entirely possible.
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Peter Zijlstra @ 2016-07-07 11:09 UTC
To: Wanpeng Li
Cc: linux-s390, Davidlohr Bueso, benh, kvm, mpe, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, Pan Xinhui,
schwidefsky, Paolo Bonzini, Paul McKenney, linuxppc-dev
In-Reply-To: <20160707094215.GT30921@twins.programming.kicks-ass.net>
On Thu, Jul 07, 2016 at 11:42:15AM +0200, Peter Zijlstra wrote:
> I suspect you want something like so; except this has holes in.
>
> We clear KVM_ST_PAD_PREEMPT before disabling preemption and we set it
> after enabling it, this means that if we get preempted in between, the
> vcpu is reported as running even though it very much is not.
>
> Fixing that requires much larger surgery.
Note that this same hole is already a 'problem' for steal time
accounting. The thread can accrue further delays (iow steal time) after
we've called record_steal_time(). These delays will go unaccounted.
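The accounting hole described here can be seen in a simplified model of the `run_delay` arithmetic used by `record_steal_time()` (the subtraction visible at the end of the patch in this thread). `struct steal_sketch` and `record_steal_sketch()` are hypothetical; in this model, delay accrued after a record only reaches the guest at the next record, if there is one.

```c
#include <stdint.h>

/* Hypothetical host-side bookkeeping for one vCPU thread. */
struct steal_sketch {
	uint64_t run_delay;	/* total time spent waiting to run, host clock */
	uint64_t last_steal;	/* run_delay at the previous record */
	uint64_t steal;		/* steal time published to the guest */
};

/*
 * Publish the delay accrued since the previous record. Any delay that
 * accrues after this call (e.g. the thread is preempted before it
 * re-enters the guest) stays invisible to the guest until the next call.
 */
static void record_steal_sketch(struct steal_sketch *s)
{
	s->steal += s->run_delay - s->last_steal;
	s->last_steal = s->run_delay;
}
```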
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Peter Zijlstra @ 2016-07-07 11:08 UTC
To: Wanpeng Li
Cc: linux-s390, Davidlohr Bueso, benh, kvm, mpe, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, Pan Xinhui,
schwidefsky, Paolo Bonzini, Paul McKenney, linuxppc-dev
In-Reply-To: <CANRm+CyrPbOVpJ6-Wa=g6oXTU0vVfw-ciLgm-duZeVF6MBH=RA@mail.gmail.com>
On Thu, Jul 07, 2016 at 06:12:51PM +0800, Wanpeng Li wrote:
> Btw, doing this in the preemption
> notifier means that the vCPU is really preempted on the host;
> relying on vmexit has different semantics, I think.
Not sure; suppose the vcpu is about to reenter, e.g., we're in
vcpu_enter_guest() but before the preempt_disable(), and the thread gets
preempted. Are we then not preempted? The vcpu might still very much be
in running state but had to service a vmexit due to an MSR or IO port
access or whatnot.
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Wanpeng Li @ 2016-07-07 10:27 UTC
To: Peter Zijlstra
Cc: linux-s390, Davidlohr Bueso, benh, kvm, mpe, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, Pan Xinhui,
schwidefsky, Paolo Bonzini, Paul McKenney, linuxppc-dev
In-Reply-To: <CANRm+CyrPbOVpJ6-Wa=g6oXTU0vVfw-ciLgm-duZeVF6MBH=RA@mail.gmail.com>
2016-07-07 18:12 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2016-07-07 17:42 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
>> On Thu, Jul 07, 2016 at 04:48:05PM +0800, Wanpeng Li wrote:
>>> 2016-07-06 20:28 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>> > Hmm, you're right. We can use bit 0 of struct kvm_steal_time's flags to
>>> > indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
>>> > VCPU has been scheduled out since the last time the guest reset the bit.
>>> > The guest can use an xchg to test-and-clear it. The bit can be
>>> > accessed at any time, independent of the version field.
>>>
>>> If a vCPU is preempted and the guest checks it several times before
>>> this vCPU is scheduled back in, the first check returns "vCPU is
>>> preempted"; however, since the field is then cleared, the second
>>> check returns "vCPU is running".
>>>
>>> Do you mean we should call record_steal_time() in both kvm_sched_in()
>>> and kvm_sched_out() to record this field? Btw, should we keep both
>>> vcpu->preempted and kvm_steal_time's "vCPU preempted" field at the
>>> same time?
>>
>> I suspect you want something like so; except this has holes in.
>>
>> We clear KVM_ST_PAD_PREEMPT before disabling preemption and we set it
>> after enabling it, this means that if we get preempted in between, the
>> vcpu is reported as running even though it very much is not.
>
> Paolo also pointed this out to me offline yesterday: "Please change
> pad[12] to "__u32 preempted; __u32 pad[11];" too, and remember to
> update Documentation/virtual/kvm/msr.txt!". Btw, doing this in the
> preemption notifier means that the vCPU is really preempted on the
> host; relying on vmexit has different semantics, I think.
In addition, I see that Xen's vcpu_runstate_info::state is updated during
schedule, so I think I can do this similarly through the KVM preemption
notifier. IIUC, the Xen hypervisor has a VCPUOP_get_runstate_info hypercall
implementation, so the desired interface can be implemented if they add
a hypercall call site in domU. I can add a similar hypercall to KVM.
Paolo, thoughts?
Regards,
Wanpeng Li
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Wanpeng Li @ 2016-07-07 10:12 UTC
To: Peter Zijlstra
Cc: linux-s390, Davidlohr Bueso, benh, kvm, mpe, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, Pan Xinhui,
schwidefsky, Paolo Bonzini, Paul McKenney, linuxppc-dev
In-Reply-To: <20160707094215.GT30921@twins.programming.kicks-ass.net>
2016-07-07 17:42 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Thu, Jul 07, 2016 at 04:48:05PM +0800, Wanpeng Li wrote:
>> 2016-07-06 20:28 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> > Hmm, you're right. We can use bit 0 of struct kvm_steal_time's flags to
>> > indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
>> > VCPU has been scheduled out since the last time the guest reset the bit.
>> > The guest can use an xchg to test-and-clear it. The bit can be
>> > accessed at any time, independent of the version field.
>>
>> If a vCPU is preempted and the guest checks it several times before
>> this vCPU is scheduled back in, the first check returns "vCPU is
>> preempted"; however, since the field is then cleared, the second
>> check returns "vCPU is running".
>>
>> Do you mean we should call record_steal_time() in both kvm_sched_in()
>> and kvm_sched_out() to record this field? Btw, should we keep both
>> vcpu->preempted and kvm_steal_time's "vCPU preempted" field at the
>> same time?
>
> I suspect you want something like so; except this has holes in.
>
> We clear KVM_ST_PAD_PREEMPT before disabling preemption and we set it
> after enabling it, this means that if we get preempted in between, the
> vcpu is reported as running even though it very much is not.
Paolo also pointed this out to me offline yesterday: "Please change
pad[12] to "__u32 preempted; __u32 pad[11];" too, and remember to
update Documentation/virtual/kvm/msr.txt!". Btw, doing this in the
preemption notifier means that the vCPU is really preempted on the
host; relying on vmexit has different semantics, I think.
Regards,
Wanpeng Li
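Replacing `pad[12]` with `__u32 preempted; __u32 pad[11];`, as quoted above, keeps the shared structure's size and the offsets of the existing fields unchanged, which is what makes it a compatible ABI extension. The compile-time check below assumes the 4.7-era x86 layout of `struct kvm_steal_time` (`steal`, `version`, `flags`, `pad[12]`, 64 bytes); the `_sketch` struct names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of struct kvm_steal_time as assumed here (64 bytes). */
struct kvm_steal_time_old_sketch {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint32_t pad[12];
};

/* Suggested change: carve "preempted" out of the padding. */
struct kvm_steal_time_new_sketch {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint32_t preempted;
	uint32_t pad[11];
};

/* The guest/host ABI must not change size or move existing fields. */
_Static_assert(sizeof(struct kvm_steal_time_old_sketch) ==
	       sizeof(struct kvm_steal_time_new_sketch), "size changed");
_Static_assert(offsetof(struct kvm_steal_time_new_sketch, flags) ==
	       offsetof(struct kvm_steal_time_old_sketch, flags), "flags moved");
_Static_assert(offsetof(struct kvm_steal_time_new_sketch, preempted) ==
	       offsetof(struct kvm_steal_time_old_sketch, pad),
	       "preempted is not where pad[0] was");
```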
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Peter Zijlstra @ 2016-07-07 9:42 UTC
To: Wanpeng Li
Cc: linux-s390, Davidlohr Bueso, benh, kvm, mpe, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, Pan Xinhui,
schwidefsky, Paolo Bonzini, Paul McKenney, linuxppc-dev
In-Reply-To: <CANRm+Cz1Ncxm7g0ophT+ijUmqnmTq9VwCZau3wMS56Vkm2+7fQ@mail.gmail.com>
On Thu, Jul 07, 2016 at 04:48:05PM +0800, Wanpeng Li wrote:
> 2016-07-06 20:28 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> > Hmm, you're right. We can use bit 0 of struct kvm_steal_time's flags to
> > indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
> > VCPU has been scheduled out since the last time the guest reset the bit.
> > The guest can use an xchg to test-and-clear it. The bit can be
> > accessed at any time, independent of the version field.
>
> If a vCPU is preempted and the guest checks it several times before
> this vCPU is scheduled back in, the first check returns "vCPU is
> preempted"; however, since the field is then cleared, the second
> check returns "vCPU is running".
>
> Do you mean we should call record_steal_time() in both kvm_sched_in()
> and kvm_sched_out() to record this field? Btw, should we keep both
> vcpu->preempted and kvm_steal_time's "vCPU preempted" field at the
> same time?
I suspect you want something like this, except that it has holes in it.
We clear KVM_ST_PAD_PREEMPT before disabling preemption and we set it
after enabling it, this means that if we get preempted in between, the
vcpu is reported as running even though it very much is not.
Fixing that requires much larger surgery.
---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2766723c951..117270df43b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1997,8 +1997,29 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
vcpu->arch.pv_time_enabled = false;
}
+static void update_steal_time_preempt(struct kvm_vcpu *vcpu)
+{
+ struct kvm_steal_time *st;
+
+ if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
+ return;
+
+ if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+ &vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
+ return;
+
+ st = &vcpu->arch.st.steal;
+
+ st->pad[KVM_ST_PAD_PREEMPT] = 1; /* we've stopped running */
+
+ kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+ st, sizeof(struct kvm_steal_time));
+}
+
static void record_steal_time(struct kvm_vcpu *vcpu)
{
+ struct kvm_steal_time *st;
+
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
@@ -2006,29 +2027,34 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
return;
- if (vcpu->arch.st.steal.version & 1)
- vcpu->arch.st.steal.version += 1; /* first time write, random junk */
+ st = &vcpu->arch.st.steal;
+
+ if (st->version & 1) {
+ st->flags = KVM_ST_FLAG_PREEMPT;
+ st->version += 1; /* first time write, random junk */
+ }
- vcpu->arch.st.steal.version += 1;
+ st->version += 1;
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
- &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+ st, sizeof(struct kvm_steal_time));
smp_wmb();
- vcpu->arch.st.steal.steal += current->sched_info.run_delay -
+ st->steal += current->sched_info.run_delay -
vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
+ st->pad[KVM_ST_PAD_PREEMPT] = 0; /* we're about to start running */
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
- &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+ st, sizeof(struct kvm_steal_time));
smp_wmb();
- vcpu->arch.st.steal.version += 1;
+ st->version += 1;
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
- &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+ st, sizeof(struct kvm_steal_time));
}
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -6693,6 +6719,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
preempt_enable();
+ update_steal_time_preempt(vcpu);
+
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
/*
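As a rough illustration of the semantics this patch aims for (not of the patch itself), here is a user-space C11 model: the host sets a "preempted" slot when the vCPU stops running and clears it just before the next entry, and the guest only reads it, so repeated checks agree until the host changes the slot. The struct name, the helper names and KVM_ST_PAD_PREEMPT == 0 are assumptions; the patch does not show those definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: slot 0, following Paolo's pad[0] suggestion. */
#define KVM_ST_PAD_PREEMPT 0

/* Hypothetical stand-in for the shared steal-time area. */
struct steal_time_sketch {
	_Atomic uint32_t pad[12];
};

/* Host, like update_steal_time_preempt(): the vCPU has stopped running. */
static void host_mark_preempted(struct steal_time_sketch *st)
{
	atomic_store(&st->pad[KVM_ST_PAD_PREEMPT], 1);
}

/* Host, like the pad store in record_steal_time(): about to run again. */
static void host_mark_running(struct steal_time_sketch *st)
{
	atomic_store(&st->pad[KVM_ST_PAD_PREEMPT], 0);
}

/* Guest: a plain load with no clearing, so repeated calls agree. */
static bool guest_vcpu_is_preempted(struct steal_time_sketch *st)
{
	return atomic_load(&st->pad[KVM_ST_PAD_PREEMPT]) != 0;
}
```

The model ignores the window described above: the clear happens before preemption is disabled and the set after it is re-enabled, so a host preemption in between still leaves the slot reading 0.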
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Wanpeng Li @ 2016-07-07 8:48 UTC
To: Paolo Bonzini
Cc: linux-s390, Davidlohr Bueso, benh, kvm, Peter Zijlstra,
Pan Xinhui, boqun.feng, will.deacon, linux-kernel@vger.kernel.org,
Waiman Long, virtualization, Ingo Molnar, Paul Mackerras, mpe,
schwidefsky, Paul McKenney, linuxppc-dev
In-Reply-To: <8e8edf1b-b64b-3c44-b580-b9271663844c@redhat.com>
2016-07-06 20:28 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 06/07/2016 14:08, Wanpeng Li wrote:
>> 2016-07-06 18:44 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>>
>>>
>>> On 06/07/2016 08:52, Peter Zijlstra wrote:
>>>> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>>>>> change from v1:
>>>>> a simpler definition of default vcpu_is_preempted
>>>>> skip machine type check on ppc, and add config. remove dedicated macro.
>>>>> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>>>>> add more comments
>>>>> thanks boqun and Peter's suggestion.
>>>>>
>>>>> This patch set aims to fix lock holder preemption issues.
>>>>>
>>>>> test-case:
>>>>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>>>>
>>>>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
>>>>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>>>>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>>>>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>>>>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>>>>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>>>>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>>>>>
>>>>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
>>>>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>>>>> These spin_on_owner variant also cause rcu stall before we apply this patch set
>>>>
>>>> Paolo, could you help out with an (x86) KVM interface for this?
>>>
>>> If it's just for spin loops, you can check if the version field in the
>>> steal time structure has changed.
>>
>> Steal time will not be updated until ahead of next vmentry except
>> wrmsr MSR_KVM_STEAL_TIME. So it can't represent it is preempted
>> currently, right?
>
> Hmm, you're right. We can use bit 0 of struct kvm_steal_time's flags to
> indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
> VCPU has been scheduled out since the last time the guest reset the bit.
> The guest can use an xchg to test-and-clear it. The bit can be
> accessed at any time, independent of the version field.
If one vCPU is preempted and the guest checks it several times before
this vCPU is scheduled in, then the first check reports "vCPU is
preempted"; however, since that check clears the field, the second
check reports "vCPU is running".
Do you mean we should call record_steal_time() in both kvm_sched_in()
and kvm_sched_out() to record this field? Btw, should we keep both
vcpu->preempted and kvm_steal_time's "vCPU preempted" field present
simultaneously?
Regards,
Wanpeng Li
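The concern above can be reproduced with a tiny C11 model. It assumes the host set the flag once at sched-out and the vCPU stays descheduled while the guest polls; the function name and the single-flag layout are hypothetical. With a destructive xchg-style test-and-clear only the first poll sees "preempted"; with a plain load every poll does.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: the host set the flag once when the vCPU was
 * scheduled out, and the vCPU stays out while the guest polls `checks`
 * times. Returns how many polls reported "preempted".
 */
static unsigned polls_reporting_preempted(unsigned checks, bool destructive)
{
	_Atomic uint32_t preempted = 1; /* set by the host at sched-out */
	unsigned hits = 0;

	for (unsigned i = 0; i < checks; i++) {
		/* xchg-style test-and-clear vs. a plain read */
		uint32_t seen = destructive ? atomic_exchange(&preempted, 0)
					    : atomic_load(&preempted);
		if (seen)
			hits++;
	}
	return hits;
}
```

With five polls, the destructive variant reports "preempted" once and the read-only variant five times, even though the vCPU never ran in between.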
* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
From: Craig Gallek @ 2016-07-06 17:45 UTC
To: Jason Wang
Cc: Eric Dumazet, kvm, mst, netdev, LKML, virtualization, brouer,
David Miller
On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang <jasowang@redhat.com> wrote:
> Hi all:
>
> This series tries to switch to use skb array in tun. This is used to
> eliminate the spinlock contention between producer and consumer. The
> conversion was straightforward: just introdce a tx skb array and use
> it instead of sk_receive_queue.
I'm seeing the splat below after this series. I'm still wrapping my
head around this code, but it appears to be happening because the
tun_struct passed into tun_queue_resize is uninitialized.
Specifically, iteration over the disabled list_head fails because prev
= next = NULL. This seems to happen when a startup script on my test
machine changes the queue length. I'll try to figure out what's
happening, but if it's obvious to someone else from the stack, please
let me know.
[ 72.322236] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 72.329993] IP: [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
[ 72.336032] PGD 7f054f1067 PUD 7ef6f3f067 PMD 0
[ 72.340616] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[ 72.345498] gsmi: Log Shutdown Reason 0x03
[ 72.349541] Modules linked in: w1_therm wire cdc_acm ehci_pci ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
[ 72.359870] CPU: 12 PID: 7820 Comm: set.ixion-haswe Not tainted 4.7.0-dbx-DEV #10
[ 72.360253] mlx4_en: eth0: Link Up
[ 72.370618] Hardware name: Intel Grantley,Wellsburg/Ixion_IT_15, BIOS 2.50.0 01/21/2016
[ 72.378525] task: ffff883f2501e8c0 ti: ffff883f3ef08000 task.ti: ffff883f3ef08000
[ 72.385917] RIP: 0010:[<ffffffff8153c1a0>] [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
[ 72.394353] RSP: 0018:ffff883f3ef0bbe8 EFLAGS: 00010202
[ 72.399599] RAX: fffffffffffffae8 RBX: ffff887ef9883378 RCX: 0000000000000000
[ 72.406647] RDX: 0000000000000000 RSI: 0000000000000028 RDI: 0000000000000000
[ 72.413694] RBP: ffff883f3ef0bc58 R08: 0000000000000000 R09: 0000000000000001
[ 72.420742] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000010
[ 72.427789] R13: 0000000000000000 R14: 0000000000000001 R15: ffff883f3ef0bd10
[ 72.434837] FS: 00007fac4e5dd700(0000) GS:ffff883f7f700000(0000) knlGS:0000000000000000
[ 72.442832] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 72.448507] CR2: 0000000000000010 CR3: 0000007ef66ac000 CR4: 00000000001406e0
[ 72.455555] Stack:
[ 72.457541] ffff883f3ef0bc18 0000000000000246 000000000000001e ffff887ef9880000
[ 72.464880] ffff883f3ef0bd10 0000000000000000 0000000000000000 ffffffff00000000
[ 72.472219] ffff883f3ef0bc58 ffffffff81d24070 00000000fffffff9 ffffffff81cec7a0
[ 72.479559] Call Trace:
[ 72.481986] [<ffffffff810eeb0d>] notifier_call_chain+0x5d/0x80
[ 72.487844] [<ffffffff816365c0>] ? show_tx_maxrate+0x30/0x30
[ 72.493520] [<ffffffff810eeb3e>] __raw_notifier_call_chain+0xe/0x10
[ 72.499801] [<ffffffff810eeb56>] raw_notifier_call_chain+0x16/0x20
[ 72.506001] [<ffffffff8160eb30>] call_netdevice_notifiers_info+0x40/0x70
[ 72.512706] [<ffffffff8160ec36>] call_netdevice_notifiers+0x16/0x20
[ 72.518986] [<ffffffff816365f8>] change_tx_queue_len+0x38/0x80
[ 72.524838] [<ffffffff816381cf>] netdev_store.isra.5+0xbf/0xd0
[ 72.530688] [<ffffffff81638330>] tx_queue_len_store+0x50/0x60
[ 72.536459] [<ffffffff814a6798>] dev_attr_store+0x18/0x30
[ 72.541888] [<ffffffff812ea3ff>] sysfs_kf_write+0x4f/0x70
[ 72.547306] [<ffffffff812e9507>] kernfs_fop_write+0x147/0x1d0
[ 72.553077] [<ffffffff81134a4f>] ? rcu_read_lock_sched_held+0x8f/0xa0
[ 72.559534] [<ffffffff8125a108>] __vfs_write+0x28/0x120
[ 72.564781] [<ffffffff8111b137>] ? percpu_down_read+0x57/0x90
[ 72.570542] [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
[ 72.576303] [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
[ 72.582063] [<ffffffff8125bd5e>] vfs_write+0xbe/0x1b0
[ 72.587138] [<ffffffff8125c092>] SyS_write+0x52/0xa0
[ 72.592135] [<ffffffff817528a5>] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 72.598497] Code: 45 31 f6 48 8b 93 78 33 00 00 48 81 c3 78 33 00 00 48 39 d3 48 8d 82 e8 fa ff ff 74 25 48 8d b0 40 05 00 00 49 63 d6 41 83 c6 01 <49> 89 34 d4 48 8b 90 18 05 00 00 48 39 d3 48 8d 82 e8 fa ff ff
[ 72.617767] RIP [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
[ 72.623883] RSP <ffff883f3ef0bbe8>
[ 72.627327] CR2: 0000000000000010
[ 72.630638] ---[ end trace b0c54137cf861b91 ]---
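A simplified model of the failure described above: in a circular doubly linked list, an initialized head points to itself, so iterating an empty list stops immediately, while a zeroed head has next == NULL and the first step dereferences NULL. This is a hypothetical user-space list, not the kernel's <linux/list.h> or the tun code, and it says nothing about why the tun_struct was uninitialized.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal circular list, modelled on the kernel's idea only. */
struct list_node {
	struct list_node *next, *prev;
};

/* An initialized head points to itself: the list is empty, not broken. */
static void init_list_head(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_front(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Walks until it returns to the head. On a zeroed head, p starts as NULL,
 * never equals head, and p->next dereferences NULL. */
static size_t count_entries(const struct list_node *head)
{
	size_t n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* The state reported above: prev == next == NULL. */
static bool head_looks_uninitialized(const struct list_node *head)
{
	return head->next == NULL && head->prev == NULL;
}
```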
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Christian Borntraeger @ 2016-07-06 16:56 UTC
To: Peter Zijlstra, Juergen Gross
Cc: linux-s390, xen-devel-request, dave, benh, Pan Xinhui, boqun.feng,
will.deacon, linux-kernel, waiman.long, virtualization, mingo,
paulus, mpe, Martin Schwidefsky, pbonzini, paulmck, linuxppc-dev
In-Reply-To: <20160706081920.GG30921@twins.programming.kicks-ass.net>
On 07/06/2016 10:19 AM, Peter Zijlstra wrote:
> On Wed, Jul 06, 2016 at 09:47:18AM +0200, Juergen Gross wrote:
>> On 06/07/16 08:52, Peter Zijlstra wrote:
>
>>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> Xen support of this interface should be rather easy. Could you please
>> Cc: xen-devel-request@lists.xenproject.org in the next version?
>
> So meta question; aren't all you virt people looking at the regular
> virtualization list? Or should we really dig out all the various
> hypervisor lists and Cc them?
>
Some of the KVM on s390 team reads this, but I would assume that the base s390 team
does not (Martin, can you confirm?), as their main focus was z/VM and LPAR. So maybe adding
linux-s390@vger for generic things does make sense.
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Wanpeng Li @ 2016-07-06 13:03 UTC
To: Paolo Bonzini
Cc: linux-s390, Davidlohr Bueso, benh, kvm, Peter Zijlstra,
Pan Xinhui, boqun.feng, will.deacon, linux-kernel@vger.kernel.org,
Waiman Long, virtualization, Ingo Molnar, Paul Mackerras, mpe,
schwidefsky, Paul McKenney, linuxppc-dev
In-Reply-To: <8e8edf1b-b64b-3c44-b580-b9271663844c@redhat.com>
2016-07-06 20:28 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 06/07/2016 14:08, Wanpeng Li wrote:
>> 2016-07-06 18:44 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>>
>>>
>>> On 06/07/2016 08:52, Peter Zijlstra wrote:
>>>> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>>>>> change from v1:
>>>>> a simpler definition of default vcpu_is_preempted
>>>>> skip machine type check on ppc, and add config. remove dedicated macro.
>>>>> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>>>>> add more comments
>>>>> thanks boqun and Peter's suggestion.
>>>>>
>>>>> This patch set aims to fix lock holder preemption issues.
>>>>>
>>>>> test-case:
>>>>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>>>>
>>>>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
>>>>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>>>>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>>>>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>>>>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>>>>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>>>>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>>>>>
>>>>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
>>>>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>>>>> These spin_on_owner variant also cause rcu stall before we apply this patch set
>>>>
>>>> Paolo, could you help out with an (x86) KVM interface for this?
>>>
>>> If it's just for spin loops, you can check if the version field in the
>>> steal time structure has changed.
>>
>> Steal time will not be updated until ahead of next vmentry except
>> wrmsr MSR_KVM_STEAL_TIME. So it can't represent it is preempted
>> currently, right?
>
> Hmm, you're right. We can use bit 0 of struct kvm_steal_time's flags to
> indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
> VCPU has been scheduled out since the last time the guest reset the bit.
> The guest can use an xchg to test-and-clear it. The bit can be
> accessed at any time, independent of the version field.
I will try to implement it tomorrow; thanks for your proposal. :)
Regards,
Wanpeng Li
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Paolo Bonzini @ 2016-07-06 12:44 UTC
To: Peter Zijlstra, Juergen Gross
Cc: linux-s390, dave, Pan Xinhui, boqun.feng, will.deacon,
xen-devel-request, waiman.long, linux-kernel, mingo, paulus, mpe,
benh, schwidefsky, paulmck, virtualization, linuxppc-dev
In-Reply-To: <20160706081920.GG30921@twins.programming.kicks-ass.net>
On 06/07/2016 10:19, Peter Zijlstra wrote:
>>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> Xen support of this interface should be rather easy. Could you please
>> Cc: xen-devel-request@lists.xenproject.org in the next version?
> So meta question; aren't all you virt people looking at the regular
> virtualization list? Or should we really dig out all the various
> hypervisor lists and Cc them?
I at least skim the subjects.
Paolo
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Paolo Bonzini @ 2016-07-06 12:28 UTC
To: Wanpeng Li
Cc: linux-s390, Davidlohr Bueso, benh, kvm, Peter Zijlstra,
Pan Xinhui, boqun.feng, will.deacon, linux-kernel@vger.kernel.org,
Waiman Long, virtualization, Ingo Molnar, Paul Mackerras, mpe,
schwidefsky, Paul McKenney, linuxppc-dev
In-Reply-To: <CANRm+CwXLsNrC59g6tZk=U+T0Scn_O=b60NPESJXU6uaP7-AsA@mail.gmail.com>
On 06/07/2016 14:08, Wanpeng Li wrote:
> 2016-07-06 18:44 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>
>>
>> On 06/07/2016 08:52, Peter Zijlstra wrote:
>>> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>>>> change from v1:
>>>> a simpler definition of default vcpu_is_preempted
>>>> skip machine type check on ppc, and add config. remove dedicated macro.
>>>> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>>>> add more comments
>>>> thanks boqun and Peter's suggestion.
>>>>
>>>> This patch set aims to fix lock holder preemption issues.
>>>>
>>>> test-case:
>>>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>>>
>>>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
>>>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>>>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>>>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>>>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>>>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>>>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>>>>
>>>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
>>>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>>>> These spin_on_owner variant also cause rcu stall before we apply this patch set
>>>
>>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> If it's just for spin loops, you can check if the version field in the
>> steal time structure has changed.
>
> Steal time will not be updated until ahead of next vmentry except
> wrmsr MSR_KVM_STEAL_TIME. So it can't represent it is preempted
> currently, right?
Hmm, you're right. We can use bit 0 of struct kvm_steal_time's flags to
indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
VCPU has been scheduled out since the last time the guest reset the bit.
The guest can use an xchg to test-and-clear it. The bit can be
accessed at any time, independent of the version field.
Paolo
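A minimal C11 sketch of the proposal above, with hypothetical names (ST_FLAG_PREEMPT_FIELD, struct steal_time_sketch and both helpers are illustrative, not KVM's ABI): the host advertises the field via bit 0 of flags and sets pad[0] whenever the vCPU is scheduled out; the guest uses an atomic exchange to test and clear it in one step, without looking at version.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for "bit 0 of flags means pad[0] is meaningful". */
#define ST_FLAG_PREEMPT_FIELD (1u << 0)

/* Hypothetical, simplified steal-time layout. */
struct steal_time_sketch {
	uint32_t flags;           /* advertised once by the host */
	_Atomic uint32_t pad[12]; /* pad[0]: "VCPU preempted" */
};

/* Host: called whenever the vCPU thread is scheduled out. */
static void host_on_sched_out(struct steal_time_sketch *st)
{
	atomic_store(&st->pad[0], 1);
}

/* Guest: "scheduled out since I last cleared it?" The exchange reads the
 * old value and writes 0 atomically; version is not involved. */
static bool guest_test_and_clear_preempted(struct steal_time_sketch *st)
{
	if (!(st->flags & ST_FLAG_PREEMPT_FIELD))
		return false; /* host does not provide the field */
	return atomic_exchange(&st->pad[0], 0) != 0;
}
```

Because the exchange clears the flag, a second call returns false until the host sets it again; that is exactly the behaviour questioned in the follow-up.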
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Wanpeng Li @ 2016-07-06 12:08 UTC
To: Paolo Bonzini
Cc: linux-s390, Davidlohr Bueso, benh, kvm, Peter Zijlstra,
Pan Xinhui, boqun.feng, will.deacon, linux-kernel@vger.kernel.org,
Waiman Long, virtualization, Ingo Molnar, Paul Mackerras, mpe,
schwidefsky, Paul McKenney, linuxppc-dev
In-Reply-To: <14a24854-9787-e4a1-c9a8-76eba4e97301@redhat.com>
2016-07-06 18:44 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 06/07/2016 08:52, Peter Zijlstra wrote:
>> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>>> change from v1:
>>> a simpler definition of default vcpu_is_preempted
>>> skip machine type check on ppc, and add config. remove dedicated macro.
>>> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>>> add more comments
>>> thanks boqun and Peter's suggestion.
>>>
>>> This patch set aims to fix lock holder preemption issues.
>>>
>>> test-case:
>>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>>
>>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
>>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>>>
>>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
>>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>>> These spin_on_owner variant also cause rcu stall before we apply this patch set
>>>
>>
>> Paolo, could you help out with an (x86) KVM interface for this?
>
> If it's just for spin loops, you can check if the version field in the
> steal time structure has changed.
Steal time is not updated until just before the next vmentry, except
when the guest writes MSR_KVM_STEAL_TIME. So it can't tell whether the
vCPU is currently preempted, right?
Regards,
Wanpeng Li
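For context, the update sequence in record_steal_time() (see the diff in Peter's reply above) is seqcount-like: bump version to odd, write, update steal, write, bump version to even. A C11 model of that writer/reader pair follows; the names and the two-field layout are simplified assumptions. Only the host's pre-entry update moves version, so a guest polling while the vCPU sits descheduled sees no change.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified layout; the real struct has more fields. */
struct steal_time_sketch {
	_Atomic uint64_t steal;
	_Atomic uint32_t version;
};

/* Host side: odd version while the update is in progress, even when done. */
static void host_record_steal(struct steal_time_sketch *st, uint64_t delta)
{
	uint32_t v = atomic_load_explicit(&st->version, memory_order_relaxed);

	atomic_store_explicit(&st->version, v + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&st->steal, delta, memory_order_relaxed);
	atomic_store_explicit(&st->version, v + 2, memory_order_release);
}

/* Guest side: retry while an update is in flight or version moved under us. */
static uint64_t guest_read_steal(struct steal_time_sketch *st, uint32_t *version)
{
	uint32_t v1, v2;
	uint64_t steal;

	do {
		v1 = atomic_load_explicit(&st->version, memory_order_acquire);
		steal = atomic_load_explicit(&st->steal, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		v2 = atomic_load_explicit(&st->version, memory_order_relaxed);
	} while ((v1 & 1) || v1 != v2);

	*version = v1;
	return steal;
}
```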
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Peter Zijlstra @ 2016-07-06 11:59 UTC
To: Paolo Bonzini
Cc: linux-s390, dave, benh, Pan Xinhui, boqun.feng, will.deacon,
linux-kernel, waiman.long, virtualization, mingo, paulus, mpe,
schwidefsky, paulmck, linuxppc-dev
In-Reply-To: <14a24854-9787-e4a1-c9a8-76eba4e97301@redhat.com>
On Wed, Jul 06, 2016 at 12:44:58PM +0200, Paolo Bonzini wrote:
> > Paolo, could you help out with an (x86) KVM interface for this?
>
> If it's just for spin loops, you can check if the version field in the
> steal time structure has changed.
That would require remembering the old value, no?
That would work with a previous interface proposal, see:
http://lkml.kernel.org/r/1466937715-6683-2-git-send-email-xinhui.pan@linux.vnet.ibm.com
the vcpu_get_yield_count() thing would match that I think.
However the current proposal:
http://lkml.kernel.org/r/1467124991-13164-2-git-send-email-xinhui.pan@linux.vnet.ibm.com
dropped that in favour of only vcpu_is_preempted(), which requires being
able to tell if a (remote) vcpu is currently running or not, which iirc,
isn't possible with the steal time sequence count.
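To make the difference concrete, here is a hypothetical spin-on-owner model in C; the arrays stand in for what the spinner would observe on each iteration, and nothing here is the kernel's code. The vcpu_is_preempted()-style loop needs the owner's current state. The count-based loop has to remember the value it saw at entry and only reacts to a change, so an owner that was already preempted when spinning started, and stays preempted, goes unnoticed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum spin_result { SPIN_OWNER_RELEASED, SPIN_OWNER_PREEMPTED, SPIN_GAVE_UP };

/* owned[i]: does the owner still hold the lock at step i?
 * preempted[i]: what a vcpu_is_preempted(owner_cpu) call would return then. */
static enum spin_result spin_state_based(const bool *owned,
					 const bool *preempted, size_t steps)
{
	for (size_t i = 0; i < steps; i++) {
		if (!owned[i])
			return SPIN_OWNER_RELEASED;
		if (preempted[i])
			return SPIN_OWNER_PREEMPTED; /* stop burning cycles */
	}
	return SPIN_GAVE_UP;
}

/* count[i]: a counter the hypervisor bumps on preempt/resume.
 * The spinner must remember the value seen at entry and react to a change. */
static enum spin_result spin_count_based(const bool *owned,
					 const uint32_t *count, size_t steps)
{
	if (steps == 0)
		return SPIN_GAVE_UP;

	uint32_t snapshot = count[0]; /* the remembered "old value" */

	for (size_t i = 0; i < steps; i++) {
		if (!owned[i])
			return SPIN_OWNER_RELEASED;
		if (count[i] != snapshot)
			return SPIN_OWNER_PREEMPTED;
	}
	return SPIN_GAVE_UP;
}
```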
* Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check
From: Balbir Singh @ 2016-07-06 10:54 UTC
To: Pan Xinhui, linux-kernel, linuxppc-dev, virtualization,
linux-s390
Cc: dave, peterz, mpe, boqun.feng, will.deacon, waiman.long, mingo,
paulus, benh, schwidefsky, paulmck
In-Reply-To: <1467124991-13164-3-git-send-email-xinhui.pan@linux.vnet.ibm.com>
On Tue, 2016-06-28 at 10:43 -0400, Pan Xinhui wrote:
> This is to fix some lock holder preemption issues. Some other locks
> implementation do a spin loop before acquiring the lock itself. Currently
> kernel has an interface of bool vcpu_is_preempted(int cpu). It take the cpu
^^ takes
> as parameter and return true if the cpu is preempted. Then kernel can break
> the spin loops upon on the retval of vcpu_is_preempted.
>
> As kernel has used this interface, So lets support it.
>
> Only pSeries need supoort it. And the fact is powerNV are built into same
^^ support
> kernel image with pSeries. So we need return false if we are runnig as
> powerNV. The another fact is that lppaca->yiled_count keeps zero on
^^ yield
> powerNV. So we can just skip the machine type.
>
> Suggested-by: Boqun Feng <boqun.feng@gmail.com>
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
> arch/powerpc/include/asm/spinlock.h | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index 523673d..3ac9fcb 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -52,6 +52,24 @@
> #define SYNC_IO
> #endif
>
> +/*
> + * This support kernel to check if one cpu is preempted or not.
> + * Then we can fix some lock holder preemption issue.
> + */
> +#ifdef CONFIG_PPC_PSERIES
> +#define vcpu_is_preempted vcpu_is_preempted
> +static inline bool vcpu_is_preempted(int cpu)
> +{
> + /*
> + * pSeries and powerNV can be built into same kernel image. In
> + * principle we need return false directly if we are running as
> + * powerNV. However the yield_count is always zero on powerNV, So
> + * skip such machine type check
Or you could use the ppc_md interface callbacks if required, but your
solution works as well
> + */
> + return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
> +}
> +#endif
> +
> static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> {
> return lock.slock == 0;
Balbir Singh.
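The check in the patch reduces to a parity test on a big-endian counter. A portable user-space sketch follows; the byte-array stand-in for lppaca_of(cpu).yield_count and the helper names are hypothetical. On powerNV the count stays 0, so the same test returns false without any machine-type check, as the patch comment argues.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode a big-endian 32-bit value byte-wise, independent of host order
 * (the role be32_to_cpu() plays in the patch). */
static uint32_t be32_decode(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Hypothetical stand-in for the patch's test: an odd yield_count means the
 * vCPU is preempted; an even count (including 0 on powerNV) means it is not. */
static bool yield_count_says_preempted(const uint8_t yield_count_be[4])
{
	return (be32_decode(yield_count_be) & 1) != 0;
}
```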
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization