All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linu Cherian <linuc.decode@gmail.com>
To: pbonzini@redhat.com, rkrcmar@redhat.com
Cc: kvm@vger.kernel.org, linu.cherian@cavium.com,
	sunil.goutham@cavium.com, Geethasowjanya.Akula@cavium.com
Subject: Possible race condition during lock_count increment in srcu_read_lock
Date: Mon, 29 May 2017 20:00:57 +0530	[thread overview]
Message-ID: <20170529143057.GA12683@virtx40> (raw)

Hi,

Issue observed:
--------------
Iperf traffic running on guest 10G interface(VFIO assigned device)
followed by, kvm guest poweroff/shutdown results in the following
warning on Cavium arm64 platform.

Stacktrace.
[ 1151.424692] ------------[ cut here ]------------
[ 1151.429318] WARNING: CPU: 30 PID: 6744 at kernel/rcu/srcu.c:251
cleanup_srcu_struct+0xb4/0xd0
[ 1151.437832] Modules linked in: fuse cavium_rng_vf rng_core
cavium_rng ipv6 btrfs xor zlib_deflate raid6_pq

[ 1151.448978] CPU: 30 PID: 6744 Comm: qemu-system-aar Tainted: G
W       4.11.2 #1
[ 1151.456970] Hardware name: Gigabyte gbt-mt60/gbt-mt60, BIOS 0.3 Mar
31 2017
[ 1151.463921] task: fffffe1f994a3400 task.stack: fffffe1f99590000
[ 1151.469831] PC is at cleanup_srcu_struct+0xb4/0xd0
[ 1151.474612] LR is at cleanup_srcu_struct+0x80/0xd0
[ 1151.479392] pc : [<fffffc000811698c>] lr : [<fffffc0008116958>]
pstate: 60000145
[ 1151.486776] sp : fffffe1f99593b20
[ 1151.490081] x29: fffffe1f99593b20 x28: fffffe1f994a3400 
[ 1151.495385] x27: 0000000000000008 x26: fffffe1f99593de8 
[ 1151.500689] x25: dead000000000100 x24: dead000000000200 
[ 1151.505993] x23: fffffc0008e93f24 x22: fffffffffffffff9 
[ 1151.511296] x21: ffffff000d93f0f0 x20: fffffc0008e93bc8 
[ 1151.516601] x19: fffffc0008e93e18 x18: 000000000000003f 
[ 1151.521904] x17: 000000000000003f x16: 000000000000000d 
[ 1151.527208] x15: 0000000000000010 x14: 0000000000000000 
[ 1151.532512] x13: 0000000000000040 x12: 0000000000000020 
[ 1151.537816] x11: 0000000000000020 x10: 0101010101010101 
[ 1151.543120] x9 : 0000000040000000 x8 : 0000000000210d00 
[ 1151.548424] x7 : fefefeff71647274 x6 : 0000000000000000 
[ 1151.553728] x5 : 0000000000000000 x4 : 0000000000000000 
[ 1151.559032] x3 : 0000000000000000 x2 : 0000000000000040 
[ 1151.564335] x1 : 0000000000000040 x0 : 0000000000000040 

[ 1151.571119] ---[ end trace ff1987e1b5556fbe ]---
[ 1151.575726] Call trace:
[ 1151.578163] Exception stack(0xfffffe1f99593950 to
0xfffffe1f99593a80)
[ 1151.584594] 3940:
fffffc0008e93e18 0000040000000000
[ 1151.592414] 3960: fffffe1f99593b20 fffffc000811698c
fffffe1f99593980 fffffc000839cc28
[ 1151.600233] 3980: fffffe1f995939d0 fffffc00081344cc
fffffc000896b000 fffffe1f99593ad8
[ 1151.608053] 39a0: fffffe1f99593ab0 fffffc00081d2a5c
fffffdffc7d95ac0 ffffff1f656b1a00
[ 1151.615873] 39c0: fffffe1f99593ad0 fffffc00081d2a5c
fffffdff87e75e00 fffffe1f9d780000
[ 1151.623692] 39e0: fffffe1f994a3400 000000000000051e
0000000000000040 0000000000000040
[ 1151.631511] 3a00: 0000000000000040 0000000000000000
0000000000000000 0000000000000000
[ 1151.639330] 3a20: 0000000000000000 fefefeff71647274
0000000000210d00 0000000040000000
[ 1151.647150] 3a40: 0101010101010101 0000000000000020
0000000000000020 0000000000000040
[ 1151.654970] 3a60: 0000000000000000 0000000000000010
000000000000000d 000000000000003f
[ 1151.662789] [<fffffc000811698c>] cleanup_srcu_struct+0xb4/0xd0
[ 1151.668614] [<fffffc000809c930>] kvm_put_kvm+0x1e0/0x238
[ 1151.673916] [<fffffc000809c9f8>] kvm_vm_release+0x20/0x30
[ 1151.679308] [<fffffc00081f0c64>] __fput+0x8c/0x1d0
[ 1151.684089] [<fffffc00081f0e0c>] ____fput+0xc/0x18
[ 1151.688872] [<fffffc00080d5830>] task_work_run+0xc0/0xe0
[ 1151.694176] [<fffffc00080bf194>] do_exit+0x2c4/0x978
[ 1151.699131] [<fffffc00080bf8ac>] do_group_exit+0x34/0x98
[ 1151.704435] [<fffffc00080c9118>] get_signal+0x1e0/0x4c0
[ 1151.709653] [<fffffc00080863d0>] do_signal+0xb8/0x4d8
[ 1151.714694] [<fffffc0008086a08>] do_notify_resume+0x88/0xa8
[ 1151.720256] [<fffffc0008082ad0>] work_pending+0x8/0x10



Analysis
--------
Additional prints in srcu_readers_active, indicate that per cpu
variable, lock_count( of struct srcu_struct)lags behind the unlock
count and hence the warning. 

In KVM irq injection path, irqfd_wakeup in interrupt context
calls srcu_read_lock(&kvm->irq_srcu). A worker thread, irqfd_inject
in process context also calls srcu_read_lock inside kvm_set_irq. 

This can lead to race condition while incrementing the
lock_count(using __this_cpu_inc), since atomic instructions being not
used. 

Replacing __this_cpu_inc with this_cpu_inc(atomic version) workarounds the
problem which backs up this analysis.

Possible solution
-----------------
One way is to avoid the srcu_read_lock/unlock usage in irq context.
In arm/arm64 architecture, 	
if (kvm_arch_set_irq_inatomic(&irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1,
      false) == -EWOULDBLOCK)
in irqfd_wakefup is always true and hence schedule_work can be called
directly

@@ -195,6 +195,11 @@ int __attribute__((weak))
kvm_arch_set_irq_inatomic(
        int idx;
 
        if (flags & POLLIN) {
+               if (kvm_arch_set_irq_will_block_always()) {
+                       schedule_work(&irqfd->inject);
+                       goto skiplock;
+               }
+
                idx = srcu_read_lock(&kvm->irq_srcu);
                do {
                        seq =
			read_seqcount_begin(&irqfd->irq_entry_sc);
@@ -208,6 +213,7 @@ int __attribute__((weak))
kvm_arch_set_irq_inatomic(
                srcu_read_unlock(&kvm->irq_srcu, idx);
        }
 
+skiplock:
        if (flags & POLLHUP) {
                /* The eventfd is closing, detach from KVM */
                unsigned long flags;

This works without giving any warnings as well.

Is a patch welcome in that direction ?

Appreicate your feedback on this.



-- 
Linu cherian

             reply	other threads:[~2017-05-29 14:31 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-29 14:30 Linu Cherian [this message]
2017-05-29 15:05 ` Possible race condition during lock_count increment in srcu_read_lock Paolo Bonzini
2017-05-29 15:50   ` Paul E. McKenney
2017-05-29 16:38     ` Paolo Bonzini
2017-05-29 20:14       ` Paul E. McKenney
2017-05-30 13:49         ` Paolo Bonzini
2017-05-30 16:19           ` Paul E. McKenney
2017-05-31 13:03           ` Linu Cherian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170529143057.GA12683@virtx40 \
    --to=linuc.decode@gmail.com \
    --cc=Geethasowjanya.Akula@cavium.com \
    --cc=kvm@vger.kernel.org \
    --cc=linu.cherian@cavium.com \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=sunil.goutham@cavium.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.