From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: "rcu_preempt detected stalls" with xen_free_irq involved
Date: Thu, 24 Aug 2023 16:04:33 +0200
Message-ID: <ZOdjcurhtb7eQNYe@mail-itl>
In-Reply-To: <ZOV8zMeie3oprrGg@mail-itl>

On Wed, Aug 23, 2023 at 05:28:12AM +0200, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> Since updating from 5.15.124 to 6.1.43, I quite often observe the issue
> named in the subject. It happens on a domU with heavy vchan usage
> (several connections established and released per second).
>
> The domain in question is a PVH with 16 vCPUs and is generally rather
> busy (CPU time, but also noticeable network and disk I/O), but I've
> also seen this happen during less intensive periods (still with several
> vchan connections being handled).
> 
> This is running on Xen 4.17.2, AMD EPYC (Zen3), with smt=off.
> 
> Any ideas?
> 
> Full message:
> 
>     rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>     rcu:       9-...0: (0 ticks this GP) idle=2364/1/0x4000000000000000 softirq=20505/20505 fqs=11999
>        (detected by 12, t=60004 jiffies, g=79009, q=1863 ncpus=16)
>     Sending NMI from CPU 12 to CPUs 9:
>     NMI backtrace for cpu 9
>     CPU: 9 PID: 18266 Comm: qrexec-agent Not tainted 6.1.43-1.qubes.fc37.x86_64 #1
>     RIP: 0010:queued_write_lock_slowpath+0x64/0x124
>     Code: ff 90 0f 1f 44 00 00 5b 5d c3 cc cc cc cc f0 81 0b 00 01 00 00 ba ff 00 00 00 b9 00 01 00 00 8b 03 3d 00 01 00 00 74 0b f3 90 <8b> 03 3d 00 01 00 00 75 f5 89 c8 f0 0f b1 13 74 be eb e2 65
>     RSP: 0018:ffffc9000229fd30 EFLAGS: 00000006
>     RAX: 0000000000000500 RBX: ffffffff8468de60 RCX: 0000000000000100
>     RDX: 00000000000000ff RSI: ffff8881004b86d8 RDI: ffffffff8468de60
>     RBP: ffffffff8468de64 R08: ffff8881004b8860 R09: ffffffff82d47600
>     R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810464e3e0
>     R13: ffff8881012a76a0 R14: ffff88838d1f5a90 R15: 0000000000000000
>     FS:  0000716dc7f6b780(0000) GS:ffff8883dc040000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 0000716dc7f7c060 CR3: 00000001e3d0c001 CR4: 0000000000770ee0
>     PKRU: 55555554
>     Call Trace:
>      <NMI>
>      ? show_trace_log_lvl+0x1d3/0x2ef
>      ? show_trace_log_lvl+0x1d3/0x2ef
>      ? show_trace_log_lvl+0x1d3/0x2ef
>      ? __raw_write_lock_irqsave+0x3d/0x50
>      ? nmi_cpu_backtrace.cold+0x1b/0x76
>      ? queued_write_lock_slowpath+0x64/0x124
>      ? nmi_cpu_backtrace_handler+0xd/0x20
>      ? nmi_handle+0x5d/0x120
>      ? queued_write_lock_slowpath+0x64/0x124
>      ? default_do_nmi+0x69/0x170
>      ? exc_nmi+0x13c/0x170
>      ? end_repeat_nmi+0x16/0x67
>      ? queued_write_lock_slowpath+0x64/0x124
>      ? queued_write_lock_slowpath+0x64/0x124
>      ? queued_write_lock_slowpath+0x64/0x124
>      </NMI>
>      <TASK>
>      __raw_write_lock_irqsave+0x3d/0x50
>      xen_free_irq+0x43/0x170
>      unbind_from_irqhandler+0x40/0x80
>      evtchn_release+0x27/0x8e [xen_evtchn]
>      __fput+0x91/0x250
>      task_work_run+0x59/0x90
>      exit_to_user_mode_loop+0x121/0x150
>      exit_to_user_mode_prepare+0xaf/0xc0
>      syscall_exit_to_user_mode+0x17/0x40
>      do_syscall_64+0x67/0x80
>      ? handle_mm_fault+0xdb/0x2d0
>      ? preempt_count_add+0x47/0xa0
>      ? up_read+0x37/0x70 
>      ? do_user_addr_fault+0x1bb/0x570
>      ? exc_page_fault+0x70/0x170
>      entry_SYSCALL_64_after_hwframe+0x63/0xcd
>     RIP: 0033:0x716dc81077ea
>     Code: 48 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c 24 0c e8 d3 ce f8 ff 8b 7c 24 0c 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 33 cf f8 ff 8b
>     RSP: 002b:00007ffce6ce80e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
>     RAX: 0000000000000000 RBX: 000055aae7e8c150 RCX: 0000716dc81077ea
>     RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000014
>     RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>     R10: 0000716dc80086b8 R11: 0000000000000293 R12: 000055aae7e8ad70
>     R13: 0000000000000003 R14: 0000716dc8020bf8 R15: 00007ffce6ce81a0
>      </TASK>
> 
> 
> I've also seen a few other flavors of the above, for example:
> https://gist.github.com/marmarek/a8b79ef2a877443c7aa57fdca366a701

Unfortunately, I have now hit it on 5.15.124 too (I've updated the gist
above with that trace as well). So it isn't a clear-cut regression. That
said, it still happens far less often on 5.15 than on 6.1.
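
For anyone reading the register dump in the quoted trace, here is a
minimal sketch of decoding the lock word, not a definitive analysis. It
assumes RAX holds the value most recently loaded from the rwlock in
queued_write_lock_slowpath, and it assumes the generic qrwlock layout
(low 8 bits: writer holds the lock, bit 8: writer waiting, reader count
from bit 9 upward). The helper name decode_qrwlock and the constant names
are hypothetical and should be checked against the kernel tree in use.

```python
# Assumed qrwlock bit layout; verify against the kernel source you run.
QW_LOCKED = 0x0FF   # low byte: a writer holds the lock
QW_WAITING = 0x100  # bit 8: a writer is waiting for readers to drain
QR_SHIFT = 9        # reader count starts at bit 9


def decode_qrwlock(cnts: int) -> dict:
    """Split a raw lock word into writer flags and a reader count."""
    return {
        "writer_locked": (cnts & QW_LOCKED) != 0,
        "writer_waiting": (cnts & QW_WAITING) != 0,
        "readers": cnts >> QR_SHIFT,
    }


# RAX from the quoted trace: 0000000000000500
state = decode_qrwlock(0x500)
print(state)
```

Under those assumptions, 0x500 would read as "writer waiting, two
readers still holding the lock", which would fit CPU 9 spinning in the
write-lock slow path under xen_free_irq until the readers drop it.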

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
