public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Vasant Hegde <vasant.hegde@amd.com>
To: Jerry Snitselaar <jsnitsel@redhat.com>,
	Peng Zhang <zhangpeng.00@bytedance.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Joerg Roedel <joro@8bytes.org>
Cc: will@kernel.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, Li Bin <huawei.libin@huawei.com>,
	Xie XiuQi <xiexiuqi@huawei.com>,
	Yang Yingliang <yangyingliang@huawei.com>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Subject: Re: [PATCH] iommu: Avoid softlockup and rcu stall in fq_flush_timeout().
Date: Fri, 28 Apr 2023 11:14:54 +0530	[thread overview]
Message-ID: <21f69b43-a1e7-6c84-a360-dae410bedb3f@amd.com> (raw)
In-Reply-To: <7bede423-690c-4f6a-9c23-def4ad08344e@amd.com>

Hello Robin, Joerg,

Ping. Any suggestion on below proposal (schedule work on each CPU to free iova)?

-Vasant



On 4/18/2023 4:11 PM, Vasant Hegde wrote:
> Hi All,
> 
> 
> On 4/11/2023 3:52 AM, Jerry Snitselaar wrote:
>> On Thu, Feb 16, 2023 at 03:11:48PM +0800, Peng Zhang wrote:
>>> There is softlockup under fio pressure test with smmu enabled:
>>> watchdog: BUG: soft lockup - CPU#81 stuck for 22s!  [swapper/81:0]
>>> ...
>>> Call trace:
>>>  fq_flush_timeout+0xc0/0x110
>>>  call_timer_fn+0x34/0x178
>>>  expire_timers+0xec/0x158
>>>  run_timer_softirq+0xc0/0x1f8
>>>  __do_softirq+0x120/0x324
>>>  irq_exit+0x11c/0x140
>>>  __handle_domain_irq+0x6c/0xc0
>>>  gic_handle_irq+0x6c/0x170
>>>  el1_irq+0xb8/0x140
>>>  arch_cpu_idle+0x38/0x1c0
>>>  default_idle_call+0x24/0x44
>>>  do_idle+0x1f4/0x2d8
>>>  cpu_startup_entry+0x2c/0x30
>>>  secondary_start_kernel+0x17c/0x1c8
>>>
>>> Rcu stall may also be triggered:
>>>
>>> rcu: INFO: rcu_sched self-detected stall on CPU
>>> NMI backtrace for cpu 21
>>> CPU: 21 PID: 118 Comm: ksoftirqd/21
>>> ...
>>> Call trace:
>>>  fq_flush_timeout+0x6d/0x90
>>>  ? fq_ring_free+0xc0/0xc0
>>>  call_timer_fn+0x2b/0x120
>>>  run_timer_softirq+0x1a6/0x420
>>>  ? finish_task_switch+0x80/0x280
>>>  __do_softirq+0xda/0x2da
>>>  ? sort_range+0x20/0x20
>>>  run_ksoftirqd+0x26/0x40
>>>  smpboot_thread_fn+0xb8/0x150
>>>  kthread+0x110/0x130
>>>  ? __kthread_cancel_work+0x40/0x40
>>>  ret_from_fork+0x1f/0x30
>>>
>>> This is because the timer callback fq_flush_timeout may run more than
>>> 10ms, and timer may be processed continuously in the softirq so trigger
>>> softlockup and rcu stall. We can use work to deal with fq_ring_free for
>>> each cpu which may take long time, that to avoid triggering softlockup
>>> and rcu stall.
>>>
>>> This patch is modified from the patch[1] of openEuler.
>>>
>>
>> Hi Robin,
>>
>> I was looking at something similar to this recently were in this case
>> they were beating the heck out the system with the hazard io stress
>> test, and someone else with some medusa test tool. In one case they
>> had them force a dump on the soft lockup. On the 384 core genoa, 90
>> cores were spinning the iovad rb tree lock for one domain, 1 had it,
>> and the poor flush queue timer handler was having to fight everyone
>> for the lock. I'm not sure what would be considered a realistic workload
>> compared to these stressors, but could this be an issue over time as
>> systems continue to get more cores since the timer handler potentially
>> grabs and releases the iova domain rb tree lock for each cpu? The only
>> cases I know of are using io stressors, so I don't know how big a deal
>> it is.
>>
>> I think soft lockups could still be produced with this patch, since
>> there would still be the lock contention.
> 
> Right. We hit this issue when running stress-ng on 384 CPU system having NVMe
> disk. I have tried this patch. This improved the situation. But we still hit
> soft lockup after an hour or so.
> 
> Backtrace with this patch:
> 
>  4551.470797] watchdog: BUG: soft lockup - CPU#328 stuck for 30s!
> [kworker/328:1:2710]
> [ 4551.479441] Modules linked in: nvme_fabrics(E) intel_rapl_msr(E)
> intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) kvm(E)
> crct10dif_pclmul(E) ghash_clmulni_intel(E) sha512_ssse3(E) aesni_intel(E)
> crypto_simd(E) cryptd(E) rapl(E) wmi_bmof(E) binfmt_misc(E) ipmi_ssif(E)
> nls_iso8859_1(E) ast(E) drm_shmem_helper(E) drm_kms_helper(E) i2c_algo_bit(E)
> syscopyarea(E) sysfillrect(E) sysimgblt(E) acpi_ipmi(E) ccp(E) k10temp(E)
> ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) mac_hid(E) sch_fq_codel(E) msr(E)
> parport_pc(E) ppdev(E) lp(E) parport(E) ramoops(E) reed_solomon(E) pstore_blk(E)
> pstore_zone(E) efi_pstore(E) drm(E) ip_tables(E) x_tables(E) autofs4(E)
> crc32_pclmul(E) nvme(E) ahci(E) xhci_pci(E) tg3(E) nvme_core(E) libahci(E)
> i2c_piix4(E) xhci_pci_renesas(E) wmi(E)
> [ 4551.479475] CPU: 328 PID: 2710 Comm: kworker/328:1 Kdump: loaded Tainted: G
>           E      6.2.9schd_work+ #1
> [ 4551.479477] Hardware name: AMD Corporation Genoa, BIOS RTI 03/20/2023
> [ 4551.479479] Workqueue: events free_iova_work_func
> [ 4551.479484] RIP: 0010:_raw_spin_unlock_irqrestore+0x21/0x60
> [ 4551.479488] Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 49 89 f0 48 89
> e5 c6 07 00 0f 1f 00 41 f7 c0 00 02 00 00 74 06 fb 0f 1f 44 00 00 <65> ff 0d 50
> f8 fc 7d 74 13 5d 31 c0 89 c2 89 c1 89 c6 89 c7 41 89
> [ 4551.479489] RSP: 0018:ffa0000092e07e30 EFLAGS: 00000206
> [ 4551.479491] RAX: 0000000000000000 RBX: 00000000000000f2 RCX: 0000000000000000
> [ 4551.479492] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffd1ff3cf82be060
> [ 4551.479492] RBP: ffa0000092e07e30 R08: 0000000000000282 R09: 0000000000000000
> [ 4551.479493] R10: 0000000000000000 R11: 0000000000000000 R12: ff1100010d2af6d0
> [ 4551.479493] R13: ffd1ff3cf82be060 R14: ffd1ff3cf82bb858 R15: 0000000000000282
> [ 4551.479494] FS:  0000000000000000(0000) GS:ff11018056200000(0000)
> knlGS:0000000000000000
> [ 4551.479495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4551.479496] CR2: 00007ffd392f5f28 CR3: 0000000152f06004 CR4: 0000000000771ee0
> [ 4551.479497] PKRU: 55555554
> [ 4551.479497] Call Trace:
> [ 4551.479498]  <TASK>
> [ 4551.479502]  free_iova_work_func+0x6a/0xc0
> [ 4551.479505]  process_one_work+0x21f/0x440
> [ 4551.479507]  worker_thread+0x50/0x3f0
> [ 4551.479509]  ? __pfx_worker_thread+0x10/0x10
> [ 4551.479510]  kthread+0xee/0x120
> [ 4551.479513]  ? __pfx_kthread+0x10/0x10
> [ 4551.479514]  ret_from_fork+0x2c/0x50
> [ 4551.479518]  </TASK>
> 
> I did slight modification on top of this patch (schedule work on each CPU) and
> it worked for me.
> 
> @Robin, Joerg,
>   Does something like below (schedule work on each CPU to free iova
>  (fq_ring_free()) is acceptable?
> 
> -Vasant
> 
> 
> ------
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7aed15e53f10..eb52814e67e5 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -157,10 +157,12 @@ static void fq_flush_iotlb(struct iommu_dma_cookie *cookie)
>  static void fq_flush_timeout(struct timer_list *t)
>  {
>  	struct iommu_dma_cookie *cookie = from_timer(cookie, t, fq_timer);
> +	int cpu;
> 
>  	atomic_set(&cookie->fq_timer_on, 0);
>  	fq_flush_iotlb(cookie);
> -	schedule_work(&cookie->free_iova_work);
> +	for_each_possible_cpu(cpu)
> +		schedule_work_on(cpu, &cookie->free_iova_work);
>  }
> 
>  static void queue_iova(struct iommu_dma_cookie *cookie,
> @@ -235,17 +237,15 @@ static void free_iova_work_func(struct work_struct *work)
>  {
>  	struct iommu_dma_cookie *cookie;
>  	int cpu;
> +	unsigned long flags;
> +	struct iova_fq *fq;
> 
>  	cookie = container_of(work, struct iommu_dma_cookie, free_iova_work);
> -	for_each_possible_cpu(cpu) {
> -		unsigned long flags;
> -		struct iova_fq *fq;
> +	fq = this_cpu_ptr(cookie->fq);
> 
> -		fq = per_cpu_ptr(cookie->fq, cpu);
> -		spin_lock_irqsave(&fq->lock, flags);
> -		fq_ring_free(cookie, fq);
> -		spin_unlock_irqrestore(&fq->lock, flags);
> -	}
> +	spin_lock_irqsave(&fq->lock, flags);
> +	fq_ring_free(cookie, fq);
> +	spin_unlock_irqrestore(&fq->lock, flags);
>  }
> 
>  /* sysfs updates are serialised by the mutex of the group owning @domain */
> 
> ---------
> 
> 
> 
> 
> 

  reply	other threads:[~2023-04-28  5:45 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-16  7:11 [PATCH] iommu: Avoid softlockup and rcu stall in fq_flush_timeout() Peng Zhang
2023-04-10 22:22 ` Jerry Snitselaar
2023-04-18 10:41   ` Vasant Hegde
2023-04-28  5:44     ` Vasant Hegde [this message]
2023-05-22 14:58       ` Joerg Roedel
2023-05-22 15:18         ` Jerry Snitselaar
2023-05-22 15:35           ` John Garry
     [not found] <20230216084902.1486-1-hdanton@sina.com>
2023-02-16 11:02 ` Peng Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=21f69b43-a1e7-6c84-a360-dae410bedb3f@amd.com \
    --to=vasant.hegde@amd.com \
    --cc=huawei.libin@huawei.com \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=jsnitsel@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=will@kernel.org \
    --cc=xiexiuqi@huawei.com \
    --cc=yangyingliang@huawei.com \
    --cc=zhangpeng.00@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox