All of lore.kernel.org
 help / color / mirror / Atom feed
From: ppvk@codeaurora.org
To: paolo.valente@linaro.org
Cc: linux-block@vger.kernel.org, axboe@kernel.dk
Subject: Re: APPS Crash - Page fault Execute error: PC = bfqg_stats_update_idle_time+0x14/0x50 LR = bfq_dispatch_request+0x398/0x948
Date: Mon, 09 Sep 2019 16:39:32 +0530	[thread overview]
Message-ID: <11509fcad34d67f5a87dd2bd21db6813@codeaurora.org> (raw)
In-Reply-To: <ade528d81fda9fe1d85b71ffb41c1aeb@codeaurora.org>

Hi Paolo,

Did you get a chance to take a look on this issue ?

Regards,
Pradeep

On 2019-09-06 21:09, ppvk@codeaurora.org wrote:
> Hi Paolo,
> 
> Good Evening!!
> 
> We started using BFQ IO Sched. from last 3 Months. it was awesome !!
> 
> Its been running great, but we just hit a "free-after-use" crash.
> We are running stability test (includes heavy IO, video playbacks etc)
> for 48~72hrs. We encountered below error. This crash is seen 2 to 3
> times during our stability testing.
> 
>    120.444910:   <6> binder: undelivered transaction 163882, process 
> died.
>    120.470254:   <6> binder: undelivered transaction 163953, process 
> died.
>    120.474841:   <6> Unable to handle kernel paging request at virtual
> address 006b6b6b6b6b7773
>    120.484651:   <6> Mem abort info:
>    120.487534:   <6>   ESR = 0x96000004
>    120.490680:   <6>   Exception class = DABT (current EL), IL = 32 
> bits
>    120.496760:   <6>   SET = 0, FnV = 0
>    120.499906:   <6>   EA = 0, S1PTW = 0
>    120.503145:   <6> Data abort info:
>    120.506111:   <6>   ISV = 0, ISS = 0x00000004
>    120.510056:   <6>   CM = 0, WnR = 0
>    120.513106:   <6> [006b6b6b6b6b7773] address between user and
> kernel address ranges
>    120.520437:   <6> Internal error: Oops: 96000004 [#1] PREEMPT SMP
>    120.522998:   <6> binder_alloc: 12672: binder_alloc_buf, no vma
>    120.526162:   <6> Modules linked in: wlan(O) machine_dlkm(O)
> wcd938x_slave_dlkm(O) wcd938x_dlkm(O) wcd9xxx_dlkm(O) mbhc_dlkm(O)
> tx_macro_dlkm(O) rx_macro_dlkm(O) va_macro_dlkm(O) wsa_macro_dlkm(O)
> swr_ctrl_dlkm(O) bolero_cdc_dlkm(O) wsa881x_dlkm(O) wcd_core_dlkm(O)
> stub_dlkm(O) hdmi_dlkm(O) swr_dlkm(O) pinctrl_lpi_dlkm(O) usf_dlkm(O)
> native_dlkm(O) platform_dlkm(O) q6_dlkm(O) adsp_loader_dlkm(O)
> apr_dlkm(O) snd_event_dlkm(O) q6_notifier_dlkm(O) q6_pdr_dlkm(O)
>    120.537723:   <6> binder: 1280:9939 transaction failed 29189/-3,
> size 80-0 line 3277
>    120.572951:   <6> Process kworker/4:1H (pid: 310, stack limit =
> 0xffffff8010670000)
>    120.572958:   <6> CPU: 4 PID: 310 Comm: kworker/4:1H Tainted: G S
>    W  O      4.19.66+ #1
>    120.572961:   <6> Hardware name: Qualcomm Technologies, Inc. Lito 
> MTP (DT)
>    120.572973:   <6> Workqueue: kblockd blk_mq_run_work_fn
>    120.572980:   <2> pstate: a0c00085 (NzCv daIf +PAN +UAO)
>    120.572988:   <2> pc : bfqg_stats_update_idle_time+0x14/0x50
>    120.572992:   <2> lr : bfq_dispatch_request+0x398/0x948
>    120.572995:   <2> sp : ffffff8010673bb0
>    120.572998:   <2> x29: ffffff8010673bc0 x28: 0000000000000001
>    120.573004:   <2> x27: 0000001c0c5e468f x26: fffffff6aa182a80
>    120.573011:   <2> x25: 0000000000000000 x24: fffffff6aa197a80
>    120.573018:   <2> x23: fffffff63feb4008 x22: 0000000000000001
>    120.573024:   <2> x21: fffffff6ab1eee08 x20: fffffff63feb4008
>    120.573030:   <2> x19: 6b6b6b6b6b6b6ad3 x18: 0000000000000004
>    120.573037:   <2> x17: 000000000000775f x16: 0000000000004e20
>    120.573043:   <2> x15: fffffff5dcd83d40 x14: 2faa9e9b959219ea
>    120.573049:   <2> x13: 0000000000000004 x12: 0000000089c9fa08
>    120.573056:   <2> x11: 2a768cc68edc0700 x10: 0000000000000000
>    120.573062:   <2> x9 : 0000000000000001 x8 : 6b6b6b6b6b6b6b6b
>    120.573069:   <2> x7 : ffffff943cfb1678 x6 : 0000000000000000
>    120.573075:   <2> x5 : 0000000000000080 x4 : 0000000000000001
>    120.573081:   <2> x3 : 0000000000000000 x2 : 0000000000000000
>    120.573087:   <2> x1 : 0000000000000000 x0 : 6b6b6b6b6b6b6ad3
>    120.573098:   <6>
>    :
>    :
>    121.185250:   <2> Call trace:
>    121.187772:   <2>  bfqg_stats_update_idle_time+0x14/0x50
>    121.192701:   <2>  bfq_dispatch_request+0x398/0x948
>    121.197188:   <2>  blk_mq_do_dispatch_sched+0x84/0x118
>    121.198271:   <6> CPU7: update max cpu_capacity 1024
>    121.206504:   <2>  blk_mq_sched_dispatch_requests+0x130/0x190
>    121.211874:   <2>  __blk_mq_run_hw_queue+0xcc/0x148
>    121.216359:   <2>  blk_mq_run_work_fn+0x24/0x30
>    121.220490:   <2>  process_one_work+0x328/0x6b0
>    121.224620:   <2>  worker_thread+0x330/0x4d0
>    121.228476:   <2>  kthread+0x128/0x138
>    121.231806:   <2>  ret_from_fork+0x10/0x1c
>    121.235484:   <6> Code: a9017bfd 910043fd aa0003f3 d503201f 
> (39728268)
>    121.241751:   <6> ---[ end trace 83177e232bbbf1f0 ]---
>    121.246514:   <6> Kernel panic - not syncing: Fatal exception
>    121.251896:   <6> SMP: stopping secondary CPUs
>    121.255938:   <6> CPU1: stopping
> 
> After doing code walk on bfq-iosched, i could see that in fn.
> bfq_dispatch_request()
> 
> in_serv_queue is assigned with bfqd->in_service_queue.
> 
> and after this, we are getting the request using 
> __bfq_dispatch_request() fn.
> 
> 
> In __bfq_dispatch_request() fn., we see that bfqd->in_service_queue is
> being updated via. __bfq_dispatch_request --> bfq_select_queue -->
> bfq_set_in_service_queue -->  __bfq_set_in_service_queue.
> I presumed that the in service queue time/budget got expired and it
> selected a new queue from a different service group.
> (If there are no more queues to be served, it could pick the new queue
> from it's next group of service tree.) during this updation, we are
> not updating the variable "in_serv_queue", we are passing same it to
> bfq_update_dispatch_stats() fn. and resulting in above crash while
> updating bfqg idle time stats.
> 
> 
> I also seen the comments written just above 
> bfgq_stats_update_idle_time() fn. as
> " Since the idle timer has been disabled, in_serv_queue contained some
> request when __bfq_dispatch_request
> was invoked above, which implies that rq was picked exactly from
> in_serv_queue. Thus in_serv_queue == bfqq,
> and is therefore guaranteed to exist because of the above arguments."
> 
> if it is so, can we suppose to use "bfqq" instead of "in_serv_queue"
> as bfqq is extracted from request "rq" and the "rq" is updated
> properly by __bfq_dispatch_request() fn. ?
> 
> Please correct me if my understanding was wrong.
> 
> Many thanks in advance !!
> 
> Best Regards,
> Pradeep

           reply	other threads:[~2019-09-09 11:09 UTC|newest]

Thread overview: expand[flat|nested]  mbox.gz  Atom feed
 [parent not found: <ade528d81fda9fe1d85b71ffb41c1aeb@codeaurora.org>]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=11509fcad34d67f5a87dd2bd21db6813@codeaurora.org \
    --to=ppvk@codeaurora.org \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=paolo.valente@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.