public inbox for linux-arm-kernel@lists.infradead.org
From: Maxime Ripard <maxime.ripard@bootlin.com>
To: "Clément Péron" <peron.clem@gmail.com>,
	"Mauro Carvalho Chehab" <mchehab@kernel.org>,
	"Rob Herring" <robh+dt@kernel.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Chen-Yu Tsai" <wens@csie.org>,
	devicetree <devicetree@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-sunxi <linux-sunxi@googlegroups.com>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	linux-media@vger.kernel.org
Subject: Re: [linux-sunxi] Re: [PATCH v2 00/10] Allwinner A64/H6 IR support
Date: Wed, 29 May 2019 09:19:45 +0200
Message-ID: <20190529071945.mrbgurcvl2jvpm5r@flea>
In-Reply-To: <20190528180447.zlrdfmn73fntnf4n@core.my.home>


On Tue, May 28, 2019 at 08:04:47PM +0200, Ondřej Jirman wrote:
> Hello Clément,
>
> On Tue, May 28, 2019 at 06:21:19PM +0200, Clément Péron wrote:
> > Hi Ondřej,
> >
> > On Mon, 27 May 2019 at 21:53, 'Ondřej Jirman' via linux-sunxi
> > <linux-sunxi@googlegroups.com> wrote:
> > >
> > > Hi Clément,
> > >
> > > On Mon, May 27, 2019 at 09:30:16PM +0200, verejna wrote:
> > > > Hi Clément,
> > > >
> > > > On Mon, May 27, 2019 at 08:49:59PM +0200, Clément Péron wrote:
> > > > > Hi Ondrej,
> > > > >
> > > > > >
> > > > > > I'm testing on Orange Pi 3.
> > > > > >
> > > > > > With your patches, I get kernel lockup after ~1 minute of use (ssh stops
> > > > > > responding/serial console stops responding). I don't have RC controller to test
> > > > > > the CIR. But just enabling the CIR causes kernel to hang shortly after boot.
> > > > > >
> > > > > > I tried booting multiple times. Other results:
> > > > > >
> > > > > > boot 2:
> > > > > >
> > > > > > - ssh hangs even before connecting (ethernet crashes/is reset)
> > > > > >
> > > > > > INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > > > rcu:    0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=2437
> > > > > > dwmac-sun8i 5020000.ethernet eth0: Reset adapter.
> > > > > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 5696 jiffies s: 81 root: 0x1/.
> > > > > > rcu: blocking rcu_node structures:
> > > > > >  rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > > > rcu:    0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=9714
> > > > > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 21568 jiffies s: 81 root: 0x1/.
> > > > > > rcu: blocking rcu_node structures:
> > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > > > rcu:    0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=17203
> > > > > >
> > > > > > above messages appear regularly.
> > > > > >
> > > > > > boot 3:
> > > > > >
> > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > > > rcu:    0-....: (9 GPs behind) idle=992/0/0x3 softirq=6123/6123 fqs=2600
> > > > > >
> > > > > >
> > > > > > > Sometimes serial console keeps working. Sometimes it locks up too (but not
> > > > > > > frequently). Storage always locks up (any program that was not run before
> > > > > > > the crash can't be started and locks up the kernel hard; programs that
> > > > > > > were executed before the crash can be run again).
> > > > > >
> > > > > >
> > > > > > Exactly the same kernel build on H5 seems to work (or at least I was not able to
> > > > > > trigger the crash). So this seems to be limited to H6 for now.
> > > > > >
> > > > > > I suspect that the crash occurs sooner if I vary the light (turn on/off the table
> > > > > > lamp light).
> > > > > >
> > > > > > Without your patches, everything works fine on H6, and I never see
> > > > > > crashes/lockups.
> > > > > >
> > > > > > > I tried physically covering the IR receiver, and that helps prevent the
> > > > > > > crash. As soon as I uncover it, the crash happens again in 1s or so:
> > > > > >
> > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > > > rcu:    0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=2444
> > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > > > rcu:    0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=9777
> > > > > >
> > > > > > This time I got the hung task and reboot: (probably not directly related)
> > > > > >
> > > > > > INFO: task find:560 blocked for more than 120 seconds.
> > > > > >       Not tainted 5.2.0-rc2+ #7
> > > > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > find            D    0   560    551 0x00000000
> > > > > > Call trace:
> > > > > >  __switch_to+0x6c/0x90
> > > > > >  __schedule+0x1f4/0x578
> > > > > >  schedule+0x28/0xa8
> > > > > >  io_schedule+0x18/0x38
> > > > > >  __lock_page+0x12c/0x208
> > > > > >  pagecache_get_page+0x238/0x2e8
> > > > > >  __get_node_page+0x6c/0x310
> > > > > >  f2fs_get_node_page+0x14/0x20
> > > > > >  f2fs_iget+0x70/0xc60
> > > > > >  f2fs_lookup+0xcc/0x218
> > > > > >  __lookup_slow+0x78/0x160
> > > > > >  lookup_slow+0x3c/0x60
> > > > > >  walk_component+0x1e4/0x2e0
> > > > > >  path_lookupat.isra.13+0x5c/0x1e0
> > > > > >  filename_lookup.part.23+0x6c/0xe8
> > > > > >  user_path_at_empty+0x4c/0x60
> > > > > >  vfs_statx+0x78/0xd8
> > > > > >  __se_sys_newfstatat+0x24/0x48
> > > > > >  __arm64_sys_newfstatat+0x18/0x20
> > > > > >  el0_svc_handler+0x9c/0x170
> > > > > >  el0_svc+0x8/0xc
> > > > > > Kernel panic - not syncing: hung_task: blocked tasks
> > > > > > CPU: 1 PID: 34 Comm: khungtaskd Not tainted 5.2.0-rc2+ #7
> > > > > > Hardware name: OrangePi 3 (DT)
> > > > > > Call trace:
> > > > > >  dump_backtrace+0x0/0xf8
> > > > > >  show_stack+0x14/0x20
> > > > > >  dump_stack+0xa8/0xcc
> > > > > >  panic+0x124/0x2dc
> > > > > >  proc_dohung_task_timeout_secs+0x0/0x40
> > > > > >  kthread+0x120/0x128
> > > > > >  ret_from_fork+0x10/0x18
> > > > > > SMP: stopping secondary CPUs
> > > > > > Kernel Offset: disabled
> > > > > > CPU features: 0x0002,20002000
> > > > > > Memory Limit: none
> > > > > > Rebooting in 3 seconds..
> > > > > >
> > > > > >
> > > > > > Meanwhile H5 based board now runs for 15 minutes without issues.
> > > > > >
> > > > > > So to sum up:
> > > > > >
> > > > > > - these crashes are definitely H6 IR related
> > > > > >   - the same kernel, on H5 works
> > > > > >   - covering the sensor prevents the crashes on H6
> > > > > >
> > > > > > > So we should probably hold off on the series until this is figured out.
> > > > >
> > > > > Thanks for testing, but I think it's more hardware related.
> > > > > It seems that your IR is flooded or misconfigured for your board.
> > > > > > Could you add a simple print in "sunxi_ir_irq"?
> > > >
> > > > > Yes, I get a flood of IRQs with status = 0x30 (after I turn on the lamp,
> > > > > but it persists even after I turn it off and cover the IR sensor).
> > >
> > > > Interestingly, status also contains RAC, and it's 0 in this case. So the
> > > > interrupt is firing with "No available data in RX FIFO" repeatedly, regardless
> > > > of input.
> > >
> > > So there's something else up.
> >
> > Really weird indeed...
> >
> > I have pushed a new version, where I didn't enable support for the
> > other H6 boards, and the cover letter includes a link to this thread.
> >
> > It would be great if other sunxi users could test this series, to
> > check if this issue is present on other OPi3 / Pine H64 boards.
>
> I don't know if this is enough. I'd prefer it if the driver had a way
> of detecting this situation and shutting the module down, at the very least,
> instead of taking down the entire system with IRQ flood.
>
> It may be detectable by checking RAC == 0 when RX FIFO available interrupt
> flag is set.
>
> Otherwise, this will eventually be forgotten (cover letters are not even stored
> in git), and someone will fall into the trap again, after enabling r_ir on
> their board, and end up chasing their tail for a day. I initially only found
> that this is an IR driver issue after a long, unpleasant debugging session, chasing
> other, more obvious ideas (as when this happens there's absolutely nothing in the
> log indicating this is an IR issue).

Returning IRQ_NONE in the handler will disable the interrupt line
after 100,000 (I think?) occurrences. That might be a good workaround,
but we definitely want to have a comment there :)

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
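
Not part of the message above: a minimal user-space sketch of the two
ideas discussed in this thread, namely treating "RX FIFO available with
RAC == 0" as an unhandled interrupt, and the kernel then disabling the
line once nearly all of the last 100,000 interrupts went unhandled. The
bit positions (RA at bit 4, RAC starting at bit 8, 7-bit mask) are
assumptions modeled on the RX status macros in the sunxi-cir driver. All
function and type names are hypothetical, and the accounting is a
simplified model of the kernel's spurious-interrupt detection, not the
kernel code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed RX status layout (modeled on sunxi-cir, not verified here). */
#define RXSTA_RA        (1u << 4)             /* RX FIFO available flag */
#define RXSTA_GET_AC(v) (((v) >> 8) & 0x7fu)  /* RAC: bytes in RX FIFO */

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values. */
enum irq_result { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/*
 * Hypothetical check for the condition reported in this thread: the
 * "FIFO available" flag is set although the FIFO holds no data.
 */
static enum irq_result classify_status(uint32_t status)
{
	if ((status & RXSTA_RA) && RXSTA_GET_AC(status) == 0)
		return MODEL_IRQ_NONE;
	return MODEL_IRQ_HANDLED;
}

/* Simplified per-line bookkeeping for unhandled interrupts. */
struct irq_model {
	unsigned int irq_count;      /* interrupts seen in this window */
	unsigned int irqs_unhandled; /* of those, how many nobody handled */
	bool disabled;               /* line was shut down */
};

/*
 * Every 100,000 interrupts, disable the line if more than 99,900 of
 * them were unhandled. The real kernel logic also takes timing into
 * account, which this model ignores.
 */
static void model_note_interrupt(struct irq_model *m, enum irq_result r)
{
	if (m->disabled)
		return;
	if (r == MODEL_IRQ_NONE)
		m->irqs_unhandled++;
	if (++m->irq_count < 100000)
		return;
	m->irq_count = 0;
	if (m->irqs_unhandled > 99900)
		m->disabled = true;
	m->irqs_unhandled = 0;
}

/*
 * Feed the same status value repeatedly and return how many interrupts
 * it takes until the line is disabled, or 0 if that never happens
 * within `limit` interrupts.
 */
static unsigned int irqs_until_disabled(uint32_t status, unsigned int limit)
{
	struct irq_model m = { 0 };
	unsigned int n;

	for (n = 1; n <= limit; n++) {
		model_note_interrupt(&m, classify_status(status));
		if (m.disabled)
			return n;
	}
	return 0;
}
```

Under these assumptions, the status 0x30 reported above decodes to RA
set with RAC == 0, so it is classified as unhandled and the model shuts
the line down once the first window of 100,000 interrupts completes. A
status with data in the FIFO never triggers the shutdown. In a real
driver the early return would want a comment explaining why it is there.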
