linux-arm-kernel.lists.infradead.org archive mirror
* next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
       [not found] <5b607cc4.1c69fb81.6c1d6.6534@mx.google.com>
@ 2018-07-31 16:06 ` Mark Brown
  2018-08-01  8:19   ` Ludovic BARRE
  2018-08-01 10:05   ` Ulf Hansson
  2018-07-31 16:11 ` Mark Brown
  1 sibling, 2 replies; 8+ messages in thread
From: Mark Brown @ 2018-07-31 16:06 UTC
  To: linux-arm-kernel

On Tue, Jul 31, 2018 at 08:14:12AM -0700, kernelci.org bot wrote:

Today's -next fails to boot on a variety of Qualcomm 32-bit platforms:

>     multi_v7_defconfig:
>         qcom-apq8064-cm-qs600:
>             lab-baylibre-seattle: new failure (last pass: next-20180730)
>         qcom-apq8064-ifc6410:
>             lab-baylibre-seattle: new failure (last pass: next-20180730)
> 
>     qcom_defconfig:
>         qcom-apq8064-cm-qs600:
>             lab-baylibre-seattle: new failure (last pass: next-20180730)

The logs are all somewhat similar, for example:

   https://storage.kernelci.org/next/master/next-20180731/arm/multi_v7_defconfig/lab-baylibre-seattle/boot-qcom-apq8064-cm-qs600.html

detects a DMA problem during MMCI initialization:

[    2.237566] mmci-pl18x 121c0000.sdcc: mmc2: PL180 manf 51 rev0 at 0x121c0000 irq 32,0 (pio)
[    2.244790] mmci-pl18x 121c0000.sdcc: DMA channels RX dma2chan1, TX dma2chan2
[    2.271722] mmci-pl18x 12400000.sdcc: error during DMA transfer!
[    2.271757] mmci-pl18x 12400000.sdcc: buggy DMA detected. Taking evasive action.
[    2.276798] ------------[ cut here ]------------
[    2.284185] WARNING: CPU: 0 PID: 0 at ../include/linux/dma-mapping.h:551 bam_free_chan+0x2d8/0x2e0
[    2.288772] Modules linked in:
[    2.297534] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc7-next-20180731 #1

then panics:

[    2.513796] ------------[ cut here ]------------
[    2.518367] kernel BUG at ../mm/vmalloc.c:1608!
[    2.522968] Internal error: Oops - BUG: 0 [#1] SMP ARM

trying to release the DMA channel.  I've not done any bisection, but I
do note 8bb2299d2d0b5cc (mmc: mmci: Add and implement a ->dma_setup()
callback for qcom dml) and some related commits in the MMC tree.
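
For anyone triaging similar logs, the failure sequence quoted above can be pulled out mechanically. The following is only a minimal sketch: the log excerpt is copied from this message, while the marker list and the `find_events` helper are assumptions made for illustration, not part of kernelci or the kernel.

```python
import re

# Excerpt copied from the boot log quoted in this message.
LOG = """\
[    2.271722] mmci-pl18x 12400000.sdcc: error during DMA transfer!
[    2.271757] mmci-pl18x 12400000.sdcc: buggy DMA detected. Taking evasive action.
[    2.276798] ------------[ cut here ]------------
[    2.284185] WARNING: CPU: 0 PID: 0 at ../include/linux/dma-mapping.h:551 bam_free_chan+0x2d8/0x2e0
[    2.513796] ------------[ cut here ]------------
[    2.518367] kernel BUG at ../mm/vmalloc.c:1608!
[    2.522968] Internal error: Oops - BUG: 0 [#1] SMP ARM
"""

# Matches "[   seconds.micros] message" as printed with kernel timestamps on.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Markers chosen for this illustration only; this is not an exhaustive list.
MARKERS = ("WARNING:", "kernel BUG at", "Internal error:")


def find_events(log_text):
    """Return (timestamp, message) pairs for lines that contain a marker."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and any(marker in match.group(2) for marker in MARKERS):
            events.append((float(match.group(1)), match.group(2)))
    return events


events = find_events(LOG)
for stamp, message in events:
    print(f"{stamp:10.6f}  {message}")
```

On this excerpt the sketch reports the WARNING first and the BUG and Oops lines roughly 0.23 seconds later, which matches the order described above.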

More details for each of the failed boots at:

  https://kernelci.org/boot/id/5b6054f559b5144b9396baa9/
  https://kernelci.org/boot/id/5b60551259b5144abb96bab6/
  https://kernelci.org/boot/id/5b6054e259b5144b1e96bab2/

including full logs, details of the build and so on.


* next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
       [not found] <5b607cc4.1c69fb81.6c1d6.6534@mx.google.com>
  2018-07-31 16:06 ` next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731) Mark Brown
@ 2018-07-31 16:11 ` Mark Brown
  2018-07-31 19:50   ` Niklas Cassel
  1 sibling, 1 reply; 8+ messages in thread
From: Mark Brown @ 2018-07-31 16:11 UTC
  To: linux-arm-kernel

On Tue, Jul 31, 2018 at 08:14:12AM -0700, kernelci.org bot wrote:

Today's -next fails to boot on db820c:

> arm64:

>     defconfig:
>         apq8096-db820c:
>             lab-bjorn: new failure (last pass: next-20180730)

There's nothing immediately obvious as the boot failure cause in the
logs; the last output is a failure to load the ath10k_pci firmware:

04:02:53.750283  [    4.503980] ath10k_pci 0000:01:00.0: Failed to find firmware-N.bin (N between 2 and 6) from ath10k/QCA6174/hw3.0: -2
04:02:53.756384  [    4.504010] ath10k_pci 0000:01:00.0: could not fetch firmware files (-2)
04:02:53.760522  [    4.513736] ath10k_pci 0000:01:00.0: could not probe fw (-2)

but I'm not sure that's the actual cause.  More details, including the
full boot log, here:

   https://kernelci.org/boot/id/5b6042b559b514136096babf/
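
As a side note on reading those lines: the trailing -2 is a negative errno value, and on Linux errno 2 is ENOENT, so the driver is reporting that the firmware file was not found. A small sketch for decoding such values follows; it assumes the host's errno numbering matches the kernel's, which holds on Linux.

```python
import errno
import os

code = -2  # value printed in the ath10k_pci lines above

# Kernel functions return negated errno values, so flip the sign to look it up.
name = errno.errorcode[-code]
print(name, "-", os.strerror(-code))
```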


* next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
  2018-07-31 16:11 ` Mark Brown
@ 2018-07-31 19:50   ` Niklas Cassel
  2018-08-01  9:31     ` Mark Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Niklas Cassel @ 2018-07-31 19:50 UTC
  To: linux-arm-kernel

On Tue, Jul 31, 2018 at 05:11:14PM +0100, Mark Brown wrote:
> On Tue, Jul 31, 2018 at 08:14:12AM -0700, kernelci.org bot wrote:
> 
> Today's -next fails to boot on db820c:
> 
> > arm64:
> 
> >     defconfig:
> >         apq8096-db820c:
> >             lab-bjorn: new failure (last pass: next-20180730)
> 
> There's nothing immediately obvious as the boot failure cause in the
> logs, the last output is a failure to load the ath10k_pci firmware:
> 
> 04:02:53.750283  [    4.503980] ath10k_pci 0000:01:00.0: Failed to find firmware-N.bin (N between 2 and 6) from ath10k/QCA6174/hw3.0: -2
> 04:02:53.756384  [    4.504010] ath10k_pci 0000:01:00.0: could not fetch firmware files (-2)
> 04:02:53.760522  [    4.513736] ath10k_pci 0000:01:00.0: could not probe fw (-2)
> 
> but I'm not sure that's the actual cause.  More details, including the
> full boot log, here:
> 
>    https://kernelci.org/boot/id/5b6042b559b514136096babf/

I tried booting today's -next on db820c, using arm64 defconfig,
and it booted correctly.

I also tried removing the ath10k firmware from my initrd, but it still booted
correctly.

# cat /proc/version
Linux version 4.18.0-rc7-next-20180731-00001-g47055e3ba913 (nks@centauri) (gcc version 7.2.1 20171011 (Linaro GCC 7.2-2017.11)) #9 SMP PREEMPT Tue Jul 31 21:34:43 CEST 2018

I guess it could be a bug that does not trigger on every boot,
or it could be a problem in the kernelci infrastructure.


Kind regards,
Niklas


* next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
  2018-07-31 16:06 ` next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731) Mark Brown
@ 2018-08-01  8:19   ` Ludovic BARRE
  2018-08-01  9:58     ` Ulf Hansson
  2018-08-01 10:05   ` Ulf Hansson
  1 sibling, 1 reply; 8+ messages in thread
From: Ludovic BARRE @ 2018-08-01  8:19 UTC
  To: linux-arm-kernel

Hi Mark, Ulf,

Looking at the log, I think the attached patch could fix this issue,
but since I don't have a qcom board I can't test whether it does :-(.

Ulf: would you prefer that I submit the patch on the mailing list?

BR
Ludo

On 07/31/2018 06:06 PM, Mark Brown wrote:
> On Tue, Jul 31, 2018 at 08:14:12AM -0700, kernelci.org bot wrote:
> 
> Today's -next fails to boot on a variety of Qualcomm 32 bit platforms:
> 
>>      multi_v7_defconfig:
>>          qcom-apq8064-cm-qs600:
>>              lab-baylibre-seattle: new failure (last pass: next-20180730)
>>          qcom-apq8064-ifc6410:
>>              lab-baylibre-seattle: new failure (last pass: next-20180730)
>>
>>      qcom_defconfig:
>>          qcom-apq8064-cm-qs600:
>>              lab-baylibre-seattle: new failure (last pass: next-20180730)
> 
> The logs are all somewhat similar, for example:
> 
>     https://storage.kernelci.org/next/master/next-20180731/arm/multi_v7_defconfig/lab-baylibre-seattle/boot-qcom-apq8064-cm-qs600.html
> 
> detects a DMA problem during MMCI initialization:
> 
> [    2.237566] mmci-pl18x 121c0000.sdcc: mmc2: PL180 manf 51 rev0 at 0x121c0000 irq 32,0 (pio)
> [    2.244790] mmci-pl18x 121c0000.sdcc: DMA channels RX dma2chan1, TX dma2chan2
> [    2.271722] mmci-pl18x 12400000.sdcc: error during DMA transfer!
> [    2.271757] mmci-pl18x 12400000.sdcc: buggy DMA detected. Taking evasive action.
> [    2.276798] ------------[ cut here ]------------
> [    2.284185] WARNING: CPU: 0 PID: 0 at ../include/linux/dma-mapping.h:551 bam_free_chan+0x2d8/0x2e0
> [    2.288772] Modules linked in:
> [    2.297534] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc7-next-20180731 #1
> 
> then panics:
> 
> [    2.513796] ------------[ cut here ]------------
> [    2.518367] kernel BUG at ../mm/vmalloc.c:1608!
> [    2.522968] Internal error: Oops - BUG: 0 [#1] SMP ARM
> 
> trying to release the DMA channel.  I've not done any bisection or
> anything but I do note 8bb2299d2d0b5cc (mmc: mmci: Add and implement a
> ->dma_setup() callback for qcom dml) and some related commits in the MMC
> tree.
> 
> More details for each of the failed boots at:
> 
>    https://kernelci.org/boot/id/5b6054f559b5144b9396baa9/
>    https://kernelci.org/boot/id/5b60551259b5144abb96bab6/
>    https://kernelci.org/boot/id/5b6054e259b5144b1e96bab2/
> 
> including full logs, details of the build and so on.
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-mmc-mmci-fix-qcom-dma-issue-during-mmci-init-with-ne.patch
Type: text/x-patch
Size: 1267 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180801/c20093b4/attachment.bin>


* next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
  2018-07-31 19:50   ` Niklas Cassel
@ 2018-08-01  9:31     ` Mark Brown
  2018-08-01 20:50       ` Bjorn Andersson
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Brown @ 2018-08-01  9:31 UTC
  To: linux-arm-kernel

On Tue, Jul 31, 2018 at 09:50:37PM +0200, Niklas Cassel wrote:

> I guess it could be a bug that does not trigger on every boot,
> or it could be a problem in the kernelci infrastructure.

Infrastructure bugs *tend* to manifest differently to this FWIW, though
it can never be excluded.


* next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
  2018-08-01  8:19   ` Ludovic BARRE
@ 2018-08-01  9:58     ` Ulf Hansson
  0 siblings, 0 replies; 8+ messages in thread
From: Ulf Hansson @ 2018-08-01  9:58 UTC
  To: linux-arm-kernel

On 1 August 2018 at 10:19, Ludovic BARRE <ludovic.barre@st.com> wrote:
> hi Mark, Ulf
>
> When I see log, I think the patch in attachment could fix this issue
> , but like I've not qcom board I can't test if it's fixed :-(.
>
> Ulf: for patch delivery, you prefer a patch delivery on mailing list ?

Thanks for looking into this.

However, there's no need to post a fix this time. Your patch fixed the
issue, but it should also declare qcom_variant_init() in mmci.h.

I have already amended the patch, so no further action is needed.

[...]

Kind regards
Uffe


* next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
  2018-07-31 16:06 ` next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731) Mark Brown
  2018-08-01  8:19   ` Ludovic BARRE
@ 2018-08-01 10:05   ` Ulf Hansson
  1 sibling, 0 replies; 8+ messages in thread
From: Ulf Hansson @ 2018-08-01 10:05 UTC
  To: linux-arm-kernel

On 31 July 2018 at 18:06, Mark Brown <broonie@kernel.org> wrote:
> On Tue, Jul 31, 2018 at 08:14:12AM -0700, kernelci.org bot wrote:
>
> Today's -next fails to boot on a variety of Qualcomm 32 bit platforms:
>
>>     multi_v7_defconfig:
>>         qcom-apq8064-cm-qs600:
>>             lab-baylibre-seattle: new failure (last pass: next-20180730)
>>         qcom-apq8064-ifc6410:
>>             lab-baylibre-seattle: new failure (last pass: next-20180730)
>>
>>     qcom_defconfig:
>>         qcom-apq8064-cm-qs600:
>>             lab-baylibre-seattle: new failure (last pass: next-20180730)
>
> The logs are all somewhat similar, for example:
>
>    https://storage.kernelci.org/next/master/next-20180731/arm/multi_v7_defconfig/lab-baylibre-seattle/boot-qcom-apq8064-cm-qs600.html
>
> detects a DMA problem during MMCI initialization:
>
> [    2.237566] mmci-pl18x 121c0000.sdcc: mmc2: PL180 manf 51 rev0 at 0x121c0000 irq 32,0 (pio)
> [    2.244790] mmci-pl18x 121c0000.sdcc: DMA channels RX dma2chan1, TX dma2chan2
> [    2.271722] mmci-pl18x 12400000.sdcc: error during DMA transfer!
> [    2.271757] mmci-pl18x 12400000.sdcc: buggy DMA detected. Taking evasive action.
> [    2.276798] ------------[ cut here ]------------
> [    2.284185] WARNING: CPU: 0 PID: 0 at ../include/linux/dma-mapping.h:551 bam_free_chan+0x2d8/0x2e0
> [    2.288772] Modules linked in:
> [    2.297534] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc7-next-20180731 #1
>
> then panics:
>
> [    2.513796] ------------[ cut here ]------------
> [    2.518367] kernel BUG at ../mm/vmalloc.c:1608!
> [    2.522968] Internal error: Oops - BUG: 0 [#1] SMP ARM
>
> trying to release the DMA channel.  I've not done any bisection or
> anything but I do note 8bb2299d2d0b5cc (mmc: mmci: Add and implement a
> ->dma_setup() callback for qcom dml) and some related commits in the MMC
> tree.
>
> More details for each of the failed boots at:
>
>   https://kernelci.org/boot/id/5b6054f559b5144b9396baa9/
>   https://kernelci.org/boot/id/5b60551259b5144abb96bab6/
>   https://kernelci.org/boot/id/5b6054e259b5144b1e96bab2/
>
> including full logs, details of the build and so on.

Mark, thanks for reporting.

The problem was a simple one-liner that should have been included in
my patch "mmc: mmci: Add and implement a ->dma_setup() callback for
qcom dml". The missing one-liner caused mmci to wrongly use DMA for the
qcom variant.

I have amended the patch and published it; it should reach the next
tree as of tomorrow. Apologies for the mess it created.
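
The amended line itself is not shown in this thread. Purely as a generic illustration of how one missing entry in a per-variant table can silently select the wrong path, here is a toy model. Every name in it (`Host`, `Variant`, `toy_variant_init`, `probe`, the mode strings) is hypothetical, and it is not the mmci driver code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Host:
    # Default path taken when no variant-specific hook has run.
    transfer_mode: str = "generic-dma"


def toy_variant_init(host: Host) -> None:
    """Hypothetical hook that installs variant-specific behaviour."""
    host.transfer_mode = "variant-specific"


@dataclass
class Variant:
    name: str
    # The "one-liner": leaving this field unset is not an error in itself.
    init: Optional[Callable[[Host], None]] = None


def probe(variant: Variant) -> Host:
    host = Host()
    if variant.init is not None:  # silently skipped when the field is unset
        variant.init(host)
    return host


broken = probe(Variant(name="toy-variant"))
fixed = probe(Variant(name="toy-variant", init=toy_variant_init))
print(broken.transfer_mode, "->", fixed.transfer_mode)
```

Because the hook is optional, nothing complains when it is left out; the host simply keeps its default behaviour, which is why an omission of this kind tends to show up only at run time.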

Kind regards
Uffe


* next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
  2018-08-01  9:31     ` Mark Brown
@ 2018-08-01 20:50       ` Bjorn Andersson
  0 siblings, 0 replies; 8+ messages in thread
From: Bjorn Andersson @ 2018-08-01 20:50 UTC
  To: linux-arm-kernel

On Wed 01 Aug 02:31 PDT 2018, Mark Brown wrote:

> On Tue, Jul 31, 2018 at 09:50:37PM +0200, Niklas Cassel wrote:
> 
> > I guess it could be a bug that does not trigger on every boot,
> > or it could be a problem in the kernelci infrastructure.
> 
> Infrastructure bugs *tend* to manifest differently to this FWIW, though
> it can never be excluded.

No, that's not an infrastructure issue.

The board did warn about not finding the ath10k firmware, which it
always does, so that's not the issue in itself. Then nothing happened
for 266 seconds, so my lab decided to terminate the agony.
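
To put that on the clock, the host-side timestamps quoted earlier in the thread can be combined with the 266 seconds of silence. This is a rough sketch that assumes the silence is counted from the last quoted ath10k_pci line; the helper name `host_time` is made up for illustration.

```python
from datetime import datetime, timedelta

# Host-side timestamps and kernel lines as quoted earlier in the thread.
LINES = [
    "04:02:53.750283  [    4.503980] ath10k_pci 0000:01:00.0: Failed to find firmware-N.bin (N between 2 and 6) from ath10k/QCA6174/hw3.0: -2",
    "04:02:53.756384  [    4.504010] ath10k_pci 0000:01:00.0: could not fetch firmware files (-2)",
    "04:02:53.760522  [    4.513736] ath10k_pci 0000:01:00.0: could not probe fw (-2)",
]


def host_time(line):
    """Parse the leading HH:MM:SS.micros field added by the lab host."""
    return datetime.strptime(line.split()[0], "%H:%M:%S.%f")


last_output = max(host_time(line) for line in LINES)
silence = timedelta(seconds=266)  # figure taken from the paragraph above
gave_up = last_output + silence

print("last output:", last_output.time())
print("gave up at: ", gave_up.time())
```

Under that assumption the job would have been terminated at about 04:07:19 host time, a little under four and a half minutes after the last kernel message.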

So this is either an issue with the stability of next-20180731 or with
the specific board.


PS. Today's next did boot successfully on the board.

Regards,
Bjorn

