From: Thierry Reding <thierry.reding@gmail.com>
To: Embedded Engineer <embed786@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>,
	Vladimir Murzin <vladimir.murzin@arm.com>,
	Russell King - ARM Linux admin <linux@armlinux.org.uk>,
	Jon Hunter <jonathanh@nvidia.com>,
	linux-tegra@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: Unstable Kernel behavior on an ARM based board
Date: Tue, 5 Mar 2019 12:20:05 +0100
Message-ID: <20190305112005.GC26369@ulmo>
In-Reply-To: <CA+_ZnZTeZLyCcjZduQODzjWxTpU96AefzvTBDFbq2CSjVQxONg@mail.gmail.com>


On Tue, Mar 05, 2019 at 03:29:26PM +0500, Embedded Engineer wrote:
> On Tue, Mar 5, 2019 at 3:07 PM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > Please apply this patch so we can see the (ptrval) values.  Thanks.
> 
> Please find below logs after applying patch:
> 
> https://pastebin.com/6TaBxPX5

Hm... so it looks like what you're getting here is the error spew from the
DMA pool debug code in mm/dmapool.c. The way I understand it is that
the debug code will initialize the memory for each page allocated for the
pool with POOL_POISON_FREED (0xa7) (see pool_alloc_page()) and then, when
a block is handed out by dma_pool_alloc(), it'll store the offset read
from the start of that block in the page->offset field and check that the
rest of the block still contains the poison.
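
To make that mechanism concrete, here is a minimal userspace sketch in C.
It is not the kernel code: model_init_page() and model_check_block() are
hypothetical names, the 32-bit link word at the start of each free block is
an assumption taken from the description above, and the page size is
assumed to be a multiple of the block size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREED 0xa7	/* poison value quoted above */

/*
 * Hypothetical model of one pool page: fill it with poison, then write
 * the offset of the following block into the first four bytes of each
 * block so that the free blocks form a chain.
 */
static void model_init_page(uint8_t *page, size_t page_size, size_t block_size)
{
	assert(block_size >= sizeof(uint32_t));
	assert(page_size % block_size == 0);

	memset(page, POISON_FREED, page_size);
	for (size_t off = 0; off < page_size; off += block_size) {
		uint32_t next = (uint32_t)(off + block_size);

		memcpy(page + off, &next, sizeof(next));
	}
}

/*
 * Check one block the way the debug path is described above: skip the
 * four-byte link word and expect poison everywhere else. Returns the
 * offset within the block of the first unexpected byte, or -1 if the
 * block is clean.
 */
static long model_check_block(const uint8_t *page, size_t off, size_t block_size)
{
	for (size_t i = sizeof(uint32_t); i < block_size; i++)
		if (page[off + i] != POISON_FREED)
			return (long)i;
	return -1;
}
```

With a 64-byte block size, a freshly initialised block at offset 0x80
carries the link value 0xc0 and passes the check. Writing any non-poison
word at offset 0x20 of that block makes model_check_block() return 0x20,
which is the situation the logs show.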

The contents of the block then don't match the expected poison. The dump
of the corrupted memory is somewhat confusing because some of the values
that don't match the poison are actually expected. From my reading of the
DMA pool code, the first four bytes of a free block store the offset of
the next free block within the page, so those bytes are never poison.
Also, given the size of the hexdump, it looks like the pool was allocated
with a block size of 64 bytes, which matches the code in
drivers/usb/chipidea/udc.c that allocates the "ci_hw_qh" pool.

What's strange here, though, is that the offset that's stored to the
first four bytes of a block seems to actually be stored twice per block.
The first offset seems to be correct, since it's apparently used to find
the offset of the next block to allocate. If you look at the first
corrupted hexdump:

  [    1.327553] tegra-udc 7d000000.usb: dma_pool_alloc ci_hw_qh, ec056080 (corrupted)
  [    1.335058] 00000000: c0 00 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  ................
  [    1.343077] 00000010: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  ................
  [    1.351095] 00000020: 80 00 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  ................
  [    1.359113] 00000030: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  ................

This is the entry for the block at offset 0x00000080 and the offset for
the next block is 0x000000c0, which is exactly 64 bytes after the
current block. However, if you then look at the second offset that's
stored at offset 0x00000020 in the block, it's 0x00000080, which does
match the offset of the current block, but I think that may just be
coincidence. The same coincidence happens for the second corrupted
block:

  [    1.367210] tegra-udc 7d000000.usb: dma_pool_alloc ci_hw_qh, ec056140 (corrupted)
  [    1.374709] 00000000: 80 01 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  ................
  [    1.382727] 00000010: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  ................
  [    1.390744] 00000020: 40 01 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  @...............
  [    1.398760] 00000030: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  ................

But not for the third:

  [    1.406965] tegra-udc 7d000000.usb: dma_pool_alloc ci_hw_qh, ec0561c0 (corrupted)
  [    1.414466] 00000000: 00 02 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  ................
  [    1.422483] 00000010: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  ................
  [    1.430502] 00000020: 40 03 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  @...............
  [    1.438519] 00000030: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7  ................

The fact that we see the offset stored at offset 0x20 in each block
makes me think there's perhaps some sort of aliasing happening here. But
I'm not sure how the system would even boot this far if aliasing was
really the problem. Things should be falling apart much sooner if that's
really what's going on here.
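
The comparison done by eye above can be written down as a small C sketch.
It rebuilds a 64-byte block from the two non-poison words seen in a dump,
locates the stray word, and checks whether it equals the block's own offset
within its page. All function names are hypothetical, and the 4 KiB page
size used to derive the block offset from the printed address is an
assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define POISON_FREED	0xa7
#define BLOCK_SIZE	64

/* Read a little-endian 32-bit word, as the dumps above are laid out. */
static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Rebuild a block as dumped: link word at 0, stray word at 0x20. */
static void build_block(uint8_t *blk, uint32_t link, uint32_t stray)
{
	for (size_t i = 0; i < BLOCK_SIZE; i++)
		blk[i] = POISON_FREED;
	put_le32(blk, link);
	put_le32(blk + 0x20, stray);
}

/*
 * Find the first non-poison 32-bit word after the link word. Returns
 * its offset within the block and stores its value, or returns -1.
 */
static long find_stray(const uint8_t *blk, uint32_t *value)
{
	for (size_t i = 4; i < BLOCK_SIZE; i += 4) {
		if (le32(blk + i) != 0xa7a7a7a7u) {
			*value = le32(blk + i);
			return (long)i;
		}
	}
	return -1;
}

/* Assumes 4 KiB pages: does the stray word equal the block's own offset? */
static int stray_matches_own_offset(uint32_t block_addr, uint32_t stray)
{
	return (block_addr & 0xfffu) == stray;
}
```

Fed with the three dumps above, the stray word equals the block's own
offset for ec056080 (0x80) and ec056140 (0x140), but not for ec0561c0,
where the stray word is 0x340 and the block offset is 0x1c0.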

However, this sort of aliasing is not something that your typical memory
test will catch, so it could explain why the memory tests aren't
reporting any errors.
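
As an illustration of why a simple test can miss aliasing, here is a
hedged C sketch. It uses a made-up fault model (struct fake_mem, with one
address bit stuck at 0) that is not claimed to be what this board does. A
test that writes a location and reads it straight back passes, while a
test that writes every location with its own index first and verifies
afterwards does not.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical fault model: address bit `dead_bit` is ignored by the
 * memory, so two indices differing only in that bit hit the same cell.
 * dead_bit < 0 models healthy memory.
 */
struct fake_mem {
	uint32_t *cells;
	size_t n;
	int dead_bit;
};

static size_t decode(const struct fake_mem *m, size_t i)
{
	return m->dead_bit < 0 ? i : i & ~((size_t)1 << m->dead_bit);
}

static void mem_write(struct fake_mem *m, size_t i, uint32_t v)
{
	m->cells[decode(m, i)] = v;
}

static uint32_t mem_read(const struct fake_mem *m, size_t i)
{
	return m->cells[decode(m, i)];
}

/* Write then immediately read back: blind to aliasing. */
static size_t naive_test(struct fake_mem *m)
{
	size_t errors = 0;

	for (size_t i = 0; i < m->n; i++) {
		mem_write(m, i, (uint32_t)i);
		if (mem_read(m, i) != (uint32_t)i)
			errors++;
	}
	return errors;
}

/* Write every index first, verify in a second pass: exposes aliasing. */
static size_t address_test(struct fake_mem *m)
{
	size_t errors = 0;

	for (size_t i = 0; i < m->n; i++)
		mem_write(m, i, (uint32_t)i);
	for (size_t i = 0; i < m->n; i++)
		if (mem_read(m, i) != (uint32_t)i)
			errors++;
	return errors;
}
```

In this model the naive test reports no errors even with a dead address
bit, because each read hits the same cell as the write just before it. The
two-pass test reports a mismatch for every index whose cell was later
overwritten through its alias.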

Thierry

