From: Boris Brezillon <boris.brezillon@free-electrons.com>
To: Roger Quadros <rogerq@ti.com>
Cc: Tony Lindgren <tony@atomide.com>,
linux-omap <linux-omap@vger.kernel.org>,
"linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
"Cooper Jr., Franklin" <fcooper@ti.com>
Subject: Re: gpmc-nand broken since v4.12
Date: Mon, 16 Oct 2017 13:34:57 +0200 [thread overview]
Message-ID: <20171016133457.74b2773f@bbrezillon> (raw)
In-Reply-To: <fe579fc6-253d-a734-22b0-de24448dfde9@ti.com>
Hi Roger,
On Mon, 16 Oct 2017 13:22:04 +0300
Roger Quadros <rogerq@ti.com> wrote:
>
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
>
> On 13/10/17 23:29, Boris Brezillon wrote:
> > On Fri, 13 Oct 2017 22:24:58 +0200
> > Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> >
> >> On Fri, 13 Oct 2017 14:50:33 +0200
> >> Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> >>
> >>> On Fri, 13 Oct 2017 14:57:29 +0300
> >>> Roger Quadros <rogerq@ti.com> wrote:
> >>>
> >>>> Hi Boris,
> >>>>
> >>>> NAND on gpmc-omap breaks for me while doing a unmount of a ubi volume since v4.12
> >>>>
> >>>> Behaviour follows through in v4.13 and v4.14-rc as well.
> >>>>
> >>>> Do you have any idea about this?
> >>
> >> More info on what has changed in 4.12: we no longer allocate the
> >> nand_buffer struct + its internal buffer using a single big kmalloc
> >> call, which means the nand_buffer struct is now likely to be allocated
> >> in a small object slab (sizeof(nand_buffers) = 12). If you have a
> >> use-after-free bug somewhere in the kernel, it might corrupt the
>
> Spot on. Problem starts from commit 4d5dea5725f3aa95b9b64e252540e435dd216dbc
> "mtd: nand: allocate aligned buffers if NAND_OWN_BUFFERS is unset"
>
> >
> > I meant buffer-overflow, not use-after-free.
>
> Your analysis seems correct.
>
> >
> >> nand_buffers object, which might explain the bug you see here.
> >>
> >>>
> >>> Can you try with this patch [1] applied and paste me the values printed
> >>> just before the crash?
> >>>
> >>> [1]http://code.bulix.org/lc8xk0-209746
>
> == unmounting volume
> [ 36.308792] nand: nand_write_subpage_hwecc:2575 ecc_calc = (null) oob_poi = ed096800
> [ 36.317319] mtd_ooblayout_set_bytes:1330 dst = ed096802 src = (null)
> [ 36.324162] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>
>
> Running the following patch
> https://hastebin.com/ulogurutuz.php
> shows
>
> [ 37.260689] nand: nand_write_subpage_hwecc:2547:A ecc_calc = ed116e40 oob_poi = ed117800
> [ 37.260772] nand: nand_write_subpage_hwecc:2556:A1:step 0, ecc_calc = ed116e40
> [ 37.260779] nand: nand_write_subpage_hwecc:2562:A2:step 0, ecc_calc = ed116e40
> [ 37.260834] nand: nand_write_subpage_hwecc:2556:A1:step 1, ecc_calc = ed116e40
> [ 37.260840] nand: nand_write_subpage_hwecc:2567:A3-pre:step 1, ecc_calc = ed116e40
> [ 37.260846] omap_calculate_ecc_bch
> [ 37.260856] nand: nand_write_subpage_hwecc:2570:A3-post:step 1, ecc_calc = (null)
> [ 37.260912] nand: nand_write_subpage_hwecc:2556:A1:step 2, ecc_calc = (null)
> [ 37.260918] nand: nand_write_subpage_hwecc:2562:A2:step 2, ecc_calc = (null)
> [ 37.260972] nand: nand_write_subpage_hwecc:2556:A1:step 3, ecc_calc = (null)
> [ 37.260978] nand: nand_write_subpage_hwecc:2562:A2:step 3, ecc_calc = (null)
> [ 37.260984] nand: nand_write_subpage_hwecc:2587:B ecc_calc = (null) oob_poi = ed117800
> [ 37.260991] mtd_ooblayout_set_bytes:1330 dst = ed117802 src = (null)
>
> which means omap_calculate_ecc_bch() it the culprit.
>
> Looks like the function calculates and stores ECC for all sectors even if we just want ECC
> for just one sector (sub-page).
Yes, looks like you find the root cause.
>
> Is my understanding correct
> - We should not be hooking the multi-sector ECC calculator omap_calculate_ecc_bch() to ecc.calculate
> - provide a new one sector ECC calculator function (for BCH4/8/16) for omap and hook it to ecc.calculate
> OR
> - override nand_read_subpage() and nand_write_subpage() using omap specific implementation (for BCH4/8/16).
Second solution sounds simpler because the ECC sector information is
not passed to ecc->calculate() hook, which means you'd have to extract
it from the ecc_calc pointer:
(uintptr_t)(ecc_calc - chip->buffers->ecccalc) / chip->ecc.size
and doing arithmetic on pointers is usually not a good idea.
Anyway, I'd be fine with both solutions, so pick the one you prefer.
Regards,
Boris
WARNING: multiple messages have this Message-ID (diff)
From: Boris Brezillon <boris.brezillon@free-electrons.com>
To: Roger Quadros <rogerq@ti.com>
Cc: Tony Lindgren <tony@atomide.com>,
linux-omap <linux-omap@vger.kernel.org>,
"linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
"Cooper Jr., Franklin" <fcooper@ti.com>
Subject: Re: gpmc-nand broken since v4.12
Date: Mon, 16 Oct 2017 13:34:57 +0200 [thread overview]
Message-ID: <20171016133457.74b2773f@bbrezillon> (raw)
In-Reply-To: <fe579fc6-253d-a734-22b0-de24448dfde9@ti.com>
Hi Roger,
On Mon, 16 Oct 2017 13:22:04 +0300
Roger Quadros <rogerq@ti.com> wrote:
>
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
>
> On 13/10/17 23:29, Boris Brezillon wrote:
> > On Fri, 13 Oct 2017 22:24:58 +0200
> > Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> >
> >> On Fri, 13 Oct 2017 14:50:33 +0200
> >> Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> >>
> >>> On Fri, 13 Oct 2017 14:57:29 +0300
> >>> Roger Quadros <rogerq@ti.com> wrote:
> >>>
> >>>> Hi Boris,
> >>>>
> >>>> NAND on gpmc-omap breaks for me while doing a unmount of a ubi volume since v4.12
> >>>>
> >>>> Behaviour follows through in v4.13 and v4.14-rc as well.
> >>>>
> >>>> Do you have any idea about this?
> >>
> >> More info on what has changed in 4.12: we no longer allocate the
> >> nand_buffer struct + its internal buffer using a single big kmalloc
> >> call, which means the nand_buffer struct is now likely to be allocated
> >> in a small object slab (sizeof(nand_buffers) = 12). If you have a
> >> use-after-free bug somewhere in the kernel, it might corrupt the
>
> Spot on. Problem starts from commit 4d5dea5725f3aa95b9b64e252540e435dd216dbc
> "mtd: nand: allocate aligned buffers if NAND_OWN_BUFFERS is unset"
>
> >
> > I meant buffer-overflow, not use-after-free.
>
> Your analysis seems correct.
>
> >
> >> nand_buffers object, which might explain the bug you see here.
> >>
> >>>
> >>> Can you try with this patch [1] applied and paste me the values printed
> >>> just before the crash?
> >>>
> >>> [1]http://code.bulix.org/lc8xk0-209746
>
> == unmounting volume
> [ 36.308792] nand: nand_write_subpage_hwecc:2575 ecc_calc = (null) oob_poi = ed096800
> [ 36.317319] mtd_ooblayout_set_bytes:1330 dst = ed096802 src = (null)
> [ 36.324162] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>
>
> Running the following patch
> https://hastebin.com/ulogurutuz.php
> shows
>
> [ 37.260689] nand: nand_write_subpage_hwecc:2547:A ecc_calc = ed116e40 oob_poi = ed117800
> [ 37.260772] nand: nand_write_subpage_hwecc:2556:A1:step 0, ecc_calc = ed116e40
> [ 37.260779] nand: nand_write_subpage_hwecc:2562:A2:step 0, ecc_calc = ed116e40
> [ 37.260834] nand: nand_write_subpage_hwecc:2556:A1:step 1, ecc_calc = ed116e40
> [ 37.260840] nand: nand_write_subpage_hwecc:2567:A3-pre:step 1, ecc_calc = ed116e40
> [ 37.260846] omap_calculate_ecc_bch
> [ 37.260856] nand: nand_write_subpage_hwecc:2570:A3-post:step 1, ecc_calc = (null)
> [ 37.260912] nand: nand_write_subpage_hwecc:2556:A1:step 2, ecc_calc = (null)
> [ 37.260918] nand: nand_write_subpage_hwecc:2562:A2:step 2, ecc_calc = (null)
> [ 37.260972] nand: nand_write_subpage_hwecc:2556:A1:step 3, ecc_calc = (null)
> [ 37.260978] nand: nand_write_subpage_hwecc:2562:A2:step 3, ecc_calc = (null)
> [ 37.260984] nand: nand_write_subpage_hwecc:2587:B ecc_calc = (null) oob_poi = ed117800
> [ 37.260991] mtd_ooblayout_set_bytes:1330 dst = ed117802 src = (null)
>
> which means omap_calculate_ecc_bch() it the culprit.
>
> Looks like the function calculates and stores ECC for all sectors even if we just want ECC
> for just one sector (sub-page).
Yes, looks like you find the root cause.
>
> Is my understanding correct
> - We should not be hooking the multi-sector ECC calculator omap_calculate_ecc_bch() to ecc.calculate
> - provide a new one sector ECC calculator function (for BCH4/8/16) for omap and hook it to ecc.calculate
> OR
> - override nand_read_subpage() and nand_write_subpage() using omap specific implementation (for BCH4/8/16).
Second solution sounds simpler because the ECC sector information is
not passed to ecc->calculate() hook, which means you'd have to extract
it from the ecc_calc pointer:
(uintptr_t)(ecc_calc - chip->buffers->ecccalc) / chip->ecc.size
and doing arithmetic on pointers is usually not a good idea.
Anyway, I'd be fine with both solutions, so pick the one you prefer.
Regards,
Boris
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
next prev parent reply other threads:[~2017-10-16 11:35 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-13 11:57 gpmc-nand broken since v4.12 Roger Quadros
2017-10-13 11:57 ` Roger Quadros
2017-10-13 12:50 ` Boris Brezillon
2017-10-13 12:50 ` Boris Brezillon
2017-10-13 20:24 ` Boris Brezillon
2017-10-13 20:24 ` Boris Brezillon
2017-10-13 20:29 ` Boris Brezillon
2017-10-13 20:29 ` Boris Brezillon
2017-10-16 10:22 ` Roger Quadros
2017-10-16 10:22 ` Roger Quadros
2017-10-16 11:34 ` Boris Brezillon [this message]
2017-10-16 11:34 ` Boris Brezillon
2017-10-16 12:12 ` Roger Quadros
2017-10-16 12:12 ` Roger Quadros
2017-10-16 12:39 ` Boris Brezillon
2017-10-16 12:39 ` Boris Brezillon
2017-10-16 13:50 ` Boris Brezillon
2017-10-16 13:50 ` Boris Brezillon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171016133457.74b2773f@bbrezillon \
--to=boris.brezillon@free-electrons.com \
--cc=fcooper@ti.com \
--cc=linux-mtd@lists.infradead.org \
--cc=linux-omap@vger.kernel.org \
--cc=rogerq@ti.com \
--cc=tony@atomide.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.