All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Brezillon <boris.brezillon@free-electrons.com>
To: Roger Quadros <rogerq@ti.com>, Tony Lindgren <tony@atomide.com>
Cc: linux-omap <linux-omap@vger.kernel.org>,
	"linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
	"Cooper Jr., Franklin" <fcooper@ti.com>
Subject: Re: gpmc-nand broken since v4.12
Date: Mon, 16 Oct 2017 15:50:24 +0200	[thread overview]
Message-ID: <20171016155024.211c8235@bbrezillon> (raw)
In-Reply-To: <20171016143904.098eaa50@bbrezillon>

On Mon, 16 Oct 2017 14:39:04 +0200
Boris Brezillon <boris.brezillon@free-electrons.com> wrote:

> On Mon, 16 Oct 2017 15:12:38 +0300
> Roger Quadros <rogerq@ti.com> wrote:
> 
> > Hi Boris,
> > 
> > 
> > Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> > 
> > On 16/10/17 14:34, Boris Brezillon wrote:  
> > > Hi Roger,
> > > 
> > > On Mon, 16 Oct 2017 13:22:04 +0300
> > > Roger Quadros <rogerq@ti.com> wrote:
> > >     
> > >> 
> > >> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> > >>
> > >> On 13/10/17 23:29, Boris Brezillon wrote:    
> > >>> On Fri, 13 Oct 2017 22:24:58 +0200
> > >>> Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> > >>>       
> > >>>> On Fri, 13 Oct 2017 14:50:33 +0200
> > >>>> Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> > >>>>      
> > >>>>> On Fri, 13 Oct 2017 14:57:29 +0300
> > >>>>> Roger Quadros <rogerq@ti.com> wrote:
> > >>>>>         
> > >>>>>> Hi Boris,
> > >>>>>>
> > >>>>>> NAND on gpmc-omap breaks for me while doing a unmount of a ubi volume since v4.12
> > >>>>>>
> > >>>>>> Behaviour follows through in v4.13 and v4.14-rc as well.
> > >>>>>>
> > >>>>>> Do you have any idea about this?        
> > >>>>
> > >>>> More info on what has changed in 4.12: we no longer allocate the
> > >>>> nand_buffer struct + its internal buffer using a single big kmalloc
> > >>>> call, which means the nand_buffer struct is now likely to be allocated
> > >>>> in a small object slab (sizeof(nand_buffers) = 12). If you have a
> > >>>> use-after-free bug somewhere in the kernel, it might corrupt the      
> > >>
> > >> Spot on. Problem starts from commit 4d5dea5725f3aa95b9b64e252540e435dd216dbc
> > >> "mtd: nand: allocate aligned buffers if NAND_OWN_BUFFERS is unset"
> > >>    
> > >>>
> > >>> I meant buffer-overflow, not use-after-free.      
> > >>
> > >> Your analysis seems correct.
> > >>    
> > >>>       
> > >>>> nand_buffers object, which might explain the bug you see here.
> > >>>>      
> > >>>>>
> > >>>>> Can you try with this patch [1] applied and paste me the values printed
> > >>>>> just before the crash?
> > >>>>>
> > >>>>> [1]http://code.bulix.org/lc8xk0-209746      
> > >>
> > >> == unmounting volume
> > >> [   36.308792] nand: nand_write_subpage_hwecc:2575 ecc_calc =   (null) oob_poi = ed096800
> > >> [   36.317319] mtd_ooblayout_set_bytes:1330 dst = ed096802 src =   (null)
> > >> [   36.324162] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > >>
> > >>
> > >> Running the following patch
> > >> https://hastebin.com/ulogurutuz.php
> > >> shows
> > >>
> > >> [   37.260689] nand: nand_write_subpage_hwecc:2547:A ecc_calc = ed116e40 oob_poi = ed117800
> > >> [   37.260772] nand: nand_write_subpage_hwecc:2556:A1:step 0, ecc_calc = ed116e40
> > >> [   37.260779] nand: nand_write_subpage_hwecc:2562:A2:step 0, ecc_calc = ed116e40
> > >> [   37.260834] nand: nand_write_subpage_hwecc:2556:A1:step 1, ecc_calc = ed116e40
> > >> [   37.260840] nand: nand_write_subpage_hwecc:2567:A3-pre:step 1, ecc_calc = ed116e40
> > >> [   37.260846] omap_calculate_ecc_bch
> > >> [   37.260856] nand: nand_write_subpage_hwecc:2570:A3-post:step 1, ecc_calc =   (null)
> > >> [   37.260912] nand: nand_write_subpage_hwecc:2556:A1:step 2, ecc_calc =   (null)
> > >> [   37.260918] nand: nand_write_subpage_hwecc:2562:A2:step 2, ecc_calc =   (null)
> > >> [   37.260972] nand: nand_write_subpage_hwecc:2556:A1:step 3, ecc_calc =   (null)
> > >> [   37.260978] nand: nand_write_subpage_hwecc:2562:A2:step 3, ecc_calc =   (null)
> > >> [   37.260984] nand: nand_write_subpage_hwecc:2587:B ecc_calc =   (null) oob_poi = ed117800
> > >> [   37.260991] mtd_ooblayout_set_bytes:1330 dst = ed117802 src =   (null)
> > >>
> > >> which means omap_calculate_ecc_bch() it the culprit.
> > >>
> > >> Looks like the function calculates and stores ECC for all sectors even if we just want ECC
> > >> for just one sector (sub-page).    
> > > 
> > > Yes, looks like you find the root cause.
> > >     
> > >>
> > >> Is my understanding correct
> > >> - We should not be hooking the multi-sector ECC calculator omap_calculate_ecc_bch() to ecc.calculate
> > >> - provide a new one sector ECC calculator function (for BCH4/8/16) for omap and hook it to ecc.calculate
> > >> OR
> > >> - override nand_read_subpage() and nand_write_subpage() using omap specific implementation (for BCH4/8/16).    
> > > 
> > > Second solution sounds simpler because the ECC sector information is
> > > not passed to ecc->calculate() hook, which means you'd have to extract
> > > it from the ecc_calc pointer:
> > > 
> > > 	(uintptr_t)(ecc_calc - chip->buffers->ecccalc) / chip->ecc.size    
> > 
> > I don't think we need ECC sector number at all if we're always going to do a transfer
> > of one sector of data after calling ecc.hwctl(NAND_ECC_READ/WRITE) and before calling ecc.calculate.
> > 
> > My understanding is that the ECC calculators sector 0 registers will always contain
> > the right content irrespective of which sector we transfer as long as we do,
> > 	ecc.hwctl(mtd, NAND_ECC_READ/WRITE);
> > 	transfer one sector;
> > 	ecc.calculate();  
> 
> Ok, then the first solution sounds good too.
> 
> > 
> > Why isn't there a nand_read_subpage_hwecc()?  
> 
> Probably because nobody ever needed it. Feel free to add it if you
> think this is necessary.

Actually, if you want to make your patch easily backport-able to
stable releases, you'd better go for the omap_nand_write/read_subpage()
approach. The generic solution can be done afterwards.

> 
> > In the current form a subpage read won't work if
> > if ecc->mode is NAND_ECC_HW and controllers expect a ecc.hwctl(NAND_ECC_READ) before
> > calling ecc.calculate().
> > 
> > For the OMAP case I can override both subpage functions.
> > Is there a good way to test if subpage read/writes are working as they should?  
> 
> There's a nandsubpage test [1] in mtd-utils.
> 
> [1]http://git.infradead.org/mtd-utils.git/blob/HEAD:/tests/mtd-tests/nandpagetest.c
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/

WARNING: multiple messages have this Message-ID (diff)
From: Boris Brezillon <boris.brezillon@free-electrons.com>
To: Roger Quadros <rogerq@ti.com>, Tony Lindgren <tony@atomide.com>
Cc: linux-omap <linux-omap@vger.kernel.org>,
	"linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
	"Cooper Jr., Franklin" <fcooper@ti.com>
Subject: Re: gpmc-nand broken since v4.12
Date: Mon, 16 Oct 2017 15:50:24 +0200	[thread overview]
Message-ID: <20171016155024.211c8235@bbrezillon> (raw)
In-Reply-To: <20171016143904.098eaa50@bbrezillon>

On Mon, 16 Oct 2017 14:39:04 +0200
Boris Brezillon <boris.brezillon@free-electrons.com> wrote:

> On Mon, 16 Oct 2017 15:12:38 +0300
> Roger Quadros <rogerq@ti.com> wrote:
> 
> > Hi Boris,
> > 
> > 
> > Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> > 
> > On 16/10/17 14:34, Boris Brezillon wrote:  
> > > Hi Roger,
> > > 
> > > On Mon, 16 Oct 2017 13:22:04 +0300
> > > Roger Quadros <rogerq@ti.com> wrote:
> > >     
> > >> 
> > >> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> > >>
> > >> On 13/10/17 23:29, Boris Brezillon wrote:    
> > >>> On Fri, 13 Oct 2017 22:24:58 +0200
> > >>> Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> > >>>       
> > >>>> On Fri, 13 Oct 2017 14:50:33 +0200
> > >>>> Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> > >>>>      
> > >>>>> On Fri, 13 Oct 2017 14:57:29 +0300
> > >>>>> Roger Quadros <rogerq@ti.com> wrote:
> > >>>>>         
> > >>>>>> Hi Boris,
> > >>>>>>
> > >>>>>> NAND on gpmc-omap breaks for me while doing a unmount of a ubi volume since v4.12
> > >>>>>>
> > >>>>>> Behaviour follows through in v4.13 and v4.14-rc as well.
> > >>>>>>
> > >>>>>> Do you have any idea about this?        
> > >>>>
> > >>>> More info on what has changed in 4.12: we no longer allocate the
> > >>>> nand_buffer struct + its internal buffer using a single big kmalloc
> > >>>> call, which means the nand_buffer struct is now likely to be allocated
> > >>>> in a small object slab (sizeof(nand_buffers) = 12). If you have a
> > >>>> use-after-free bug somewhere in the kernel, it might corrupt the      
> > >>
> > >> Spot on. Problem starts from commit 4d5dea5725f3aa95b9b64e252540e435dd216dbc
> > >> "mtd: nand: allocate aligned buffers if NAND_OWN_BUFFERS is unset"
> > >>    
> > >>>
> > >>> I meant buffer-overflow, not use-after-free.      
> > >>
> > >> Your analysis seems correct.
> > >>    
> > >>>       
> > >>>> nand_buffers object, which might explain the bug you see here.
> > >>>>      
> > >>>>>
> > >>>>> Can you try with this patch [1] applied and paste me the values printed
> > >>>>> just before the crash?
> > >>>>>
> > >>>>> [1]http://code.bulix.org/lc8xk0-209746      
> > >>
> > >> == unmounting volume
> > >> [   36.308792] nand: nand_write_subpage_hwecc:2575 ecc_calc =   (null) oob_poi = ed096800
> > >> [   36.317319] mtd_ooblayout_set_bytes:1330 dst = ed096802 src =   (null)
> > >> [   36.324162] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > >>
> > >>
> > >> Running the following patch
> > >> https://hastebin.com/ulogurutuz.php
> > >> shows
> > >>
> > >> [   37.260689] nand: nand_write_subpage_hwecc:2547:A ecc_calc = ed116e40 oob_poi = ed117800
> > >> [   37.260772] nand: nand_write_subpage_hwecc:2556:A1:step 0, ecc_calc = ed116e40
> > >> [   37.260779] nand: nand_write_subpage_hwecc:2562:A2:step 0, ecc_calc = ed116e40
> > >> [   37.260834] nand: nand_write_subpage_hwecc:2556:A1:step 1, ecc_calc = ed116e40
> > >> [   37.260840] nand: nand_write_subpage_hwecc:2567:A3-pre:step 1, ecc_calc = ed116e40
> > >> [   37.260846] omap_calculate_ecc_bch
> > >> [   37.260856] nand: nand_write_subpage_hwecc:2570:A3-post:step 1, ecc_calc =   (null)
> > >> [   37.260912] nand: nand_write_subpage_hwecc:2556:A1:step 2, ecc_calc =   (null)
> > >> [   37.260918] nand: nand_write_subpage_hwecc:2562:A2:step 2, ecc_calc =   (null)
> > >> [   37.260972] nand: nand_write_subpage_hwecc:2556:A1:step 3, ecc_calc =   (null)
> > >> [   37.260978] nand: nand_write_subpage_hwecc:2562:A2:step 3, ecc_calc =   (null)
> > >> [   37.260984] nand: nand_write_subpage_hwecc:2587:B ecc_calc =   (null) oob_poi = ed117800
> > >> [   37.260991] mtd_ooblayout_set_bytes:1330 dst = ed117802 src =   (null)
> > >>
> > >> which means omap_calculate_ecc_bch() it the culprit.
> > >>
> > >> Looks like the function calculates and stores ECC for all sectors even if we just want ECC
> > >> for just one sector (sub-page).    
> > > 
> > > Yes, looks like you find the root cause.
> > >     
> > >>
> > >> Is my understanding correct
> > >> - We should not be hooking the multi-sector ECC calculator omap_calculate_ecc_bch() to ecc.calculate
> > >> - provide a new one sector ECC calculator function (for BCH4/8/16) for omap and hook it to ecc.calculate
> > >> OR
> > >> - override nand_read_subpage() and nand_write_subpage() using omap specific implementation (for BCH4/8/16).    
> > > 
> > > Second solution sounds simpler because the ECC sector information is
> > > not passed to ecc->calculate() hook, which means you'd have to extract
> > > it from the ecc_calc pointer:
> > > 
> > > 	(uintptr_t)(ecc_calc - chip->buffers->ecccalc) / chip->ecc.size    
> > 
> > I don't think we need ECC sector number at all if we're always going to do a transfer
> > of one sector of data after calling ecc.hwctl(NAND_ECC_READ/WRITE) and before calling ecc.calculate.
> > 
> > My understanding is that the ECC calculators sector 0 registers will always contain
> > the right content irrespective of which sector we transfer as long as we do,
> > 	ecc.hwctl(mtd, NAND_ECC_READ/WRITE);
> > 	transfer one sector;
> > 	ecc.calculate();  
> 
> Ok, then the first solution sounds good too.
> 
> > 
> > Why isn't there a nand_read_subpage_hwecc()?  
> 
> Probably because nobody ever needed it. Feel free to add it if you
> think this is necessary.

Actually, if you want to make your patch easily backport-able to
stable releases, you'd better go for the omap_nand_write/read_subpage()
approach. The generic solution can be done afterwards.

> 
> > In the current form a subpage read won't work if
> > if ecc->mode is NAND_ECC_HW and controllers expect a ecc.hwctl(NAND_ECC_READ) before
> > calling ecc.calculate().
> > 
> > For the OMAP case I can override both subpage functions.
> > Is there a good way to test if subpage read/writes are working as they should?  
> 
> There's a nandsubpage test [1] in mtd-utils.
> 
> [1]http://git.infradead.org/mtd-utils.git/blob/HEAD:/tests/mtd-tests/nandpagetest.c
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

  reply	other threads:[~2017-10-16 13:50 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-13 11:57 gpmc-nand broken since v4.12 Roger Quadros
2017-10-13 11:57 ` Roger Quadros
2017-10-13 12:50 ` Boris Brezillon
2017-10-13 12:50   ` Boris Brezillon
2017-10-13 20:24   ` Boris Brezillon
2017-10-13 20:24     ` Boris Brezillon
2017-10-13 20:29     ` Boris Brezillon
2017-10-13 20:29       ` Boris Brezillon
2017-10-16 10:22       ` Roger Quadros
2017-10-16 10:22         ` Roger Quadros
2017-10-16 11:34         ` Boris Brezillon
2017-10-16 11:34           ` Boris Brezillon
2017-10-16 12:12           ` Roger Quadros
2017-10-16 12:12             ` Roger Quadros
2017-10-16 12:39             ` Boris Brezillon
2017-10-16 12:39               ` Boris Brezillon
2017-10-16 13:50               ` Boris Brezillon [this message]
2017-10-16 13:50                 ` Boris Brezillon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171016155024.211c8235@bbrezillon \
    --to=boris.brezillon@free-electrons.com \
    --cc=fcooper@ti.com \
    --cc=linux-mtd@lists.infradead.org \
    --cc=linux-omap@vger.kernel.org \
    --cc=rogerq@ti.com \
    --cc=tony@atomide.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.