* Re: cfi_cmdset_0002: do_write_buffer timeouts
From: Brian Norris @ 2013-04-11 9:00 UTC
To: linux-mtd@lists.infradead.org
Cc: Huang Shijie, Kevin Cernekee, David Woodhouse, Artem Bityutskiy

[Sorry for the repeat email for some; Gmail switched me back to
HTML-mode, so my previous email couldn't be delivered to the MTD list]

Hi all,

I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:

MTD do_write_buffer(): software timeout

I'm using a 64Mbyte Spansion S29GL512 NOR flash:

physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
Manufacturer ID 0x000001 Chip ID 0x002301

I can reproduce the timeout approximately 0.5% of the time on a simple
reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
timeout comes out to just 1 jiffy. I have to increase this timeout to
at least 3 ticks to avoid the timeouts. (I've been running reboot
tests successfully for several days with the timeout as 3 jiffies.)

So my question is: what is the "best" way to decide these timeouts?
I'm inclined to just increase the timeout (and to use the proper
msecs_to_jiffies() macro, as a cleanup). But according to the
datasheets (which agree with the comments in the code), the max time
should be less than a millisecond. So simply increasing the timeout
may in fact just be masking some other bug.

Huang,

I noticed you recently sent a patch that adjusts the timeout print
message in do_write_buffer(). Have you had problems with this code
recently?

Any thoughts from any interested (or uninterested) party would be useful.

Thanks,
Brian
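
For reference, the arithmetic behind the 1 jiffy timeout can be sketched in plain C. This is a userspace model, not kernel code. It assumes the driver computes the timeout as HZ / 1000 + 1 jiffies, which is consistent with the 1 jiffy reported above at CONFIG_HZ=250 but is not quoted anywhere in this thread. All helper names are made up for illustration.

```c
#include <assert.h>

/*
 * Assumed form of the existing timeout: HZ / 1000 + 1 jiffies.
 * Integer division truncates, so any HZ below 1000 yields 1 jiffy.
 */
static unsigned long legacy_timeout_jiffies(unsigned long hz)
{
    assert(hz > 0);
    return hz / 1000 + 1;
}

/*
 * Round-up conversion from milliseconds to ticks. This models what
 * msecs_to_jiffies() is meant to provide; it is not the kernel's code.
 */
static unsigned long ms_to_jiffies_round_up(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return (ms * hz + 999) / 1000;
}

/* Length of one tick in microseconds for a given HZ. */
static unsigned long tick_us(unsigned long hz)
{
    assert(hz > 0);
    return 1000000 / hz;
}
```

Under these assumptions, a 1 ms budget at HZ=250 comes out as 1 jiffy either way, one tick lasts 4000 us, and the 3 jiffy value that worked in testing corresponds to 12000 us of tick time.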
* Re: cfi_cmdset_0002: do_write_buffer timeouts
From: Huang Shijie @ 2013-04-11 9:21 UTC
To: Brian Norris
Cc: David Woodhouse, Kevin Cernekee, linux-mtd@lists.infradead.org, Artem Bityutskiy

On 2013-04-11 17:00, Brian Norris wrote:
> [Sorry for the repeat email for some; Gmail switched me back to
> HTML-mode, so my previous email couldn't be delivered to the MTD list]
>
> Hi all,
>
> I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:
>
> MTD do_write_buffer(): software timeout
>
> I'm using a 64Mbyte Spansion S29GL512 NOR flash:
>
> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
> Manufacturer ID 0x000001 Chip ID 0x002301
>
> I can reproduce the timeout approximately 0.5% of the time on a simple
> reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
> timeout comes out to just 1 jiffy. I have to increase this timeout to
> at least 3 ticks to avoid the timeouts. (I've been running reboot
> tests successfully for several days with the timeout as 3 jiffies.)
>
> So my question is: what is the "best" way to decide these timeouts?
> I'm inclined to just increase the timeout (and to use the proper
> msecs_to_jiffies() macro, as a cleanup). But according to the
> datasheets (which agree with the comments in the code), the max time
> should be less than a millisecond. So simply increasing the timeout
> may in fact just be masking some other bug.
>
> Huang,
>
> I noticed you recently sent a patch that adjusts the timeout print
> message in do_write_buffer(). Have you had problems with this code
> recently?
>

Yes. I am fighting with this timeout right now. :(

My chip is M29W256GL7AN6E.
physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
Manufacturer ID 0x000020 Chip ID 0x00227e

When I run bonnie++ on UBIFS on the NOR, I occasionally get a timeout.
Sometimes it passes the bonnie++/UBIFS test, and sometimes it does not.
The timeout occurs at certain fixed addresses, such as 0x4e0000 and 0x520000.
I tried extending the buffer-write timeout in do_write_buffer() from 1ms
to 10ms, but the bug still occurs.

(I also tested other NOR chips, such as Spansion S29GL256P10 and Micron
JS28F256M29EWL. I do not see the timeout issue with these two.)

thanks
Huang Shijie
* Re: cfi_cmdset_0002: do_write_buffer timeouts
From: Brian Norris @ 2013-04-11 19:37 UTC
To: Huang Shijie
Cc: David Woodhouse, Kevin Cernekee, linux-mtd@lists.infradead.org, Artem Bityutskiy

On Thu, Apr 11, 2013 at 2:21 AM, Huang Shijie <b32955@freescale.com> wrote:
> On 2013-04-11 17:00, Brian Norris wrote:
>> I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:
>>
>> MTD do_write_buffer(): software timeout
>>
>> I'm using a 64Mbyte Spansion S29GL512 NOR flash:
>>
>> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
>> Manufacturer ID 0x000001 Chip ID 0x002301
>>
>> I can reproduce the timeout approximately 0.5% of the time on a simple
>> reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
>> timeout comes out to just 1 jiffy. I have to increase this timeout to
>> at least 3 ticks to avoid the timeouts. (I've been running reboot
>> tests successfully for several days with the timeout as 3 jiffies.)
>>
>> So my question is: what is the "best" way to decide these timeouts?
>> I'm inclined to just increase the timeout (and to use the proper
>> msecs_to_jiffies() macro, as a cleanup). But according to the
>> datasheets (which agree with the comments in the code), the max time
>> should be less than a millisecond. So simply increasing the timeout
>> may in fact just be masking some other bug.
>>
>> Huang,
>>
>> I noticed you recently sent a patch that adjusts the timeout print
>> message in do_write_buffer(). Have you had problems with this code
>> recently?
>>
> Yes. I am fighting with this timeout right now. :(
>
> My chip is M29W256GL7AN6E.
> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank. Manufacturer ID
> 0x000020 Chip ID 0x00227e
>
>
> When I run bonnie++ on UBIFS on the NOR, I occasionally get a timeout.
> Sometimes it passes the bonnie++/UBIFS test,
> and sometimes it does not.
> The timeout occurs at certain fixed addresses, such as 0x4e0000 and 0x520000. I
> tried extending the buffer-write timeout in do_write_buffer() from 1ms to 10ms,
> but the bug still occurs.

Well, our timeouts are a little different then. A larger timeout
solves all my problems. And my timeouts aren't at consistent
addresses. Here's a sampling of mine over the last few hours.

MTD do_write_buffer(): software timeout @ address 0x240b87e
MTD do_write_buffer(): software timeout @ address 0x248b6be
MTD do_write_buffer(): software timeout @ address 0x132067e
MTD do_write_buffer(): software timeout @ address 0x31712fe
MTD do_write_buffer(): software timeout @ address 0x3c0e0fe
MTD do_write_buffer(): software timeout @ address 0xd2037e
MTD do_write_buffer(): software timeout @ address 0x318043e
MTD do_write_buffer(): software timeout @ address 0x2a201fe
MTD do_write_buffer(): software timeout @ address 0x2a4f47e
MTD do_write_buffer(): software timeout @ address 0x2a3ef7e

> (I also tested other NOR chips, such as Spansion S29GL256P10 and Micron
> JS28F256M29EWL.
> I do not see the timeout issue with these two.)

Brian
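
The difference between the two address patterns can be checked mechanically. The sketch below is only an illustration: it assumes uniform 128 KiB sectors for both parts (to be confirmed against the datasheets), and the helper names are hypothetical. It counts how many reported addresses fall exactly on a sector boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: uniform 128 KiB sectors; check the datasheet for the real part. */
#define ASSUMED_SECTOR_SIZE 0x20000u

/* Byte offset of addr within its (assumed) sector. */
static uint32_t sector_offset(uint32_t addr)
{
    return addr % ASSUMED_SECTOR_SIZE;
}

/* Count how many of the n addresses sit exactly on a sector boundary. */
static size_t count_sector_aligned(const uint32_t *addrs, size_t n)
{
    size_t i, aligned = 0;

    assert(addrs != NULL);
    for (i = 0; i < n; i++)
        if (sector_offset(addrs[i]) == 0)
            aligned++;
    return aligned;
}

/* Addresses copied from the two reports in this thread. */
static const uint32_t fixed_addrs[] = { 0x4e0000, 0x520000 };
static const uint32_t scattered_addrs[] = {
    0x240b87e, 0x248b6be, 0x132067e, 0x31712fe, 0x3c0e0fe,
    0xd2037e, 0x318043e, 0x2a201fe, 0x2a4f47e, 0x2a3ef7e,
};
```

With the assumed sector size, both of the fixed addresses are sector-aligned and none of the ten sampled addresses are, which fits the impression that the two reports have different causes.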
* Re: cfi_cmdset_0002: do_write_buffer timeouts
From: Norbert van Bolhuis @ 2013-04-12 6:23 UTC
To: Brian Norris
Cc: Huang Shijie, Kevin Cernekee, linux-mtd@lists.infradead.org, David Woodhouse, Artem Bityutskiy

On 04/11/13 11:00, Brian Norris wrote:
> [Sorry for the repeat email for some; Gmail switched me back to
> HTML-mode, so my previous email couldn't be delivered to the MTD list]
>
> Hi all,
>
> I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:
>
> MTD do_write_buffer(): software timeout
>
> I'm using a 64Mbyte Spansion S29GL512 NOR flash:
>
> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
> Manufacturer ID 0x000001 Chip ID 0x002301
>
> I can reproduce the timeout approximately 0.5% of the time on a simple
> reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
> timeout comes out to just 1 jiffy. I have to increase this timeout to
> at least 3 ticks to avoid the timeouts. (I've been running reboot
> tests successfully for several days with the timeout as 3 jiffies.)
>
> So my question is: what is the "best" way to decide these timeouts?
> I'm inclined to just increase the timeout (and to use the proper
> msecs_to_jiffies() macro, as a cleanup). But according to the
> datasheets (which agree with the comments in the code), the max time
> should be less than a millisecond. So simply increasing the timeout
> may in fact just be masking some other bug.
>
> Huang,
>
> I noticed you recently sent a patch that adjusts the timeout print
> message in do_write_buffer(). Have you had problems with this code
> recently?
>
> Any thoughts from any interested (or uninterested) party would be useful.
>
> Thanks,
> Brian
>

This:

http://lkml.org/lkml/2009/9/3/84

may be your problem.

Try disabling CONFIG_NO_HZ and you will know for sure.

---
Norbert
* Re: cfi_cmdset_0002: do_write_buffer timeouts
From: Brian Norris @ 2013-04-13 2:59 UTC
To: Norbert van Bolhuis
Cc: Huang Shijie, Kevin Cernekee, linux-mtd@lists.infradead.org, David Woodhouse, Artem Bityutskiy

On Thu, Apr 11, 2013 at 11:23 PM, Norbert van Bolhuis <nvbolhuis@aimvalley.nl> wrote:
> On 04/11/13 11:00, Brian Norris wrote:
>>
>> [Sorry for the repeat email for some; Gmail switched me back to
>> HTML-mode, so my previous email couldn't be delivered to the MTD list]
>>
>> Hi all,
>>
>> I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:
>>
>> MTD do_write_buffer(): software timeout
>>
>> I'm using a 64Mbyte Spansion S29GL512 NOR flash:
>>
>> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
>> Manufacturer ID 0x000001 Chip ID 0x002301
>>
>> I can reproduce the timeout approximately 0.5% of the time on a simple
>> reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
>> timeout comes out to just 1 jiffy. I have to increase this timeout to
>> at least 3 ticks to avoid the timeouts. (I've been running reboot
>> tests successfully for several days with the timeout as 3 jiffies.)
>>
>> So my question is: what is the "best" way to decide these timeouts?
>> I'm inclined to just increase the timeout (and to use the proper
>> msecs_to_jiffies() macro, as a cleanup). But according to the
>> datasheets (which agree with the comments in the code), the max time
>> should be less than a millisecond. So simply increasing the timeout
>> may in fact just be masking some other bug.
>>
>> Huang,
>>
>> I noticed you recently sent a patch that adjusts the timeout print
>> message in do_write_buffer(). Have you had problems with this code
>> recently?
>>
>> Any thoughts from any interested (or uninterested) party would be useful.
>>
>> Thanks,
>> Brian
>>
>
>
> This:
>
> http://lkml.org/lkml/2009/9/3/84
>
> may be your problem.
>
> Try disabling CONFIG_NO_HZ and you will know for sure.

Disabling CONFIG_NO_HZ doesn't fix my problem.

Brian
* Re: cfi_cmdset_0002: do_write_buffer timeouts
From: Huang Shijie @ 2013-04-15 7:55 UTC
To: Brian Norris
Cc: David Woodhouse, Kevin Cernekee, linux-mtd@lists.infradead.org, Norbert van Bolhuis, Artem Bityutskiy

On 2013-04-13 10:59, Brian Norris wrote:
> Disabling CONFIG_NO_HZ doesn't fix my problem.

I also disabled CONFIG_NO_HZ, and it does not fix my problem either.

But after I removed the mutex_unlock/mutex_lock in
UDELAY/INVALIDATE_CACHE_UDELAY, my problem disappeared. I ran for three
days with no timeouts. (I do not enable CONFIG_MTD_XIP.)

--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -1043,17 +1043,13 @@ static void __xipram xip_udelay(struct map_info *map, struct flchip *chip,
 
 #define UDELAY(map, chip, adr, usec) \
 do { \
-	mutex_unlock(&chip->mutex); \
 	cfi_udelay(usec); \
-	mutex_lock(&chip->mutex); \
 } while (0)
 
 #define INVALIDATE_CACHE_UDELAY(map, chip, adr, len, usec) \
 do { \
-	mutex_unlock(&chip->mutex); \
 	INVALIDATE_CACHED_RANGE(map, adr, len); \
 	cfi_udelay(usec); \
-	mutex_lock(&chip->mutex); \
 } while (0)
 
 #endif
--

thanks
Huang Shijie
* Re: cfi_cmdset_0002: do_write_buffer timeouts
From: Brian Norris @ 2013-04-17 21:45 UTC
To: Huang Shijie
Cc: David Woodhouse, Kevin Cernekee, linux-mtd@lists.infradead.org, Norbert van Bolhuis, Artem Bityutskiy

On Mon, Apr 15, 2013 at 12:55 AM, Huang Shijie <b32955@freescale.com> wrote:
> On 2013-04-13 10:59, Brian Norris wrote:
>
>> Disabling CONFIG_NO_HZ doesn't fix my problem.
>
> I also disabled CONFIG_NO_HZ, and it does not fix my problem either.
>
> But after I removed the mutex_unlock/mutex_lock in
> UDELAY/INVALIDATE_CACHE_UDELAY,
> my problem disappeared. I ran for three days with no timeouts. (I do not
> enable CONFIG_MTD_XIP.)
>
>
> --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> @@ -1043,17 +1043,13 @@ static void __xipram xip_udelay(struct map_info
> *map, struct flchip *chip,
>
> #define UDELAY(map, chip, adr, usec) \
> do { \
> - mutex_unlock(&chip->mutex); \
> cfi_udelay(usec); \
> - mutex_lock(&chip->mutex); \
> } while (0)
>
> #define INVALIDATE_CACHE_UDELAY(map, chip, adr, len, usec) \
> do { \
> - mutex_unlock(&chip->mutex); \
> INVALIDATE_CACHED_RANGE(map, adr, len); \
> cfi_udelay(usec); \
> - mutex_lock(&chip->mutex); \
> } while (0)
>
> #endif

This patch doesn't solve my problem, so it seems that Huang and I
probably are seeing different root causes for these timeouts.

I tried applying this patch and then timing the exact delay seen by
the time we "time out" (by directly accessing the CPU count register),
and the delay is always very close to 4ms (with my kernel, HZ=250, so
4ms is expected). So it seems like my system is waiting plenty long
(according to the flash part specification) but if I wait even longer,
the operation does complete successfully.

I'll continue to look at this issue, but I thought I'd post my results so far.

Brian
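
The measurement described above can be sketched in userspace to show the idea: record how long the polling loop actually waited before it gave up or saw completion. This is not the kernel instrumentation that was used (which read the CPU count register); it substitutes the POSIX monotonic clock, and the callback type and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical status callback: returns true once the operation is done. */
typedef bool (*ready_fn)(void *ctx);

/* Monotonic clock reading in microseconds. */
static uint64_t now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/*
 * Poll ready() until it returns true or timeout_us has elapsed.
 * Returns true on completion and false on timeout. *elapsed_us receives
 * the time actually spent waiting, which is the figure to compare
 * against the datasheet maximum.
 */
static bool poll_with_timing(ready_fn ready, void *ctx,
                             uint64_t timeout_us, uint64_t *elapsed_us)
{
    uint64_t start;
    bool done;

    assert(ready != NULL && elapsed_us != NULL);
    start = now_us();
    for (;;) {
        done = ready(ctx);
        *elapsed_us = now_us() - start;
        if (done || *elapsed_us >= timeout_us)
            return done;
    }
}

/* Example callback that never reports completion. */
static bool never_ready(void *ctx)
{
    (void)ctx;
    return false;
}
```

Calling poll_with_timing(never_ready, NULL, 4000, &elapsed) returns false with elapsed at or above 4000 us. Logging that value on every timeout is what separates "the loop gave up too early" from "the chip really took longer than specified".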
* Re: cfi_cmdset_0002: do_write_buffer timeouts
From: Huang Shijie @ 2013-04-18 2:09 UTC
To: Brian Norris
Cc: David Woodhouse, Kevin Cernekee, linux-mtd@lists.infradead.org, Norbert van Bolhuis, Artem Bityutskiy

On 2013-04-18 05:45, Brian Norris wrote:
> This patch doesn't solve my problem, so it seems that Huang and I
> probably are seeing different root causes for these timeouts.
>

Yes. My timeout may be caused by an error in erase-suspend/erase-resume.

thanks
Huang Shijie
* Re: cfi_cmdset_0002: do_write_buffer timeouts
From: Stefan Roese @ 2013-04-12 6:34 UTC
To: Brian Norris
Cc: Huang Shijie, Kevin Cernekee, linux-mtd@lists.infradead.org, David Woodhouse, Artem Bityutskiy

On 11.04.2013 11:00, Brian Norris wrote:
> [Sorry for the repeat email for some; Gmail switched me back to
> HTML-mode, so my previous email couldn't be delivered to the MTD list]
>
> Hi all,
>
> I'm having some trouble where I am getting timeouts in cfi_cmdset_0002.c:
>
> MTD do_write_buffer(): software timeout
>
> I'm using a 64Mbyte Spansion S29GL512 NOR flash:
>
> physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank.
> Manufacturer ID 0x000001 Chip ID 0x002301
>
> I can reproduce the timeout approximately 0.5% of the time on a simple
> reboot, mount UBI rootfs test. My system has CONFIG_HZ=250, and so the
> timeout comes out to just 1 jiffy. I have to increase this timeout to
> at least 3 ticks to avoid the timeouts. (I've been running reboot
> tests successfully for several days with the timeout as 3 jiffies.)
>
> So my question is: what is the "best" way to decide these timeouts?
> I'm inclined to just increase the timeout (and to use the proper
> msecs_to_jiffies() macro, as a cleanup). But according to the
> datasheets (which agree with the comments in the code), the max time
> should be less than a millisecond. So simply increasing the timeout
> may in fact just be masking some other bug.
>
> Huang,
>
> I noticed you recently sent a patch that adjusts the timeout print
> message in do_write_buffer(). Have you had problems with this code
> recently?
>
> Any thoughts from any interested (or uninterested) party would be useful.

Without looking into the cmdset_0002 code, I remember fixing a similar
issue for cmdset_0001 a few months ago:

git id: 7be1f6b9a1ae3476a424380b52aad7c14c3273ab
Author: Stefan Roese <sr@denx.de> 2012-08-28 11:34:13
Committer: David Woodhouse <David.Woodhouse@intel.com> 2012-09-29 16:29:08
Follows: v3.6-rc2
Precedes: v3.7-rc1

    mtd: cfi_cmdset_0001: Fix problem with unlocking timeout

    Unlocking may take up to 1.4 seconds on some Intel flashes. So
    lets use a max. of 1.5 seconds (1500ms) as timeout.

    See "Clear Block Lock-Bits Time" on page 40 in
    "3 Volt Intel StrataFlash Memory" 28F128J3,28F640J3,28F320J3 manual
    from February 2003

    This patch also fixes some other problems with this timeout:

    - Don't use HZ in timeout "calculation"! While testing we noticed
      that an unlocking timeout occured with HZ=1000 and didn't occur
      with HZ=300. This was because the timeout parameter was calculated
      differently depending on the HZ value. Now a fixed value of 1500ms
      is used.
    - The last parameter of WAIT_TIMEOUT (defined to
      inval_cache_and_wait_for_operation) has to be passed in
      micro-seconds. So multiply the ms value with 1000 and not 100
      to calculate this value.
    - Use variable name "mdelay" instead of misleading "udelay".

One main issue here was that the resulting timeout was HZ related
resulting in different behavior depending on the HZ configuration. This
current issue here might be related, not sure though.

Thanks,
Stefan
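
The two timeout problems named in that commit message, HZ dependence and the wrong unit factor, can be illustrated with a few lines of C. This is a sketch under stated assumptions: the 1500 ms value and the 1000-versus-100 factor come from the commit message, while the HZ-derived formula below (100 tick periods) is a hypothetical stand-in chosen only to show how such a timeout shrinks as HZ grows. All function names are made up.

```c
#include <assert.h>

/* Fixed timeout named in the commit message: 1500 ms. */
#define UNLOCK_TIMEOUT_MS 1500UL

/* Correct conversion from milliseconds to microseconds (factor 1000). */
static unsigned long ms_to_us(unsigned long ms)
{
    return ms * 1000UL;
}

/* The factor-of-100 mistake the commit describes, kept for comparison. */
static unsigned long ms_to_us_wrong_factor(unsigned long ms)
{
    return ms * 100UL;
}

/*
 * Hypothetical HZ-derived timeout: 100 tick periods, in microseconds.
 * Not taken from the driver; it only models "timeout depends on HZ".
 */
static unsigned long hz_derived_timeout_us(unsigned long hz)
{
    assert(hz > 0);
    return (1000000UL / hz) * 100UL;
}
```

Under the hypothetical formula, the budget is 100000 us at HZ=1000 and 333300 us at HZ=300, so a faster tick gives a shorter timeout. That matches the direction of the reported symptom (failure at HZ=1000 but not at HZ=300). The fixed value converts to 1500000 us regardless of HZ, whereas the wrong factor would yield only 150000 us.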
Thread overview: 9+ messages
[not found] <CAN8TOE8dVYxBbb8MtozFio8dS-ypq14U8RuKTo38QcAtXM5Qrw@mail.gmail.com>
2013-04-11 9:00 ` cfi_cmdset_0002: do_write_buffer timeouts Brian Norris
2013-04-11 9:21 ` Huang Shijie
2013-04-11 19:37 ` Brian Norris
2013-04-12 6:23 ` Norbert van Bolhuis
2013-04-13 2:59 ` Brian Norris
2013-04-15 7:55 ` Huang Shijie
2013-04-17 21:45 ` Brian Norris
2013-04-18 2:09 ` Huang Shijie
2013-04-12 6:34 ` Stefan Roese