* kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable
@ 2009-04-05 3:32 Mok Keith
[not found] ` <69f617130904042032o382f5084v4fe21884e2356c77-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Mok Keith @ 2009-04-05 3:32 UTC (permalink / raw)
To: linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW,
spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Hi all,
I have encountered a kernel panic, which occurs when I see "pxa2xx-spi pxa2xx-spi.1:
dma_transfer: fifo overrun".
After digging into the code from the kernel panic log, I found that
cur_chip is NULL in the pump_transfers function.
It is very easy to reproduce on my system, which runs a pxa9xx cpu with dma
enabled (the spi works fine with pure I/O).
However, if some printk calls are added for debugging, the problem goes away.
So I cannot find out why the tasklet_schedule for pump_transfers is
called after the giveback function has been called, without cur_chip being set
first.
Does anyone have any idea?
Keith
[parent not found: <69f617130904042032o382f5084v4fe21884e2356c77-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable
[not found] ` <69f617130904042032o382f5084v4fe21884e2356c77-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-04-05 17:07 ` Ned Forrester
[not found] ` <49D8E537.1010307-/d+BM93fTQY@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Ned Forrester @ 2009-04-05 17:07 UTC (permalink / raw)
To: Mok Keith
Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW

Mok Keith wrote:
> Hi all,
>
> I have encountered a kernel panic, which occurs when I see "pxa2xx-spi pxa2xx-spi.1:
> dma_transfer: fifo overrun".
> After digging into the code from the kernel panic log, I found that
> cur_chip is NULL in the pump_transfers function.
>
> It is very easy to reproduce on my system, which runs a pxa9xx cpu with dma
> enabled (the spi works fine with pure I/O).
> However, if some printk calls are added for debugging, the problem goes away.
>
> So I cannot find out why the tasklet_schedule for pump_transfers is
> called after the giveback function has been called, without cur_chip being set
> first.
>
> Does anyone have any idea?

Some. I have worked on this driver a lot, but it has been a while, so I
might overlook some things.

First, the panic is probably caused by these declarations in
pump_transfers():

	u32 dma_thresh = drv_data->cur_chip->dma_threshold;
	u32 dma_burst = drv_data->cur_chip->dma_burst_size;

and, of course, uses of "chip" after this assignment:

	chip = drv_data->cur_chip;

These assignments are performed without checking the validity of
cur_chip. That should be OK in the "standard use" of pxa2xx_spi, because
pump_transfers() is only supposed to be called between calls to
pump_messages(), where cur_chip is set, and calls to giveback() or
start_queue(), where cur_chip is cleared. By "standard use", I mean use
of the SPI bus with Linux as the master (the pxa processor is generating
the SPI clock), and normal SPI transfers where every bit received
matches a bit transmitted.
In this mode, it is hard to imagine how there would be FIFO overrun
errors in DMA mode, because the clock will stop when the TX buffer is
empty, and there should be a matching RX buffer that is filled by the
DMA hardware, thus keeping the SSP receiver FIFO from filling. The only
way I can imagine DMA allowing the receiver FIFO to fill would be if
silly values of burst and threshold were used, but these are set by the
driver, so they should be OK.

Is your application using the SSP in some unusual way that allows the RX
FIFO to overrun? I am not familiar with any PXA9xx chips. What clock
speed are you using? What timeout setting are you using? Are you using
power management with suspend/resume?

I have seen FIFO overruns in my application, but I use a heavily
modified version of pxa2xx_spi.c that implements descriptor-fetch DMA,
enables external clocks, and uses read-without-transmit (RWOT) mode, to
collect data from an 11Mbit/sec external master. Doing these things can
easily overrun the FIFO, but it only happens when I fail to keep filled
the chain of DMA descriptors pointing to empty buffers (and now I have
fixed that, too, so that I can read data continuously, forever). The
DMA hardware itself never fails to keep up, so I don't see why you would
get overruns in DMA mode.

Are you sure that your transfers are actually operating in DMA mode?
The driver reverts to PIO mode for any transfer that exceeds 8191 bytes
in length. The driver is not yet coded to break long transfers into
shorter segments that are within the length that the DMA hardware can
handle, so it just uses PIO mode for long transfers; this is a known
deficiency that someone might fix in the future.

All that said, in my modified driver, I did change the above
declarations to simple declarations and later checked the validity of
cur_chip before making the assignments.
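
The change described above (plain declarations, then a validity check before any dereference) might look roughly like the following. This is only a minimal user-space sketch: `struct chip_data`, `struct driver_data` and `load_chip_state()` are hypothetical stand-ins that model just the fields named in this thread, not the real pxa2xx_spi.c definitions or the modified driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: only the fields mentioned in the thread. */
struct chip_data {
	uint32_t dma_threshold;
	uint32_t dma_burst_size;
};

struct driver_data {
	/* Per the thread: set in pump_messages(), cleared in giveback(). */
	struct chip_data *cur_chip;
};

/*
 * Load the per-chip DMA settings only after cur_chip has been validated.
 * Returns 0 on success, -1 if cur_chip is not set (out-of-sequence call).
 */
static int load_chip_state(const struct driver_data *drv_data,
			   uint32_t *dma_thresh, uint32_t *dma_burst)
{
	const struct chip_data *chip;	/* plain declaration, no dereference */

	chip = drv_data->cur_chip;
	if (chip == NULL)
		return -1;	/* return silently; a message could be logged here */

	*dma_thresh = chip->dma_threshold;
	*dma_burst = chip->dma_burst_size;
	return 0;
}
```

A check like this only avoids the NULL dereference; it says nothing about why the function ran out of sequence in the first place.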
I don't recall exactly which circumstance resulted in execution of
pump_transfers() without a valid cur_chip, but it happened with my very
non-standard application. In my case, I elected to silently return if
cur_chip was not defined, but one could issue a message, of course.

I would bet that the fundamental cause of your problem is the FIFO
overrun. With some more information about your setup and use of
pxa2xx_spi, I might be able to provide more clues. I would hesitate to
simply patch the above assignments without first understanding why
pump_transfers() is being executed out of sequence.

--
Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab 508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA
http://www.whoi.edu/
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079
[parent not found: <49D8E537.1010307-/d+BM93fTQY@public.gmane.org>]
* Re: kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable
[not found] ` <49D8E537.1010307-/d+BM93fTQY@public.gmane.org>
@ 2009-04-06 2:22 ` Mok Keith
[not found] ` <69f617130904051922w72810b52v576546c10c069941-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Mok Keith @ 2009-04-06 2:22 UTC (permalink / raw)
To: Ned Forrester
Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW

Hi Ned,

> Is your application using the SSP in some unusual way that allows the RX
> FIFO to overrun? I am not familiar with any PXA9xx chips. What clock
> speed are you using? What timeout setting are you using? Are you using
> power management with suspend/resume?

No, it does not allow RX FIFO overrun.

Here are the settings in arch:

	static struct pxa2xx_spi_chip libertas_spi = {
		.tx_threshold = 7,
		.rx_threshold = 8,
		.cs_control = libertas_spi_cs,
		.dma_burst_size = 8,
		.timeout = 230,
	};

Here are the settings in the driver:

	spi->mode = SPI_MODE_0;
	spi->max_speed_hz = 1000000; /* REVISIT max=50MHz */
	spi->bits_per_word = 16;
	ret = spi_setup(spi);

I set the speed to only "spi->max_speed_hz = 1000000;", which should be very low.
No power management has been enabled; the problem happens at the very
beginning, during firmware download to the chip.
I found that if I enlarge the timeout to 1000, neither the panic nor the
FIFO overrun happens. But the chip still does not run after the firmware
is downloaded. (It is okay in pure PIO mode.)

> I have seen FIFO overruns in my application, but I use a heavily
> modified version of pxa2xx_spi.c that implements descriptor-fetch DMA,
> enables external clocks, and uses read-without-transmit (RWOT) mode, to
> collect data from an 11Mbit/sec external master.

No modification to pxa2xx_spi.c in my case.

> Are you sure that your transfers are actually operating in DMA mode?
> The driver reverts to PIO mode for any transfer that exceeds 8191 bytes
> in length.
I am sure it is in DMA mode, since it hangs during download of the
firmware to the chip; the firmware is only around 512 bytes long and is
dma aligned. I have already added a printk and confirmed it.

> I would bet that the fundamental cause of your problem is the FIFO
> overrun. With some more information about your setup and use of
> pxa2xx_spi, I might be able to provide more clues. I would hesitate to
> simply patch the above assignments without first understanding why
> pump_transfers() is being executed out of sequence.

I agree, it is meaningless to just add a null pointer check without
knowing the execution sequence that leads to the problem.

Thanks,
Keith
[parent not found: <69f617130904051922w72810b52v576546c10c069941-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable
[not found] ` <69f617130904051922w72810b52v576546c10c069941-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-04-08 15:19 ` Ned Forrester
2009-04-11 2:23 ` [spi-devel-general] " Mok Keith
0 siblings, 1 reply; 5+ messages in thread
From: Ned Forrester @ 2009-04-08 15:19 UTC (permalink / raw)
To: Mok Keith
Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW

Sorry for the delayed response.

Mok Keith wrote:
> Hi Ned,
>
>> Is your application using the SSP in some unusual way that allows the RX
>> FIFO to overrun? I am not familiar with any PXA9xx chips. What clock
>> speed are you using? What timeout setting are you using? Are you using
>> power management with suspend/resume?
>
> No, it does not allow RX FIFO overrun.
>
> Here are the settings in arch:
> static struct pxa2xx_spi_chip libertas_spi = {
> 	.tx_threshold = 7,
> 	.rx_threshold = 8,
> 	.cs_control = libertas_spi_cs,
> 	.dma_burst_size = 8,
> 	.timeout = 230,
> };

Keep in mind that threshold is measured in "registers", of which there
are 16 in the FIFO (at least on PXA2xx devices), regardless of
byte-width, while burst_size is measured in bytes. So if you are doing
16 bits_per_word (below), then the threshold should be 8 with a matching
burst of 16; these values each equal 1/2 of the FIFO. Also, matching
the tx and rx thresholds at 8 and 8 makes more sense to me, however...

In DMA mode, the burst is significant, while the thresholds are ignored
and computed from burst in set_dma_burst_and_threshold(). If the
requested burst is more than 1/2 the FIFO for a given bits/word, then it
is reduced accordingly before computing threshold. If you give it a
burst of 8 at 16 bits/word, it will compute a matching rx_threshold of 4,
and a tx_threshold of 12 (not to be confused with the actual register
values, which are one less: 3 and 11).

Try setting dma_burst_size to 16.
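
The relationship between burst and thresholds described above can be modelled in a few lines of user-space C. This is a sketch of the description in this message only, not a copy of set_dma_burst_and_threshold(): `struct dma_tuning` and `model_burst_and_threshold()` are hypothetical names, the 16-register FIFO depth is the PXA2xx figure given above, and the model does not restrict the burst to the sizes the DMA hardware actually supports.

```c
#include <stdint.h>

#define FIFO_REGISTERS 16	/* FIFO depth in registers, per the text above */

/* Hypothetical result type for this model. */
struct dma_tuning {
	uint32_t burst_bytes;	/* burst after any reduction, in bytes */
	uint32_t rx_threshold;	/* in FIFO registers (register value is one less) */
	uint32_t tx_threshold;	/* in FIFO registers (register value is one less) */
};

static struct dma_tuning model_burst_and_threshold(uint32_t bits_per_word,
						   uint32_t requested_burst)
{
	struct dma_tuning t;
	/* One FIFO register holds one word, whatever its width. */
	uint32_t bytes_per_word = bits_per_word <= 8 ? 1 :
				  bits_per_word <= 16 ? 2 : 4;
	uint32_t half_fifo_bytes = (FIFO_REGISTERS / 2) * bytes_per_word;

	/* A burst larger than half the FIFO is reduced to half the FIFO. */
	t.burst_bytes = requested_burst > half_fifo_bytes ?
			half_fifo_bytes : requested_burst;

	/* RX threshold matches the burst; TX threshold is the remainder. */
	t.rx_threshold = t.burst_bytes / bytes_per_word;
	t.tx_threshold = FIFO_REGISTERS - t.rx_threshold;
	return t;
}
```

In this model, `model_burst_and_threshold(16, 8)` gives an rx threshold of 4 and a tx threshold of 12, which is the case worked through above.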
A dma_burst_size of 16 will compute tx and rx thresholds of 8,
representing half the FIFO. That is the way I normally use the driver.
If there is a bug in the computation of burst and threshold, then this
might change the behavior.

--

You did not show your values for struct pxa2xx_spi_master. I assume you
have enable_dma = 1, in this structure.

> Here are the settings in the driver:
> spi->mode = SPI_MODE_0;
> spi->max_speed_hz = 1000000; /* REVISIT max=50MHz */
> spi->bits_per_word = 16;
> ret = spi_setup(spi);
>
> I set the speed to only "spi->max_speed_hz = 1000000;", which should be very low.
> No power management has been enabled; the problem happens at the very
> beginning, during firmware download to the chip.
> I found that if I enlarge the timeout to 1000, neither the panic nor the
> FIFO overrun happens. But the chip still does not run after the firmware
> is downloaded. (It is okay in pure PIO mode.)

Timeout is an important setting. It is used to clean up any trailing
bytes at the end of a transfer that were not handled by DMA (due to
transfer length not being divisible by burst-size, or whatever other
cause). If the timeout is too short, so that the timeout occurs between
words, then spurious interrupts will be fielded and ignored; because you
have 16 bits/word at 1MHz (16us per word), this might be happening if you
have a short timeout. If the timeout is too long, you waste time at the
end of any transfer with trailing bytes.

The difficult issue with the timeout is that it is not specified what
clock is counted within the chip to generate the timeout. That may seem
strange, but the developer's manuals for the PXA255 and PXA270 say that
the clock used for timing is the "peripheral clock" but never say what
that clock is. On the PXA255, running at 400MHz, I carefully measured
the clock (using long timeouts) to be 99.5MHz, which is run-clock/4. I
have no idea what clock is used on a PXA3xx or PXA9xx.
At 99.5MHz, the default timeout setting of 1000 results in a 10usec
timeout, which is shorter than the time between your arriving 16-bit
words. If you really use a value of 230, and *if* the PXA9xx uses a
similar clock to count from, then your timeout is only 2.3usec. You
probably want to use a value of at least 10,000.

I have no theory about why timeout interrupts might contribute to
receiver FIFO overruns in DMA mode, however.

>> I would bet that the fundamental cause of your problem is the FIFO
>> overrun. With some more information about your setup and use of
>> pxa2xx_spi, I might be able to provide more clues. I would hesitate to
>> simply patch the above assignments without first understanding why
>> pump_transfers() is being executed out of sequence.
>
> I agree, it is meaningless to just add a null pointer check without
> knowing the execution sequence that leads to the problem.
>
> Thanks,
> Keith

--
Ned Forrester
* Re: [spi-devel-general] kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable
2009-04-08 15:19 ` Ned Forrester
@ 2009-04-11 2:23 ` Mok Keith
0 siblings, 0 replies; 5+ messages in thread
From: Mok Keith @ 2009-04-11 2:23 UTC (permalink / raw)
To: Ned Forrester; +Cc: linux-arm-kernel, spi-devel-general

Hi Ned,

Thanks for your detailed description.
I see no more dma fifo overruns after enlarging the timeout value to
1000, but I still get no response from the spi device after the firmware
is downloaded. (I can get a response using pure I/O, i.e. enable_dma=0
and dma_burst_size=0.)

> Try setting dma_burst_size to 16. That will compute tx and rx
> thresholds of 8, representing half the FIFO. That is the way I normally
> use the driver. If there is a bug in the computation of burst and
> threshold, then this might change the behavior.

I tried it before with different combinations:
dma_burst_size=16, tx_threshold=7, rx_threshold=8
dma_burst_size=16, tx_threshold=1, rx_threshold=1
dma_burst_size=16, tx_threshold=0, rx_threshold=0
dma_burst_size=8, tx_threshold=0, rx_threshold=0
dma_burst_size=, tx_threshold=1, rx_threshold=1
dma_burst_size=16, tx_threshold=8, rx_threshold=8
All give the same result (after firmware download to the spi device, no
response).

> You did not show your values for struct pxa2xx_spi_master. I assume you
> have enable_dma = 1, in this structure.

Yes, I have.

> On the PXA255, running at 400MHz, I carefully
> measured the clock (using long timeouts) to be 99.5MHz, which is
> run-clock/4. I have no idea what clock is used on a PXA3xx or PXA9xx.

The PXA9xx manual states that the timeout equals value/26MHz.
So I enlarged it to 1000; there are no more dma fifo overruns now, but I
still cannot get the spi device to work in dma mode.

The PXA9XX manual says that TXFIFO overruns and RXFIFO underruns are
silent errors. There is no indication of the overrun or underrun
condition other than missing data at the receiving end of the link.
I don't know whether I have fallen into this trap or not.
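
The timeout figures quoted in this thread can be checked with a small amount of arithmetic, sketched here as plain user-space C. The helper names are hypothetical; the inputs are the values mentioned in the thread (a timeout counted at 26MHz per the PXA9xx manual, 99.5MHz as measured on the PXA255, and 16 bits/word at a 1MHz SPI clock).

```c
/* Duration of a raw timeout value, in microseconds, when the timeout
 * counter is clocked at clock_mhz. */
static double timeout_usec(unsigned int timeout_value, double clock_mhz)
{
	return timeout_value / clock_mhz;
}

/* Time needed to shift one word on the bus, in microseconds. */
static double word_time_usec(unsigned int bits_per_word, double spi_clock_hz)
{
	return bits_per_word * 1e6 / spi_clock_hz;
}

/* Nonzero if the timeout would expire between two consecutive words. */
static int timeout_shorter_than_word(unsigned int timeout_value,
				     double clock_mhz,
				     unsigned int bits_per_word,
				     double spi_clock_hz)
{
	return timeout_usec(timeout_value, clock_mhz) <
	       word_time_usec(bits_per_word, spi_clock_hz);
}
```

Under these assumptions, a value of 230 at 26MHz is roughly 8.8us, shorter than one 16us word, while 1000 is roughly 38us, longer than a word. That is at least consistent with the overruns disappearing at 1000, although it does not by itself explain them.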
Any clue?

Keith
-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
FAQ: http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette: http://www.arm.linux.org.uk/mailinglists/etiquette.php
end of thread
Thread overview: 5+ messages
2009-04-05 3:32 kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable Mok Keith
[not found] ` <69f617130904042032o382f5084v4fe21884e2356c77-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-04-05 17:07 ` Ned Forrester
[not found] ` <49D8E537.1010307-/d+BM93fTQY@public.gmane.org>
2009-04-06 2:22 ` Mok Keith
[not found] ` <69f617130904051922w72810b52v576546c10c069941-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-04-08 15:19 ` Ned Forrester
2009-04-11 2:23 ` [spi-devel-general] " Mok Keith