From: Quintin Pitts <geek4linux@gmail.com>
To: John Linville <linville@tuxdriver.com>
Cc: linux-wireless <linux-wireless@vger.kernel.org>,
Christian Lamparter <chunkeey@web.de>
Subject: [RFC] p54pci: skb_over_panic, soft lockup, stall under flood
Date: Sun, 11 Oct 2009 09:28:55 -0500 [thread overview]
Message-ID: <4AD1EBA7.904@gmail.com> (raw)
Hi,
Sorry for my lack of experience in all aspects - first time
submitting!!!
In trying to get p54pci driver to be stable on my platform and hardware
- here is a generic patch that seems to accomplish that. Since the
ViewSonic V210 uses the IT8152 pci bridge - some attention was needed to
get dma related allocation in the first physical 64M. I have verified
that the dma related allocation is in the first 64M and dmabounce is not
being used - just for those wondering if that was part of the problems.
Platform: ViewSonic V210 arm pxa255
Kernel 2.6.30.5 eabi
Wireless Drivers from compat-wireless-2009-09-30 and what I applied the below patch to.
Firmware used: FW rev 2.13.12.0 - Softmac protocol 5.9
Wireless card: GemTek WL-850FJB minipci card.
phy0: p54 detected a LM86 firmware
p54: rx_mtu reduced from 3240 to 2376
phy0: FW rev 2.13.12.0 - Softmac protocol 5.9
phy0: cryptographic accelerator WEP:YES, TKIP:YES, CCMP:YES
phy0: hwaddr 00:90:4b:c1:06:bc, MAC:isl3890 RF:Frisbee
phy0: Selected rate control algorithm 'minstrel'
device pci info (lspci -v):
00:06.0 Network controller: Intersil Corporation ISL3886 [Prism Javelin/Prism Xbow] (rev 01)
Subsystem: Intersil Corporation Device 0000
Flags: bus master, medium devsel, latency 56, IRQ 217
Memory at 11000000 (32-bit, non-prefetchable) [size=8K]
Capabilities: [dc] Power Management version 1
Kernel driver in use: p54pci
Kernel modules: prism54, p54pci
Reasons for patch was to solve the below problems.
1. p54p_check_rx_ring - skb_over_panic: Under a ping flood or just left
running for a bit would panic with a skb_over_panic. Investigation
showed for some odd reason the device/firmware instead of writing a
length in the data rx_ring (desc->len) had instead written the whole dma
address (host->host_addr) into location of the len/flag (host->len and
host->flags) spot and the same dma address that was in the ring. Added
the following condition in p54p_check_rx_ring to trap that condition and
trim the skb reset the len and flags only. By the way - I used haret to
see if it I could prove it happening under wince - located the dma
memory that was being used for rings - and also happening under windows
ce with the len/flag being set to the same as the host dma. Scanning
the ring at 1000 times per second (I think) In a flood or iperf. Would
see an occasional len/flag location get set to the same host address in
that ring - may only happen a few times every minute. Under normal
operation maybe a few times a day.
if(unlikely(len == (desc->host_addr & 0xffff)
&& (desc->flags == ((desc->host_addr & 0xffff0000) >> 16))) )
2. p54p_refill_rx_ring - eventual stall: Has the potential in very busy
(flood) to over run the last rx data processed ring index corrupting the
next rings - causing some havoc of getting some 13 indexes difference
between priv->rx_idx_data and ring_control host_idx on a 8 index ring.
This appears to eventually fill up the TX queue - returning a -ENOSPC in
p54_assign_address (txrx.c) because of ring corruption missing some TX
releases. Changed p54p_refill_rx_ring to take a index parm and use that
as the last processed ring index - instead of the using the ring_control
device_idx.
3. p54p_check_rx_ring - eventual stall: On ping flood - Control
P54_CONTROL_TYPE_TXDONE rx packets that are skb reused - seem to cause a
problem on the next time around with the same index. Even though the
length was not the same was still being seen as a
P54_CONTROL_TYPE_TXDONE packet again. Side affects varied - one being
the main end result same as the #2 listed above TX not being released
and returning a -ENOSPC in p54_assign_address (txrx.c) - stall.
Problem went away if did not reuse the skb but unmap it and
dev_kfree_skb if return was zero from p54_rx. Still unclear why this
would be - but had no problems with patch afterwards.
4. p54p_check_rx_ring - soft lockup in p54p_refill_rx_ring. This only
occurred when 5 minute iperf on a fast wireless network - Or 1 to 2 days
of unit left up. Discovered that the device had lost it's mind and set
the ring_control->device_index[ring_index] exactly 0xFF or 255 less than
it should be (ram issue??) don't know. Happens on three of my devices
the same way. If left to continue - the p54p_refill_rx_ring while loop
goes negative and soft lockup. Trap and return if device_idx - (*index)
greater than ring_index. Error is only tripped the one time - meaning
the next time p54p_check_rx_ring is called the device index is back to
what it should have been.
5. p54p_open - 1 out of 10 boots will produce device does not
respond! or Cannot boot firmware!. Minor - but frustrating all the
same.
Always rmmod p54pci and then modprobe p54pci works. It seems if get a
error on p54p_open trying again works. And if p54_read_eeprom fails -
trying again works.
The below was applied to compat-wireless-2009-09-30:
Thanks,
Quintin.
Signed-off-by: Quintin Pitts <geek4linux@gmail.com>
---
--- a/drivers/net/wireless/p54/p54pci.c 2009-09-29 23:13:58.000000000 -0500
+++ b/drivers/net/wireless/p54/p54pci.c 2009-10-09 08:15:58.000000000 -0500
@@ -131,7 +131,7 @@ static int p54p_upload_firmware(struct i
static void p54p_refill_rx_ring(struct ieee80211_hw *dev,
int ring_index, struct p54p_desc *ring, u32 ring_limit,
- struct sk_buff **rx_buf)
+ struct sk_buff **rx_buf, u32 index)
{
struct p54p_priv *priv = dev->priv;
struct p54p_ring_control *ring_control = priv->ring_control;
@@ -139,7 +139,11 @@ static void p54p_refill_rx_ring(struct i
idx = le32_to_cpu(ring_control->host_idx[ring_index]);
limit = idx;
- limit -= le32_to_cpu(ring_control->device_idx[ring_index]);
+/*
+ * Use last processed index instead of device_idx
+ * so we don't corrupt our ring
+ */
+ limit -= le32_to_cpu(index);
limit = ring_limit - limit;
i = idx % ring_limit;
@@ -181,9 +185,26 @@ static void p54p_check_rx_ring(struct ie
struct p54p_ring_control *ring_control = priv->ring_control;
struct p54p_desc *desc;
u32 idx, i;
+ int ret;
+ idx = le32_to_cpu(ring_control->device_idx[ring_index]);
i = (*index) % ring_limit;
- (*index) = idx = le32_to_cpu(ring_control->device_idx[ring_index]);
+ if(unlikely((idx - (*index)) > ring_limit ||
+ (le32_to_cpu(ring_control->host_idx[ring_index]) - (*index)) > ring_limit)) {
+ printk(KERN_DEBUG "%s: devidx jumped *index=%d devidx=%d hostidx=%d ring_limit=%d\n",
+ __func__,(*index),idx,ring_control->host_idx[ring_index],ring_limit);
+/*
+ * Do nothing things are really wrong - device index has jumped got corrupted
+ * - wait for it to stabilize
+ * So far device idx exactly 0xFF (255) bytes less than what it should be.
+ * only seen to happen on very fast wireless and packet floods and/or iperf test
+ * In testing this error only encountered once - so next time around the
+ * device index is correct.
+ * if to continue would soft lockup/hang in while loop in p54p_refill_rx_ring
+ */
+ return;
+ }
+ (*index) = idx;
idx %= ring_limit;
while (i != idx) {
u16 len;
@@ -197,25 +218,40 @@ static void p54p_check_rx_ring(struct ie
i %= ring_limit;
continue;
}
+ if(unlikely(len == (desc->host_addr & 0xffff)
+ && (desc->flags == ((desc->host_addr & 0xffff0000) >> 16))) ) {
+/* device has put device dma in desc len/flag location - will crash in skb_put
+ * desc->len and desc->flags contain the host_addr -
+ * trap before skb_put and discard
+ * ViewSonic V210 and wireless card GENTEK WL-850 , IT8152 PCI bridge
+ * happens occasionally - no clear reason or frequency.
+ *
+ */
+ printk(KERN_DEBUG "%s: rx_ring len/flags has address - skipping!\n",__func__);
+ skb_trim(skb,0);
+ desc->len = cpu_to_le16(priv->common.rx_mtu + 32);
+ desc->flags=0;
+
+ } else {
+
skb_put(skb, len);
- if (p54_rx(dev, skb)) {
- pci_unmap_single(priv->pdev,
+ ret=p54_rx(dev,skb);
+ pci_unmap_single(priv->pdev,
le32_to_cpu(desc->host_addr),
priv->common.rx_mtu + 32,
PCI_DMA_FROMDEVICE);
- rx_buf[i] = NULL;
- desc->host_addr = 0;
- } else {
- skb_trim(skb, 0);
- desc->len = cpu_to_le16(priv->common.rx_mtu + 32);
- }
+ if(ret==0)
+ dev_kfree_skb(skb);
+ rx_buf[i] = NULL;
+ desc->host_addr = 0;
+ } /* end of desc->len skb corrupt crash test */
i++;
i %= ring_limit;
}
- p54p_refill_rx_ring(dev, ring_index, ring, ring_limit, rx_buf);
+ p54p_refill_rx_ring(dev, ring_index, ring, ring_limit, rx_buf, (*index));
}
/* caller must hold priv->lock */
@@ -428,10 +464,10 @@ static int p54p_open(struct ieee80211_hw
priv->rx_idx_mgmt = priv->tx_idx_mgmt = 0;
p54p_refill_rx_ring(dev, 0, priv->ring_control->rx_data,
- ARRAY_SIZE(priv->ring_control->rx_data), priv->rx_buf_data);
+ ARRAY_SIZE(priv->ring_control->rx_data), priv->rx_buf_data, 0);
p54p_refill_rx_ring(dev, 2, priv->ring_control->rx_mgmt,
- ARRAY_SIZE(priv->ring_control->rx_mgmt), priv->rx_buf_mgmt);
+ ARRAY_SIZE(priv->ring_control->rx_mgmt), priv->rx_buf_mgmt, 0);
P54P_WRITE(ring_control_base, cpu_to_le32(priv->ring_control_dma));
P54P_READ(ring_control_base);
@@ -550,9 +586,26 @@ static int __devinit p54p_probe(struct p
}
err = p54p_open(dev);
- if (err)
- goto err_free_common;
+ if (err) {
+
+ printk(KERN_DEBUG "%s: p54p_open failed - trying again\n",__func__);
+ msleep(10);
+ err = p54p_open(dev);
+ if (err)
+ goto err_free_common;
+ }
err = p54_read_eeprom(dev);
+ if (err)
+ {
+ printk(KERN_DEBUG "%s: p54_read_eeprom failed - trying again\n",__func__);
+ p54p_stop(dev);
+ err = p54p_open(dev);
+ if (err)
+ goto err_free_common;
+ msleep(10);
+ err = p54_read_eeprom(dev);
+
+ }
p54p_stop(dev);
if (err)
goto err_free_common;
next reply other threads:[~2009-10-11 14:29 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-10-11 14:28 Quintin Pitts [this message]
2009-10-11 15:31 ` [RFC] p54pci: skb_over_panic, soft lockup, stall under flood Larry Finger
2009-10-11 19:41 ` Christian Lamparter
2009-10-12 0:26 ` Quintin Pitts
2009-10-12 0:09 ` Quintin Pitts
2009-10-12 8:57 ` Christian Lamparter
2009-10-12 13:34 ` Quintin Pitts
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4AD1EBA7.904@gmail.com \
--to=geek4linux@gmail.com \
--cc=chunkeey@web.de \
--cc=linux-wireless@vger.kernel.org \
--cc=linville@tuxdriver.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).