From: Andreas Fenkart <andreas.fenkart@streamunlimited.com>
To: Bing Zhao <bzhao@marvell.com>
Cc: Andreas Fenkart <andreas.fenkart@streamunlimited.com>,
"linville@tuxdriver.com" <linville@tuxdriver.com>,
"linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>,
"daniel@zonque.org" <daniel@zonque.org>,
Yogesh Powar <yogeshp@marvell.com>,
Avinash Patil <patila@marvell.com>
Subject: Re: [PATCH 1/6] mwifiex: bug: remove NO_PKT_PRIO_TID.
Date: Wed, 3 Apr 2013 13:35:54 +0200 [thread overview]
Message-ID: <20130403113554.GA14785@blumentopf> (raw)
In-Reply-To: <477F20668A386D41ADCC57781B1F70430D9DDAB197@SC-VEXCH1.marvell.com>
Hi Bing,
On Tue, Apr 02, 2013 at 07:40:53PM -0700, Bing Zhao wrote:
>
> > Using NO_PKT_PRIO_TID and tx_pkts_queued to check for an empty state, can
> > lead to a contradictory state, resulting in an infinite loop.
> > Currently queueing and dequeuing of packets is not synchronized, and can
> > happen concurrently. While tx_pkts_queued is incremented when adding a
> > packet, max prio is set to NO_PKT when the WMM list is empty. If a packet
> > is added right after the check for empty, but before setting max prio to
> > NO_PKT, that packet is trapped and creates an infinite loop.
> > Because of the new packet, tx_pkts_queued is at least 1, indicating wmm
> > lists are not empty. Opposing that max prio is NO_PKT, which means "skip
> > this wmm queue, it has no packets". The infinite loop results, because the
> > main loop checks the wmm lists for not empty via tx_pkts_queued, but when
> > dequeing uses max_prio to see if it can skip a list. This will never end,
> > unless a new packet is added which will restore max prio to the level of
> > the trapped packet.
> > The solution here is to rely on tx_pkts_queued solely for checking wmm
> > queue to be empty, and drop the NO_PKT define. It does not address the
> > locking issue.
> >
> > Signed-off-by: Andreas Fenkart <andreas.fenkart@streamunlimited.com>
>
> With this patch (1/6) applied, I'm getting soft-lockup watchdog:
>
> BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:37]
My bad here, should be like this when patch is applied first:
@@ -919,8 +919,12 @@ mwifiex_wmm_get_highest_priolist_ptr(struct
mwifiex_adapter *adapter,
do {
priv_tmp = bssprio_node->priv;
- hqp = &priv_tmp->wmm.highest_queued_prio;
+ if (atomic_read(&priv_tmp->wmm.tx_pkts_queued) == 0)
+ goto skip_bss;
+
+ /* iterate over the WMM queues of the BSS */
+ hqp = &priv_tmp->wmm.highest_queued_prio;
for (i = atomic_read(hqp); i >= LOW_PRIO_TID; --i) {
tid_ptr = &(priv_tmp)->wmm.
@@ -980,12 +984,7 @@ mwifiex_wmm_get_highest_priolist_ptr(struct mwifiex_adapter *adapter,
} while (ptr != head);
}
- /* No packet at any TID for this priv. Mark as
such
- * to skip checking TIDs for this priv (until
pkt is
- * added).
- */
- atomic_set(hqp, NO_PKT_PRIO_TID);
-
+skip_bss:
/* Get next bss priority node */
bssprio_node = list_first_entry(&bssprio_node->list,
struct mwifiex_bss_prio_node,
That said, yes I developed the pathset the other way round. First
cleaned up, until I knew how to fix the bug best. Then pulled the fix
in front of the cleanup patches and -- mea culpa -- didn't test the
patches individually. Sorry again.
Also found issue here, which could be a problem without patch 6/6:
--- a/drivers/net/wireless/mwifiex/wmm.c
+++ b/drivers/net/wireless/mwifiex/wmm.c
@@ -688,13 +688,13 @@ mwifiex_wmm_add_buf_txqueue(struct mwifiex_private *priv,
ra_list->total_pkts_size += skb->len;
ra_list->pkt_count++;
- atomic_inc(&priv->wmm.tx_pkts_queued);
-
if (atomic_read(&priv->wmm.highest_queued_prio) <
tos_to_tid_inv[tid_down])
atomic_set(&priv->wmm.highest_queued_prio,
tos_to_tid_inv[tid_down]);
+ atomic_inc(&priv->wmm.tx_pkts_queued);
+
How should I proceed? Can I reorder patches to match my development
cycle, which is? 2-5;1;6 or more verbosely cleanup first followed
by bug fix and proper locking last
Or should keep the order as is, but fix patch 1, and propagate changes
through patch 2 till 6?
rgds,
Andi
next prev parent reply other threads:[~2013-04-03 11:36 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-19 9:52 mwifiex: infinite loop in mwifiex_main_process Andreas Fenkart
2013-03-19 22:37 ` Bing Zhao
2013-04-02 0:05 ` Andreas Fenkart
2013-04-02 0:08 ` [PATCH 1/6] mwifiex: bug: remove NO_PKT_PRIO_TID Andreas Fenkart
2013-04-02 0:08 ` [PATCH 2/6] mwifiex: bug: wrong list in list_empty check Andreas Fenkart
2013-04-02 0:08 ` [PATCH 3/6] mwifiex: remove unused tid_tbl_lock from mwifiex_tid_tbl Andreas Fenkart
2013-04-02 0:08 ` [PATCH 4/6] mwifiex: replace ra_list_curr by list rotation Andreas Fenkart
2013-04-02 0:08 ` [PATCH 5/6] mwifiex: rework round robin scheduling of bss nodes Andreas Fenkart
2013-04-02 0:08 ` [PATCH 6/6] mwifiex: hold proper locks when accessing ra_list / bss_prio lists Andreas Fenkart
2013-04-03 2:40 ` [PATCH 1/6] mwifiex: bug: remove NO_PKT_PRIO_TID Bing Zhao
2013-04-03 11:35 ` Andreas Fenkart [this message]
2013-04-03 18:37 ` Bing Zhao
2013-04-04 20:57 ` Andreas Fenkart
2013-04-04 21:01 ` [PATCH 1/4] mwifiex: bug: wrong list in list_empty check Andreas Fenkart
2013-04-04 21:01 ` [PATCH 2/4] mwifiex: remove unused tid_tbl_lock from mwifiex_tid_tbl Andreas Fenkart
2013-04-04 22:33 ` Bing Zhao
2013-04-04 21:01 ` [PATCH 3/4] mwifiex: bug: remove NO_PKT_PRIO_TID Andreas Fenkart
2013-04-04 22:34 ` Bing Zhao
2013-04-04 21:01 ` [PATCH 4/4] mwifiex: bug: hold proper locks when accessing ra_list / bss_prio lists Andreas Fenkart
2013-04-04 22:38 ` Bing Zhao
2013-04-04 22:29 ` [PATCH 1/4] mwifiex: bug: wrong list in list_empty check Bing Zhao
2013-04-04 21:08 ` [PATCH 1/2] mwifiex: replace ra_list_curr by list rotation Andreas Fenkart
2013-04-04 21:08 ` [PATCH 2/2] mwifiex: rework round robin scheduling of bss nodes Andreas Fenkart
2013-04-04 22:56 ` [PATCH 1/6] mwifiex: bug: remove NO_PKT_PRIO_TID Bing Zhao
2013-04-05 8:27 ` Andreas Fenkart
2013-04-08 18:19 ` Bing Zhao
2013-04-11 11:51 ` [PATCH v3 0/2] wmm queues handling simplificatons Andreas Fenkart
2013-04-11 11:51 ` [PATCH 1/2] mwifiex: replace ra_list_curr by list rotation Andreas Fenkart
2013-04-11 18:42 ` Bing Zhao
2013-04-11 11:51 ` [PATCH 2/2] mwifiex: rework round robin scheduling of bss nodes Andreas Fenkart
2013-04-11 18:43 ` Bing Zhao
2013-04-23 18:33 ` [PATCH v3 0/2] wmm queues handling simplificatons Bing Zhao
2013-04-23 18:48 ` John W. Linville
2013-04-23 18:51 ` Bing Zhao
2013-04-02 18:16 ` mwifiex: infinite loop in mwifiex_main_process Bing Zhao
2013-04-02 19:35 ` Andreas Fenkart
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130403113554.GA14785@blumentopf \
--to=andreas.fenkart@streamunlimited.com \
--cc=bzhao@marvell.com \
--cc=daniel@zonque.org \
--cc=linux-wireless@vger.kernel.org \
--cc=linville@tuxdriver.com \
--cc=patila@marvell.com \
--cc=yogeshp@marvell.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).