From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Yibo Zhao <yiboz@codeaurora.org>
Cc: ath10k@lists.infradead.org, linux-wireless@vger.kernel.org,
linux-wireless-owner@vger.kernel.org
Subject: Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree
Date: Fri, 20 Sep 2019 11:15:25 +0200 [thread overview]
Message-ID: <8736gre3bm.fsf@toke.dk> (raw)
In-Reply-To: <1b4ab006d9b5c88035845aaac193ef48@codeaurora.org>
Yibo Zhao <yiboz@codeaurora.org> writes:
> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>
>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>
>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>
>>>>>>> In a loop txqs dequeue scenario, if the first txq in the rbtree
>>>>>>> gets
>>>>>>> removed from rbtree immediately in the ieee80211_return_txq(), the
>>>>>>> loop will break soon in the ieee80211_next_txq() due to
>>>>>>> schedule_pos
>>>>>>> not leading to the second txq in the rbtree. Thus, defering the
>>>>>>> removal right before the end of this schedule round.
>>>>>>>
>>>>>>> Co-developed-by: Yibo Zhao <yiboz@codeaurora.org>
>>>>>>> Signed-off-by: Yibo Zhao <yiboz@codeaurora.org>
>>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>>>>
>>>>>> I didn't write this patch, so please don't use my sign-off. I'll
>>>>>> add
>>>>>> ack or review tags as appropriate in reply; but a few comments
>>>>>> first:
>>>>>>
>>>>>>> ---
>>>>>>> include/net/mac80211.h | 16 ++++++++++--
>>>>>>> net/mac80211/ieee80211_i.h | 3 +++
>>>>>>> net/mac80211/main.c | 6 +++++
>>>>>>> net/mac80211/tx.c | 63
>>>>>>> +++++++++++++++++++++++++++++++++++++++++++---
>>>>>>> 4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>> --- a/include/net/mac80211.h
>>>>>>> +++ b/include/net/mac80211.h
>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>>
>>>>>>> #define IEEE80211_MAX_TX_RETRY 31
>>>>>>>
>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>> +
>>>>>>> static inline void ieee80211_rate_set_vht(struct
>>>>>>> ieee80211_tx_rate
>>>>>>> *rate,
>>>>>>> u8 mcs, u8 nss)
>>>>>>> {
>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct
>>>>>>> ieee80211_hw *hw,
>>>>>>> * @ac: AC number to return packets from.
>>>>>>> *
>>>>>>> * Should only be called between calls to
>>>>>>> ieee80211_txq_schedule_start()
>>>>>>> - * and ieee80211_txq_schedule_end().
>>>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, it will
>>>>>>> be
>>>>>>> added
>>>>>>> + * to a remove list and get removed later.
>>>>>>> * Returns the next txq if successful, %NULL if no queue is
>>>>>>> eligible.
>>>>>>> If a txq
>>>>>>> * is returned, it should be returned with ieee80211_return_txq()
>>>>>>> after the
>>>>>>> * driver has finished scheduling it.
>>>>>>> @@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
>>>>>>> ieee80211_hw *hw, u8 ac)
>>>>>>> * @hw: pointer as obtained from ieee80211_alloc_hw()
>>>>>>> * @ac: AC number to acquire locks for
>>>>>>> *
>>>>>>> - * Release locks previously acquired by
>>>>>>> ieee80211_txq_schedule_end().
>>>>>>> + * Release locks previously acquired by
>>>>>>> ieee80211_txq_schedule_end().
>>>>>>> Check
>>>>>>> + * and remove the empty txq from rb-tree.
>>>>>>> */
>>>>>>> void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
>>>>>>> __releases(txq_lock);
>>>>>>> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct
>>>>>>> ieee80211_hw
>>>>>>> *hw, struct ieee80211_txq *txq)
>>>>>>> __acquires(txq_lock) __releases(txq_lock);
>>>>>>>
>>>>>>> /**
>>>>>>> + * ieee80211_txqs_check - Check txqs waiting for removal
>>>>>>> + *
>>>>>>> + * @tmr: pointer as obtained from local
>>>>>>> + *
>>>>>>> + */
>>>>>>> +void ieee80211_txqs_check(struct timer_list *tmr);
>>>>>>> +
>>>>>>> +/**
>>>>>>> * ieee80211_txq_may_transmit - check whether TXQ is allowed to
>>>>>>> transmit
>>>>>>> *
>>>>>>> * This function is used to check whether given txq is allowed to
>>>>>>> transmit by
>>>>>>> diff --git a/net/mac80211/ieee80211_i.h
>>>>>>> b/net/mac80211/ieee80211_i.h
>>>>>>> index a4556f9..49aa143e 100644
>>>>>>> --- a/net/mac80211/ieee80211_i.h
>>>>>>> +++ b/net/mac80211/ieee80211_i.h
>>>>>>> @@ -847,6 +847,7 @@ struct txq_info {
>>>>>>> struct codel_stats cstats;
>>>>>>> struct sk_buff_head frags;
>>>>>>> struct rb_node schedule_order;
>>>>>>> + struct list_head candidate;
>>>>>>> unsigned long flags;
>>>>>>>
>>>>>>> /* keep last! */
>>>>>>> @@ -1145,6 +1146,8 @@ struct ieee80211_local {
>>>>>>> u64 airtime_v_t[IEEE80211_NUM_ACS];
>>>>>>> u64 airtime_weight_sum[IEEE80211_NUM_ACS];
>>>>>>>
>>>>>>> + struct list_head remove_list[IEEE80211_NUM_ACS];
>>>>>>> + struct timer_list remove_timer;
>>>>>>> u16 airtime_flags;
>>>>>>>
>>>>>>> const struct ieee80211_ops *ops;
>>>>>>> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
>>>>>>> index e9ffa8e..78fe24a 100644
>>>>>>> --- a/net/mac80211/main.c
>>>>>>> +++ b/net/mac80211/main.c
>>>>>>> @@ -667,10 +667,15 @@ struct ieee80211_hw
>>>>>>> *ieee80211_alloc_hw_nm(size_t priv_data_len,
>>>>>>>
>>>>>>> for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>>>>>>> local->active_txqs[i] = RB_ROOT_CACHED;
>>>>>>> + INIT_LIST_HEAD(&local->remove_list[i]);
>>>>>>> spin_lock_init(&local->active_txq_lock[i]);
>>>>>>> }
>>>>>>> local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
>>>>>>>
>>>>>>> + timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
>>>>>>> + mod_timer(&local->remove_timer,
>>>>>>> + jiffies +
>>>>>>> msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
>>>>>>> +
>>>>>>> INIT_LIST_HEAD(&local->chanctx_list);
>>>>>>> mutex_init(&local->chanctx_mtx);
>>>>>>>
>>>>>>> @@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct
>>>>>>> ieee80211_hw
>>>>>>> *hw)
>>>>>>> tasklet_kill(&local->tx_pending_tasklet);
>>>>>>> tasklet_kill(&local->tasklet);
>>>>>>>
>>>>>>> + del_timer_sync(&local->remove_timer);
>>>>>>> #ifdef CONFIG_INET
>>>>>>> unregister_inetaddr_notifier(&local->ifa_notifier);
>>>>>>> #endif
>>>>>>> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>>>>>>> index d00baaa..42ca010 100644
>>>>>>> --- a/net/mac80211/tx.c
>>>>>>> +++ b/net/mac80211/tx.c
>>>>>>> @@ -1450,6 +1450,7 @@ void ieee80211_txq_init(struct
>>>>>>> ieee80211_sub_if_data *sdata,
>>>>>>> codel_stats_init(&txqi->cstats);
>>>>>>> __skb_queue_head_init(&txqi->frags);
>>>>>>> RB_CLEAR_NODE(&txqi->schedule_order);
>>>>>>> + INIT_LIST_HEAD(&txqi->candidate);
>>>>>>>
>>>>>>> txqi->txq.vif = &sdata->vif;
>>>>>>>
>>>>>>> @@ -3724,6 +3725,9 @@ void ieee80211_schedule_txq(struct
>>>>>>> ieee80211_hw
>>>>>>> *hw,
>>>>>>>
>>>>>>> spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>
>>>>>>> + if (!list_empty(&txqi->candidate))
>>>>>>> + list_del_init(&txqi->candidate);
>>>>>>> +
>>>>>>> if (!RB_EMPTY_NODE(&txqi->schedule_order))
>>>>>>> goto out;
>>>>>>>
>>>>>>> @@ -3783,6 +3787,20 @@ static void
>>>>>>> __ieee80211_unschedule_txq(struct
>>>>>>> ieee80211_hw *hw,
>>>>>>> RB_CLEAR_NODE(&txqi->schedule_order);
>>>>>>> }
>>>>>>>
>>>>>>> +void ieee80211_remove_txq(struct ieee80211_hw *hw,
>>>>>>> + struct ieee80211_txq *txq)
>>>>>>> +{
>>>>>>> + struct ieee80211_local *local = hw_to_local(hw);
>>>>>>> + struct txq_info *txqi = to_txq_info(txq);
>>>>>>> +
>>>>>>> + lockdep_assert_held(&local->active_txq_lock[txq->ac]);
>>>>>>> +
>>>>>>> + if (!RB_EMPTY_NODE(&txqi->schedule_order)) {
>>>>>>> + __ieee80211_unschedule_txq(hw, txq);
>>>>>>> + list_del_init(&txqi->candidate);
>>>>>>> + }
>>>>>>> +}
>>>>>>> +
>>>>>>> void ieee80211_unschedule_txq(struct ieee80211_hw *hw,
>>>>>>> struct ieee80211_txq *txq)
>>>>>>> __acquires(txq_lock) __releases(txq_lock)
>>>>>>> @@ -3790,7 +3808,7 @@ void ieee80211_unschedule_txq(struct
>>>>>>> ieee80211_hw *hw,
>>>>>>> struct ieee80211_local *local = hw_to_local(hw);
>>>>>>>
>>>>>>> spin_lock_bh(&local->active_txq_lock[txq->ac]);
>>>>>>> - __ieee80211_unschedule_txq(hw, txq);
>>>>>>> + ieee80211_remove_txq(hw, txq);
>>>>>>> spin_unlock_bh(&local->active_txq_lock[txq->ac]);
>>>>>>> }
>>>>>>>
>>>>>>> @@ -3803,11 +3821,48 @@ void ieee80211_return_txq(struct
>>>>>>> ieee80211_hw
>>>>>>> *hw,
>>>>>>> lockdep_assert_held(&local->active_txq_lock[txq->ac]);
>>>>>>>
>>>>>>> if (!RB_EMPTY_NODE(&txqi->schedule_order) &&
>>>>>>> - (skb_queue_empty(&txqi->frags) &&
>>>>>>> !txqi->tin.backlog_packets))
>>>>>>> - __ieee80211_unschedule_txq(hw, txq);
>>>>>>> + !txq_has_queue(&txqi->txq) &&
>>>>>>> + list_empty(&txqi->candidate))
>>>>>>> + list_add_tail(&txqi->candidate, &local->remove_list[txq->ac]);
>>>>>>> +
>>>>>>> }
>>>>>>> EXPORT_SYMBOL(ieee80211_return_txq);
>>>>>>>
>>>>>>> +void __ieee80211_check_txqs(struct ieee80211_local *local, int
>>>>>>> ac)
>>>>>>> +{
>>>>>>> + struct txq_info *iter, *tmp;
>>>>>>> + struct sta_info *sta;
>>>>>>> +
>>>>>>> + lockdep_assert_held(&local->active_txq_lock[ac]);
>>>>>>> +
>>>>>>> + list_for_each_entry_safe(iter, tmp, &local->remove_list[ac],
>>>>>>> + candidate) {
>>>>>>> + sta = container_of(iter->txq.sta, struct sta_info, sta);
>>>>>>> +
>>>>>>> + if (txq_has_queue(&iter->txq))
>>>>>>> + list_del_init(&iter->candidate);
>>>>>>> + else
>>>>>>> + ieee80211_remove_txq(&local->hw, &iter->txq);
>>>>>>> + }
>>>>>>> +}
>>>>>>> +
>>>>>>> +void ieee80211_txqs_check(struct timer_list *t)
>>>>>>> +{
>>>>>>> + struct ieee80211_local *local = from_timer(local, t,
>>>>>>> remove_timer);
>>>>>>> + struct txq_info *iter, *tmp;
>>>>>>> + struct sta_info *sta;
>>>>>>> + int ac;
>>>>>>> +
>>>>>>> + for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
>>>>>>> + spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>> + __ieee80211_check_txqs(local, ac);
>>>>>>> + spin_unlock_bh(&local->active_txq_lock[ac]);
>>>>>>> + }
>>>>>>> +
>>>>>>> + mod_timer(&local->remove_timer,
>>>>>>> + jiffies +
>>>>>>> msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
>>>>>>> +}
>>>>>>
>>>>>> I'll ask the same as I did last time (where you told me to hold off
>>>>>> until this round):
>>>>>>
>>>>>> Why do you need the timer and the periodic check? If TXQs are added
>>>>>> to
>>>>>> the remove list during the scheduling run, and
>>>>>> __ieee80211_check_txqs()
>>>>>> is run from schedule_end(), isn't that sufficient to clear the
>>>>>> list?
>>>>> Is it possible that a txq is not added to the remove list but then
>>>>> packets in it are dropped by fq_codel algo? Like the station
>>>>> disconnects
>>>>> without any notification.
>>>>
>>>> Well as long as all the other cleanup paths call directly into
>>>> __unschedule_txq(), that should remove stations from the scheduler
>>>> when
>>>> they disconnect etc.
>>> Yes, the disconnect scenario is a bad example. My concern is, say, we
>>> have 10 stations and only one of them is assigned a very small weight
>>> compared with that of others. Suppose, after its chance of Tx, it is
>>> most likely to be placed in the rightmost(still has some packets in
>>> the
>>> txq) and no more incoming data for it. The remaining packets in txq
>>> will
>>> be dropped due to timeout algo in codel(correct me if I am wrong) but
>>> this empty txq will stay on the rbtree until other txqs get drained or
>>> global vt catch up with its vt. The staying time could be long if
>>> weight
>>> is extremely small. Then do we need timer to check or any other better
>>> solution?
>>
>> Ah, I see what you mean. No, I don't think this will be a problem; the
>> scenario you're describing would play out like this:
>>
>> 1. Station ends transmitting, still has a single packet queued, gets
>> moved to the end of the rbtree (and stays there for a while).
>>
>> 2. When we finally get to the point where this station gets another
>> chance to transmit, the CoDel drop timer triggers and the last
>> packet
>> is dropped[0]. This means that the queue will just be empty
>> (and ieee80211_tx_dequeue() will return NULL).
>>
>> 3. Because the queue is empty, ieee80211_return_txq() will not put it
>> back on the rbtree.
>>
>> Crucially, in 2. the CoDel algorithm doesn't kick in until the point of
>> packet dequeue. But even if an empty queue stays on the rbtree for a
>> while, there is no harm in that: eventually it will get its turn, it
>> will turn out to be empty, and just be skipped over.
> Then that will be fine. Thanks for the explanation of the dropping part
> in CoDel algorithm.
Yup, think so. And you're welcome :)
>> The issue we need to be concerned about is the opposite: If we have a
>> queue that *does* have packets queued, but which is *not* scheduled for
>> transmission, that will stall TX.
> Is it by design since its vt is more than global vt, right? The lattency
> may somehow get impacted though.
Well, it should still stay on the rbtree as long as it has packets
queued. We don't have a check anywhere that reschedules TXQs whose v_t
drops below global v_t...
>> [0] CoDel in most cases only drops a single packet at a time, so it
>> will
>> not clear out an entire queue with multiple packets in one go. But you
>> are right that it could conceivably drop the last packet in a queue.
>>
>>>> We only need to defer removal inside a single "scheduling round"
>>>> (i.e.,
>>>> between a pair of ieee80211_txq_schedule_start/end. So if we just
>>>> walk
>>>> the remove list in schedule_end() we should be enough, no?
>>>>
>>>> Hmm, or maybe a simpler way to fix the original issue is just to have
>>>> unschedule_txq() update the schedule_pos() pointer?
>>>>
>>>> I.e., unschedule_txq checks if the txq being removed is currently
>>>> being
>>>> pointed to by schedule_pos[ac], and if it is, it updates schedule_pos
>>>> to
>>>> be the rb_next of the current value?
>>> Actually, if schedule_pos is updated to rb_next of the current value,
>>> then in the next_txq() where we are going to use rb_next again and
>>> finally pick the next node of the node we really want. Is it fine to
>>> update schedule_pos to NULL?
>>
>> Hmm, yeah, good point.
>>
>> If we do end up setting schedule_pos to NULL in the middle of a
>> scheduling round, that will make next_txq() "start over", and do
>> another
>> loop through the whole thing. I guess we may be able hit a case where
>> things can oscillate back and forth between addition and removal
>> resulting in an infinite loop? Not sure, but at least I can't seem to
>> convince myself that this can't happen.
>
> As the loop of next_txq under lock protection as below,
>
> txq_schedule_start();
> while(txq=next_txq()){
> ...
> return_txq(txq);
> }
> txq_schedule_end();
>
> I do not see any chance of addition, no?
As you noted in your other email, Felix reduced the locking. And yeah,
we need to rebase this series to also incorporate that. I figure I can
send an updated version of the first patch in the series once we've
worked out the remaining issues with your follow-up patches.
> In ath10k, we will usually push packets of first txq as many as we can
> until it is drained and then move to the next one. So if a txq gets
> removed in the return_txq, it should always be the leftmost. And
> during this period, neither vt of any station or global vt can be
> updated due to lock protection.
>
>>
>> But in that case, we could fix it by just conditionally assigning
>> either
>> rb_next or rb_prev to the schedule_pos in unschedule_txq()? I.e.,
>> something like:
>>
>> local->schedule_pos[ac] = rb_next(node) ?: rb_prev(node);
> I am not sure I am getting your point. Still in next_txq,
> schedule_pos[ac] will lead us to the next node of the one we want.
The logic in next_txq is different when schedule_pos[ac] is NULL, vs
when rb_next(schedule_pos[ac]) is NULL. The former restarts a new
scheduling round, while the latter ends the current round.
-Toke
next prev parent reply other threads:[~2019-09-20 9:15 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-16 13:09 [PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler Yibo Zhao
2019-09-16 13:09 ` [PATCH 2/4] mac80211: defer txqs removal from rbtree Yibo Zhao
2019-09-17 21:10 ` Toke Høiland-Jørgensen
2019-09-18 10:27 ` Yibo Zhao
2019-09-18 11:23 ` Toke Høiland-Jørgensen
2019-09-19 9:56 ` Yibo Zhao
2019-09-19 10:37 ` Toke Høiland-Jørgensen
2019-09-20 8:29 ` Yibo Zhao
2019-09-20 9:15 ` Toke Høiland-Jørgensen [this message]
2019-09-21 10:49 ` Yibo Zhao
2019-09-21 11:27 ` Toke Høiland-Jørgensen
2019-09-21 11:53 ` Yibo Zhao
2019-09-21 12:22 ` Yibo Zhao
2019-09-21 13:02 ` Toke Høiland-Jørgensen
2019-09-21 13:24 ` Yibo Zhao
2019-09-21 14:00 ` Toke Høiland-Jørgensen
2019-09-22 5:19 ` Yibo Zhao
2019-09-23 10:47 ` Toke Høiland-Jørgensen
2019-09-23 11:42 ` Kalle Valo
2019-09-23 16:39 ` Toke Høiland-Jørgensen
2019-09-24 5:27 ` Kalle Valo
2019-09-24 7:23 ` Toke Høiland-Jørgensen
2019-09-24 2:45 ` Yibo Zhao
2019-09-24 7:26 ` Toke Høiland-Jørgensen
2019-09-24 8:31 ` Yibo Zhao
2019-09-24 8:44 ` Toke Høiland-Jørgensen
2019-09-16 13:09 ` [PATCH 3/4] mac80211: fix low throughput in push pull mode Yibo Zhao
2019-09-16 15:27 ` Johannes Berg
2019-09-17 6:36 ` Yibo Zhao
2019-09-17 6:55 ` Johannes Berg
2019-09-17 21:12 ` Toke Høiland-Jørgensen
2019-09-18 10:02 ` Yibo Zhao
2019-09-18 10:16 ` Toke Høiland-Jørgensen
2019-09-18 10:18 ` Yibo Zhao
2019-09-16 13:09 ` [PATCH 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together Yibo Zhao
2019-09-17 21:24 ` Toke Høiland-Jørgensen
2019-09-18 10:16 ` Yibo Zhao
2019-09-16 14:51 ` [PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler Toke Høiland-Jørgensen
2019-09-17 21:31 ` Toke Høiland-Jørgensen
2019-09-20 8:37 ` Yibo Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8736gre3bm.fsf@toke.dk \
--to=toke@redhat.com \
--cc=ath10k@lists.infradead.org \
--cc=linux-wireless-owner@vger.kernel.org \
--cc=linux-wireless@vger.kernel.org \
--cc=yiboz@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).