From: Yibo Zhao <yiboz@codeaurora.org>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: ath10k@lists.infradead.org, linux-wireless@vger.kernel.org,
linux-wireless-owner@vger.kernel.org
Subject: Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree
Date: Sun, 22 Sep 2019 13:19:43 +0800 [thread overview]
Message-ID: <2935b00bf3e29ad8b2738fe98dc24a76@codeaurora.org> (raw)
In-Reply-To: <874l157nrt.fsf@toke.dk>
On 2019-09-21 22:00, Toke Høiland-Jørgensen wrote:
> Yibo Zhao <yiboz@codeaurora.org> writes:
>
>> On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote:
>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>
>>>> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:
>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>
>>>>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:
>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>>
>>>>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:
>>>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>>>>
>>>>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> In a loop txqs dequeue scenario, if the first txq in the
>>>>>>>>>>>>>> rbtree
>>>>>>>>>>>>>> gets
>>>>>>>>>>>>>> removed from rbtree immediately in the
>>>>>>>>>>>>>> ieee80211_return_txq(),
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> loop will break soon in the ieee80211_next_txq() due to
>>>>>>>>>>>>>> schedule_pos
>>>>>>>>>>>>>> not leading to the second txq in the rbtree. Thus,
>>>>>>>>>>>>>> defering
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> removal right before the end of this schedule round.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Co-developed-by: Yibo Zhao <yiboz@codeaurora.org>
>>>>>>>>>>>>>> Signed-off-by: Yibo Zhao <yiboz@codeaurora.org>
>>>>>>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I didn't write this patch, so please don't use my sign-off.
>>>>>>>>>>>>> I'll
>>>>>>>>>>>>> add
>>>>>>>>>>>>> ack or review tags as appropriate in reply; but a few
>>>>>>>>>>>>> comments
>>>>>>>>>>>>> first:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> include/net/mac80211.h | 16 ++++++++++--
>>>>>>>>>>>>>> net/mac80211/ieee80211_i.h | 3 +++
>>>>>>>>>>>>>> net/mac80211/main.c | 6 +++++
>>>>>>>>>>>>>> net/mac80211/tx.c | 63
>>>>>>>>>>>>>> +++++++++++++++++++++++++++++++++++++++++++---
>>>>>>>>>>>>>> 4 files changed, 83 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/include/net/mac80211.h
>>>>>>>>>>>>>> b/include/net/mac80211.h
>>>>>>>>>>>>>> index ac2ed8e..ba5a345 100644
>>>>>>>>>>>>>> --- a/include/net/mac80211.h
>>>>>>>>>>>>>> +++ b/include/net/mac80211.h
>>>>>>>>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #define IEEE80211_MAX_TX_RETRY 31
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> static inline void ieee80211_rate_set_vht(struct
>>>>>>>>>>>>>> ieee80211_tx_rate
>>>>>>>>>>>>>> *rate,
>>>>>>>>>>>>>> u8 mcs, u8 nss)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff
>>>>>>>>>>>>>> *ieee80211_tx_dequeue(struct
>>>>>>>>>>>>>> ieee80211_hw *hw,
>>>>>>>>>>>>>> * @ac: AC number to return packets from.
>>>>>>>>>>>>>> *
>>>>>>>>>>>>>> * Should only be called between calls to
>>>>>>>>>>>>>> ieee80211_txq_schedule_start()
>>>>>>>>>>>>>> - * and ieee80211_txq_schedule_end().
>>>>>>>>>>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty,
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>> will
>>>>>>>>>>>>>> be
>>>>>>>>>>>>>> added
>>>>>>>>>>>>>> + * to a remove list and get removed later.
>>>>>>>>>>>>>> * Returns the next txq if successful, %NULL if no queue
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> eligible.
>>>>>>>>>>>>>> If a txq
>>>>>>>>>>>>>> * is returned, it should be returned with
>>>>>>>>>>>>>> ieee80211_return_txq()
>>>>>>>>>>>>>> after the
>>>>>>>>>>>>>> * driver has finished scheduling it.
>>>>>>>>>>>>>> @@ -6268,7 +6271,8 @@ void
>>>>>>>>>>>>>> ieee80211_txq_schedule_start(struct
>>>>>>>>>>>>>> ieee80211_hw *hw, u8 ac)
>>>>>>>>>>>>>> * @hw: pointer as obtained from ieee80211_alloc_hw()
>>>>>>>>>>>>>> * @ac: AC number to acquire locks for
>>>>>>>>>>>>>> *
>>>>>>>>>>>>>> - * Release locks previously acquired by
>>>>>>>>>>>>>> ieee80211_txq_schedule_end().
>>>>>>>>>>>>>> + * Release locks previously acquired by
>>>>>>>>>>>>>> ieee80211_txq_schedule_end().
>>>>>>>>>>>>>> Check
>>>>>>>>>>>>>> + * and remove the empty txq from rb-tree.
>>>>>>>>>>>>>> */
>>>>>>>>>>>>>> void ieee80211_txq_schedule_end(struct ieee80211_hw *hw,
>>>>>>>>>>>>>> u8
>>>>>>>>>>>>>> ac)
>>>>>>>>>>>>>> __releases(txq_lock);
>>>>>>>>>>>>>> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct
>>>>>>>>>>>>>> ieee80211_hw
>>>>>>>>>>>>>> *hw, struct ieee80211_txq *txq)
>>>>>>>>>>>>>> __acquires(txq_lock) __releases(txq_lock);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /**
>>>>>>>>>>>>>> + * ieee80211_txqs_check - Check txqs waiting for removal
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * @tmr: pointer as obtained from local
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +void ieee80211_txqs_check(struct timer_list *tmr);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> * ieee80211_txq_may_transmit - check whether TXQ is
>>>>>>>>>>>>>> allowed
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> transmit
>>>>>>>>>>>>>> *
>>>>>>>>>>>>>> * This function is used to check whether given txq is
>>>>>>>>>>>>>> allowed
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> transmit by
>>>>>>>>>>>>>> diff --git a/net/mac80211/ieee80211_i.h
>>>>>>>>>>>>>> b/net/mac80211/ieee80211_i.h
>>>>>>>>>>>>>> index a4556f9..49aa143e 100644
>>>>>>>>>>>>>> --- a/net/mac80211/ieee80211_i.h
>>>>>>>>>>>>>> +++ b/net/mac80211/ieee80211_i.h
>>>>>>>>>>>>>> @@ -847,6 +847,7 @@ struct txq_info {
>>>>>>>>>>>>>> struct codel_stats cstats;
>>>>>>>>>>>>>> struct sk_buff_head frags;
>>>>>>>>>>>>>> struct rb_node schedule_order;
>>>>>>>>>>>>>> + struct list_head candidate;
>>>>>>>>>>>>>> unsigned long flags;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /* keep last! */
>>>>>>>>>>>>>> @@ -1145,6 +1146,8 @@ struct ieee80211_local {
>>>>>>>>>>>>>> u64 airtime_v_t[IEEE80211_NUM_ACS];
>>>>>>>>>>>>>> u64 airtime_weight_sum[IEEE80211_NUM_ACS];
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + struct list_head remove_list[IEEE80211_NUM_ACS];
>>>>>>>>>>>>>> + struct timer_list remove_timer;
>>>>>>>>>>>>>> u16 airtime_flags;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> const struct ieee80211_ops *ops;
>>>>>>>>>>>>>> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
>>>>>>>>>>>>>> index e9ffa8e..78fe24a 100644
>>>>>>>>>>>>>> --- a/net/mac80211/main.c
>>>>>>>>>>>>>> +++ b/net/mac80211/main.c
>>>>>>>>>>>>>> @@ -667,10 +667,15 @@ struct ieee80211_hw
>>>>>>>>>>>>>> *ieee80211_alloc_hw_nm(size_t priv_data_len,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>>>>>>>>>>>>>> local->active_txqs[i] = RB_ROOT_CACHED;
>>>>>>>>>>>>>> + INIT_LIST_HEAD(&local->remove_list[i]);
>>>>>>>>>>>>>> spin_lock_init(&local->active_txq_lock[i]);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + timer_setup(&local->remove_timer, ieee80211_txqs_check,
>>>>>>>>>>>>>> 0);
>>>>>>>>>>>>>> + mod_timer(&local->remove_timer,
>>>>>>>>>>>>>> + jiffies +
>>>>>>>>>>>>>> msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> INIT_LIST_HEAD(&local->chanctx_list);
>>>>>>>>>>>>>> mutex_init(&local->chanctx_mtx);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct
>>>>>>>>>>>>>> ieee80211_hw
>>>>>>>>>>>>>> *hw)
>>>>>>>>>>>>>> tasklet_kill(&local->tx_pending_tasklet);
>>>>>>>>>>>>>> tasklet_kill(&local->tasklet);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + del_timer_sync(&local->remove_timer);
>>>>>>>>>>>>>> #ifdef CONFIG_INET
>>>>>>>>>>>>>> unregister_inetaddr_notifier(&local->ifa_notifier);
>>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>>>>>>>>>>>>>> index d00baaa..42ca010 100644
>>>>>>>>>>>>>> --- a/net/mac80211/tx.c
>>>>>>>>>>>>>> +++ b/net/mac80211/tx.c
>>>>>>>>>>>>>> @@ -1450,6 +1450,7 @@ void ieee80211_txq_init(struct
>>>>>>>>>>>>>> ieee80211_sub_if_data *sdata,
>>>>>>>>>>>>>> codel_stats_init(&txqi->cstats);
>>>>>>>>>>>>>> __skb_queue_head_init(&txqi->frags);
>>>>>>>>>>>>>> RB_CLEAR_NODE(&txqi->schedule_order);
>>>>>>>>>>>>>> + INIT_LIST_HEAD(&txqi->candidate);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> txqi->txq.vif = &sdata->vif;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -3724,6 +3725,9 @@ void ieee80211_schedule_txq(struct
>>>>>>>>>>>>>> ieee80211_hw
>>>>>>>>>>>>>> *hw,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> + if (!list_empty(&txqi->candidate))
>>>>>>>>>>>>>> + list_del_init(&txqi->candidate);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> if (!RB_EMPTY_NODE(&txqi->schedule_order))
>>>>>>>>>>>>>> goto out;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -3783,6 +3787,20 @@ static void
>>>>>>>>>>>>>> __ieee80211_unschedule_txq(struct
>>>>>>>>>>>>>> ieee80211_hw *hw,
>>>>>>>>>>>>>> RB_CLEAR_NODE(&txqi->schedule_order);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +void ieee80211_remove_txq(struct ieee80211_hw *hw,
>>>>>>>>>>>>>> + struct ieee80211_txq *txq)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> + struct ieee80211_local *local = hw_to_local(hw);
>>>>>>>>>>>>>> + struct txq_info *txqi = to_txq_info(txq);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + lockdep_assert_held(&local->active_txq_lock[txq->ac]);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + if (!RB_EMPTY_NODE(&txqi->schedule_order)) {
>>>>>>>>>>>>>> + __ieee80211_unschedule_txq(hw, txq);
>>>>>>>>>>>>>> + list_del_init(&txqi->candidate);
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> void ieee80211_unschedule_txq(struct ieee80211_hw *hw,
>>>>>>>>>>>>>> struct ieee80211_txq *txq)
>>>>>>>>>>>>>> __acquires(txq_lock) __releases(txq_lock)
>>>>>>>>>>>>>> @@ -3790,7 +3808,7 @@ void ieee80211_unschedule_txq(struct
>>>>>>>>>>>>>> ieee80211_hw *hw,
>>>>>>>>>>>>>> struct ieee80211_local *local = hw_to_local(hw);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> spin_lock_bh(&local->active_txq_lock[txq->ac]);
>>>>>>>>>>>>>> - __ieee80211_unschedule_txq(hw, txq);
>>>>>>>>>>>>>> + ieee80211_remove_txq(hw, txq);
>>>>>>>>>>>>>> spin_unlock_bh(&local->active_txq_lock[txq->ac]);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -3803,11 +3821,48 @@ void ieee80211_return_txq(struct
>>>>>>>>>>>>>> ieee80211_hw
>>>>>>>>>>>>>> *hw,
>>>>>>>>>>>>>> lockdep_assert_held(&local->active_txq_lock[txq->ac]);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (!RB_EMPTY_NODE(&txqi->schedule_order) &&
>>>>>>>>>>>>>> - (skb_queue_empty(&txqi->frags) &&
>>>>>>>>>>>>>> !txqi->tin.backlog_packets))
>>>>>>>>>>>>>> - __ieee80211_unschedule_txq(hw, txq);
>>>>>>>>>>>>>> + !txq_has_queue(&txqi->txq) &&
>>>>>>>>>>>>>> + list_empty(&txqi->candidate))
>>>>>>>>>>>>>> + list_add_tail(&txqi->candidate,
>>>>>>>>>>>>>> &local->remove_list[txq->ac]);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> EXPORT_SYMBOL(ieee80211_return_txq);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +void __ieee80211_check_txqs(struct ieee80211_local
>>>>>>>>>>>>>> *local,
>>>>>>>>>>>>>> int
>>>>>>>>>>>>>> ac)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> + struct txq_info *iter, *tmp;
>>>>>>>>>>>>>> + struct sta_info *sta;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + lockdep_assert_held(&local->active_txq_lock[ac]);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + list_for_each_entry_safe(iter, tmp,
>>>>>>>>>>>>>> &local->remove_list[ac],
>>>>>>>>>>>>>> + candidate) {
>>>>>>>>>>>>>> + sta = container_of(iter->txq.sta, struct sta_info,
>>>>>>>>>>>>>> sta);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + if (txq_has_queue(&iter->txq))
>>>>>>>>>>>>>> + list_del_init(&iter->candidate);
>>>>>>>>>>>>>> + else
>>>>>>>>>>>>>> + ieee80211_remove_txq(&local->hw, &iter->txq);
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +void ieee80211_txqs_check(struct timer_list *t)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> + struct ieee80211_local *local = from_timer(local, t,
>>>>>>>>>>>>>> remove_timer);
>>>>>>>>>>>>>> + struct txq_info *iter, *tmp;
>>>>>>>>>>>>>> + struct sta_info *sta;
>>>>>>>>>>>>>> + int ac;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
>>>>>>>>>>>>>> + spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>>>>>>>> + __ieee80211_check_txqs(local, ac);
>>>>>>>>>>>>>> + spin_unlock_bh(&local->active_txq_lock[ac]);
>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + mod_timer(&local->remove_timer,
>>>>>>>>>>>>>> + jiffies +
>>>>>>>>>>>>>> msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll ask the same as I did last time (where you told me to
>>>>>>>>>>>>> hold
>>>>>>>>>>>>> off
>>>>>>>>>>>>> until this round):
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why do you need the timer and the periodic check? If TXQs
>>>>>>>>>>>>> are
>>>>>>>>>>>>> added
>>>>>>>>>>>>> to
>>>>>>>>>>>>> the remove list during the scheduling run, and
>>>>>>>>>>>>> __ieee80211_check_txqs()
>>>>>>>>>>>>> is run from schedule_end(), isn't that sufficient to clear
>>>>>>>>>>>>> the
>>>>>>>>>>>>> list?
>>>>>>>>>>>> Is it possible that a txq is not added to the remove list
>>>>>>>>>>>> but
>>>>>>>>>>>> then
>>>>>>>>>>>> packets in it are dropped by fq_codel algo? Like the station
>>>>>>>>>>>> disconnects
>>>>>>>>>>>> without any notification.
>>>>>>>>>>>
>>>>>>>>>>> Well as long as all the other cleanup paths call directly
>>>>>>>>>>> into
>>>>>>>>>>> __unschedule_txq(), that should remove stations from the
>>>>>>>>>>> scheduler
>>>>>>>>>>> when
>>>>>>>>>>> they disconnect etc.
>>>>>>>>>> Yes, the disconnect scenario is a bad example. My concern is,
>>>>>>>>>> say,
>>>>>>>>>> we
>>>>>>>>>> have 10 stations and only one of them is assigned a very small
>>>>>>>>>> weight
>>>>>>>>>> compared with that of others. Suppose, after its chance of Tx,
>>>>>>>>>> it
>>>>>>>>>> is
>>>>>>>>>> most likely to be placed in the rightmost(still has some
>>>>>>>>>> packets
>>>>>>>>>> in
>>>>>>>>>> the
>>>>>>>>>> txq) and no more incoming data for it. The remaining packets
>>>>>>>>>> in
>>>>>>>>>> txq
>>>>>>>>>> will
>>>>>>>>>> be dropped due to timeout algo in codel(correct me if I am
>>>>>>>>>> wrong)
>>>>>>>>>> but
>>>>>>>>>> this empty txq will stay on the rbtree until other txqs get
>>>>>>>>>> drained
>>>>>>>>>> or
>>>>>>>>>> global vt catch up with its vt. The staying time could be long
>>>>>>>>>> if
>>>>>>>>>> weight
>>>>>>>>>> is extremely small. Then do we need timer to check or any
>>>>>>>>>> other
>>>>>>>>>> better
>>>>>>>>>> solution?
>>>>>>>>>
>>>>>>>>> Ah, I see what you mean. No, I don't think this will be a
>>>>>>>>> problem;
>>>>>>>>> the
>>>>>>>>> scenario you're describing would play out like this:
>>>>>>>>>
>>>>>>>>> 1. Station ends transmitting, still has a single packet queued,
>>>>>>>>> gets
>>>>>>>>> moved to the end of the rbtree (and stays there for a
>>>>>>>>> while).
>>>>>>>>>
>>>>>>>>> 2. When we finally get to the point where this station gets
>>>>>>>>> another
>>>>>>>>> chance to transmit, the CoDel drop timer triggers and the
>>>>>>>>> last
>>>>>>>>> packet
>>>>>>>>> is dropped[0]. This means that the queue will just be empty
>>>>>>>>> (and ieee80211_tx_dequeue() will return NULL).
>>>>>>>>>
>>>>>>>>> 3. Because the queue is empty, ieee80211_return_txq() will not
>>>>>>>>> put
>>>>>>>>> it
>>>>>>>>> back on the rbtree.
>>>>>>>>>
>>>>>>>>> Crucially, in 2. the CoDel algorithm doesn't kick in until the
>>>>>>>>> point
>>>>>>>>> of
>>>>>>>>> packet dequeue. But even if an empty queue stays on the rbtree
>>>>>>>>> for
>>>>>>>>> a
>>>>>>>>> while, there is no harm in that: eventually it will get its
>>>>>>>>> turn,
>>>>>>>>> it
>>>>>>>>> will turn out to be empty, and just be skipped over.
>>>>>>>> Then that will be fine. Thanks for the explanation of the
>>>>>>>> dropping
>>>>>>>> part
>>>>>>>> in CoDel algorithm.
>>>>>>>
>>>>>>> Yup, think so. And you're welcome :)
>>>>>>>
>>>>>>>>> The issue we need to be concerned about is the opposite: If we
>>>>>>>>> have
>>>>>>>>> a
>>>>>>>>> queue that *does* have packets queued, but which is *not*
>>>>>>>>> scheduled
>>>>>>>>> for
>>>>>>>>> transmission, that will stall TX.
>>>>>>>> Is it by design since its vt is more than global vt, right? The
>>>>>>>> lattency
>>>>>>>> may somehow get impacted though.
>>>>>>>
>>>>>>> Well, it should still stay on the rbtree as long as it has
>>>>>>> packets
>>>>>>> queued. We don't have a check anywhere that reschedules TXQs
>>>>>>> whose
>>>>>>> v_t
>>>>>>> drops below global v_t...
>>>>>>>
>>>>>>>>> [0] CoDel in most cases only drops a single packet at a time,
>>>>>>>>> so
>>>>>>>>> it
>>>>>>>>> will
>>>>>>>>> not clear out an entire queue with multiple packets in one go.
>>>>>>>>> But
>>>>>>>>> you
>>>>>>>>> are right that it could conceivably drop the last packet in a
>>>>>>>>> queue.
>>>>>>>>>
>>>>>>>>>>> We only need to defer removal inside a single "scheduling
>>>>>>>>>>> round"
>>>>>>>>>>> (i.e.,
>>>>>>>>>>> between a pair of ieee80211_txq_schedule_start/end. So if we
>>>>>>>>>>> just
>>>>>>>>>>> walk
>>>>>>>>>>> the remove list in schedule_end() we should be enough, no?
>>>>>>>>>>>
>>>>>>>>>>> Hmm, or maybe a simpler way to fix the original issue is just
>>>>>>>>>>> to
>>>>>>>>>>> have
>>>>>>>>>>> unschedule_txq() update the schedule_pos() pointer?
>>>>>>>>>>>
>>>>>>>>>>> I.e., unschedule_txq checks if the txq being removed is
>>>>>>>>>>> currently
>>>>>>>>>>> being
>>>>>>>>>>> pointed to by schedule_pos[ac], and if it is, it updates
>>>>>>>>>>> schedule_pos
>>>>>>>>>>> to
>>>>>>>>>>> be the rb_next of the current value?
>>>>>>>>>> Actually, if schedule_pos is updated to rb_next of the current
>>>>>>>>>> value,
>>>>>>>>>> then in the next_txq() where we are going to use rb_next again
>>>>>>>>>> and
>>>>>>>>>> finally pick the next node of the node we really want. Is it
>>>>>>>>>> fine
>>>>>>>>>> to
>>>>>>>>>> update schedule_pos to NULL?
>>>>>>>>>
>>>>>>>>> Hmm, yeah, good point.
>>>>>>>>>
>>>>>>>>> If we do end up setting schedule_pos to NULL in the middle of a
>>>>>>>>> scheduling round, that will make next_txq() "start over", and
>>>>>>>>> do
>>>>>>>>> another
>>>>>>>>> loop through the whole thing. I guess we may be able hit a case
>>>>>>>>> where
>>>>>>>>> things can oscillate back and forth between addition and
>>>>>>>>> removal
>>>>>>>>> resulting in an infinite loop? Not sure, but at least I can't
>>>>>>>>> seem
>>>>>>>>> to
>>>>>>>>> convince myself that this can't happen.
>>>>>>>>
>>>>>>>> As the loop of next_txq under lock protection as below,
>>>>>>>>
>>>>>>>> txq_schedule_start();
>>>>>>>> while(txq=next_txq()){
>>>>>>>> ...
>>>>>>>> return_txq(txq);
>>>>>>>> }
>>>>>>>> txq_schedule_end();
>>>>>>>>
>>>>>>>> I do not see any chance of addition, no?
>>>>>>>
>>>>>>> As you noted in your other email, Felix reduced the locking. And
>>>>>>> yeah,
>>>>>>> we need to rebase this series to also incorporate that. I figure
>>>>>>> I
>>>>>>> can
>>>>>>> send an updated version of the first patch in the series once
>>>>>>> we've
>>>>>>> worked out the remaining issues with your follow-up patches.
>>>>>>>
>>>>>> Oh, I was thinking we were discussing without locking reduced.
>>>>>> Yes,
>>>>>> I
>>>>>> also agree there might be a case causing infinite loop. With
>>>>>> locking
>>>>>> reduced, the tree can be adjusted between next_txq() and
>>>>>> return_txq()
>>>>>> in
>>>>>> the loop situation. For further discussion, let 's consider,
>>>>>> 1) the tree starts like:
>>>>>> A->B->C->D->E
>>>>>> 2) then next_txq() returns A for dequeuing
>>>>>> 3) driver dequeues A and draines A without any active txq locked
>>>>>> meaning
>>>>>> the tree could be changed upon Tx compeletion.
>>>>>> 4) then in return_txq(), the tree could be,
>>>>>> i A->B->C->D->E (A is empty, and maybe soon be added
>>>>>> back
>>>>>> before the loop end)
>>>>>> ii B->C->A->D->E (A is empty, and maybe soon be added
>>>>>> back
>>>>>> before the loop end)
>>>>>> iii B->C->D->E->A (A is empty, and maybe soon be added
>>>>>> back
>>>>>> before the loop end)
>>>>>>
>>>>>> with this change:
>>>>>> local->schedule_pos[ac] = rb_next(node) ?: rb_prev(node);
>>>>>>
>>>>>> for case i, local->schedule_pos[ac] is rb_next(A) which is B, and
>>>>>> in
>>>>>> next_txq(), rb_next(B) is what we returns which actually is C and
>>>>>> B
>>>>>> is
>>>>>> skipped, no?
>>>>>>
>>>>>> Similiar for case ii, we skip B, C, D.
>>>>>
>>>>> Yup, I think you're right. But if we can fix this by making
>>>>> ieee80211_resort_txq() aware of the schedule_pos as well, no? I.e.,
>>>>> if
>>>>> resort_txq() acts on the txq that's currently in schedule_pos, it
>>>>> will
>>>>> update schedule pos with the same rb_next(node) ?: rb_prev(node);
>>>>> (optionally after checking that the position of the node is
>>>>> actually
>>>>> going to change).
>>>> Sorry, please igore last email sent by mistake.
>>>>
>>>> I don't think it makes any difference with that in unschedule_txq().
>>>> For
>>>> case i, it finally picks C as well in next_txq(). For next_txq(),
>>>> schedule_pos means previous candidate node whereas with your change,
>>>> it
>>>> looks like schedule_pos is current candidate node instead.
>>>
>>> Hmm, that was not actually what I was thinking, but yeah I think
>>> you're
>>> right that it would be easier to just change it so schedule_pos is
>>> pointing to the next and not the current txq we want to schedule.
>> So do you mean we can change next_txq like this,
>>
>> struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8
>> ac)
>> {
>> struct ieee80211_local *local = hw_to_local(hw);
>> struct rb_node *node = local->schedule_pos[ac];
>> struct txq_info *txqi = NULL;
>> bool first = false;
>>
>> lockdep_assert_held(&local->active_txq_lock[ac]);
>>
>> if (!node) {
>> node = rb_first_cached(&local->active_txqs[ac]);
>> first = true;
>> - } else
>> - node = rb_next(node);
>> + }
>> +
>> if (!node)
>> return NULL;
>
> Ah, no, now I remember why this didn't work and I went with the other
> approach: If you make this change, you also have to have this at the
> end:
>
> local->schedule_pos[ac] = rb_next(node);
>
>
> But this means we can no longer distinguish between having gone through
> the whole thing (so rb_next() returns NULL), or starting out with
> nothing.
>
> So, instead we need to keep next_txq() the way it is, and just add
Right, should keep next_txq() the way it is.
>
> local->schedule_pos[ac] = rb_prev(node);
>
> whenever we remove a node (both in return_txq() and resort_txq()).
Agree, and also we may need to consider case like A is removed and soon
be added back just the same as ii),
B->C->A->D->E
then B is schedule, removed and soon added back,
C->A->B->D->E
A and B will have a second chance to be scheduled and this may happen to
others as well leading to the infinite loop as you have mentioned
previously, so do we need to maintain a schedule_round like we do in
DRR? Like,
- If the node is in the same round, by pass schedule, go to
rb_next(), either continue loop this round or end this round.
- Increase the schedule_round at the schedule_start() only when the
schedule_pos is NULL.
>
>>>
>>> We'd still need a check in resort_txq() then, but it would make it
>>> safe
>>> to unschedule in return_txq()...
>> Yes, agree with that.
>>
>>
>>>
>>>>>> Also I am wondering if there will be some SMP issues relating with
>>>>>> local->schedule_pos[ac].
>>>>>
>>>>> Not sure what you mean by this?
>>>> My bad. Please ignore this.
>>>>
>>>>
>>>>>
>>>>>>>> In ath10k, we will usually push packets of first txq as many as
>>>>>>>> we
>>>>>>>> can
>>>>>>>> until it is drained and then move to the next one. So if a txq
>>>>>>>> gets
>>>>>>>> removed in the return_txq, it should always be the leftmost. And
>>>>>>>> during this period, neither vt of any station or global vt can
>>>>>>>> be
>>>>>>>> updated due to lock protection.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> But in that case, we could fix it by just conditionally
>>>>>>>>> assigning
>>>>>>>>> either
>>>>>>>>> rb_next or rb_prev to the schedule_pos in unschedule_txq()?
>>>>>>>>> I.e.,
>>>>>>>>> something like:
>>>>>>>>>
>>>>>>>>> local->schedule_pos[ac] = rb_next(node) ?: rb_prev(node);
>>>>>>>> I am not sure I am getting your point. Still in next_txq,
>>>>>>>> schedule_pos[ac] will lead us to the next node of the one we
>>>>>>>> want.
>>>>>>>
>>>>>>> The logic in next_txq is different when schedule_pos[ac] is NULL,
>>>>>>> vs
>>>>>>> when rb_next(schedule_pos[ac]) is NULL. The former restarts a new
>>>>>>> scheduling round, while the latter ends the current round.
>>>>>>>
>>>>>>> -Toke
>>>>>>
>>>>>> --
>>>>>> Yibo
>>>>
>>>> --
>>>> Yibo
>>
>> --
>> Yibo
--
Yibo
next prev parent reply other threads:[~2019-09-22 5:22 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-16 13:09 [PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler Yibo Zhao
2019-09-16 13:09 ` [PATCH 2/4] mac80211: defer txqs removal from rbtree Yibo Zhao
2019-09-17 21:10 ` Toke Høiland-Jørgensen
2019-09-18 10:27 ` Yibo Zhao
2019-09-18 11:23 ` Toke Høiland-Jørgensen
2019-09-19 9:56 ` Yibo Zhao
2019-09-19 10:37 ` Toke Høiland-Jørgensen
2019-09-20 8:29 ` Yibo Zhao
2019-09-20 9:15 ` Toke Høiland-Jørgensen
2019-09-21 10:49 ` Yibo Zhao
2019-09-21 11:27 ` Toke Høiland-Jørgensen
2019-09-21 11:53 ` Yibo Zhao
2019-09-21 12:22 ` Yibo Zhao
2019-09-21 13:02 ` Toke Høiland-Jørgensen
2019-09-21 13:24 ` Yibo Zhao
2019-09-21 14:00 ` Toke Høiland-Jørgensen
2019-09-22 5:19 ` Yibo Zhao [this message]
2019-09-23 10:47 ` Toke Høiland-Jørgensen
2019-09-23 11:42 ` Kalle Valo
2019-09-23 16:39 ` Toke Høiland-Jørgensen
2019-09-24 5:27 ` Kalle Valo
2019-09-24 7:23 ` Toke Høiland-Jørgensen
2019-09-24 2:45 ` Yibo Zhao
2019-09-24 7:26 ` Toke Høiland-Jørgensen
2019-09-24 8:31 ` Yibo Zhao
2019-09-24 8:44 ` Toke Høiland-Jørgensen
2019-09-16 13:09 ` [PATCH 3/4] mac80211: fix low throughput in push pull mode Yibo Zhao
2019-09-16 15:27 ` Johannes Berg
2019-09-17 6:36 ` Yibo Zhao
2019-09-17 6:55 ` Johannes Berg
2019-09-17 21:12 ` Toke Høiland-Jørgensen
2019-09-18 10:02 ` Yibo Zhao
2019-09-18 10:16 ` Toke Høiland-Jørgensen
2019-09-18 10:18 ` Yibo Zhao
2019-09-16 13:09 ` [PATCH 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together Yibo Zhao
2019-09-17 21:24 ` Toke Høiland-Jørgensen
2019-09-18 10:16 ` Yibo Zhao
2019-09-16 14:51 ` [PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler Toke Høiland-Jørgensen
2019-09-17 21:31 ` Toke Høiland-Jørgensen
2019-09-20 8:37 ` Yibo Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2935b00bf3e29ad8b2738fe98dc24a76@codeaurora.org \
--to=yiboz@codeaurora.org \
--cc=ath10k@lists.infradead.org \
--cc=linux-wireless-owner@vger.kernel.org \
--cc=linux-wireless@vger.kernel.org \
--cc=toke@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).