* Regarding .wake_tx_queue() model
@ 2020-05-04 19:39 Maxime Bizon
2020-05-05 12:06 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 10+ messages in thread
From: Maxime Bizon @ 2020-05-04 19:39 UTC (permalink / raw)
To: linux-wireless
Hello,
Currently switching a driver to .wake_tx_queue() model, and I would
appreciate some guidance over a couple of issues.
My hardware exposes 1 FIFO per ac, so the current driver basically
queue skb in the correct fifo from drv_tx(), and once a FIFO is big
"enough" (packet count or total duration), I use
ieee80211_stop_queue(), and the corresponding ieee80211_wait_queue()
in tx completion.
Current driver TX flow is:
- drv_tx() => push into FIFO
- drv_tx() => push into FIFO
- drv_tx() => push into FIFO => FIFO full => ieee80211_stop_queue()
- [drv_tx won't be called]
- tx completion event => ieee80211_wake_queue()
- drv_tx()
[...]
1) drv_tx() & drv_wake_tx_queue() concurrency
With the .wake_tx_queue model, there are now two entry points in the
driver, how does the stack ensure that drv_tx() is not blocked forever
if there is concurrent traffic on the same AC ?
for example:
- .wake_tx_queue() => ieee80211_next_txq() => ieee80211_tx_dequeue() => FIFO now full => ieee80211_stop_queue()
- tx completion event => ieee80211_wake_queue()
- .wake_tx_queue() => ieee80211_next_txq() => ieee80211_tx_dequeue() => FIFO now full => ieee80211_stop_queue()
- tx completion event => ieee80211_wake_queue()
- [...]
ieee80211_wake_queue() will schedule both tx_pending_tasklet and
wake_txqs_tasklet, but I don't think there is any guarantee in the
call ordering.
Is it the driver's duty to leave a bit of room during
drv_wake_tx_queue() scheduling and not stop the queues from there ?
2) ieee80211_stop_queue() vs drv_wake_tx_queue()
Commit 21a5d4c3a45ca608477a083096cfbce76e449a0c made it so that
ieee80211_tx_dequeue() will return nothing if hardware queue is
stopped, but drv_wake_tx_queue() is still called for ac whose queue is
stopped.
so should I do this ?
- .wake_tx_queue() => ieee80211_next_txq() => ieee80211_tx_dequeue() => FIFO now full => ieee80211_stop_queue()
- .wake_tx_queue() => ieee80211_next_txq() => ieee80211_tx_dequeue() => NULL => return
- tx completion event => ieee80211_wake_queue()
- .wake_tx_queue() => ieee80211_next_txq() => ieee80211_tx_dequeue() => FIFO now full => ieee80211_stop_queue()
- [...]
or this ?
- .wake_tx_queue() => ieee80211_queue_stopped() => ieee80211_next_txq() => ieee80211_tx_dequeue() => FIFO now full => ieee80211_stop_queue()
- .wake_tx_queue() => ieee80211_queue_stopped() => return
associated commit log only mentions edge cases (channel switch, DFS),
so I'm not sure if ieee80211_stop_queue() for txqs was intended for
"fast path", also see 3)
3) _ieee80211_wake_txqs() looks buggy:
If the cab_queue is not stopped, this loop will unconditionally wake
up all txqs, which I guess is not what was intended:
for (i = 0; i < local->hw.queues; i++) {
if (local->queue_stop_reasons[i])
continue;
for (ac = 0; ac < n_acs; ac++) {
int ac_queue = sdata->vif.hw_queue[ac];
if (ac_queue == i ||
sdata->vif.cab_queue == i)
__ieee80211_wake_txqs(sdata, ac);
}
4) building aggregation in the driver:
I'm doing aggregation on the host side rather than in the firmware,
which will ends up with more or less the same code as ath9k, is there
any on-going effort to move that code from the driver into the stack ?
Thanks,
--
Maxime
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Regarding .wake_tx_queue() model 2020-05-04 19:39 Regarding .wake_tx_queue() model Maxime Bizon @ 2020-05-05 12:06 ` Toke Høiland-Jørgensen 2020-05-05 13:15 ` Maxime Bizon 0 siblings, 1 reply; 10+ messages in thread From: Toke Høiland-Jørgensen @ 2020-05-05 12:06 UTC (permalink / raw) To: Maxime Bizon, linux-wireless Maxime Bizon <mbizon@freebox.fr> writes: > Hello, > > Currently switching a driver to .wake_tx_queue() model Yay :) > and I would appreciate some guidance over a couple of issues. > > My hardware exposes 1 FIFO per ac, so the current driver basically > queue skb in the correct fifo from drv_tx(), and once a FIFO is big > "enough" (packet count or total duration), I use > ieee80211_stop_queue(), and the corresponding ieee80211_wait_queue() > in tx completion. > > Current driver TX flow is: > - drv_tx() => push into FIFO > - drv_tx() => push into FIFO > - drv_tx() => push into FIFO => FIFO full => ieee80211_stop_queue() > - [drv_tx won't be called] > - tx completion event => ieee80211_wake_queue() > - drv_tx() > [...] > > > 1) drv_tx() & drv_wake_tx_queue() concurrency > > With the .wake_tx_queue model, there are now two entry points in the > driver, how does the stack ensure that drv_tx() is not blocked forever > if there is concurrent traffic on the same AC ? > > > for example: > > - .wake_tx_queue() => ieee80211_next_txq() => ieee80211_tx_dequeue() => FIFO now full => ieee80211_stop_queue() > - tx completion event => ieee80211_wake_queue() > - .wake_tx_queue() => ieee80211_next_txq() => ieee80211_tx_dequeue() => FIFO now full => ieee80211_stop_queue() > - tx completion event => ieee80211_wake_queue() > - [...] > > ieee80211_wake_queue() will schedule both tx_pending_tasklet and > wake_txqs_tasklet, but I don't think there is any guarantee in the > call ordering. > > Is it the driver's duty to leave a bit of room during > drv_wake_tx_queue() scheduling and not stop the queues from there ? Yeah, this is basically up to the driver. I'm mostly familiar with ath9k, and I think basically what that does is that it doesn't fill the HW FIFO in normal operation: For data packets being pulled off ieee80211_tx_dequeue() it'll only queue two aggregates in the hardware at a time. This is a good thing! We want the packets to be queued on the mac80211 TXQs not in a dumb HW FIFO causing bufferbloat! Given that you're building aggregates in the driver, you could just do the same thing as ath9k and likely get pretty good results, I think :) > 2) ieee80211_stop_queue() vs drv_wake_tx_queue() > > Commit 21a5d4c3a45ca608477a083096cfbce76e449a0c made it so that > ieee80211_tx_dequeue() will return nothing if hardware queue is > stopped, but drv_wake_tx_queue() is still called for ac whose queue is > stopped. > > > so should I do this ? > > - .wake_tx_queue() => ieee80211_next_txq() => ieee80211_tx_dequeue() => FIFO now full => ieee80211_stop_queue() > - .wake_tx_queue() => ieee80211_next_txq() => ieee80211_tx_dequeue() => NULL => return > - tx completion event => ieee80211_wake_queue() > - .wake_tx_queue() => ieee80211_next_txq() => ieee80211_tx_dequeue() => FIFO now full => ieee80211_stop_queue() > - [...] > > or this ? > > - .wake_tx_queue() => ieee80211_queue_stopped() => ieee80211_next_txq() => ieee80211_tx_dequeue() => FIFO now full => ieee80211_stop_queue() > - .wake_tx_queue() => ieee80211_queue_stopped() => return > > associated commit log only mentions edge cases (channel switch, DFS), > so I'm not sure if ieee80211_stop_queue() for txqs was intended for > "fast path", also see 3) I don't think ieee80211_stop_queue() is meant to be used this way at all in the wake_tx_queue case. Rather, when you get a wake_tx_queue() callback, you just queue as many frames as you feel like (see '2 aggregate' limit above), and then return. Then, on a TX completion you just call your internal driver function that tries to pull more frames from the mac80211 TXQs. You'll keep getting wake_tx_queue callbacks from mac80211, but there's nothing saying you have to pull any frames on each one. See ath_txq_schedule() for how ath9k does this :) > 3) _ieee80211_wake_txqs() looks buggy: > > If the cab_queue is not stopped, this loop will unconditionally wake > up all txqs, which I guess is not what was intended: > > for (i = 0; i < local->hw.queues; i++) { > if (local->queue_stop_reasons[i]) > continue; > > for (ac = 0; ac < n_acs; ac++) { > int ac_queue = sdata->vif.hw_queue[ac]; > > if (ac_queue == i || > sdata->vif.cab_queue == i) > __ieee80211_wake_txqs(sdata, ac); > } (not sure about this none) > 4) building aggregation in the driver: > > I'm doing aggregation on the host side rather than in the firmware, > which will ends up with more or less the same code as ath9k, is there > any on-going effort to move that code from the driver into the stack ? Not aware of any on-going efforts, no. Something like this usually happens because someone feels it would make their life easier. Say, if they're writing a new driver and wants to re-use code :) Looking at the ath9k code, ath_tx_form_aggr() is tied into the internal driver buffer representations, so I'm not sure how much work it would be to generalise and split out parts of it. It need not be a complete "build me an aggregate" function that you move into mac80211, though, even some utility functions to calculate padding etc might be shareable? I guess that if you're copying code from there I guess you'll find out :) -Toke ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Regarding .wake_tx_queue() model 2020-05-05 12:06 ` Toke Høiland-Jørgensen @ 2020-05-05 13:15 ` Maxime Bizon 2020-05-05 13:53 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 10+ messages in thread From: Maxime Bizon @ 2020-05-05 13:15 UTC (permalink / raw) To: Toke Høiland-Jørgensen; +Cc: linux-wireless On Tuesday 05 May 2020 à 14:06:18 (+0200), Toke Høiland-Jørgensen wrote: Hello Toke, thanks for your comments > I don't think ieee80211_stop_queue() is meant to be used this way at all > in the wake_tx_queue case. Rather, when you get a wake_tx_queue() > callback, you just queue as many frames as you feel like (see '2 > aggregate' limit above), and then return. Then, on a TX completion you > just call your internal driver function that tries to pull more frames > from the mac80211 TXQs. alright, indeed .wake_tx_queue() does not have to result in any actual frame sent. but what about .release_buffered_frames then ? .release_buffered_frames() and .drv_tx() are both "push" oriented interface. When they are called, you have to push a frame to the hardware, so if they are called when hardware FIFO is full, you need to drop the frame (or add yet another intermediate level of queuing) from what I can see, .release_buffered_frames() will happily be called even if queue is stopped, and you have to at least send one frame. I'm just trying to avoid any additionnal intermediate queing in the driver between what mac80211 gives me and the HW fifo which has a fixed size (well actually you can choose the size, but only at init time). my current workaround is to pre-size the hw fifo queue to handle what I think is the worst case Also .release_buffered_frames() codepath is difficult to test, how do you trigger that reliably ? assuming VIF is an AP, then you need the remote STA to go to sleep even though you have traffic waiting for it in the txqi. For now I patch the stack to introduce artificial delay. Since my hardware has a sticky per-STA PS filter with good status reporting, I'm considering using ieee80211_sta_block_awake() and only unblock STA when all its txqi are empty to get rid of .release_buffered_frames() complexity. -- Maxime ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Regarding .wake_tx_queue() model 2020-05-05 13:15 ` Maxime Bizon @ 2020-05-05 13:53 ` Toke Høiland-Jørgensen 2020-05-05 15:20 ` Maxime Bizon 0 siblings, 1 reply; 10+ messages in thread From: Toke Høiland-Jørgensen @ 2020-05-05 13:53 UTC (permalink / raw) To: Maxime Bizon; +Cc: linux-wireless Maxime Bizon <mbizon@freebox.fr> writes: > On Tuesday 05 May 2020 à 14:06:18 (+0200), Toke Høiland-Jørgensen wrote: > > Hello Toke, > > thanks for your comments > >> I don't think ieee80211_stop_queue() is meant to be used this way at all >> in the wake_tx_queue case. Rather, when you get a wake_tx_queue() >> callback, you just queue as many frames as you feel like (see '2 >> aggregate' limit above), and then return. Then, on a TX completion you >> just call your internal driver function that tries to pull more frames >> from the mac80211 TXQs. > > alright, indeed .wake_tx_queue() does not have to result in any actual > frame sent. > > but what about .release_buffered_frames then ? > > .release_buffered_frames() and .drv_tx() are both "push" oriented > interface. When they are called, you have to push a frame to the > hardware, so if they are called when hardware FIFO is full, you need > to drop the frame (or add yet another intermediate level of queuing) > > from what I can see, .release_buffered_frames() will happily be called > even if queue is stopped, and you have to at least send one frame. > > I'm just trying to avoid any additionnal intermediate queing in the > driver between what mac80211 gives me and the HW fifo which has a > fixed size (well actually you can choose the size, but only at init > time). > > my current workaround is to pre-size the hw fifo queue to handle what > I think is the worst case Well, I think that should be fine? Having a longer HW queue is fine, as long as you have some other logic to not fill it all the time (like the "max two aggregates" logic I mentioned before). I think this is what ath9k does too. At least it looks like both drv_tx() and release_buffered_frames() will just abort (and drop in the former case) if there is no HW buffer space left. > Also .release_buffered_frames() codepath is difficult to test, how do > you trigger that reliably ? assuming VIF is an AP, then you need the > remote STA to go to sleep even though you have traffic waiting for it > in the txqi. For now I patch the stack to introduce artificial delay. > > Since my hardware has a sticky per-STA PS filter with good status > reporting, I'm considering using ieee80211_sta_block_awake() and only > unblock STA when all its txqi are empty to get rid of > .release_buffered_frames() complexity. I'm probably not the right person to answer that; never did have a good grip on the details of PS support. What hardware is it you're writing a driver for, BTW, and are you planning to upstream it? :) -Toke ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Regarding .wake_tx_queue() model 2020-05-05 13:53 ` Toke Høiland-Jørgensen @ 2020-05-05 15:20 ` Maxime Bizon 2020-05-05 16:50 ` Toke Høiland-Jørgensen 2020-05-25 9:47 ` Johannes Berg 0 siblings, 2 replies; 10+ messages in thread From: Maxime Bizon @ 2020-05-05 15:20 UTC (permalink / raw) To: Toke Høiland-Jørgensen; +Cc: linux-wireless On Tuesday 05 May 2020 à 15:53:08 (+0200), Toke Høiland-Jørgensen wrote: > Well, I think that should be fine? Having a longer HW queue is fine, as > long as you have some other logic to not fill it all the time (like the > "max two aggregates" logic I mentioned before). I think this is what > ath9k does too. At least it looks like both drv_tx() and > release_buffered_frames() will just abort (and drop in the former case) > if there is no HW buffer space left. Ok BTW, the "max two aggregates" rule, why is it based on number of frames and not duration ? if you are queing 1500 bytes @1Mbit/s, even one frame is enough, but not so for faster rates. It would be even better if minstrel could limit the total duration when computing number of hardware retries, and then mac80211 could handle software retries for those really slow packets, no hardware FIFO "pollution" > > Also .release_buffered_frames() codepath is difficult to test, how do > > you trigger that reliably ? assuming VIF is an AP, then you need the > > remote STA to go to sleep even though you have traffic waiting for it > > in the txqi. For now I patch the stack to introduce artificial delay. > > > > Since my hardware has a sticky per-STA PS filter with good status > > reporting, I'm considering using ieee80211_sta_block_awake() and only > > unblock STA when all its txqi are empty to get rid of > > .release_buffered_frames() complexity. > > I'm probably not the right person to answer that; never did have a good > grip on the details of PS support. Hopefully Felix or Johannes will see this. Just wondering if there is anything I'm missing, this alternative way of doing looks easier to me because it removes "knowledge" of frame delivery under service period from the driver: 1) first get rid of buffered txq traffic when entering PS: --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -1593,6 +1593,15 @@ static void sta_ps_start(struct sta_info *sta) list_del_init(&txqi->schedule_order); spin_unlock(&local->active_txq_lock[txq->ac]); - if (txq_has_queue(txq)) - set_bit(tid, &sta->txq_buffered_tids); - else - clear_bit(tid, &sta->txq_buffered_tids); + /* transfer txq into tx_filtered frames */ + spin_lock(&fq->lock); + while ((skb = skb_dequeue(&txq->frags))) + skb_queue_tail(&sta->tx_filtered[txq->ac], skb); + /* use something more efficient like fq_tin_reset */ + while ((skb = fq_tin_dequeue(fq, tin, fq_tin_dequeue_func))) + skb_queue_tail(&sta->tx_filtered[txq->ac], skb); + spin_unlock_bh(&fq->lock); + 2) driver register for STA_NOTIFY_SLEEP 3) driver count tx frames pending in the hardware per STA and sets ieee80211_sta_block_awake(sta, 1) when > 0 4) on tx completion, if STA is sleeping and number of pending tx frames in hardware for a given STA reaches 0: - if driver buffers other frames for this STA, release them with TX_FILTERED in reverse order - calls ieee80211_sta_block_awake(false) what do you think ? > What hardware is it you're writing a driver for, BTW, and are you > planning to upstream it? :) that's a rewrite of the mwl8k driver targeting the same hardware, but with a different firmware interface. if I can bring it on par with the existing set of supported hardware and features, I could try to upstream it yes. -- Maxime ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Regarding .wake_tx_queue() model 2020-05-05 15:20 ` Maxime Bizon @ 2020-05-05 16:50 ` Toke Høiland-Jørgensen 2020-05-05 17:49 ` Maxime Bizon 2020-05-25 9:47 ` Johannes Berg 1 sibling, 1 reply; 10+ messages in thread From: Toke Høiland-Jørgensen @ 2020-05-05 16:50 UTC (permalink / raw) To: Maxime Bizon; +Cc: linux-wireless Maxime Bizon <mbizon@freebox.fr> writes: > On Tuesday 05 May 2020 à 15:53:08 (+0200), Toke Høiland-Jørgensen wrote: > >> Well, I think that should be fine? Having a longer HW queue is fine, as >> long as you have some other logic to not fill it all the time (like the >> "max two aggregates" logic I mentioned before). I think this is what >> ath9k does too. At least it looks like both drv_tx() and >> release_buffered_frames() will just abort (and drop in the former case) >> if there is no HW buffer space left. > > Ok > > BTW, the "max two aggregates" rule, why is it based on number of > frames and not duration ? if you are queing 1500 bytes @1Mbit/s, even > one frame is enough, but not so for faster rates. It's the minimum amount that works - assuming you get a TX completion when one is done, the CPU has time to build the next one before the second one is done, and you avoid starvation. Note this only works well for aggregates, since their size tend to vary with rate; if you're queueing individual packets to the HWQ you need something that takes rate into account, which is what AQL (Airtime Queue Limits) does for ath10k. > It would be even better if minstrel could limit the total duration > when computing number of hardware retries, and then mac80211 could > handle software retries for those really slow packets, no hardware > FIFO "pollution" Minstrel will compute the max aggregate size based on the rate, which is why the "two aggregates" scheme works. It likely could be smarter about limiting the number of retries, as you say, but we never did get around to doing anything about that :) >> > Also .release_buffered_frames() codepath is difficult to test, how do >> > you trigger that reliably ? assuming VIF is an AP, then you need the >> > remote STA to go to sleep even though you have traffic waiting for it >> > in the txqi. For now I patch the stack to introduce artificial delay. >> > >> > Since my hardware has a sticky per-STA PS filter with good status >> > reporting, I'm considering using ieee80211_sta_block_awake() and only >> > unblock STA when all its txqi are empty to get rid of >> > .release_buffered_frames() complexity. >> >> I'm probably not the right person to answer that; never did have a good >> grip on the details of PS support. > > Hopefully Felix or Johannes will see this. > > Just wondering if there is anything I'm missing, this alternative way > of doing looks easier to me because it removes "knowledge" of frame > delivery under service period from the driver: > > > 1) first get rid of buffered txq traffic when entering PS: > > --- a/net/mac80211/rx.c > +++ b/net/mac80211/rx.c > @@ -1593,6 +1593,15 @@ static void sta_ps_start(struct sta_info *sta) > list_del_init(&txqi->schedule_order); > spin_unlock(&local->active_txq_lock[txq->ac]); > > - if (txq_has_queue(txq)) > - set_bit(tid, &sta->txq_buffered_tids); > - else > - clear_bit(tid, &sta->txq_buffered_tids); > + /* transfer txq into tx_filtered frames */ > + spin_lock(&fq->lock); > + while ((skb = skb_dequeue(&txq->frags))) > + skb_queue_tail(&sta->tx_filtered[txq->ac], skb); > + /* use something more efficient like fq_tin_reset */ > + while ((skb = fq_tin_dequeue(fq, tin, fq_tin_dequeue_func))) > + skb_queue_tail(&sta->tx_filtered[txq->ac], skb); > + spin_unlock_bh(&fq->lock); This seems like a bad idea; we want the TXQ mechanism to decide which frame to send on wakeup. > 2) driver register for STA_NOTIFY_SLEEP > > 3) driver count tx frames pending in the hardware per STA and sets > ieee80211_sta_block_awake(sta, 1) when > 0 > > 4) on tx completion, if STA is sleeping and number of pending tx frames in hardware for a > given STA reaches 0: > - if driver buffers other frames for this STA, release them with TX_FILTERED in reverse order > - calls ieee80211_sta_block_awake(false) > > what do you think ? As I said, I'm not an expert on the PS code, so I may be missing something. But it seems to me that in a model where the driver pulls the frames from mac80211 (i.e., for drivers using wake_tx_queue), there really is no way around having a way to instruct the driver "please use these flags for the next N frames you send" - which is what release_buffered_frames() does. What you're suggesting is basically turning off this 'pull mode' for the frames buffered during PS and have mac80211 revert to push mode for those, right? But then you lose the benefits of pull mode (the TXQs) for those frames. I remember Johannes talking about a 'shim layer' between the mac80211 TXQs and the 'drv_tx()' hook as a way to bring the benefits of the TXQs to the 'long tail' of simple drivers that don't do any internal buffering anyway, without having to change the drivers to use 'pull mode'. Am I wrong in thinking that mwl8k may be a good candidate for such a layer? From glancing through the existing driver it looks like it's mostly just taking each frame, wrapping it in a HW descriptor, and sticking it on a TX ring? >> What hardware is it you're writing a driver for, BTW, and are you >> planning to upstream it? :) > > that's a rewrite of the mwl8k driver targeting the same hardware, but > with a different firmware interface. > > if I can bring it on par with the existing set of supported hardware > and features, I could try to upstream it yes. That would be awesome! :) -Toke ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Regarding .wake_tx_queue() model 2020-05-05 16:50 ` Toke Høiland-Jørgensen @ 2020-05-05 17:49 ` Maxime Bizon 2020-05-05 20:49 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 10+ messages in thread From: Maxime Bizon @ 2020-05-05 17:49 UTC (permalink / raw) To: Toke Høiland-Jørgensen; +Cc: linux-wireless On Tuesday 05 May 2020 à 18:50:45 (+0200), Toke Høiland-Jørgensen wrote: > This seems like a bad idea; we want the TXQ mechanism to decide which > frame to send on wakeup. .release_buffered_frames() is only needed/used if STA went into powersave while packets were already sitting inside txqi, that's an edge case. In the other much more common case (STA went into sleep without any traffic pending in txqi), then the "classic" ps delivery code is used: frames gets pulled from ps_tx_buf queue (1 by 1 for ps poll, more for uapsd), and those frames ends up being sent through drv_tx(), since they have the flag IEEE80211_TX_CTRL_PS_RESPONSE so they bypass txqi. so I was just looking at removing that edge case, sending those frames back to ps_tx_buf() from the driver. > really is no way around having a way to instruct the driver "please use > these flags for the next N frames you send" - which is what > release_buffered_frames() does. What you're suggesting is basically > turning off this 'pull mode' for the frames buffered during PS and have > mac80211 revert to push mode for those, right? But then you lose the > benefits of pull mode (the TXQs) for those frames. I just want to give those back to mac80211, those frames were already in push mode anyway. > I remember Johannes talking about a 'shim layer' between the mac80211 > TXQs and the 'drv_tx()' hook as a way to bring the benefits of the TXQs > to the 'long tail' of simple drivers that don't do any internal > buffering anyway, without having to change the drivers to use 'pull > mode'. Am I wrong in thinking that mwl8k may be a good candidate for > such a layer? From glancing through the existing driver it looks like > it's mostly just taking each frame, wrapping it in a HW descriptor, and > sticking it on a TX ring? maybe with the current firmware interface, but with the new one aggregation is done on host side, so tx path is no more that simple. -- Maxime ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Regarding .wake_tx_queue() model 2020-05-05 17:49 ` Maxime Bizon @ 2020-05-05 20:49 ` Toke Høiland-Jørgensen 0 siblings, 0 replies; 10+ messages in thread From: Toke Høiland-Jørgensen @ 2020-05-05 20:49 UTC (permalink / raw) To: Maxime Bizon; +Cc: linux-wireless Maxime Bizon <mbizon@freebox.fr> writes: > On Tuesday 05 May 2020 à 18:50:45 (+0200), Toke Høiland-Jørgensen wrote: > >> This seems like a bad idea; we want the TXQ mechanism to decide which >> frame to send on wakeup. > > .release_buffered_frames() is only needed/used if STA went into > powersave while packets were already sitting inside txqi, that's an > edge case. > > In the other much more common case (STA went into sleep without any > traffic pending in txqi), then the "classic" ps delivery code is used: > frames gets pulled from ps_tx_buf queue (1 by 1 for ps poll, more for > uapsd), and those frames ends up being sent through drv_tx(), since > they have the flag IEEE80211_TX_CTRL_PS_RESPONSE so they bypass txqi. Ah, I see, and if there are a lot of outstanding frames the client is supposed to wake up and resume regular operation? As I said, I really don't know much about how PS works; but I'm enjoying learning about it, so thanks! :) > so I was just looking at removing that edge case, sending those frames > back to ps_tx_buf() from the driver. Right, that makes sense. But I guess this is only something you can do if you never buffer frames in the driver, no? E.g., ath9k has its own internal retry queue, so it needs the callback to train that; and once the callback is there, extending it to pull from the TXQs is quite straight forward... So it's not necessarily a generally-applicable optimisation, is what I mean. >> really is no way around having a way to instruct the driver "please use >> these flags for the next N frames you send" - which is what >> release_buffered_frames() does. What you're suggesting is basically >> turning off this 'pull mode' for the frames buffered during PS and have >> mac80211 revert to push mode for those, right? But then you lose the >> benefits of pull mode (the TXQs) for those frames. > > I just want to give those back to mac80211, those frames were already > in push mode anyway. Gotcha. >> I remember Johannes talking about a 'shim layer' between the mac80211 >> TXQs and the 'drv_tx()' hook as a way to bring the benefits of the TXQs >> to the 'long tail' of simple drivers that don't do any internal >> buffering anyway, without having to change the drivers to use 'pull >> mode'. Am I wrong in thinking that mwl8k may be a good candidate for >> such a layer? From glancing through the existing driver it looks like >> it's mostly just taking each frame, wrapping it in a HW descriptor, and >> sticking it on a TX ring? > > maybe with the current firmware interface, but with the new one > aggregation is done on host side, so tx path is no more that simple. Right, OK. Is this just a different firmware that's generally available, or is it a new thing? I am generally a fan of moving logic out of the firmware like this... -Toke ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Regarding .wake_tx_queue() model 2020-05-05 15:20 ` Maxime Bizon 2020-05-05 16:50 ` Toke Høiland-Jørgensen @ 2020-05-25 9:47 ` Johannes Berg 2020-05-26 10:33 ` Maxime Bizon 1 sibling, 1 reply; 10+ messages in thread From: Johannes Berg @ 2020-05-25 9:47 UTC (permalink / raw) To: Maxime Bizon, Toke Høiland-Jørgensen; +Cc: linux-wireless On Tue, 2020-05-05 at 17:20 +0200, Maxime Bizon wrote: > > > Also .release_buffered_frames() codepath is difficult to test, how do > > > you trigger that reliably ? assuming VIF is an AP, then you need the > > > remote STA to go to sleep even though you have traffic waiting for it > > > in the txqi. For now I patch the stack to introduce artificial delay. > > > > > > Since my hardware has a sticky per-STA PS filter with good status > > > reporting, I'm considering using ieee80211_sta_block_awake() and only > > > unblock STA when all its txqi are empty to get rid of > > > .release_buffered_frames() complexity. > > > > I'm probably not the right person to answer that; never did have a good > > grip on the details of PS support. > > Hopefully Felix or Johannes will see this. > > Just wondering if there is anything I'm missing, this alternative way > of doing looks easier to me because it removes "knowledge" of frame > delivery under service period from the driver: This stuff is a mess. I had a plan once to just rip it all out and combine it all with the TXQs, but not only is it hard to test, we've also offloaded this stuff to the firmware for our devices, so motivation is pretty low. > 1) first get rid of buffered txq traffic when entering PS: > > --- a/net/mac80211/rx.c > +++ b/net/mac80211/rx.c > @@ -1593,6 +1593,15 @@ static void sta_ps_start(struct sta_info *sta) > list_del_init(&txqi->schedule_order); > spin_unlock(&local->active_txq_lock[txq->ac]); > > - if (txq_has_queue(txq)) > - set_bit(tid, &sta->txq_buffered_tids); > - else > - clear_bit(tid, &sta->txq_buffered_tids); > + /* transfer txq into tx_filtered frames */ > + spin_lock(&fq->lock); > + while ((skb = skb_dequeue(&txq->frags))) > + skb_queue_tail(&sta->tx_filtered[txq->ac], skb); > + /* use something more efficient like fq_tin_reset */ > + while ((skb = fq_tin_dequeue(fq, tin, fq_tin_dequeue_func))) > + skb_queue_tail(&sta->tx_filtered[txq->ac], skb); > + spin_unlock_bh(&fq->lock); > + > > 2) driver register for STA_NOTIFY_SLEEP > > 3) driver count tx frames pending in the hardware per STA and sets > ieee80211_sta_block_awake(sta, 1) when > 0 > > 4) on tx completion, if STA is sleeping and number of pending tx frames in hardware for a > given STA reaches 0: > - if driver buffers other frames for this STA, release them with TX_FILTERED in reverse order > - calls ieee80211_sta_block_awake(false) > > what do you think ? I really don't know, off the top of my head, sorry. johannes ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Regarding .wake_tx_queue() model 2020-05-25 9:47 ` Johannes Berg @ 2020-05-26 10:33 ` Maxime Bizon 0 siblings, 0 replies; 10+ messages in thread From: Maxime Bizon @ 2020-05-26 10:33 UTC (permalink / raw) To: Johannes Berg; +Cc: Toke Høiland-Jørgensen, linux-wireless On Monday 25 May 2020 à 11:47:01 (+0200), Johannes Berg wrote: > This stuff is a mess. I had a plan once to just rip it all out and > combine it all with the TXQs, but not only is it hard to test, we've > also offloaded this stuff to the firmware for our devices, so motivation > is pretty low. I understand. I have softmac type of devices (ath9k type, host side aggregation) that I'd like to keep updating for awhile, so I'd rather have a helping stack. If anyone here has the experience, time and testing capabilities to do this big rework/cleanup, I may be able to get this funded, drop me an email. -- Maxime ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-05-26 10:39 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-05-04 19:39 Regarding .wake_tx_queue() model Maxime Bizon 2020-05-05 12:06 ` Toke Høiland-Jørgensen 2020-05-05 13:15 ` Maxime Bizon 2020-05-05 13:53 ` Toke Høiland-Jørgensen 2020-05-05 15:20 ` Maxime Bizon 2020-05-05 16:50 ` Toke Høiland-Jørgensen 2020-05-05 17:49 ` Maxime Bizon 2020-05-05 20:49 ` Toke Høiland-Jørgensen 2020-05-25 9:47 ` Johannes Berg 2020-05-26 10:33 ` Maxime Bizon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).