From: Felix Fietkau <nbd@openwrt.org>
To: Emmanuel Grumbach <egrumbach@gmail.com>
Cc: linux-wireless <linux-wireless@vger.kernel.org>,
Johannes Berg <johannes@sipsolutions.net>
Subject: Re: [RFC v2] mac80211: add A-MSDU tx support
Date: Sun, 7 Feb 2016 11:48:53 +0100 [thread overview]
Message-ID: <56B72115.10709@openwrt.org> (raw)
In-Reply-To: <CANUX_P0QkRhSpdxrd9jAobCX9E=TED55=YaTOJ6QwwuJQHfTqA@mail.gmail.com>
On 2016-02-07 11:22, Emmanuel Grumbach wrote:
> On Sun, Feb 7, 2016 at 12:06 PM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>> On Sun, Feb 7, 2016 at 11:08 AM, Felix Fietkau <nbd@openwrt.org> wrote:
>>> On 2016-02-07 08:25, Emmanuel Grumbach wrote:
>>>> On Fri, Feb 5, 2016 at 12:41 PM, Felix Fietkau <nbd@openwrt.org> wrote:
>>>>> Requires software tx queueing support. frag_list support (for zero-copy)
>>>>> is optional.
>>>>>
>>>>> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
>>>>
>>>> Looks nice!
>>>> This would allow us to create aggregates of TCP Acks, the problem is
>>>> that when you are mostly receiving data, the hardware queues are
>>>> pretty much empty (nothing besides the TCP Acks which should go out
>>>> quickly) so that packets don't pile up in the software queues and
>>>> hence you don't have enough material to build an A-MSDU.
>>>> I guess that for AP oriented devices, this is ideal solution since you
>>>> can't rely on TSO (packets are not locally generated) and this allows
>>>> to build an A-MSDU without adding more latency since you build an
>>>> A-MSDU with packets that are already in the queue waiting instead of
>>>> delaying transmission of the first packet.
>>>> IIRC, the latter was the approach chose by the new Marvell driver
>>>> posted a few weeks ago. This approach is better in my eyes.
>>>> For iwlwifi which is much more station oriented (of GO which is
>>>> basically an AP with locally generated traffic), I took the TSO
>>>> approach. I guess we could try to change iwlwifi to use your tx
>>>> queues, and check how that works. This would allow us to have A-MSDU
>>>> on bridged traffic as well, although this use case is much less common
>>>> for Intel devices.
>>
>>
>>> Can the iwlwifi firmware maintain per-sta per-tid queues? Because that
>>> way you would get the most benefits from using that tx queueing
>>> infrastructure.
>>
>> Well... iwlwifi and athXk are very different. iwlwifi really has the
>> concept of queues and not flat descriptors.
>> Any Tx descriptor lives in the context of a Tx queue which is
>> per-sta-per tid when A-MPDUs is enabled.
>> We are now moving to a scheme in which we will have per-sta-per-tid
>> queue regardless of the A-MPDU state which will make much sense to tie
>> to the tx queueing infrastructure.
>> Thing is that in that case I am afraid we will not have enough packets
>> in the software tx queue to get A-MSDUs from your code. with TSO, it
>> is easier :) Still worth trying to work with this instead of TSO and
>> see how it goes. That won't happen anytime soon though.
>>
>>>
>>>>> +
>>>>> +static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata,
>>>>> + struct sta_info *sta,
>>>>> + struct ieee80211_fast_tx *fast_tx,
>>>>> + struct sk_buff *skb)
>>>>> +{
>>>>> + struct ieee80211_local *local = sdata->local;
>>>>> + u8 tid = skb->priority & IEEE80211_QOS_CTL_TAG1D_MASK;
>>>>> + struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>> + struct txq_info *txqi;
>>>>> + struct sk_buff **frag_tail, *head;
>>>>> + int subframe_len = skb->len - ETH_ALEN;
>>>>> + int max_amsdu_len;
>>>>> + __be16 len;
>>>>> + void *data;
>>>>> + bool ret = false;
>>>>> + int n = 1;
>>>>> +
>>>>> + if (!ieee80211_hw_check(&local->hw, TX_AMSDU))
>>>>> + return false;
>>>>> +
>>>>> + if (!txq)
>>>>> + return false;
>>>>> +
>>>>> + txqi = to_txq_info(txq);
>>>>> + if (test_bit(IEEE80211_TXQ_NO_AMSDU, &txqi->flags))
>>>>> + return false;
>>>>> +
>>>>> + /*
>>>>> + * A-MPDU limits maximum MPDU size to 4095 bytes. Since aggregation
>>>>> + * sessions are started/stopped without txq flush, use the limit here
>>>>> + * to avoid having to de-aggregate later.
>>>>> + */
>>>>> + max_amsdu_len = min_t(int, sta->sta.max_amsdu_len, 4095);
>>>>
>>>> So you can't get 10K A-MSDUs? I don't see where you check that you
>>>> have an A-MPDU session here. You seem to be applying the 4095 limit
>>>> also for streams that are not an A-MPDU?
>>>> I guess you could check if the sta is a VHT peer, in that case, no
>>>> limit applies.
>>> The explanation for the missing A-MPDU change is in that comment -
>>> checking for an active A-MPDU session would make it unnecessarily complex.
>>> Good point about checking for VHT capabilities to remove this limit, I
>>> will add that.
>
> Yes - I read the comment, but it seemed very sub-optimal to limit all
> the A-MSDUs to 4K. With TSO I can get up to 10K and it really helps
> TPT.
This was built with the assumption that most scenarios use A-MPDU anyway
and thus don't need really large A-MSDUs.
> One more point. In VHT, there may be a limit on the numbers of
> subframes in the A-MSDU. I don't see you handle that. Maybe I missed
> it?
I haven't looked at that much yet. Right now the driver can only specify
a limit for the number of subframes.
> And... in case the driver doesn't handle frag_list, you linearize the
> skb which is pretty much the only thing you can do at this stage. But,
> when you'll lift the 4095 bytes limit, you'll get 11K A-MSDU,
> linarizing such a long packet is really putting the memory manager
> under pressure.
I added no-frag_list support primarily for debugging purposes, it's not
supposed to perform well.
> This is an order 4 allocation, for each A-MSDU. Note
> that iwlwifi (and probably other drivers) can handle gather DMA in Tx,
> but they have a limited number of frags they can handle. iwlwifi e.g.
> can handle up to 20 frags, but 3 are taken for "paperwork". You'll
> have 2 frags per subframe at least (assuming that each subframe's
> payload is nicely contiguous and not on a page boundary). I think that
> it may be worthwhile to ask the driver how many frags it is supposed
> to handle. I can't promise iwlwifi will use it, but I guess it will be
> useful for someone.
You mean an extra frag limit in addition to the driver subframe limit,
in case individual subframes are fragmented as well?
- Felix
next prev parent reply other threads:[~2016-02-07 10:49 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-05 10:41 [RFC v2] mac80211: add A-MSDU tx support Felix Fietkau
2016-02-07 7:25 ` Emmanuel Grumbach
2016-02-07 9:08 ` Felix Fietkau
2016-02-07 10:06 ` Emmanuel Grumbach
2016-02-07 10:22 ` Emmanuel Grumbach
2016-02-07 10:48 ` Felix Fietkau [this message]
2016-02-07 11:32 ` Emmanuel Grumbach
2016-02-07 11:49 ` Felix Fietkau
2016-02-07 11:56 ` Emmanuel Grumbach
2016-02-07 13:21 ` Felix Fietkau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56B72115.10709@openwrt.org \
--to=nbd@openwrt.org \
--cc=egrumbach@gmail.com \
--cc=johannes@sipsolutions.net \
--cc=linux-wireless@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.