* Re: [PATCH net-next 0/2] tools: add bpftool
From: Jakub Kicinski @ 2017-09-27 0:44 UTC
To: David Ahern
Cc: netdev, daniel, alexei.starovoitov, davem, hannes, oss-drivers
In-Reply-To: <5522855a-937f-b2cb-4c74-3448d1680b10@gmail.com>
On Tue, 26 Sep 2017 17:32:31 -0600, David Ahern wrote:
> On 9/26/17 9:35 AM, Jakub Kicinski wrote:
> > I'm looking for a home for bpftool. Daniel suggested that
> > tools/net could be a good place, since only BPF utilities
> > live there already.
> >
> > The tool should be complete for simple use cases, and we
> > will continue extending it as we go along. For example,
> > disassembly of loaded programs directly via the LLVM library
> > and JSON output are high on the priority list.
>
> I have found this to be a very useful tool. Thanks for working on it.
> Moving it into the kernel will make it easier to build since it relies
> on libbpf and other files from the kernel tree.
>
> One change I have made locally is to link against libbpf.a. That way I
> only need to copy one file to a system to use it.
Thanks! I made the same change here; this patchset links bpftool
against libbpf statically.
* Re: [PATCH net-next RFC 3/5] vhost: introduce vhost_add_used_idx()
From: Jason Wang @ 2017-09-27 0:38 UTC
To: Michael S. Tsirkin; +Cc: virtualization, netdev, linux-kernel, kvm
In-Reply-To: <20170926170047-mutt-send-email-mst@kernel.org>
On 2017-09-27 03:13, Michael S. Tsirkin wrote:
> On Fri, Sep 22, 2017 at 04:02:33PM +0800, Jason Wang wrote:
>> This patch introduces a helper which just increases the used idx. This
>> will be used together with vhost_prefetch_desc_indices() by the batching
>> code.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/vhost/vhost.c | 33 +++++++++++++++++++++++++++++++++
>> drivers/vhost/vhost.h | 1 +
>> 2 files changed, 34 insertions(+)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 8424166d..6532cda 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -2178,6 +2178,39 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
>> }
>> EXPORT_SYMBOL_GPL(vhost_add_used);
>>
>> +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n)
>> +{
>> + u16 old, new;
>> +
>> + old = vq->last_used_idx;
>> + new = (vq->last_used_idx += n);
>> + /* If the driver never bothers to signal in a very long while,
>> + * used index might wrap around. If that happens, invalidate
>> + * signalled_used index we stored. TODO: make sure driver
>> + * signals at least once in 2^16 and remove this.
>> + */
>> + if (unlikely((u16)(new - vq->signalled_used) < (u16)(new - old)))
>> + vq->signalled_used_valid = false;
>> +
>> + /* Make sure buffer is written before we update index. */
>> + smp_wmb();
>> + if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
>> + &vq->used->idx)) {
>> + vq_err(vq, "Failed to increment used idx");
>> + return -EFAULT;
>> + }
>> + if (unlikely(vq->log_used)) {
>> + /* Log used index update. */
>> + log_write(vq->log_base,
>> + vq->log_addr + offsetof(struct vring_used, idx),
>> + sizeof(vq->used->idx));
>> + if (vq->log_ctx)
>> + eventfd_signal(vq->log_ctx, 1);
>> + }
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(vhost_add_used_idx);
>> +
>> static int __vhost_add_used_n(struct vhost_virtqueue *vq,
>> struct vring_used_elem *heads,
>> unsigned count)
>> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
>> index 16c2cb6..5dd6c05 100644
>> --- a/drivers/vhost/vhost.h
>> +++ b/drivers/vhost/vhost.h
>> @@ -199,6 +199,7 @@ int __vhost_get_vq_desc(struct vhost_virtqueue *vq,
>> void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
>>
>> int vhost_vq_init_access(struct vhost_virtqueue *);
>> +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n);
>> int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
>> int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
>> unsigned count);
> Please change the API to hide the fact that there's an index that needs
> to be updated.
In fact, an interesting optimization on top of this is to call
vhost_add_used_idx(vq, n) once instead of calling vhost_add_used_idx(vq, 1)
n times. That's the reason I left n in the API.
Thanks
>
>> --
>> 2.7.4
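The wraparound test in vhost_add_used_idx() and the point about passing n can be checked outside the kernel. The following is a minimal userspace sketch, not vhost code: struct vq_model, crossed_signalled() and add_used_idx_model() are hypothetical names that model only last_used_idx, signalled_used and signalled_used_valid. The check is true exactly when signalled_used lies in the window (old, new] modulo 2^16, i.e. when the used index has advanced past the last value the driver was signalled at.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the virtqueue fields touched by the patch. */
struct vq_model {
	uint16_t last_used_idx;
	uint16_t signalled_used;
	bool signalled_used_valid;
};

/* Same expression as the patch: true when moving from old to new_idx
 * (mod 2^16) passes over signalled. */
static bool crossed_signalled(uint16_t old, uint16_t new_idx, uint16_t signalled)
{
	return (uint16_t)(new_idx - signalled) < (uint16_t)(new_idx - old);
}

/* Models only the index bookkeeping of vhost_add_used_idx(); the
 * smp_wmb(), the store to the guest and dirty logging are omitted. */
static void add_used_idx_model(struct vq_model *vq, int n)
{
	uint16_t old = vq->last_used_idx;
	uint16_t new_idx = (uint16_t)(old + n);

	vq->last_used_idx = new_idx;
	if (crossed_signalled(old, new_idx, vq->signalled_used))
		vq->signalled_used_valid = false;
}
```

For example, starting from last_used_idx = 0xFFFE with signalled_used = 0, one call with n = 4 and four calls with n = 1 both end at index 2 with signalled_used_valid cleared, which is why a single batched call can replace n single-step calls.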
* Re: [PATCH net-next RFC 2/5] vhost: introduce helper to prefetch desc index
From: Jason Wang @ 2017-09-27 0:35 UTC
To: Michael S. Tsirkin; +Cc: virtualization, netdev, linux-kernel, kvm
In-Reply-To: <20170926221435-mutt-send-email-mst@kernel.org>
On 2017-09-27 03:19, Michael S. Tsirkin wrote:
> On Fri, Sep 22, 2017 at 04:02:32PM +0800, Jason Wang wrote:
>> This patch introduces vhost_prefetch_desc_indices(), which can batch
>> the fetching of descriptor indices and the updating of the used ring.
>> This is intended to reduce cache misses when fetching and updating
>> indices, and to reduce cache line bouncing when the virtqueue is almost
>> full. copy_to_user() is used in order to benefit from modern CPUs that
>> support fast string copy. Batched virtqueue processing will be the
>> first user.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/vhost/vhost.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> drivers/vhost/vhost.h | 3 +++
>> 2 files changed, 58 insertions(+)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index f87ec75..8424166d 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -2437,6 +2437,61 @@ struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev,
>> }
>> EXPORT_SYMBOL_GPL(vhost_dequeue_msg);
>>
>> +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq,
>> + struct vring_used_elem *heads,
>> + u16 num, bool used_update)
> why do you need to combine used update with prefetch?
For better performance. I believe we don't care about the overhead
when we hit errors in tx.
>
>> +{
>> + int ret, ret2;
>> + u16 last_avail_idx, last_used_idx, total, copied;
>> + __virtio16 avail_idx;
>> + struct vring_used_elem __user *used;
>> + int i;
>> +
>> + if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) {
>> + vq_err(vq, "Failed to access avail idx at %p\n",
>> + &vq->avail->idx);
>> + return -EFAULT;
>> + }
>> + last_avail_idx = vq->last_avail_idx & (vq->num - 1);
>> + vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>> + total = vq->avail_idx - vq->last_avail_idx;
>> + ret = total = min(total, num);
>> +
>> + for (i = 0; i < ret; i++) {
>> + ret2 = vhost_get_avail(vq, heads[i].id,
>> + &vq->avail->ring[last_avail_idx]);
>> + if (unlikely(ret2)) {
>> + vq_err(vq, "Failed to get descriptors\n");
>> + return -EFAULT;
>> + }
>> + last_avail_idx = (last_avail_idx + 1) & (vq->num - 1);
>> + }
>> +
>> + if (!used_update)
>> + return ret;
>> +
>> + last_used_idx = vq->last_used_idx & (vq->num - 1);
>> + while (total) {
>> + copied = min((u16)(vq->num - last_used_idx), total);
>> + ret2 = vhost_copy_to_user(vq,
>> + &vq->used->ring[last_used_idx],
>> + &heads[ret - total],
>> + copied * sizeof(*used));
>> +
>> + if (unlikely(ret2)) {
>> + vq_err(vq, "Failed to update used ring!\n");
>> + return -EFAULT;
>> + }
>> +
>> + last_used_idx = 0;
>> + total -= copied;
>> + }
>> +
>> + /* Only get avail ring entries after they have been exposed by guest. */
>> + smp_rmb();
> Barrier before return is a very confusing API. I guess it's designed to
> be used in a specific way to make it necessary - but what is it?
Looks like a bug; we need to do this right after reading avail_idx.
Thanks
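To illustrate the ordering point: the read barrier has to sit between loading avail->idx and loading the ring entries it exposes, not after them. The sketch below is a hypothetical userspace model using C11 atomics, where an acquire fence after the index load stands in for smp_rmb() and a release store stands in for the guest's smp_wmb() followed by the index update. struct avail_model, fetch_heads(), publish_heads() and RING_NUM are made-up names, and the mapping of kernel barriers to C11 fences is approximate.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_NUM 8 /* hypothetical ring size; must be a power of two */

/* Hypothetical avail ring: the producer fills ring[], then publishes idx. */
struct avail_model {
	_Atomic uint16_t idx;
	uint16_t ring[RING_NUM];
};

/* Reads up to max new heads starting at last_avail. The acquire fence
 * sits between the idx load and the ring loads, the position smp_rmb()
 * needs in vhost: entries are read only after the index exposing them. */
static uint16_t fetch_heads(struct avail_model *a, uint16_t last_avail,
			    uint16_t *heads, uint16_t max)
{
	uint16_t avail_idx = atomic_load_explicit(&a->idx, memory_order_relaxed);
	uint16_t total = (uint16_t)(avail_idx - last_avail);
	uint16_t i;

	if (total > max)
		total = max;

	atomic_thread_fence(memory_order_acquire); /* ~ smp_rmb() */

	for (i = 0; i < total; i++)
		heads[i] = a->ring[(uint16_t)(last_avail + i) & (RING_NUM - 1)];
	return total;
}

/* Producer side: write the entries first, then publish the new index. */
static void publish_heads(struct avail_model *a, uint16_t *idx_shadow,
			  const uint16_t *heads, uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n; i++)
		a->ring[(uint16_t)(*idx_shadow + i) & (RING_NUM - 1)] = heads[i];
	*idx_shadow = (uint16_t)(*idx_shadow + n);
	/* ~ smp_wmb() followed by the idx store */
	atomic_store_explicit(&a->idx, *idx_shadow, memory_order_release);
}
```

With an 8-entry ring, publishing three heads starting at index 0xFFFE and then fetching from 0xFFFE returns all three, including the one stored in slot 0 after the wrap.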
>
>
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(vhost_prefetch_desc_indices);
>>
>> static int __init vhost_init(void)
>> {
>> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
>> index 39ff897..16c2cb6 100644
>> --- a/drivers/vhost/vhost.h
>> +++ b/drivers/vhost/vhost.h
>> @@ -228,6 +228,9 @@ ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct iov_iter *to,
>> ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
>> struct iov_iter *from);
>> int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled);
>> +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq,
>> + struct vring_used_elem *heads,
>> + u16 num, bool used_update);
>>
>> #define vq_err(vq, fmt, ...) do { \
>> pr_debug(pr_fmt(fmt), ##__VA_ARGS__); \
>> --
>> 2.7.4
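The used-ring update loop in vhost_prefetch_desc_indices() splits one batch into at most two contiguous copies when it crosses the end of the ring. Here is a hypothetical userspace sketch of that splitting, with memcpy() standing in for vhost_copy_to_user() and struct used_elem standing in for struct vring_used_elem; ring_copy() is a made-up name, and it assumes num is a power of two and count <= num.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct vring_used_elem. */
struct used_elem {
	uint32_t id;
	uint32_t len;
};

/* Copies count elements into a ring of num entries, starting at the
 * free-running index start, splitting the copy at the ring end.
 * Returns the number of memcpy() calls made (1 or 2 when count <= num). */
static int ring_copy(struct used_elem *ring, uint16_t num, uint16_t start,
		     const struct used_elem *src, uint16_t count)
{
	uint16_t pos = start & (num - 1);
	uint16_t done = 0;
	int calls = 0;

	while (done < count) {
		uint16_t left = (uint16_t)(count - done);
		uint16_t room = (uint16_t)(num - pos);
		uint16_t chunk = left < room ? left : room;

		memcpy(&ring[pos], &src[done], chunk * sizeof(*ring));
		done = (uint16_t)(done + chunk);
		pos = 0; /* a second chunk always starts at the ring head */
		calls++;
	}
	return calls;
}
```

For an 8-entry ring, copying 3 elements starting at index 6 takes two copies (slots 6 and 7, then slot 0), while starting at index 1 takes one.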
* Re: [PATCH net-next RFC 0/5] batched tx processing in vhost_net
From: Jason Wang @ 2017-09-27 0:27 UTC
To: Michael S. Tsirkin; +Cc: virtualization, netdev, linux-kernel, kvm
In-Reply-To: <20170926164055-mutt-send-email-mst@kernel.org>
On 2017-09-26 21:45, Michael S. Tsirkin wrote:
> On Fri, Sep 22, 2017 at 04:02:30PM +0800, Jason Wang wrote:
>> Hi:
>>
>> This series tries to implement basic batched tx processing. This is
>> done by prefetching descriptor indices and updating the used ring in a
>> batch. This is intended to speed up used ring updates and improve
>> cache utilization.
> Interesting, thanks for the patches. So IIUC most of the gain is really
> overcoming some of the shortcomings of virtio 1.0 wrt cache utilization?
Yes.
Actually, it looks like batching in 1.1 is not as easy as in 1.0.
In 1.0, we could do something like:
    batch-update the used ring with copy_to_user()
    smp_wmb()
    update used_idx
In 1.1, we need more memory barriers and may not be able to benefit from
fast copy helpers:
    for () {
        update desc.addr
        smp_wmb()
        update desc.flag
    }
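The difference described above can be modelled in userspace with C11 atomics. This is a rough sketch only: the layouts are simplified (struct split_used, struct packed_desc, DESC_F_DONE, publish_split() and publish_packed() are hypothetical, not the virtio 1.1 packed ring format), and release stores stand in for smp_wmb() followed by a plain store. The split-ring path publishes a whole batch with one bulk copy and one ordered index store, while the per-descriptor path needs one ordered flags store per descriptor.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define VQ_NUM 8 /* hypothetical ring size, power of two */

/* Simplified 1.0-style (split) used ring. */
struct split_used {
	_Atomic uint16_t idx;
	uint32_t ring[VQ_NUM];
};

/* Simplified 1.1-style descriptor: payload plus a flags word that hands
 * the descriptor over. Field names and flag value are hypothetical. */
struct packed_desc {
	uint64_t addr;
	uint32_t len;
	_Atomic uint16_t flags;
};

#define DESC_F_DONE 0x1 /* hypothetical "descriptor completed" flag */

/* 1.0 style: bulk copy (the copy_to_user() analogue), then one ordered
 * index store. Assumes the batch does not wrap. Returns ordered stores. */
static int publish_split(struct split_used *u, uint16_t *shadow,
			 const uint32_t *ids, uint16_t n)
{
	memcpy(&u->ring[*shadow & (VQ_NUM - 1)], ids, n * sizeof(*ids));
	*shadow = (uint16_t)(*shadow + n);
	atomic_store_explicit(&u->idx, *shadow, memory_order_release);
	return 1;
}

/* 1.1 style: every descriptor needs its own ordered flags update, so the
 * number of ordered stores grows with the batch. */
static int publish_packed(struct packed_desc *d, const uint64_t *addrs,
			  const uint32_t *lens, uint16_t n)
{
	uint16_t i;
	int ordered = 0;

	for (i = 0; i < n; i++) {
		d[i].addr = addrs[i];
		d[i].len = lens[i];
		/* ~ smp_wmb() followed by the flags store */
		atomic_store_explicit(&d[i].flags, DESC_F_DONE,
				      memory_order_release);
		ordered++;
	}
	return ordered;
}
```

Publishing a batch of four therefore costs one ordered store in the first model and four in the second.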
>
> Which is fair enough (1.0 is already deployed) but I would like to avoid
> making 1.1 support harder, and this patchset does this unfortunately,
I think the new APIs do not expose more of virtio's internal data structures
than before? (vq->heads has already been used by vhost_net for years.)
Considering that the layout is completely redesigned, I don't see an easy
way to reuse the current 1.0 API for 1.1.
> see comments on individual patches. I'm sure it can be addressed though.
>
>> Tests show about a 22% improvement in tx pps.
> Is this with or without tx napi in guest?
MoonGen is used in the guest to get better numbers.
Thanks
>
>> Please review.
>>
>> Jason Wang (5):
>> vhost: split out ring head fetching logic
>> vhost: introduce helper to prefetch desc index
>> vhost: introduce vhost_add_used_idx()
>> vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH
>> vhost_net: basic tx virtqueue batched processing
>>
>> drivers/vhost/net.c | 221 ++++++++++++++++++++++++++++----------------------
>> drivers/vhost/vhost.c | 165 +++++++++++++++++++++++++++++++------
>> drivers/vhost/vhost.h | 9 ++
>> 3 files changed, 270 insertions(+), 125 deletions(-)
>>
>> --
>> 2.7.4
* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
From: Yuchung Cheng @ 2017-09-27 0:18 UTC
To: Roman Gushchin
Cc: Oleksandr Natalenko, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
linux-kernel@vger.kernel.org
In-Reply-To: <CAK6E8=eBZ6XhRg7ihoQ_2=4bTk1RSdxT2zJ_Z7-4X-HzNeaiQQ@mail.gmail.com>
On Tue, Sep 26, 2017 at 5:12 PM, Yuchung Cheng <ycheng@google.com> wrote:
> On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@fb.com> wrote:
>>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
>>> >
>>> > > Hello.
>>> > >
>>> > > Since v4.11 (IIRC), there has been a regression in the TCP stack that results
>>> > > in the warning shown below. Most of the time it is harmless, but occasionally
>>> > > it causes either a freeze or (I believe this is related too) a panic in
>>> > > tcp_sacktag_walk() (because the sk_buff passed to this function is NULL).
>>> > > Unfortunately, I still do not have a proper stack trace from the panic, but will
>>> > > try to capture one if possible.
>>> > >
>>> > > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
>>> > > used to shape traffic with tc.
>>> > >
>>> > > Please note this regression was already reported as BZ [1] and as a letter to
>>> > > the ML [2], but got neither attention nor resolution. It is reproducible (not
>>> > > only for me) on my home router from v4.11 up to and including v4.13.1.
>>> > >
>>> > > Please advise on how to deal with it. I'll provide any additional info if
>>> > > necessary, also ready to test patches if any.
>>> > >
>>> > > Thanks.
>>> > >
>>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
>>> > > [2] https://www.spinics.net/lists/netdev/msg436158.html
>>> >
>>> > We're experiencing the same problems on some machines in our fleet.
>>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
>>> > sometimes panics in tcp_sacktag_walk().
>>> >
>>> > Here is an example of a backtrace with the panic log:
>>
>> Hi Yuchung!
>>
>>> do you still see the panics if you disable RACK?
>>> sysctl net.ipv4.tcp_recovery=0?
>>
>> No, we haven't seen any crashes since then.
> I am out of ideas as to how RACK could cause tcp_sacktag_walk() to
> receive a NULL skb :-( Do you have a stack trace or any hint about which
> call to tcp_sacktag_walk() triggered the panic? Internally at Google we
> have never seen that.
Hmm, something just struck me: could you try
sysctl net.ipv4.tcp_recovery=1 net.ipv4.tcp_retrans_collapse=0
and see if the kernel still panics in SACK processing?
>
>
>>
>>>
>>> Also, have you experienced any SACK reneging? Could you post the output of
>>> 'nstat | grep -i TCP'? Thanks.
>>
>> [nstat output snipped; identical to the quoted original]
>>
>> Thanks!
* Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c
From: Yuchung Cheng @ 2017-09-27 0:12 UTC
To: Roman Gushchin
Cc: Oleksandr Natalenko, Hideaki YOSHIFUJI, Alexey Kuznetsov, netdev,
linux-kernel@vger.kernel.org
In-Reply-To: <20170926131011.GB26395@castle.DHCP.thefacebook.com>
On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin <guro@fb.com> wrote:
>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin <guro@fb.com> wrote:
>> >
>> > > Hello.
>> > >
>> > > Since v4.11 (IIRC), there has been a regression in the TCP stack that results
>> > > in the warning shown below. Most of the time it is harmless, but occasionally
>> > > it causes either a freeze or (I believe this is related too) a panic in
>> > > tcp_sacktag_walk() (because the sk_buff passed to this function is NULL).
>> > > Unfortunately, I still do not have a proper stack trace from the panic, but will
>> > > try to capture one if possible.
>> > >
>> > > Also, I have custom settings regarding TCP stack, shown below as well. ifb is
>> > > used to shape traffic with tc.
>> > >
>> > > Please note this regression was already reported as BZ [1] and as a letter to
>> > > the ML [2], but got neither attention nor resolution. It is reproducible (not
>> > > only for me) on my home router from v4.11 up to and including v4.13.1.
>> > >
>> > > Please advise on how to deal with it. I'll provide any additional info if
>> > > necessary, also ready to test patches if any.
>> > >
>> > > Thanks.
>> > >
>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
>> > > [2] https://www.spinics.net/lists/netdev/msg436158.html
>> >
>> > We're experiencing the same problems on some machines in our fleet.
>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
>> > sometimes panics in tcp_sacktag_walk().
>> >
>> > Here is an example of a backtrace with the panic log:
>
> Hi Yuchung!
>
>> do you still see the panics if you disable RACK?
>> sysctl net.ipv4.tcp_recovery=0?
>
> No, we haven't seen any crashes since then.
I am out of ideas as to how RACK could cause tcp_sacktag_walk() to
receive a NULL skb :-( Do you have a stack trace or any hint about which
call to tcp_sacktag_walk() triggered the panic? Internally at Google we
have never seen that.
>
>>
>> Also, have you experienced any SACK reneging? Could you post the output of
>> 'nstat | grep -i TCP'? Thanks.
>
> hostname TcpActiveOpens 2289680 0.0
> hostname TcpPassiveOpens 3592758 0.0
> hostname TcpAttemptFails 746910 0.0
> hostname TcpEstabResets 154988 0.0
> hostname TcpInSegs 16258678255 0.0
> hostname TcpOutSegs 46967011611 0.0
> hostname TcpRetransSegs 13724310 0.0
> hostname TcpInErrs 2 0.0
> hostname TcpOutRsts 9418798 0.0
> hostname TcpExtEmbryonicRsts 2303 0.0
> hostname TcpExtPruneCalled 90192 0.0
> hostname TcpExtOfoPruned 57274 0.0
> hostname TcpExtOutOfWindowIcmps 3 0.0
> hostname TcpExtTW 1164705 0.0
> hostname TcpExtTWRecycled 2 0.0
> hostname TcpExtPAWSEstab 159 0.0
> hostname TcpExtDelayedACKs 209207209 0.0
> hostname TcpExtDelayedACKLocked 508571 0.0
> hostname TcpExtDelayedACKLost 1713248 0.0
> hostname TcpExtListenOverflows 625 0.0
> hostname TcpExtListenDrops 625 0.0
> hostname TcpExtTCPHPHits 9341188489 0.0
> hostname TcpExtTCPPureAcks 1434646465 0.0
> hostname TcpExtTCPHPAcks 5733614672 0.0
> hostname TcpExtTCPSackRecovery 3261698 0.0
> hostname TcpExtTCPSACKReneging 12203 0.0
> hostname TcpExtTCPSACKReorder 433189 0.0
> hostname TcpExtTCPTSReorder 22694 0.0
> hostname TcpExtTCPFullUndo 45092 0.0
> hostname TcpExtTCPPartialUndo 22016 0.0
> hostname TcpExtTCPLossUndo 2150040 0.0
> hostname TcpExtTCPLostRetransmit 60119 0.0
> hostname TcpExtTCPSackFailures 2626782 0.0
> hostname TcpExtTCPLossFailures 182999 0.0
> hostname TcpExtTCPFastRetrans 4334275 0.0
> hostname TcpExtTCPSlowStartRetrans 3453348 0.0
> hostname TcpExtTCPTimeouts 1070997 0.0
> hostname TcpExtTCPLossProbes 2633545 0.0
> hostname TcpExtTCPLossProbeRecovery 941647 0.0
> hostname TcpExtTCPSackRecoveryFail 336302 0.0
> hostname TcpExtTCPRcvCollapsed 461354 0.0
> hostname TcpExtTCPAbortOnData 349196 0.0
> hostname TcpExtTCPAbortOnClose 3395 0.0
> hostname TcpExtTCPAbortOnTimeout 51201 0.0
> hostname TcpExtTCPMemoryPressures 2 0.0
> hostname TcpExtTCPSpuriousRTOs 2120503 0.0
> hostname TcpExtTCPSackShifted 2613736 0.0
> hostname TcpExtTCPSackMerged 21358743 0.0
> hostname TcpExtTCPSackShiftFallback 8769387 0.0
> hostname TcpExtTCPBacklogDrop 5 0.0
> hostname TcpExtTCPRetransFail 843 0.0
> hostname TcpExtTCPRcvCoalesce 949068035 0.0
> hostname TcpExtTCPOFOQueue 470118 0.0
> hostname TcpExtTCPOFODrop 9915 0.0
> hostname TcpExtTCPOFOMerge 9 0.0
> hostname TcpExtTCPChallengeACK 90 0.0
> hostname TcpExtTCPSYNChallenge 3 0.0
> hostname TcpExtTCPFastOpenActive 2089 0.0
> hostname TcpExtTCPSpuriousRtxHostQueues 896596 0.0
> hostname TcpExtTCPAutoCorking 547386735 0.0
> hostname TcpExtTCPFromZeroWindowAdv 28757 0.0
> hostname TcpExtTCPToZeroWindowAdv 28761 0.0
> hostname TcpExtTCPWantZeroWindowAdv 322431 0.0
> hostname TcpExtTCPSynRetrans 3026 0.0
> hostname TcpExtTCPOrigDataSent 40976870977 0.0
> hostname TcpExtTCPHystartTrainDetect 453920 0.0
> hostname TcpExtTCPHystartTrainCwnd 11586273 0.0
> hostname TcpExtTCPHystartDelayDetect 10943 0.0
> hostname TcpExtTCPHystartDelayCwnd 763554 0.0
> hostname TcpExtTCPACKSkippedPAWS 30 0.0
> hostname TcpExtTCPACKSkippedSeq 218 0.0
> hostname TcpExtTCPWinProbe 2408 0.0
> hostname TcpExtTCPKeepAlive 213768 0.0
> hostname TcpExtTCPMTUPFail 69 0.0
> hostname TcpExtTCPMTUPSuccess 8811 0.0
>
> Thanks!
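When comparing counters such as TcpExtTCPSACKReneging across reports, it can help to pull individual values out of the nstat text programmatically. A minimal sketch, assuming the "<host> <Name> <value> <rate>" line layout shown above (plain nstat output has no host column, so the format string would need adjusting); nstat_lookup() is a hypothetical helper, not part of iproute2.

```c
#include <stdio.h>
#include <string.h>

/* Looks up one counter in nstat-style text with lines of the form
 * "<host> <Name> <value> <rate>". Each line is parsed on its own so a
 * short line cannot spill into the next one. Returns 0 and stores the
 * value on success, -1 if the counter is not found. */
static int nstat_lookup(const char *text, const char *name,
			unsigned long long *value)
{
	const char *line = text;

	while (*line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);
		char buf[256], host[64], key[64];
		unsigned long long v;

		if (len < sizeof(buf)) {
			memcpy(buf, line, len);
			buf[len] = '\0';
			if (sscanf(buf, "%63s %63s %llu", host, key, &v) == 3 &&
			    strcmp(key, name) == 0) {
				*value = v;
				return 0;
			}
		}
		if (!end)
			break;
		line = end + 1;
	}
	return -1;
}
```

With the counters above, TcpRetransSegs / TcpOutSegs = 13724310 / 46967011611, or roughly 0.029% of segments retransmitted.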
* [PATCH net 8/9] net/8390: Fix redundant code
From: Finn Thain @ 2017-09-27 0:07 UTC
To: David S. Miller; +Cc: netdev, linux-kernel, linux-arm-kernel, Russell King
In-Reply-To: <cover.1506470623.git.fthain@telegraphics.com.au>
The patch which introduced the 8390 core module parameter 'msg_enable'
failed to do anything useful with it: it merely causes an ancient
version string to be logged.
Remove the other code that logs the same string. Use the msg_enable
module parameter as the default value for ei_local->msg_enable.
Otherwise, some 8390 modules have no way to set ei_local->msg_enable.
Also fix two more issues arising from the same patch: indentation
mistakes and pointless static variables.
Fixes: c45f812f0280 ("8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature")
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
drivers/net/ethernet/8390/ax88796.c | 3 ---
drivers/net/ethernet/8390/axnet_cs.c | 2 --
drivers/net/ethernet/8390/etherh.c | 17 -----------------
drivers/net/ethernet/8390/hydra.c | 4 ----
drivers/net/ethernet/8390/lib8390.c | 2 ++
drivers/net/ethernet/8390/mac8390.c | 7 -------
drivers/net/ethernet/8390/mcf8390.c | 4 ----
drivers/net/ethernet/8390/pcnet_cs.c | 4 ----
drivers/net/ethernet/8390/zorro8390.c | 5 -----
9 files changed, 2 insertions(+), 46 deletions(-)
diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index 05d9d3e2e92e..28aa79d2f16c 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -77,8 +77,6 @@ static unsigned char version[] = "ax88796.c: Copyright 2005,2007 Simtec Electron
#define AX_GPOC_PPDSET BIT(6)
-static u32 ax_msg_enable;
-
/* device private data */
struct ax_device {
@@ -747,7 +745,6 @@ static int ax_init_dev(struct net_device *dev)
ei_local->block_output = &ax_block_output;
ei_local->get_8390_hdr = &ax_get_8390_hdr;
ei_local->priv = 0;
- ei_local->msg_enable = ax_msg_enable;
dev->netdev_ops = &ax_netdev_ops;
dev->ethtool_ops = &ax_ethtool_ops;
diff --git a/drivers/net/ethernet/8390/axnet_cs.c b/drivers/net/ethernet/8390/axnet_cs.c
index 3da1fc539ef9..91e76dc1e6e1 100644
--- a/drivers/net/ethernet/8390/axnet_cs.c
+++ b/drivers/net/ethernet/8390/axnet_cs.c
@@ -104,7 +104,6 @@ static void AX88190_init(struct net_device *dev, int startp);
static int ax_open(struct net_device *dev);
static int ax_close(struct net_device *dev);
static irqreturn_t ax_interrupt(int irq, void *dev_id);
-static u32 axnet_msg_enable;
/*====================================================================*/
@@ -151,7 +150,6 @@ static int axnet_probe(struct pcmcia_device *link)
return -ENOMEM;
ei_local = netdev_priv(dev);
- ei_local->msg_enable = axnet_msg_enable;
spin_lock_init(&ei_local->page_lock);
info = PRIV(dev);
diff --git a/drivers/net/ethernet/8390/etherh.c b/drivers/net/ethernet/8390/etherh.c
index 11cbf22ad201..32e9627e3880 100644
--- a/drivers/net/ethernet/8390/etherh.c
+++ b/drivers/net/ethernet/8390/etherh.c
@@ -64,8 +64,6 @@ static char version[] =
#include "lib8390.c"
-static u32 etherh_msg_enable;
-
struct etherh_priv {
void __iomem *ioc_fast;
void __iomem *memc;
@@ -502,18 +500,6 @@ etherh_close(struct net_device *dev)
}
/*
- * Initialisation
- */
-
-static void __init etherh_banner(void)
-{
- static int version_printed;
-
- if ((etherh_msg_enable & NETIF_MSG_DRV) && (version_printed++ == 0))
- pr_info("%s", version);
-}
-
-/*
* Read the ethernet address string from the on board rom.
* This is an ascii string...
*/
@@ -671,8 +657,6 @@ etherh_probe(struct expansion_card *ec, const struct ecard_id *id)
struct etherh_priv *eh;
int ret;
- etherh_banner();
-
ret = ecard_request_resources(ec);
if (ret)
goto out;
@@ -757,7 +741,6 @@ etherh_probe(struct expansion_card *ec, const struct ecard_id *id)
ei_local->block_output = etherh_block_output;
ei_local->get_8390_hdr = etherh_get_header;
ei_local->interface_num = 0;
- ei_local->msg_enable = etherh_msg_enable;
etherh_reset(dev);
__NS8390_init(dev, 0);
diff --git a/drivers/net/ethernet/8390/hydra.c b/drivers/net/ethernet/8390/hydra.c
index 8ae249195301..941754ea78ec 100644
--- a/drivers/net/ethernet/8390/hydra.c
+++ b/drivers/net/ethernet/8390/hydra.c
@@ -66,7 +66,6 @@ static void hydra_block_input(struct net_device *dev, int count,
static void hydra_block_output(struct net_device *dev, int count,
const unsigned char *buf, int start_page);
static void hydra_remove_one(struct zorro_dev *z);
-static u32 hydra_msg_enable;
static struct zorro_device_id hydra_zorro_tbl[] = {
{ ZORRO_PROD_HYDRA_SYSTEMS_AMIGANET },
@@ -119,7 +118,6 @@ static int hydra_init(struct zorro_dev *z)
int start_page, stop_page;
int j;
int err;
- struct ei_device *ei_local;
static u32 hydra_offsets[16] = {
0x00, 0x02, 0x04, 0x06, 0x08, 0x0a, 0x0c, 0x0e,
@@ -138,8 +136,6 @@ static int hydra_init(struct zorro_dev *z)
start_page = NESM_START_PG;
stop_page = NESM_STOP_PG;
- ei_local = netdev_priv(dev);
- ei_local->msg_enable = hydra_msg_enable;
dev->base_addr = ioaddr;
dev->irq = IRQ_AMIGA_PORTS;
diff --git a/drivers/net/ethernet/8390/lib8390.c b/drivers/net/ethernet/8390/lib8390.c
index 60f8e2c8e726..5d9bbde9fe68 100644
--- a/drivers/net/ethernet/8390/lib8390.c
+++ b/drivers/net/ethernet/8390/lib8390.c
@@ -975,6 +975,8 @@ static void ethdev_setup(struct net_device *dev)
ether_setup(dev);
spin_lock_init(&ei_local->page_lock);
+
+ ei_local->msg_enable = msg_enable;
}
/**
diff --git a/drivers/net/ethernet/8390/mac8390.c b/drivers/net/ethernet/8390/mac8390.c
index 9497f18eaba0..1bfc66f37971 100644
--- a/drivers/net/ethernet/8390/mac8390.c
+++ b/drivers/net/ethernet/8390/mac8390.c
@@ -167,7 +167,6 @@ static void slow_sane_block_output(struct net_device *dev, int count,
const unsigned char *buf, int start_page);
static void word_memcpy_tocard(unsigned long tp, const void *fp, int count);
static void word_memcpy_fromcard(void *tp, unsigned long fp, int count);
-static u32 mac8390_msg_enable;
static enum mac8390_type __init mac8390_ident(struct nubus_dev *dev)
{
@@ -297,8 +296,6 @@ static bool __init mac8390_init(struct net_device *dev, struct nubus_dev *ndev,
int offset;
volatile unsigned short *i;
- printk_once(KERN_INFO pr_fmt("%s"), version);
-
dev->irq = SLOT2IRQ(ndev->board->slot);
/* This is getting to be a habit */
dev->base_addr = (ndev->board->slot_addr |
@@ -396,7 +393,6 @@ struct net_device * __init mac8390_probe(int unit)
struct net_device *dev;
struct nubus_dev *ndev = NULL;
int err = -ENODEV;
- struct ei_device *ei_local;
static unsigned int slots;
@@ -436,9 +432,6 @@ struct net_device * __init mac8390_probe(int unit)
if (!ndev)
goto out;
- ei_local = netdev_priv(dev);
- ei_local->msg_enable = mac8390_msg_enable;
-
err = register_netdev(dev);
if (err)
goto out;
diff --git a/drivers/net/ethernet/8390/mcf8390.c b/drivers/net/ethernet/8390/mcf8390.c
index 4bb967bc879e..4ad8031ab669 100644
--- a/drivers/net/ethernet/8390/mcf8390.c
+++ b/drivers/net/ethernet/8390/mcf8390.c
@@ -38,7 +38,6 @@ static const char version[] =
#define NESM_START_PG 0x40 /* First page of TX buffer */
#define NESM_STOP_PG 0x80 /* Last page +1 of RX ring */
-static u32 mcf8390_msg_enable;
#ifdef NE2000_ODDOFFSET
/*
@@ -407,7 +406,6 @@ static int mcf8390_init(struct net_device *dev)
static int mcf8390_probe(struct platform_device *pdev)
{
struct net_device *dev;
- struct ei_device *ei_local;
struct resource *mem, *irq;
resource_size_t msize;
int ret;
@@ -435,8 +433,6 @@ static int mcf8390_probe(struct platform_device *pdev)
SET_NETDEV_DEV(dev, &pdev->dev);
platform_set_drvdata(pdev, dev);
- ei_local = netdev_priv(dev);
- ei_local->msg_enable = mcf8390_msg_enable;
dev->irq = irq->start;
dev->base_addr = mem->start;
diff --git a/drivers/net/ethernet/8390/pcnet_cs.c b/drivers/net/ethernet/8390/pcnet_cs.c
index bd0a2a14b649..a81ffe4874e1 100644
--- a/drivers/net/ethernet/8390/pcnet_cs.c
+++ b/drivers/net/ethernet/8390/pcnet_cs.c
@@ -66,7 +66,6 @@
#define PCNET_RDC_TIMEOUT (2*HZ/100) /* Max wait in jiffies for Tx RDC */
static const char *if_names[] = { "auto", "10baseT", "10base2"};
-static u32 pcnet_msg_enable;
/*====================================================================*/
@@ -556,7 +555,6 @@ static int pcnet_config(struct pcmcia_device *link)
int start_pg, stop_pg, cm_offset;
int has_shmem = 0;
struct hw_info *local_hw_info;
- struct ei_device *ei_local;
dev_dbg(&link->dev, "pcnet_config\n");
@@ -606,8 +604,6 @@ static int pcnet_config(struct pcmcia_device *link)
mii_phy_probe(dev);
SET_NETDEV_DEV(dev, &link->dev);
- ei_local = netdev_priv(dev);
- ei_local->msg_enable = pcnet_msg_enable;
if (register_netdev(dev) != 0) {
pr_notice("register_netdev() failed\n");
diff --git a/drivers/net/ethernet/8390/zorro8390.c b/drivers/net/ethernet/8390/zorro8390.c
index 6d93956b293b..35a500a21521 100644
--- a/drivers/net/ethernet/8390/zorro8390.c
+++ b/drivers/net/ethernet/8390/zorro8390.c
@@ -44,8 +44,6 @@
static const char version[] =
"8390.c:v1.10cvs 9/23/94 Donald Becker (becker@cesdis.gsfc.nasa.gov)\n";
-static u32 zorro8390_msg_enable;
-
#include "lib8390.c"
#define DRV_NAME "zorro8390"
@@ -296,7 +294,6 @@ static int zorro8390_init(struct net_device *dev, unsigned long board,
int err;
unsigned char SA_prom[32];
int start_page, stop_page;
- struct ei_device *ei_local = netdev_priv(dev);
static u32 zorro8390_offsets[16] = {
0x00, 0x02, 0x04, 0x06, 0x08, 0x0a, 0x0c, 0x0e,
0x10, 0x12, 0x14, 0x16, 0x18, 0x1a, 0x1c, 0x1e,
@@ -388,8 +385,6 @@ static int zorro8390_init(struct net_device *dev, unsigned long board,
dev->netdev_ops = &zorro8390_netdev_ops;
__NS8390_init(dev, 0);
- ei_local->msg_enable = zorro8390_msg_enable;
-
err = register_netdev(dev);
if (err) {
free_irq(IRQ_AMIGA_PORTS, dev);
--
2.13.5
^ permalink raw reply related
* Re: [PATCH net-next 0/5] net: dsa: use generic slave phydev
From: Florian Fainelli @ 2017-09-26 23:55 UTC (permalink / raw)
To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170926211535.21273-1-vivien.didelot@savoirfairelinux.com>
On 09/26/2017 02:15 PM, Vivien Didelot wrote:
> DSA currently stores a phy_device pointer in each slave private
> structure. This requires to implement our own ethtool ksettings
> accessors and such.
>
> This patchset removes the private phy_device in favor of the one
> provided in the net_device structure, and thus allows us to use the
> generic phy_ethtool_* functions.
For this series:
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested on bcm_sf2 (7445 and 7278) along with the externally attached
BCM53125 switch, which needs the special MDIO read/write divert. We
properly attach to the right PHY devices in all cases.
Also tested unbind/bind, working correctly.
Thanks!
>
> Vivien Didelot (5):
> net: dsa: return -ENODEV is there is no slave PHY
> net: dsa: use slave device phydev
> net: dsa: use phy_ethtool_get_link_ksettings
> net: dsa: use phy_ethtool_set_link_ksettings
> net: dsa: use phy_ethtool_nway_reset
>
> net/dsa/dsa_priv.h | 1 -
> net/dsa/slave.c | 143 +++++++++++++++++++----------------------------------
> 2 files changed, 52 insertions(+), 92 deletions(-)
>
--
Florian
^ permalink raw reply
* Re: [PATCH net-next 2/5] net: dsa: use slave device phydev
From: Florian Fainelli @ 2017-09-26 23:54 UTC (permalink / raw)
To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170926211535.21273-3-vivien.didelot@savoirfairelinux.com>
On 09/26/2017 02:15 PM, Vivien Didelot wrote:
> There is no need to store a phy_device in dsa_slave_priv since
> net_device already provides one. Simply s/p->phy/dev->phydev/.
>
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [PATCH iproute2] Add information about COLORFGBG to ip.8 man page
From: Roland Hopferwieser @ 2017-09-26 23:46 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170920175633.55ee67c9@xeon-e3>
[-- Attachment #1: Type: text/plain, Size: 103 bytes --]
> Your patch was damaged by the mailer you used.
> Please fix and resubmit.
Sorry, now as attachment.
[-- Attachment #2: iproute2-Add-information-about-COLORFGBG-to-ip.8-man-page.patch --]
[-- Type: text/x-patch, Size: 487 bytes --]
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index ae018fdf..2a27a56e 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -187,7 +187,8 @@ executes specified command over all objects, it depends if command supports this
.TP
.BR "\-c" , " -color"
-Use color output.
+Use color output. The color palette is affected by the COLORFGBG environment variable, which typically has the form "fg;bg".
+If "bg" is set to 0-6 or 8, the dark color palette is used.
.TP
.BR "\-t" , " \-timestamp"
^ permalink raw reply related
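A minimal sketch of the rule described in the new man page text. It assumes that the last `;`-separated field of COLORFGBG is the background color and that only a single digit 0-6 or 8 counts as dark. The function name colorfgbg_is_dark() is hypothetical; this is not iproute2's own implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when the last ';'-separated field of a COLORFGBG-style
 * value is a single digit 0-6 or 8, i.e. a dark background. */
static bool colorfgbg_is_dark(const char *val)
{
	const char *p;

	assert(val != NULL); /* callers check getenv() for NULL first */
	p = strrchr(val, ';');
	if (!p)
		return false;
	/* p[2] is only read once p[1] is known to be a digit, so a value
	 * ending in ';' never reads past the terminator. */
	return ((p[1] >= '0' && p[1] <= '6') || p[1] == '8') && p[2] == '\0';
}
```

Taking the last field (rather than the second) also copes with three-field values such as "15;default;0".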
* [PATCH iproute2 v3] lib: json_print: rework 'new_json_obj' drop FILE* argument
From: Julien Fortin @ 2017-09-26 23:45 UTC (permalink / raw)
To: netdev; +Cc: roopa, nikolay, dsa, Julien Fortin
From: Julien Fortin <julien@cumulusnetworks.com>
As Stephen Hemminger mentioned on the last submission, the new_json_obj
function is always called with fp == stdout, so there is currently no
need for this extra argument.
The background for the rework is the following:
ip monitor did not call `new_json_obj` (not even in non-JSON context),
so the static FILE *_fp variable was never initialized, causing a
SIGSEGV in ipaddress.c. This patch should fix the issue for good: new
code paths won't have to call `new_json_obj`.
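A minimal sketch of why printing straight to stdout removes the crash path: with no FILE * to initialize, a caller that never creates a JSON object can still print. The names below (print_int_sketch(), the void * writer) are hypothetical stand-ins that only mirror the _IS_FP_CONTEXT() logic in lib/json_print.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Mirrors enum output_type from include/json_print.h. */
enum output_type { PRINT_FP = 1, PRINT_JSON = 2, PRINT_ANY = 4 };

static void *jw; /* stands in for json_writer_t *_jw; NULL = text mode */

static bool is_fp_context(enum output_type t)
{
	return !jw && ((t & PRINT_FP) || (t & PRINT_ANY));
}

/* Returns the number of characters printed, or 0 if nothing was
 * printed. Output always goes to stdout, so no setup call is needed. */
static int print_int_sketch(enum output_type t, const char *fmt, int value)
{
	assert(fmt != NULL);
	if (is_fp_context(t))
		return printf(fmt, value);
	return 0;
}
```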
How to reproduce:
$ ip -t mon label link
(gdb) bt
.#0 _IO_vfprintf_internal (s=s@entry=0x0, format=format@entry=0x45460d "%d: ", ap=ap@entry=0x7fffffff7f18) at vfprintf.c:1278
.#1 0x0000000000451310 in color_fprintf (fp=0x0, attr=<optimized out>, fmt=0x45460d "%d: ") at color.c:108
.#2 0x000000000044a856 in print_color_int (t=t@entry=PRINT_ANY, color=color@entry=4294967295, key=key@entry=0x4545fc "ifindex",
fmt=fmt@entry=0x45460d "%d: ", value=<optimized out>) at ip_print.c:132
.#3 0x000000000040ccd2 in print_int (value=<optimized out>, fmt=0x45460d "%d: ", key=0x4545fc "ifindex", t=PRINT_ANY) at ip_common.h:189
.#4 print_linkinfo (who=<optimized out>, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipaddress.c:1107
.#5 0x0000000000422e13 in accept_msg (who=0x7fffffff8320, ctrl=0x7fffffff8310, n=0x7fffffffa380, arg=0x7ffff77a82a0 <_IO_2_1_stdout_>) at ipmonitor.c:89
.#6 0x000000000044c58f in rtnl_listen (rtnl=0x672160 <rth>, handler=handler@entry=0x422c70 <accept_msg>, jarg=0x7ffff77a82a0 <_IO_2_1_stdout_>)
at libnetlink.c:761
.#7 0x00000000004233db in do_ipmonitor (argc=<optimized out>, argv=0x7fffffffe5a0) at ipmonitor.c:310
.#8 0x0000000000408f74 in do_cmd (argv0=0x7fffffffe7f5 "mon", argc=3, argv=0x7fffffffe588) at ip.c:116
.#9 0x0000000000408a94 in main (argc=4, argv=0x7fffffffe580) at ip.c:311
Fixes: 6377572f ("ip: ip_print: add new API to print JSON or regular format output")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
include/json_print.h | 4 +---
ip/ipaddress.c | 4 ++--
lib/json_print.c | 31 ++++++++++---------------------
3 files changed, 13 insertions(+), 26 deletions(-)
diff --git a/include/json_print.h b/include/json_print.h
index 44cf5ac5..b6ce1f9f 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -29,13 +29,11 @@ enum output_type {
PRINT_ANY = 4,
};
-void new_json_obj(int json, FILE *fp);
+void new_json_obj(int json);
void delete_json_obj(void);
bool is_json_context(void);
-void set_current_fp(FILE *fp);
-
void fflush_fp(void);
void open_json_object(const char *str);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index b8bc387a..9e9a7e0a 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1815,7 +1815,7 @@ static int ipaddr_showdump(void)
if (ipadd_dump_check_magic())
exit(-1);
- new_json_obj(json, stdout);
+ new_json_obj(json);
open_json_object(NULL);
open_json_array(PRINT_JSON, "addr_info");
@@ -2176,7 +2176,7 @@ static int ipaddr_list_flush_or_save(int argc, char **argv, int action)
* Initialize a json_writer and open an array object
* if -json was specified.
*/
- new_json_obj(json, stdout);
+ new_json_obj(json);
/*
* If only filter_dev present and none of the other
diff --git a/lib/json_print.c b/lib/json_print.c
index 93b4119d..aa527af6 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -16,15 +16,14 @@
#include "json_print.h"
static json_writer_t *_jw;
-static FILE *_fp;
#define _IS_JSON_CONTEXT(type) ((type & PRINT_JSON || type & PRINT_ANY) && _jw)
#define _IS_FP_CONTEXT(type) (!_jw && (type & PRINT_FP || type & PRINT_ANY))
-void new_json_obj(int json, FILE *fp)
+void new_json_obj(int json)
{
if (json) {
- _jw = jsonw_new(fp);
+ _jw = jsonw_new(stdout);
if (!_jw) {
perror("json object");
exit(1);
@@ -32,7 +31,6 @@ void new_json_obj(int json, FILE *fp)
jsonw_pretty(_jw, true);
jsonw_start_array(_jw);
}
- set_current_fp(fp);
}
void delete_json_obj(void)
@@ -48,15 +46,6 @@ bool is_json_context(void)
return _jw != NULL;
}
-void set_current_fp(FILE *fp)
-{
- if (!fp) {
- fprintf(stderr, "Error: invalid file pointer.\n");
- exit(1);
- }
- _fp = fp;
-}
-
json_writer_t *get_json_writer(void)
{
return _jw;
@@ -89,7 +78,7 @@ void open_json_array(enum output_type type, const char *str)
jsonw_name(_jw, str);
jsonw_start_array(_jw);
} else if (_IS_FP_CONTEXT(type)) {
- fprintf(_fp, "%s", str);
+ printf("%s", str);
}
}
@@ -103,7 +92,7 @@ void close_json_array(enum output_type type, const char *str)
jsonw_end_array(_jw);
jsonw_pretty(_jw, true);
} else if (_IS_FP_CONTEXT(type)) {
- fprintf(_fp, "%s", str);
+ printf("%s", str);
}
}
@@ -124,7 +113,7 @@ void close_json_array(enum output_type type, const char *str)
else \
jsonw_##type_name##_field(_jw, key, value); \
} else if (_IS_FP_CONTEXT(t)) { \
- color_fprintf(_fp, color, fmt, value); \
+ color_fprintf(stdout, color, fmt, value); \
} \
}
_PRINT_FUNC(int, int);
@@ -147,7 +136,7 @@ void print_color_string(enum output_type type,
else
jsonw_string_field(_jw, key, value);
} else if (_IS_FP_CONTEXT(type)) {
- color_fprintf(_fp, color, fmt, value);
+ color_fprintf(stdout, color, fmt, value);
}
}
@@ -168,7 +157,7 @@ void print_color_bool(enum output_type type,
else
jsonw_bool(_jw, value);
} else if (_IS_FP_CONTEXT(type)) {
- color_fprintf(_fp, color, fmt, value ? "true" : "false");
+ color_fprintf(stdout, color, fmt, value ? "true" : "false");
}
}
@@ -187,7 +176,7 @@ void print_color_0xhex(enum output_type type,
snprintf(b1, sizeof(b1), "%#x", hex);
print_string(PRINT_JSON, key, NULL, b1);
} else if (_IS_FP_CONTEXT(type)) {
- color_fprintf(_fp, color, fmt, hex);
+ color_fprintf(stdout, color, fmt, hex);
}
}
@@ -206,7 +195,7 @@ void print_color_hex(enum output_type type,
else
jsonw_string(_jw, b1);
} else if (_IS_FP_CONTEXT(type)) {
- color_fprintf(_fp, color, fmt, hex);
+ color_fprintf(stdout, color, fmt, hex);
}
}
@@ -226,6 +215,6 @@ void print_color_null(enum output_type type,
else
jsonw_null(_jw);
} else if (_IS_FP_CONTEXT(type)) {
- color_fprintf(_fp, color, fmt, value);
+ color_fprintf(stdout, color, fmt, value);
}
}
--
2.14.1
^ permalink raw reply related
* [iproute2 net-next 3/3] man: Add initial manpage for tc-cbs(8)
From: Vinicius Costa Gomes @ 2017-09-26 23:39 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
richardcochran, henrik
In-Reply-To: <20170926233958.12027-1-vinicius.gomes@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
man/man8/tc-cbs.8 | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 100 insertions(+)
create mode 100644 man/man8/tc-cbs.8
diff --git a/man/man8/tc-cbs.8 b/man/man8/tc-cbs.8
new file mode 100644
index 00000000..e84c5495
--- /dev/null
+++ b/man/man8/tc-cbs.8
@@ -0,0 +1,100 @@
+.TH CBS 8 "18 Sept 2017" "iproute2" "Linux"
+.SH NAME
+CBS \- Credit Based Shaper (CBS) Qdisc
+.SH SYNOPSIS
+.B tc qdisc ... dev
+dev
+.B parent
+classid
+.B [ handle
+major:
+.B ] cbs idleslope
+idleslope
+.B sendslope
+sendslope
+.B hicredit
+hicredit
+.B locredit
+locredit
+
+.SH DESCRIPTION
+The CBS (Credit Based Shaper) qdisc implements the shaping algorithm
+defined in IEEE 802.1Q-2014 Section 8.6.8.2, which applies a
+well-defined rate limiting method to the traffic.
+
+This queueing discipline is intended to be used by TSN (Time Sensitive
+Networking) applications. The CBS parameters are derived directly from
+what is described in Annex L of the IEEE 802.1Q-2014
+Specification. The algorithm and how it affects the latency are
+detailed there.
+
+CBS is meant to be installed under another qdisc that maps packet
+flows to traffic classes; one example is
+.BR mqprio(8).
+
+.SH PARAMETERS
+.TP
+idleslope
+Idleslope is the rate of credits that is accumulated (in kilobits per
+second) when there is at least one packet waiting for transmission.
+Packets are transmitted when the current value of credits is equal to or
+greater than zero. When there is no packet to be transmitted the
+amount of credits is set to zero. This is the main tunable of the CBS
+algorithm.
+.TP
+sendslope
+Sendslope is the rate of credits that is depleted (it should be a
+negative number of kilobits per second) when a transmission is
+occurring. It can be calculated as follows (IEEE 802.1Q-2014 Section
+8.6.8.2 item g):
+
+sendslope = idleslope - port_transmit_rate
+
+.TP
+hicredit
+Hicredit defines the maximum amount of credits (in bytes) that can be
+accumulated. Hicredit depends on the characteristics of interfering
+traffic; 'max_interference_size' is the maximum size of any burst of
+traffic that can delay the transmission of a frame that is available
+for transmission for this traffic class (IEEE 802.1Q-2014 Annex L,
+Equation L-3):
+
+hicredit = max_interference_size * (idleslope / port_transmit_rate)
+
+.TP
+locredit
+Locredit is the minimum amount of credits that can be reached. It is a
+function of the traffic flowing through this qdisc (IEEE 802.1Q-2014
+Annex L, Equation L-2):
+
+locredit = max_frame_size * (sendslope / port_transmit_rate)
+
+.SH EXAMPLES
+
+CBS is used to enforce Quality of Service by limiting the data rate
+of a traffic class. To separate packets into traffic classes, the user
+may choose
+.BR mqprio(8),
+and configure it like this:
+
+.EX
+# tc qdisc add dev eth0 handle 100: parent root mqprio num_tc 3 \\
+ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
+ queues 1@0 1@1 2@2 \\
+ hw 0
+.EE
+.P
+To replace the queueing discipline attached to traffic class number 0
+with CBS, issue:
+.P
+.EX
+# tc qdisc replace dev eth0 parent 100:4 cbs \\
+ locredit -1470 hicredit 30 sendslope -980000 idleslope 20000
+.EE
+
+These values are obtained from the following parameters: idleslope is
+20 Mbit/s, the transmission rate is 1 Gbit/s and the maximum interfering
+frame size is 1500 bytes.
+
+.SH AUTHORS
+Vinicius Costa Gomes <vinicius.gomes@intel.com>
--
2.14.2
^ permalink raw reply related
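The example values in tc-cbs(8) can be reproduced from the formulas in its PARAMETERS section. A minimal sketch, assuming integer arithmetic with rates in kbit/s and sizes in bytes; cbs_derive() and struct cbs_params_sketch are hypothetical helpers, not part of tc:

```c
#include <assert.h>
#include <stdint.h>

/* Rates in kbit/s, credits in bytes, as in tc-cbs(8). */
struct cbs_params_sketch {
	int32_t idleslope, sendslope, hicredit, locredit;
};

static struct cbs_params_sketch cbs_derive(int64_t idleslope, int64_t port_rate,
					   int64_t max_interference_size,
					   int64_t max_frame_size)
{
	struct cbs_params_sketch p;

	assert(port_rate > 0);
	p.idleslope = (int32_t)idleslope;
	/* IEEE 802.1Q-2014 Section 8.6.8.2 item g */
	p.sendslope = (int32_t)(idleslope - port_rate);
	/* Annex L, Equation L-3 */
	p.hicredit = (int32_t)(max_interference_size * idleslope / port_rate);
	/* Annex L, Equation L-2; 64-bit math avoids overflow in the product */
	p.locredit = (int32_t)(max_frame_size * p.sendslope / port_rate);
	return p;
}
```

With idleslope 20000, a 1 Gbit/s port (1000000 kbit/s) and 1500-byte frames, this yields sendslope -980000, hicredit 30 and locredit -1470, matching the tc command in the EXAMPLES section. C integer division truncates toward zero, so values that do not divide evenly would need an explicit rounding policy.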
* [iproute2 net-next 2/3] tc: Add support for the CBS qdisc
From: Vinicius Costa Gomes @ 2017-09-26 23:39 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
richardcochran, henrik
In-Reply-To: <20170926233958.12027-1-vinicius.gomes@intel.com>
The Credit Based Shaper (CBS) queueing discipline allows bandwidth
reservation with sub-millisecond precision. It is defined by the
802.1Q-2014 specification (section 8.6.8.2 and Annex L).
The syntax is:
tc qdisc add dev DEV parent NODE cbs locredit <LOCREDIT>
hicredit <HICREDIT> sendslope <SENDSLOPE>
idleslope <IDLESLOPE>
(The order is not important)
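Order-independent parsing with duplicate rejection, as cbs_parse_opt() does below, can be sketched in plain C. This is a minimal sketch under stated assumptions: parse_cbs_args() and parse_s32_sketch() are hypothetical, exact strcmp() matching stands in for tc's prefix-matching matches(), and explicit "seen" flags replace the patch's nonzero-value check (which would not catch a repeated key whose first value was 0):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct cbs_args_sketch {
	int32_t hicredit, locredit, sendslope, idleslope;
};

/* Strict signed 32-bit parse: the whole string must be a number. */
static int parse_s32_sketch(const char *s, int32_t *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 0);
	if (errno || end == s || *end != '\0' || v < INT32_MIN || v > INT32_MAX)
		return -1;
	*out = (int32_t)v;
	return 0;
}

/* Accepts "key value" pairs in any order. Returns 0 on success and -1 on
 * an unknown key, a missing or malformed value, or a repeated key. */
static int parse_cbs_args(int argc, char **argv, struct cbs_args_sketch *a)
{
	static const char *const keys[] = {
		"hicredit", "locredit", "sendslope", "idleslope",
	};
	bool seen[4] = { false, false, false, false };

	assert(a != NULL);
	int32_t *slots[] = {
		&a->hicredit, &a->locredit, &a->sendslope, &a->idleslope,
	};

	for (int i = 0; i < argc; i += 2) {
		int k = 0;

		while (k < 4 && strcmp(argv[i], keys[k]) != 0)
			k++;
		if (k == 4 || i + 1 >= argc || seen[k])
			return -1;
		if (parse_s32_sketch(argv[i + 1], slots[k]))
			return -1;
		seen[k] = true;
	}
	return 0;
}
```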
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
tc/Makefile | 1 +
tc/q_cbs.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 135 insertions(+)
create mode 100644 tc/q_cbs.c
diff --git a/tc/Makefile b/tc/Makefile
index 777de5e6..24bd3e2e 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -69,6 +69,7 @@ TCMODULES += q_hhf.o
TCMODULES += q_clsact.o
TCMODULES += e_bpf.o
TCMODULES += f_matchall.o
+TCMODULES += q_cbs.o
TCSO :=
ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_cbs.c b/tc/q_cbs.c
new file mode 100644
index 00000000..80dd599a
--- /dev/null
+++ b/tc/q_cbs.c
@@ -0,0 +1,134 @@
+/*
+ * q_cbs.c CBS.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: Vinicius Costa Gomes <vinicius.gomes@intel.com>
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+ fprintf(stderr, "Usage: ... cbs hicredit BYTES locredit BYTES sendslope BPS idleslope BPS\n");
+}
+
+static void explain1(const char *arg, const char *val)
+{
+ fprintf(stderr, "cbs: illegal value for \"%s\": \"%s\"\n", arg, val);
+}
+
+static int cbs_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nlmsghdr *n)
+{
+ int ok = 0;
+ struct tc_cbs_qopt opt = {};
+ struct rtattr *tail;
+
+ while (argc > 0) {
+ if (matches(*argv, "hicredit") == 0) {
+ NEXT_ARG();
+ if (opt.hicredit) {
+ fprintf(stderr, "cbs: duplicate \"hicredit\" specification\n");
+ return -1;
+ }
+ if (get_s32(&opt.hicredit, *argv, 0)) {
+ explain1("hicredit", *argv);
+ return -1;
+ }
+ ok++;
+ } else if (matches(*argv, "locredit") == 0) {
+ NEXT_ARG();
+ if (opt.locredit) {
+ fprintf(stderr, "cbs: duplicate \"locredit\" specification\n");
+ return -1;
+ }
+ if (get_s32(&opt.locredit, *argv, 0)) {
+ explain1("locredit", *argv);
+ return -1;
+ }
+ ok++;
+ } else if (matches(*argv, "sendslope") == 0) {
+ NEXT_ARG();
+ if (opt.sendslope) {
+ fprintf(stderr, "cbs: duplicate \"sendslope\" specification\n");
+ return -1;
+ }
+ if (get_s32(&opt.sendslope, *argv, 0)) {
+ explain1("sendslope", *argv);
+ return -1;
+ }
+ ok++;
+ } else if (matches(*argv, "idleslope") == 0) {
+ NEXT_ARG();
+ if (opt.idleslope) {
+ fprintf(stderr, "cbs: duplicate \"idleslope\" specification\n");
+ return -1;
+ }
+ if (get_s32(&opt.idleslope, *argv, 0)) {
+ explain1("idleslope", *argv);
+ return -1;
+ }
+ ok++;
+ } else if (strcmp(*argv, "help") == 0) {
+ explain();
+ return -1;
+ } else {
+ fprintf(stderr, "cbs: unknown parameter \"%s\"\n", *argv);
+ explain();
+ return -1;
+ }
+ argc--; argv++;
+ }
+
+ tail = NLMSG_TAIL(n);
+ addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+ addattr_l(n, 2024, TCA_CBS_PARMS, &opt, sizeof(opt));
+ tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+ return 0;
+}
+
+static int cbs_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+ struct rtattr *tb[TCA_CBS_MAX+1];
+ struct tc_cbs_qopt *qopt;
+
+ if (opt == NULL)
+ return 0;
+
+ parse_rtattr_nested(tb, TCA_CBS_MAX, opt);
+
+ if (tb[TCA_CBS_PARMS] == NULL)
+ return -1;
+
+ qopt = RTA_DATA(tb[TCA_CBS_PARMS]);
+ if (RTA_PAYLOAD(tb[TCA_CBS_PARMS]) < sizeof(*qopt))
+ return -1;
+
+ fprintf(f, "hicredit %d ", qopt->hicredit);
+ fprintf(f, "locredit %d ", qopt->locredit);
+ fprintf(f, "sendslope %d ", qopt->sendslope);
+ fprintf(f, "idleslope %d ", qopt->idleslope);
+
+ return 0;
+}
+
+struct qdisc_util cbs_qdisc_util = {
+ .id = "cbs",
+ .parse_qopt = cbs_parse_opt,
+ .print_qopt = cbs_print_opt,
+};
--
2.14.2
^ permalink raw reply related
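The tail/rta_len fix-up at the end of cbs_parse_opt() nests TCA_CBS_PARMS inside TCA_OPTIONS. A minimal userspace sketch of the same layout using only the RTA_* macros from <linux/rtnetlink.h>; put_attr(), build_cbs_options() and the _sketch names are hypothetical stand-ins for addattr_l() and the structures added in patch 1/3:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Mirrors struct tc_cbs_qopt and TCA_CBS_PARMS from patch 1/3. */
struct cbs_qopt_sketch {
	int32_t hicredit;
	int32_t locredit;
	int32_t idleslope;
	int32_t sendslope;
};
#define TCA_CBS_PARMS_SKETCH 1

/* Writes one attribute at byte offset off of a 4-byte-aligned buffer and
 * returns the offset just past it, padded to RTA_ALIGNTO. */
static size_t put_attr(unsigned char *buf, size_t off, unsigned short type,
		       const void *data, size_t len)
{
	struct rtattr *rta = (struct rtattr *)(buf + off);

	assert(off % RTA_ALIGNTO == 0);
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	if (len)
		memcpy(RTA_DATA(rta), data, len);
	return off + RTA_ALIGN(rta->rta_len);
}

/* Open an empty TCA_OPTIONS, append TCA_CBS_PARMS after it, then patch
 * the outer length so that TCA_OPTIONS covers the nested attribute. */
static size_t build_cbs_options(unsigned char *buf,
				const struct cbs_qopt_sketch *opt)
{
	size_t end = put_attr(buf, 0, TCA_OPTIONS, NULL, 0);

	end = put_attr(buf, end, TCA_CBS_PARMS_SKETCH, opt, sizeof(*opt));
	((struct rtattr *)buf)->rta_len = (unsigned short)end;
	return end;
}
```

For the 16-byte parameter struct, the outer attribute ends up 24 bytes long: 4 for its own header, 4 for the inner header and 16 for the payload.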
* [iproute2 net-next 1/3] update headers with CBS API
From: Vinicius Costa Gomes @ 2017-09-26 23:39 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
richardcochran, henrik
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
include/linux/pkt_sched.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 099bf552..27c849c0 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -871,4 +871,21 @@ struct tc_pie_xstats {
__u32 maxq; /* maximum queue size */
__u32 ecn_mark; /* packets marked with ecn*/
};
+
+/* CBS */
+struct tc_cbs_qopt {
+ __s32 hicredit;
+ __s32 locredit;
+ __s32 idleslope;
+ __s32 sendslope;
+};
+
+enum {
+ TCA_CBS_UNSPEC,
+ TCA_CBS_PARMS,
+ __TCA_CBS_MAX,
+};
+
+#define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
+
#endif
--
2.14.2
^ permalink raw reply related
* [next-queue PATCH 3/3] igb: Add support for CBS offload
From: Vinicius Costa Gomes @ 2017-09-26 23:39 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Andre Guedes, jhs, xiyou.wangcong, jiri, ivan.briano,
jesus.sanchez-palencia, boon.leong.ong, richardcochran, henrik
In-Reply-To: <20170926233916.11774-1-vinicius.gomes@intel.com>
From: Andre Guedes <andre.guedes@intel.com>
This patch adds support for Credit-Based Shaper (CBS) qdisc offload
from the Traffic Control system. This support enables us to leverage the
Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features
of the Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav
standard, which was merged into 802.1Q in 2014. It enables traffic
prioritization and bandwidth reservation via the Credit-Based Shaper,
which is implemented in hardware by the i210 controller.
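The idleSlope conversion that igb_configure_cbs() performs below (equation E6 in its comment) can be checked in isolation. A minimal userspace sketch; idleslope_to_tqavcc() is a hypothetical name, and the 1 Gbps upper bound is an assumption based on the i210's maximum link speed:

```c
#include <assert.h>
#include <stdint.h>

/* 0x7735 is the datasheet constant quoted in igb_configure_cbs();
 * 0x7735 * 2 == 61034, the factor in equation E6. */
#define I210_IDLESLOPE_FACTOR (0x7735ULL * 2)

/* E6: value = ceil(idleslope_kbps * 61034 / 1000000). Link speed
 * cancels out, so it does not appear as a parameter. */
static uint16_t idleslope_to_tqavcc(uint32_t idleslope_kbps)
{
	uint64_t num;

	/* A full 1 Gbps reservation keeps the result within 16 bits. */
	assert(idleslope_kbps <= 1000000);
	num = (uint64_t)idleslope_kbps * I210_IDLESLOPE_FACTOR;
	return (uint16_t)((num + 1000000 - 1) / 1000000);
}
```

A 20 Mbit/s reservation (idleslope 20000) gives 1221 whether the link runs at 100 or 1000 Mbps, and a full 1 Gbps reservation gives 61034, which still fits the field selected by E1000_TQAVCC_IDLESLOPE_MASK.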
The patch introduces the igb_setup_tc() function which implements the
support for CBS qdisc hardware offload in the IGB driver. CBS offload
is the only traffic control offload supported by the driver at the
moment.
The FQTSS transmission mode of the i210 controller is automatically
enabled by the IGB driver when CBS is enabled for the first hardware
queue. Likewise, FQTSS mode is automatically disabled when CBS is
disabled for the last hardware queue. Changing FQTSS mode requires a
NIC reset.
The FQTSS feature is supported by the i210 controller only.
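The enable/disable rule above can be modeled without hardware. A minimal sketch, assuming two CBS-capable queues as in I210_SR_QUEUES_NUM; struct adapter_sketch and set_queue_cbs() are hypothetical, and the counter stands in for the reset that enable_fqtss() schedules (the driver only does so while the interface is running):

```c
#include <assert.h>
#include <stdbool.h>

/* Only queues 0 and 1 accept CBS on i210 (cf. I210_SR_QUEUES_NUM). */
#define SR_QUEUES 2

struct adapter_sketch {
	bool cbs_enable[SR_QUEUES];
	bool fqtss;
	int mode_changes; /* each change would require a NIC reset */
};

static bool any_cbs_enabled(const struct adapter_sketch *a)
{
	for (int i = 0; i < SR_QUEUES; i++)
		if (a->cbs_enable[i])
			return true;
	return false;
}

/* Record the per-queue CBS state, then turn FQTSS on with the first
 * CBS queue and off with the last one. */
static void set_queue_cbs(struct adapter_sketch *a, int queue, bool enable)
{
	bool want;

	assert(queue >= 0 && queue < SR_QUEUES);
	a->cbs_enable[queue] = enable;
	want = any_cbs_enabled(a);
	if (want != a->fqtss) {
		a->fqtss = want;
		a->mode_changes++;
	}
}
```

Only the first enable and the last disable change the mode, so reconfiguring a second queue while another already uses CBS does not cost an extra reset.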
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
---
drivers/net/ethernet/intel/igb/e1000_defines.h | 23 ++
drivers/net/ethernet/intel/igb/e1000_regs.h | 8 +
drivers/net/ethernet/intel/igb/igb.h | 6 +
drivers/net/ethernet/intel/igb/igb_main.c | 347 +++++++++++++++++++++++++
4 files changed, 384 insertions(+)
diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 1de82f247312..83cabff1e0ab 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -353,7 +353,18 @@
#define E1000_RXPBS_CFG_TS_EN 0x80000000
#define I210_RXPBSIZE_DEFAULT 0x000000A2 /* RXPBSIZE default */
+#define I210_RXPBSIZE_MASK 0x0000003F
+#define I210_RXPBSIZE_PB_32KB 0x00000020
#define I210_TXPBSIZE_DEFAULT 0x04000014 /* TXPBSIZE default */
+#define I210_TXPBSIZE_MASK 0xC0FFFFFF
+#define I210_TXPBSIZE_PB0_8KB (8 << 0)
+#define I210_TXPBSIZE_PB1_8KB (8 << 6)
+#define I210_TXPBSIZE_PB2_4KB (4 << 12)
+#define I210_TXPBSIZE_PB3_4KB (4 << 18)
+
+#define I210_DTXMXPKTSZ_DEFAULT 0x00000098
+
+#define I210_SR_QUEUES_NUM 2
/* SerDes Control */
#define E1000_SCTL_DISABLE_SERDES_LOOPBACK 0x0400
@@ -1051,4 +1062,16 @@
#define E1000_VLAPQF_P_VALID(_n) (0x1 << (3 + (_n) * 4))
#define E1000_VLAPQF_QUEUE_MASK 0x03
+/* TX Qav Control fields */
+#define E1000_TQAVCTRL_XMIT_MODE BIT(0)
+#define E1000_TQAVCTRL_DATAFETCHARB BIT(4)
+#define E1000_TQAVCTRL_DATATRANARB BIT(8)
+
+/* TX Qav Credit Control fields */
+#define E1000_TQAVCC_IDLESLOPE_MASK 0xFFFF
+#define E1000_TQAVCC_QUEUEMODE BIT(31)
+
+/* Transmit Descriptor Control fields */
+#define E1000_TXDCTL_PRIORITY BIT(27)
+
#endif
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h
index 58adbf234e07..8eee081d395f 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -421,6 +421,14 @@ do { \
#define E1000_I210_FLA 0x1201C
+#define E1000_I210_DTXMXPKTSZ 0x355C
+
+#define E1000_I210_TXDCTL(_n) (0x0E028 + ((_n) * 0x40))
+
+#define E1000_I210_TQAVCTRL 0x3570
+#define E1000_I210_TQAVCC(_n) (0x3004 + ((_n) * 0x40))
+#define E1000_I210_TQAVHC(_n) (0x300C + ((_n) * 0x40))
+
#define E1000_INVM_DATA_REG(_n) (0x12120 + 4*(_n))
#define E1000_INVM_SIZE 64 /* Number of INVM Data Registers */
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 06ffb2bc713e..92845692087a 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -281,6 +281,11 @@ struct igb_ring {
u16 count; /* number of desc. in the ring */
u8 queue_index; /* logical index of the ring*/
u8 reg_idx; /* physical index of the ring */
+ bool cbs_enable; /* indicates if CBS is enabled */
+ s32 idleslope; /* idleSlope in kbps */
+ s32 sendslope; /* sendSlope in kbps */
+ s32 hicredit; /* hiCredit in bytes */
+ s32 locredit; /* loCredit in bytes */
/* everything past this point are written often */
u16 next_to_clean;
@@ -621,6 +626,7 @@ struct igb_adapter {
#define IGB_FLAG_EEE BIT(14)
#define IGB_FLAG_VLAN_PROMISC BIT(15)
#define IGB_FLAG_RX_LEGACY BIT(16)
+#define IGB_FLAG_FQTSS BIT(17)
/* Media Auto Sense */
#define IGB_MAS_ENABLE_0 0X0001
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index fd4a46b03cc8..03b8d0f4acfd 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -34,6 +34,7 @@
#include <linux/slab.h>
#include <net/checksum.h>
#include <net/ip6_checksum.h>
+#include <net/pkt_sched.h>
#include <linux/net_tstamp.h>
#include <linux/mii.h>
#include <linux/ethtool.h>
@@ -62,6 +63,17 @@
#define BUILD 0
#define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) "." \
__stringify(BUILD) "-k"
+
+enum queue_mode {
+ QUEUE_MODE_STRICT_PRIORITY,
+ QUEUE_MODE_STREAM_RESERVATION,
+};
+
+enum tx_queue_prio {
+ TX_QUEUE_PRIO_HIGH,
+ TX_QUEUE_PRIO_LOW,
+};
+
char igb_driver_name[] = "igb";
char igb_driver_version[] = DRV_VERSION;
static const char igb_driver_string[] =
@@ -1271,6 +1283,12 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
ring->count = adapter->tx_ring_count;
ring->queue_index = txr_idx;
+ ring->cbs_enable = false;
+ ring->idleslope = 0;
+ ring->sendslope = 0;
+ ring->hicredit = 0;
+ ring->locredit = 0;
+
u64_stats_init(&ring->tx_syncp);
u64_stats_init(&ring->tx_syncp2);
@@ -1598,6 +1616,284 @@ static void igb_get_hw_control(struct igb_adapter *adapter)
ctrl_ext | E1000_CTRL_EXT_DRV_LOAD);
}
+static void enable_fqtss(struct igb_adapter *adapter, bool enable)
+{
+ struct net_device *netdev = adapter->netdev;
+ struct e1000_hw *hw = &adapter->hw;
+
+ WARN_ON(hw->mac.type != e1000_i210);
+
+ if (enable)
+ adapter->flags |= IGB_FLAG_FQTSS;
+ else
+ adapter->flags &= ~IGB_FLAG_FQTSS;
+
+ if (netif_running(netdev))
+ schedule_work(&adapter->reset_task);
+}
+
+static bool is_fqtss_enabled(struct igb_adapter *adapter)
+{
+ return (adapter->flags & IGB_FLAG_FQTSS) ? true : false;
+}
+
+static void set_tx_desc_fetch_prio(struct e1000_hw *hw, int queue,
+ enum tx_queue_prio prio)
+{
+ u32 val;
+
+ WARN_ON(hw->mac.type != e1000_i210);
+ WARN_ON(queue < 0 || queue > 3);
+
+ val = rd32(E1000_I210_TXDCTL(queue));
+
+ if (prio == TX_QUEUE_PRIO_HIGH)
+ val |= E1000_TXDCTL_PRIORITY;
+ else
+ val &= ~E1000_TXDCTL_PRIORITY;
+
+ wr32(E1000_I210_TXDCTL(queue), val);
+}
+
+static void set_queue_mode(struct e1000_hw *hw, int queue, enum queue_mode mode)
+{
+ u32 val;
+
+ WARN_ON(hw->mac.type != e1000_i210);
+ WARN_ON(queue < 0 || queue > 1);
+
+ val = rd32(E1000_I210_TQAVCC(queue));
+
+ if (mode == QUEUE_MODE_STREAM_RESERVATION)
+ val |= E1000_TQAVCC_QUEUEMODE;
+ else
+ val &= ~E1000_TQAVCC_QUEUEMODE;
+
+ wr32(E1000_I210_TQAVCC(queue), val);
+}
+
+/**
+ * igb_configure_cbs - Configure Credit-Based Shaper (CBS)
+ * @adapter: pointer to adapter struct
+ * @queue: queue number
+ * @enable: true = enable CBS, false = disable CBS
+ * @idleslope: idleSlope in kbps
+ * @sendslope: sendSlope in kbps
+ * @hicredit: hiCredit in bytes
+ * @locredit: loCredit in bytes
+ *
+ * Configure CBS for a given hardware queue. When disabling, the
+ * idleslope, sendslope, hicredit and locredit arguments are
+ * ignored.
+ **/
+static void igb_configure_cbs(struct igb_adapter *adapter, int queue,
+ bool enable, int idleslope, int sendslope,
+ int hicredit, int locredit)
+{
+ struct net_device *netdev = adapter->netdev;
+ struct e1000_hw *hw = &adapter->hw;
+ u32 tqavcc;
+ u16 value;
+
+ WARN_ON(hw->mac.type != e1000_i210);
+ WARN_ON(queue < 0 || queue > 1);
+
+ if (enable) {
+ set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH);
+ set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION);
+
+ /* According to i210 datasheet section 7.2.7.7, we should set
+ * the 'idleSlope' field from TQAVCC register following the
+ * equation:
+ *
+ * For 100 Mbps link speed:
+ *
+ * value = BW * 0x7735 * 0.2 (E1)
+ *
+ * For 1000Mbps link speed:
+ *
+ * value = BW * 0x7735 * 2 (E2)
+ *
+ * E1 and E2 can be merged into one equation as shown below.
+ * Note that 'link-speed' is in Mbps.
+ *
+ * value = BW * 0x7735 * 2 * link-speed
+ * -------------- (E3)
+ * 1000
+ *
+ * 'BW' is the percentage bandwidth out of full link speed
+ * which can be found with the following equation. Note that
+ * idleSlope here is the parameter from this function which
+ * is in kbps.
+ *
+ * BW = idleSlope
+ * ----------------- (E4)
+ * link-speed * 1000
+ *
+ * That said, we can come up with a generic equation to
+ * calculate the value we should set in the TQAVCC register by
+ * replacing 'BW' in E3 by E4. The resulting equation is:
+ *
+ * value = idleSlope * 0x7735 * 2 * link-speed
+ * ----------------- -------------- (E5)
+ * link-speed * 1000 1000
+ *
+ * 'link-speed' is present in both sides of the fraction so
+ * it is canceled out. The final equation is the following:
+ *
+ * value = idleSlope * 61034
+ * ----------------- (E6)
+ * 1000000
+ */
+ value = DIV_ROUND_UP_ULL(idleslope * 61034ULL, 1000000);
+
+ tqavcc = rd32(E1000_I210_TQAVCC(queue));
+ tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
+ tqavcc |= value;
+ wr32(E1000_I210_TQAVCC(queue), tqavcc);
+
+ wr32(E1000_I210_TQAVHC(queue), 0x80000000 + hicredit * 0x7735);
+ } else {
+ set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_LOW);
+ set_queue_mode(hw, queue, QUEUE_MODE_STRICT_PRIORITY);
+
+ /* Set idleSlope to zero. */
+ tqavcc = rd32(E1000_I210_TQAVCC(queue));
+ tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
+ wr32(E1000_I210_TQAVCC(queue), tqavcc);
+
+ /* Set hiCredit to zero. */
+ wr32(E1000_I210_TQAVHC(queue), 0);
+ }
+
+ /* XXX: In i210 controller the sendSlope and loCredit parameters from
+ * CBS are not configurable by software so we don't do any 'controller
+ * configuration' with respect to these parameters.
+ */
+
+ netdev_dbg(netdev, "CBS %s: queue %d idleslope %d sendslope %d hiCredit %d locredit %d\n",
+ (enable) ? "enabled" : "disabled", queue,
+ idleslope, sendslope, hicredit, locredit);
+}
+
+static int igb_save_cbs_params(struct igb_adapter *adapter, int queue,
+ bool enable, int idleslope, int sendslope,
+ int hicredit, int locredit)
+{
+ struct igb_ring *ring;
+
+ if (queue < 0 || queue >= adapter->num_tx_queues)
+ return -EINVAL;
+
+ ring = adapter->tx_ring[queue];
+
+ ring->cbs_enable = enable;
+ ring->idleslope = idleslope;
+ ring->sendslope = sendslope;
+ ring->hicredit = hicredit;
+ ring->locredit = locredit;
+
+ return 0;
+}
+
+static bool is_any_cbs_enabled(struct igb_adapter *adapter)
+{
+ struct igb_ring *ring;
+ int i;
+
+ for (i = 0; i < adapter->num_tx_queues; i++) {
+ ring = adapter->tx_ring[i];
+
+ if (ring->cbs_enable)
+ return true;
+ }
+
+ return false;
+}
+
+static void igb_setup_tx_mode(struct igb_adapter *adapter)
+{
+ struct net_device *netdev = adapter->netdev;
+ struct e1000_hw *hw = &adapter->hw;
+ u32 val;
+
+ /* Only i210 controller supports changing the transmission mode. */
+ if (hw->mac.type != e1000_i210)
+ return;
+
+ if (is_fqtss_enabled(adapter)) {
+ int i, max_queue;
+
+ /* Configure TQAVCTRL register: set transmit mode to 'Qav',
+ * set data fetch arbitration to 'round robin' and set data
+ * transfer arbitration to 'credit shaper algorithm'.
+ */
+ val = rd32(E1000_I210_TQAVCTRL);
+ val |= E1000_TQAVCTRL_XMIT_MODE | E1000_TQAVCTRL_DATATRANARB;
+ val &= ~E1000_TQAVCTRL_DATAFETCHARB;
+ wr32(E1000_I210_TQAVCTRL, val);
+
+ /* Configure Tx and Rx packet buffers sizes as described in
+ * i210 datasheet section 7.2.7.7.
+ */
+ val = rd32(E1000_TXPBS);
+ val &= ~I210_TXPBSIZE_MASK;
+ val |= I210_TXPBSIZE_PB0_8KB | I210_TXPBSIZE_PB1_8KB |
+ I210_TXPBSIZE_PB2_4KB | I210_TXPBSIZE_PB3_4KB;
+ wr32(E1000_TXPBS, val);
+
+ val = rd32(E1000_RXPBS);
+ val &= ~I210_RXPBSIZE_MASK;
+ val |= I210_RXPBSIZE_PB_32KB;
+ wr32(E1000_RXPBS, val);
+
+ /* Section 8.12.9 states that MAX_TPKT_SIZE from DTXMXPKTSZ
+ * register should not exceed the buffer size programmed in
+ * TXPBS. The smallest buffer size programmed in TXPBS is 4kB
+ * so according to the datasheet we should set MAX_TPKT_SIZE to
+ * 4kB / 64.
+ *
+ * However, when we do so, no frames from queues 2 and 3 are
+ * transmitted. It seems MAX_TPKT_SIZE must be strictly smaller
+ * than (not _equal_ to) the buffer size programmed in TXPBS. For this
+ * reason, we set MAX_TPKT_SIZE to (4kB - 1) / 64.
+ */
+ val = (4096 - 1) / 64;
+ wr32(E1000_I210_DTXMXPKTSZ, val);
+
+ /* Since FQTSS mode is enabled, apply any CBS configuration
+ * previously set. If no previous CBS configuration has been
+ * done, then the initial configuration is applied, which means
+ * CBS is disabled.
+ */
+ max_queue = (adapter->num_tx_queues < I210_SR_QUEUES_NUM) ?
+ adapter->num_tx_queues : I210_SR_QUEUES_NUM;
+
+ for (i = 0; i < max_queue; i++) {
+ struct igb_ring *ring = adapter->tx_ring[i];
+
+ igb_configure_cbs(adapter, i, ring->cbs_enable,
+ ring->idleslope, ring->sendslope,
+ ring->hicredit, ring->locredit);
+ }
+ } else {
+ wr32(E1000_RXPBS, I210_RXPBSIZE_DEFAULT);
+ wr32(E1000_TXPBS, I210_TXPBSIZE_DEFAULT);
+ wr32(E1000_I210_DTXMXPKTSZ, I210_DTXMXPKTSZ_DEFAULT);
+
+ val = rd32(E1000_I210_TQAVCTRL);
+ /* According to Section 8.12.21, the other flags we've set when
+ * enabling FQTSS are not relevant when disabling FQTSS so we
+ * don't set them here.
+ */
+ val &= ~E1000_TQAVCTRL_XMIT_MODE;
+ wr32(E1000_I210_TQAVCTRL, val);
+ }
+
+ netdev_dbg(netdev, "FQTSS %s\n", (is_fqtss_enabled(adapter)) ?
+ "enabled" : "disabled");
+}
+
/**
* igb_configure - configure the hardware for RX and TX
* @adapter: private board structure
@@ -1609,6 +1905,7 @@ static void igb_configure(struct igb_adapter *adapter)
igb_get_hw_control(adapter);
igb_set_rx_mode(netdev);
+ igb_setup_tx_mode(adapter);
igb_restore_vlan(adapter);
@@ -2150,6 +2447,55 @@ igb_features_check(struct sk_buff *skb, struct net_device *dev,
return features;
}
+static int igb_offload_cbs(struct igb_adapter *adapter,
+ struct tc_cbs_qopt_offload *qopt)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ int err;
+
+ /* CBS offloading is only supported by i210 controller. */
+ if (hw->mac.type != e1000_i210)
+ return -EOPNOTSUPP;
+
+ /* CBS offloading is only supported by queue 0 and queue 1. */
+ if (qopt->queue < 0 || qopt->queue > 1)
+ return -EINVAL;
+
+ err = igb_save_cbs_params(adapter, qopt->queue, qopt->enable,
+ qopt->idleslope, qopt->sendslope,
+ qopt->hicredit, qopt->locredit);
+ if (err)
+ return err;
+
+ if (is_fqtss_enabled(adapter)) {
+ igb_configure_cbs(adapter, qopt->queue, qopt->enable,
+ qopt->idleslope, qopt->sendslope,
+ qopt->hicredit, qopt->locredit);
+
+ if (!is_any_cbs_enabled(adapter))
+ enable_fqtss(adapter, false);
+
+ } else {
+ enable_fqtss(adapter, true);
+ }
+
+ return 0;
+}
+
+static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
+ void *type_data)
+{
+ struct igb_adapter *adapter = netdev_priv(dev);
+
+ switch (type) {
+ case TC_SETUP_CBS:
+ return igb_offload_cbs(adapter, type_data);
+
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static const struct net_device_ops igb_netdev_ops = {
.ndo_open = igb_open,
.ndo_stop = igb_close,
@@ -2175,6 +2521,7 @@ static const struct net_device_ops igb_netdev_ops = {
.ndo_set_features = igb_set_features,
.ndo_fdb_add = igb_ndo_fdb_add,
.ndo_features_check = igb_features_check,
+ .ndo_setup_tc = igb_setup_tc,
};
/**
--
2.14.2
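The DTXMXPKTSZ workaround in igb_setup_tx_mode() comes down to a units conversion. A minimal sketch, assuming (as the patch comment implies with "4kB / 64") that MAX_TPKT_SIZE is expressed in 64-byte units and must stay strictly below the smallest TXPBS buffer; the helper name is hypothetical and not part of the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest MAX_TPKT_SIZE value, in 64-byte units,
 * whose byte size stays strictly below buf_bytes. This mirrors the
 * "(4096 - 1) / 64" expression used in igb_setup_tx_mode(). */
static uint32_t max_tpkt_units(uint32_t buf_bytes)
{
	assert(buf_bytes > 0);
	return (buf_bytes - 1) / 64;
}
```

For the 4 kB buffers given to queues 2 and 3 this yields 63 units (4032 bytes), one unit below the datasheet value of 4096 / 64 = 64 (exactly 4096 bytes), which is the value the patch comment reports as breaking transmission.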
* [next-queue PATCH 2/3] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Vinicius Costa Gomes @ 2017-09-26 23:39 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
richardcochran, henrik
In-Reply-To: <20170926233916.11774-1-vinicius.gomes@intel.com>
This queueing discipline implements the shaper algorithm defined by
the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.
Its primary usage is to apply some bandwidth reservation to user
defined traffic classes, which are mapped to different queues via the
mqprio qdisc.
Initially, it only supports offloading the traffic shaping work to
supporting controllers.
Later, when a software implementation is added, the current dependency
on being installed "under" mqprio can be lifted.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
include/linux/netdevice.h | 1 +
include/net/pkt_sched.h | 9 ++
net/sched/Kconfig | 12 +++
net/sched/Makefile | 1 +
net/sched/sch_cbs.c | 229 ++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 252 insertions(+)
create mode 100644 net/sched/sch_cbs.c
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1..5d6fb06fd80f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -775,6 +775,7 @@ enum tc_setup_type {
TC_SETUP_CLSFLOWER,
TC_SETUP_CLSMATCHALL,
TC_SETUP_CLSBPF,
+ TC_SETUP_CBS,
};
/* These structures hold the attributes of xdp state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 259bc191ba59..7c597b050b36 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -146,4 +146,13 @@ static inline bool is_classid_clsact_egress(u32 classid)
TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_EGRESS);
}
+struct tc_cbs_qopt_offload {
+ u8 enable;
+ s32 queue;
+ s32 hicredit;
+ s32 locredit;
+ s32 idleslope;
+ s32 sendslope;
+};
+
#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index e70ed26485a2..2dd24d231243 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -172,6 +172,18 @@ config NET_SCH_TBF
To compile this code as a module, choose M here: the
module will be called sch_tbf.
+config NET_SCH_CBS
+ tristate "Credit Based Shaper (CBS)"
+ depends on NET_SCH_MQPRIO
+ ---help---
+ Say Y here if you want to use the Credit Based Shaper (CBS) packet
+ scheduling algorithm.
+
+ See the top of <file:net/sched/sch_cbs.c> for more details.
+
+ To compile this code as a module, choose M here: the
+ module will be called sch_cbs.
+
config NET_SCH_GRED
tristate "Generic Random Early Detection (GRED)"
---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 7b915d226de7..80c8f92d162d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL) += sch_fq_codel.o
obj-$(CONFIG_NET_SCH_FQ) += sch_fq.o
obj-$(CONFIG_NET_SCH_HHF) += sch_hhf.o
obj-$(CONFIG_NET_SCH_PIE) += sch_pie.o
+obj-$(CONFIG_NET_SCH_CBS) += sch_cbs.o
obj-$(CONFIG_NET_CLS_U32) += cls_u32.o
obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
new file mode 100644
index 000000000000..6e1b7272d685
--- /dev/null
+++ b/net/sched/sch_cbs.c
@@ -0,0 +1,229 @@
+/*
+ * net/sched/sch_cbs.c Credit Based Shaper
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: Vinicius Costa Gomes <vinicius.gomes@intel.com>
+ *
+ */
+
+/* Credit Based Shaper (CBS)
+ =========================
+
+ This is a simple rate-limiting shaper aimed at TSN applications on
+ systems with known traffic workloads.
+
+ Its algorithm is defined by the IEEE 802.1Q-2014 Specification,
+ Section 8.6.8.2, and explained in more detail in Annex L of the
+ same specification.
+
+ There are four tunables to be considered:
+
+ 'idleslope': Idleslope is the rate of credits that is
+ accumulated (in kilobits per second) when there is at least
+ one packet waiting for transmission. Packets are transmitted
+ when the current value of credits is equal to or greater than
+ zero. When there is no packet to be transmitted the amount of
+ credits is set to zero. This is the main tunable of the CBS
+ algorithm.
+
+ 'sendslope':
+ Sendslope is the rate of credits that is depleted (it should be a
+ negative number of kilobits per second) when a transmission is
+ occurring. It can be calculated as follows (IEEE 802.1Q-2014 Section
+ 8.6.8.2 item g):
+
+ sendslope = idleslope - port_transmit_rate
+
+ 'hicredit': Hicredit defines the maximum amount of credits (in
+ bytes) that can be accumulated. Hicredit depends on the
+ characteristics of interfering traffic;
+ 'max_interference_size' is the maximum size of any burst of
+ traffic that can delay the transmission of a frame that is
+ available for transmission for this traffic class (IEEE
+ 802.1Q-2014 Annex L, Equation L-3):
+
+ hicredit = max_interference_size * (idleslope / port_transmit_rate)
+
+ 'locredit': Locredit is the minimum amount of credits that can
+ be reached. It is a function of the traffic flowing through
+ this qdisc (IEEE 802.1Q-2014 Annex L, Equation L-2):
+
+ locredit = max_frame_size * (sendslope / port_transmit_rate)
+*/
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/sch_generic.h>
+#include <net/pkt_sched.h>
+
+struct cbs_sched_data {
+ s32 queue;
+ s32 locredit;
+ s32 hicredit;
+ s32 sendslope;
+ s32 idleslope;
+};
+
+static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+ struct sk_buff **to_free)
+{
+ return qdisc_enqueue_tail(skb, sch);
+}
+
+static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {
+ [TCA_CBS_PARMS] = { .len = sizeof(struct tc_cbs_qopt) },
+};
+
+static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
+{
+ struct cbs_sched_data *q = qdisc_priv(sch);
+ struct tc_cbs_qopt_offload cbs = { };
+ struct nlattr *tb[TCA_CBS_MAX + 1];
+ const struct net_device_ops *ops;
+ struct tc_cbs_qopt *qopt;
+ struct net_device *dev;
+ int err;
+
+ err = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);
+ if (err < 0)
+ return err;
+
+ err = -EINVAL;
+ if (!tb[TCA_CBS_PARMS])
+ goto done;
+
+ qopt = nla_data(tb[TCA_CBS_PARMS]);
+
+ dev = qdisc_dev(sch);
+ ops = dev->netdev_ops;
+
+ cbs.queue = q->queue;
+ cbs.enable = 1;
+ cbs.hicredit = qopt->hicredit;
+ cbs.locredit = qopt->locredit;
+ cbs.idleslope = qopt->idleslope;
+ cbs.sendslope = qopt->sendslope;
+
+ err = -EOPNOTSUPP;
+ if (!ops->ndo_setup_tc)
+ goto done;
+
+ err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+ if (err < 0)
+ goto done;
+
+ q->hicredit = cbs.hicredit;
+ q->locredit = cbs.locredit;
+ q->idleslope = cbs.idleslope;
+ q->sendslope = cbs.sendslope;
+
+done:
+ return err;
+}
+
+static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
+{
+ struct cbs_sched_data *q = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
+
+ if (!opt)
+ return -EINVAL;
+
+ /* FIXME: this means that we can only install this qdisc
+ * "under" mqprio. Do we need a more generic way to retrieve
+ * the queue, or do we pass the netdev_queue to the driver?
+ */
+ q->queue = TC_H_MIN(sch->parent) - 1 - netdev_get_num_tc(dev);
+
+ return cbs_change(sch, opt);
+}
+
+static void cbs_destroy(struct Qdisc *sch)
+{
+ struct cbs_sched_data *q = qdisc_priv(sch);
+ struct tc_cbs_qopt_offload cbs = { };
+ const struct net_device_ops *ops;
+ struct net_device *dev;
+ int err;
+
+ q->hicredit = 0;
+ q->locredit = 0;
+ q->idleslope = 0;
+ q->sendslope = 0;
+
+ dev = qdisc_dev(sch);
+ ops = dev->netdev_ops;
+
+ if (!ops->ndo_setup_tc)
+ return;
+
+ cbs.queue = q->queue;
+ cbs.enable = 0;
+
+ err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+ if (err < 0)
+ pr_warn("Couldn't reset queue %d to default values\n",
+ cbs.queue);
+}
+
+static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+ struct cbs_sched_data *q = qdisc_priv(sch);
+ struct nlattr *nest;
+ struct tc_cbs_qopt opt;
+
+ nest = nla_nest_start(skb, TCA_OPTIONS);
+ if (!nest)
+ goto nla_put_failure;
+
+ opt.hicredit = q->hicredit;
+ opt.locredit = q->locredit;
+ opt.sendslope = q->sendslope;
+ opt.idleslope = q->idleslope;
+
+ if (nla_put(skb, TCA_CBS_PARMS, sizeof(opt), &opt))
+ goto nla_put_failure;
+
+ return nla_nest_end(skb, nest);
+
+nla_put_failure:
+ nla_nest_cancel(skb, nest);
+ return -1;
+}
+
+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
+ .next = NULL,
+ .id = "cbs",
+ .priv_size = sizeof(struct cbs_sched_data),
+ .enqueue = cbs_enqueue,
+ .dequeue = qdisc_dequeue_head,
+ .peek = qdisc_peek_dequeued,
+ .init = cbs_init,
+ .reset = qdisc_reset_queue,
+ .destroy = cbs_destroy,
+ .change = cbs_change,
+ .dump = cbs_dump,
+ .owner = THIS_MODULE,
+};
+
+static int __init cbs_module_init(void)
+{
+ return register_qdisc(&cbs_qdisc_ops);
+}
+
+static void __exit cbs_module_exit(void)
+{
+ unregister_qdisc(&cbs_qdisc_ops);
+}
+module_init(cbs_module_init)
+module_exit(cbs_module_exit)
+MODULE_LICENSE("GPL");
--
2.14.2
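cbs_init() derives the hardware queue from the minor number of its parent class, which is why the qdisc must sit under mqprio. A user-space sketch of that arithmetic, assuming mqprio numbers its child classes as in step 2 of the cover letter's test procedure (minors 1..num_tc for traffic classes, then one minor per Tx queue); the helper name is hypothetical, while TC_H_MIN() comes from the uapi header <linux/pkt_sched.h>:

```c
#include <assert.h>
#include <linux/pkt_sched.h> /* TC_H_MIN() */
#include <stdint.h>

/* Hypothetical helper mirroring cbs_init():
 *   queue = TC_H_MIN(parent) - 1 - num_tc
 * Minors 1..num_tc are the traffic-class classes; the minors after
 * them are the Tx queues, so subtracting 1 + num_tc gives a 0-based
 * hardware queue index. */
static int cbs_queue_from_parent(uint32_t parent, int num_tc)
{
	assert(num_tc > 0);
	return (int)TC_H_MIN(parent) - 1 - num_tc;
}
```

With the "num_tc 3" mqprio setup, parent 100:4 maps to queue 0 and 100:5 to queue 1, the only two queues igb_offload_cbs() accepts. A different class numbering in mqprio would silently shift these indices, which is the implicit dependency listed under Known Issues.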
* [next-queue PATCH 1/3] net/sched: Introduce the user API for the CBS shaper
From: Vinicius Costa Gomes @ 2017-09-26 23:39 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
richardcochran, henrik
In-Reply-To: <20170926233916.11774-1-vinicius.gomes@intel.com>
Export the API necessary for configuring the CBS shaper (implemented
in the next patch) via the tc tool.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
include/uapi/linux/pkt_sched.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf5528fed..27c849c053cf 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -871,4 +871,21 @@ struct tc_pie_xstats {
__u32 maxq; /* maximum queue size */
__u32 ecn_mark; /* packets marked with ecn*/
};
+
+/* CBS */
+struct tc_cbs_qopt {
+ __s32 hicredit;
+ __s32 locredit;
+ __s32 idleslope;
+ __s32 sendslope;
+};
+
+enum {
+ TCA_CBS_UNSPEC,
+ TCA_CBS_PARMS,
+ __TCA_CBS_MAX,
+};
+
+#define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
+
#endif
--
2.14.2
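The new uapi is a single fixed-size attribute. A minimal sketch of how a user-space tool might serialize it, assuming the struct layout and enum numbering proposed in this patch (a local copy of the struct is used so the sketch does not depend on installed headers; the struct, macro and function names are hypothetical). Only struct nlattr, NLA_HDRLEN and NLA_ALIGN() are taken from <linux/netlink.h>:

```c
#include <assert.h>
#include <linux/netlink.h> /* struct nlattr, NLA_HDRLEN, NLA_ALIGN() */
#include <stdint.h>
#include <string.h>

/* Local copy of the layout proposed above for struct tc_cbs_qopt. */
struct cbs_qopt_copy {
	int32_t hicredit;
	int32_t locredit;
	int32_t idleslope;
	int32_t sendslope;
};

/* TCA_CBS_PARMS as numbered by the enum above (TCA_CBS_UNSPEC == 0). */
#define CBS_PARMS_ATTR_TYPE 1

/* Serialize one TCA_CBS_PARMS attribute into buf. Returns the number
 * of bytes used (header plus payload, padded to NLA_ALIGNTO), or 0 if
 * buf is too small. memcpy() avoids unaligned header stores. */
static size_t put_cbs_parms(unsigned char *buf, size_t buflen,
			    const struct cbs_qopt_copy *qopt)
{
	struct nlattr hdr;
	size_t attr_len = NLA_HDRLEN + sizeof(*qopt);
	size_t total = NLA_ALIGN(attr_len);

	assert(qopt != NULL);
	if (buflen < total)
		return 0;

	hdr.nla_len = (uint16_t)attr_len;
	hdr.nla_type = CBS_PARMS_ATTR_TYPE;

	memset(buf, 0, total);
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + NLA_HDRLEN, qopt, sizeof(*qopt));
	return total;
}
```

In a real tc request this attribute would be nested inside TCA_OPTIONS, which is what cbs_change() unpacks with nla_parse_nested() before reading the payload with nla_data().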
* [next-queue PATCH 0/3] TSN: Add qdisc based config interface for CBS
From: Vinicius Costa Gomes @ 2017-09-26 23:39 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
richardcochran, henrik
Hi,
Changes from the RFC:
- Fixed comments from Henrik Austad;
- Simplified the Qdisc, using the generic implementation of callbacks
where possible;
- Small refactor on the driver (igb) code;
This patchset is a proposal of how the Traffic Control subsystem can
be used to offload the configuration of the Credit Based Shaper
(defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported
network devices.
As part of this work, we've assessed previous public discussions
related to TSN enabling: patches from Henrik Austad (Cisco), the
presentation from Eric Mann at Linux Plumbers 2012, patches from
Gangfeng Huang (National Instruments) and the current state of the
OpenAVNU project (https://github.com/AVnu/OpenAvnu/).
Overview
========
Time-Sensitive Networking (TSN) is a set of standards that aim to
address resource availability, providing bandwidth reservation and
bounded latency on Ethernet-based LANs. The proposal described here
mainly aims to cover what is needed to enable the following standards:
802.1Qat and 802.1Qav.
The initial target of this work is the Intel i210 NIC, but other
controllers' datasheets were also taken into account, such as the
Renesas RZ/A1H and RZ/A1M groups and the Synopsys DesignWare Ethernet
QoS controller.
Proposal
========
Feature-wise, what is covered here are the configuration interfaces for
HW implementations of the Credit-Based shaper (CBS, 802.1Qav). CBS is
a per-queue shaper. Given that this feature is related to traffic
shaping, and that the traffic control subsystem already provides a
queueing discipline that offloads config into the device driver (i.e.
mqprio), designing a new qdisc for the specific purpose of offloading
the config for the CBS shaper seemed like a good fit.
For steering traffic into the correct queues, we use the socket option
SO_PRIORITY and then a mechanism to map priority to traffic classes /
Tx queues. The qdisc mqprio is currently used in our tests.
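The steering path described above is two table lookups: socket priority to traffic class, then traffic class to a range of Tx queues. A simplified user-space model, assuming the mqprio settings from step 1 of the test procedure; the struct and function names are hypothetical and do not reflect mqprio's internals (the kernel additionally spreads flows across the queues of a class):

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIOS 16

/* Hypothetical model of an mqprio configuration. */
struct mqprio_model {
	uint8_t prio_tc_map[NUM_PRIOS]; /* the "map ..." argument */
	uint16_t count[8];              /* "queues count@offset" */
	uint16_t offset[8];
};

/* Resolve a socket priority (SO_PRIORITY) to the Tx queue range of
 * its traffic class. The priority is masked to 4 bits, as the kernel
 * does with TC_BITMASK, because the map has only 16 entries. */
static void prio_to_queues(const struct mqprio_model *m, uint32_t prio,
			   uint16_t *first, uint16_t *count)
{
	uint8_t tc = m->prio_tc_map[prio & (NUM_PRIOS - 1)];

	assert(tc < 8);
	*first = m->offset[tc];
	*count = m->count[tc];
}
```

With "map 2 2 1 0 2 ..." and "queues 1@0 1@1 2@2", priority 3 resolves to queue 0, priority 2 to queue 1, and every other priority to queues 2 and 3.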
As for the CBS config interface, this patchset is proposing a new
qdisc called 'cbs'. Its 'tc' cmd line is:
$ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
idleslope I
Note that the parameters for this qdisc are the ones defined by the
802.1Q-2014 spec, so no hardware specific functionality is exposed here.
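Because the parameters are the spec's own, their interplay can be checked by hand. A sketch of the arithmetic for a single maximum-size frame, assuming a 1 Gbps port (1000000 kbps), a 1500-byte frame and the class A values used in step 4 of the test procedure (idleslope 20000, sendslope -980000); the function names are hypothetical. It follows the simplified description in sch_cbs.c: while a frame is sent, credit drains at sendslope; afterwards, with more frames waiting, it refills at idleslope until it is back at zero.

```c
#include <assert.h>

/* Time (us) to put frame_bytes on the wire at port_rate_kbps.
 * bits / kbps gives milliseconds, hence the extra factor of 1000. */
static double tx_time_us(double frame_bytes, double port_rate_kbps)
{
	assert(port_rate_kbps > 0);
	return frame_bytes * 8.0 * 1000.0 / port_rate_kbps;
}

/* Credit change (bytes) while one frame is sent: the frame occupies
 * frame_bytes / port_rate of time, drained at sendslope. */
static double credit_after_frame(double frame_bytes, double sendslope_kbps,
				 double port_rate_kbps)
{
	assert(port_rate_kbps > 0);
	return frame_bytes * sendslope_kbps / port_rate_kbps;
}

/* Time (us) for a non-positive credit to climb back to zero. */
static double recovery_us(double credit_bytes, double idleslope_kbps)
{
	assert(idleslope_kbps > 0 && credit_bytes <= 0);
	return -credit_bytes * 8.0 * 1000.0 / idleslope_kbps;
}
```

For class A this gives 12 us of transmission, a credit of -1470 bytes (the locredit of Equation L-2) and 588 us of recovery: 12000 bits every 600 us, which is exactly the 20000 kbps idleslope.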
Testing this series
===================
Attached to this cover letter are:
- calc_cbs_params.py: A Python script to calculate the
parameters for the CBS queueing discipline;
- tsn-talker.c: A sample C implementation of the talker side of a stream;
- tsn-listener.c: A sample C implementation of the listener side of a
stream;
For testing the patches of this series, you may want to use the
samples attached to this cover letter and use the 'mqprio' qdisc to
set up the priority to Tx queue mapping, together with the 'cbs'
qdisc to configure the HW shaper of the i210 controller:
1) Set up the mapping from priorities to traffic classes to hardware queues:
$ tc qdisc replace dev ens4 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
For a more detailed explanation, see mqprio(8). In short, this command
maps traffic with priority 3 to hardware queue 0, traffic with
priority 2 to hardware queue 1, and the rest to hardware queues 2
and 3.
2) Check the scheme and get the IDs of the inner qdiscs, reading the tree from the bottom up:
$ tc -g class show dev ens4
Ex.:
+---(100:3) mqprio
| +---(100:6) mqprio
| +---(100:7) mqprio
|
+---(100:2) mqprio
| +---(100:5) mqprio
|
+---(100:1) mqprio
+---(100:4) mqprio
* Here '100:4' is Tx Queue #0 and '100:5' is Tx Queue #1.
3) Calculate the CBS parameters for classes A and B, e.g. with a bandwidth
of 20 Mbps for class A and 10 Mbps for class B:
$ calc_cbs_params.py -A 20000 -a 1500 -B 10000 -b 1500
4) Configure CBS for traffic class A (priority 3) as provided by the script:
$ tc qdisc replace dev ens4 parent 100:4 cbs locredit -1470 \
hicredit 30 sendslope -980000 idleslope 20000
5) Configure CBS for traffic class B (priority 2):
$ tc qdisc replace dev ens4 parent 100:5 cbs \
locredit -1485 hicredit 31 sendslope -990000 idleslope 10000
6) Run Listener:
$ ./tsn-listener -d 01:AA:AA:AA:AA:AA -i ens4 -s 1500
7) Run a Talker for class A (prio 3 here), compiled from the attached tsn-talker.c:
$ ./tsn-talker -d 01:AA:AA:AA:AA:AA -i ens4 -p 3 -s 1500
* The bandwidth displayed on the listener output at this stage should be very
close to the one configured for class A.
8) You can also run a Talker for class B (prio 2 here and using a
different address):
$ ./tsn-talker -d 01:BB:BB:BB:BB:BB -i ens4 -p 2 -s 1500
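The values in steps 4 and 5 come from calc_cbs_params.py. For readers who prefer C, the same formulas can be sketched as below; this is a port of the script's arithmetic, not part of the series, and the struct and function names are hypothetical. It assumes the script's defaults of a 1000000 kbps link and 1500-byte non-SR frames, and rounds up as the script's math.ceil() does (without linking libm).

```c
#include <assert.h>

/* Hypothetical result holder for one SR class. */
struct cbs_params {
	long idleslope, sendslope, hicredit, locredit;
};

/* ceil() without libm: (long) truncates toward zero, so only
 * positive non-integral values need rounding up. */
static long ceil_to_long(double x)
{
	long i = (long)x;

	return (x > (double)i) ? i + 1 : i;
}

/* Class A, as in the script: hiCredit = Ra * Mo / Ro and
 * loCredit = sendslope * Ma / Ro (Annex L, L-10 and L-12). */
static struct cbs_params cbs_class_a(double link, double idle_a,
				     double frame_non_sr, double frame_a)
{
	struct cbs_params p;
	double send = idle_a - link;

	assert(link > 0);
	p.idleslope = (long)idle_a;
	p.sendslope = (long)send;
	p.hicredit = ceil_to_long(idle_a * frame_non_sr / link);
	p.locredit = ceil_to_long(send * frame_a / link);
	return p;
}

/* Class B: hiCredit = Rb * (Mo / (Ro - Ra) + Ma / Ro) as derived in
 * the script; loCredit = sendslope * Mb / Ro (Annex L, L-2). */
static struct cbs_params cbs_class_b(double link, double idle_a,
				     double idle_b, double frame_non_sr,
				     double frame_a, double frame_b)
{
	struct cbs_params p;
	double send = idle_b - link;

	assert(link > idle_a);
	p.idleslope = (long)idle_b;
	p.sendslope = (long)send;
	p.hicredit = ceil_to_long(idle_b * (frame_non_sr / (link - idle_a) +
					     frame_a / link));
	p.locredit = ceil_to_long(send * frame_b / link);
	return p;
}
```

Called with the step 3 inputs (classes A and B at 20000 and 10000 kbps, 1500-byte frames), it reproduces hicredit 30 / locredit -1470 for class A and hicredit 31 / locredit -1485 for class B.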
Known Issues
============
- There is an implicit dependency on how mqprio assigns handles to
hardware queues;
- There is a problem with how mqprio assigns hardware queues to its
child qdiscs. A separate patchset is being worked on to solve
this.
Authors
=======
- Andre Guedes <andre.guedes@intel.com>
- Ivan Briano <ivan.briano@intel.com>
- Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
- Vinicius Gomes <vinicius.gomes@intel.com>
Andre Guedes (1):
igb: Add support for CBS offload
Vinicius Costa Gomes (2):
net/sched: Introduce the user API for the CBS shaper
net/sched: Introduce Credit Based Shaper (CBS) qdisc
drivers/net/ethernet/intel/igb/e1000_defines.h | 23 ++
drivers/net/ethernet/intel/igb/e1000_regs.h | 8 +
drivers/net/ethernet/intel/igb/igb.h | 6 +
drivers/net/ethernet/intel/igb/igb_main.c | 347 +++++++++++++++++++++++++
include/linux/netdevice.h | 1 +
include/net/pkt_sched.h | 9 +
include/uapi/linux/pkt_sched.h | 17 ++
net/sched/Kconfig | 12 +
net/sched/Makefile | 1 +
net/sched/sch_cbs.c | 229 ++++++++++++++++
10 files changed, 653 insertions(+)
create mode 100644 net/sched/sch_cbs.c
Annex: Sample files
===================
calc_cbs_params.py
--8<---------------cut here---------------start------------->8---
#!/usr/bin/env python
#
# Copyright (c) 2017, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import argparse
import math

def print_cbs_params_for_class_a(args):
    idleslope = args.idleslope_a
    sendslope = idleslope - args.link_speed

    # According to 802.1Q-2014 spec, Annex L, hiCredit and
    # loCredit for SR class A are calculated following the
    # equations L-10 and L-12, respectively.
    hicredit = math.ceil(idleslope * args.frame_non_sr / args.link_speed)
    locredit = math.ceil(sendslope * args.frame_a / args.link_speed)

    print("tc qdisc add dev <IFNAME> parent <QDISC-ID> cbs idleslope %d sendslope %d hicredit %d locredit %d" %
          (idleslope, sendslope, hicredit, locredit))

def print_cbs_params_for_class_b(args):
    idleslope = args.idleslope_b
    sendslope = idleslope - args.link_speed

    # Annex L doesn't present a straightforward equation to
    # calculate hiCredit for Class B so we have to derive it
    # based on generic equations presented in that Annex.
    #
    # L-3 is the primary equation to calculate hiCredit. Section
    # L.2 states that the 'maxInterferenceSize' for SR class B
    # is the maximum burst size for SR class A plus the
    # maxInterferenceSize from SR class A (which is equal to the
    # maximum frame from non-SR traffic).
    #
    # The maximum burst size for SR class A equation is shown in
    # L-16. Merging L-16 into L-3 we get the resulting equation
    # which calculates hiCredit B (refer to section L.3 in case
    # you're not familiar with the legend):
    #
    #   hiCredit B = Rb * (Mo / (Ro - Ra) + Ma / Ro)
    #
    hicredit = math.ceil(idleslope *
                         ((args.frame_non_sr / (args.link_speed - args.idleslope_a)) +
                          (args.frame_a / args.link_speed)))

    # loCredit B is calculated following equation L-2.
    locredit = math.ceil(sendslope * args.frame_b / args.link_speed)

    print("tc qdisc add dev <IFNAME> parent <QDISC-ID> cbs idleslope %d sendslope %d hicredit %d locredit %d" %
          (idleslope, sendslope, hicredit, locredit))

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-S', dest='link_speed', default=1000000.0, type=float,
                        help='Link speed in kbps')
    parser.add_argument('-s', dest='frame_non_sr', default=1500.0, type=float,
                        help='Maximum frame size from non-SR traffic (usually the MTU size)')
    parser.add_argument('-A', dest='idleslope_a', default=0, type=float,
                        help='Idleslope for SR class A in kbps')
    parser.add_argument('-a', dest='frame_a', default=0, type=float,
                        help='Maximum frame size for SR class A traffic')
    parser.add_argument('-B', dest='idleslope_b', default=0, type=float,
                        help='Idleslope for SR class B in kbps')
    parser.add_argument('-b', dest='frame_b', default=0, type=float,
                        help='Maximum frame size for SR class B traffic')
    args = parser.parse_args()

    if args.idleslope_a > 0:
        print_cbs_params_for_class_a(args)

    if args.idleslope_b > 0:
        print_cbs_params_for_class_b(args)

if __name__ == "__main__":
    main()
--8<---------------cut here---------------end--------------->8---
tsn-talker.c
--8<---------------cut here---------------start------------->8---
/*
* Copyright (c) 2017, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <alloca.h>
#include <argp.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#define MAGIC 0xCC
static char ifname[IFNAMSIZ];
static uint8_t macaddr[ETH_ALEN];
static int priority = -1;
static size_t size = 1500;
static uint64_t seq;
static int delay = -1;
static struct argp_option options[] = {
{"dst-addr", 'd', "MACADDR", 0, "Stream Destination MAC address" },
{"delay", 'D', "NUM", 0, "Delay (in us) between packet transmission" },
{"ifname", 'i', "IFNAME", 0, "Network Interface" },
{"prio", 'p', "NUM", 0, "SO_PRIORITY to be set in socket" },
{"packet-size", 's', "NUM", 0, "Size of packets to be transmitted" },
{ 0 }
};
static error_t parser(int key, char *arg, struct argp_state *state)
{
int res;
switch (key) {
case 'd':
res = sscanf(arg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
&macaddr[0], &macaddr[1], &macaddr[2],
&macaddr[3], &macaddr[4], &macaddr[5]);
if (res != 6) {
printf("Invalid address\n");
exit(EXIT_FAILURE);
}
break;
case 'D':
delay = atoi(arg);
break;
case 'i':
strncpy(ifname, arg, sizeof(ifname) - 1);
break;
case 'p':
priority = atoi(arg);
break;
case 's':
size = atoi(arg);
break;
}
return 0;
}
static struct argp argp = { options, parser };
int main(int argc, char *argv[])
{
int fd, res;
struct ifreq req;
uint8_t *data;
struct sockaddr_ll sk_addr = {
.sll_family = AF_PACKET,
.sll_protocol = htons(ETH_P_TSN),
.sll_halen = ETH_ALEN,
};
argp_parse(&argp, argc, argv, 0, NULL, NULL);
fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
if (fd < 0) {
perror("Couldn't open socket");
return 1;
}
strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
res = ioctl(fd, SIOCGIFINDEX, &req);
if (res < 0) {
perror("Couldn't get interface index");
goto err;
}
sk_addr.sll_ifindex = req.ifr_ifindex;
memcpy(&sk_addr.sll_addr, macaddr, ETH_ALEN);
if (priority != -1) {
res = setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &priority,
sizeof(priority));
if (res < 0) {
perror("Couldn't set priority");
goto err;
}
}
data = alloca(size);
memset(data, MAGIC, size);
printf("Sending packets...\n");
while (1) {
uint64_t *seq_ptr = (uint64_t *) &data[0];
ssize_t n;
*seq_ptr = seq++;
n = sendto(fd, data, size, 0, (struct sockaddr *) &sk_addr,
sizeof(sk_addr));
if (n < 0)
perror("Failed to send data");
if (delay > 0)
usleep(delay);
}
close(fd);
return 0;
err:
close(fd);
return 1;
}
--8<---------------cut here---------------end--------------->8---
tsn-listener.c
--8<---------------cut here---------------start------------->8---
/*
* Copyright (c) 2017, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <alloca.h>
#include <argp.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/timerfd.h>
#include <unistd.h>
static char ifname[IFNAMSIZ];
static uint8_t macaddr[ETH_ALEN];
static uint64_t data_count;
static int size = 1500;
static time_t interval = 1;
static bool check_seq = false;
static uint64_t expected_seq;
static struct argp_option options[] = {
{"check-seq", 'c', NULL, 0, "Check sequence number within packet" },
{"dst-addr", 'd', "MACADDR", 0, "Stream Destination MAC address" },
{"ifname", 'i', "IFNAME", 0, "Network Interface" },
{"interval", 'I', "SEC", 0, "Interval between bandwidth reports" },
{"packet-size", 's', "NUM", 0, "Expected packet size" },
{ 0 }
};
static error_t parser(int key, char *arg, struct argp_state *state)
{
int res;
switch (key) {
case 'c':
check_seq = true;
break;
case 'd':
res = sscanf(arg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
&macaddr[0], &macaddr[1], &macaddr[2],
&macaddr[3], &macaddr[4], &macaddr[5]);
if (res != 6) {
printf("Invalid address\n");
exit(EXIT_FAILURE);
}
break;
case 'i':
strncpy(ifname, arg, sizeof(ifname) - 1);
break;
case 'I':
interval = atoi(arg);
break;
case 's':
size = atoi(arg);
break;
}
return 0;
}
static struct argp argp = { options, parser };
static int setup_timer(void)
{
int fd, res;
struct itimerspec tspec = { 0 };
fd = timerfd_create(CLOCK_MONOTONIC, 0);
if (fd < 0) {
perror("Couldn't create timer");
return -1;
}
tspec.it_value.tv_sec = interval;
tspec.it_interval.tv_sec = interval;
res = timerfd_settime(fd, 0, &tspec, NULL);
if (res < 0) {
perror("Couldn't set timer");
close(fd);
return -1;
}
return fd;
}
static int setup_socket(void)
{
	int fd, res;
	struct sockaddr_ll sk_addr = {
		.sll_family = AF_PACKET,
		.sll_protocol = htons(ETH_P_TSN),
	};

	fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
	if (fd < 0) {
		perror("Couldn't open socket");
		return -1;
	}
	/* If user provided a network interface, bind() to it. */
	if (ifname[0] != '\0') {
		struct ifreq req;

		/* Zero the request so ifr_name is always NUL-terminated. */
		memset(&req, 0, sizeof(req));
		strncpy(req.ifr_name, ifname, sizeof(req.ifr_name) - 1);
		res = ioctl(fd, SIOCGIFINDEX, &req);
		if (res < 0) {
			perror("Couldn't get interface index");
			goto err;
		}
		sk_addr.sll_ifindex = req.ifr_ifindex;
		res = bind(fd, (struct sockaddr *) &sk_addr, sizeof(sk_addr));
		if (res < 0) {
			perror("Couldn't bind() to interface");
			goto err;
		}
	}
	/* If user provided the stream destination address, set it as multicast
	 * address.
	 */
	if (macaddr[0] != '\0') {
		struct packet_mreq mreq = { 0 };

		mreq.mr_ifindex = sk_addr.sll_ifindex;
		mreq.mr_type = PACKET_MR_MULTICAST;
		mreq.mr_alen = ETH_ALEN;
		memcpy(&mreq.mr_address, macaddr, ETH_ALEN);
		res = setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
				 &mreq, sizeof(struct packet_mreq));
		if (res < 0) {
			perror("Couldn't set PACKET_ADD_MEMBERSHIP");
			goto err;
		}
	}
	return fd;
err:
	close(fd);
	return -1;
}
static void recv_packet(int fd)
{
	uint8_t *data = alloca(size);
	ssize_t n = recv(fd, data, size, 0);

	if (n < 0) {
		perror("Failed to receive data");
		return;
	}
	if (n != size)
		printf("Size mismatch: expected %d, got %zd\n", size, n);
	/* The sequence number occupies the first 8 bytes of the payload.
	 * Copy it out with memcpy() instead of dereferencing a cast pointer,
	 * and skip the check if the packet is too short to hold it.
	 */
	if (check_seq && n >= (ssize_t) sizeof(uint64_t)) {
		uint64_t seq;

		memcpy(&seq, data, sizeof(seq));
		/* If 'expected_seq' is equal to zero, it means this is the
		 * first packet we received so we don't know what sequence
		 * number to expect.
		 */
		if (expected_seq == 0)
			expected_seq = seq;
		if (seq != expected_seq) {
			printf("Sequence mismatch: expected %llu, got %llu\n",
			       (unsigned long long) expected_seq,
			       (unsigned long long) seq);
			expected_seq = seq;
		}
		expected_seq++;
	}
	data_count += n;
}
static void report_bw(int fd)
{
	uint64_t expirations;
	ssize_t n = read(fd, &expirations, sizeof(uint64_t));

	if (n < 0) {
		perror("Couldn't read timerfd");
		return;
	}
	if (expirations != 1)
		printf("Something went wrong with timerfd\n");
	printf("Receiving data rate: %llu kbps\n",
	       (unsigned long long) ((data_count * 8) / (1000 * interval)));
	data_count = 0;
}
int main(int argc, char *argv[])
{
	int sk_fd, timer_fd, res;
	struct pollfd fds[2];

	argp_parse(&argp, argc, argv, 0, NULL, NULL);
	sk_fd = setup_socket();
	if (sk_fd < 0)
		return 1;
	timer_fd = setup_timer();
	if (timer_fd < 0) {
		close(sk_fd);
		return 1;
	}
	fds[0].fd = sk_fd;
	fds[0].events = POLLIN;
	fds[1].fd = timer_fd;
	fds[1].events = POLLIN;
	printf("Waiting for packets...\n");
	while (1) {
		res = poll(fds, 2, -1);
		if (res < 0) {
			perror("Error on poll()");
			goto err;
		}
		if (fds[0].revents & POLLIN)
			recv_packet(fds[0].fd);
		if (fds[1].revents & POLLIN)
			report_bw(fds[1].fd);
	}
	close(timer_fd);
	close(sk_fd);
	return 0;
err:
	close(timer_fd);
	close(sk_fd);
	return 1;
}
--8<---------------cut here---------------end--------------->8---
* Re: [PATCH net-next 0/2] tools: add bpftool
From: David Ahern @ 2017-09-26 23:32 UTC (permalink / raw)
To: Jakub Kicinski, netdev
Cc: daniel, alexei.starovoitov, davem, hannes, oss-drivers
In-Reply-To: <20170926153522.31500-1-jakub.kicinski@netronome.com>
On 9/26/17 9:35 AM, Jakub Kicinski wrote:
> I'm looking for a home for bpftool, Daniel suggested that
> tools/net could be a good place, since there are only BPF
> utilities there already.
>
> The tool should be complete for simple use cases and we
> will continue extending it as we go along. E.g. providing
> disassembly of loaded programs directly using LLVM library
> and JSON output are high on the priority list.
I have found this to be a very useful tool. Thanks for working on it.
Moving it into the kernel will make it easier to build since it relies
on libbpf and other files from the kernel tree.
One change I have made locally is to link against libbpf.a. That way I
only need to copy one file to a system to use it.
* Re: [PATCH net-next 2/2] tools: bpf: add bpftool
From: Jakub Kicinski @ 2017-09-26 23:02 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: netdev, daniel, davem, hannes, dsahern, oss-drivers
In-Reply-To: <20170926222405.nq23enzudbjklczb@ast-mbp>
On Tue, 26 Sep 2017 15:24:06 -0700, Alexei Starovoitov wrote:
> On Tue, Sep 26, 2017 at 08:35:22AM -0700, Jakub Kicinski wrote:
> > Add a simple tool for querying and updating BPF objects on the system.
> >
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Simon Horman <simon.horman@netronome.com>
> > ---
> > tools/bpf/Makefile | 18 +-
> > tools/bpf/bpftool/Makefile | 80 +++++
> > tools/bpf/bpftool/common.c | 214 ++++++++++++
> > tools/bpf/bpftool/jit_disasm.c | 83 +++++
> > tools/bpf/bpftool/main.c | 212 ++++++++++++
> > tools/bpf/bpftool/main.h | 99 ++++++
> > tools/bpf/bpftool/map.c | 742 +++++++++++++++++++++++++++++++++++++++++
> > tools/bpf/bpftool/prog.c | 392 ++++++++++++++++++++++
> > 8 files changed, 1837 insertions(+), 3 deletions(-)
> ...
> > +static int do_help(int argc, char **argv)
> > +{
> > + fprintf(stderr,
> > + "Usage: %s %s show [MAP]\n"
> > + " %s %s dump MAP\n"
> > + " %s %s update MAP key BYTES value VALUE [UPDATE_FLAGS]\n"
> > + " %s %s lookup MAP key BYTES\n"
> > + " %s %s getnext MAP [key BYTES]\n"
> > + " %s %s delete MAP key BYTES\n"
> > + " %s %s pin MAP FILE\n"
> > + " %s %s help\n"
> > + "\n"
> > + " MAP := { id MAP_ID | pinned FILE }\n"
> > + " " HELP_SPEC_PROGRAM "\n"
> > + " VALUE := { BYTES | MAP | PROG }\n"
> > + " UPDATE_FLAGS := { any | exist | noexist }\n"
> > + "",
>
> overall looks good to me, but still difficult to grasp how to use it.
> Can you add README with example usage and expected output?
I have a README on GitHub, but I was thinking about perhaps writing a
proper man page? Do you prefer one over the other?
> Acked-by: Alexei Starovoitov <ast@kernel.org>
Thanks!
> You also realize that you're signing up maintaining this tool, right? ;)
Yes :)
* Re: [PATCH v2 net-next 2/2] bpf/verifier: improve disassembly of BPF_NEG instructions
From: Daniel Borkmann @ 2017-09-26 22:53 UTC (permalink / raw)
To: Edward Cree, davem; +Cc: netdev, alexei.starovoitov, ys114321
In-Reply-To: <f27b1dfe-4ea2-5737-d3d5-21cb581d1927@solarflare.com>
On 09/26/2017 05:35 PM, Edward Cree wrote:
> BPF_NEG takes only one operand, unlike the bulk of BPF_ALU[64] which are
> compound-assignments. So give it its own format in print_bpf_insn().
>
> Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
* Re: [PATCH v2 net-next 1/2] bpf/verifier: improve disassembly of BPF_END instructions
From: Daniel Borkmann @ 2017-09-26 22:53 UTC (permalink / raw)
To: Edward Cree, davem; +Cc: netdev, alexei.starovoitov, ys114321
In-Reply-To: <b0a84ccf-8842-876c-ec82-b4b1da3d6efa@solarflare.com>
On 09/26/2017 05:35 PM, Edward Cree wrote:
> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
> different structure: it has a size in insn->imm (even if it's BPF_X) and
> uses the BPF_SRC (X or K) to indicate which endianness to use. So it
> needs different code to print it.
>
> Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
* Re: [PATCH 6/6] net: dsa: mv88e6xxx: Forward broadcast frames to cpu and dsa ports
From: Andrew Lunn @ 2017-09-26 22:43 UTC (permalink / raw)
To: David Miller; +Cc: Vivien Didelot, netdev
In-Reply-To: <1506464764-12699-8-git-send-email-andrew@lunn.ch>
Ah, I sent patch 6 twice. Not good.
I will wait for a few days for comments, and then repost without the
duplication.
Andrew
* Re: WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50
From: Josh Poimboeuf @ 2017-09-26 22:42 UTC (permalink / raw)
To: Richard Weinberger
Cc: Alexei Starovoitov, ast, daniel, netdev, linux-kernel, mingo
In-Reply-To: <1598510.AHGpDp18sh@blindfold>
On Tue, Sep 26, 2017 at 11:51:31PM +0200, Richard Weinberger wrote:
> Alexei,
>
> CC'ing Josh and Ingo.
>
> Am Dienstag, 26. September 2017, 06:09:02 CEST schrieb Alexei Starovoitov:
> > On Mon, Sep 25, 2017 at 11:23:31PM +0200, Richard Weinberger wrote:
> > > Hi!
> > >
> > > While playing with bcc's opensnoop tool on Linux 4.14-rc2 I managed to
> > > trigger this splat:
> > >
> > > [ 297.629773] WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50
> > > [ 297.629777] unwind stack type:0 next_sp: (null) mask:0x6 graph_idx:0
> > > [ 297.629783] ffff88015b207ae0: ffff88015b207b68 (0xffff88015b207b68)
> > > [ 297.629790] ffff88015b207ae8: ffffffffb163c00e (__save_stack_trace+0x6e/0xd0)
> > > [ 297.629792] ffff88015b207af0: 0000000000000000 ...
> > > [ 297.629795] ffff88015b207af8: ffff880156a58000 (0xffff880156a58000)
> > > [ 297.629799] ffff88015b207b00: ffff880156a60000 (0xffff880156a60000)
> > > [ 297.629800] ffff88015b207b08: 0000000000000000 ...
> > > [ 297.629803] ffff88015b207b10: 0000000000000006 (0x6)
> > > [ 297.629806] ffff88015b207b18: ffff880151b02700 (0xffff880151b02700)
> > > [ 297.629809] ffff88015b207b20: 0000010100000000 (0x10100000000)
> > > [ 297.629812] ffff88015b207b28: ffff880156a5fea0 (0xffff880156a5fea0)
> > > [ 297.629815] ffff88015b207b30: ffff88015b207ae0 (0xffff88015b207ae0)
> > > [ 297.629818] ffff88015b207b38: ffffffffc0050282 (0xffffffffc0050282)
> > > [ 297.629819] ffff88015b207b40: 0000000000000000 ...
> > > [ 297.629822] ffff88015b207b48: 0000000001000000 (0x1000000)
> > > [ 297.629825] ffff88015b207b50: ffff880157b98280 (0xffff880157b98280)
> > > [ 297.629828] ffff88015b207b58: ffff880157b98380 (0xffff880157b98380)
> > > [ 297.629831] ffff88015b207b60: ffff88015ad2b500 (0xffff88015ad2b500)
> > > [ 297.629834] ffff88015b207b68: ffff88015b207b78 (0xffff88015b207b78)
> > > [ 297.629838] ffff88015b207b70: ffffffffb163c086 (save_stack_trace+0x16/0x20)
> > > [ 297.629841] ffff88015b207b78: ffff88015b207da8 (0xffff88015b207da8)
> > > [ 297.629847] ffff88015b207b80: ffffffffb18a8ed6 (save_stack+0x46/0xd0)
> > > [ 297.629850] ffff88015b207b88: 000000400000000c (0x400000000c)
> > > [ 297.629852] ffff88015b207b90: ffff88015b207ba0 (0xffff88015b207ba0)
> > > [ 297.629855] ffff88015b207b98: ffff880100000000 (0xffff880100000000)
> > > [ 297.629859] ffff88015b207ba0: ffffffffb163c086 (save_stack_trace+0x16/0x20)
> > > [ 297.629864] ffff88015b207ba8: ffffffffb18a8ed6 (save_stack+0x46/0xd0)
> > > [ 297.629868] ffff88015b207bb0: ffffffffb18a9752 (kasan_slab_free+0x72/0xc0)
> > Thanks for the report!
> > I'm not sure I understand what's going on here.
> > It seems you have kasan enabled and it's trying to do save_stack()
> > and something crashing?
> > I don't see any bpf related helpers in the stack trace.
> > Which architecture is this? and .config ?
> > Is bpf jit enabled? If so, make sure that net.core.bpf_jit_kallsyms=1
>
> I found some time to dig a little further.
> It seems to happen only when CONFIG_DEBUG_SPINLOCK is enabled, please see the
> attached .config. The JIT is off.
> KAsan is also not involved at all, the regular stack saving machinery from the
> trace framework initiates the stack unwinder.
>
> The issue arises as soon as in pre_handler_kretprobe() raw_spin_lock_irqsave()
> is being called.
> It happens on all releases that have commit c32c47c68a0a ("x86/unwind: Warn on
> bad frame pointer").
> Interestingly it does not happen when I run
> samples/kprobes/kretprobe_example.ko. So, BPF must be involved somehow.
>
> Here is another variant of the warning, it matches the attached .config:
I can take a look at it. Unfortunately, for these types of issues I
often need the vmlinux file to be able to make sense of the unwinder
dump. So if you happen to have somewhere to copy the vmlinux to, that
would be helpful. Or if you give me your GCC version I can try to
rebuild it locally.
--
Josh
* Re: [PATCH v2 net-next 2/2] bpf/verifier: improve disassembly of BPF_NEG instructions
From: Alexei Starovoitov @ 2017-09-26 22:34 UTC (permalink / raw)
To: Edward Cree; +Cc: davem, netdev, daniel, ys114321
In-Reply-To: <f27b1dfe-4ea2-5737-d3d5-21cb581d1927@solarflare.com>
On Tue, Sep 26, 2017 at 04:35:29PM +0100, Edward Cree wrote:
> BPF_NEG takes only one operand, unlike the bulk of BPF_ALU[64] which are
> compound-assignments. So give it its own format in print_bpf_insn().
>
> Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
thank you for the cleanup.