Netdev List
 help / color / mirror / Atom feed
* Feature Request : iface may be allowed as datatype in all ipset
From: Akshat Kakkar @ 2018-05-30 10:03 UTC (permalink / raw)
  To: netdev

Is there a reason why iface is allowed to be paired only with net to
create an ipset?

I think with feature of skbinfo in every ipset, it should be allowed
to add iface in all ipset. As skbinfo can store tc classes, it might
make more sense if I can pin point on which outgoing interface this
class should be applied.

One direct way of doing could be adding iface in skbinfo itself, but I
dont think its a good suggestion.

So, other thing left is to have ipset storing interface too. Besides,
when I create a tc class, I create it on a known interface, so I know
beforehand on which interface this class is created. So I can easily
specify while adding entry in ipset.

^ permalink raw reply

* Re: [PATCH bpf 2/2] bpf: enforce usage of __aligned_u64 in the UAPI header
From: Eugene Syromiatnikov @ 2018-05-30 10:03 UTC (permalink / raw)
  To: Song Liu
  Cc: netdev, open list, Martin KaFai Lau, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, Jiri Olsa, Ingo Molnar,
	Lawrence Brakmo, Andrey Ignatov, Jakub Kicinski, John Fastabend,
	Dmitry V. Levin
In-Reply-To: <CAPhsuW5fVamngrqEWcsPKyr3Njjz4K5vO3o51BuWXAMw_nf9KA@mail.gmail.com>

On Tue, May 29, 2018 at 10:35:09AM -0700, Song Liu wrote:
> I think these changes are not necessary. Is it a general guidance to
> only use 64-bit aligned
> variables in UAPI headers?

Not really, but it allows avoiding most alignment issues like the one
mentioned in the previous patch and in the referenced RDMA patch.

^ permalink raw reply

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Eric Dumazet @ 2018-05-30 10:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, aring, Tariq Toukan, David Miller, netdev,
	Florian Westphal, Herbert Xu, Thomas Graf, Alexander Aring,
	Stefan Schmidt, Kirill Tkhai, moshe, Eran Ben Elisha, Rick Jones
In-Reply-To: <20180530112022.2b793051@redhat.com>

On Wed, May 30, 2018 at 5:20 AM Jesper Dangaard Brouer <brouer@redhat.com>
wrote:

> On Mon, 28 May 2018 09:09:17 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > Tariq, here are my test results : No drops for me.
> >
> > # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
> > MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
> > Socket  Message  Elapsed      Messages
> > Size    Size     Time         Okay Errors   Throughput
> > bytes   bytes    secs            #      #   10^6bits/sec
> >
> > 212992   65507   10.00      202117      0    10592.00
> > 212992           10.00           0              0.00

> Hmm... Eric the above result show that ALL your UDP packets were dropped!
> You have 0 okay messages and 0.00 Mbit/s throughput.

> It needs to look like below (test on i40e NIC):

> $ netperf -t UDP_STREAM -H fee0:cafe::1
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 ()
port 0 AF_INET6 : histogram : demo
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec

> 212992   65507   10.00      186385      0    9767.08
> 212992           10.00      186385           9767.08


> If I manually instruct ip6tables to drop all UDP packets, then I get
> what you see... so, something on your test system are likely dropping
> your UDP packets, but letting regular netperf (TCP) control
> communication through.

> # ip6tables -I INPUT -p udp -j DROP

> $ netperf -t UDP_STREAM -H fee0:cafe::1
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 ()
port 0 AF_INET6 : histogram : demo
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec

> 212992   65507   10.00      182095      0    9542.41
> 212992           10.00           0              0.00



Right you are, for some reason I copied/pasted wrong results,
after _specifically_ filling up the frags to the memory limits,
when trying to reproduce 'bad numbers '

Here are the good ones, using latest David Miller net tree. ( plus
https://patchwork.ozlabs.org/patch/922528/  but that should not matter here)

llpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
UDP_STREAM
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      216236      0    11331.89
212992           10.00      215068           11270.68


There are few drops because of the too small
/proc/sys/net/core/rmem_default  ( 212992 as seen in netperf output) for
these kind of stress.
( each 64KB datagram actually consumes half the budget ...)

^ permalink raw reply

* Re: [PATCH bpf-next] bpftool: Support sendmsg{4,6} attach types
From: Daniel Borkmann @ 2018-05-30 10:56 UTC (permalink / raw)
  To: Song Liu, Jakub Kicinski
  Cc: Andrey Ignatov, Networking, Alexei Starovoitov, Quentin Monnet,
	kernel-team
In-Reply-To: <CAPhsuW6oyRbgnXoyNtA0XM03063qQJGok6bPpO_Z4QBVgmi7=w@mail.gmail.com>

On 05/30/2018 02:12 AM, Song Liu wrote:
> On Tue, May 29, 2018 at 2:20 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> On Tue, 29 May 2018 13:29:31 -0700, Andrey Ignatov wrote:
>>> Add support for recently added BPF_CGROUP_UDP4_SENDMSG and
>>> BPF_CGROUP_UDP6_SENDMSG attach types to bpftool, update documentation
>>> and bash completion.
>>>
>>> Signed-off-by: Andrey Ignatov <rdna@fb.com>
>>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>
>>> I'm not sure about "since 4.18" in Documentation part. I can follow-up when
>>> the next kernel version is known.
>>
>> IMHO it's fine, we can follow up if Linus decides to call it something
>> else :)
>>
>> Thanks!
> 
> Acked-by: Song Liu <songliubraving@fb.com>

Applied to bpf-next, thanks guys!

^ permalink raw reply

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Eric Dumazet @ 2018-05-30 10:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, aring, Tariq Toukan, David Miller, netdev,
	Florian Westphal, Herbert Xu, Thomas Graf, Alexander Aring,
	Stefan Schmidt, Kirill Tkhai, moshe, Eran Ben Elisha, Rick Jones
In-Reply-To: <CANn89iL73WTtF7P477tJOZcbDsg3U7Py7ykA9xdipcahtJKNNA@mail.gmail.com>

On Wed, May 30, 2018 at 6:36 AM Eric Dumazet <edumazet@google.com> wrote:


> Here are the good ones, using latest David Miller net tree. ( plus
> https://patchwork.ozlabs.org/patch/922528/  but that should not matter
here)

> llpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
> UDP_STREAM
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
> 2607:f8b0:8099:e18:: () port 0 AF_INET6
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec

> 212992   65507   10.00      216236      0    11331.89
> 212992           10.00      215068           11270.68


> There are few drops because of the too small
> /proc/sys/net/core/rmem_default  ( 212992 as seen in netperf output) for
> these kind of stress.
> ( each 64KB datagram actually consumes half the budget ...)


Once rmem_default is set to 1,000,000 and  mtu set back to 1500 (instead of
5102 on my testbed)
results are indeed better.

lpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
UDP_STREAM -l 10
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      231457      0    12129.56
1000000           10.00      231457           12129.56

^ permalink raw reply

* Re: [PATCH v5 0/3] IR decoding using BPF
From: Daniel Borkmann @ 2018-05-30 10:57 UTC (permalink / raw)
  To: Sean Young, linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, netdev, Matthias Reichl, Devin Heitmueller,
	Y Song, Quentin Monnet
In-Reply-To: <cover.1527419762.git.sean@mess.org>

On 05/27/2018 01:24 PM, Sean Young wrote:
> The kernel IR decoders (drivers/media/rc/ir-*-decoder.c) support the most
> widely used IR protocols, but there are many protocols which are not
> supported[1]. For example, the lirc-remotes[2] repo has over 2700 remotes,
> many of which are not supported by rc-core. There is a "long tail" of
> unsupported IR protocols, for which lircd is need to decode the IR .
> 
> IR encoding is done in such a way that some simple circuit can decode it;
> therefore, bpf is ideal.
> 
> In order to support all these protocols, here we have bpf based IR decoding.
> The idea is that user-space can define a decoder in bpf, attach it to
> the rc device through the lirc chardev.
> 
> Separate work is underway to extend ir-keytable to have an extensive library
> of bpf-based decoders, and a much expanded library of rc keymaps.
> 
> Another future application would be to compile IRP[3] to a IR BPF program, and
> so support virtually every remote without having to write a decoder for each.
> It might also be possible to support non-button devices such as analog
> directional pads or air conditioning remote controls and decode the target
> temperature in bpf, and pass that to an input device.
> 
> Thanks,
> 
> Sean Young
> 
> [1] http://www.hifi-remote.com/wiki/index.php?title=DecodeIR
> [2] https://sourceforge.net/p/lirc-remotes/code/ci/master/tree/remotes/
> [3] http://www.hifi-remote.com/wiki/index.php?title=IRP_Notation
> 
> Changes since v4:
>  - Renamed rc_dev_bpf_{attach,detach,query} to lirc_bpf_{attach,detach,query}
>  - Fixed error path in lirc_bpf_query
>  - Rebased on bpf-next
> 
> Changes since v3:
>  - Implemented review comments from Quentin Monnet and Y Song (thanks!)
>  - More helpful and better formatted bpf helper documentation
>  - Changed back to bpf_prog_array rather than open-coded implementation
>  - scancodes can be 64 bit
>  - bpf gets passed values in microseconds, not nanoseconds.
>    microseconds is more than than enough (IR receivers support carriers upto
>    70kHz, at which point a single period is already 14 microseconds). Also,
>    this makes it much more consistent with lirc mode2.
>  - Since it looks much more like lirc mode2, rename the program type to
>    BPF_PROG_TYPE_LIRC_MODE2.
>  - Rebased on bpf-next
> 
> Changes since v2:
>  - Fixed locking issues
>  - Improved self-test to cover more cases
>  - Rebased on bpf-next again
> 
> Changes since v1:
>  - Code review comments from Y Song <ys114321@gmail.com> and
>    Randy Dunlap <rdunlap@infradead.org>
>  - Re-wrote sample bpf to be selftest
>  - Renamed RAWIR_DECODER -> RAWIR_EVENT (Kconfig, context, bpf prog type)
>  - Rebase on bpf-next
>  - Introduced bpf_rawir_event context structure with simpler access checking

Applied to bpf-next, thanks Sean!

^ permalink raw reply

* Re: [PATCH bpf-next v7 3/6] bpf: Add IPv6 Segment Routing helpers
From: Daniel Borkmann @ 2018-05-30 11:00 UTC (permalink / raw)
  To: Mathieu Xhonneux, netdev; +Cc: dlebrun, alexei.starovoitov
In-Reply-To: <d6833d31-4481-9595-ce26-d93ff35f411a@iogearbox.net>

On 05/24/2018 12:18 PM, Daniel Borkmann wrote:
> On 05/20/2018 03:58 PM, Mathieu Xhonneux wrote:
>> The BPF seg6local hook should be powerful enough to enable users to
>> implement most of the use-cases one could think of. After some thinking,
>> we figured out that the following actions should be possible on a SRv6
>> packet, requiring 3 specific helpers :
>>     - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
>>     - bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
>>                                (to add/delete TLVs)
>>     - bpf_lwt_seg6_action: Apply some SRv6 network programming actions
>>                            (specifically End.X, End.T, End.B6 and
>>                             End.B6.Encap)
>>
>> The specifications of these helpers are provided in the patch (see
>> include/uapi/linux/bpf.h).
>>
>> The non-sensitive fields of the SRH are the following : flags, tag and
>> TLVs. The other fields can not be modified, to maintain the SRH
>> integrity. Flags, tag and TLVs can easily be modified as their validity
>> can be checked afterwards via seg6_validate_srh. It is not allowed to
>> modify the segments directly. If one wants to add segments on the path,
>> he should stack a new SRH using the End.B6 action via
>> bpf_lwt_seg6_action.
>>
>> Growing, shrinking or editing TLVs via the helpers will flag the SRH as
>> invalid, and it will have to be re-validated before re-entering the IPv6
>> layer. This flag is stored in a per-CPU buffer, along with the current
>> header length in bytes.
>>
>> Storing the SRH len in bytes in the control block is mandatory when using
>> bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
>> len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
>> boundary). When adding/deleting TLVs within the BPF program, the SRH may
>> temporary be in an invalid state where its length cannot be rounded to 8
>> bytes without remainder, hence the need to store the length in bytes
>> separately. The caller of the BPF program can then ensure that the SRH's
>> final length is valid using this value. Again, a final SRH modified by a
>> BPF program which doesn’t respect the 8-bytes boundary will be discarded
>> as it will be considered as invalid.
>>
>> Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
>> available from the LWT BPF IN hook, but not from the seg6local BPF one.
>> This helper allows to encapsulate a Segment Routing Header (either with
>> a new outer IPv6 header, or by inlining it directly in the existing IPv6
>> header) into a non-SRv6 packet. This helper is required if we want to
>> offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
>> as the BPF seg6local hook only works on traffic already containing a SRH.
>> This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
>> the same purpose but with a static SRH per route.
>>
>> These helpers require CONFIG_IPV6=y (and not =m).
>>
>> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
>> Acked-by: David Lebrun <dlebrun@google.com>
> 
> One minor comments for follow-ups in here below.
> 
>> +BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
>> +	   const void *, from, u32, len)
>> +{
>> +#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
>> +	struct seg6_bpf_srh_state *srh_state =
>> +		this_cpu_ptr(&seg6_bpf_srh_states);
>> +	void *srh_tlvs, *srh_end, *ptr;
>> +	struct ipv6_sr_hdr *srh;
>> +	int srhoff = 0;
>> +
>> +	if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
>> +		return -EINVAL;
>> +
>> +	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
>> +	srh_tlvs = (void *)((char *)srh + ((srh->first_segment + 1) << 4));
>> +	srh_end = (void *)((char *)srh + sizeof(*srh) + srh_state->hdrlen);
>> +
>> +	ptr = skb->data + offset;
>> +	if (ptr >= srh_tlvs && ptr + len <= srh_end)
>> +		srh_state->valid = 0;
>> +	else if (ptr < (void *)&srh->flags ||
>> +		 ptr + len > (void *)&srh->segments)
>> +		return -EFAULT;
>> +
>> +	if (unlikely(bpf_try_make_writable(skb, offset + len)))
>> +		return -EFAULT;
>> +
>> +	memcpy(skb->data + offset, from, len);
>> +	return 0;
>> +#else /* CONFIG_IPV6_SEG6_BPF */
>> +	return -EOPNOTSUPP;
>> +#endif
>> +}
> 
> Instead of doing this inside the helper you can reject the program already
> in the lwt_*_func_proto() by returning NULL when !CONFIG_IPV6_SEG6_BPF. That
> way programs get rejected at verification time instead of runtime, so the
> user can probe availability more easily.

Mathieu, before this gets lost in archives, plan to follow-up on this one?

^ permalink raw reply

* Re: [RFC V5 PATCH 8/8] vhost: event suppression for packed ring
From: Wei Xu @ 2018-05-30 11:42 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, mst, netdev, linux-kernel, virtualization
In-Reply-To: <1527559830-8133-9-git-send-email-jasowang@redhat.com>

On Tue, May 29, 2018 at 10:10:30AM +0800, Jason Wang wrote:
> This patch introduces basic support for event suppression aka driver
> and device area.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c            | 191 ++++++++++++++++++++++++++++++++++++---
>  drivers/vhost/vhost.h            |  10 +-
>  include/uapi/linux/virtio_ring.h |  19 ++++
>  3 files changed, 204 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a36e5ad2..112f680 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1112,10 +1112,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
>  			       struct vring_used __user *used)
>  {
>  	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
> +	struct vring_packed_desc_event *driver_event =
> +		(struct vring_packed_desc_event *)avail;
> +	struct vring_packed_desc_event *device_event =
> +		(struct vring_packed_desc_event *)used;
>  
> -	/* FIXME: check device area and driver area */
>  	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
> -	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
> +	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed)) &&
> +	       access_ok(VERIFY_READ, driver_event, sizeof(*driver_event)) &&
> +	       access_ok(VERIFY_WRITE, device_event, sizeof(*device_event));
>  }
>  
>  static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
> @@ -1190,14 +1195,27 @@ static bool iotlb_access_ok(struct vhost_virtqueue *vq,
>  	return true;
>  }
>  
> -int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +int vq_iotlb_prefetch_packed(struct vhost_virtqueue *vq)
> +{
> +	int num = vq->num;
> +
> +	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
> +			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->desc,
> +			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_RO,
> +			       (u64)(uintptr_t)vq->driver_event,
> +			       sizeof(*vq->driver_event), VHOST_ADDR_AVAIL) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_WO,
> +			       (u64)(uintptr_t)vq->device_event,
> +			       sizeof(*vq->device_event), VHOST_ADDR_USED);
> +}
> +
> +int vq_iotlb_prefetch_split(struct vhost_virtqueue *vq)
>  {
>  	size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
>  	unsigned int num = vq->num;
>  
> -	if (!vq->iotlb)
> -		return 1;
> -
>  	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
>  			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
>  	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
> @@ -1209,6 +1227,17 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
>  			       num * sizeof(*vq->used->ring) + s,
>  			       VHOST_ADDR_USED);
>  }
> +
> +int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +{
> +	if (!vq->iotlb)
> +		return 1;
> +
> +	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +		return vq_iotlb_prefetch_packed(vq);
> +	else
> +		return vq_iotlb_prefetch_split(vq);
> +}
>  EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
>  
>  /* Can we log writes? */
> @@ -1730,6 +1759,50 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
>  	return 0;
>  }
>  
> +static int vhost_update_device_flags(struct vhost_virtqueue *vq,
> +				     __virtio16 device_flags)
> +{
> +	void __user *flags;
> +
> +	if (vhost_put_user(vq, device_flags, &vq->device_event->flags,
> +			   VHOST_ADDR_USED) < 0)
> +		return -EFAULT;
> +	if (unlikely(vq->log_used)) {
> +		/* Make sure the flag is seen before log. */
> +		smp_wmb();
> +		/* Log used flag write. */
> +		flags = &vq->device_event->flags;
> +		log_write(vq->log_base, vq->log_addr +
> +			  (flags - (void __user *)vq->device_event),
> +			  sizeof(vq->device_event->flags));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +
> +static int vhost_update_device_off_wrap(struct vhost_virtqueue *vq,
> +					__virtio16 device_off_wrap)
> +{
> +	void __user *off_wrap;
> +
> +	if (vhost_put_user(vq, device_off_wrap, &vq->device_event->off_wrap,
> +			   VHOST_ADDR_USED) < 0)
> +		return -EFAULT;
> +	if (unlikely(vq->log_used)) {
> +		/* Make sure the flag is seen before log. */
> +		smp_wmb();
> +		/* Log used flag write. */
> +		off_wrap = &vq->device_event->off_wrap;
> +		log_write(vq->log_base, vq->log_addr +
> +			  (off_wrap - (void __user *)vq->device_event),
> +			  sizeof(vq->device_event->off_wrap));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +
>  static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
>  {
>  	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
> @@ -2683,16 +2756,13 @@ int vhost_add_used_n(struct vhost_virtqueue *vq,
>  }
>  EXPORT_SYMBOL_GPL(vhost_add_used_n);
>  
> -static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +static bool vhost_notify_split(struct vhost_dev *dev,
> +			       struct vhost_virtqueue *vq)
>  {
>  	__u16 old, new;
>  	__virtio16 event;
>  	bool v;
>  
> -	/* FIXME: check driver area */
> -	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> -		return true;
> -
>  	/* Flush out used index updates. This is paired
>  	 * with the barrier that the Guest executes when enabling
>  	 * interrupts. */
> @@ -2725,6 +2795,64 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  	return vring_need_event(vhost16_to_cpu(vq, event), new, old);
>  }
>  
> +static bool vhost_notify_packed(struct vhost_dev *dev,
> +				struct vhost_virtqueue *vq)
> +{
> +	__virtio16 event_off_wrap, event_flags;
> +	__u16 old, new, off_wrap;
> +	bool v;
> +
> +	/* Flush out used descriptors updates. This is paired
> +	 * with the barrier that the Guest executes when enabling
> +	 * interrupts.
> +	 */
> +	smp_mb();
> +
> +	if (vhost_get_avail(vq, event_flags,
> +			   &vq->driver_event->flags) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_flags");
> +		return true;
> +	}
> +
> +	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX))
> +		return event_flags !=
> +		       cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> +	old = vq->signalled_used;
> +	v = vq->signalled_used_valid;
> +	new = vq->signalled_used = vq->last_used_idx;
> +	vq->signalled_used_valid = true;
> +
> +	if (event_flags != cpu_to_vhost16(vq, RING_EVENT_FLAGS_DESC))
> +		return event_flags !=
> +		       cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> +	/* Read desc event flags before event_off and event_wrap */
> +	smp_rmb();
> +
> +	if (vhost_get_avail(vq, event_off_wrap,
> +			    &vq->driver_event->off_wrap) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_off/wrap");
> +		return true;
> +	}
> +
> +	off_wrap = vhost16_to_cpu(vq, event_off_wrap);
> +
> +	if (unlikely(!v))
> +		return true;
> +
> +	return vhost_vring_packed_need_event(vq, vq->used_wrap_counter,
> +					     off_wrap, new, old);
> +}
> +
> +static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +{
> +	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +		return vhost_notify_packed(dev, vq);
> +	else
> +		return vhost_notify_split(dev, vq);
> +}
> +
>  /* This actually signals the guest, using eventfd. */
>  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  {
> @@ -2802,10 +2930,34 @@ static bool vhost_enable_notify_packed(struct vhost_dev *dev,
>  				       struct vhost_virtqueue *vq)
>  {
>  	struct vring_desc_packed *d = vq->desc_packed + vq->avail_idx;
> -	__virtio16 flags;
> +	__virtio16 flags = RING_EVENT_FLAGS_ENABLE;
>  	int ret;
>  
> -	/* FIXME: disable notification through device area */
> +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
> +		return false;
> +	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;

'used_flags' was originally designed for 1.0, why should we pay attetion to it here?

Wei
> +
> +	if (vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
> +		__virtio16 off_wrap = cpu_to_vhost16(vq, vq->avail_idx |
> +				      vq->avail_wrap_counter << 15);
> +
> +		ret = vhost_update_device_off_wrap(vq, off_wrap);
> +		if (ret) {
> +			vq_err(vq, "Failed to write to off warp at %p: %d\n",
> +			       &vq->device_event->off_wrap, ret);
> +			return false;
> +		}
> +		/* Make sure off_wrap is wrote before flags */
> +		smp_wmb();
> +		flags = RING_EVENT_FLAGS_DESC;
> +	}
> +
> +	ret = vhost_update_device_flags(vq, flags);
> +	if (ret) {
> +		vq_err(vq, "Failed to enable notification at %p: %d\n",
> +			&vq->device_event->flags, ret);
> +		return false;
> +	}
>  
>  	/* They could have slipped one in as we were doing that: make
>  	 * sure it's written, then check again. */
> @@ -2871,7 +3023,18 @@ EXPORT_SYMBOL_GPL(vhost_enable_notify);
>  static void vhost_disable_notify_packed(struct vhost_dev *dev,
>  					struct vhost_virtqueue *vq)
>  {
> -	/* FIXME: disable notification through device area */
> +	__virtio16 flags;
> +	int r;
> +
> +	if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
> +		return;
> +	vq->used_flags |= VRING_USED_F_NO_NOTIFY;
> +
> +	flags = cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +	r = vhost_update_device_flags(vq, flags);
> +	if (r)
> +		vq_err(vq, "Failed to enable notification at %p: %d\n",
> +		       &vq->device_event->flags, r);
>  }
>  
>  static void vhost_disable_notify_split(struct vhost_dev *dev,
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 7543a46..b920582 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -96,8 +96,14 @@ struct vhost_virtqueue {
>  		struct vring_desc __user *desc;
>  		struct vring_desc_packed __user *desc_packed;
>  	};
> -	struct vring_avail __user *avail;
> -	struct vring_used __user *used;
> +	union {
> +		struct vring_avail __user *avail;
> +		struct vring_packed_desc_event __user *driver_event;
> +	};
> +	union {
> +		struct vring_used __user *used;
> +		struct vring_packed_desc_event __user *device_event;
> +	};
>  	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
>  	struct file *kick;
>  	struct eventfd_ctx *call_ctx;
> diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> index e297580..71c7a46 100644
> --- a/include/uapi/linux/virtio_ring.h
> +++ b/include/uapi/linux/virtio_ring.h
> @@ -75,6 +75,25 @@ struct vring_desc_packed {
>  	__virtio16 flags;
>  };
>  
> +/* Enable events */
> +#define RING_EVENT_FLAGS_ENABLE 0x0
> +/* Disable events */
> +#define RING_EVENT_FLAGS_DISABLE 0x1
> +/*
> + * Enable events for a specific descriptor
> + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> + * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated.
> + */
> +#define RING_EVENT_FLAGS_DESC 0x2
> +/* The value 0x3 is reserved */
> +
> +struct vring_packed_desc_event {
> +	/* Descriptor Ring Change Event Offset and Wrap Counter */
> +	__virtio16 off_wrap;
> +	/* Descriptor Ring Change Event Flags */
> +	__virtio16 flags;
> +};
> +
>  /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
>  struct vring_desc {
>  	/* Address (guest-physical). */
> -- 
> 2.7.4
> 

^ permalink raw reply

* [PATCH net-next] cxgb4: Add FORCE_PAUSE bit to 32 bit port caps
From: Ganesh Goudar @ 2018-05-30 11:45 UTC (permalink / raw)
  To: netdev, davem
  Cc: nirranjan, indranil, Ganesh Goudar, Santosh Rastapur,
	Casey Leedom

Add FORCE_PAUSE bit to force local pause settings instead
of using auto negotiated values.

Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c    | 10 +++++++++-
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h |  5 +++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 39da7e3..974a868 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -3941,6 +3941,7 @@ static fw_port_cap32_t fwcaps16_to_caps32(fw_port_cap16_t caps16)
 	CAP16_TO_CAP32(FC_RX);
 	CAP16_TO_CAP32(FC_TX);
 	CAP16_TO_CAP32(ANEG);
+	CAP16_TO_CAP32(FORCE_PAUSE);
 	CAP16_TO_CAP32(MDIAUTO);
 	CAP16_TO_CAP32(MDISTRAIGHT);
 	CAP16_TO_CAP32(FEC_RS);
@@ -3982,6 +3983,7 @@ static fw_port_cap16_t fwcaps32_to_caps16(fw_port_cap32_t caps32)
 	CAP32_TO_CAP16(802_3_PAUSE);
 	CAP32_TO_CAP16(802_3_ASM_DIR);
 	CAP32_TO_CAP16(ANEG);
+	CAP32_TO_CAP16(FORCE_PAUSE);
 	CAP32_TO_CAP16(MDIAUTO);
 	CAP32_TO_CAP16(MDISTRAIGHT);
 	CAP32_TO_CAP16(FEC_RS);
@@ -4014,6 +4016,8 @@ static inline fw_port_cap32_t cc_to_fwcap_pause(enum cc_pause cc_pause)
 		fw_pause |= FW_PORT_CAP32_FC_RX;
 	if (cc_pause & PAUSE_TX)
 		fw_pause |= FW_PORT_CAP32_FC_TX;
+	if (!(cc_pause & PAUSE_AUTONEG))
+		fw_pause |= FW_PORT_CAP32_FORCE_PAUSE;
 
 	return fw_pause;
 }
@@ -4101,7 +4105,11 @@ int t4_link_l1cfg_core(struct adapter *adapter, unsigned int mbox,
 		rcap = lc->acaps | fw_fc | fw_fec | fw_mdi;
 	}
 
-	if (rcap & ~lc->pcaps) {
+	/* Note that older Firmware doesn't have FW_PORT_CAP32_FORCE_PAUSE, so
+	 * we need to exclude this from this check in order to maintain
+	 * compatibility ...
+	 */
+	if ((rcap & ~lc->pcaps) & ~FW_PORT_CAP32_FORCE_PAUSE) {
 		dev_err(adapter->pdev_dev,
 			"Requested Port Capabilities %#x exceed Physical Port Capabilities %#x\n",
 			rcap, lc->pcaps);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index 2d91480..f1967cf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -2475,7 +2475,7 @@ enum fw_port_cap {
 	FW_PORT_CAP_MDISTRAIGHT		= 0x0400,
 	FW_PORT_CAP_FEC_RS		= 0x0800,
 	FW_PORT_CAP_FEC_BASER_RS	= 0x1000,
-	FW_PORT_CAP_FEC_RESERVED	= 0x2000,
+	FW_PORT_CAP_FORCE_PAUSE		= 0x2000,
 	FW_PORT_CAP_802_3_PAUSE		= 0x4000,
 	FW_PORT_CAP_802_3_ASM_DIR	= 0x8000,
 };
@@ -2522,7 +2522,8 @@ enum fw_port_mdi {
 #define	FW_PORT_CAP32_FEC_RESERVED1	0x02000000UL
 #define	FW_PORT_CAP32_FEC_RESERVED2	0x04000000UL
 #define	FW_PORT_CAP32_FEC_RESERVED3	0x08000000UL
-#define	FW_PORT_CAP32_RESERVED2		0xf0000000UL
+#define FW_PORT_CAP32_FORCE_PAUSE	0x10000000UL
+#define FW_PORT_CAP32_RESERVED2		0xe0000000UL
 
 #define FW_PORT_CAP32_SPEED_S	0
 #define FW_PORT_CAP32_SPEED_M	0xfff
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH net-next] net: qcom/emac: fix unused variable
From: Timur Tabi @ 2018-05-30 12:10 UTC (permalink / raw)
  To: YueHaibing, davem; +Cc: netdev, linux-kernel
In-Reply-To: <20180529104343.19448-1-yuehaibing@huawei.com>

On 5/29/18 5:43 AM, YueHaibing wrote:
> When CONFIG_ACPI isn't set, variable qdf2400_ops/qdf2432_ops isn't used.
> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:284:25: warning: ‘qdf2400_ops’ defined but not used [-Wunused-variable]
>   static struct sgmii_ops qdf2400_ops = {
>                           ^~~~~~~~~~~
> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:276:25: warning: ‘qdf2432_ops’ defined but not used [-Wunused-variable]
>   static struct sgmii_ops qdf2432_ops = {
>                           ^~~~~~~~~~~
> 
> Move the declaration and functions inside the CONFIG_ACPI ifdef
> to fix the warning.
> Signed-off-by: YueHaibing<yuehaibing@huawei.com>

I already fixed this with:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d377df784178bf5b0a39e75dc8b1ee86e1abb3f6

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* [PATCH] rtnetlink: Add more well known protocol values
From: Donald Sharp @ 2018-05-30 12:27 UTC (permalink / raw)
  To: netdev, dsahern

FRRouting installs routes into the kernel associated with
the originating protocol.  Add these values to the well
known values in rtnetlink.h.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
---
v2: Fixed whitespace issues
 include/uapi/linux/rtnetlink.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cabb210c93af..7d8502313c99 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -254,6 +254,11 @@ enum {
 #define RTPROT_DHCP	16      /* DHCP client */
 #define RTPROT_MROUTED	17      /* Multicast daemon */
 #define RTPROT_BABEL	42      /* Babel daemon */
+#define RTPROT_BGP	186     /* BGP Routes */
+#define RTPROT_ISIS	187     /* ISIS Routes */
+#define RTPROT_OSPF	188     /* OSPF Routes */
+#define RTPROT_RIP	189     /* RIP Routes */
+#define RTPROT_EIGRP	192     /* EIGRP Routes */
 
 /* rtm_scope
 
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH mlx5-next v2 11/13] IB/mlx5: Add flow counters binding support
From: Yishai Hadas @ 2018-05-30 12:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Boris Pismenny, Matan Barak, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev, Alex Rosenbaum
In-Reply-To: <20180529195627.GA31423@ziepe.ca>

On 5/29/2018 10:56 PM, Jason Gunthorpe wrote:
> On Tue, May 29, 2018 at 04:09:15PM +0300, Leon Romanovsky wrote:
>> diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
>> index 508ea8c82da7..ef3f430a7050 100644
>> +++ b/include/uapi/rdma/mlx5-abi.h
>> @@ -443,4 +443,18 @@ enum {
>>   enum {
>>   	MLX5_IB_CLOCK_INFO_V1              = 0,
>>   };
>> +
>> +struct mlx5_ib_flow_counters_data {
>> +	__aligned_u64   counters_data;
>> +	__u32   ncounters;
>> +	__u32   reserved;
>> +};
>> +
>> +struct mlx5_ib_create_flow {
>> +	__u32   ncounters_data;
>> +	__u32   reserved;
>> +	/* Following are counters data based on ncounters_data */
>> +	struct mlx5_ib_flow_counters_data data[];
>> +};
>> +
>>   #endif /* MLX5_ABI_USER_H */
> 
> This uapi thing still needs to be fixed as I pointed out before.

In V3 we can go with below, no change in memory layout but it can 
clarify the code/usage.

struct mlx5_ib_flow_counters_desc {
         __u32   description;
         __u32   index;
};

struct mlx5_ib_flow_counters_data {
         RDMA_UAPI_PTR(struct mlx5_ib_flow_counters_desc *, counters_data);
         __u32   ncounters;
         __u32   reserved;
};

struct mlx5_ib_create_flow {
         __u32   ncounters_data;
         __u32   reserved;
         /* Following are counters data based on ncounters_data */
         struct mlx5_ib_flow_counters_data data[];


> I still can't figure out why this should be a 2d array.

This comes to support the future case of multiple counters objects/specs 
passed with the same flow. There is a need to differentiate mapping data 
for each counters object and that is done via the 'ncounters_data' field 
and the 2d array.

  I think it
> should be written simply as:
> 
> struct mlx5_ib_flow_counter_desc {
>          __u32 description;
>          __u32 index;
> };
> 
> struct mlx5_ib_create_flow {
> 	RDMA_UAPI_PTR(struct mlx5_ib_flow_counter_desc, counters_data);
> 	__u32   ncounters;
> 	__u32   reserved;
> };
> 
> With the corresponding changes elsewhere.
> 

This doesn't support the above use case.

> A flex array at the end of a struct means that the struct can never be
> extended again which seems like a terrible idea,

The header [1] has a fixed size and will always exist even if there will 
be no counters. Future extensions [2] will be added in the memory post 
the flex array which its size depends on 'ncounters_data'. This pattern 
is used also in other extended APIs. [3]

struct mlx5_ib_create_flow {
         __u32   ncounters_data;
         __u32   reserved;
[1] /* Header is above ********

         /* Following are counters data based on ncounters_data */
         struct mlx5_ib_flow_counters_data data[];

[2] Future fields.

[3] 
https://elixir.bootlin.com/linux/latest/source/include/uapi/rdma/ib_user_verbs.h#L1145

^ permalink raw reply

* Re: [PATCH] rtnetlink: Add more well known protocol values
From: Donald Sharp @ 2018-05-30 12:32 UTC (permalink / raw)
  To: netdev, David Ahern
In-Reply-To: <20180530122732.3688-1-sharpd@cumulusnetworks.com>

This patch is intended for net-next.

thanks!

donald

On Wed, May 30, 2018 at 8:27 AM, Donald Sharp
<sharpd@cumulusnetworks.com> wrote:
> FRRouting installs routes into the kernel associated with
> the originating protocol.  Add these values to the well
> known values in rtnetlink.h.
>
> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
> ---
> v2: Fixed whitespace issues
>  include/uapi/linux/rtnetlink.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index cabb210c93af..7d8502313c99 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -254,6 +254,11 @@ enum {
>  #define RTPROT_DHCP    16      /* DHCP client */
>  #define RTPROT_MROUTED 17      /* Multicast daemon */
>  #define RTPROT_BABEL   42      /* Babel daemon */
> +#define RTPROT_BGP     186     /* BGP Routes */
> +#define RTPROT_ISIS    187     /* ISIS Routes */
> +#define RTPROT_OSPF    188     /* OSPF Routes */
> +#define RTPROT_RIP     189     /* RIP Routes */
> +#define RTPROT_EIGRP   192     /* EIGRP Routes */
>
>  /* rtm_scope
>
> --
> 2.14.3
>

^ permalink raw reply

* [PATCH net-next] qed: Add srq core support for RoCE and iWARP
From: Yuval Bason @ 2018-05-30 13:11 UTC (permalink / raw)
  To: yuval.bason, davem
  Cc: netdev, jgg, dledford, linux-rdma, Michal Kalderon, Ariel Elior

This patch adds support for configuring SRQ and provides the necessary
APIs for rdma upper layer driver (qedr) to enable the SRQ feature.

Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
---
 drivers/net/ethernet/qlogic/qed/qed_cxt.c   |   5 +-
 drivers/net/ethernet/qlogic/qed/qed_cxt.h   |   1 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h   |   2 +
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c |  23 ++++
 drivers/net/ethernet/qlogic/qed/qed_main.c  |   2 +
 drivers/net/ethernet/qlogic/qed/qed_rdma.c  | 179 +++++++++++++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_rdma.h  |   2 +
 drivers/net/ethernet/qlogic/qed/qed_roce.c  |  17 ++-
 include/linux/qed/qed_rdma_if.h             |  12 +-
 9 files changed, 235 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index 820b226..7ed6aa0 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -47,6 +47,7 @@
 #include "qed_hsi.h"
 #include "qed_hw.h"
 #include "qed_init_ops.h"
+#include "qed_rdma.h"
 #include "qed_reg_addr.h"
 #include "qed_sriov.h"
 
@@ -426,7 +427,7 @@ static void qed_cxt_set_srq_count(struct qed_hwfn *p_hwfn, u32 num_srqs)
 	p_mgr->srq_count = num_srqs;
 }
 
-static u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn)
+u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn)
 {
 	struct qed_cxt_mngr *p_mgr = p_hwfn->p_cxt_mngr;
 
@@ -2071,7 +2072,7 @@ static void qed_rdma_set_pf_params(struct qed_hwfn *p_hwfn,
 	u32 num_cons, num_qps, num_srqs;
 	enum protocol_type proto;
 
-	num_srqs = min_t(u32, 32 * 1024, p_params->num_srqs);
+	num_srqs = min_t(u32, QED_RDMA_MAX_SRQS, p_params->num_srqs);
 
 	if (p_hwfn->mcp_info->func_info.protocol == QED_PCI_ETH_RDMA) {
 		DP_NOTICE(p_hwfn,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.h b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
index a4e9586..758a8b4 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
@@ -235,6 +235,7 @@ u32 qed_cxt_get_proto_tid_count(struct qed_hwfn *p_hwfn,
 				enum protocol_type type);
 u32 qed_cxt_get_proto_cid_start(struct qed_hwfn *p_hwfn,
 				enum protocol_type type);
+u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn);
 int qed_cxt_free_proto_ilt(struct qed_hwfn *p_hwfn, enum protocol_type proto);
 
 #define QED_CTX_WORKING_MEM 0
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index 8e1e6e1..82ce401 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -9725,6 +9725,8 @@ enum iwarp_eqe_async_opcode {
 	IWARP_EVENT_TYPE_ASYNC_EXCEPTION_DETECTED,
 	IWARP_EVENT_TYPE_ASYNC_QP_IN_ERROR_STATE,
 	IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW,
+	IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY,
+	IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT,
 	MAX_IWARP_EQE_ASYNC_OPCODE
 };
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index 2a2b101..474e6cf 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -271,6 +271,8 @@ int qed_iwarp_create_qp(struct qed_hwfn *p_hwfn,
 	p_ramrod->sq_num_pages = qp->sq_num_pages;
 	p_ramrod->rq_num_pages = qp->rq_num_pages;
 
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(qp->srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(p_hwfn->hw_info.opaque_fid);
 	p_ramrod->qp_handle_for_cqe.hi = cpu_to_le32(qp->qp_handle.hi);
 	p_ramrod->qp_handle_for_cqe.lo = cpu_to_le32(qp->qp_handle.lo);
 
@@ -3004,8 +3006,11 @@ static int qed_iwarp_async_event(struct qed_hwfn *p_hwfn,
 				 union event_ring_data *data,
 				 u8 fw_return_code)
 {
+	struct qed_rdma_events events = p_hwfn->p_rdma_info->events;
 	struct regpair *fw_handle = &data->rdma_data.async_handle;
 	struct qed_iwarp_ep *ep = NULL;
+	u16 srq_offset;
+	u16 srq_id;
 	u16 cid;
 
 	ep = (struct qed_iwarp_ep *)(uintptr_t)HILO_64(fw_handle->hi,
@@ -3067,6 +3072,24 @@ static int qed_iwarp_async_event(struct qed_hwfn *p_hwfn,
 		qed_iwarp_cid_cleaned(p_hwfn, cid);
 
 		break;
+	case IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY:
+		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY\n");
+		srq_offset = p_hwfn->p_rdma_info->srq_id_offset;
+		/* FW assigns value that is no greater than u16 */
+		srq_id = ((u16)le32_to_cpu(fw_handle->lo)) - srq_offset;
+		events.affiliated_event(events.context,
+					QED_IWARP_EVENT_SRQ_EMPTY,
+					&srq_id);
+		break;
+	case IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT:
+		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT\n");
+		srq_offset = p_hwfn->p_rdma_info->srq_id_offset;
+		/* FW assigns value that is no greater than u16 */
+		srq_id = ((u16)le32_to_cpu(fw_handle->lo)) - srq_offset;
+		events.affiliated_event(events.context,
+					QED_IWARP_EVENT_SRQ_LIMIT,
+					&srq_id);
+		break;
 	case IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW:
 		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW\n");
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 68c4399..b04d57c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -64,6 +64,7 @@
 
 #define QED_ROCE_QPS			(8192)
 #define QED_ROCE_DPIS			(8)
+#define QED_RDMA_SRQS                   QED_ROCE_QPS
 
 static char version[] =
 	"QLogic FastLinQ 4xxxx Core Module qed " DRV_MODULE_VERSION "\n";
@@ -922,6 +923,7 @@ static void qed_update_pf_params(struct qed_dev *cdev,
 	if (IS_ENABLED(CONFIG_QED_RDMA)) {
 		params->rdma_pf_params.num_qps = QED_ROCE_QPS;
 		params->rdma_pf_params.min_dpis = QED_ROCE_DPIS;
+		params->rdma_pf_params.num_srqs = QED_RDMA_SRQS;
 		/* divide by 3 the MRs to avoid MF ILT overflow */
 		params->rdma_pf_params.gl_pi = QED_ROCE_PROTOCOL_INDEX;
 	}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index a411f9c..bd23659 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -259,15 +259,29 @@ static int qed_rdma_alloc(struct qed_hwfn *p_hwfn,
 		goto free_cid_map;
 	}
 
+	/* Allocate bitmap for srqs */
+	p_rdma_info->num_srqs = qed_cxt_get_srq_count(p_hwfn);
+	rc = qed_rdma_bmap_alloc(p_hwfn, &p_rdma_info->srq_map,
+				 p_rdma_info->num_srqs, "SRQ");
+	if (rc) {
+		DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+			   "Failed to allocate srq bitmap, rc = %d\n", rc);
+		goto free_real_cid_map;
+	}
+
 	if (QED_IS_IWARP_PERSONALITY(p_hwfn))
 		rc = qed_iwarp_alloc(p_hwfn);
 
 	if (rc)
-		goto free_cid_map;
+		goto free_srq_map;
 
 	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "Allocation successful\n");
 	return 0;
 
+free_srq_map:
+	kfree(p_rdma_info->srq_map.bitmap);
+free_real_cid_map:
+	kfree(p_rdma_info->real_cid_map.bitmap);
 free_cid_map:
 	kfree(p_rdma_info->cid_map.bitmap);
 free_tid_map:
@@ -351,6 +365,8 @@ static void qed_rdma_resc_free(struct qed_hwfn *p_hwfn)
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->cq_map, 1);
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->toggle_bits, 0);
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->tid_map, 1);
+	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->srq_map, 1);
+	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->real_cid_map, 1);
 
 	kfree(p_rdma_info->port);
 	kfree(p_rdma_info->dev);
@@ -431,6 +447,12 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
 	if (cdev->rdma_max_sge)
 		dev->max_sge = min_t(u32, cdev->rdma_max_sge, dev->max_sge);
 
+	dev->max_srq_sge = QED_RDMA_MAX_SGE_PER_SRQ_WQE;
+	if (p_hwfn->cdev->rdma_max_srq_sge) {
+		dev->max_srq_sge = min_t(u32,
+					 p_hwfn->cdev->rdma_max_srq_sge,
+					 dev->max_srq_sge);
+	}
 	dev->max_inline = ROCE_REQ_MAX_INLINE_DATA_SIZE;
 
 	dev->max_inline = (cdev->rdma_max_inline) ?
@@ -474,6 +496,8 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
 	dev->max_mr_mw_fmr_size = dev->max_mr_mw_fmr_pbl * PAGE_SIZE;
 	dev->max_pkey = QED_RDMA_MAX_P_KEY;
 
+	dev->max_srq = p_hwfn->p_rdma_info->num_srqs;
+	dev->max_srq_wr = QED_RDMA_MAX_SRQ_WQE_ELEM;
 	dev->max_qp_resp_rd_atomic_resc = RDMA_RING_PAGE_SIZE /
 					  (RDMA_RESP_RD_ATOMIC_ELM_SIZE * 2);
 	dev->max_qp_req_rd_atomic_resc = RDMA_RING_PAGE_SIZE /
@@ -1628,6 +1652,156 @@ static void *qed_rdma_get_rdma_ctx(struct qed_dev *cdev)
 	return QED_LEADING_HWFN(cdev);
 }
 
+int qed_rdma_modify_srq(void *rdma_cxt,
+			struct qed_rdma_modify_srq_in_params *in_params)
+{
+	struct rdma_srq_modify_ramrod_data *p_ramrod;
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_sp_init_data init_data;
+	struct qed_spq_entry *p_ent;
+	u16 opaque_fid;
+	int rc;
+
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_MODIFY_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.rdma_modify_srq;
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(in_params->srq_id);
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+	p_ramrod->wqe_limit = cpu_to_le16(in_params->wqe_limit);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		return rc;
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "modified SRQ id = %x",
+		   in_params->srq_id);
+
+	return rc;
+}
+
+int qed_rdma_destroy_srq(void *rdma_cxt,
+			 struct qed_rdma_destroy_srq_in_params *in_params)
+{
+	struct rdma_srq_destroy_ramrod_data *p_ramrod;
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_sp_init_data init_data;
+	struct qed_spq_entry *p_ent;
+	struct qed_bmap *bmap;
+	u16 opaque_fid;
+	int rc;
+
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.opaque_fid = opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_DESTROY_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.rdma_destroy_srq;
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(in_params->srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		return rc;
+
+	bmap = &p_hwfn->p_rdma_info->srq_map;
+
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	qed_bmap_release_id(p_hwfn, bmap, in_params->srq_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "SRQ destroyed Id = %x",
+		   in_params->srq_id);
+
+	return rc;
+}
+
+int qed_rdma_create_srq(void *rdma_cxt,
+			struct qed_rdma_create_srq_in_params *in_params,
+			struct qed_rdma_create_srq_out_params *out_params)
+{
+	struct rdma_srq_create_ramrod_data *p_ramrod;
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_sp_init_data init_data;
+	enum qed_cxt_elem_type elem_type;
+	struct qed_spq_entry *p_ent;
+	u16 opaque_fid, srq_id;
+	struct qed_bmap *bmap;
+	u32 returned_id;
+	int rc;
+
+	bmap = &p_hwfn->p_rdma_info->srq_map;
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	rc = qed_rdma_bmap_alloc_id(p_hwfn, bmap, &returned_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	if (rc) {
+		DP_NOTICE(p_hwfn, "failed to allocate srq id\n");
+		return rc;
+	}
+
+	elem_type = QED_ELEM_SRQ;
+	rc = qed_cxt_dynamic_ilt_alloc(p_hwfn, elem_type, returned_id);
+	if (rc)
+		goto err;
+	/* returned id is no greater than u16 */
+	srq_id = (u16)returned_id;
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+
+	memset(&init_data, 0, sizeof(init_data));
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.opaque_fid = opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_CREATE_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		goto err;
+
+	p_ramrod = &p_ent->ramrod.rdma_create_srq;
+	DMA_REGPAIR_LE(p_ramrod->pbl_base_addr, in_params->pbl_base_addr);
+	p_ramrod->pages_in_srq_pbl = cpu_to_le16(in_params->num_pages);
+	p_ramrod->pd_id = cpu_to_le16(in_params->pd_id);
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+	p_ramrod->page_size = cpu_to_le16(in_params->page_size);
+	DMA_REGPAIR_LE(p_ramrod->producers_addr, in_params->prod_pair_addr);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		goto err;
+
+	out_params->srq_id = srq_id;
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+		   "SRQ created Id = %x\n", out_params->srq_id);
+
+	return rc;
+
+err:
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	qed_bmap_release_id(p_hwfn, bmap, returned_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	return rc;
+}
+
 bool qed_rdma_allocated_qps(struct qed_hwfn *p_hwfn)
 {
 	bool result;
@@ -1773,6 +1947,9 @@ static int qed_roce_ll2_set_mac_filter(struct qed_dev *cdev,
 	.rdma_free_tid = &qed_rdma_free_tid,
 	.rdma_register_tid = &qed_rdma_register_tid,
 	.rdma_deregister_tid = &qed_rdma_deregister_tid,
+	.rdma_create_srq = &qed_rdma_create_srq,
+	.rdma_modify_srq = &qed_rdma_modify_srq,
+	.rdma_destroy_srq = &qed_rdma_destroy_srq,
 	.ll2_acquire_connection = &qed_ll2_acquire_connection,
 	.ll2_establish_connection = &qed_ll2_establish_connection,
 	.ll2_terminate_connection = &qed_ll2_terminate_connection,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.h b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
index 18ec9cb..6f722ee 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
@@ -96,6 +96,8 @@ struct qed_rdma_info {
 	u8 num_cnqs;
 	u32 num_qps;
 	u32 num_mrs;
+	u32 num_srqs;
+	u16 srq_id_offset;
 	u16 queue_zone_base;
 	u16 max_queue_zones;
 	enum protocol_type proto;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 6acfd43..ee57fcd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -65,6 +65,8 @@
 		     u8 fw_event_code,
 		     u16 echo, union event_ring_data *data, u8 fw_return_code)
 {
+	struct qed_rdma_events events = p_hwfn->p_rdma_info->events;
+
 	if (fw_event_code == ROCE_ASYNC_EVENT_DESTROY_QP_DONE) {
 		u16 icid =
 		    (u16)le32_to_cpu(data->rdma_data.rdma_destroy_qp_data.cid);
@@ -75,11 +77,18 @@
 		 */
 		qed_roce_free_real_icid(p_hwfn, icid);
 	} else {
-		struct qed_rdma_events *events = &p_hwfn->p_rdma_info->events;
+		if (fw_event_code == ROCE_ASYNC_EVENT_SRQ_EMPTY ||
+		    fw_event_code == ROCE_ASYNC_EVENT_SRQ_LIMIT) {
+			u16 srq_id = (u16)data->rdma_data.async_handle.lo;
+
+			events.affiliated_event(events.context, fw_event_code,
+						&srq_id);
+		} else {
+			union rdma_eqe_data rdata = data->rdma_data;
 
-		events->affiliated_event(p_hwfn->p_rdma_info->events.context,
-					 fw_event_code,
-				     (void *)&data->rdma_data.async_handle);
+			events.affiliated_event(events.context, fw_event_code,
+						(void *)&rdata.async_handle);
+		}
 	}
 
 	return 0;
diff --git a/include/linux/qed/qed_rdma_if.h b/include/linux/qed/qed_rdma_if.h
index 4dd72ba..e05e320 100644
--- a/include/linux/qed/qed_rdma_if.h
+++ b/include/linux/qed/qed_rdma_if.h
@@ -485,7 +485,9 @@ enum qed_iwarp_event_type {
 	QED_IWARP_EVENT_ACTIVE_MPA_REPLY,
 	QED_IWARP_EVENT_LOCAL_ACCESS_ERROR,
 	QED_IWARP_EVENT_REMOTE_OPERATION_ERROR,
-	QED_IWARP_EVENT_TERMINATE_RECEIVED
+	QED_IWARP_EVENT_TERMINATE_RECEIVED,
+	QED_IWARP_EVENT_SRQ_LIMIT,
+	QED_IWARP_EVENT_SRQ_EMPTY,
 };
 
 enum qed_tcp_ip_version {
@@ -646,6 +648,14 @@ struct qed_rdma_ops {
 	int (*rdma_alloc_tid)(void *rdma_cxt, u32 *itid);
 	void (*rdma_free_tid)(void *rdma_cxt, u32 itid);
 
+	int (*rdma_create_srq)(void *rdma_cxt,
+			       struct qed_rdma_create_srq_in_params *iparams,
+			       struct qed_rdma_create_srq_out_params *oparams);
+	int (*rdma_destroy_srq)(void *rdma_cxt,
+				struct qed_rdma_destroy_srq_in_params *iparams);
+	int (*rdma_modify_srq)(void *rdma_cxt,
+			       struct qed_rdma_modify_srq_in_params *iparams);
+
 	int (*ll2_acquire_connection)(void *rdma_cxt,
 				      struct qed_ll2_acquire_data *data);
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [RFC PATCH ghak32 V2 00/13] audit: implement container id
From: Steve Grubb @ 2018-05-30 13:20 UTC (permalink / raw)
  To: linux-audit-H+wXaHxf7aLQT0dZR+AlfA
  Cc: simo-H+wXaHxf7aLQT0dZR+AlfA, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	eparis-FjpueFixGhCM4zKIHC2jIg, dhowells-H+wXaHxf7aLQT0dZR+AlfA,
	carlos-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	luto-DgEjT+Ai2ygdnm+yROfE0A, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <cover.1521179281.git.rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Friday, March 16, 2018 5:00:27 AM EDT Richard Guy Briggs wrote:
> Implement audit kernel container ID.
> 
> This patchset is a second RFC based on the proposal document (V3)
> posted:
> 	https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html

So, if you work on a container orchestrator, how exactly is this set of 
interfaces to be used and in what order?

Thanks,
-Steve

> The first patch implements the proc fs write to set the audit container
> ID of a process, emitting an AUDIT_CONTAINER record to announce the
> registration of that container ID on that process.  This patch requires
> userspace support for record acceptance and proper type display.
> 
> The second checks for children or co-threads and refuses to set the
> container ID if either are present.  (This policy could be changed to
> set both with the same container ID provided they meet the rest of the
> requirements.)
> 
> The third implements the auxiliary record AUDIT_CONTAINER_INFO if a
> container ID is identifiable with an event.  This patch requires
> userspace support for proper type display.
> 
> The fourth adds container ID filtering to the exit, exclude and user
> lists.  This patch requires auditctil userspace support for the
> --containerid option.
> 
> The 5th adds signal and ptrace support.
> 
> The 6th creates a local audit context to be able to bind a standalone
> record with a locally created auxiliary record.
> 
> The 7th, 8th, 9th, 10th patches add container ID records to standalone
> records.  Some of these may end up being syscall auxiliary records and
> won't need this specific support since they'll be supported via
> syscalls.
> 
> The 11th adds network namespace container ID labelling based on member
> tasks' container ID labels.
> 
> The 12th adds container ID support to standalone netfilter records that
> don't have a task context and lists each container to which that net
> namespace belongs.
> 
> The 13th implements reading the container ID from the proc filesystem
> for debugging.  This patch isn't planned for upstream inclusion.
> 
> Feedback please!
> 
> Example: Set a container ID of 123456 to the "sleep" task:
> 	sleep 2&
> 	child=$!
> 	echo 123456 > /proc/$child/containerid; echo $?
> 	ausearch -ts recent -m container
> 	echo child:$child contid:$( cat /proc/$child/containerid)
> This should produce a record such as:
> 	type=CONTAINER msg=audit(1521122590.315:222): op=set pid=689 uid=0
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 auid=0 tty=pts0
> ses=3 opid=707 old-contid=18446744073709551615 contid=123456 res=1
> 
> Example: Set a filter on a container ID 123459 on /tmp/tmpcontainerid:
> 	containerid=123459
> 	key=tmpcontainerid
> 	auditctl -a exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid
> -F key=$key perl -e "sleep 1; open(my \$tmpfile, '>', \"/tmp/$key\");
> close(\$tmpfile);" & child=$!
> 	echo $containerid > /proc/$child/containerid
> 	sleep 2
> 	ausearch -i -ts recent -k $key
> 	auditctl -d exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid
> -F key=$key rm -f /tmp/$key
> This should produce an event such as:
> 	type=CONTAINER_INFO msg=audit(1521122591.614:227): op=task contid=123459
> 	type=PROCTITLE msg=audit(1521122591.614:227):
> proctitle=7065726C002D6500736C65657020313B206F70656E286D792024746D7066696C
> 652C20273E272C20222F746D702F746D70636F6E7461696E6572696422293B20636C6F73652
> 824746D7066696C65293B type=PATH msg=audit(1521122591.614:227): item=1
> name="/tmp/tmpcontainerid" inode=18427 dev=00:26 mode=0100644 ouid=0
> ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE
> cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
> type=PATH msg=audit(1521122591.614:227): item=0 name="/tmp/" inode=13513
> dev=00:26 mode=041777 ouid=0 ogid=0 rdev=00:00
> obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=0000000000000000
> cap_fi=0000000000000000 cap_fe=0 cap_fver=0 type=CWD
> msg=audit(1521122591.614:227): cwd="/root"
> 	type=SYSCALL msg=audit(1521122591.614:227): arch=c000003e syscall=257
> success=yes exit=3 a0=ffffffffffffff9c a1=55db90a28900 a2=241 a3=1b6
> items=2 ppid=689 pid=724 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0
> sgid=0 fsgid=0 tty=pts0 ses=3 comm="perl" exe="/usr/bin/perl"
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> key="tmpcontainerid"
> 
> See:
> 	https://github.com/linux-audit/audit-kernel/issues/32
> 	https://github.com/linux-audit/audit-userspace/issues/40
> 	https://github.com/linux-audit/audit-testsuite/issues/64
> 
> Richard Guy Briggs (13):
>   audit: add container id
>   audit: check children and threading before allowing containerid
>   audit: log container info of syscalls
>   audit: add containerid filtering
>   audit: add containerid support for ptrace and signals
>   audit: add support for non-syscall auxiliary records
>   audit: add container aux record to watch/tree/mark
>   audit: add containerid support for tty_audit
>   audit: add containerid support for config/feature/user records
>   audit: add containerid support for seccomp and anom_abend records
>   audit: add support for containerid to network namespaces
>   audit: NETFILTER_PKT: record each container ID associated with a netNS
>   debug audit: read container ID of a process
> 
>  drivers/tty/tty_audit.c     |   5 +-
>  fs/proc/base.c              |  53 ++++++++++++++++
>  include/linux/audit.h       |  43 +++++++++++++
>  include/linux/init_task.h   |   4 +-
>  include/linux/sched.h       |   1 +
>  include/net/net_namespace.h |  12 ++++
>  include/uapi/linux/audit.h  |   8 ++-
>  kernel/audit.c              |  75 ++++++++++++++++++++---
>  kernel/audit.h              |   3 +
>  kernel/audit_fsnotify.c     |   5 +-
>  kernel/audit_tree.c         |   5 +-
>  kernel/audit_watch.c        |  33 +++++-----
>  kernel/auditfilter.c        |  52 +++++++++++++++-
>  kernel/auditsc.c            | 145
> ++++++++++++++++++++++++++++++++++++++++++-- kernel/nsproxy.c            |
>   6 ++
>  net/core/net_namespace.c    |  45 ++++++++++++++
>  net/netfilter/xt_AUDIT.c    |  15 ++++-
>  17 files changed, 473 insertions(+), 37 deletions(-)

^ permalink raw reply

* Re: [PATCH bpf v3 1/5] selftests/bpf: test_sockmap, check test failure
From: John Fastabend @ 2018-05-30 13:26 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, Shuah Khan, netdev, linux-kselftest
In-Reply-To: <20180530055611.10216-2-bhole_prashant_q7@lab.ntt.co.jp>

On 05/29/2018 10:56 PM, Prashant Bhole wrote:
> Test failures are not identified because exit code of RX/TX threads
> is not checked. Also threads are not returning correct exit code.
> 
> - Return exit code from threads depending on test execution status
> - In main thread, check the exit code of RX/TX threads
> - Skip error checking for corked tests as they are expected to timeout
> 
> Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
> ---
>  tools/testing/selftests/bpf/test_sockmap.c | 25 ++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 

Looks good. Thanks.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* Re: [PATCH bpf v3 3/5] selftests/bpf: test_sockmap, fix test timeout
From: John Fastabend @ 2018-05-30 13:31 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, Shuah Khan, netdev, linux-kselftest
In-Reply-To: <20180530055611.10216-4-bhole_prashant_q7@lab.ntt.co.jp>

On 05/29/2018 10:56 PM, Prashant Bhole wrote:
> In order to reduce runtime of tests, recently timout for select() call
> was reduced from 1sec to 10usec. This was causing many tests failures.
> It was caught with failure handling commits in this series.
> 
> Restoring the timeout from 10usec to 1sec
> 
> Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
> ---

Quick question, how long does it take to run now with the time increase?
If its taking too long we may need to remove some tests. I have a longer
running test_sockmap script that I run as part of Cilium[1] project
where I put longer running stress tests.

Acked-by: John Fastabend <john.fastabend@gmail.com>

[1] cilium.io

^ permalink raw reply

* Re: [PATCH bpf v3 0/5] fix test_sockmap
From: John Fastabend @ 2018-05-30 13:32 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, Shuah Khan, netdev, linux-kselftest
In-Reply-To: <20180530055611.10216-1-bhole_prashant_q7@lab.ntt.co.jp>

On 05/29/2018 10:56 PM, Prashant Bhole wrote:
> This series fixes error handling, timeout and data verification in
> test_sockmap. Previously it was not able to detect failure/timeout in
> RX/TX thread because error was not notified to the main thread.
> 
> Also slightly improved test output by printing parameter values (cork,
> apply, start, end) so that parameters for all tests are displayed.
> 
> Changes in v3:
>   - Skipped error checking for corked tests
> 
> Prashant Bhole (5):
>   selftests/bpf: test_sockmap, check test failure
>   selftests/bpf: test_sockmap, join cgroup in selftest mode
>   selftests/bpf: test_sockmap, fix test timeout
>   selftests/bpf: test_sockmap, fix data verification
>   selftests/bpf: test_sockmap, print additional test options
> 
>  tools/testing/selftests/bpf/test_sockmap.c | 76 +++++++++++++++++-----
>  1 file changed, 58 insertions(+), 18 deletions(-)
> 

Looks good thanks. We may want to tune the running time a bit but
lets get this applied first. A lot of nice improvements!

.John

^ permalink raw reply

* Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation
From: Tariq Toukan @ 2018-05-30 13:49 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eric Dumazet, John Sperbeck, Tarick Bedeir, Qing Huang,
	Daniel Jurgens, Zhu Yanjun, Tariq Toukan
In-Reply-To: <20180530041152.113393-1-edumazet@google.com>



On 30/05/2018 7:11 AM, Eric Dumazet wrote:
> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> brought a regression caught in our regression suite, thanks to KASAN.
> 
> Note that mlx4_alloc_icm() is already able to try high order allocations
> and fallback to low-order allocations under high memory pressure.
> 
> We only have to tweak gfp_mask a bit, to help falling back faster,
> without risking OOM killings.
> 
> BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
> Read of size 4 at addr ffff8817df584f68 by task qp_listing_test/92585
> 
> CPU: 38 PID: 92585 Comm: qp_listing_test Tainted: G           O
> Call Trace:
>   [<ffffffffba80d7bb>] dump_stack+0x4d/0x72
>   [<ffffffffb951dc5f>] print_address_description+0x6f/0x260
>   [<ffffffffb951e1c7>] kasan_report+0x257/0x370
>   [<ffffffffb951e339>] __asan_report_load4_noabort+0x19/0x20
>   [<ffffffffc0256d28>] to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
>   [<ffffffffc02785b3>] mlx4_ib_query_qp+0x1213/0x1660 [mlx4_ib]
>   [<ffffffffc02dbfdb>] qpstat_print_qp+0x13b/0x500 [ib_uverbs]
>   [<ffffffffc02dc3ea>] qpstat_seq_show+0x4a/0xb0 [ib_uverbs]
>   [<ffffffffb95f125c>] seq_read+0xa9c/0x1230
>   [<ffffffffb96e0821>] proc_reg_read+0xc1/0x180
>   [<ffffffffb9577918>] __vfs_read+0xe8/0x730
>   [<ffffffffb9578057>] vfs_read+0xf7/0x300
>   [<ffffffffb95794d2>] SyS_read+0xd2/0x1b0
>   [<ffffffffb8e06b16>] do_syscall_64+0x186/0x420
>   [<ffffffffbaa00071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7f851a7bb30d
> RSP: 002b:00007ffd09a758c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 00007f84ff959440 RCX: 00007f851a7bb30d
> RDX: 000000000003fc00 RSI: 00007f84ff60a000 RDI: 000000000000000b
> RBP: 00007ffd09a75900 R08: 00000000ffffffff R09: 0000000000000000
> R10: 0000000000000022 R11: 0000000000000293 R12: 0000000000000000
> R13: 000000000003ffff R14: 000000000003ffff R15: 00007f84ff60a000
> 
> Allocated by task 4488:
>   save_stack+0x46/0xd0
>   kasan_kmalloc+0xad/0xe0
>   __kmalloc+0x101/0x5e0
>   ib_register_device+0xc03/0x1250 [ib_core]
>   mlx4_ib_add+0x27d6/0x4dd0 [mlx4_ib]
>   mlx4_add_device+0xa9/0x340 [mlx4_core]
>   mlx4_register_interface+0x16e/0x390 [mlx4_core]
>   xhci_pci_remove+0x7a/0x180 [xhci_pci]
>   do_one_initcall+0xa0/0x230
>   do_init_module+0x1b9/0x5a4
>   load_module+0x63e6/0x94c0
>   SYSC_init_module+0x1a4/0x1c0
>   SyS_init_module+0xe/0x10
>   do_syscall_64+0x186/0x420
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> Freed by task 0:
> (stack is not available)
> 
> The buggy address belongs to the object at ffff8817df584f40
>   which belongs to the cache kmalloc-32 of size 32
> The buggy address is located 8 bytes to the right of
>   32-byte region [ffff8817df584f40, ffff8817df584f60)
> The buggy address belongs to the page:
> page:ffffea005f7d6100 count:1 mapcount:0 mapping:ffff8817df584000 index:0xffff8817df584fc1
> flags: 0x880000000000100(slab)
> raw: 0880000000000100 ffff8817df584000 ffff8817df584fc1 000000010000003f
> raw: ffffea005f3ac0a0 ffffea005c476760 ffff8817fec00900 ffff883ff78d26c0
> page dumped because: kasan: bad access detected
> page->mem_cgroup:ffff883ff78d26c0
> 
> Memory state around the buggy address:
>   ffff8817df584e00: 00 03 fc fc fc fc fc fc 00 03 fc fc fc fc fc fc
>   ffff8817df584e80: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
>> ffff8817df584f00: fb fb fb fb fc fc fc fc 00 00 00 00 fc fc fc fc
>                                                            ^
>   ffff8817df584f80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
>   ffff8817df585000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 
> Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: John Sperbeck <jsperbeck@google.com>
> Cc: Tarick Bedeir <tarick@google.com>
> Cc: Qing Huang <qing.huang@oracle.com>
> Cc: Daniel Jurgens <danielj@mellanox.com>
> Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/icm.c | 17 +++++++++++------
>   1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index 685337d58276fc91baeeb64387c52985e1bc6dda..cae33d5c7dbd9ba7929adcf2127b104f6796fa5a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,13 @@
>   #include "fw.h"
>   
>   /*
> - * We allocate in page size (default 4KB on many archs) chunks to avoid high
> - * order memory allocations in fragmented/high usage memory situation.
> + * We allocate in as big chunks as we can, up to a maximum of 256 KB
> + * per chunk. Note that the chunks are not necessarily in contiguous
> + * physical memory.
>    */
>   enum {
> -	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
> -	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
> +	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> +	MLX4_TABLE_CHUNK_SIZE	= 1 << 18,
>   };
>   
>   static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>   	struct mlx4_icm *icm;
>   	struct mlx4_icm_chunk *chunk = NULL;
>   	int cur_order;
> +	gfp_t mask;
>   	int ret;
>   
>   	/* We use sg_set_buf for coherent allocs, which assumes low memory */
> @@ -178,13 +180,16 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>   		while (1 << cur_order > npages)
>   			--cur_order;
>   
> +		mask = gfp_mask;
> +		if (cur_order)
> +			mask = (mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
>   		if (coherent)
>   			ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
>   						      &chunk->mem[chunk->npages],
> -						      cur_order, gfp_mask);
> +						      cur_order, mask);
>   		else
>   			ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
> -						   cur_order, gfp_mask,
> +						   cur_order, mask,
>   						   dev->numa_node);
>   
>   		if (ret) {
> 

Thanks Eric.

I think it preserves the original intention of commit 1383cb8103bb 
("mlx4_core: allocate ICM memory in page size chunks").

Looks good to me.

Acked-by: Tariq Toukan <tariqt@mellanox.com>

^ permalink raw reply

* Re: [PATCH v2] Revert "alx: remove WoL support"
From: Andrew Lunn @ 2018-05-30 13:58 UTC (permalink / raw)
  To: AceLan Kao
  Cc: Jay Cliburn, Chris Snook, David S . Miller, Rakesh Pandit, netdev,
	Emily Chien, Johannes Berg, Johannes Stezenbach, linux-kernel
In-Reply-To: <20180530021008.15080-1-acelan.kao@canonical.com>

On Wed, May 30, 2018 at 10:10:08AM +0800, AceLan Kao wrote:
> This reverts commit bc2bebe8de8ed4ba6482c9cc370b0dd72ffe8cd2.
> 
> The WoL feature is a must to pass Energy Star 6.1 and above,
> the power consumption will be measured during S3 with WoL is enabled.
> 
> Reverting "alx: remove WoL support", and will try to fix the unintentional
> wake up issue when WoL is enabled.

Hi AceLan

I find this change log entry rather odd.

If i remember correctly, you first argued that you did not want to
have to distribute out of tree patches.

It was suggested that you might be able to justify the revert using
the argument that the cure is worse than the decease. You ignored
that, and when with this Energy Star argument. That got shot down by
DaveM, and told to actually try to find the problem.

So you then come back and said you think the problem is fixed, but
don't know exactly what fixed it. So DaveM said try again.

Now you are back to Energy Star.

I don't get this. It was the fact you said it was probably fixed that
made DaveM reconsider. That is the argument you should be using in the
change log. We want to know what testing you have done. See a
tested-by: from somebody who had the issue which caused the revert,
and now says the issue is fixed.

Ideally we would like to know which change actually fixed the issue,
so it can be added to stable. But that requires somebody to do a long
git bisect.

    Andrew

^ permalink raw reply

* Re: [PATCH 2/4] arcnet: com20020: bindings for smsc com20020
From: Andrea Greco @ 2018-05-30 14:07 UTC (permalink / raw)
  To: Rob Herring
  Cc: Tobin C. Harding, Andrea Greco, Mark Rutland, netdev, devicetree,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAL_JsqL1M3SuH_cJGUhh0+Xg+UYDjAc-=a+USENOKznvWib1Ng@mail.gmail.com>

On 05/24/2018 04:36 PM, Rob Herring wrote> If you want to add it, that's 
fine. But it's really not something that
> comes up often. For UARTs, there's already the "current-speed"
> property and most other things I can think of use Hz to express
> speeds.

No, Pref keep standard and use Hz.

This if finally:
```
SMSC com20020 Arcnet network controller

Required property:
- timeout-ns: Arcnet bus timeout, Idle Time (328000 - 20500)
- bus-speed-bps: Arcnet bus speed (10000000 - 156250)
- smsc,xtal-mhz: External oscillator frequency
- smsc,backplane-enabled: Controller use backplane mode
- reset-gpios: Chip reset pin
- interrupts: Should contain controller interrupt

arcnet@28000000 {
	compatible = "smsc,com20020";

	timeout-ns = <20500>;
	bus-speed-hz = <10000000>;
	smsc,xtal-mhz = <20>;
	smsc,backplane-enabled;

	reset-gpios = <&gpio3 21 GPIO_ACTIVE_LOW>;
	interrupts = <&gpio2 10 GPIO_ACTIVE_LOW>;
};
```
If confirmed, for me is right

Andrea

^ permalink raw reply

* pull-request: wireless-drivers 2018-05-30
From: Kalle Valo @ 2018-05-30 14:17 UTC (permalink / raw)
  To: David Miller; +Cc: linux-wireless, netdev, linux-kernel

Hi Dave,

I now this is late but hopefully this pull request can make it to net
tree and to the final 4.17 release still. But if not, please let me know
and I'll pull this to wireless-drivers-next instead.

More info in the signed tag below, and please let me know if there are
any problems.

Kalle

The following changes since commit 813477aa49aac5deba04eb4956360dde58a0e807:

  MAINTAINERS: change Kalle as wcn36xx maintainer (2018-05-22 15:36:41 +0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2018-05-30

for you to fetch changes up to ab1068d6866e28bf6427ceaea681a381e5870a4a:

  iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs (2018-05-29 10:40:25 +0300)

----------------------------------------------------------------
wireless-drivers fixes for 4.17

Two last minute fixes, hopefully they make it to 4.17 still.

rt2x00

* revert a fix which caused even more problems

iwlwifi

* fix a crash when there are 16 or more logical CPUs

----------------------------------------------------------------
Hao Wei Tee (1):
      iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs

Stanislaw Gruszka (1):
      Revert "rt2800: use TXOP_BACKOFF for probe frames"

 drivers/net/wireless/intel/iwlwifi/pcie/trans.c  | 10 +++++-----
 drivers/net/wireless/ralink/rt2x00/rt2x00queue.c |  7 +++----
 2 files changed, 8 insertions(+), 9 deletions(-)

^ permalink raw reply

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Tariq Toukan @ 2018-05-30 14:42 UTC (permalink / raw)
  To: Eric Dumazet, Tariq Toukan, moshe
  Cc: Eric Dumazet, aring, David Miller, netdev, Florian Westphal,
	Herbert Xu, Thomas Graf, Jesper Dangaard Brouer, Alexander Aring,
	Stefan Schmidt, Kirill Tkhai, Eran Ben Elisha
In-Reply-To: <CANn89iJN8=ZdnErcwMnEyhywksQLONL1t7DX=UoAJhFv4t0PrA@mail.gmail.com>



On 30/05/2018 10:36 AM, Eric Dumazet wrote:
> On Wed, May 30, 2018 at 3:20 AM Tariq Toukan <tariqt@mellanox.com> wrote:
> 
>> Not sure, the transmit BW you get is higher than what we saw.
>> Anyway, we'll check this.
> 
> That is on a 40Gbit test bed (mlx4 cx/3), maybe you were using a 10Gbit NIC
> ?
> 

It is a ConnectX-4 50G (mlx5).

Moshe is trying out the tuning you suggested.
He'll update once he's done.

^ permalink raw reply

* Re: [PATCH mlx5-next v2 11/13] IB/mlx5: Add flow counters binding support
From: Jason Gunthorpe @ 2018-05-30 15:06 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Boris Pismenny, Matan Barak, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev, Alex Rosenbaum
In-Reply-To: <316f5042-b47d-2cee-48de-514467817e7a@dev.mellanox.co.il>

On Wed, May 30, 2018 at 03:31:34PM +0300, Yishai Hadas wrote:
> On 5/29/2018 10:56 PM, Jason Gunthorpe wrote:
> >On Tue, May 29, 2018 at 04:09:15PM +0300, Leon Romanovsky wrote:
> >>diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
> >>index 508ea8c82da7..ef3f430a7050 100644
> >>+++ b/include/uapi/rdma/mlx5-abi.h
> >>@@ -443,4 +443,18 @@ enum {
> >>  enum {
> >>  	MLX5_IB_CLOCK_INFO_V1              = 0,
> >>  };
> >>+
> >>+struct mlx5_ib_flow_counters_data {
> >>+	__aligned_u64   counters_data;
> >>+	__u32   ncounters;
> >>+	__u32   reserved;
> >>+};
> >>+
> >>+struct mlx5_ib_create_flow {
> >>+	__u32   ncounters_data;
> >>+	__u32   reserved;
> >>+	/* Following are counters data based on ncounters_data */
> >>+	struct mlx5_ib_flow_counters_data data[];
> >>+};
> >>+
> >>  #endif /* MLX5_ABI_USER_H */
> >
> >This uapi thing still needs to be fixed as I pointed out before.
> 
> In V3 we can go with below, no change in memory layout but it can clarify
> the code/usage.
> 
> struct mlx5_ib_flow_counters_desc {
>         __u32   description;
>         __u32   index;
> };
> 
> struct mlx5_ib_flow_counters_data {
>         RDMA_UAPI_PTR(struct mlx5_ib_flow_counters_desc *, counters_data);
>         __u32   ncounters;
>         __u32   reserved;
> };

OK, this is what I asked for originally..

> struct mlx5_ib_create_flow {
>         __u32   ncounters_data;
>         __u32   reserved;
>         /* Following are counters data based on ncounters_data */
>         struct mlx5_ib_flow_counters_data data[];
> 
> 
> >I still can't figure out why this should be a 2d array.
> 
> This comes to support the future case of multiple counters objects/specs
> passed with the same flow. There is a need to differentiate mapping data for
> each counters object and that is done via the 'ncounters_data' field and the
> 2d array.

This still doesn't make any sense to me. How are these multiple
counters objects/specs going to be identified?

Basically, what does the array index for data[] mean? Should it match
the spec index from the main verb or something?

This is a good place for a comment, since the intention is completely
opaque here.

> >A flex array at the end of a struct means that the struct can never be
> >extended again which seems like a terrible idea,
> 
> The header [1] has a fixed size and will always exist even if there will be
> no counters. Future extensions [2] will be added in the memory post the flex
> array which its size depends on 'ncounters_data'. This pattern is used also
> in other extended APIs. [3]
> 
> struct mlx5_ib_create_flow {
>         __u32   ncounters_data;
>         __u32   reserved;
> [1] /* Header is above ********
> 
>         /* Following are counters data based on ncounters_data */
>         struct mlx5_ib_flow_counters_data data[];
> 
> [2] Future fields.

We could do that.. But we won't - if it comes to it this will have to
move to the new kabi.

> [3] https://elixir.bootlin.com/linux/latest/source/include/uapi/rdma/ib_user_verbs.h#L1145

?? That looks like a normal flex array to me.

Jason

^ permalink raw reply

* Re: [PATCH mlx5-next 1/2] net/mlx5: Add temperature warning event to log
From: Saeed Mahameed @ 2018-05-30 15:08 UTC (permalink / raw)
  To: andrew@lunn.ch
  Cc: Jason Gunthorpe, netdev@vger.kernel.org, Ilan Tayari,
	linux-rdma@vger.kernel.org, Leon Romanovsky, Adi Nissim
In-Reply-To: <20180530010404.GB30239@lunn.ch>

On Wed, 2018-05-30 at 03:04 +0200, Andrew Lunn wrote:
> On Tue, May 29, 2018 at 05:19:53PM -0700, Saeed Mahameed wrote:
> > From: Ilan Tayari <ilant@mellanox.com>
> > 
> > Temperature warning event is sent by FW to indicate high
> > temperature
> > as detected by one of the sensors on the board.
> > Add handling of this event by writing the numbers of the alert
> > sensors
> > to the kernel log.
> 
> Hi Saaed
> 
> Is the temperature itself available? If so, it would be better to
> expose this as a hwmon device per temperature sensor.
> 

Hi Andrew, yes the temperature is available by other means, this patch
is needed for alert information reasons in order to know which internal
sensors triggered the alarm.
We are working in parallel to expose temperature sensor to hwmon, but
this is still WIP.


Is it ok to have both ?

>        Andrew

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox