Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] udp: disable gso with no_check_tx
From: Michal Kubecek @ 2018-05-02  7:05 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn, Willem de Bruijn
In-Reply-To: <20180430195836.69378-1-willemdebruijn.kernel@gmail.com>

On Mon, Apr 30, 2018 at 03:58:36PM -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Syzbot managed to send a udp gso packet without checksum offload into
> the gso stack by disabling tx checksum (UDP_NO_CHECK6_TX). This
> triggered the skb_warn_bad_offload.
> 
>   RIP: 0010:skb_warn_bad_offload+0x2bc/0x600 net/core/dev.c:2658
>    skb_gso_segment include/linux/netdevice.h:4038 [inline]
>    validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3120
>    __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3577
>    dev_queue_xmit+0x17/0x20 net/core/dev.c:3618
> 
> UDP_NO_CHECK6_TX sets skb->ip_summed to CHECKSUM_NONE just after the
> udp gso integrity checks in udp_(v6_)send_skb. Extend those checks to
> catch and fail in this case.

Sounds rather familiar... perhaps we might want to check other
exceptions added to UFO over the years, some might apply here as well.

Michal Kubecek

^ permalink raw reply

* [PATCH net-next v2 0/2] mlxsw: Reject unsupported FIB configurations
From: Ido Schimmel @ 2018-05-02  7:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, dsahern, mlxsw, Ido Schimmel

Recently it became possible for listeners of the FIB notification chain
to veto operations such as addition of routes and rules.

Adjust the mlxsw driver to take advantage of it and return an error for
unsupported FIB rules and for routes configured after the abort
mechanism was triggered (due to exceeded resources for example).

v2:
* Change error code in first patch to -EOPNOTSUPP (David Ahern).

Ido Schimmel (2):
  mlxsw: spectrum_router: Return an error for non-default FIB rules
  mlxsw: spectrum_router: Return an error for routes added after abort

 .../net/ethernet/mellanox/mlxsw/spectrum_router.c   | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

-- 
2.14.3

^ permalink raw reply

* [PATCH net-next v2 1/2] mlxsw: spectrum_router: Return an error for non-default FIB rules
From: Ido Schimmel @ 2018-05-02  7:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180502071735.32352-1-idosch@mellanox.com>

Since commit 9776d32537d2 ("net: Move call_fib_rule_notifiers up in
fib_nl_newrule") it is possible to forbid the installation of
unsupported FIB rules.

Have mlxsw return an error for non-default FIB rules in addition to the
existing extack message.

Example:
# ip rule add from 198.51.100.1 table 10
Error: mlxsw_spectrum: FIB rules not supported.

Note that offload is only aborted when non-default FIB rules are already
installed and merely replayed during module initialization.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 8e4edb634b11..added380e344 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -5882,24 +5882,24 @@ static int mlxsw_sp_router_fib_rule_event(unsigned long event,
 	switch (info->family) {
 	case AF_INET:
 		if (!fib4_rule_default(rule) && !rule->l3mdev)
-			err = -1;
+			err = -EOPNOTSUPP;
 		break;
 	case AF_INET6:
 		if (!fib6_rule_default(rule) && !rule->l3mdev)
-			err = -1;
+			err = -EOPNOTSUPP;
 		break;
 	case RTNL_FAMILY_IPMR:
 		if (!ipmr_rule_default(rule) && !rule->l3mdev)
-			err = -1;
+			err = -EOPNOTSUPP;
 		break;
 	case RTNL_FAMILY_IP6MR:
 		if (!ip6mr_rule_default(rule) && !rule->l3mdev)
-			err = -1;
+			err = -EOPNOTSUPP;
 		break;
 	}
 
 	if (err < 0)
-		NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported. Aborting offload");
+		NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported");
 
 	return err;
 }
@@ -5926,8 +5926,8 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 	case FIB_EVENT_RULE_DEL:
 		err = mlxsw_sp_router_fib_rule_event(event, info,
 						     router->mlxsw_sp);
-		if (!err)
-			return NOTIFY_DONE;
+		if (!err || info->extack)
+			return notifier_from_errno(err);
 	}
 
 	fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next v2 2/2] mlxsw: spectrum_router: Return an error for routes added after abort
From: Ido Schimmel @ 2018-05-02  7:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180502071735.32352-1-idosch@mellanox.com>

We currently do not perform accounting in the driver and thus can't
reject routes before resources are exceeded.

However, in order to make users aware of the fact that routes are no
longer offloaded we can return an error for routes configured after the
abort mechanism was triggered.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index added380e344..8028d221aece 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -5928,6 +5928,13 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 						     router->mlxsw_sp);
 		if (!err || info->extack)
 			return notifier_from_errno(err);
+		break;
+	case FIB_EVENT_ENTRY_ADD:
+		if (router->aborted) {
+			NL_SET_ERR_MSG_MOD(info->extack, "FIB offload was aborted. Not configuring route");
+			return notifier_from_errno(-EINVAL);
+		}
+		break;
 	}
 
 	fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
-- 
2.14.3

^ permalink raw reply related

* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Tiwei Bie @ 2018-05-02  7:28 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <34781052-df9f-e505-cd3f-08e460b34dcc@redhat.com>

On Wed, May 02, 2018 at 10:51:06AM +0800, Jason Wang wrote:
> On 2018年04月25日 13:15, Tiwei Bie wrote:
> > This commit introduces the event idx support in packed
> > ring. This feature is temporarily disabled, because the
> > implementation in this patch may not work as expected,
> > and some further discussions on the implementation are
> > needed, e.g. do we have to check the wrap counter when
> > checking whether a kick is needed?
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> >   drivers/virtio/virtio_ring.c | 53 ++++++++++++++++++++++++++++++++++++++++----
> >   1 file changed, 49 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 0181e93897be..b1039c2985b9 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -986,7 +986,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >   static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> > -	u16 flags;
> > +	u16 new, old, off_wrap, flags;
> >   	bool needs_kick;
> >   	u32 snapshot;
> > @@ -995,7 +995,12 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> >   	 * suppressions. */
> >   	virtio_mb(vq->weak_barriers);
> > +	old = vq->next_avail_idx - vq->num_added;
> > +	new = vq->next_avail_idx;
> > +	vq->num_added = 0;
> > +
> >   	snapshot = *(u32 *)vq->vring_packed.device;
> > +	off_wrap = virtio16_to_cpu(_vq->vdev, snapshot & 0xffff);
> >   	flags = cpu_to_virtio16(_vq->vdev, snapshot >> 16) & 0x3;
> >   #ifdef DEBUG
> > @@ -1006,7 +1011,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> >   	vq->last_add_time_valid = false;
> >   #endif
> > -	needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > +	if (flags == VRING_EVENT_F_DESC)
> > +		needs_kick = vring_need_event(off_wrap & ~(1<<15), new, old);
> 
> I wonder whether or not the math is correct. Both new and event are in the
> unit of descriptor ring size, but old looks not.

What vring_need_event() cares is the distance between
`new` and `old`, i.e. vq->num_added. So I think there
is nothing wrong with `old`. But the calculation of the
distance between `new` and `event_idx` isn't right when
`new` wraps. How do you think about the below code:

	wrap_counter = off_wrap >> 15;
	event_idx = off_wrap & ~(1<<15);
	if (wrap_counter != vq->wrap_counter)
		event_idx -= vq->vring_packed.num;
	
	needs_kick = vring_need_event(event_idx, new, old);

Best regards,
Tiwei Bie


> 
> Thanks
> 
> > +	else
> > +		needs_kick = (flags != VRING_EVENT_F_DISABLE);
> >   	END_USE(vq);
> >   	return needs_kick;
> >   }
> > @@ -1116,6 +1124,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> >   	if (vq->last_used_idx >= vq->vring_packed.num)
> >   		vq->last_used_idx -= vq->vring_packed.num;
> > +	/* If we expect an interrupt for the next entry, tell host
> > +	 * by writing event index and flush out the write before
> > +	 * the read in the next get_buf call. */
> > +	if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
> > +		virtio_store_mb(vq->weak_barriers,
> > +				&vq->vring_packed.driver->off_wrap,
> > +				cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
> > +						(vq->wrap_counter << 15)));
> > +
> >   #ifdef DEBUG
> >   	vq->last_add_time_valid = false;
> >   #endif
> > @@ -1143,10 +1160,17 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> >   	/* We optimistically turn back on interrupts, then check if there was
> >   	 * more to do. */
> > +	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > +	 * either clear the flags bit or point the event index at the next
> > +	 * entry. Always update the event index to keep code simple. */
> > +
> > +	vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > +			vq->last_used_idx | (vq->wrap_counter << 15));
> >   	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> >   		virtio_wmb(vq->weak_barriers);
> > -		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > +		vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > +						     VRING_EVENT_F_ENABLE;
> >   		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> >   							vq->event_flags_shadow);
> >   	}
> > @@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> >   static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> > +	u16 bufs, used_idx, wrap_counter;
> >   	START_USE(vq);
> >   	/* We optimistically turn back on interrupts, then check if there was
> >   	 * more to do. */
> > +	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > +	 * either clear the flags bit or point the event index at the next
> > +	 * entry. Always update the event index to keep code simple. */
> > +
> > +	/* TODO: tune this threshold */
> > +	bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
> > +
> > +	used_idx = vq->last_used_idx + bufs;
> > +	wrap_counter = vq->wrap_counter;
> > +
> > +	if (used_idx >= vq->vring_packed.num) {
> > +		used_idx -= vq->vring_packed.num;
> > +		wrap_counter ^= 1;
> > +	}
> > +
> > +	vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > +			used_idx | (wrap_counter << 15));
> >   	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> >   		virtio_wmb(vq->weak_barriers);
> > -		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > +		vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > +						     VRING_EVENT_F_ENABLE;
> >   		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> >   							vq->event_flags_shadow);
> >   	}
> > @@ -1822,8 +1865,10 @@ void vring_transport_features(struct virtio_device *vdev)
> >   		switch (i) {
> >   		case VIRTIO_RING_F_INDIRECT_DESC:
> >   			break;
> > +#if 0
> >   		case VIRTIO_RING_F_EVENT_IDX:
> >   			break;
> > +#endif
> >   		case VIRTIO_F_VERSION_1:
> >   			break;
> >   		case VIRTIO_F_IOMMU_PLATFORM:
> 

^ permalink raw reply

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-05-02  7:50 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, aaron.f.brown
In-Reply-To: <9c6f055c-c665-0601-3973-bce74d72544b@intel.com>

Wed, May 02, 2018 at 02:20:26AM CEST, sridhar.samudrala@intel.com wrote:
>On 4/30/2018 12:20 AM, Jiri Pirko wrote:
>> 
>> > > Now I try to change mac of the failover master:
>> > > [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>> > > RTNETLINK answers: Operation not supported
>> > > 
>> > > That I did expect to work. I would expect this would change the mac of
>> > > the master and both standby and primary slaves.
>> > If a VF is untrusted, a VM will not able to change its MAC and moreover
>> Note that at this point, I have no VF. So I'm not sure why you mention
>> that.
>> 
>> 
>> > in this mode we are assuming that the hypervisor has assigned the MAC and
>> > guest is not expected to change the MAC.
>> Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
>> mac and all works fine. How is this different? Change mac on "failover
>> instance" should work and should propagate the mac down to its slaves.
>> 
>> 
>> > For the initial implementation, i would propose not allowing the guest to
>> > change the MAC of failover or standby dev.
>> I see no reason for such restriction.
>> 
>
>It is true that a VM user can change mac address of a normal virtio-net interface,
>however when it is in STANDBY mode i think we should not allow this change specifically
>because we are creating a failover instance based on a MAC that is assigned by the
>hypervisor.
>
>Moreover,  in a cloud environment i would think that PF/hypervisor assigns a MAC to
>the VF and it cannot be changed by the guest.

So that is easy. You allow the change of the mac and in the "failover"
mac change implementation you propagate the change down to slaves. If
one slave does not support the change, you bail out. And since VF does
not allow it as you say, once it will be enslaved, the mac change could
not be done. Seems like a correct behavior to me and is in-sync with how
bond/team behaves.


>
>So for the initial implementation, do you see any issues with having this restriction
>in STANDBY mode.
>
>

^ permalink raw reply

* Re: [RFC V3 PATCH 1/8] vhost: move get_rx_bufs to vhost.c
From: Tiwei Bie @ 2018-05-02  8:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, kvm, virtualization, netdev, linux-kernel, jfreimann, wexu
In-Reply-To: <1524461700-5469-2-git-send-email-jasowang@redhat.com>

On Mon, Apr 23, 2018 at 01:34:53PM +0800, Jason Wang wrote:
> Move get_rx_bufs() to vhost.c and rename it to
> vhost_get_rx_bufs(). This helps to hide vring internal layout from

A small typo. Based on the code change in this patch, it
seems that this function is renamed to vhost_get_bufs().

Thanks


> specific device implementation. Packed ring implementation will
> benefit from this.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c   | 83 ++-------------------------------------------------
>  drivers/vhost/vhost.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/vhost/vhost.h |  7 +++++
>  3 files changed, 88 insertions(+), 80 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 986058a..762aa81 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -664,83 +664,6 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
>  	return len;
>  }
>  
> -/* This is a multi-buffer version of vhost_get_desc, that works if
> - *	vq has read descriptors only.
> - * @vq		- the relevant virtqueue
> - * @datalen	- data length we'll be reading
> - * @iovcount	- returned count of io vectors we fill
> - * @log		- vhost log
> - * @log_num	- log offset
> - * @quota       - headcount quota, 1 for big buffer
> - *	returns number of buffer heads allocated, negative on error
> - */
> -static int get_rx_bufs(struct vhost_virtqueue *vq,
> -		       struct vring_used_elem *heads,
> -		       int datalen,
> -		       unsigned *iovcount,
> -		       struct vhost_log *log,
> -		       unsigned *log_num,
> -		       unsigned int quota)
> -{
> -	unsigned int out, in;
> -	int seg = 0;
> -	int headcount = 0;
> -	unsigned d;
> -	int r, nlogs = 0;
> -	/* len is always initialized before use since we are always called with
> -	 * datalen > 0.
> -	 */
> -	u32 uninitialized_var(len);
> -
> -	while (datalen > 0 && headcount < quota) {
> -		if (unlikely(seg >= UIO_MAXIOV)) {
> -			r = -ENOBUFS;
> -			goto err;
> -		}
> -		r = vhost_get_vq_desc(vq, vq->iov + seg,
> -				      ARRAY_SIZE(vq->iov) - seg, &out,
> -				      &in, log, log_num);
> -		if (unlikely(r < 0))
> -			goto err;
> -
> -		d = r;
> -		if (d == vq->num) {
> -			r = 0;
> -			goto err;
> -		}
> -		if (unlikely(out || in <= 0)) {
> -			vq_err(vq, "unexpected descriptor format for RX: "
> -				"out %d, in %d\n", out, in);
> -			r = -EINVAL;
> -			goto err;
> -		}
> -		if (unlikely(log)) {
> -			nlogs += *log_num;
> -			log += *log_num;
> -		}
> -		heads[headcount].id = cpu_to_vhost32(vq, d);
> -		len = iov_length(vq->iov + seg, in);
> -		heads[headcount].len = cpu_to_vhost32(vq, len);
> -		datalen -= len;
> -		++headcount;
> -		seg += in;
> -	}
> -	heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
> -	*iovcount = seg;
> -	if (unlikely(log))
> -		*log_num = nlogs;
> -
> -	/* Detect overrun */
> -	if (unlikely(datalen > 0)) {
> -		r = UIO_MAXIOV + 1;
> -		goto err;
> -	}
> -	return headcount;
> -err:
> -	vhost_discard_vq_desc(vq, headcount);
> -	return r;
> -}
> -
>  /* Expects to be always run from workqueue - which acts as
>   * read-size critical section for our kind of RCU. */
>  static void handle_rx(struct vhost_net *net)
> @@ -790,9 +713,9 @@ static void handle_rx(struct vhost_net *net)
>  	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
>  		sock_len += sock_hlen;
>  		vhost_len = sock_len + vhost_hlen;
> -		headcount = get_rx_bufs(vq, vq->heads + nheads, vhost_len,
> -					&in, vq_log, &log,
> -					likely(mergeable) ? UIO_MAXIOV : 1);
> +		headcount = vhost_get_bufs(vq, vq->heads + nheads, vhost_len,
> +					   &in, vq_log, &log,
> +					   likely(mergeable) ? UIO_MAXIOV : 1);
>  		/* On error, stop handling until the next kick. */
>  		if (unlikely(headcount < 0))
>  			goto out;
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index f3bd8e9..6b455f6 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2097,6 +2097,84 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
>  }
>  EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
>  
> +/* This is a multi-buffer version of vhost_get_desc, that works if
> + *	vq has read descriptors only.
> + * @vq		- the relevant virtqueue
> + * @datalen	- data length we'll be reading
> + * @iovcount	- returned count of io vectors we fill
> + * @log		- vhost log
> + * @log_num	- log offset
> + * @quota       - headcount quota, 1 for big buffer
> + *	returns number of buffer heads allocated, negative on error
> + */
> +int vhost_get_bufs(struct vhost_virtqueue *vq,
> +		   struct vring_used_elem *heads,
> +		   int datalen,
> +		   unsigned *iovcount,
> +		   struct vhost_log *log,
> +		   unsigned *log_num,
> +		   unsigned int quota)
> +{
> +	unsigned int out, in;
> +	int seg = 0;
> +	int headcount = 0;
> +	unsigned d;
> +	int r, nlogs = 0;
> +	/* len is always initialized before use since we are always called with
> +	 * datalen > 0.
> +	 */
> +	u32 uninitialized_var(len);
> +
> +	while (datalen > 0 && headcount < quota) {
> +		if (unlikely(seg >= UIO_MAXIOV)) {
> +			r = -ENOBUFS;
> +			goto err;
> +		}
> +		r = vhost_get_vq_desc(vq, vq->iov + seg,
> +				      ARRAY_SIZE(vq->iov) - seg, &out,
> +				      &in, log, log_num);
> +		if (unlikely(r < 0))
> +			goto err;
> +
> +		d = r;
> +		if (d == vq->num) {
> +			r = 0;
> +			goto err;
> +		}
> +		if (unlikely(out || in <= 0)) {
> +			vq_err(vq, "unexpected descriptor format for RX: "
> +				"out %d, in %d\n", out, in);
> +			r = -EINVAL;
> +			goto err;
> +		}
> +		if (unlikely(log)) {
> +			nlogs += *log_num;
> +			log += *log_num;
> +		}
> +		heads[headcount].id = cpu_to_vhost32(vq, d);
> +		len = iov_length(vq->iov + seg, in);
> +		heads[headcount].len = cpu_to_vhost32(vq, len);
> +		datalen -= len;
> +		++headcount;
> +		seg += in;
> +	}
> +	heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
> +	*iovcount = seg;
> +	if (unlikely(log))
> +		*log_num = nlogs;
> +
> +	/* Detect overrun */
> +	if (unlikely(datalen > 0)) {
> +		r = UIO_MAXIOV + 1;
> +		goto err;
> +	}
> +	return headcount;
> +err:
> +	vhost_discard_vq_desc(vq, headcount);
> +	return r;
> +}
> +EXPORT_SYMBOL_GPL(vhost_get_bufs);
> +
>  /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
>  void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
>  {
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 6c844b9..52edd242 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -185,6 +185,13 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
>  		      struct iovec iov[], unsigned int iov_count,
>  		      unsigned int *out_num, unsigned int *in_num,
>  		      struct vhost_log *log, unsigned int *log_num);
> +int vhost_get_bufs(struct vhost_virtqueue *vq,
> +		   struct vring_used_elem *heads,
> +		   int datalen,
> +		   unsigned *iovcount,
> +		   struct vhost_log *log,
> +		   unsigned *log_num,
> +		   unsigned int quota);
>  void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
>  
>  int vhost_vq_init_access(struct vhost_virtqueue *);
> -- 
> 2.7.4
> 

^ permalink raw reply

* Re: [PATCH V2 net-next 6/6] ipvlan: Add support for SCTP checksum offload
From: Davide Caratti @ 2018-05-02  8:12 UTC (permalink / raw)
  To: Vladislav Yasevich, netdev
  Cc: linux-sctp, virtualization, virtio-dev, mst, jasowang, nhorman,
	marcelo.leitner, Vladislav Yasevich
In-Reply-To: <20180502020739.19239-7-vyasevic@redhat.com>

On Tue, 2018-05-01 at 22:07 -0400, Vladislav Yasevich wrote:
> Since ipvlan is a software device, we can turn on SCTP checksum
> offload and let the checksum be computed at the lower layer.
> 
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
>  drivers/net/ipvlan/ipvlan_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 

Acked-by: Davide Caratti <dcaratti@redhat.com>

> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 450eec2..ddb1590 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -161,7 +161,8 @@ static void ipvlan_port_destroy(struct net_device *dev)
>  	(NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | \
>  	 NETIF_F_GSO | NETIF_F_TSO | NETIF_F_GSO_ROBUST | \
>  	 NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_GRO | NETIF_F_RXCSUM | \
> -	 NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER)
> +	 NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER | \
> +	 NETIF_F_SCTP_CRC)
>  
>  #define IPVLAN_STATE_MASK \
>  	((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT))

^ permalink raw reply

* RE: [PATCH  2/8] dt-bindings: stm32-dwmac: add support of MPU families
From: Christophe ROULLIER @ 2018-05-02  8:15 UTC (permalink / raw)
  To: Rob Herring
  Cc: mark.rutland@arm.com, mcoquelin.stm32@gmail.com, Alexandre TORGUE,
	Peppe CAVALLARO, devicetree@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org
In-Reply-To: <20180501135848.GA12597@rob-hp-laptop>

Hi,
My answers below under "CRO"

Thanks.
Christophe

-----Original Message-----
From: Rob Herring [mailto:robh@kernel.org] 
Sent: mardi 1 mai 2018 15:59
To: Christophe ROULLIER <christophe.roullier@st.com>
Cc: mark.rutland@arm.com; mcoquelin.stm32@gmail.com; Alexandre TORGUE <alexandre.torgue@st.com>; Peppe CAVALLARO <peppe.cavallaro@st.com>; devicetree@vger.kernel.org; linux-arm-kernel@lists.infradead.org; netdev@vger.kernel.org
Subject: Re: [PATCH 2/8] dt-bindings: stm32-dwmac: add support of MPU families

On Tue, Apr 24, 2018 at 05:01:54PM +0200, Christophe Roullier wrote:
> Add description for Ethernet MPU families fields
> 
> Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
> ---
>  Documentation/devicetree/bindings/net/stm32-dwmac.txt | 16 
> ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/stm32-dwmac.txt 
> b/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> index 489dbcb..e9d1c4a 100644
> --- a/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> +++ b/Documentation/devicetree/bindings/net/stm32-dwmac.txt
> @@ -6,14 +6,26 @@ Please see stmmac.txt for the other unchanged properties.
>  The device node has following properties.
>  
>  Required properties:
> -- compatible:  Should be "st,stm32-dwmac" to select glue, and
> +- compatible:  For MCU family should be "st,stm32-dwmac" to select 
> +glue, and
>  	       "snps,dwmac-3.50a" to select IP version.
> +	       For MPU family should be "st,stm32mp1-dwmac" to select
> +	       glue, and "snps,dwmac-4.20a" to select IP version.
>  - clocks: Must contain a phandle for each entry in clock-names.
>  - clock-names: Should be "stmmaceth" for the host clock.
>  	       Should be "mac-clk-tx" for the MAC TX clock.
>  	       Should be "mac-clk-rx" for the MAC RX clock.
> +	       For MPU family "ethstp" for power mode clock.
> +	       For MPU family need also "syscfg-clk" for SYSCFG clock.

These are in addition or instead of the first 3 clocks.
CRO: This is in addition of the first 3 clocks, I will modified my comment:
>>  - clock-names: Should be "stmmaceth" for the host clock.
>>  	       Should be "mac-clk-tx" for the MAC TX clock.
>>  	       Should be "mac-clk-rx" for the MAC RX clock.
>> +	       For MPU family need to add also "ethstp" for power mode clock and,
>> +	                                                                   "syscfg-clk" for SYSCFG clock.

> +- interrupt-names: Should contain a list of interrupt names corresponding to
> +           the interrupts in the interrupts property, if available.

You need to list the names. Seems unrelated to MPU support.
CRO: 		   
> +- interrupt-names: Should contain a list of interrupt names corresponding to
> +           the interrupts in the interrupts property, if available.
        + Should be "macirq" for the main MAC IT
        + Should be "eth_wake_irq" for the IT which wake up system

>  - st,syscon : Should be phandle/offset pair. The phandle to the syscon node which
> -	      encompases the glue register, and the offset of the control register.
> +	       encompases the glue register, and the offset of the control register.
> +
> +Optional properties:
> +- clock-names:     For MPU family "mac-clk-ck" for PHY without quartz

The clock is always connected whether you use it or not, right? So it shouldn't be optional based on use.
CRO: Yes the clock is always connected but it is not necessary to enable it if you have phy with quartz (it will not use)
So it is optional.

> +- st,int-phyclk :  valid only where PHY do not have quartz and need to be clock
> +	           by RCC

Boolean?
CRO: Yes
> +- st,int-phyclk (boolean) :  valid only where PHY do not have quartz and need to be clock

Is it ok ?

> +
>  Example:
>  
>  	ethernet@40028000 {
> --
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH bpf-next] bpf/verifier: enable ctx + const + 0.
From: Daniel Borkmann @ 2018-05-02  8:29 UTC (permalink / raw)
  To: Alexei Starovoitov, William Tu
  Cc: Linux Kernel Network Developers, Yonghong Song, Yifeng Sun
In-Reply-To: <20180502045206.6aqm4koawyisgkxr@ast-mbp>

On 05/02/2018 06:52 AM, Alexei Starovoitov wrote:
> On Tue, May 01, 2018 at 09:35:29PM -0700, William Tu wrote:
>>
>>> How did you test this patch?
>>>
>> Without the patch, the test case will fail.
>> With the patch, the test case passes.
> 
> Please test it with real program and you'll see crashes and garbage returned.

+1, *convert_ctx_access() use bpf_insn's off to determine what to rewrite,
so this is definitely buggy, and wasn't properly tested as it should have
been. The test case is also way too simple, just the LDX and then doing a
return 0 will get you past verifier, but won't give you anything in terms
of runtime testing that test_verifier is doing. A single test case for a
non trivial verifier change like this is also _completely insufficient_,
this really needs to test all sort of weird corner cases (involving out of
bounds accesses, overflows, etc).

^ permalink raw reply

* Re: [PATCH] net: stmmac: Avoid VLA usage
From: Jose Abreu @ 2018-05-02  8:54 UTC (permalink / raw)
  To: Kees Cook, Giuseppe Cavallaro, Alexandre Torgue
  Cc: linux-kernel, netdev, Jose Abreu
In-Reply-To: <20180501210130.GA47709@beast>

Hi Kees,

On 01-05-2018 22:01, Kees Cook wrote:
> In the quest to remove all stack VLAs from the kernel[1], this switches
> the "status" stack buffer to use the existing small (8) upper bound on
> how many queues can be checked for DMA, and adds a sanity-check just to
> make sure it doesn't operate under pathological conditions.
>
> [1] https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_CA-2B55aFzCG-2DzNmZwX4A2FQpadafLfEzK6CC-3DqPXydAacU1RqZWA-40mail.gmail.com&d=DwIBAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=TBD6a7UY2VbpPmV9LOW_eHAyg8uPq1ZPDhq93VROTVE&s=4fvOST1HhWmZ4lThQe-dHCJYEXNOwey00BCXOWm8tKo&e=
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
>

I rather prefer the variables declaration in reverse-tree order,
but thats just a minor pick.

Reviewed-by: Jose Abreu <joabreu@synopsys.com>

Thanks and Best Regards,
Jose Miguel Abreu

PS: Is VLA warning switch in gcc already active? Because I didn't
see this warning in my builds.

^ permalink raw reply

* [PATCH] change the comment of ip6gre_tnl_addr_conflict
From: Sun Lianwen @ 2018-05-02  9:06 UTC (permalink / raw)
  To: davem; +Cc: netdev

The comment of ip6gre_tnl_addr_conflict() is wrong. which use
ip6_tnl_addr_conflict instead of ip6gre_tnl_addr_conflict.

Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
---
 net/ipv6/ip6_gre.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 69727bc168cb..f81362d1ac8a 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -807,7 +807,7 @@ static inline int ip6gre_xmit_ipv6(struct sk_buff *skb, struct net_device *dev)
 }
 
 /**
- * ip6_tnl_addr_conflict - compare packet addresses to tunnel's own
+ * ip6gre_tnl_addr_conflict - compare packet addresses to ip6gre tunnel's own
  *   @t: the outgoing tunnel device
  *   @hdr: IPv6 header from the incoming packet
  *
-- 
2.17.0

^ permalink raw reply related

* non-blocking connect for kernel SCTP sockets
From: Michal Kubecek @ 2018-05-02  9:06 UTC (permalink / raw)
  To: netdev
  Cc: linux-sctp, linux-kernel, Vlad Yasevich, Neil Horman, Gang He,
	GuoQing Jiang

Hello,

while investigating a bug, we noticed that DLM tries to connect an SCTP
socket in non-blocking mode using

	result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
				    O_NONBLOCK);

which does not work. The reason is that inet_dgram_connect() cannot pass
its flags argument to sctp_connect() so that __sctp_connect() which does
the actual waiting resorts to checking sk->sk_socket->file->f_flags
instead. As the socket used by DLM is a kernel socket with no associated
file, it ends up blocking.

TCP doesn't suffer from this problem as for TCP, the waiting is done in
inet_stream_connect() which has the flags argument. I also checked other
proto::connect handlers and sctp_connect() seems to be the only one with
this kind of problem.

This could be worked around in DLM and further experiments indicate
current DLM code wouldn't actually handle the non-blocking connect
properly. But I still feel ignoring the flags argument is rather a trap
that should be fixed.

I have prepared a series adding flags argument to proto::connect and
using it in sctp_connect() and __sctp_connect(). But I'm not sure if
it's not too big hammer to address issue only affecting one handler.
So my question is: would such generic approach be preferred or should we
rather make SCTP work the way TCP does, i.e. move the waiting from
proto::connect() to proto_ops::connect()? This would require introducing
inet_seqpacket_connect() as inet_dgram_connect() is primarily intended
for use with UDP.)

Michal Kubecek

^ permalink raw reply

* Re: [PATCH] net/xfrm: Fix lookups for states with spi == 0
From: Herbert Xu @ 2018-05-02  9:11 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, 0x7f454c46, Steffen Klassert, David S. Miller,
	netdev
In-Reply-To: <20180502020220.2027-1-dima@arista.com>

On Wed, May 02, 2018 at 03:02:20AM +0100, Dmitry Safonov wrote:
> It seems to be a valid use case to add xfrm state without
> Security Parameter Indexes (SPI) value associated:
> ip xfrm state add src $src dst $dst proto $proto mode $mode sel src $src dst $dst $algo
> 
> The bad thing is that it's currently impossible to get/delete the state
> without SPI: __xfrm_state_insert() obviously doesn't add hash for zero
> SPI in xfrm.state_byspi, and xfrm_user_state_lookup() will fail as
> xfrm_state_lookup() does lookups by hash.
> 
> It also isn't possible to workaround from userspace as
> xfrm_id_proto_match() will be always true for ah/esp/comp protos.
> 
> So, don't try looking up by hash if SPI == 0.
> 
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Dmitry Safonov <dima@arista.com>

A zero SPI is illegal for many IPsec protocols because that value
is used for other purposes, e.g., IKE encapsulation.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
From: Jesper Dangaard Brouer @ 2018-05-02  9:12 UTC (permalink / raw)
  To: Chris Mason
  Cc: brouer, Linus Torvalds, Alexei Starovoitov, David Miller,
	Daniel Borkmann, Greg Kroah-Hartman, Luis R. Rodriguez,
	Network Development, Linux Kernel Mailing List, kernel-team,
	Linux API
In-Reply-To: <EED9C7CB-BC5D-4E2D-B0CC-0003F682C73B@fb.com>


On Tue, 6 Mar 2018 15:42:41 -0800 Chris Mason <clm@fb.com> wrote:
> On 6 Mar 2018, at 11:12, Linus Torvalds wrote:
> 
[...]
> >
> > I do *not* want this to be a magical way to hide things.  
> 
> Especially early on, this makes a lot of sense.  But I wanted to plug 
> bps and the hopefully growing set of bpf introspection tools:
> 
> https://github.com/iovisor/bcc/blob/master/introspection/bps_example.txt
> 
> Long term these are probably a good place to tell the admin what's going 
> on.
(related to bpf itself not modprobe subject)

Hi Chris,

I just want to point out that the tool 'bpftool', is currently the
dominating tool for eBPF introspection.  And the 'bps' tool you mention
seems to have gained less (open source) traction.

The bpftool is part of the kernel git-tree: tools/bpf/bpftool
 https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/

And it even have bash-completion and man-pages in RST format so they
even render nicely when viewed via github:

 https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/Documentation/bpftool.rst
 https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/Documentation/bpftool-prog.rst
 https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/Documentation/bpftool-map.rst

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH bpf-next 1/2] bpf: enable stackmap with build_id in nmi context
From: Peter Zijlstra @ 2018-05-02  9:21 UTC (permalink / raw)
  To: Song Liu; +Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <20180502000220.2585320-1-songliubraving@fb.com>

On Tue, May 01, 2018 at 05:02:19PM -0700, Song Liu wrote:
> @@ -267,17 +285,27 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>  {
>  	int i;
>  	struct vm_area_struct *vma;
> +	bool in_nmi_ctx = in_nmi();
> +	bool irq_work_busy = false;
> +	struct stack_map_irq_work *work;
> +
> +	if (in_nmi_ctx) {
> +		work = this_cpu_ptr(&irq_work);
> +		if (work->sem)
> +			/* cannot queue more up_read, fallback */
> +			irq_work_busy = true;
> +	}
>  
>  	/*
> +	 * We cannot do up_read() in nmi context. To do build_id lookup
> +	 * in nmi context, we need to run up_read() in irq_work. We use
> +	 * a percpu variable to do the irq_work. If the irq_work is
> +	 * already used by another lookup, we fall back to report ips.
>  	 *
>  	 * Same fallback is used for kernel stack (!user) on a stackmap
>  	 * with build_id.
>  	 */
> +	if (!user || !current || !current->mm || irq_work_busy ||
>  	    down_read_trylock(&current->mm->mmap_sem) == 0) {
>  		/* cannot access current->mm, fall back to ips */
>  		for (i = 0; i < trace_nr; i++) {
> @@ -299,7 +327,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>  			- vma->vm_start;
>  		id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
>  	}
> +
> +	if (!in_nmi_ctx)
> +		up_read(&current->mm->mmap_sem);
> +	else {
> +		work->sem = &current->mm->mmap_sem;
> +		irq_work_queue(&work->work);
> +	}
>  }

This is disguisting.. :-)

It's broken though, I've bet you've never actually ran this with lockdep
enabled for example.

Also, you set work->sem before you do trylock, if the trylock fails you
return early and keep work->sem set, which will thereafter always result
in irq_work_busy.

^ permalink raw reply

* Re: [PATCH net-next] udp: Complement partial checksum for GSO packet
From: Willem de Bruijn @ 2018-05-02  9:25 UTC (permalink / raw)
  To: Sean Tranchetti
  Cc: Willem de Bruijn, David Miller, Network Development,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <CAF=yD-+L6ZJq9E0qasPXnA0p3SB=hag+RykO3V707yrrU9-8dA@mail.gmail.com>

On Tue, May 1, 2018 at 12:06 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Tue, May 1, 2018 at 2:01 AM, Sean Tranchetti <stranche@codeaurora.org> wrote:
>> Using the udp_v4_check() function to calculate the pseudo header
>> for the newly segmented UDP packets results in assigning the complement
>> of the value to the UDP header checksum field.
>>
>> Always undo the complement the partial checksum value in order to
>> match the case where GSO is not used on the UDP transmit path.
>>
>> Fixes: ee80d1ebe5ba ("udp: add udp gso")
>> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
>> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>

Acked-by: Willem de Bruijn <willemb@google.com>

Thanks Sean.

^ permalink raw reply

* Re: non-blocking connect for kernel SCTP sockets
From: Xin Long @ 2018-05-02  9:46 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: network dev, linux-sctp, LKML, Vlad Yasevich, Neil Horman,
	Gang He, GuoQing Jiang
In-Reply-To: <20180502090639.j55mnclmkzdts6xb@unicorn.suse.cz>

On Wed, May 2, 2018 at 5:06 PM, Michal Kubecek <mkubecek@suse.cz> wrote:
> Hello,
>
> while investigating a bug, we noticed that DLM tries to connect an SCTP
> socket in non-blocking mode using
>
>         result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
>                                     O_NONBLOCK);
>
> which does not work. The reason is that inet_dgram_connect() cannot pass
> its flags argument to sctp_connect() so that __sctp_connect() which does
> the actual waiting resorts to checking sk->sk_socket->file->f_flags
> instead. As the socket used by DLM is a kernel socket with no associated
> file, it ends up blocking.
>
> TCP doesn't suffer from this problem as for TCP, the waiting is done in
> inet_stream_connect() which has the flags argument. I also checked other
> proto::connect handlers and sctp_connect() seems to be the only one with
> this kind of problem.
>
> This could be worked around in DLM and further experiments indicate
> current DLM code wouldn't actually handle the non-blocking connect
> properly. But I still feel ignoring the flags argument is rather a trap
> that should be fixed.
It is a bug, https://bugzilla.redhat.com/show_bug.cgi?id=1251530

We have the fix which also includes some cleanup, and needs to do
more testing.

>
> I have prepared a series adding flags argument to proto::connect and
> using it in sctp_connect() and __sctp_connect(). But I'm not sure if
> it's not too big hammer to address issue only affecting one handler.
> So my question is: would such generic approach be preferred or should we,
> rather make SCTP work the way TCP does, i.e. move the waiting from,
> proto::connect() to proto_ops::connect()? This would require introducing
> inet_seqpacket_connect() as inet_dgram_connect() is primarily intended
> for use with UDP.)
We don't fix it in the generic proto::connect, which will afftect
many other places.

We're replacing only sctp's proto_ops::connect with sctp_connect and
leave its proto::connect as NULL, so that it can get this flags param
without touching the generic struct and code.

^ permalink raw reply

* [PATCH net-next v8 0/3] kernel: add support to collect hardware logs in crash recovery kernel
From: Rahul Lakkireddy @ 2018-05-02  9:47 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: indranil-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	ganeshgr-ut6Up61K2wZBDgjK7y7TUQ, Rahul Lakkireddy,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

On production servers running variety of workloads over time, kernel
panic can happen sporadically after days or even months. It is
important to collect as much debug logs as possible to root cause
and fix the problem, that may not be easy to reproduce. Snapshot of
underlying hardware/firmware state (like register dump, firmware
logs, adapter memory, etc.), at the time of kernel panic will be very
helpful while debugging the culprit device driver.

This series of patches add new generic framework that enable device
drivers to collect device specific snapshot of the hardware/firmware
state of the underlying device in the crash recovery kernel. In crash
recovery kernel, the collected logs are added as elf notes to
/proc/vmcore, which is copied by user space scripts for post-analysis.

The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /proc/vmcore are as follows:

1. During probe (before hardware is initialized), device drivers
register to the vmcore module (via vmcore_add_device_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.

2. vmcore module allocates the buffer with requested size. It adds
an elf note and invokes the device driver's registered callback
function.

3. Device driver collects all hardware/firmware logs into the buffer
and returns control back to vmcore module.

The device specific hardware/firmware logs can be seen as elf notes
with note type 0x700, as shown below:

# readelf -n /proc/vmcore

Displaying notes found at file offset 0x00001000 with length 0x040032c0:
  Owner                 Data size	Description
  LINUX                0x02000fec	Unknown note type: (0x00000700)
  LINUX                0x02000fec	Unknown note type: (0x00000700)
  CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
  CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
  CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
  CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
  CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
  CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
  CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
  CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
  VMCOREINFO           0x00000785	Unknown note type: (0x00000000)

Patch 1 adds API to vmcore module to allow drivers to register callback
to collect the device specific hardware/firmware logs.  The logs will
be added to /proc/vmcore as elf notes.

Patch 2 updates read and mmap logic to append device specific hardware/
firmware logs as elf notes.

Patch 3 shows a cxgb4 driver example using the API to collect
hardware/firmware logs in crash recovery kernel, before hardware is
initialized.

Thanks,
Rahul

---
v8:
- Added missing linux/types.h header include.
- Removed __vmcore_add_device_dump().

v7:
- Removed "CHELSIO" vendor identifier in Elf Note name. Instead,
  writing "LINUX".
- Moved vmcoredd_header to new file include/uapi/linux/vmcore.h
- Reworked vmcoredd_header to include Elf Note as part of the header
  itself.
- Removed vmcoredd_get_note_size().
- Renamed vmcoredd_write_note() to vmcoredd_write_header().
- Replaced all "unsigned long" with "unsigned int" for device dump
  size since max size of Elf Word is u32.

v6:
- Reworked device dump elf note name to contain vendor identifier.
- Added vmcoredd_header that precedes actual dump in the Elf Note.
- Device dump's name is moved inside vmcoredd_header.
- Added "CHELSIO" string as vendor identifier in the Elf Note name
  for cxgb4 device dumps.

v5:
- Removed enabling CONFIG_PROC_VMCORE_DEVICE_DUMP by default and
  updated help message.

v4:
- Made __vmcore_add_device_dump() static.
- Moved compile check to define vmcore_add_device_dump() to
  crash_dump.h to fix compilation when vmcore.c is not compiled in.
- Convert ---help--- to help in Kconfig as indicated by checkpatch.
- Rebased to tip.

v3:
- Dropped sysfs crashdd module.
- Exported dumps as elf notes. Suggested by Eric Biederman
  <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>.  Added as patch 2 in this version.
- Added CONFIG_PROC_VMCORE_DEVICE_DUMP to allow configuring device
  dump support.
- Moved logic related to adding dumps from crashdd to vmcore module.
- Rename all crashdd* to vmcoredd*.
- Updated comments.

v2:
- Added ABI Documentation for crashdd.
- Directly use octal permission instead of macro.

Changes since rfc v2:
- Moved exporting crashdd from procfs to sysfs. Suggested by
  Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>
- Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
- Replaced all proc API with sysfs API and updated comments.
- Calling driver callback before creating the binary file under
  crashdd sysfs.
- Changed binary dump file permission from S_IRUSR to S_IRUGO.
- Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.

rfc v2:
- Collecting logs in 2nd kernel instead of during kernel panic.
  Suggested by Eric Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>.
- Added new crashdd module that exports /proc/crashdd/ containing
  driver's registered hardware/firmware logs in patch 1.
- Replaced the API to allow drivers to register their hardware/firmware
  log collect routine in crash recovery kernel in patch 1.
- Updated patch 2 to use the new API in patch 1.

Rahul Lakkireddy (3):
  vmcore: add API to collect hardware dump in second kernel
  vmcore: append device dumps to vmcore as elf notes
  cxgb4: collect hardware dump in second kernel

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h       |   4 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c |  25 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h |   3 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |  10 +
 fs/proc/Kconfig                                  |  15 +
 fs/proc/vmcore.c                                 | 382 ++++++++++++++++++++++-
 include/linux/crash_dump.h                       |  18 ++
 include/linux/kcore.h                            |   6 +
 include/uapi/linux/elf.h                         |   1 +
 include/uapi/linux/vmcore.h                      |  18 ++
 10 files changed, 469 insertions(+), 13 deletions(-)
 create mode 100644 include/uapi/linux/vmcore.h

-- 
2.14.1

^ permalink raw reply

* [PATCH net-next v8 1/3] vmcore: add API to collect hardware dump in second kernel
From: Rahul Lakkireddy @ 2018-05-02  9:47 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: indranil-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	ganeshgr-ut6Up61K2wZBDgjK7y7TUQ, Rahul Lakkireddy,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <cover.1525253481.git.rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /proc/vmcore are as follows:

1. During probe (before hardware is initialized), device drivers
register to the vmcore module (via vmcore_add_device_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.

2. vmcore module allocates the buffer with requested size. It adds
an Elf note and invokes the device driver's registered callback
function.

3. Device driver collects all hardware/firmware logs into the buffer
and returns control back to vmcore module.

Ensure that the device dump buffer size is always aligned to page size
so that it can be mmaped.

Also, rename alloc_elfnotes_buf() to vmcore_alloc_buf() to make it more
generic and reserve NT_VMCOREDD note type to indicate vmcore device
dump.

Suggested-by: Eric Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Ganesh Goudar <ganeshgr-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
---
v8:
- Added missing linux/types.h header include.
- Removed __vmcore_add_device_dump().

v7:
- Removed "CHELSIO" vendor identifier in Elf Note name. Instead,
  writing "LINUX".
- Moved vmcoredd_header to new file include/uapi/linux/vmcore.h
- Reworked vmcoredd_header to include Elf Note as part of the header
  itself.
- Removed vmcoredd_get_note_size().
- Renamed vmcoredd_write_note() to vmcoredd_write_header().
- Replaced all "unsigned long" with "unsigned int" for device dump
  size since max size of Elf Word is u32.

v6:
- Reworked device dump elf note name to contain vendor identifier.
- Added vmcoredd_header that precedes actual dump in the Elf Note.
- Device dump's name is moved inside vmcoredd_header.

v5:
- Removed enabling CONFIG_PROC_VMCORE_DEVICE_DUMP by default and
  updated help message to indicate that the driver must be present
  in second kernel's initramfs to collect the underlying device
  snapshot.

v4:
- Made __vmcore_add_device_dump() static.
- Moved compile check to define vmcore_add_device_dump() to
  crash_dump.h to fix compilation when vmcore.c is not compiled in.
- Convert ---help--- to help in Kconfig as indicated by checkpatch.
- Rebased to tip.

v3:
- Dropped sysfs crashdd module.
- Added CONFIG_PROC_VMCORE_DEVICE_DUMP to allow configuring device
  dump support.
- Moved logic related to adding dumps from crashdd to vmcore module.
- Rename all crashdd* to vmcoredd*.

v2:
- Added ABI Documentation for crashdd.
- Directly use octal permission instead of macro.

Changes since rfc v2:
- Moved exporting crashdd from procfs to sysfs.  Suggested by
  Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>
- Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
- Replaced all proc API with sysfs API and updated comments.
- Calling driver callback before creating the binary file under
  crashdd sysfs.
- Changed binary dump file permission from S_IRUSR to S_IRUGO.
- Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.

rfc v2:
- Collecting logs in 2nd kernel instead of during kernel panic.
  Suggested by Eric Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>.
- Patch added in this series.

 fs/proc/Kconfig             |  15 +++++
 fs/proc/vmcore.c            | 135 +++++++++++++++++++++++++++++++++++++++++---
 include/linux/crash_dump.h  |  18 ++++++
 include/linux/kcore.h       |   6 ++
 include/uapi/linux/elf.h    |   1 +
 include/uapi/linux/vmcore.h |  18 ++++++
 6 files changed, 184 insertions(+), 9 deletions(-)
 create mode 100644 include/uapi/linux/vmcore.h

diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 1ade1206bb89..0eaeb41453f5 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -43,6 +43,21 @@ config PROC_VMCORE
         help
         Exports the dump image of crashed kernel in ELF format.
 
+config PROC_VMCORE_DEVICE_DUMP
+	bool "Device Hardware/Firmware Log Collection"
+	depends on PROC_VMCORE
+	default n
+	help
+	  After kernel panic, device drivers can collect the device
+	  specific snapshot of their hardware or firmware before the
+	  underlying devices are initialized in crash recovery kernel.
+	  Note that the device driver must be present in the crash
+	  recovery kernel's initramfs to collect its underlying device
+	  snapshot.
+
+	  If you say Y here, the collected device dumps will be added
+	  as ELF notes to /proc/vmcore.
+
 config PROC_SYSCTL
 	bool "Sysctl support (/proc/sys)" if EXPERT
 	depends on PROC_FS
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a45f0af22a60..abb3dba0fa49 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>
 #include <linux/crash_dump.h>
 #include <linux/list.h>
+#include <linux/mutex.h>
 #include <linux/vmalloc.h>
 #include <linux/pagemap.h>
 #include <linux/uaccess.h>
@@ -44,6 +45,12 @@ static u64 vmcore_size;
 
 static struct proc_dir_entry *proc_vmcore;
 
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+/* Device Dump list and mutex to synchronize access to list */
+static LIST_HEAD(vmcoredd_list);
+static DEFINE_MUTEX(vmcoredd_mutex);
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
 /*
  * Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error
  * The called function has to take care of module refcounting.
@@ -302,10 +309,8 @@ static const struct vm_operations_struct vmcore_mmap_ops = {
 };
 
 /**
- * alloc_elfnotes_buf - allocate buffer for ELF note segment in
- *                      vmalloc memory
- *
- * @notes_sz: size of buffer
+ * vmcore_alloc_buf - allocate buffer in vmalloc memory
+ * @sizez: size of buffer
  *
  * If CONFIG_MMU is defined, use vmalloc_user() to allow users to mmap
  * the buffer to user-space by means of remap_vmalloc_range().
@@ -313,12 +318,12 @@ static const struct vm_operations_struct vmcore_mmap_ops = {
  * If CONFIG_MMU is not defined, use vzalloc() since mmap_vmcore() is
  * disabled and there's no need to allow users to mmap the buffer.
  */
-static inline char *alloc_elfnotes_buf(size_t notes_sz)
+static inline char *vmcore_alloc_buf(size_t size)
 {
 #ifdef CONFIG_MMU
-	return vmalloc_user(notes_sz);
+	return vmalloc_user(size);
 #else
-	return vzalloc(notes_sz);
+	return vzalloc(size);
 #endif
 }
 
@@ -665,7 +670,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 		return rc;
 
 	*notes_sz = roundup(phdr_sz, PAGE_SIZE);
-	*notes_buf = alloc_elfnotes_buf(*notes_sz);
+	*notes_buf = vmcore_alloc_buf(*notes_sz);
 	if (!*notes_buf)
 		return -ENOMEM;
 
@@ -851,7 +856,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
 		return rc;
 
 	*notes_sz = roundup(phdr_sz, PAGE_SIZE);
-	*notes_buf = alloc_elfnotes_buf(*notes_sz);
+	*notes_buf = vmcore_alloc_buf(*notes_sz);
 	if (!*notes_buf)
 		return -ENOMEM;
 
@@ -1145,6 +1150,115 @@ static int __init parse_crash_elf_headers(void)
 	return 0;
 }
 
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+/**
+ * vmcoredd_write_header - Write vmcore device dump header at the
+ * beginning of the dump's buffer.
+ * @buf: Output buffer where the note is written
+ * @data: Dump info
+ * @size: Size of the dump
+ *
+ * Fills beginning of the dump's buffer with vmcore device dump header.
+ */
+static void vmcoredd_write_header(void *buf, struct vmcoredd_data *data,
+				  u32 size)
+{
+	struct vmcoredd_header *vdd_hdr = (struct vmcoredd_header *)buf;
+
+	vdd_hdr->n_namesz = sizeof(vdd_hdr->name);
+	vdd_hdr->n_descsz = size + sizeof(vdd_hdr->dump_name);
+	vdd_hdr->n_type = NT_VMCOREDD;
+
+	strncpy((char *)vdd_hdr->name, VMCOREDD_NOTE_NAME,
+		sizeof(vdd_hdr->name));
+	memcpy(vdd_hdr->dump_name, data->dump_name, sizeof(vdd_hdr->dump_name));
+}
+
+/**
+ * vmcore_add_device_dump - Add a buffer containing device dump to vmcore
+ * @data: dump info.
+ *
+ * Allocate a buffer and invoke the calling driver's dump collect routine.
+ * Write Elf note at the beginning of the buffer to indicate vmcore device
+ * dump and add the dump to global list.
+ */
+int vmcore_add_device_dump(struct vmcoredd_data *data)
+{
+	struct vmcoredd_node *dump;
+	void *buf = NULL;
+	size_t data_size;
+	int ret;
+
+	if (!data || !strlen(data->dump_name) ||
+	    !data->vmcoredd_callback || !data->size)
+		return -EINVAL;
+
+	dump = vzalloc(sizeof(*dump));
+	if (!dump) {
+		ret = -ENOMEM;
+		goto out_err;
+	}
+
+	/* Keep size of the buffer page aligned so that it can be mmaped */
+	data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
+			    PAGE_SIZE);
+
+	/* Allocate buffer for driver's to write their dumps */
+	buf = vmcore_alloc_buf(data_size);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out_err;
+	}
+
+	vmcoredd_write_header(buf, data, data_size -
+			      sizeof(struct vmcoredd_header));
+
+	/* Invoke the driver's dump collection routing */
+	ret = data->vmcoredd_callback(data, buf +
+				      sizeof(struct vmcoredd_header));
+	if (ret)
+		goto out_err;
+
+	dump->buf = buf;
+	dump->size = data_size;
+
+	/* Add the dump to driver sysfs list */
+	mutex_lock(&vmcoredd_mutex);
+	list_add_tail(&dump->list, &vmcoredd_list);
+	mutex_unlock(&vmcoredd_mutex);
+
+	return 0;
+
+out_err:
+	if (buf)
+		vfree(buf);
+
+	if (dump)
+		vfree(dump);
+
+	return ret;
+}
+EXPORT_SYMBOL(vmcore_add_device_dump);
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+/* Free all dumps in vmcore device dump list */
+static void vmcore_free_device_dumps(void)
+{
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+	mutex_lock(&vmcoredd_mutex);
+	while (!list_empty(&vmcoredd_list)) {
+		struct vmcoredd_node *dump;
+
+		dump = list_first_entry(&vmcoredd_list, struct vmcoredd_node,
+					list);
+		list_del(&dump->list);
+		vfree(dump->buf);
+		vfree(dump);
+	}
+	mutex_unlock(&vmcoredd_mutex);
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+}
+
 /* Init function for vmcore module. */
 static int __init vmcore_init(void)
 {
@@ -1192,4 +1306,7 @@ void vmcore_cleanup(void)
 		kfree(m);
 	}
 	free_elfcorebuf();
+
+	/* clear vmcore device dump list */
+	vmcore_free_device_dumps();
 }
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index f7ac2aa93269..3e4ba9d753c8 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -5,6 +5,7 @@
 #include <linux/kexec.h>
 #include <linux/proc_fs.h>
 #include <linux/elf.h>
+#include <uapi/linux/vmcore.h>
 
 #include <asm/pgtable.h> /* for pgprot_t */
 
@@ -93,4 +94,21 @@ static inline bool is_kdump_kernel(void) { return 0; }
 #endif /* CONFIG_CRASH_DUMP */
 
 extern unsigned long saved_max_pfn;
+
+/* Device Dump information to be filled by drivers */
+struct vmcoredd_data {
+	char dump_name[VMCOREDD_MAX_NAME_BYTES]; /* Unique name of the dump */
+	unsigned int size;                       /* Size of the dump */
+	/* Driver's registered callback to be invoked to collect dump */
+	int (*vmcoredd_callback)(struct vmcoredd_data *data, void *buf);
+};
+
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+int vmcore_add_device_dump(struct vmcoredd_data *data);
+#else
+static inline int vmcore_add_device_dump(struct vmcoredd_data *data)
+{
+	return -EOPNOTSUPP;
+}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
 #endif /* LINUX_CRASHDUMP_H */
diff --git a/include/linux/kcore.h b/include/linux/kcore.h
index 80db19d3a505..8de55e4b5ee9 100644
--- a/include/linux/kcore.h
+++ b/include/linux/kcore.h
@@ -28,6 +28,12 @@ struct vmcore {
 	loff_t offset;
 };
 
+struct vmcoredd_node {
+	struct list_head list;	/* List of dumps */
+	void *buf;		/* Buffer containing device's dump */
+	unsigned int size;	/* Size of the buffer */
+};
+
 #ifdef CONFIG_PROC_KCORE
 extern void kclist_add(struct kcore_list *, void *, size_t, int type);
 #else
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index e2535d6dcec7..4e12c423b9fe 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -421,6 +421,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_SYSTEM_CALL	0x404	/* ARM system call number */
 #define NT_ARM_SVE	0x405		/* ARM Scalable Vector Extension registers */
 #define NT_ARC_V2	0x600		/* ARCv2 accumulator/extra registers */
+#define NT_VMCOREDD	0x700		/* Vmcore Device Dump Note */
 
 /* Note header in a PT_NOTE section */
 typedef struct elf32_note {
diff --git a/include/uapi/linux/vmcore.h b/include/uapi/linux/vmcore.h
new file mode 100644
index 000000000000..022619668e0e
--- /dev/null
+++ b/include/uapi/linux/vmcore.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _UAPI_VMCORE_H
+#define _UAPI_VMCORE_H
+
+#include <linux/types.h>
+
+#define VMCOREDD_NOTE_NAME "LINUX"
+#define VMCOREDD_MAX_NAME_BYTES 44
+
+struct vmcoredd_header {
+	__u32 n_namesz; /* Name size */
+	__u32 n_descsz; /* Content size */
+	__u32 n_type;   /* NT_VMCOREDD */
+	__u8 name[8];   /* LINUX\0\0\0 */
+	__u8 dump_name[VMCOREDD_MAX_NAME_BYTES]; /* Device dump's name */
+};
+
+#endif /* _UAPI_VMCORE_H */
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next v8 2/3] vmcore: append device dumps to vmcore as elf notes
From: Rahul Lakkireddy @ 2018-05-02  9:47 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: indranil-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	ganeshgr-ut6Up61K2wZBDgjK7y7TUQ, Rahul Lakkireddy,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <cover.1525253481.git.rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

Update read and mmap logic to append device dumps as additional notes
before the other elf notes. We add device dumps before other elf notes
because the other elf notes may not fill the elf notes buffer
completely and we will end up with zero-filled data between the elf
notes and the device dumps. Tools will then try to decode this
zero-filled data as valid notes and we don't want that. Hence, adding
device dumps before the other elf notes ensure that zero-filled data
can be avoided. This also ensures that the device dumps and the
other elf notes can be properly mmaped at page aligned address.

Incorporate device dump size into the total vmcore size. Also update
offsets for other program headers after the device dumps are added.

Suggested-by: Eric Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Ganesh Goudar <ganeshgr-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
---
v8:
- No changes.

v7:
- No changes.

v6:
- No changes.

v5:
- No changes.

v4:
- No changes.

v3:
- Patch added in this version.
- Exported dumps as elf notes. Suggested by Eric Biederman
  <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>.

 fs/proc/vmcore.c | 247 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 243 insertions(+), 4 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index abb3dba0fa49..247c3499e5bd 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -39,6 +39,8 @@ static size_t elfcorebuf_sz_orig;
 
 static char *elfnotes_buf;
 static size_t elfnotes_sz;
+/* Size of all notes minus the device dump notes */
+static size_t elfnotes_orig_sz;
 
 /* Total size of vmcore file. */
 static u64 vmcore_size;
@@ -51,6 +53,9 @@ static LIST_HEAD(vmcoredd_list);
 static DEFINE_MUTEX(vmcoredd_mutex);
 #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
 
+/* Device Dump Size */
+static size_t vmcoredd_orig_sz;
+
 /*
  * Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error
  * The called function has to take care of module refcounting.
@@ -185,6 +190,77 @@ static int copy_to(void *target, void *src, size_t size, int userbuf)
 	return 0;
 }
 
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+static int vmcoredd_copy_dumps(void *dst, u64 start, size_t size, int userbuf)
+{
+	struct vmcoredd_node *dump;
+	u64 offset = 0;
+	int ret = 0;
+	size_t tsz;
+	char *buf;
+
+	mutex_lock(&vmcoredd_mutex);
+	list_for_each_entry(dump, &vmcoredd_list, list) {
+		if (start < offset + dump->size) {
+			tsz = min(offset + (u64)dump->size - start, (u64)size);
+			buf = dump->buf + start - offset;
+			if (copy_to(dst, buf, tsz, userbuf)) {
+				ret = -EFAULT;
+				goto out_unlock;
+			}
+
+			size -= tsz;
+			start += tsz;
+			dst += tsz;
+
+			/* Leave now if buffer filled already */
+			if (!size)
+				goto out_unlock;
+		}
+		offset += dump->size;
+	}
+
+out_unlock:
+	mutex_unlock(&vmcoredd_mutex);
+	return ret;
+}
+
+static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
+			       u64 start, size_t size)
+{
+	struct vmcoredd_node *dump;
+	u64 offset = 0;
+	int ret = 0;
+	size_t tsz;
+	char *buf;
+
+	mutex_lock(&vmcoredd_mutex);
+	list_for_each_entry(dump, &vmcoredd_list, list) {
+		if (start < offset + dump->size) {
+			tsz = min(offset + (u64)dump->size - start, (u64)size);
+			buf = dump->buf + start - offset;
+			if (remap_vmalloc_range_partial(vma, dst, buf, tsz)) {
+				ret = -EFAULT;
+				goto out_unlock;
+			}
+
+			size -= tsz;
+			start += tsz;
+			dst += tsz;
+
+			/* Leave now if buffer filled already */
+			if (!size)
+				goto out_unlock;
+		}
+		offset += dump->size;
+	}
+
+out_unlock:
+	mutex_unlock(&vmcoredd_mutex);
+	return ret;
+}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
 /* Read from the ELF header and then the crash dump. On error, negative value is
  * returned otherwise number of bytes read are returned.
  */
@@ -222,10 +298,41 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
 	if (*fpos < elfcorebuf_sz + elfnotes_sz) {
 		void *kaddr;
 
+		/* We add device dumps before other elf notes because the
+		 * other elf notes may not fill the elf notes buffer
+		 * completely and we will end up with zero-filled data
+		 * between the elf notes and the device dumps. Tools will
+		 * then try to decode this zero-filled data as valid notes
+		 * and we don't want that. Hence, adding device dumps before
+		 * the other elf notes ensure that zero-filled data can be
+		 * avoided.
+		 */
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+		/* Read device dumps */
+		if (*fpos < elfcorebuf_sz + vmcoredd_orig_sz) {
+			tsz = min(elfcorebuf_sz + vmcoredd_orig_sz -
+				  (size_t)*fpos, buflen);
+			start = *fpos - elfcorebuf_sz;
+			if (vmcoredd_copy_dumps(buffer, start, tsz, userbuf))
+				return -EFAULT;
+
+			buflen -= tsz;
+			*fpos += tsz;
+			buffer += tsz;
+			acc += tsz;
+
+			/* leave now if filled buffer already */
+			if (!buflen)
+				return acc;
+		}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+		/* Read remaining elf notes */
 		tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)*fpos, buflen);
-		kaddr = elfnotes_buf + *fpos - elfcorebuf_sz;
+		kaddr = elfnotes_buf + *fpos - elfcorebuf_sz - vmcoredd_orig_sz;
 		if (copy_to(buffer, kaddr, tsz, userbuf))
 			return -EFAULT;
+
 		buflen -= tsz;
 		*fpos += tsz;
 		buffer += tsz;
@@ -451,11 +558,46 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
 	if (start < elfcorebuf_sz + elfnotes_sz) {
 		void *kaddr;
 
+		/* We add device dumps before other elf notes because the
+		 * other elf notes may not fill the elf notes buffer
+		 * completely and we will end up with zero-filled data
+		 * between the elf notes and the device dumps. Tools will
+		 * then try to decode this zero-filled data as valid notes
+		 * and we don't want that. Hence, adding device dumps before
+		 * the other elf notes ensure that zero-filled data can be
+		 * avoided. This also ensures that the device dumps and
+		 * other elf notes can be properly mmaped at page aligned
+		 * address.
+		 */
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+		/* Read device dumps */
+		if (start < elfcorebuf_sz + vmcoredd_orig_sz) {
+			u64 start_off;
+
+			tsz = min(elfcorebuf_sz + vmcoredd_orig_sz -
+				  (size_t)start, size);
+			start_off = start - elfcorebuf_sz;
+			if (vmcoredd_mmap_dumps(vma, vma->vm_start + len,
+						start_off, tsz))
+				goto fail;
+
+			size -= tsz;
+			start += tsz;
+			len += tsz;
+
+			/* leave now if filled buffer already */
+			if (!size)
+				return 0;
+		}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+		/* Read remaining elf notes */
 		tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)start, size);
-		kaddr = elfnotes_buf + start - elfcorebuf_sz;
+		kaddr = elfnotes_buf + start - elfcorebuf_sz - vmcoredd_orig_sz;
 		if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
 						kaddr, tsz))
 			goto fail;
+
 		size -= tsz;
 		start += tsz;
 		len += tsz;
@@ -703,6 +845,11 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 	/* Modify e_phnum to reflect merged headers. */
 	ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
 
+	/* Store the size of all notes.  We need this to update the note
+	 * header when the device dumps will be added.
+	 */
+	elfnotes_orig_sz = phdr.p_memsz;
+
 	return 0;
 }
 
@@ -889,6 +1036,11 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
 	/* Modify e_phnum to reflect merged headers. */
 	ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
 
+	/* Store the size of all notes.  We need this to update the note
+	 * header when the device dumps will be added.
+	 */
+	elfnotes_orig_sz = phdr.p_memsz;
+
 	return 0;
 }
 
@@ -981,8 +1133,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
 }
 
 /* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
-					   struct list_head *vc_list)
+static void set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
+				    struct list_head *vc_list)
 {
 	loff_t vmcore_off;
 	struct vmcore *m;
@@ -1174,6 +1326,92 @@ static void vmcoredd_write_header(void *buf, struct vmcoredd_data *data,
 	memcpy(vdd_hdr->dump_name, data->dump_name, sizeof(vdd_hdr->dump_name));
 }
 
+/**
+ * vmcoredd_update_program_headers - Update all Elf program headers
+ * @elfptr: Pointer to elf header
+ * @elfnotesz: Size of elf notes aligned to page size
+ * @vmcoreddsz: Size of device dumps to be added to elf note header
+ *
+ * Determine type of Elf header (Elf64 or Elf32) and update the elf note size.
+ * Also update the offsets of all the program headers after the elf note header.
+ */
+static void vmcoredd_update_program_headers(char *elfptr, size_t elfnotesz,
+					    size_t vmcoreddsz)
+{
+	unsigned char *e_ident = (unsigned char *)elfptr;
+	u64 start, end, size;
+	loff_t vmcore_off;
+	u32 i;
+
+	vmcore_off = elfcorebuf_sz + elfnotesz;
+
+	if (e_ident[EI_CLASS] == ELFCLASS64) {
+		Elf64_Ehdr *ehdr = (Elf64_Ehdr *)elfptr;
+		Elf64_Phdr *phdr = (Elf64_Phdr *)(elfptr + sizeof(Elf64_Ehdr));
+
+		/* Update all program headers */
+		for (i = 0; i < ehdr->e_phnum; i++, phdr++) {
+			if (phdr->p_type == PT_NOTE) {
+				/* Update note size */
+				phdr->p_memsz = elfnotes_orig_sz + vmcoreddsz;
+				phdr->p_filesz = phdr->p_memsz;
+				continue;
+			}
+
+			start = rounddown(phdr->p_offset, PAGE_SIZE);
+			end = roundup(phdr->p_offset + phdr->p_memsz,
+				      PAGE_SIZE);
+			size = end - start;
+			phdr->p_offset = vmcore_off + (phdr->p_offset - start);
+			vmcore_off += size;
+		}
+	} else {
+		Elf32_Ehdr *ehdr = (Elf32_Ehdr *)elfptr;
+		Elf32_Phdr *phdr = (Elf32_Phdr *)(elfptr + sizeof(Elf32_Ehdr));
+
+		/* Update all program headers */
+		for (i = 0; i < ehdr->e_phnum; i++, phdr++) {
+			if (phdr->p_type == PT_NOTE) {
+				/* Update note size */
+				phdr->p_memsz = elfnotes_orig_sz + vmcoreddsz;
+				phdr->p_filesz = phdr->p_memsz;
+				continue;
+			}
+
+			start = rounddown(phdr->p_offset, PAGE_SIZE);
+			end = roundup(phdr->p_offset + phdr->p_memsz,
+				      PAGE_SIZE);
+			size = end - start;
+			phdr->p_offset = vmcore_off + (phdr->p_offset - start);
+			vmcore_off += size;
+		}
+	}
+}
+
+/**
+ * vmcoredd_update_size - Update the total size of the device dumps and update
+ * Elf header
+ * @dump_size: Size of the current device dump to be added to total size
+ *
+ * Update the total size of all the device dumps and update the Elf program
+ * headers. Calculate the new offsets for the vmcore list and update the
+ * total vmcore size.
+ */
+static void vmcoredd_update_size(size_t dump_size)
+{
+	vmcoredd_orig_sz += dump_size;
+	elfnotes_sz = roundup(elfnotes_orig_sz, PAGE_SIZE) + vmcoredd_orig_sz;
+	vmcoredd_update_program_headers(elfcorebuf, elfnotes_sz,
+					vmcoredd_orig_sz);
+
+	/* Update vmcore list offsets */
+	set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
+
+	vmcore_size = get_vmcore_size(elfcorebuf_sz, elfnotes_sz,
+				      &vmcore_list);
+	proc_vmcore->size = vmcore_size;
+}
+
 /**
  * vmcore_add_device_dump - Add a buffer containing device dump to vmcore
  * @data: dump info.
@@ -1227,6 +1465,7 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
 	list_add_tail(&dump->list, &vmcoredd_list);
 	mutex_unlock(&vmcoredd_mutex);
 
+	vmcoredd_update_size(data_size);
 	return 0;
 
 out_err:
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next v8 3/3] cxgb4: collect hardware dump in second kernel
From: Rahul Lakkireddy @ 2018-05-02  9:47 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: indranil-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	ganeshgr-ut6Up61K2wZBDgjK7y7TUQ, Rahul Lakkireddy,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <cover.1525253481.git.rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

Register callback to collect hardware/firmware dumps in second kernel
before hardware/firmware is initialized. The dumps for each device
will be available as elf notes in /proc/vmcore in second kernel.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Ganesh Goudar <ganeshgr-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
---
v8:
- No changes.

v7:
- Removed "CHELSIO" vendor identifier in Elf Note name.

v6:
- Added "CHELSIO" string as vendor identifier in the Elf Note name.

v5:
- No changes.

v4:
- No changes.

v3:
- Replaced all crashdd* with vmcoredd*.
- Replaced crashdd_add_dump() with vmcore_add_device_dump().
- Updated comments and commit message.

v2:
- No Changes.

Changes since rfc v2:
- Update comments and commit message for sysfs change.

rfc v2:
- Updated dump registration to the new API in patch 1.

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h       |  4 ++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c | 25 ++++++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h |  3 +++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  | 10 ++++++++++
 4 files changed, 42 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 688f95440af2..01e7aad4ce5b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -50,6 +50,7 @@
 #include <linux/net_tstamp.h>
 #include <linux/ptp_clock_kernel.h>
 #include <linux/ptp_classify.h>
+#include <linux/crash_dump.h>
 #include <asm/io.h>
 #include "t4_chip_type.h"
 #include "cxgb4_uld.h"
@@ -964,6 +965,9 @@ struct adapter {
 	struct hma_data hma;
 
 	struct srq_data *srq;
+
+	/* Dump buffer for collecting logs in kdump kernel */
+	struct vmcoredd_data vmcoredd;
 };
 
 /* Support for "sched-class" command to allow a TX Scheduling Class to be
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
index 143686c60234..085691eb2b95 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
@@ -488,3 +488,28 @@ void cxgb4_init_ethtool_dump(struct adapter *adapter)
 	adapter->eth_dump.version = adapter->params.fw_vers;
 	adapter->eth_dump.len = 0;
 }
+
+static int cxgb4_cudbg_vmcoredd_collect(struct vmcoredd_data *data, void *buf)
+{
+	struct adapter *adap = container_of(data, struct adapter, vmcoredd);
+	u32 len = data->size;
+
+	return cxgb4_cudbg_collect(adap, buf, &len, CXGB4_ETH_DUMP_ALL);
+}
+
+int cxgb4_cudbg_vmcore_add_dump(struct adapter *adap)
+{
+	struct vmcoredd_data *data = &adap->vmcoredd;
+	u32 len;
+
+	len = sizeof(struct cudbg_hdr) +
+	      sizeof(struct cudbg_entity_hdr) * CUDBG_MAX_ENTITY;
+	len += CUDBG_DUMP_BUFF_SIZE;
+
+	data->size = len;
+	snprintf(data->dump_name, sizeof(data->dump_name), "%s_%s",
+		 cxgb4_driver_name, adap->name);
+	data->vmcoredd_callback = cxgb4_cudbg_vmcoredd_collect;
+
+	return vmcore_add_device_dump(data);
+}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
index ce1ac9a1c878..ef59ba1ed968 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
@@ -41,8 +41,11 @@ enum CXGB4_ETHTOOL_DUMP_FLAGS {
 	CXGB4_ETH_DUMP_HW = (1 << 1), /* various FW and HW dumps */
 };
 
+#define CXGB4_ETH_DUMP_ALL (CXGB4_ETH_DUMP_MEM | CXGB4_ETH_DUMP_HW)
+
 u32 cxgb4_get_dump_length(struct adapter *adap, u32 flag);
 int cxgb4_cudbg_collect(struct adapter *adap, void *buf, u32 *buf_size,
 			u32 flag);
 void cxgb4_init_ethtool_dump(struct adapter *adapter);
+int cxgb4_cudbg_vmcore_add_dump(struct adapter *adap);
 #endif /* __CXGB4_CUDBG_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 24d2865b8806..32cad0acf76c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5544,6 +5544,16 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto out_free_adapter;
 
+	if (is_kdump_kernel()) {
+		/* Collect hardware state and append to /proc/vmcore */
+		err = cxgb4_cudbg_vmcore_add_dump(adapter);
+		if (err) {
+			dev_warn(adapter->pdev_dev,
+				 "Fail collecting vmcore device dump, err: %d. Continuing\n",
+				 err);
+			err = 0;
+		}
+	}
 
 	if (!is_t4(adapter->params.chip)) {
 		s_qpp = (QUEUESPERPAGEPF0_S +
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next v2 15/15] arm64: dts: allwinner: a64: add SRAM controller device tree node
From: Maxime Ripard @ 2018-05-02  9:51 UTC (permalink / raw)
  To: Chen-Yu Tsai
  Cc: Michael Turquette, Stephen Boyd, Giuseppe Cavallaro, Rob Herring,
	Mark Rutland, Mark Brown, Icenowy Zheng, linux-arm-kernel,
	linux-clk, devicetree, netdev, Corentin Labbe
In-Reply-To: <20180501161227.2110-16-wens@csie.org>

[-- Attachment #1: Type: text/plain, Size: 1653 bytes --]

Hi,

On Wed, May 02, 2018 at 12:12:27AM +0800, Chen-Yu Tsai wrote:
> From: Icenowy Zheng <icenowy@aosc.io>
> 
> Allwinner A64 has a SRAM controller, and in the device tree currently
> we have a syscon node to enable EMAC driver to access the EMAC clock
> register. As SRAM controller driver can now export regmap for this
> register, replace the syscon node to the SRAM controller device node,
> and let EMAC driver to acquire its EMAC clock regmap.
> 
> Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
> Signed-off-by: Chen-Yu Tsai <wens@csie.org>
> ---
>  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 23 +++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> index 1b2ef28c42bd..1c37659d9d41 100644
> --- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> @@ -168,10 +168,25 @@
>  		#size-cells = <1>;
>  		ranges;
>  
> -		syscon: syscon@1c00000 {
> -			compatible = "allwinner,sun50i-a64-system-controller",
> -				"syscon";
> +		sram_controller: sram-controller@1c00000 {
> +			compatible = "allwinner,sun50i-a64-sram-controller";

I don't think there's anything preventing us from keeping the
-system-controller compatible. It's what was in the DT before, and
it's how it's called in the datasheet.

Otherwise, the whole serie looks good to me:
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>

Maxime

-- 
Maxime Ripard, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v2 15/15] arm64: dts: allwinner: a64: add SRAM controller device tree node
From: Chen-Yu Tsai @ 2018-05-02  9:53 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Michael Turquette, Stephen Boyd, Giuseppe Cavallaro, Rob Herring,
	Mark Rutland, Mark Brown, Icenowy Zheng, linux-arm-kernel,
	linux-clk, devicetree, netdev, Corentin Labbe
In-Reply-To: <20180502095118.rqnfwy576xh6ercm@flea>

On Wed, May 2, 2018 at 5:51 PM, Maxime Ripard <maxime.ripard@bootlin.com> wrote:
> Hi,
>
> On Wed, May 02, 2018 at 12:12:27AM +0800, Chen-Yu Tsai wrote:
>> From: Icenowy Zheng <icenowy@aosc.io>
>>
>> Allwinner A64 has a SRAM controller, and in the device tree currently
>> we have a syscon node to enable EMAC driver to access the EMAC clock
>> register. As SRAM controller driver can now export regmap for this
>> register, replace the syscon node to the SRAM controller device node,
>> and let EMAC driver to acquire its EMAC clock regmap.
>>
>> Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
>> Signed-off-by: Chen-Yu Tsai <wens@csie.org>
>> ---
>>  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 23 +++++++++++++++----
>>  1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> index 1b2ef28c42bd..1c37659d9d41 100644
>> --- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> @@ -168,10 +168,25 @@
>>               #size-cells = <1>;
>>               ranges;
>>
>> -             syscon: syscon@1c00000 {
>> -                     compatible = "allwinner,sun50i-a64-system-controller",
>> -                             "syscon";
>> +             sram_controller: sram-controller@1c00000 {
>> +                     compatible = "allwinner,sun50i-a64-sram-controller";
>
> I don't think there's anything preventing us from keeping the
> -system-controller compatible. It's what was in the DT before, and
> it's how it's called in the datasheet.

I actually meant to ask you about this. The -system-controller compatible
matches the datasheet better. Maybe we should just switch to that one?

ChenYu

> Otherwise, the whole serie looks good to me:
> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
>
> Maxime
>
> --
> Maxime Ripard, Bootlin (formerly Free Electrons)
> Embedded Linux and Kernel engineering
> https://bootlin.com

^ permalink raw reply

* [PATCH] NET/netlink: optimize output of seq_puts in af_netlink.c
From: Bo YU @ 2018-05-02  9:54 UTC (permalink / raw)
  To: davem, Wang, yuzibode; +Cc: netdev, kernel-janitors

Optimization of command output: `cat /proc/net/netlink`

After the patch, we will get:

https://clbin.com/lnu4L

Signed-off-by: Bo YU <tsu.yubo@gmail.com>
---
  net/netlink/af_netlink.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 55342c4d5cec..2e2dd88fc79f 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2606,13 +2606,13 @@ static int netlink_seq_show(struct seq_file *seq, void *v)
  {
  	if (v == SEQ_START_TOKEN) {
  		seq_puts(seq,
-			 "sk       Eth Pid    Groups   "
-			 "Rmem     Wmem     Dump     Locks     Drops     Inode\n");
+			 "sk               Eth Pid        Groups   "
+			 "Rmem     Wmem     Dump  Locks    Drops    Inode\n");
  	} else {
  		struct sock *s = v;
  		struct netlink_sock *nlk = nlk_sk(s);

-		seq_printf(seq, "%pK %-3d %-6u %08x %-8d %-8d %d %-8d %-8d %-8lu\n",
+		seq_printf(seq, "%pK %-3d %-10u %08x %-8d %-8d %-5d %-8d %-8d %-8lu\n",
  			   s,
  			   s->sk_protocol,
  			   nlk->portid,

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox