Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: roopa @ 2016-04-14  6:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, jhs
In-Reply-To: <20160414.001948.1059678462607051806.davem@davemloft.net>

On 4/13/16, 9:19 PM, David Miller wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> Date: Fri,  8 Apr 2016 23:38:11 -0700
>
>> This patch adds a new RTM_GETSTATS message to query link stats via netlink
>> from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
>> returns a lot more than just stats and is expensive in some cases when
>> frequent polling for stats from userspace is a common operation.
> Great work.  One thing catches my eye:
>
>> +	if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK64)) {
>> +		attr = nla_reserve(skb, IFLA_STATS_LINK64,
>> +				   sizeof(struct rtnl_link_stats64));
>> +		if (!attr)
>> +			return -EMSGSIZE;
>> +
>> +		stats = dev_get_stats(dev, &temp);
>> +
>> +		copy_rtnl_link_stats64(nla_data(attr), stats);
> This extra copy bothers me, so I tried to figure out what is going
> on here.
>
> dev_get_stats() always returns the rtnl_link_stats64 pointer it was
> given.  We should be able to pass, therefore, nla_data(attr), straight
> there to avoid the copy.

nice catch. I picked this up straight from rtnl_fill_stats. agree, also thanks
 for the example below.

>
> Bonding even uses dev_get_stats() in this way.
>
> The existing rtnl_fill_stats() can be improved similarly but is of
> course a separate change.  In that case, we'd do something like:
>
> 	struct rtnl_link_stats64 *sp;
>
> 	attr = nla_reserve(skb, IFLA_STATS64,
> 			   sizeof(struct rtnl_link_stats64));
> 	if (!attr)
> 		return -EMSGSIZE;
>
> 	sp = nla_data(attr);
> 	dev_get_stats(dev, sp);
>
> 	attr = nla_reserve(skb, IFLA_STATS,
> 			   sizeof(struct rtnl_link_stats));
> 	if (!attr)
> 		return -EMSGSIZE;
>
> 	copy_rtnl_link_stats(nla_data(attr), sp);

I will submit a separate patch for this with some testing.

Will send a v3 out before end of this week.

Thank you!.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: roopa @ 2016-04-14  6:36 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Thomas Graf, netdev, davem, Nikolay Aleksandrov
In-Reply-To: <570E3784.9010809@mojatatu.com>

On 4/13/16, 5:11 AM, Jamal Hadi Salim wrote:
> On 16-04-12 09:21 AM, Thomas Graf wrote:
>> On 04/11/16 at 08:53pm, roopa wrote:
>>> Top level stats attributes can be netdev or global attributes: We can include string "LINK" in
>>> the names of all stats belonging to a netdev to make it easier to recognize the netdev stats (example):
>>>   IFLA_STATS_LINK64,           (netdev)
>>>   IFLA_STATS_LINK_INET6,    (netdev)
>>>   IFLA_STATS_TCP,                (non-netdev, global tcp stats)
>>
>> This is fine as well. It means that we cant mix netdev and non-netdev
>> stats or stats for multiple netdevs in the same request which would
>> not be the case if you nest it and have a top level attribute which
>> is a list of requests. That may be borderline to overengineering
>> though so I'm fine this as well.
>
>
> Well - using a subheader which has ifindex on it for non-netdev stats
> seems wrong then.
I think its ok for ifindex to be optional.
Almost all msg types have subheaders with ifindex and it is almost always
optional in the GET request (eg RTM_GETLINK with NLM_F_DUMP ifindex is
optional and never checked). Here we make ifindex optional both ways.

Its optional in the GET request with NLM_F_DUMP and it is also optional
in the msg to userspace (NEW). And no ifindex in NEW will indicate that the
message is not tied to a specific netdev (ie global stats).

Thanks!

^ permalink raw reply

* Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption
From: Nikos Mavrogiannopoulos @ 2016-04-14  6:47 UTC (permalink / raw)
  To: Tadeusz Struk
  Cc: Fridolin Pokorny, Tom Herbert, Herbert Xu,
	Linux Crypto Mailing List, LKML, David S. Miller,
	Linux Kernel Network Developers, Dave Watson, fridolin.pokorny
In-Reply-To: <570ECC28.3030008@intel.com>

On Thu, Apr 14, 2016 at 12:46 AM, Tadeusz Struk <tadeusz.struk@intel.com> wrote:
> Hi Fridolin,
> On 04/12/2016 04:13 AM, Fridolin Pokorny wrote:
>> we were experimenting with this. We have a prove of concept of a kernel
>> TLS type socket, so called AF_KTLS, which is based on Dave Watson's
>> RFC5288 patch. It handles both TLS and DTLS, unfortunately it is not
>> ready now to be proposed here. There are still issues which should be
>> solved (but mostly user space API design) [1]. If you are interested, we
>> could combine efforts.
>>
>> Regards,
>> Fridolin Pokorny
>>
>> [1] https://github.com/fridex/af_ktls
> I had a quick look and it looks like is limited only to gcm(aes).
> I would be more interested to have a generic interface that could do generic algorithm
> suits like aes-cbc-hmac-sha1 also.

This is not a real limitation but an advantage. The cbc-hmac-sha1
needs a lot of hacks to be implemented correct (just take a look at
one of the existing implementations). There is no point to bring such
hacks into kernel especially since these ciphersuites are banned from
HTTP/2.0 (see RFC7540), and have been dropped from TLS 1.3.

> This also seems to work in a synchronous (send one and wait) mode, which is a not good
> solution for HW accelerators, which I'm trying to enable.

Is that something that cannot be addressed?

regards,
Nikos

^ permalink raw reply

* Re: [PATCH RFC 2/2] tun: don't set a default qdisc
From: Jason Wang @ 2016-04-14  6:49 UTC (permalink / raw)
  To: Michael S. Tsirkin, Paolo Abeni
  Cc: netdev, David S. Miller, Hannes Frederic Sowa, Eric W. Biederman,
	Greg Kurz
In-Reply-To: <20160413132411-mutt-send-email-mst@redhat.com>



On 04/13/2016 06:26 PM, Michael S. Tsirkin wrote:
> On Wed, Apr 13, 2016 at 11:04:47AM +0200, Paolo Abeni wrote:
>> This patch disables the default qdisc by explicitly setting the
>> IFF_NO_QUEUE private flag so that now the tun xmit path do not
>> require any lock by default.
>>
>> The default qdisc was first removed as a side effect of commit
>> f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
>> and recently restored with commit 016adb7260f4 ("tuntap: restore default qdisc")
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> I wonder about this back and forth.
> Jason, do you see a workload where the default qdisc
> is preferable?

I don't know, but we used to behave like this so we'd better keep it.

An interesting thing is I vaguely remember that you have some concern
when I propose IFF_NO_QUEUE for macvtap[1] :)

I think this could be done by management or more safe by introducing a
new tun flag (TUN_NO_QUEUE).

[1] https://lkml.org/lkml/2015/8/24/147

>
>> ---
>>  drivers/net/tun.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 42992dc..0a0b63c 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1081,6 +1081,7 @@ static void tun_net_init(struct net_device *dev)
>>  
>>  		break;
>>  	}
>> +	dev->priv_flags |= IFF_NO_QUEUE;
>>  }
>>  
>>  /* Character device part */
>> -- 
>> 1.8.3.1

^ permalink raw reply

* Re: [PATCH RFC 1/2] tun: don't require serialization lock on tx
From: Jason Wang @ 2016-04-14  6:50 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: David S. Miller, Michael S. Tsirkin, Hannes Frederic Sowa,
	Eric W. Biederman, Greg Kurz
In-Reply-To: <425e8f69ffd8fe315ef9d3cc678519c7060fb2e0.1460393493.git.pabeni@redhat.com>



On 04/13/2016 05:04 PM, Paolo Abeni wrote:
> The current tun_net_xmit() implementation don't need any external
> lock since it relay on rcu protection for the tun data structure
> and on socket queue lock for skb queuing.
>
> This patch set the NETIF_F_LLTX feature bit in the tun device, so
> that on xmit, in absence of qdisc, no serialization lock is acquired
> by the caller.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  drivers/net/tun.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index faf9297..42992dc 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1796,7 +1796,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>  		dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
>  				   TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
>  				   NETIF_F_HW_VLAN_STAG_TX;
> -		dev->features = dev->hw_features;
> +		dev->features = dev->hw_features | NETIF_F_LLTX;
>  		dev->vlan_features = dev->features &
>  				     ~(NETIF_F_HW_VLAN_CTAG_TX |
>  				       NETIF_F_HW_VLAN_STAG_TX);

Acked-by: Jason Wang <jasowang@redhat.com>

^ permalink raw reply

* pull-request: mac80211 2016-04-14
From: Johannes Berg @ 2016-04-14  7:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-wireless

Hi Dave,

Since I didn't get anything else, and this has been pending for a week,
here's the other part of the nl80211 socket problem fix (the netlink
family URELEASE was the first part.)

Let me know if there's any problem.

Thanks,
johannes



The following changes since commit 30d237a6c2e9be1bb816fe8e787b88fd7aad833b:

  Merge tag 'mac80211-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 (2016-04-08 16:41:28 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git tags/mac80211-for-davem-2016-04-14

for you to fetch changes up to 8f815cdde3e550e10c2736990d791f60c2ce43eb:

  nl80211: check netlink protocol in socket release notification (2016-04-12 15:39:06 +0200)

----------------------------------------------------------------
This has just the single fix from Dmitry Ivanov, adding the missing
netlink notifier family check to avoid the socket close DoS problem.

----------------------------------------------------------------
Dmitry Ivanov (1):
      nl80211: check netlink protocol in socket release notification

 net/wireless/nl80211.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

^ permalink raw reply

* [PATCHv3 net-next 0/6] sctp: support sctp_diag in kernel
From: Xin Long @ 2016-04-14  7:35 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem,
	eric.dumazet

This patchset will add sctp_diag module to implement diag interface on
sctp in kernel.

For a listening sctp endpoint, we will just dump it's ep info.
For a sctp connection, we will the assoc info and it's ep info.

The ss dump will looks like:

[iproute2]# ./misc/ss --sctp  -n -l
State      Recv-Q Send-Q   Local Address:Port       Peer Address:Port
LISTEN     0      128      172.16.254.254:8888      *:*
LISTEN     0      5        127.0.0.1:1234           *:*
LISTEN     0      5        127.0.0.1:1234           *:*
  - ESTAB  0      0        127.0.0.1%lo:1234        127.0.0.1:4321
LISTEN     0      128      172.16.254.254:8888      *:*
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.253.253:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.1.1:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.1.2:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.2.1:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.2.2:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.3.1:8888
  - ESTAB  0      0        172.16.254.254%eth1:8888 172.16.3.2:8888
LISTEN     0      0        127.0.0.1:4321           *:*
  - ESTAB  0      0        127.0.0.1%lo:4321        127.0.0.1:1234

The entries with '- ESTAB' are the assocs, some of them may belong to
the same endpoint. So we will dump the parent endpoint first, like the
entry with 'LISTEN'. then dump the assocs. ep and assocs entries will
be dumped in right order so that ss can show them in tree format easily.

Besides, this patchset also simplifies sctp proc codes, cause it has
some similar codes with sctp diag in sctp transport traversal.

v1->v2:
  1. inet_diag_get_handler needs to return it as const.
  2. merge 5/7 into 2/7 of v1.

v2->v3:
  do some improvements and fixes in patch 1-4, see the details in
  each patch's comment.

Xin Long (6):
  sctp: add sctp_info dump api for sctp_diag
  sctp: export some apis or variables for sctp_diag and reuse some for
    proc
  sctp: export some functions for sctp_diag in inet_diag
  sctp: add the sctp_diag.c file
  sctp: merge the seq_start/next/exits in remaddrs and assocs
  sctp: fix some rhashtable functions using in sctp proc/diag

 include/linux/sctp.h           |  67 ++++++
 include/net/sctp/sctp.h        |  16 ++
 include/uapi/linux/inet_diag.h |   2 +
 net/ipv4/inet_diag.c           |  67 +++---
 net/sctp/Kconfig               |   4 +
 net/sctp/Makefile              |   1 +
 net/sctp/proc.c                | 105 ++-------
 net/sctp/sctp_diag.c           | 497 +++++++++++++++++++++++++++++++++++++++++
 net/sctp/socket.c              | 216 ++++++++++++++++++
 9 files changed, 863 insertions(+), 112 deletions(-)
 create mode 100644 net/sctp/sctp_diag.c

-- 
2.1.0

^ permalink raw reply

* [PATCHv3 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Xin Long @ 2016-04-14  7:35 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem,
	eric.dumazet
In-Reply-To: <cover.1460618169.git.lucien.xin@gmail.com>

sctp_diag will dump some important details of sctp's assoc or ep, we use
sctp_info to describe them,  sctp_get_sctp_info to get them, and export
it to sctp_diag.ko.

v2->v3:
- we will not use list_for_each_safe in sctp_get_sctp_info, cause
  all the callers of it will use lock_sock.

- fix the holes in struct sctp_info with __reserved* field.
  because sctp_diag is a new feature, and sctp_info is just for now,
  it may be changed in the future.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/linux/sctp.h    | 67 ++++++++++++++++++++++++++++++++++++++
 include/net/sctp/sctp.h |  3 ++
 net/sctp/socket.c       | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 156 insertions(+)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index a9414fd..dacb5e7 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -705,4 +705,71 @@ typedef struct sctp_auth_chunk {
 	sctp_authhdr_t auth_hdr;
 } __packed sctp_auth_chunk_t;
 
+struct sctp_info {
+	__u32	sctpi_tag;
+	__u32	sctpi_state;
+	__u32	sctpi_rwnd;
+	__u16	sctpi_unackdata;
+	__u16	sctpi_penddata;
+	__u16	sctpi_instrms;
+	__u16	sctpi_outstrms;
+	__u32	sctpi_fragmentation_point;
+	__u32	sctpi_inqueue;
+	__u32	sctpi_outqueue;
+	__u32	sctpi_overall_error;
+	__u32	sctpi_max_burst;
+	__u32	sctpi_maxseg;
+	__u32	sctpi_peer_rwnd;
+	__u32	sctpi_peer_tag;
+	__u8	sctpi_peer_capable;
+	__u8	sctpi_peer_sack;
+	__u16	__reserved1;
+
+	/* assoc status info */
+	__u64	sctpi_isacks;
+	__u64	sctpi_osacks;
+	__u64	sctpi_opackets;
+	__u64	sctpi_ipackets;
+	__u64	sctpi_rtxchunks;
+	__u64	sctpi_outofseqtsns;
+	__u64	sctpi_idupchunks;
+	__u64	sctpi_gapcnt;
+	__u64	sctpi_ouodchunks;
+	__u64	sctpi_iuodchunks;
+	__u64	sctpi_oodchunks;
+	__u64	sctpi_iodchunks;
+	__u64	sctpi_octrlchunks;
+	__u64	sctpi_ictrlchunks;
+
+	/* primary transport info */
+	struct sockaddr_storage	sctpi_p_address;
+	__s32	sctpi_p_state;
+	__u32	sctpi_p_cwnd;
+	__u32	sctpi_p_srtt;
+	__u32	sctpi_p_rto;
+	__u32	sctpi_p_hbinterval;
+	__u32	sctpi_p_pathmaxrxt;
+	__u32	sctpi_p_sackdelay;
+	__u32	sctpi_p_sackfreq;
+	__u32	sctpi_p_ssthresh;
+	__u32	sctpi_p_partial_bytes_acked;
+	__u32	sctpi_p_flight_size;
+	__u16	sctpi_p_error;
+	__u16	__reserved2;
+
+	/* sctp sock info */
+	__u32	sctpi_s_autoclose;
+	__u32	sctpi_s_adaptation_ind;
+	__u32	sctpi_s_pd_point;
+	__u8	sctpi_s_nodelay;
+	__u8	sctpi_s_disable_fragments;
+	__u8	sctpi_s_v4mapped;
+	__u8	sctpi_s_frag_interleave;
+};
+
+struct sctp_infox {
+	struct sctp_info *sctpinfo;
+	struct sctp_association *asoc;
+};
+
 #endif /* __LINUX_SCTP_H__ */
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 03fb33e..517e02d 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -116,6 +116,9 @@ extern struct percpu_counter sctp_sockets_allocated;
 int sctp_asconf_mgmt(struct sctp_sock *, struct sctp_sockaddr_entry *);
 struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *);
 
+int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
+		       struct sctp_info *info);
+
 /*
  * sctp/primitive.c
  */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 878d28e..7fbc098 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4202,6 +4202,92 @@ static void sctp_shutdown(struct sock *sk, int how)
 	}
 }
 
+int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
+		       struct sctp_info *info)
+{
+	struct sctp_transport *prim;
+	struct list_head *pos;
+	int mask;
+
+	memset(info, 0, sizeof(*info));
+	if (!asoc) {
+		struct sctp_sock *sp = sctp_sk(sk);
+
+		info->sctpi_s_autoclose = sp->autoclose;
+		info->sctpi_s_adaptation_ind = sp->adaptation_ind;
+		info->sctpi_s_pd_point = sp->pd_point;
+		info->sctpi_s_nodelay = sp->nodelay;
+		info->sctpi_s_disable_fragments = sp->disable_fragments;
+		info->sctpi_s_v4mapped = sp->v4mapped;
+		info->sctpi_s_frag_interleave = sp->frag_interleave;
+
+		return 0;
+	}
+
+	info->sctpi_tag = asoc->c.my_vtag;
+	info->sctpi_state = asoc->state;
+	info->sctpi_rwnd = asoc->a_rwnd;
+	info->sctpi_unackdata = asoc->unack_data;
+	info->sctpi_penddata = sctp_tsnmap_pending(&asoc->peer.tsn_map);
+	info->sctpi_instrms = asoc->c.sinit_max_instreams;
+	info->sctpi_outstrms = asoc->c.sinit_num_ostreams;
+	list_for_each(pos, &asoc->base.inqueue.in_chunk_list)
+		info->sctpi_inqueue++;
+	list_for_each(pos, &asoc->outqueue.out_chunk_list)
+		info->sctpi_outqueue++;
+	info->sctpi_overall_error = asoc->overall_error_count;
+	info->sctpi_max_burst = asoc->max_burst;
+	info->sctpi_maxseg = asoc->frag_point;
+	info->sctpi_peer_rwnd = asoc->peer.rwnd;
+	info->sctpi_peer_tag = asoc->c.peer_vtag;
+
+	mask = asoc->peer.ecn_capable << 1;
+	mask = (mask | asoc->peer.ipv4_address) << 1;
+	mask = (mask | asoc->peer.ipv6_address) << 1;
+	mask = (mask | asoc->peer.hostname_address) << 1;
+	mask = (mask | asoc->peer.asconf_capable) << 1;
+	mask = (mask | asoc->peer.prsctp_capable) << 1;
+	mask = (mask | asoc->peer.auth_capable);
+	info->sctpi_peer_capable = mask;
+	mask = asoc->peer.sack_needed << 1;
+	mask = (mask | asoc->peer.sack_generation) << 1;
+	mask = (mask | asoc->peer.zero_window_announced);
+	info->sctpi_peer_sack = mask;
+
+	info->sctpi_isacks = asoc->stats.isacks;
+	info->sctpi_osacks = asoc->stats.osacks;
+	info->sctpi_opackets = asoc->stats.opackets;
+	info->sctpi_ipackets = asoc->stats.ipackets;
+	info->sctpi_rtxchunks = asoc->stats.rtxchunks;
+	info->sctpi_outofseqtsns = asoc->stats.outofseqtsns;
+	info->sctpi_idupchunks = asoc->stats.idupchunks;
+	info->sctpi_gapcnt = asoc->stats.gapcnt;
+	info->sctpi_ouodchunks = asoc->stats.ouodchunks;
+	info->sctpi_iuodchunks = asoc->stats.iuodchunks;
+	info->sctpi_oodchunks = asoc->stats.oodchunks;
+	info->sctpi_iodchunks = asoc->stats.iodchunks;
+	info->sctpi_octrlchunks = asoc->stats.octrlchunks;
+	info->sctpi_ictrlchunks = asoc->stats.ictrlchunks;
+
+	prim = asoc->peer.primary_path;
+	memcpy(&info->sctpi_p_address, &prim->ipaddr,
+	       sizeof(struct sockaddr_storage));
+	info->sctpi_p_state = prim->state;
+	info->sctpi_p_cwnd = prim->cwnd;
+	info->sctpi_p_srtt = prim->srtt;
+	info->sctpi_p_rto = jiffies_to_msecs(prim->rto);
+	info->sctpi_p_hbinterval = prim->hbinterval;
+	info->sctpi_p_pathmaxrxt = prim->pathmaxrxt;
+	info->sctpi_p_sackdelay = jiffies_to_msecs(prim->sackdelay);
+	info->sctpi_p_ssthresh = prim->ssthresh;
+	info->sctpi_p_partial_bytes_acked = prim->partial_bytes_acked;
+	info->sctpi_p_flight_size = prim->flight_size;
+	info->sctpi_p_error = prim->error_count;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sctp_get_sctp_info);
+
 /* 7.2.1 Association Status (SCTP_STATUS)
 
  * Applications can retrieve current status information about an
-- 
2.1.0

^ permalink raw reply related

* [PATCHv3 net-next 2/6] sctp: export some apis or variables for sctp_diag and reuse some for proc
From: Xin Long @ 2016-04-14  7:35 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem,
	eric.dumazet
In-Reply-To: <cover.1460618169.git.lucien.xin@gmail.com>

For some main variables in sctp.ko, we couldn't export it to other modules,
so we have to define some api to access them.

It will include sctp transport and endpoint's traversal.

There are some transport traversal functions for sctp_diag, we can also
use it for sctp_proc. cause they have the similar situation to traversal
transport.

v2->v3:
- rhashtable_walk_init need the parameter gfp, because of recent upstrem
  update

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/sctp.h |  13 +++++
 net/sctp/proc.c         |  81 +++++++------------------------
 net/sctp/socket.c       | 125 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 156 insertions(+), 63 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 517e02d..17890f9 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -116,6 +116,19 @@ extern struct percpu_counter sctp_sockets_allocated;
 int sctp_asconf_mgmt(struct sctp_sock *, struct sctp_sockaddr_entry *);
 struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *);
 
+int sctp_transport_walk_start(struct rhashtable_iter *iter);
+void sctp_transport_walk_stop(struct rhashtable_iter *iter);
+struct sctp_transport *sctp_transport_get_next(struct net *net,
+			struct rhashtable_iter *iter);
+struct sctp_transport *sctp_transport_get_idx(struct net *net,
+			struct rhashtable_iter *iter, int pos);
+int sctp_transport_lookup_process(int (*cb)(struct sctp_transport *, void *),
+				  struct net *net,
+				  const union sctp_addr *laddr,
+				  const union sctp_addr *paddr, void *p);
+int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
+			    struct net *net, int pos, void *p);
+int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *), void *p);
 int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
 		       struct sctp_info *info);
 
diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 6d45d53..dd8492f 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -282,81 +282,31 @@ struct sctp_ht_iter {
 	struct rhashtable_iter hti;
 };
 
-static struct sctp_transport *sctp_transport_get_next(struct seq_file *seq)
-{
-	struct sctp_ht_iter *iter = seq->private;
-	struct sctp_transport *t;
-
-	t = rhashtable_walk_next(&iter->hti);
-	for (; t; t = rhashtable_walk_next(&iter->hti)) {
-		if (IS_ERR(t)) {
-			if (PTR_ERR(t) == -EAGAIN)
-				continue;
-			break;
-		}
-
-		if (net_eq(sock_net(t->asoc->base.sk), seq_file_net(seq)) &&
-		    t->asoc->peer.primary_path == t)
-			break;
-	}
-
-	return t;
-}
-
-static struct sctp_transport *sctp_transport_get_idx(struct seq_file *seq,
-						     loff_t pos)
-{
-	void *obj = SEQ_START_TOKEN;
-
-	while (pos && (obj = sctp_transport_get_next(seq)) && !IS_ERR(obj))
-		pos--;
-
-	return obj;
-}
-
-static int sctp_transport_walk_start(struct seq_file *seq)
-{
-	struct sctp_ht_iter *iter = seq->private;
-	int err;
-
-	err = rhashtable_walk_init(&sctp_transport_hashtable, &iter->hti,
-				   GFP_KERNEL);
-	if (err)
-		return err;
-
-	err = rhashtable_walk_start(&iter->hti);
-
-	return err == -EAGAIN ? 0 : err;
-}
-
-static void sctp_transport_walk_stop(struct seq_file *seq)
-{
-	struct sctp_ht_iter *iter = seq->private;
-
-	rhashtable_walk_stop(&iter->hti);
-	rhashtable_walk_exit(&iter->hti);
-}
-
 static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	int err = sctp_transport_walk_start(seq);
+	struct sctp_ht_iter *iter = seq->private;
+	int err = sctp_transport_walk_start(&iter->hti);
 
 	if (err)
 		return ERR_PTR(err);
 
-	return sctp_transport_get_idx(seq, *pos);
+	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
 }
 
 static void sctp_assocs_seq_stop(struct seq_file *seq, void *v)
 {
-	sctp_transport_walk_stop(seq);
+	struct sctp_ht_iter *iter = seq->private;
+
+	sctp_transport_walk_stop(&iter->hti);
 }
 
 static void *sctp_assocs_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
+	struct sctp_ht_iter *iter = seq->private;
+
 	++*pos;
 
-	return sctp_transport_get_next(seq);
+	return sctp_transport_get_next(seq_file_net(seq), &iter->hti);
 }
 
 /* Display sctp associations (/proc/net/sctp/assocs). */
@@ -458,24 +408,29 @@ void sctp_assocs_proc_exit(struct net *net)
 
 static void *sctp_remaddr_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	int err = sctp_transport_walk_start(seq);
+	struct sctp_ht_iter *iter = seq->private;
+	int err = sctp_transport_walk_start(&iter->hti);
 
 	if (err)
 		return ERR_PTR(err);
 
-	return sctp_transport_get_idx(seq, *pos);
+	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
 }
 
 static void *sctp_remaddr_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
+	struct sctp_ht_iter *iter = seq->private;
+
 	++*pos;
 
-	return sctp_transport_get_next(seq);
+	return sctp_transport_get_next(seq_file_net(seq), &iter->hti);
 }
 
 static void sctp_remaddr_seq_stop(struct seq_file *seq, void *v)
 {
-	sctp_transport_walk_stop(seq);
+	struct sctp_ht_iter *iter = seq->private;
+
+	sctp_transport_walk_stop(&iter->hti);
 }
 
 static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 7fbc098..6226c0b 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4288,6 +4288,131 @@ int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
 }
 EXPORT_SYMBOL_GPL(sctp_get_sctp_info);
 
+/* use callback to avoid exporting the core structure */
+int sctp_transport_walk_start(struct rhashtable_iter *iter)
+{
+	int err;
+
+	err = rhashtable_walk_init(&sctp_transport_hashtable, iter,
+				   GFP_KERNEL);
+	if (err)
+		return err;
+
+	err = rhashtable_walk_start(iter);
+
+	return err == -EAGAIN ? 0 : err;
+}
+
+void sctp_transport_walk_stop(struct rhashtable_iter *iter)
+{
+	rhashtable_walk_stop(iter);
+	rhashtable_walk_exit(iter);
+}
+
+struct sctp_transport *sctp_transport_get_next(struct net *net,
+					       struct rhashtable_iter *iter)
+{
+	struct sctp_transport *t;
+
+	t = rhashtable_walk_next(iter);
+	for (; t; t = rhashtable_walk_next(iter)) {
+		if (IS_ERR(t)) {
+			if (PTR_ERR(t) == -EAGAIN)
+				continue;
+			break;
+		}
+
+		if (net_eq(sock_net(t->asoc->base.sk), net) &&
+		    t->asoc->peer.primary_path == t)
+			break;
+	}
+
+	return t;
+}
+
+struct sctp_transport *sctp_transport_get_idx(struct net *net,
+					      struct rhashtable_iter *iter,
+					      int pos)
+{
+	void *obj = SEQ_START_TOKEN;
+
+	while (pos && (obj = sctp_transport_get_next(net, iter)) &&
+	       !IS_ERR(obj))
+		pos--;
+
+	return obj;
+}
+
+int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
+			   void *p) {
+	int err = 0;
+	int hash = 0;
+	struct sctp_ep_common *epb;
+	struct sctp_hashbucket *head;
+
+	for (head = sctp_ep_hashtable; hash < sctp_ep_hashsize;
+	     hash++, head++) {
+		read_lock(&head->lock);
+		sctp_for_each_hentry(epb, &head->chain) {
+			err = cb(sctp_ep(epb), p);
+			if (err)
+				break;
+		}
+		read_unlock(&head->lock);
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(sctp_for_each_endpoint);
+
+int sctp_transport_lookup_process(int (*cb)(struct sctp_transport *, void *),
+				  struct net *net,
+				  const union sctp_addr *laddr,
+				  const union sctp_addr *paddr, void *p)
+{
+	struct sctp_transport *transport;
+	int err = 0;
+
+	rcu_read_lock();
+	transport = sctp_addrs_lookup_transport(net, laddr, paddr);
+	if (!transport || !sctp_transport_hold(transport))
+		goto out;
+	err = cb(transport, p);
+	sctp_transport_put(transport);
+
+out:
+	rcu_read_unlock();
+	return err;
+}
+EXPORT_SYMBOL_GPL(sctp_transport_lookup_process);
+
+int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
+			    struct net *net, int pos, void *p) {
+	struct rhashtable_iter hti;
+	int err = 0;
+	void *obj;
+
+	if (sctp_transport_walk_start(&hti))
+		goto out;
+
+	sctp_transport_get_idx(net, &hti, pos);
+	obj = sctp_transport_get_next(net, &hti);
+	for (; obj && !IS_ERR(obj); obj = sctp_transport_get_next(net, &hti)) {
+		struct sctp_transport *transport = obj;
+
+		if (!sctp_transport_hold(transport))
+			continue;
+		err = cb(transport, p);
+		sctp_transport_put(transport);
+		if (err)
+			break;
+	}
+out:
+	sctp_transport_walk_stop(&hti);
+	return err;
+}
+EXPORT_SYMBOL_GPL(sctp_for_each_transport);
+
 /* 7.2.1 Association Status (SCTP_STATUS)
 
  * Applications can retrieve current status information about an
-- 
2.1.0

^ permalink raw reply related

* [PATCHv3 net-next 3/6] sctp: export some functions for sctp_diag in inet_diag
From: Xin Long @ 2016-04-14  7:35 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem,
	eric.dumazet
In-Reply-To: <cover.1460618169.git.lucien.xin@gmail.com>

inet_diag_msg_common_fill is used to fill the diag msg common info,
we need to use it in sctp_diag as well, so export it.

inet_diag_msg_attrs_fill is used to fill some common attrs info between
sctp diag and tcp diag.

v2->v3:
- do not need to define and export inet_diag_get_handler any more.
  cause all the functions in it are in sctp_diag.ko, we just call
  them in sctp_diag.ko.

- add inet_diag_msg_attrs_fill to make codes clear.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/inet_diag.c | 67 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 26 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index bd591eb..70212bd 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -66,7 +66,7 @@ static void inet_diag_unlock_handler(const struct inet_diag_handler *handler)
 	mutex_unlock(&inet_diag_table_mutex);
 }
 
-static void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
+void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
 {
 	r->idiag_family = sk->sk_family;
 
@@ -89,6 +89,7 @@ static void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
 	r->id.idiag_dst[0] = sk->sk_daddr;
 	}
 }
+EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill);
 
 static size_t inet_sk_attr_size(void)
 {
@@ -104,13 +105,50 @@ static size_t inet_sk_attr_size(void)
 		+ 64;
 }
 
+int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
+			     struct inet_diag_msg *r, int ext,
+			     struct user_namespace *user_ns)
+{
+	const struct inet_sock *inet = inet_sk(sk);
+
+	if (nla_put_u8(skb, INET_DIAG_SHUTDOWN, sk->sk_shutdown))
+		goto errout;
+
+	/* IPv6 dual-stack sockets use inet->tos for IPv4 connections,
+	 * hence this needs to be included regardless of socket family.
+	 */
+	if (ext & (1 << (INET_DIAG_TOS - 1)))
+		if (nla_put_u8(skb, INET_DIAG_TOS, inet->tos) < 0)
+			goto errout;
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (r->idiag_family == AF_INET6) {
+		if (ext & (1 << (INET_DIAG_TCLASS - 1)))
+			if (nla_put_u8(skb, INET_DIAG_TCLASS,
+				       inet6_sk(sk)->tclass) < 0)
+				goto errout;
+
+		if (((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
+		    nla_put_u8(skb, INET_DIAG_SKV6ONLY, ipv6_only_sock(sk)))
+			goto errout;
+	}
+#endif
+
+	r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk));
+	r->idiag_inode = sock_i_ino(sk);
+
+	return 0;
+errout:
+	return 1;
+}
+EXPORT_SYMBOL_GPL(inet_diag_msg_attrs_fill);
+
 int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 		      struct sk_buff *skb, const struct inet_diag_req_v2 *req,
 		      struct user_namespace *user_ns,
 		      u32 portid, u32 seq, u16 nlmsg_flags,
 		      const struct nlmsghdr *unlh)
 {
-	const struct inet_sock *inet = inet_sk(sk);
 	const struct tcp_congestion_ops *ca_ops;
 	const struct inet_diag_handler *handler;
 	int ext = req->idiag_ext;
@@ -135,32 +173,9 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 	r->idiag_timer = 0;
 	r->idiag_retrans = 0;
 
-	if (nla_put_u8(skb, INET_DIAG_SHUTDOWN, sk->sk_shutdown))
+	if (inet_diag_msg_attrs_fill(sk, skb, r, ext, user_ns))
 		goto errout;
 
-	/* IPv6 dual-stack sockets use inet->tos for IPv4 connections,
-	 * hence this needs to be included regardless of socket family.
-	 */
-	if (ext & (1 << (INET_DIAG_TOS - 1)))
-		if (nla_put_u8(skb, INET_DIAG_TOS, inet->tos) < 0)
-			goto errout;
-
-#if IS_ENABLED(CONFIG_IPV6)
-	if (r->idiag_family == AF_INET6) {
-		if (ext & (1 << (INET_DIAG_TCLASS - 1)))
-			if (nla_put_u8(skb, INET_DIAG_TCLASS,
-				       inet6_sk(sk)->tclass) < 0)
-				goto errout;
-
-		if (((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
-		    nla_put_u8(skb, INET_DIAG_SKV6ONLY, ipv6_only_sock(sk)))
-			goto errout;
-	}
-#endif
-
-	r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk));
-	r->idiag_inode = sock_i_ino(sk);
-
 	if (ext & (1 << (INET_DIAG_MEMINFO - 1))) {
 		struct inet_diag_meminfo minfo = {
 			.idiag_rmem = sk_rmem_alloc_get(sk),
-- 
2.1.0

^ permalink raw reply related

* [PATCHv3 net-next 4/6] sctp: add the sctp_diag.c file
From: Xin Long @ 2016-04-14  7:35 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem,
	eric.dumazet
In-Reply-To: <cover.1460618169.git.lucien.xin@gmail.com>

This one will implement all the interface of inet_diag, inet_diag_handler.
which includes sctp_diag_dump, sctp_diag_dump_one and sctp_diag_get_info.

It will work as a module, and register inet_diag_handler when loading.

v2->v3:
- fix the mistake in inet_assoc_attr_size().

- change inet_diag_msg_laddrs_fill() name to inet_diag_msg_sctpladdrs_fill.

- change inet_diag_msg_paddrs_fill() name to inet_diag_msg_sctpaddrs_fill.

- add inet_diag_msg_sctpinfo_fill() to make asoc/ep fill code clearer.

- add inet_diag_msg_sctpasoc_fill() to make asoc fill code clearer.

- merge inet_asoc_diag_fill() and inet_ep_diag_fill() to
  inet_sctp_diag_fill().

- call sctp_diag_get_info() directly, instead by handler, cause the caller
  is in the same file with it.

- call lock_sock in sctp_tsp_dump_one() to make sure we call get sctp info
  safely.

- after lock_sock(sk), we should check sk != assoc->base.sk.

- change mem[SK_MEMINFO_WMEM_ALLOC] to asoc->sndbuf_used for asoc dump when
  asoc->ep->sndbuf_policy is set. don't use INET_DIAG_MEMINFO attr any more.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/uapi/linux/inet_diag.h |   2 +
 net/sctp/Kconfig               |   4 +
 net/sctp/Makefile              |   1 +
 net/sctp/sctp_diag.c           | 497 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 504 insertions(+)
 create mode 100644 net/sctp/sctp_diag.c

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index 68a1f71..f5f3629 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -113,6 +113,8 @@ enum {
 	INET_DIAG_DCTCPINFO,
 	INET_DIAG_PROTOCOL,  /* response attribute only */
 	INET_DIAG_SKV6ONLY,
+	INET_DIAG_LOCALS,
+	INET_DIAG_PEERS,
 };
 
 #define INET_DIAG_MAX INET_DIAG_SKV6ONLY
diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig
index 71c1a59..d9c04dc 100644
--- a/net/sctp/Kconfig
+++ b/net/sctp/Kconfig
@@ -99,5 +99,9 @@ config SCTP_COOKIE_HMAC_SHA1
 	select CRYPTO_HMAC if SCTP_COOKIE_HMAC_SHA1
 	select CRYPTO_SHA1 if SCTP_COOKIE_HMAC_SHA1
 
+config INET_SCTP_DIAG
+	depends on INET_DIAG
+	def_tristate INET_DIAG
+
 
 endif # IP_SCTP
diff --git a/net/sctp/Makefile b/net/sctp/Makefile
index 3b4ffb0..0fca582 100644
--- a/net/sctp/Makefile
+++ b/net/sctp/Makefile
@@ -4,6 +4,7 @@
 
 obj-$(CONFIG_IP_SCTP) += sctp.o
 obj-$(CONFIG_NET_SCTPPROBE) += sctp_probe.o
+obj-$(CONFIG_INET_SCTP_DIAG) += sctp_diag.o
 
 sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
 	  protocol.o endpointola.o associola.o \
diff --git a/net/sctp/sctp_diag.c b/net/sctp/sctp_diag.c
new file mode 100644
index 0000000..98ecd16
--- /dev/null
+++ b/net/sctp/sctp_diag.c
@@ -0,0 +1,497 @@
+#include <linux/module.h>
+#include <linux/inet_diag.h>
+#include <linux/sock_diag.h>
+#include <net/sctp/sctp.h>
+
+extern void inet_diag_msg_common_fill(struct inet_diag_msg *r,
+				      struct sock *sk);
+extern int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
+				    struct inet_diag_msg *r, int ext,
+				    struct user_namespace *user_ns);
+
+static void sctp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
+			       void *info);
+
+/* define some functions to make asoc/ep fill look clean */
+static void inet_diag_msg_sctpasoc_fill(struct inet_diag_msg *r,
+					struct sock *sk,
+					struct sctp_association *asoc)
+{
+	union sctp_addr laddr, paddr;
+	struct dst_entry *dst;
+
+	laddr = list_entry(asoc->base.bind_addr.address_list.next,
+			   struct sctp_sockaddr_entry, list)->a;
+	paddr = asoc->peer.primary_path->ipaddr;
+	dst = asoc->peer.primary_path->dst;
+
+	r->idiag_family = sk->sk_family;
+	r->id.idiag_sport = htons(asoc->base.bind_addr.port);
+	r->id.idiag_dport = htons(asoc->peer.port);
+	r->id.idiag_if = dst ? dst->dev->ifindex : 0;
+	sock_diag_save_cookie(sk, r->id.idiag_cookie);
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (sk->sk_family == AF_INET6) {
+		*(struct in6_addr *)r->id.idiag_src = laddr.v6.sin6_addr;
+		*(struct in6_addr *)r->id.idiag_dst = paddr.v6.sin6_addr;
+	} else
+#endif
+	{
+		memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
+		memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
+
+		r->id.idiag_src[0] = laddr.v4.sin_addr.s_addr;
+		r->id.idiag_dst[0] = paddr.v4.sin_addr.s_addr;
+	}
+
+	r->idiag_state = asoc->state;
+	r->idiag_timer = SCTP_EVENT_TIMEOUT_T3_RTX;
+	r->idiag_retrans = asoc->rtx_data_chunks;
+#define EXPIRES_IN_MS(tmo)  DIV_ROUND_UP((tmo - jiffies) * 1000, HZ)
+	r->idiag_expires =
+		EXPIRES_IN_MS(asoc->timeouts[SCTP_EVENT_TIMEOUT_T3_RTX]);
+#undef EXPIRES_IN_MS
+}
+
+static int inet_diag_msg_sctpladdrs_fill(struct sk_buff *skb,
+					 struct list_head *address_list)
+{
+	struct sctp_sockaddr_entry *laddr;
+	int addrlen = sizeof(struct sockaddr_storage);
+	int addrcnt = 0;
+	struct nlattr *attr;
+	void *info = NULL;
+
+	list_for_each_entry_rcu(laddr, address_list, list)
+		addrcnt++;
+
+	attr = nla_reserve(skb, INET_DIAG_LOCALS, addrlen * addrcnt);
+	if (!attr)
+		return -EMSGSIZE;
+
+	info = nla_data(attr);
+	list_for_each_entry_rcu(laddr, address_list, list) {
+		memcpy(info, &laddr->a, addrlen);
+		info += addrlen;
+	}
+
+	return 0;
+}
+
+static int inet_diag_msg_sctpaddrs_fill(struct sk_buff *skb,
+					struct sctp_association *asoc)
+{
+	int addrlen = sizeof(struct sockaddr_storage);
+	struct sctp_transport *from;
+	struct nlattr *attr;
+	void *info = NULL;
+
+	attr = nla_reserve(skb, INET_DIAG_PEERS,
+			   addrlen * asoc->peer.transport_count);
+	if (!attr)
+		return -EMSGSIZE;
+
+	info = nla_data(attr);
+	list_for_each_entry(from, &asoc->peer.transport_addr_list,
+			    transports) {
+		memcpy(info, &from->ipaddr, addrlen);
+		info += addrlen;
+	}
+
+	return 0;
+}
+
+/* sctp asoc/ep fill*/
+static int inet_sctp_diag_fill(struct sock *sk, struct sctp_association *asoc,
+			       struct sk_buff *skb,
+			       const struct inet_diag_req_v2 *req,
+			       struct user_namespace *user_ns,
+			       int portid, u32 seq, u16 nlmsg_flags,
+			       const struct nlmsghdr *unlh)
+{
+	struct sctp_endpoint *ep = sctp_sk(sk)->ep;
+	struct list_head *addr_list;
+	struct inet_diag_msg *r;
+	struct nlmsghdr  *nlh;
+	int ext = req->idiag_ext;
+	struct sctp_infox infox;
+	void *info = NULL;
+
+	nlh = nlmsg_put(skb, portid, seq, unlh->nlmsg_type, sizeof(*r),
+			nlmsg_flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	r = nlmsg_data(nlh);
+	BUG_ON(!sk_fullsock(sk));
+
+	if (asoc) {
+		inet_diag_msg_sctpasoc_fill(r, sk, asoc);
+	} else {
+		inet_diag_msg_common_fill(r, sk);
+		r->idiag_state = sk->sk_state;
+		r->idiag_timer = 0;
+		r->idiag_retrans = 0;
+	}
+
+	if (inet_diag_msg_attrs_fill(sk, skb, r, ext, user_ns))
+		goto errout;
+
+	if (ext & (1 << (INET_DIAG_SKMEMINFO - 1))) {
+		u32 mem[SK_MEMINFO_VARS];
+		int amt;
+
+		if (asoc && asoc->ep->sndbuf_policy)
+			amt = asoc->sndbuf_used;
+		else
+			amt = sk_wmem_alloc_get(sk);
+		mem[SK_MEMINFO_WMEM_ALLOC] = amt;
+		mem[SK_MEMINFO_RMEM_ALLOC] = sk_rmem_alloc_get(sk);
+		mem[SK_MEMINFO_RCVBUF] = sk->sk_rcvbuf;
+		mem[SK_MEMINFO_SNDBUF] = sk->sk_sndbuf;
+		mem[SK_MEMINFO_FWD_ALLOC] = sk->sk_forward_alloc;
+		mem[SK_MEMINFO_WMEM_QUEUED] = sk->sk_wmem_queued;
+		mem[SK_MEMINFO_OPTMEM] = atomic_read(&sk->sk_omem_alloc);
+		mem[SK_MEMINFO_BACKLOG] = sk->sk_backlog.len;
+		mem[SK_MEMINFO_DROPS] = atomic_read(&sk->sk_drops);
+
+		if (nla_put(skb, INET_DIAG_SKMEMINFO, sizeof(mem), &mem) < 0)
+			goto errout;
+	}
+
+	if (ext & (1 << (INET_DIAG_INFO - 1))) {
+		struct nlattr *attr;
+
+		attr = nla_reserve(skb, INET_DIAG_INFO,
+				   sizeof(struct sctp_info));
+		if (!attr)
+			goto errout;
+
+		info = nla_data(attr);
+	}
+	infox.sctpinfo = (struct sctp_info *)info;
+	infox.asoc = asoc;
+	sctp_diag_get_info(sk, r, &infox);
+
+	addr_list = asoc ? &asoc->base.bind_addr.address_list
+			 : &ep->base.bind_addr.address_list;
+	if (inet_diag_msg_sctpladdrs_fill(skb, addr_list))
+		goto errout;
+
+	if (asoc && (ext & (1 << (INET_DIAG_CONG - 1))))
+		if (nla_put_string(skb, INET_DIAG_CONG, "reno") < 0)
+			goto errout;
+
+	if (asoc && inet_diag_msg_sctpaddrs_fill(skb, asoc))
+		goto errout;
+
+	nlmsg_end(skb, nlh);
+	return 0;
+
+errout:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+/* callback and param */
+struct sctp_comm_param {
+	struct sk_buff *skb;
+	struct netlink_callback *cb;
+	const struct inet_diag_req_v2 *r;
+	const struct nlmsghdr *nlh;
+};
+
+static size_t inet_assoc_attr_size(struct sctp_association *asoc)
+{
+	int addrlen = sizeof(struct sockaddr_storage);
+	int addrcnt = 0;
+	struct sctp_sockaddr_entry *laddr;
+
+	list_for_each_entry_rcu(laddr, &asoc->base.bind_addr.address_list,
+				list)
+		addrcnt++;
+
+	return	  nla_total_size(sizeof(struct sctp_info))
+		+ nla_total_size(1) /* INET_DIAG_SHUTDOWN */
+		+ nla_total_size(1) /* INET_DIAG_TOS */
+		+ nla_total_size(1) /* INET_DIAG_TCLASS */
+		+ nla_total_size(addrlen * asoc->peer.transport_count)
+		+ nla_total_size(addrlen * addrcnt)
+		+ nla_total_size(sizeof(struct inet_diag_meminfo))
+		+ nla_total_size(sizeof(struct inet_diag_msg))
+		+ 64;
+}
+
+static int sctp_tsp_dump_one(struct sctp_transport *tsp, void *p)
+{
+	struct sctp_association *assoc = tsp->asoc;
+	struct sock *sk = tsp->asoc->base.sk;
+	struct sctp_comm_param *commp = p;
+	struct sk_buff *in_skb = commp->skb;
+	const struct inet_diag_req_v2 *req = commp->r;
+	const struct nlmsghdr *nlh = commp->nlh;
+	struct net *net = sock_net(in_skb->sk);
+	struct sk_buff *rep;
+	int err;
+
+	err = sock_diag_check_cookie(sk, req->id.idiag_cookie);
+	if (err)
+		goto out;
+
+	err = -ENOMEM;
+	rep = nlmsg_new(inet_assoc_attr_size(assoc), GFP_KERNEL);
+	if (!rep)
+		goto out;
+
+	lock_sock(sk);
+	if (sk != assoc->base.sk) {
+		release_sock(sk);
+		sk = assoc->base.sk;
+		lock_sock(sk);
+	}
+	err = inet_sctp_diag_fill(sk, assoc, rep, req,
+				  sk_user_ns(NETLINK_CB(in_skb).sk),
+				  NETLINK_CB(in_skb).portid,
+				  nlh->nlmsg_seq, 0, nlh);
+	release_sock(sk);
+	if (err < 0) {
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(rep);
+		goto out;
+	}
+
+	err = netlink_unicast(net->diag_nlsk, rep, NETLINK_CB(in_skb).portid,
+			      MSG_DONTWAIT);
+	if (err > 0)
+		err = 0;
+out:
+	return err;
+}
+
+static int sctp_tsp_dump(struct sctp_transport *tsp, void *p)
+{
+	struct sctp_endpoint *ep = tsp->asoc->ep;
+	struct sctp_comm_param *commp = p;
+	struct sock *sk = ep->base.sk;
+	struct sk_buff *skb = commp->skb;
+	struct netlink_callback *cb = commp->cb;
+	const struct inet_diag_req_v2 *r = commp->r;
+	struct sctp_association *assoc =
+		list_entry(ep->asocs.next, struct sctp_association, asocs);
+	int err = 0;
+
+	/* find the ep only once through the transports by this condition */
+	if (tsp->asoc != assoc)
+		goto out;
+
+	if (r->sdiag_family != AF_UNSPEC && sk->sk_family != r->sdiag_family)
+		goto out;
+
+	lock_sock(sk);
+	if (sk != assoc->base.sk)
+		goto release;
+	list_for_each_entry(assoc, &ep->asocs, asocs) {
+		if (cb->args[4] < cb->args[1])
+			goto next;
+
+		if (r->id.idiag_sport != htons(assoc->base.bind_addr.port) &&
+		    r->id.idiag_sport)
+			goto next;
+		if (r->id.idiag_dport != htons(assoc->peer.port) &&
+		    r->id.idiag_dport)
+			goto next;
+
+		if (!cb->args[3] &&
+		    inet_sctp_diag_fill(sk, NULL, skb, r,
+					sk_user_ns(NETLINK_CB(cb->skb).sk),
+					NETLINK_CB(cb->skb).portid,
+					cb->nlh->nlmsg_seq,
+					NLM_F_MULTI, cb->nlh) < 0) {
+			cb->args[3] = 1;
+			err = 2;
+			goto release;
+		}
+		cb->args[3] = 1;
+
+		if (inet_sctp_diag_fill(sk, assoc, skb, r,
+					sk_user_ns(NETLINK_CB(cb->skb).sk),
+					NETLINK_CB(cb->skb).portid,
+					cb->nlh->nlmsg_seq, 0, cb->nlh) < 0) {
+			err = 2;
+			goto release;
+		}
+next:
+		cb->args[4]++;
+	}
+	cb->args[1] = 0;
+	cb->args[2]++;
+	cb->args[3] = 0;
+	cb->args[4] = 0;
+release:
+	release_sock(sk);
+	return err;
+out:
+	cb->args[2]++;
+	return err;
+}
+
+static int sctp_ep_dump(struct sctp_endpoint *ep, void *p)
+{
+	struct sctp_comm_param *commp = p;
+	struct sock *sk = ep->base.sk;
+	struct sk_buff *skb = commp->skb;
+	struct netlink_callback *cb = commp->cb;
+	const struct inet_diag_req_v2 *r = commp->r;
+	struct net *net = sock_net(skb->sk);
+	struct inet_sock *inet = inet_sk(sk);
+	int err = 0;
+
+	if (!net_eq(sock_net(sk), net))
+		goto out;
+
+	if (cb->args[4] < cb->args[1])
+		goto next;
+
+	if (r->sdiag_family != AF_UNSPEC &&
+	    sk->sk_family != r->sdiag_family)
+		goto next;
+
+	if (r->id.idiag_sport != inet->inet_sport &&
+	    r->id.idiag_sport)
+		goto next;
+
+	if (r->id.idiag_dport != inet->inet_dport &&
+	    r->id.idiag_dport)
+		goto next;
+
+	if (inet_sctp_diag_fill(sk, NULL, skb, r,
+				sk_user_ns(NETLINK_CB(cb->skb).sk),
+				NETLINK_CB(cb->skb).portid,
+				cb->nlh->nlmsg_seq, NLM_F_MULTI,
+				cb->nlh) < 0) {
+		err = 2;
+		goto out;
+	}
+next:
+	cb->args[4]++;
+out:
+	return err;
+}
+
+/* define the functions for sctp_diag_handler*/
+static void sctp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
+			       void *info)
+{
+	struct sctp_infox *infox = (struct sctp_infox *)info;
+
+	if (infox->asoc) {
+		r->idiag_rqueue = atomic_read(&infox->asoc->rmem_alloc);
+		r->idiag_wqueue = infox->asoc->sndbuf_used;
+	} else {
+		r->idiag_rqueue = sk->sk_ack_backlog;
+		r->idiag_wqueue = sk->sk_max_ack_backlog;
+	}
+	if (infox->sctpinfo)
+		sctp_get_sctp_info(sk, infox->asoc, infox->sctpinfo);
+}
+
+static int sctp_diag_dump_one(struct sk_buff *in_skb,
+			      const struct nlmsghdr *nlh,
+			      const struct inet_diag_req_v2 *req)
+{
+	struct net *net = sock_net(in_skb->sk);
+	union sctp_addr laddr, paddr;
+	struct sctp_comm_param commp = {
+		.skb = in_skb,
+		.r = req,
+		.nlh = nlh,
+	};
+
+	if (req->sdiag_family == AF_INET) {
+		laddr.v4.sin_port = req->id.idiag_sport;
+		laddr.v4.sin_addr.s_addr = req->id.idiag_src[0];
+		laddr.v4.sin_family = AF_INET;
+
+		paddr.v4.sin_port = req->id.idiag_dport;
+		paddr.v4.sin_addr.s_addr = req->id.idiag_dst[0];
+		paddr.v4.sin_family = AF_INET;
+	} else {
+		laddr.v6.sin6_port = req->id.idiag_sport;
+		memcpy(&laddr.v6.sin6_addr, req->id.idiag_src, 64);
+		laddr.v6.sin6_family = AF_INET6;
+
+		paddr.v6.sin6_port = req->id.idiag_dport;
+		memcpy(&paddr.v6.sin6_addr, req->id.idiag_dst, 64);
+		paddr.v6.sin6_family = AF_INET6;
+	}
+
+	return sctp_transport_lookup_process(sctp_tsp_dump_one,
+					     net, &laddr, &paddr, &commp);
+}
+
+static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
+			   const struct inet_diag_req_v2 *r, struct nlattr *bc)
+{
+	u32 idiag_states = r->idiag_states;
+	struct net *net = sock_net(skb->sk);
+	struct sctp_comm_param commp = {
+		.skb = skb,
+		.cb = cb,
+		.r = r,
+	};
+
+	/* eps hashtable dumps
+	 * args:
+	 * 0 : if it will traversal listen sock
+	 * 1 : to record the sock pos of this time's traversal
+	 * 4 : to work as a temporary variable to traversal list
+	 */
+	if (cb->args[0] == 0) {
+		if (!(idiag_states & TCPF_LISTEN))
+			goto skip;
+		if (sctp_for_each_endpoint(sctp_ep_dump, &commp))
+			goto done;
+skip:
+		cb->args[0] = 1;
+		cb->args[1] = 0;
+		cb->args[4] = 0;
+	}
+
+	/* asocs by transport hashtable dump
+	 * args:
+	 * 1 : to record the assoc pos of this time's traversal
+	 * 2 : to record the transport pos of this time's traversal
+	 * 3 : to mark if we have dumped the ep info of the current asoc
+	 * 4 : to work as a temporary variable to traversal list
+	 */
+	if (!(idiag_states & ~TCPF_LISTEN))
+		goto done;
+	sctp_for_each_transport(sctp_tsp_dump, net, cb->args[2], &commp);
+done:
+	cb->args[1] = cb->args[4];
+	cb->args[4] = 0;
+}
+
+static const struct inet_diag_handler sctp_diag_handler = {
+	.dump		 = sctp_diag_dump,
+	.dump_one	 = sctp_diag_dump_one,
+	.idiag_get_info  = sctp_diag_get_info,
+	.idiag_type	 = IPPROTO_SCTP,
+	.idiag_info_size = sizeof(struct sctp_info),
+};
+
+static int __init sctp_diag_init(void)
+{
+	return inet_diag_register(&sctp_diag_handler);
+}
+
+static void __exit sctp_diag_exit(void)
+{
+	inet_diag_unregister(&sctp_diag_handler);
+}
+
+module_init(sctp_diag_init);
+module_exit(sctp_diag_exit);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_NETLINK, NETLINK_SOCK_DIAG, 2-132);
-- 
2.1.0

^ permalink raw reply related

* [PATCHv3 net-next 5/6] sctp: merge the seq_start/next/exits in remaddrs and assocs
From: Xin Long @ 2016-04-14  7:35 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem,
	eric.dumazet
In-Reply-To: <cover.1460618169.git.lucien.xin@gmail.com>

In sctp proc, these three functions in remaddrs and assocs are the
same. we should merge them into one.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/proc.c | 45 +++++++++------------------------------------
 1 file changed, 9 insertions(+), 36 deletions(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index dd8492f..9fe1393 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -282,7 +282,7 @@ struct sctp_ht_iter {
 	struct rhashtable_iter hti;
 };
 
-static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos)
+static void *sctp_transport_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	struct sctp_ht_iter *iter = seq->private;
 	int err = sctp_transport_walk_start(&iter->hti);
@@ -293,14 +293,14 @@ static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos)
 	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
 }
 
-static void sctp_assocs_seq_stop(struct seq_file *seq, void *v)
+static void sctp_transport_seq_stop(struct seq_file *seq, void *v)
 {
 	struct sctp_ht_iter *iter = seq->private;
 
 	sctp_transport_walk_stop(&iter->hti);
 }
 
-static void *sctp_assocs_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+static void *sctp_transport_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct sctp_ht_iter *iter = seq->private;
 
@@ -367,9 +367,9 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
 }
 
 static const struct seq_operations sctp_assoc_ops = {
-	.start = sctp_assocs_seq_start,
-	.next  = sctp_assocs_seq_next,
-	.stop  = sctp_assocs_seq_stop,
+	.start = sctp_transport_seq_start,
+	.next  = sctp_transport_seq_next,
+	.stop  = sctp_transport_seq_stop,
 	.show  = sctp_assocs_seq_show,
 };
 
@@ -406,33 +406,6 @@ void sctp_assocs_proc_exit(struct net *net)
 	remove_proc_entry("assocs", net->sctp.proc_net_sctp);
 }
 
-static void *sctp_remaddr_seq_start(struct seq_file *seq, loff_t *pos)
-{
-	struct sctp_ht_iter *iter = seq->private;
-	int err = sctp_transport_walk_start(&iter->hti);
-
-	if (err)
-		return ERR_PTR(err);
-
-	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
-}
-
-static void *sctp_remaddr_seq_next(struct seq_file *seq, void *v, loff_t *pos)
-{
-	struct sctp_ht_iter *iter = seq->private;
-
-	++*pos;
-
-	return sctp_transport_get_next(seq_file_net(seq), &iter->hti);
-}
-
-static void sctp_remaddr_seq_stop(struct seq_file *seq, void *v)
-{
-	struct sctp_ht_iter *iter = seq->private;
-
-	sctp_transport_walk_stop(&iter->hti);
-}
-
 static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
 {
 	struct sctp_association *assoc;
@@ -506,9 +479,9 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
 }
 
 static const struct seq_operations sctp_remaddr_ops = {
-	.start = sctp_remaddr_seq_start,
-	.next  = sctp_remaddr_seq_next,
-	.stop  = sctp_remaddr_seq_stop,
+	.start = sctp_transport_seq_start,
+	.next  = sctp_transport_seq_next,
+	.stop  = sctp_transport_seq_stop,
 	.show  = sctp_remaddr_seq_show,
 };
 
-- 
2.1.0

^ permalink raw reply related

* [PATCHv3 net-next 6/6] sctp: fix some rhashtable functions using in sctp proc/diag
From: Xin Long @ 2016-04-14  7:35 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem,
	eric.dumazet
In-Reply-To: <cover.1460618169.git.lucien.xin@gmail.com>

When rhashtable_walk_init return err, no release function should be
called, and when rhashtable_walk_start return err, we should only invoke
rhashtable_walk_exit to release the source.

But now when sctp_transport_walk_start return err, we just call
rhashtable_walk_stop/exit, and never care about if rhashtable_walk_init
or start return err, which is so bad.

We will fix it by calling rhashtable_walk_exit if rhashtable_walk_start
return err in sctp_transport_walk_start, and if sctp_transport_walk_start
return err, we do not need to call sctp_transport_walk_stop any more.

For sctp proc, we will use 'iter->start_fail' to decide if we will call
rhashtable_walk_stop/exit.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/proc.c   |  7 ++++++-
 net/sctp/socket.c | 15 ++++++++++-----
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 9fe1393..4cb5aed 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -280,6 +280,7 @@ void sctp_eps_proc_exit(struct net *net)
 struct sctp_ht_iter {
 	struct seq_net_private p;
 	struct rhashtable_iter hti;
+	int start_fail;
 };
 
 static void *sctp_transport_seq_start(struct seq_file *seq, loff_t *pos)
@@ -287,8 +288,10 @@ static void *sctp_transport_seq_start(struct seq_file *seq, loff_t *pos)
 	struct sctp_ht_iter *iter = seq->private;
 	int err = sctp_transport_walk_start(&iter->hti);
 
-	if (err)
+	if (err) {
+		iter->start_fail = 1;
 		return ERR_PTR(err);
+	}
 
 	return sctp_transport_get_idx(seq_file_net(seq), &iter->hti, *pos);
 }
@@ -297,6 +300,8 @@ static void sctp_transport_seq_stop(struct seq_file *seq, void *v)
 {
 	struct sctp_ht_iter *iter = seq->private;
 
+	if (iter->start_fail)
+		return;
 	sctp_transport_walk_stop(&iter->hti);
 }
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 6226c0b..b27a849 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4299,8 +4299,12 @@ int sctp_transport_walk_start(struct rhashtable_iter *iter)
 		return err;
 
 	err = rhashtable_walk_start(iter);
+	if (err && err != -EAGAIN) {
+		rhashtable_walk_exit(iter);
+		return err;
+	}
 
-	return err == -EAGAIN ? 0 : err;
+	return 0;
 }
 
 void sctp_transport_walk_stop(struct rhashtable_iter *iter)
@@ -4389,11 +4393,12 @@ EXPORT_SYMBOL_GPL(sctp_transport_lookup_process);
 int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
 			    struct net *net, int pos, void *p) {
 	struct rhashtable_iter hti;
-	int err = 0;
 	void *obj;
+	int err;
 
-	if (sctp_transport_walk_start(&hti))
-		goto out;
+	err = sctp_transport_walk_start(&hti);
+	if (err)
+		return err;
 
 	sctp_transport_get_idx(net, &hti, pos);
 	obj = sctp_transport_get_next(net, &hti);
@@ -4407,8 +4412,8 @@ int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
 		if (err)
 			break;
 	}
-out:
 	sctp_transport_walk_stop(&hti);
+
 	return err;
 }
 EXPORT_SYMBOL_GPL(sctp_for_each_transport);
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next] phy: make some bits preserved while setup forced mode
From: Weidong Wang @ 2016-04-14  7:43 UTC (permalink / raw)
  To: David Miller, f.fainelli; +Cc: netdev, linux-kernel, Sergei Shtylyov, andrew
In-Reply-To: <570E3488.9060403@huawei.com>

When tested the PHY SGMII Loopback:
1.set the LOOPBACK bit,
2.set the autoneg to AUTONEG_DISABLE, it calls the
genphy_setup_forced which will clear the bit.

The BMCR_LOOPBACK bit should be preserved.

As Florian pointed out that other bits should be preserved too.
So I make the BMCR_ISOLATE and BMCR_PDOWN as well.

Signed-off-by: Weidong Wang <wangweidong1@huawei.com>
---
the patch is evoluted from "phy: keep the BCMR_LOOPBACK bit
while setup forced mode".

---
 drivers/net/phy/phy_device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index d551df6..9b24d7e 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -818,8 +818,9 @@ static int genphy_config_advert(struct phy_device *phydev)
  */
 int genphy_setup_forced(struct phy_device *phydev)
 {
-	int ctl = 0;
+	int ctl = phy_read(phydev, MII_BMCR);

+	ctl &= BMCR_LOOPBACK | BMCR_ISOLATE | BMCR_PDOWN;
 	phydev->pause = 0;
 	phydev->asym_pause = 0;

-- 
2.7.0

^ permalink raw reply related

* RE: [PATCH nf] netfilter: ipv6: Orphan skbs in nf_ct_frag6_gather()
From: David Laight @ 2016-04-14  8:31 UTC (permalink / raw)
  To: 'Joe Stringer', netfilter-devel@vger.kernel.org
  Cc: fw@strlen.de, diproiettod@vmware.com, netdev@vger.kernel.org
In-Reply-To: <1460570974-44050-1-git-send-email-joe@ovn.org>

From: Joe Stringer
> Sent: 13 April 2016 19:10
> This is the IPv6 equivalent of commit 8282f27449bf ("inet: frag: Always
> orphan skbs inside ip_defrag()").
> 
> Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free
> clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
> cloned (implicitly orphaning) prior to queueing for reassembly. As such,
> when the IPv6 message is eventually reassembled, the skb->sk for all
> fragments would be NULL. After that commit was introduced, rather than
> cloning, the original skbs were queued directly without orphaning. The
> end result is that all frags except for the first and last may have a
> socket attached.

I'd have thought that the queued fragments would still want to be
resource-counted against the socket (I think that is what skb->sk is for).
So don't really want to orphan skb before queuing them, instead transfer
the entire resource count to the head skb when they are merged.

Although I can't imagine why IPv6 reassembly is happening on skb
associated with a socket.

	David

^ permalink raw reply

* Re: [PATCH nf] netfilter: ipv6: Orphan skbs in nf_ct_frag6_gather()
From: Florian Westphal @ 2016-04-14  8:40 UTC (permalink / raw)
  To: David Laight
  Cc: 'Joe Stringer', netfilter-devel@vger.kernel.org,
	fw@strlen.de, diproiettod@vmware.com, netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D5F4A226C@AcuExch.aculab.com>

David Laight <David.Laight@ACULAB.COM> wrote:
> From: Joe Stringer
> > Sent: 13 April 2016 19:10
> > This is the IPv6 equivalent of commit 8282f27449bf ("inet: frag: Always
> > orphan skbs inside ip_defrag()").
> > 
> > Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free
> > clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
> > cloned (implicitly orphaning) prior to queueing for reassembly. As such,
> > when the IPv6 message is eventually reassembled, the skb->sk for all
> > fragments would be NULL. After that commit was introduced, rather than
> > cloning, the original skbs were queued directly without orphaning. The
> > end result is that all frags except for the first and last may have a
> > socket attached.
> 
> I'd have thought that the queued fragments would still want to be
> resource-counted against the socket (I think that is what skb->sk is for).

No, ipv4/ipv6 reasm has its own accouting.

> Although I can't imagine why IPv6 reassembly is happening on skb
> associated with a socket.

Right, thats a much more interesting question -- both ipv4 and
ipv6 orphan skbs before NF_HOOK prerouting trip.

(That being said, I don't mind the patch, I'm just be curious how this
 can happen).

^ permalink raw reply

* RE: [PATCH] qlge: Replace create_singlethread_workqueue with alloc_ordered_workqueue
From: Manish Chopra @ 2016-04-14  7:25 UTC (permalink / raw)
  To: Tejun Heo, Amitoj Kaur Chawla
  Cc: Sudarsana Kalluru, netdev, linux-kernel, Dept-Eng Linux Driver,
	Harish Patil, Dept-GE Linux NIC Dev
In-Reply-To: <20160413174422.GB3676@htj.duckdns.org>

> -----Original Message-----
> From: dept_hsg_linux_nic_dev-bounces@qlclistserver.qlogic.com
> [mailto:dept_hsg_linux_nic_dev-bounces@qlclistserver.qlogic.com] On Behalf
> Of Tejun Heo
> Sent: Wednesday, April 13, 2016 11:14 PM
> To: Amitoj Kaur Chawla <amitoj1606@gmail.com>
> Cc: Sudarsana Kalluru <Sudarsana.Kalluru@qlogic.com>; netdev
> <netdev@vger.kernel.org>; linux-kernel <linux-kernel@vger.kernel.org>; Dept-
> Eng Linux Driver <Linux-Driver@qlogic.com>; Harish Patil
> <harish.patil@qlogic.com>; Dept-GE Linux NIC Dev <Dept-
> GELinuxNICDev@qlogic.com>
> Subject: Re: [PATCH] qlge: Replace create_singlethread_workqueue with
> alloc_ordered_workqueue
> 
> On Sat, Apr 09, 2016 at 05:27:45PM +0530, Amitoj Kaur Chawla wrote:
> > Replace deprecated create_singlethread_workqueue with
> > alloc_ordered_workqueue.
> >
> > Work items include getting tx/rx frame sizes, resetting MPI processor,
> > setting asic recovery bit so ordering seems necessary as only one work
> > item should be in queue/executing at any given time, hence the use of
> > alloc_ordered_workqueue.
> >
> > WQ_MEM_RECLAIM flag has been set since ethernet devices seem to sit in
> > memory reclaim path, so to guarantee forward progress regardless of
> > memory pressure.
> >
> > Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
> > Acked-by: Tejun Heo <tj@kernel.org>
> 
> Ping?
> 

Hi,
Just want to confirm that __WQ_LEGACY flag is not necessary here as this is removed
with this change ? 

Thanks
 

^ permalink raw reply

* Re: [PATCH RFC 2/2] tun: don't set a default qdisc
From: Michael S. Tsirkin @ 2016-04-14  9:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Paolo Abeni, netdev, David S. Miller, Hannes Frederic Sowa,
	Eric W. Biederman, Greg Kurz
In-Reply-To: <570F3D78.3070203@redhat.com>

On Thu, Apr 14, 2016 at 02:49:28PM +0800, Jason Wang wrote:
> 
> 
> On 04/13/2016 06:26 PM, Michael S. Tsirkin wrote:
> > On Wed, Apr 13, 2016 at 11:04:47AM +0200, Paolo Abeni wrote:
> >> This patch disables the default qdisc by explicitly setting the
> >> IFF_NO_QUEUE private flag so that now the tun xmit path do not
> >> require any lock by default.
> >>
> >> The default qdisc was first removed as a side effect of commit
> >> f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
> >> and recently restored with commit 016adb7260f4 ("tuntap: restore default qdisc")
> >>
> >> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > I wonder about this back and forth.
> > Jason, do you see a workload where the default qdisc
> > is preferable?
> 
> I don't know, but we used to behave like this so we'd better keep it.
> 
> An interesting thing is I vaguely remember that you have some concern
> when I propose IFF_NO_QUEUE for macvtap[1] :)
> 
> [1] https://lkml.org/lkml/2015/8/24/147

It's the same concern - that we aren't fully addressing
the problem, so if user configures a qdisc, we are back to square 1.
It's especially annoying that IIUC in this setup, if one
does configured a non default qdisc, there's no way to go back.
It doesn't necessarily mean we must not do it as an intermediate step,
though.

> 
> I think this could be done by management or more safe by introducing a
> new tun flag (TUN_NO_QUEUE).

What exactly does this flag do/mean?

> >
> >> ---
> >>  drivers/net/tun.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >> index 42992dc..0a0b63c 100644
> >> --- a/drivers/net/tun.c
> >> +++ b/drivers/net/tun.c
> >> @@ -1081,6 +1081,7 @@ static void tun_net_init(struct net_device *dev)
> >>  
> >>  		break;
> >>  	}
> >> +	dev->priv_flags |= IFF_NO_QUEUE;
> >>  }
> >>  
> >>  /* Character device part */
> >> -- 
> >> 1.8.3.1

^ permalink raw reply

* Re: [PATCH RFC 2/2] tun: don't set a default qdisc
From: Jason Wang @ 2016-04-14  9:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Abeni, netdev, David S. Miller, Hannes Frederic Sowa,
	Eric W. Biederman, Greg Kurz
In-Reply-To: <20160414090532.GA2185@redhat.com>



On 04/14/2016 05:05 PM, Michael S. Tsirkin wrote:
> On Thu, Apr 14, 2016 at 02:49:28PM +0800, Jason Wang wrote:
>> > 
>> > 
>> > On 04/13/2016 06:26 PM, Michael S. Tsirkin wrote:
>>> > > On Wed, Apr 13, 2016 at 11:04:47AM +0200, Paolo Abeni wrote:
>>>> > >> This patch disables the default qdisc by explicitly setting the
>>>> > >> IFF_NO_QUEUE private flag so that now the tun xmit path do not
>>>> > >> require any lock by default.
>>>> > >>
>>>> > >> The default qdisc was first removed as a side effect of commit
>>>> > >> f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
>>>> > >> and recently restored with commit 016adb7260f4 ("tuntap: restore default qdisc")
>>>> > >>
>>>> > >> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> > > I wonder about this back and forth.
>>> > > Jason, do you see a workload where the default qdisc
>>> > > is preferable?
>> > 
>> > I don't know, but we used to behave like this so we'd better keep it.
>> > 
>> > An interesting thing is I vaguely remember that you have some concern
>> > when I propose IFF_NO_QUEUE for macvtap[1] :)
>> > 
>> > [1] https://lkml.org/lkml/2015/8/24/147
> It's the same concern - that we aren't fully addressing
> the problem, so if user configures a qdisc, we are back to square 1.
> It's especially annoying that IIUC in this setup, if one
> does configured a non default qdisc, there's no way to go back.
> It doesn't necessarily mean we must not do it as an intermediate step,
> though.
>
>> > 
>> > I think this could be done by management or more safe by introducing a
>> > new tun flag (TUN_NO_QUEUE).
> What exactly does this flag do/mean?

It means we don't need qdisc for this tuntap, so we can set IFF_NO_QUEUE
flag.

^ permalink raw reply

* Re: [PATCH RFC 2/2] tun: don't set a default qdisc
From: Michael S. Tsirkin @ 2016-04-14  9:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Paolo Abeni, netdev, David S. Miller, Hannes Frederic Sowa,
	Eric W. Biederman, Greg Kurz
In-Reply-To: <570F5DE6.9000305@redhat.com>

On Thu, Apr 14, 2016 at 05:07:50PM +0800, Jason Wang wrote:
> 
> 
> On 04/14/2016 05:05 PM, Michael S. Tsirkin wrote:
> > On Thu, Apr 14, 2016 at 02:49:28PM +0800, Jason Wang wrote:
> >> > 
> >> > 
> >> > On 04/13/2016 06:26 PM, Michael S. Tsirkin wrote:
> >>> > > On Wed, Apr 13, 2016 at 11:04:47AM +0200, Paolo Abeni wrote:
> >>>> > >> This patch disables the default qdisc by explicitly setting the
> >>>> > >> IFF_NO_QUEUE private flag so that now the tun xmit path do not
> >>>> > >> require any lock by default.
> >>>> > >>
> >>>> > >> The default qdisc was first removed as a side effect of commit
> >>>> > >> f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
> >>>> > >> and recently restored with commit 016adb7260f4 ("tuntap: restore default qdisc")
> >>>> > >>
> >>>> > >> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >>> > > I wonder about this back and forth.
> >>> > > Jason, do you see a workload where the default qdisc
> >>> > > is preferable?
> >> > 
> >> > I don't know, but we used to behave like this so we'd better keep it.
> >> > 
> >> > An interesting thing is I vaguely remember that you have some concern
> >> > when I propose IFF_NO_QUEUE for macvtap[1] :)
> >> > 
> >> > [1] https://lkml.org/lkml/2015/8/24/147
> > It's the same concern - that we aren't fully addressing
> > the problem, so if user configures a qdisc, we are back to square 1.
> > It's especially annoying that IIUC in this setup, if one
> > does configured a non default qdisc, there's no way to go back.
> > It doesn't necessarily mean we must not do it as an intermediate step,
> > though.
> >
> >> > 
> >> > I think this could be done by management or more safe by introducing a
> >> > new tun flag (TUN_NO_QUEUE).
> > What exactly does this flag do/mean?
> 
> It means we don't need qdisc for this tuntap, so we can set IFF_NO_QUEUE
> flag.

But what does it mean for the user? When to set it and when not to set
it?

-- 
MST

^ permalink raw reply

* Re: [PATCHv3 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Phil Sutter @ 2016-04-14  9:13 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem, eric.dumazet
In-Reply-To: <cd6bf9c2696f125a74491e1f82b77a30bb1005dd.1460618169.git.lucien.xin@gmail.com>

Hi Xin Long,

On Thu, Apr 14, 2016 at 03:35:30PM +0800, Xin Long wrote:
> sctp_diag will dump some important details of sctp's assoc or ep, we use
> sctp_info to describe them,  sctp_get_sctp_info to get them, and export
> it to sctp_diag.ko.
> 
> v2->v3:
> - we will not use list_for_each_safe in sctp_get_sctp_info, cause
>   all the callers of it will use lock_sock.
> 
> - fix the holes in struct sctp_info with __reserved* field.
>   because sctp_diag is a new feature, and sctp_info is just for now,
>   it may be changed in the future.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

You're mixing commit message and changelog here (and in the following
patches, too). In order to not make the changelog part of the commit
message, you should separate it with three dashes, just like
git-format-patch does with the diff stat. So please reformat the patches
to meet this pattern:

------------------8<--------------------
Commit message here

Signed-off-by
---
Changelog here
---
diff stat added by Git

diff --git a/foo/bar.c b/foo/bar.c
...
------------------8<--------------------

If you think some of the changelog's info should be preserved, better
integrate it into the message itself.

Cheers, Phil

^ permalink raw reply

* Re: [PATCH RFC 2/2] tun: don't set a default qdisc
From: Jason Wang @ 2016-04-14  9:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Abeni, netdev, David S. Miller, Hannes Frederic Sowa,
	Eric W. Biederman, Greg Kurz
In-Reply-To: <20160414120826-mutt-send-email-mst@redhat.com>



On 04/14/2016 05:10 PM, Michael S. Tsirkin wrote:
> On Thu, Apr 14, 2016 at 05:07:50PM +0800, Jason Wang wrote:
>>
>> On 04/14/2016 05:05 PM, Michael S. Tsirkin wrote:
>>> On Thu, Apr 14, 2016 at 02:49:28PM +0800, Jason Wang wrote:
>>>>>
>>>>> On 04/13/2016 06:26 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Apr 13, 2016 at 11:04:47AM +0200, Paolo Abeni wrote:
>>>>>>>>> This patch disables the default qdisc by explicitly setting the
>>>>>>>>> IFF_NO_QUEUE private flag so that now the tun xmit path do not
>>>>>>>>> require any lock by default.
>>>>>>>>>
>>>>>>>>> The default qdisc was first removed as a side effect of commit
>>>>>>>>> f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
>>>>>>>>> and recently restored with commit 016adb7260f4 ("tuntap: restore default qdisc")
>>>>>>>>>
>>>>>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>>>>> I wonder about this back and forth.
>>>>>>> Jason, do you see a workload where the default qdisc
>>>>>>> is preferable?
>>>>> I don't know, but we used to behave like this so we'd better keep it.
>>>>>
>>>>> An interesting thing is I vaguely remember that you have some concern
>>>>> when I propose IFF_NO_QUEUE for macvtap[1] :)
>>>>>
>>>>> [1] https://lkml.org/lkml/2015/8/24/147
>>> It's the same concern - that we aren't fully addressing
>>> the problem, so if user configures a qdisc, we are back to square 1.
>>> It's especially annoying that IIUC in this setup, if one
>>> does configured a non default qdisc, there's no way to go back.
>>> It doesn't necessarily mean we must not do it as an intermediate step,
>>> though.
>>>
>>>>> I think this could be done by management or more safe by introducing a
>>>>> new tun flag (TUN_NO_QUEUE).
>>> What exactly does this flag do/mean?
>> It means we don't need qdisc for this tuntap, so we can set IFF_NO_QUEUE
>> flag.
> But what does it mean for the user? When to set it and when not to set
> it?

It was used for user that don't want qdisc (e.g for the user that only
cares about performance).

^ permalink raw reply

* Re: [PATCH RFC 2/2] tun: don't set a default qdisc
From: Michael S. Tsirkin @ 2016-04-14 10:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: Paolo Abeni, netdev, David S. Miller, Hannes Frederic Sowa,
	Eric W. Biederman, Greg Kurz
In-Reply-To: <570F6124.6060107@redhat.com>

On Thu, Apr 14, 2016 at 05:21:40PM +0800, Jason Wang wrote:
> 
> 
> On 04/14/2016 05:10 PM, Michael S. Tsirkin wrote:
> > On Thu, Apr 14, 2016 at 05:07:50PM +0800, Jason Wang wrote:
> >>
> >> On 04/14/2016 05:05 PM, Michael S. Tsirkin wrote:
> >>> On Thu, Apr 14, 2016 at 02:49:28PM +0800, Jason Wang wrote:
> >>>>>
> >>>>> On 04/13/2016 06:26 PM, Michael S. Tsirkin wrote:
> >>>>>>> On Wed, Apr 13, 2016 at 11:04:47AM +0200, Paolo Abeni wrote:
> >>>>>>>>> This patch disables the default qdisc by explicitly setting the
> >>>>>>>>> IFF_NO_QUEUE private flag so that now the tun xmit path do not
> >>>>>>>>> require any lock by default.
> >>>>>>>>>
> >>>>>>>>> The default qdisc was first removed as a side effect of commit
> >>>>>>>>> f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
> >>>>>>>>> and recently restored with commit 016adb7260f4 ("tuntap: restore default qdisc")
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> >>>>>>> I wonder about this back and forth.
> >>>>>>> Jason, do you see a workload where the default qdisc
> >>>>>>> is preferable?
> >>>>> I don't know, but we used to behave like this so we'd better keep it.
> >>>>>
> >>>>> An interesting thing is I vaguely remember that you have some concern
> >>>>> when I propose IFF_NO_QUEUE for macvtap[1] :)
> >>>>>
> >>>>> [1] https://lkml.org/lkml/2015/8/24/147
> >>> It's the same concern - that we aren't fully addressing
> >>> the problem, so if user configures a qdisc, we are back to square 1.
> >>> It's especially annoying that IIUC in this setup, if one
> >>> does configured a non default qdisc, there's no way to go back.
> >>> It doesn't necessarily mean we must not do it as an intermediate step,
> >>> though.
> >>>
> >>>>> I think this could be done by management or more safe by introducing a
> >>>>> new tun flag (TUN_NO_QUEUE).
> >>> What exactly does this flag do/mean?
> >> It means we don't need qdisc for this tuntap, so we can set IFF_NO_QUEUE
> >> flag.
> > But what does it mean for the user? When to set it and when not to set
> > it?
> 
> It was used for user that don't want qdisc (e.g for the user that only
> cares about performance).

However
- for tun, how is a fifo qdisc functionally different?
- it doesn't prevent configuring a qdisc, does it?

-- 
MST

^ permalink raw reply

* Re: [PATCH RFC 2/2] tun: don't set a default qdisc
From: Hannes Frederic Sowa @ 2016-04-14 10:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Paolo Abeni, netdev, David S. Miller, Eric W. Biederman,
	Greg Kurz
In-Reply-To: <20160414125822-mutt-send-email-mst@redhat.com>

On 14.04.2016 12:01, Michael S. Tsirkin wrote:
> On Thu, Apr 14, 2016 at 05:21:40PM +0800, Jason Wang wrote:
>>
>>
>> On 04/14/2016 05:10 PM, Michael S. Tsirkin wrote:
>>> On Thu, Apr 14, 2016 at 05:07:50PM +0800, Jason Wang wrote:
>>>>
>>>> On 04/14/2016 05:05 PM, Michael S. Tsirkin wrote:
>>>>> On Thu, Apr 14, 2016 at 02:49:28PM +0800, Jason Wang wrote:
>>>>>>>
>>>>>>> On 04/13/2016 06:26 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Apr 13, 2016 at 11:04:47AM +0200, Paolo Abeni wrote:
>>>>>>>>>>> This patch disables the default qdisc by explicitly setting the
>>>>>>>>>>> IFF_NO_QUEUE private flag so that now the tun xmit path do not
>>>>>>>>>>> require any lock by default.
>>>>>>>>>>>
>>>>>>>>>>> The default qdisc was first removed as a side effect of commit
>>>>>>>>>>> f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
>>>>>>>>>>> and recently restored with commit 016adb7260f4 ("tuntap: restore default qdisc")
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>>>>>>> I wonder about this back and forth.
>>>>>>>>> Jason, do you see a workload where the default qdisc
>>>>>>>>> is preferable?
>>>>>>> I don't know, but we used to behave like this so we'd better keep it.
>>>>>>>
>>>>>>> An interesting thing is I vaguely remember that you have some concern
>>>>>>> when I propose IFF_NO_QUEUE for macvtap[1] :)
>>>>>>>
>>>>>>> [1] https://lkml.org/lkml/2015/8/24/147
>>>>> It's the same concern - that we aren't fully addressing
>>>>> the problem, so if user configures a qdisc, we are back to square 1.
>>>>> It's especially annoying that IIUC in this setup, if one
>>>>> does configured a non default qdisc, there's no way to go back.
>>>>> It doesn't necessarily mean we must not do it as an intermediate step,
>>>>> though.
>>>>>
>>>>>>> I think this could be done by management or more safe by introducing a
>>>>>>> new tun flag (TUN_NO_QUEUE).
>>>>> What exactly does this flag do/mean?
>>>> It means we don't need qdisc for this tuntap, so we can set IFF_NO_QUEUE
>>>> flag.
>>> But what does it mean for the user? When to set it and when not to set
>>> it?
>>
>> It was used for user that don't want qdisc (e.g for the user that only
>> cares about performance).
>
> However
> - for tun, how is a fifo qdisc functionally different?

pfifo wouldn't matter match, but the pfifo_qdisc classifies packets into 
different bands and does alter the way ip packets are transmitted 
outside. With the LLTX change I don't see how in any case a queue should 
build up, so I don't really see this as a problem.

The only reason why this change could do harm is scripting and API 
compatibility, e.g. tools expect after creation a qdisc to be present to 
modify its behavior.

> - it doesn't prevent configuring a qdisc, does it?

Correct, you can still add qdiscs later on and switch back to no_queue, 
also. We are just talking about changing the default.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
From: Holger Schurig @ 2016-04-14 10:13 UTC (permalink / raw)
  To: Troy Kisky
  Cc: netdev, davem, fugang.duan, lznuaa, andrew, stillcompiling, arnd,
	sergei.shtylyov, gerg, fabio.estevam, johannes, l.stach,
	linux-arm-kernel, tremyfr
In-Reply-To: <1459909562-22865-1-git-send-email-troy.kisky@boundarydevices.com>

Do you guys that work with the FEC driver ever run with
CONFIG_DMA_API_DEBUG enabled?

I ask this Because I get this error when it's turned on when I do some
"rsync" transfer to my device:

[   58.420980] ------------[ cut here ]------------
[   58.425667] WARNING: CPU: 0 PID: 377 at /home/schurig/d/mkarm/linux-4.5/lib/dma-debug.c:1096 check_unmap+0x9d0/0xab8()
[   58.436405] fec 2188000.ethernet: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=66 bytes]
[   58.450248] Modules linked in: bnep usbhid imx_sdma flexcan btusb btrtl btbcm btintel smsc95xx usbnet mii bluetooth
[   58.460882] CPU: 0 PID: 377 Comm: sshd Tainted: G        W       4.5.1 #3
[   58.467671] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   58.474199] Backtrace: 
[   58.476675] [<c0012a24>] (dump_backtrace) from [<c0012c20>] (show_stack+0x18/0x1c)
[   58.484244]  r6:60000113 r5:c05a96c0 r4:00000000 r3:00000000
[   58.489964] [<c0012c08>] (show_stack) from [<c01dbc4c>] (dump_stack+0x9c/0xb0)
[   58.497197] [<c01dbbb0>] (dump_stack) from [<c001f558>] (warn_slowpath_common+0x8c/0xbc)
[   58.505286]  r6:c01f9c74 r5:00000009 r4:ee9f17f8 r3:c0596da4
[   58.511002] [<c001f4cc>] (warn_slowpath_common) from [<c001f5c0>] (warn_slowpath_fmt+0x38/0x40)
[   58.519698]  r8:00000042 r7:00000001 r6:00000000 r5:00000000 r4:c050c020
[   58.526470] [<c001f58c>] (warn_slowpath_fmt) from [<c01f9c74>] (check_unmap+0x9d0/0xab8)
[   58.534559]  r3:c0520e6c r2:c050c020
[   58.538159]  r4:00000000
[   58.540710] [<c01f92a4>] (check_unmap) from [<c01f9de0>] (debug_dma_unmap_page+0x84/0x8c)
[   58.548886]  r10:ef2ec000 r9:f09e5fa0 r8:ef0ef810 r7:00000001 r6:00000000 r5:00000042
[   58.556780]  r4:00000001
[   58.559336] [<c01f9d5c>] (debug_dma_unmap_page) from [<c02cdf00>] (fec_txq+0x140/0x31c)
[   58.567338]  r8:ef0ef810 r7:00000000 r6:00000000 r5:00000000 r4:ef2c6000
[   58.574108] [<c02cddc0>] (fec_txq) from [<c02ce2f4>] (fec_enet_napi_q1+0x98/0xe8)
[   58.581589]  r10:08000000 r9:ef2ec580 r8:00000000 r7:00000040 r6:00000000 r5:ef2ec000
[   58.589483]  r4:0c008000
[   58.592042] [<c02ce25c>] (fec_enet_napi_q1) from [<c038b3d8>] (net_rx_action+0x1fc/0x2f0)
[   58.600218]  r10:ee9f19c0 r9:00000040 r8:c059e100 r7:0000012c r6:ffffa1a3 r5:c02ce25c
[   58.608112]  r4:ef2ec580 r3:ee9f19c0
[   58.611720] [<c038b1dc>] (net_rx_action) from [<c00224c4>] (__do_softirq+0x134/0x254)
[   58.619549]  r10:c059e080 r9:40000003 r8:00000100 r7:ee9f0000 r6:c059e08c r5:00000003
[   58.627445]  r4:00000000
[   58.629995] [<c0022390>] (__do_softirq) from [<c00228a8>] (irq_exit+0xb8/0x120)
[   58.637303]  r10:ee9f1e38 r9:f4001100 r8:ef008000 r7:00000001 r6:00000000 r5:00000000
[   58.645197]  r4:c05970b8
[   58.647754] [<c00227f0>] (irq_exit) from [<c0061340>] (__handle_domain_irq+0x68/0xbc)
[   58.655583]  r4:c05970b8 r3:c0064e24
[   58.659190] [<c00612d8>] (__handle_domain_irq) from [<c00093f8>] (gic_handle_irq+0x50/0x90)
[   58.667539]  r8:f4000100 r7:ee9f1ac8 r6:f400010c r5:c059e7a0 r4:c05a9788 r3:ee9f1ac8
[   58.675350] [<c00093a8>] (gic_handle_irq) from [<c0013740>] (__irq_svc+0x40/0x54)
[   58.682833] Exception stack(0xee9f1ac8 to 0xee9f1b10)
[   58.687887] 1ac0:                   00000000 ee9c0d4c 0000000c 00000000 00000000 00000000
[   58.696067] 1ae0: ee9f1e38 ee9f1e3c ee9f1e40 edc6ac00 ee9f1e38 ee9f1e1c 00000000 ee9f1b18
[   58.704245] 1b00: ee9c0d4c c00df648 60000013 ffffffff
[   58.709295]  r9:edc6ac00 r8:ee9f1e40 r7:ee9f1afc r6:ffffffff r5:60000013 r4:c00df648
[   58.717112] [<c00df544>] (do_select) from [<c00dfcb8>] (core_sys_select+0x144/0x320)
[   58.724854]  r10:ee9f1e38 r9:ee9f1e38 r8:0000000c r7:805af838 r6:00000000 r5:805af848
[   58.732749]  r4:00000004
[   58.735300] [<c00dfb74>] (core_sys_select) from [<c00dff68>] (SyS_select+0xd4/0x120)
[   58.743042]  r10:00000000 r9:0000000c r8:805af848 r7:805af838 r6:00000000 r5:ee9f1f70
[   58.750936]  r4:00000000
[   58.753488] [<c00dfe94>] (SyS_select) from [<c000f820>] (ret_fast_syscall+0x0/0x34)
[   58.761143]  r9:ee9f0000 r8:c000f9c4 r7:0000008e r6:00000000 r5:7f5f77c0 r4:00000000
[   58.768984] ---[ end trace cb88537fdc8fa202 ]---

The amount of data transferred isn't even huge:

sent 382,979 bytes  received 28,086 bytes  32,885.20 bytes/sec
total size is 147,758,955  speedup is 359.45



This happens with:

* Kernel 4.5
* Kernel 4.5.1
* Kernel 4.5.1 with the fec-related patches from 4.6-rc3
* Kernel 4.5.1 with the fec-related patches from 4.6-rc3 and Troy's
  patch series from this thread


Should I post an extra e-mail with "BUG" in the subject?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox