Netdev List
 help / color / mirror / Atom feed
* Re: resurrecting tcphealth
From: Piotr Sawuk @ 2012-07-21 10:34 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel
In-Reply-To: <CAK6E8=dKZUJ3evPDGc3gP0a5bsBbYnL0NGhPZenB=T_t+5Kx5w@mail.gmail.com>

On Fr, 20.07.2012, 16:06, Yuchung Cheng wrote:
> On Mon, Jul 16, 2012 at 6:03 AM, Piotr Sawuk <a9702387@unet.univie.ac.at>
> wrote:
>> On Mo, 16.07.2012, 13:46, Eric Dumazet wrote:
>>> On Mon, 2012-07-16 at 13:33 +0200, Piotr Sawuk wrote:
>>>> On Sa, 14.07.2012, 01:55, Stephen Hemminger wrote:
>>>> > I am not sure if the is really necessary since the most
>>>> > of the stats are available elsewhere.
>>>>
>>>> if by "most" you mean address and port then you're right.
>>>> but even the rtt reported by "ss -i" seems to differ from tcphealth.
>>>
>>> Thats because tcphealth is wrong, it assumes HZ=1000 ?
>>>
>>> tp->srtt unit is jiffies, not ms.
>>
>> thanks. any conversion-functions in the kernel for that?
>>>
>>> tcphealth is a gross hack.
>>
>> what would you do if you tried making it less gross?
>>
>> I've not found any similar functionality, in the kernel.
>> I want to know an estimate for the percentage of data lost in tcp.
>> and I want to know that without actually sending much packets.
>> afterall I'm on the receiving end most of the time.
>> percentage of duplicate packets received is nice too.
>> you have any suggestions?
>
> counting dupack may not be as reliable as you'd like.
> say the remote sends you 100 packets and only the first one is lost,
> you'll see 99 dupacks. Morover any small degree reordering (<3)
> will generate substantial dupacks but the network is perfectly fine

I understand that.
but you must consider the difference between network-health and tcp-health.
network-health on my end I can see by looking at wlan-signal strength.
slow downloads can have many causes, some loose cable is only one possibility.

for example I once played a lan-game, 2 computers connected directly.
however, one computer was 10 times slower than the other.
so when the slow computer would act as server, the game would never start.
the reason wasn't bad connection, it was packet-loss caused by slowness.
and it had to do with the protocol being used (i.e. not tcp).

so when in tcp I get high percentage of dupack I see something's wrong.
not necessarily with the physical connection, but with protocol-handling.
the paper showed dupacks indicate spikes in network-usage.
and as we all know the bottleneck isn't the cable, it's data-processing.
when there is a spike, lots of users connecting, network itself is ok.
however, reordering and lost packets indicate something's up with the server.
and that's actually the info I'm after.

if I were the net-admin I would be interested in network-health too.
bad connection indicated by packet-loss itself means I've got to check cables.
but a user might have much wider area of interest.
the user can't do anything about the cables, but yet is interested in them.
i.e. useless info for the net-admin could be interesting for the user.
that's why I do not recommend tcphealth for servers, useless overhead.

so, if you want to judge usefulness of this patch, consider the situation:
you are powerless but interested in responsiveness of thousands of servers.
you want to learn how those servers behave at different times of a day.
isn't dupacks and dup-packets the best info on that you can possibly get?

> (see Craig Patridge's "reordering is not pathological" paper).

thanks, will look into that.

> unfortunately receiver can't and does not have to distinguish loss

true, not needed for the protocol.
on a higher level it sill can be interesting though.
most of the work for preventing packetloss must be done by the sender.
but as I said before, the receiver can do something too: avoid traffic-jams!
in a network many things are predictable, can be reprogrammed.
this way a network could become more efficient as a whole.
that's what spikes my interest in tcphealth, thinking more globally.

>  or reordering. you can infer that but it should not be kernel's job.

that's why I made it an option as opposed to what the original author did.
theoretically it should be possible to get the same functionality without it.
just read the raw network-data and emulate the work of tcp and tcphealth.
but that definitely would add a big overhead as tcp-handling is duplicated.

> there are public tools that inspect tcpdump traces to do that

good example. so to figure out dupacks you filter out the acks.
and you must somehow compare them, or you parse them the way the kernel does.
in the latter case you'll have to recreate the kernel's internal data.
definitely faster, but could result in duplicate code, that requires space.

also you should consider that not all users have privilegues for tcpdump.
and if they had, it would add another security-risk to their computer.
and you'd have to consider multiple users on one computer, using that service.
I can imagine a daemon in the background doing what tcphealth does.
that's the alternative, allows for more fine-grained security.
it could disallow spying on what connections other users have.
(of course then you'd need to remove /proc/net/tcp output too.)
but imagine the nightmare of keeping that daemon secure.
afterall it must be privilegued to read all network data.

if the kernel would provide what I'm looking for, this daemon could still run.
but then it wouldn't need that risky privilegues, could focus on other stuff.
the task of preventing users from seeing eachothers connections is enough...
>
> exposing duplicate packets received can be done via getsockopt(TCP_INFO)
> although I don't know what that gives you. the remote can be too
> aggressive in retransmission (not just because of a bad RTO) or
> the network RTT fluctuates.

TCP_INFO contains only duplicate packets *sent* (retransmits), not received!
am I missing something? can you give a code-example that can obtain such info?
if running that code in userspace results in same values as tcphealth...
well, actually dupacks is more interesting than dup-packets.
afterall in usual usage the latter will always be zero.
>
> I don't what if tracking last_ack_sent (the latest RCV.NXT) without
> knowing the ISN is useful.

so you suggest I should store and compare ISN too, for accuracy?
you think the gain in accuracy justifies the added overhead?
>
> btw the term project paper cited concludes SACK is not useful is simply
> wrong. This makes me suspicious about how rigorous and thoughtful of
> its design.

isn't my paper.
but I'd guess the usefulness of SACK is only doubted from pov of users.
remember, users connect to many servers.
if a server behaves badly, choose another one.
servers do not have such a choice, for them SACK naturally is important.

a server would just need to look at TCP_INFO to see how useful SACK is.
so I would conclude the author was quite thoughtful about users' pov.
(and quite ignorant about the servers.)

no matter how little knowledge the authors have, tcphealth is interesting.
maybe it was a random discovery by sheer luck.
the correlation between the data it provides and reality is compelling.
if we'd judge inventions by the stupidity of their inventors...

^ permalink raw reply

* Re: [PATCH] b44: add 64 bit stats
From: Julian Anastasov @ 2012-07-21 10:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kevin Groeneveld, netdev, Simon Horman, Wensong Zhang, lvs-devel
In-Reply-To: <1342847376.2626.8162.camel@edumazet-glaptop>


	Hello,

On Sat, 21 Jul 2012, Eric Dumazet wrote:

> The writer sides might be run concurrently by several cpus, so
> u64_stats_update_begin(&sstats->syncp); are racy : a reader can
> be trapped forever.
> 
> > net/netfilter/ipvs/ip_vs_est.c
> > 
> 
> Same problem for this one, I think.
> 
> I CCed ipvs maintainers so that they can take a look.

	IPVS moved to percpu counters, i.e. even on 32-bit SMP
we do not use locks to protect the seqcounter:

commit b17fc9963f837ef1acfe36e193108fb16ed58647
Author: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date:   Mon Jan 3 14:44:56 2011 +0100

    IPVS: netns, ip_vs_stats and its procfs

> > Do these need to be updated as well?  Looking at these files quickly
> > and with my limited knowledge of the kernel I am not sure if they
> > update the stats in a BH context or not.

	We have 2 kinds of readers:

- timer context (ip_vs_est.c): no _bh is used for fetch
- user context (ip_vs_ctl.c): _bh is used for fetch

> > Kevin
> 
> Thanks !

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH] Crash in tun
From: Nicholas A. Bellinger @ 2012-07-21  9:53 UTC (permalink / raw)
  To: Al Viro
  Cc: David Miller, mikulas, eric.dumazet, maxk, vtun, netdev,
	linux-sctp, Andy Grover, Hannes Reinecke, Christoph Hellwig,
	Mike Christie, target-devel
In-Reply-To: <20120721075518.GY31729@ZenIV.linux.org.uk>

On Sat, 2012-07-21 at 08:55 +0100, Al Viro wrote:
> 	BTW, speaking of struct file treatment related to sockets -
> there's this piece of code in iscsi:
>         /*
>          * The SCTP stack needs struct socket->file.
>          */
>         if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
>             (np->np_network_transport == ISCSI_SCTP_UDP)) {
>                 if (!new_sock->file) {
>                         new_sock->file = kzalloc(
>                                         sizeof(struct file), GFP_KERNEL);
> 
> For one thing, as far as I can see it'not true - sctp does *not* depend on
> socket->file being non-NULL; it does, in one place, check socket->file->f_flags
> for O_NONBLOCK, but there it treats NULL socket->file as "flag not set".
> Which is the case here anyway - the fake struct file created in
> __iscsi_target_login_thread() (and in iscsi_target_setup_login_socket(), with
> the same excuse) do *not* get that flag set.
> 
> Moreover, it's a bloody serious violation of a bunch of asserts in VFS;
> all struct file instances should come from filp_cachep, via get_empty_filp()
> (or alloc_file(), which is a wrapper for it).  FWIW, I'm very tempted to
> do this and be done with the entire mess:
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

<nod>  Merged into target-pending.git/for-next.

For the record, this logic was originally required in order to get non
multi-homed SCTP endpoints with iscsi_target_mod to connect using
Core-iSCSI/SCTP initiators to Linux/SCTP, but it's obviously incorrect
for modern code.

Since we don't have iscsi_sctp for Open/iSCSI code implemented just yet
(mnc CC'ed), adding CC' to drop this incorrect code for stable.

Thank you,

--nab

> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> index d57d10c..d7dcd67 100644
> --- a/drivers/target/iscsi/iscsi_target.c
> +++ b/drivers/target/iscsi/iscsi_target.c
> @@ -429,19 +429,8 @@ int iscsit_reset_np_thread(
>  
>  int iscsit_del_np_comm(struct iscsi_np *np)
>  {
> -	if (!np->np_socket)
> -		return 0;
> -
> -	/*
> -	 * Some network transports allocate their own struct sock->file,
> -	 * see  if we need to free any additional allocated resources.
> -	 */
> -	if (np->np_flags & NPF_SCTP_STRUCT_FILE) {
> -		kfree(np->np_socket->file);
> -		np->np_socket->file = NULL;
> -	}
> -
> -	sock_release(np->np_socket);
> +	if (np->np_socket)
> +		sock_release(np->np_socket);
>  	return 0;
>  }
>  
> @@ -4077,13 +4066,8 @@ int iscsit_close_connection(
>  	kfree(conn->conn_ops);
>  	conn->conn_ops = NULL;
>  
> -	if (conn->sock) {
> -		if (conn->conn_flags & CONNFLAG_SCTP_STRUCT_FILE) {
> -			kfree(conn->sock->file);
> -			conn->sock->file = NULL;
> -		}
> +	if (conn->sock)
>  		sock_release(conn->sock);
> -	}
>  	conn->thread_set = NULL;
>  
>  	pr_debug("Moving to TARG_CONN_STATE_FREE.\n");
> diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h
> index 1c70144..1dd5716 100644
> --- a/drivers/target/iscsi/iscsi_target_core.h
> +++ b/drivers/target/iscsi/iscsi_target_core.h
> @@ -224,7 +224,6 @@ enum iscsi_timer_flags_table {
>  /* Used for struct iscsi_np->np_flags */
>  enum np_flags_table {
>  	NPF_IP_NETWORK		= 0x00,
> -	NPF_SCTP_STRUCT_FILE	= 0x01 /* Bugfix */
>  };
>  
>  /* Used for struct iscsi_np->np_thread_state */
> @@ -503,7 +502,6 @@ struct iscsi_conn {
>  	u16			local_port;
>  	int			net_size;
>  	u32			auth_id;
> -#define CONNFLAG_SCTP_STRUCT_FILE			0x01
>  	u32			conn_flags;
>  	/* Used for iscsi_tx_login_rsp() */
>  	u32			login_itt;
> diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
> index a3656c9..ae30424 100644
> --- a/drivers/target/iscsi/iscsi_target_login.c
> +++ b/drivers/target/iscsi/iscsi_target_login.c
> @@ -795,22 +795,6 @@ int iscsi_target_setup_login_socket(
>  	}
>  	np->np_socket = sock;
>  	/*
> -	 * The SCTP stack needs struct socket->file.
> -	 */
> -	if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
> -	    (np->np_network_transport == ISCSI_SCTP_UDP)) {
> -		if (!sock->file) {
> -			sock->file = kzalloc(sizeof(struct file), GFP_KERNEL);
> -			if (!sock->file) {
> -				pr_err("Unable to allocate struct"
> -						" file for SCTP\n");
> -				ret = -ENOMEM;
> -				goto fail;
> -			}
> -			np->np_flags |= NPF_SCTP_STRUCT_FILE;
> -		}
> -	}
> -	/*
>  	 * Setup the np->np_sockaddr from the passed sockaddr setup
>  	 * in iscsi_target_configfs.c code..
>  	 */
> @@ -869,21 +853,15 @@ int iscsi_target_setup_login_socket(
>  
>  fail:
>  	np->np_socket = NULL;
> -	if (sock) {
> -		if (np->np_flags & NPF_SCTP_STRUCT_FILE) {
> -			kfree(sock->file);
> -			sock->file = NULL;
> -		}
> -
> +	if (sock)
>  		sock_release(sock);
> -	}
>  	return ret;
>  }
>  
>  static int __iscsi_target_login_thread(struct iscsi_np *np)
>  {
>  	u8 buffer[ISCSI_HDR_LEN], iscsi_opcode, zero_tsih = 0;
> -	int err, ret = 0, set_sctp_conn_flag, stop;
> +	int err, ret = 0, stop;
>  	struct iscsi_conn *conn = NULL;
>  	struct iscsi_login *login;
>  	struct iscsi_portal_group *tpg = NULL;
> @@ -894,7 +872,6 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
>  	struct sockaddr_in6 sock_in6;
>  
>  	flush_signals(current);
> -	set_sctp_conn_flag = 0;
>  	sock = np->np_socket;
>  
>  	spin_lock_bh(&np->np_thread_lock);
> @@ -917,35 +894,12 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
>  		spin_unlock_bh(&np->np_thread_lock);
>  		goto out;
>  	}
> -	/*
> -	 * The SCTP stack needs struct socket->file.
> -	 */
> -	if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
> -	    (np->np_network_transport == ISCSI_SCTP_UDP)) {
> -		if (!new_sock->file) {
> -			new_sock->file = kzalloc(
> -					sizeof(struct file), GFP_KERNEL);
> -			if (!new_sock->file) {
> -				pr_err("Unable to allocate struct"
> -						" file for SCTP\n");
> -				sock_release(new_sock);
> -				/* Get another socket */
> -				return 1;
> -			}
> -			set_sctp_conn_flag = 1;
> -		}
> -	}
> -
>  	iscsi_start_login_thread_timer(np);
>  
>  	conn = kzalloc(sizeof(struct iscsi_conn), GFP_KERNEL);
>  	if (!conn) {
>  		pr_err("Could not allocate memory for"
>  			" new connection\n");
> -		if (set_sctp_conn_flag) {
> -			kfree(new_sock->file);
> -			new_sock->file = NULL;
> -		}
>  		sock_release(new_sock);
>  		/* Get another socket */
>  		return 1;
> @@ -955,9 +909,6 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
>  	conn->conn_state = TARG_CONN_STATE_FREE;
>  	conn->sock = new_sock;
>  
> -	if (set_sctp_conn_flag)
> -		conn->conn_flags |= CONNFLAG_SCTP_STRUCT_FILE;
> -
>  	pr_debug("Moving to TARG_CONN_STATE_XPT_UP.\n");
>  	conn->conn_state = TARG_CONN_STATE_XPT_UP;
>  
> @@ -1205,13 +1156,8 @@ old_sess_out:
>  		iscsi_release_param_list(conn->param_list);
>  		conn->param_list = NULL;
>  	}
> -	if (conn->sock) {
> -		if (conn->conn_flags & CONNFLAG_SCTP_STRUCT_FILE) {
> -			kfree(conn->sock->file);
> -			conn->sock->file = NULL;
> -		}
> +	if (conn->sock)
>  		sock_release(conn->sock);
> -	}
>  	kfree(conn);
>  
>  	if (tpg) {

^ permalink raw reply

* Re: [PATCH net-next v2] ipv4: tcp: remove per net tcp_sock
From: Eric Dumazet @ 2012-07-21  8:28 UTC (permalink / raw)
  To: Hiroaki SHIMODA; +Cc: davem, netdev, therbert, wsommerfeld
In-Reply-To: <20120721170035.be1bb3aff663ecf9e81028e2@gmail.com>


On Sat, 2012-07-21 at 17:00 +0900, Hiroaki SHIMODA wrote:
> On Thu, 19 Jul 2012 19:34:03 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > +static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
> > +	.sk = {
> > +		.__sk_common = {
> > +			.skc_refcnt = ATOMIC_INIT(1),
> > +		},
> > +		.sk_wmem_alloc	= ATOMIC_INIT(1),
> > +		.sk_allocation	= GFP_ATOMIC,
> > +		.sk_flags	= (1UL << SOCK_USE_WRITE_QUEUE),
> > +	},
> > +	.pmtudisc = IP_PMTUDISC_WANT,
> > +};
> 
> RST packet generated from unicast_sock have 0 TTL value.
> I think ".uc_ttl = -1" is needed in above initialization.

Good catch, thanks a lot !

[PATCH net-next] ipv4: tcp: set unicast_sock uc_ttl to -1

Set unicast_sock uc_ttl to -1 so that we select the right ttl,
instead of sending packets with a 0 ttl.

Bug added in commit be9f4a44e7d4 (ipv4: tcp: remove per net tcp_sock)

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ip_output.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c528f84..665abbb 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1476,7 +1476,8 @@ static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
 		.sk_allocation	= GFP_ATOMIC,
 		.sk_flags	= (1UL << SOCK_USE_WRITE_QUEUE),
 	},
-	.pmtudisc = IP_PMTUDISC_WANT,
+	.pmtudisc	= IP_PMTUDISC_WANT,
+	.uc_ttl		= -1,
 };
 
 void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,

^ permalink raw reply related

* Re: [PATCH net-next v2] ipv4: tcp: remove per net tcp_sock
From: Hiroaki SHIMODA @ 2012-07-21  8:00 UTC (permalink / raw)
  To: eric.dumazet; +Cc: davem, netdev, therbert, wsommerfeld
In-Reply-To: <1342719243.2626.4571.camel@edumazet-glaptop>

On Thu, 19 Jul 2012 19:34:03 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> +static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
> +	.sk = {
> +		.__sk_common = {
> +			.skc_refcnt = ATOMIC_INIT(1),
> +		},
> +		.sk_wmem_alloc	= ATOMIC_INIT(1),
> +		.sk_allocation	= GFP_ATOMIC,
> +		.sk_flags	= (1UL << SOCK_USE_WRITE_QUEUE),
> +	},
> +	.pmtudisc = IP_PMTUDISC_WANT,
> +};

RST packet generated from unicast_sock have 0 TTL value.
I think ".uc_ttl = -1" is needed in above initialization.

^ permalink raw reply

* Re: [PATCH] Crash in tun
From: Al Viro @ 2012-07-21  7:55 UTC (permalink / raw)
  To: David Miller
  Cc: mikulas, eric.dumazet, maxk, vtun, netdev, Nicholas A. Bellinger,
	linux-sctp
In-Reply-To: <20120720.112337.474955511809249636.davem@davemloft.net>

	BTW, speaking of struct file treatment related to sockets -
there's this piece of code in iscsi:
        /*
         * The SCTP stack needs struct socket->file.
         */
        if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
            (np->np_network_transport == ISCSI_SCTP_UDP)) {
                if (!new_sock->file) {
                        new_sock->file = kzalloc(
                                        sizeof(struct file), GFP_KERNEL);

For one thing, as far as I can see it'not true - sctp does *not* depend on
socket->file being non-NULL; it does, in one place, check socket->file->f_flags
for O_NONBLOCK, but there it treats NULL socket->file as "flag not set".
Which is the case here anyway - the fake struct file created in
__iscsi_target_login_thread() (and in iscsi_target_setup_login_socket(), with
the same excuse) do *not* get that flag set.

Moreover, it's a bloody serious violation of a bunch of asserts in VFS;
all struct file instances should come from filp_cachep, via get_empty_filp()
(or alloc_file(), which is a wrapper for it).  FWIW, I'm very tempted to
do this and be done with the entire mess:

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index d57d10c..d7dcd67 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -429,19 +429,8 @@ int iscsit_reset_np_thread(
 
 int iscsit_del_np_comm(struct iscsi_np *np)
 {
-	if (!np->np_socket)
-		return 0;
-
-	/*
-	 * Some network transports allocate their own struct sock->file,
-	 * see  if we need to free any additional allocated resources.
-	 */
-	if (np->np_flags & NPF_SCTP_STRUCT_FILE) {
-		kfree(np->np_socket->file);
-		np->np_socket->file = NULL;
-	}
-
-	sock_release(np->np_socket);
+	if (np->np_socket)
+		sock_release(np->np_socket);
 	return 0;
 }
 
@@ -4077,13 +4066,8 @@ int iscsit_close_connection(
 	kfree(conn->conn_ops);
 	conn->conn_ops = NULL;
 
-	if (conn->sock) {
-		if (conn->conn_flags & CONNFLAG_SCTP_STRUCT_FILE) {
-			kfree(conn->sock->file);
-			conn->sock->file = NULL;
-		}
+	if (conn->sock)
 		sock_release(conn->sock);
-	}
 	conn->thread_set = NULL;
 
 	pr_debug("Moving to TARG_CONN_STATE_FREE.\n");
diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h
index 1c70144..1dd5716 100644
--- a/drivers/target/iscsi/iscsi_target_core.h
+++ b/drivers/target/iscsi/iscsi_target_core.h
@@ -224,7 +224,6 @@ enum iscsi_timer_flags_table {
 /* Used for struct iscsi_np->np_flags */
 enum np_flags_table {
 	NPF_IP_NETWORK		= 0x00,
-	NPF_SCTP_STRUCT_FILE	= 0x01 /* Bugfix */
 };
 
 /* Used for struct iscsi_np->np_thread_state */
@@ -503,7 +502,6 @@ struct iscsi_conn {
 	u16			local_port;
 	int			net_size;
 	u32			auth_id;
-#define CONNFLAG_SCTP_STRUCT_FILE			0x01
 	u32			conn_flags;
 	/* Used for iscsi_tx_login_rsp() */
 	u32			login_itt;
diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index a3656c9..ae30424 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -795,22 +795,6 @@ int iscsi_target_setup_login_socket(
 	}
 	np->np_socket = sock;
 	/*
-	 * The SCTP stack needs struct socket->file.
-	 */
-	if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
-	    (np->np_network_transport == ISCSI_SCTP_UDP)) {
-		if (!sock->file) {
-			sock->file = kzalloc(sizeof(struct file), GFP_KERNEL);
-			if (!sock->file) {
-				pr_err("Unable to allocate struct"
-						" file for SCTP\n");
-				ret = -ENOMEM;
-				goto fail;
-			}
-			np->np_flags |= NPF_SCTP_STRUCT_FILE;
-		}
-	}
-	/*
 	 * Setup the np->np_sockaddr from the passed sockaddr setup
 	 * in iscsi_target_configfs.c code..
 	 */
@@ -869,21 +853,15 @@ int iscsi_target_setup_login_socket(
 
 fail:
 	np->np_socket = NULL;
-	if (sock) {
-		if (np->np_flags & NPF_SCTP_STRUCT_FILE) {
-			kfree(sock->file);
-			sock->file = NULL;
-		}
-
+	if (sock)
 		sock_release(sock);
-	}
 	return ret;
 }
 
 static int __iscsi_target_login_thread(struct iscsi_np *np)
 {
 	u8 buffer[ISCSI_HDR_LEN], iscsi_opcode, zero_tsih = 0;
-	int err, ret = 0, set_sctp_conn_flag, stop;
+	int err, ret = 0, stop;
 	struct iscsi_conn *conn = NULL;
 	struct iscsi_login *login;
 	struct iscsi_portal_group *tpg = NULL;
@@ -894,7 +872,6 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
 	struct sockaddr_in6 sock_in6;
 
 	flush_signals(current);
-	set_sctp_conn_flag = 0;
 	sock = np->np_socket;
 
 	spin_lock_bh(&np->np_thread_lock);
@@ -917,35 +894,12 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
 		spin_unlock_bh(&np->np_thread_lock);
 		goto out;
 	}
-	/*
-	 * The SCTP stack needs struct socket->file.
-	 */
-	if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
-	    (np->np_network_transport == ISCSI_SCTP_UDP)) {
-		if (!new_sock->file) {
-			new_sock->file = kzalloc(
-					sizeof(struct file), GFP_KERNEL);
-			if (!new_sock->file) {
-				pr_err("Unable to allocate struct"
-						" file for SCTP\n");
-				sock_release(new_sock);
-				/* Get another socket */
-				return 1;
-			}
-			set_sctp_conn_flag = 1;
-		}
-	}
-
 	iscsi_start_login_thread_timer(np);
 
 	conn = kzalloc(sizeof(struct iscsi_conn), GFP_KERNEL);
 	if (!conn) {
 		pr_err("Could not allocate memory for"
 			" new connection\n");
-		if (set_sctp_conn_flag) {
-			kfree(new_sock->file);
-			new_sock->file = NULL;
-		}
 		sock_release(new_sock);
 		/* Get another socket */
 		return 1;
@@ -955,9 +909,6 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
 	conn->conn_state = TARG_CONN_STATE_FREE;
 	conn->sock = new_sock;
 
-	if (set_sctp_conn_flag)
-		conn->conn_flags |= CONNFLAG_SCTP_STRUCT_FILE;
-
 	pr_debug("Moving to TARG_CONN_STATE_XPT_UP.\n");
 	conn->conn_state = TARG_CONN_STATE_XPT_UP;
 
@@ -1205,13 +1156,8 @@ old_sess_out:
 		iscsi_release_param_list(conn->param_list);
 		conn->param_list = NULL;
 	}
-	if (conn->sock) {
-		if (conn->conn_flags & CONNFLAG_SCTP_STRUCT_FILE) {
-			kfree(conn->sock->file);
-			conn->sock->file = NULL;
-		}
+	if (conn->sock)
 		sock_release(conn->sock);
-	}
 	kfree(conn);
 
 	if (tpg) {

^ permalink raw reply related

* Re: [PATCH v5] sctp: Implement quick failover draft from tsvwg
From: Vlad Yasevich @ 2012-07-21  6:45 UTC (permalink / raw)
  To: Neil Horman, netdev; +Cc: Sridhar Samudrala, David S. Miller, linux-sctp, joe
In-Reply-To: <1342810319-27457-1-git-send-email-nhorman@tuxdriver.com>

Neil Horman <nhorman@tuxdriver.com> wrote:

>I've seen several attempts recently made to do quick failover of sctp
>transports
>by reducing various retransmit timers and counters.  While its possible
>to
>implement a faster failover on multihomed sctp associations, its not
>particularly robust, in that it can lead to unneeded retransmits, as
>well as
>false connection failures due to intermittent latency on a network.
>
>Instead, lets implement the new ietf quick failover draft found here:
>http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
>
>This will let the sctp stack identify transports that have had a small
>number of
>errors, and avoid using them quickly until their reliability can be
>re-established.  I've tested this out on two virt guests connected via
>multiple
>isolated virt networks and believe its in compliance with the above
>draft and
>works well.
>
>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>CC: Vlad Yasevich <vyasevich@gmail.com>
>CC: Sridhar Samudrala <sri@us.ibm.com>
>CC: "David S. Miller" <davem@davemloft.net>
>CC: linux-sctp@vger.kernel.org
>CC: joe@perches.com
>
>---
>Change notes:
>
>V2)
>- Added socket option API from section 6.1 of the specification, as per
>request from Vlad. Adding this socket option allows us to alter both
>the path
>maximum retransmit value and the path partial failure threshold for
>each
>transport and the association as a whole.
>
>- Added a per transport pf_retrans value, and initialized it from the
>association value.  This makes each transport independently
>configurable as per
>the socket option above, and prevents changes in the sysctl from
>bleeding into
>an already created association.
>
>V3)
>- Cleaned up some line spacing (Joe Perches)
>- Fixed some socket option user data sanitization (Vlad Yasevich)
>
>V4)
>- Added additional documentation (Flavio Leitner)
>
>V5)
>- Modified setsockopt option to ignore 0 pathmaxrxt rather than return
>  error (Vlad Yasevich)
>- Modified getsocopt to return option length written (Vlad Y.)
>---
> Documentation/networking/ip-sysctl.txt |   14 +++++
> include/net/sctp/constants.h           |    1 +
> include/net/sctp/structs.h             |   20 ++++++-
> include/net/sctp/user.h                |   11 ++++
> net/sctp/associola.c                   |   37 ++++++++++--
> net/sctp/outqueue.c                    |    6 +-
> net/sctp/sm_sideeffect.c               |   33 +++++++++-
>net/sctp/socket.c                      |  100
>++++++++++++++++++++++++++++++++
> net/sctp/sysctl.c                      |    9 +++
> net/sctp/transport.c                   |    4 +-
> 10 files changed, 220 insertions(+), 15 deletions(-)
>
>> /* API 6.2 setsockopt(), getsockopt()
>  *
>  * Applications use setsockopt() and getsockopt() to set or retrieve
>@@ -3619,6 +3669,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk,
>int level, int optname,
> 	case SCTP_AUTO_ASCONF:
> 		retval = sctp_setsockopt_auto_asconf(sk, optval, optlen);
> 		break;
>+	case SCTP_PEER_ADDR_THLDS:
>+		retval = sctp_setsockopt_paddr_thresholds(sk, optval, optlen);
>+		break;
> 	default:
> 		retval = -ENOPROTOOPT;
> 		break;
>@@ -5490,6 +5543,50 @@ static int sctp_getsockopt_assoc_ids(struct sock
>*sk, int len,
> 	return 0;
> }
> 
>+/*
>+ * SCTP_PEER_ADDR_THLDS
>+ *
>+ * This option allows us to fetch the partially failed threshold for
>one or all
>+ * transports in an association.  See Section 6.1 of:
>+ * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
>+ */
>+static int sctp_getsockopt_paddr_thresholds(struct sock *sk,
>+					    char __user *optval,
>+					    int optlen)
>+{
>+	struct sctp_paddrthlds val;
>+	struct sctp_transport *trans;
>+	struct sctp_association *asoc;
>+
>+	if (optlen < sizeof(struct sctp_paddrthlds))
>+		return -EINVAL;
>+	optlen = sizeof(struct sctp_paddrthlds);
>+	if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval,
>optlen))
>+		return -EFAULT;
>+
>+	if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
>+		asoc = sctp_id2assoc(sk, val.spt_assoc_id);
>+		if (!asoc)
>+			return -ENOENT;
>+
>+		val.spt_pathpfthld = asoc->pf_retrans;
>+		val.spt_pathmaxrxt = asoc->pathmaxrxt;
>+	} else {
>+		trans = sctp_addr_id2transport(sk, &val.spt_address,
>+					       val.spt_assoc_id);
>+		if (!trans)
>+			return -ENOENT;
>+
>+		val.spt_pathmaxrxt = trans->pathmaxrxt;
>+		val.spt_pathpfthld = trans->pf_retrans;
>+	}
>+
>+	if (copy_to_user(optval, &val, optlen))
>+		return -EFAULT;
>+
>+	return optlen;

I don't think you can simply return this.  You have to call put_user() with the value to write it back to the User.  See how other get calls are done.

-Vlad
>+}
>+
>SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int
>optname,
> 				char __user *optval, int __user *optlen)
> {
>@@ -5628,6 +5725,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk,
>int level, int optname,
> 	case SCTP_AUTO_ASCONF:
> 		retval = sctp_getsockopt_auto_asconf(sk, len, optval, optlen);
> 		break;
>+	case SCTP_PEER_ADDR_THLDS:
>+		retval = sctp_getsockopt_paddr_thresholds(sk, optval, len);
>+		break;
> 	default:
> 		retval = -ENOPROTOOPT;
> 		break;
>diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
>index e5fe639..2b2bfe9 100644
>--- a/net/sctp/sysctl.c
>+++ b/net/sctp/sysctl.c
>@@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
> 		.extra2		= &int_max
> 	},
> 	{
>+		.procname	= "pf_retrans",
>+		.data		= &sctp_pf_retrans,
>+		.maxlen		= sizeof(int),
>+		.mode		= 0644,
>+		.proc_handler	= proc_dointvec_minmax,
>+		.extra1		= &zero,
>+		.extra2		= &int_max
>+	},
>+	{
> 		.procname	= "max_init_retransmits",
> 		.data		= &sctp_max_retrans_init,
> 		.maxlen		= sizeof(int),
>diff --git a/net/sctp/transport.c b/net/sctp/transport.c
>index b026ba0..194d0f3 100644
>--- a/net/sctp/transport.c
>+++ b/net/sctp/transport.c
>@@ -85,6 +85,7 @@ static struct sctp_transport
>*sctp_transport_init(struct sctp_transport *peer,
> 
> 	/* Initialize the default path max_retrans.  */
> 	peer->pathmaxrxt  = sctp_max_retrans_path;
>+	peer->pf_retrans  = sctp_pf_retrans;
> 
> 	INIT_LIST_HEAD(&peer->transmitted);
> 	INIT_LIST_HEAD(&peer->send_ready);
>@@ -585,7 +586,8 @@ unsigned long sctp_transport_timeout(struct
>sctp_transport *t)
> {
> 	unsigned long timeout;
> 	timeout = t->rto + sctp_jitter(t->rto);
>-	if (t->state != SCTP_UNCONFIRMED)
>+	if ((t->state != SCTP_UNCONFIRMED) &&
>+	    (t->state != SCTP_PF))
> 		timeout += t->hbinterval;
> 	timeout += jiffies;
> 	return timeout;
>-- 
>1.7.7.6


-- 
Sent from my Android phone with SkitMail. Please excuse my brevity.

^ permalink raw reply

* Re: [PATCH 00/16] Remove the ipv4 routing cache
From: Eric Dumazet @ 2012-07-21  5:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120720.161331.2251326254024647237.davem@davemloft.net>

On Fri, 2012-07-20 at 16:13 -0700, David Miller wrote:

> 
> ====================
> [PATCH] ipv4: Fix neigh lookup keying over loopback/point-to-point devices.
> 
> We were using a special key "0" for all loopback and point-to-point
> device neigh lookups under ipv4, but we wouldn't use that special
> key for the neigh creation.
> 
> So basically we'd make a new neigh at each and every lookup :-)
> 
> This special case to use only one neigh for these device types
> is of dubious value, so just remove it entirely.
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  include/net/arp.h |    3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/include/net/arp.h b/include/net/arp.h
> index 4617d98..7f7df93 100644
> --- a/include/net/arp.h
> +++ b/include/net/arp.h
> @@ -21,9 +21,6 @@ static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev
>  	struct neighbour *n;
>  	u32 hash_val;
>  
> -	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
> -		key = 0;
> -
>  	hash_val = arp_hashfn(key, dev, nht->hash_rnd[0]) >> (32 - nht->hash_shift);
>  	for (n = rcu_dereference_bh(nht->hash_buckets[hash_val]);
>  	     n != NULL;

Excellent

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    8024082     0.024     2.443
 Close        5894419     0.025     1.426
 Rename        339732     0.024     0.097
 Unlink       1620260     0.024     1.438
 Deltree          228     0.000     0.001
 Mkdir            114     0.025     0.032
 Qpathinfo    7272340     0.024     1.436
 Qfileinfo    1275230     0.024     1.412
 Qfsinfo      1333479     0.025     0.887
 Sfileinfo     653595     0.025     0.160
 Find         2811797     0.024     1.380
 WriteX       4005005     0.045     2.272
 ReadX        12576354     0.031     6.542
 LockX          26120     0.026     0.059
 UnlockX        26120     0.025     0.052
 Flush         562378     0.025     1.372

Throughput 4202.27 MB/sec  24 clients  24 procs  max_latency=2.343 ms

^ permalink raw reply

* Re: [PATCH] b44: add 64 bit stats
From: Eric Dumazet @ 2012-07-21  5:09 UTC (permalink / raw)
  To: Kevin Groeneveld
  Cc: netdev, Simon Horman, Julian Anastasov, Wensong Zhang, lvs-devel
In-Reply-To: <CABF+-6XGO0Qvqd1O-aY28kG1jV=L6eRcupnZpaimTunddPwL2A@mail.gmail.com>

On Fri, 2012-07-20 at 22:22 -0400, Kevin Groeneveld wrote:
> On Fri, Jul 20, 2012 at 2:56 PM, Kevin Groeneveld <kgroeneveld@gmail.com> wrote:
> >> In fact all network drivers should use the _bh version.
> >> Could you send a patch for all of them, based on net-next tree ?
> >
> > Sure, I can work on that.  It should be a relatively easy thing to
> > update.  I can probably send a patch within the next couple days.
> 
> As I have been working on the patch I have been trying convince myself
> that each case I change actually needs the _bh version of the
> functions instead of blindly changing them.  So far I have found the
> following where the change seems to make sense:
> 
> drivers/net/dummy.c
> drivers/net/ethernet/neterion/vxge/vxge-main.c
> drivers/net/loopback.c
> drivers/net/virtio_net.c
> net/bridge/br_device.c
> 

Thats right.

> The only two other places in the networking code that use
> u64_stats_fetch_begin/u64_stats_fetch_retry are:
> 
> net/l2tp/l2tp_netlink.c

This one is completely buggy, dont waste your time on it.

My plan for this one : dont try to have 64bit stats on 32bit arches, and
use plain "unsigned long" counters (if they are percpu), or
atomic_long_t (if they are shared by all cpus)

The writer sides might be run concurrently by several cpus, so
u64_stats_update_begin(&sstats->syncp); are racy : a reader can
be trapped forever.

> net/netfilter/ipvs/ip_vs_est.c
> 

Same problem for this one, I think.

I CCed ipvs maintainers so that they can take a look.

> Do these need to be updated as well?  Looking at these files quickly
> and with my limited knowledge of the kernel I am not sure if they
> update the stats in a BH context or not.
> 
> 
> Kevin

Thanks !

^ permalink raw reply

* Re: [PATCH] b44: add 64 bit stats
From: Kevin Groeneveld @ 2012-07-21  2:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <CABF+-6Ubh6rSYV=L8-QV466XNo5OqCxA+4DRqXGdELhnSPkPZQ@mail.gmail.com>

On Fri, Jul 20, 2012 at 2:56 PM, Kevin Groeneveld <kgroeneveld@gmail.com> wrote:
>> In fact all network drivers should use the _bh version.
>> Could you send a patch for all of them, based on net-next tree ?
>
> Sure, I can work on that.  It should be a relatively easy thing to
> update.  I can probably send a patch within the next couple days.

As I have been working on the patch I have been trying convince myself
that each case I change actually needs the _bh version of the
functions instead of blindly changing them.  So far I have found the
following where the change seems to make sense:

drivers/net/dummy.c
drivers/net/ethernet/neterion/vxge/vxge-main.c
drivers/net/loopback.c
drivers/net/virtio_net.c
net/bridge/br_device.c

The only two other places in the networking code that use
u64_stats_fetch_begin/u64_stats_fetch_retry are:

net/l2tp/l2tp_netlink.c
net/netfilter/ipvs/ip_vs_est.c

Do these need to be updated as well?  Looking at these files quickly
and with my limited knowledge of the kernel I am not sure if they
update the stats in a BH context or not.


Kevin

^ permalink raw reply

* Re: [net-next PATCH v1] net: netprio_cgroup: rework update socket logic
From: Neil Horman @ 2012-07-21  2:00 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, netdev, eric.dumazet, gaofeng, lizefan
In-Reply-To: <20120720203925.30782.36819.stgit@jf-dev1-dcblab>

On Fri, Jul 20, 2012 at 01:39:25PM -0700, John Fastabend wrote:
> Instead of updating the sk_cgrp_prioidx struct field on every send
> this only updates the field when a task is moved via cgroup
> infrastructure.
> 
> This allows sockets that may be used by a kernel worker thread
> to be managed. For example in the iscsi case today a user can
> put iscsid in a netprio cgroup and control traffic will be sent
> with the correct sk_cgrp_prioidx value set but as soon as data
> is sent the kernel worker thread isssues a send and sk_cgrp_prioidx
> is updated with the kernel worker threads value which is the
> default case.
> 
> It seems more correct to only update the field when the user
> explicitly sets it via control group infrastructure. This allows
> the users to manage sockets that may be used with other threads.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
I like the idea, but IIRC last time we tried this I think it caused problems
with processes that shared sockets.  That is to say, if you have a parent and
child process that dup an socket descriptior, and put them in separate cgroups,
you get unpredictable results, as the socket gets assigned a priority based on
the last processed that moved cgroups.

Neil

> 

^ permalink raw reply

* [PATCH v3 1/2] net: ethernet: davinci_emac: Remove unnecessary #include
From: Mark A. Greer @ 2012-07-21  1:19 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, linux-arm-kernel, Mark A. Greer, Sekhar Nori
In-Reply-To: <1342833562-3435-1-git-send-email-mgreer@animalcreek.com>

From: "Mark A. Greer" <mgreer@animalcreek.com>

The '#include <mach/mux.h>' line in davinci_emac.c
causes a compile error because that header file
isn't found.  It turns out that the #include isn't
needed because the driver isn't (and shoudn't be)
touching the mux anyway, so remove it.

CC: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
---
 drivers/net/ethernet/ti/davinci_emac.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index b298ab0..1a9c43f 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -63,8 +63,6 @@
 #include <linux/of_irq.h>
 #include <linux/of_net.h>
 
-#include <mach/mux.h>
-
 #include <asm/irq.h>
 #include <asm/page.h>
 
-- 
1.7.11.2

^ permalink raw reply related

* [PATCH v3 2/2] net: ethernet: davinci_emac: add pm_runtime support
From: Mark A. Greer @ 2012-07-21  1:19 UTC (permalink / raw)
  To: netdev
  Cc: linux-omap, linux-arm-kernel, Mark A. Greer, Sekhar Nori,
	Kevin Hilman
In-Reply-To: <1342833562-3435-1-git-send-email-mgreer@animalcreek.com>

From: "Mark A. Greer" <mgreer@animalcreek.com>

Add pm_runtime support to the TI Davinci EMAC driver.

CC: Sekhar Nori <nsekhar@ti.com>
CC: Kevin Hilman <khilman@ti.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
---
a) Sorry for the bad patch earlier.

b) Now applies on top of net-next.

c) This patch depends on a patch by Kevin Hilman that has been
   accepted for 3.6 and is waiting in arm-soc/for-next:
   ce9dcb8784611c50974d1c6b600c71f5c0a29308
   (ARM: davinci: add runtime PM support for clock management)

 drivers/net/ethernet/ti/davinci_emac.c | 43 +++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 1a9c43f..fce89a0 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -57,6 +57,7 @@
 #include <linux/bitops.h>
 #include <linux/io.h>
 #include <linux/uaccess.h>
+#include <linux/pm_runtime.h>
 #include <linux/davinci_emac.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
@@ -353,10 +354,6 @@ struct emac_priv {
 	void (*int_disable) (void);
 };
 
-/* clock frequency for EMAC */
-static struct clk *emac_clk;
-static unsigned long emac_bus_frequency;
-
 /* EMAC TX Host Error description strings */
 static char *emac_txhost_errcodes[16] = {
 	"No error", "SOP error", "Ownership bit not set in SOP buffer",
@@ -1540,6 +1537,8 @@ static int emac_dev_open(struct net_device *ndev)
 	int k = 0;
 	struct emac_priv *priv = netdev_priv(ndev);
 
+	pm_runtime_get(&priv->pdev->dev);
+
 	netif_carrier_off(ndev);
 	for (cnt = 0; cnt < ETH_ALEN; cnt++)
 		ndev->dev_addr[cnt] = priv->mac_addr[cnt];
@@ -1609,7 +1608,7 @@ static int emac_dev_open(struct net_device *ndev)
 				priv->phy_id);
 			ret = PTR_ERR(priv->phydev);
 			priv->phydev = NULL;
-			return ret;
+			goto err;
 		}
 
 		priv->link = 0;
@@ -1650,7 +1649,11 @@ rollback:
 		res = platform_get_resource(priv->pdev, IORESOURCE_IRQ, k-1);
 		m = res->end;
 	}
-	return -EBUSY;
+
+	ret = -EBUSY;
+err:
+	pm_runtime_put(&priv->pdev->dev);
+	return ret;
 }
 
 /**
@@ -1692,6 +1695,7 @@ static int emac_dev_stop(struct net_device *ndev)
 	if (netif_msg_drv(priv))
 		dev_notice(emac_dev, "DaVinci EMAC: %s stopped\n", ndev->name);
 
+	pm_runtime_put(&priv->pdev->dev);
 	return 0;
 }
 
@@ -1856,6 +1860,9 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 	struct emac_platform_data *pdata;
 	struct device *emac_dev;
 	struct cpdma_params dma_params;
+	struct clk *emac_clk;
+	unsigned long emac_bus_frequency;
+
 
 	/* obtain emac clock from kernel */
 	emac_clk = clk_get(&pdev->dev, NULL);
@@ -1864,12 +1871,14 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 		return -EBUSY;
 	}
 	emac_bus_frequency = clk_get_rate(emac_clk);
+	clk_put(emac_clk);
+
 	/* TODO: Probe PHY here if possible */
 
 	ndev = alloc_etherdev(sizeof(struct emac_priv));
 	if (!ndev) {
 		rc = -ENOMEM;
-		goto free_clk;
+		goto no_ndev;
 	}
 
 	platform_set_drvdata(pdev, ndev);
@@ -1985,15 +1994,13 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 	SET_ETHTOOL_OPS(ndev, &ethtool_ops);
 	netif_napi_add(ndev, &priv->napi, emac_poll, EMAC_POLL_WEIGHT);
 
-	clk_enable(emac_clk);
-
 	/* register the network device */
 	SET_NETDEV_DEV(ndev, &pdev->dev);
 	rc = register_netdev(ndev);
 	if (rc) {
 		dev_err(&pdev->dev, "error in register_netdev\n");
 		rc = -ENODEV;
-		goto netdev_reg_err;
+		goto no_irq_res;
 	}
 
 
@@ -2002,10 +2009,12 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 			   "(regs: %p, irq: %d)\n",
 			   (void *)priv->emac_base_phys, ndev->irq);
 	}
+
+	pm_runtime_enable(&pdev->dev);
+	pm_runtime_resume(&pdev->dev);
+
 	return 0;
 
-netdev_reg_err:
-	clk_disable(emac_clk);
 no_irq_res:
 	if (priv->txchan)
 		cpdma_chan_destroy(priv->txchan);
@@ -2019,8 +2028,7 @@ no_dma:
 
 probe_quit:
 	free_netdev(ndev);
-free_clk:
-	clk_put(emac_clk);
+no_ndev:
 	return rc;
 }
 
@@ -2054,9 +2062,6 @@ static int __devexit davinci_emac_remove(struct platform_device *pdev)
 	iounmap(priv->remap_addr);
 	free_netdev(ndev);
 
-	clk_disable(emac_clk);
-	clk_put(emac_clk);
-
 	return 0;
 }
 
@@ -2068,8 +2073,6 @@ static int davinci_emac_suspend(struct device *dev)
 	if (netif_running(ndev))
 		emac_dev_stop(ndev);
 
-	clk_disable(emac_clk);
-
 	return 0;
 }
 
@@ -2078,8 +2081,6 @@ static int davinci_emac_resume(struct device *dev)
 	struct platform_device *pdev = to_platform_device(dev);
 	struct net_device *ndev = platform_get_drvdata(pdev);
 
-	clk_enable(emac_clk);
-
 	if (netif_running(ndev))
 		emac_dev_open(ndev);
 
-- 
1.7.11.2


^ permalink raw reply related

* [PATCH v3 0/2] net: ethernet: davinci_emac: fixups + pm_runtime
From: Mark A. Greer @ 2012-07-21  1:19 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, linux-arm-kernel, Mark A. Greer

From: "Mark A. Greer" <mgreer@animalcreek.com>

This series fixes a compile error in, and adds pm_runtime support
to, the davinci_emac driver.

To test on a davinci platform, you will need another patch just
submitted to netdev:

	http://marc.info/?l=linux-netdev&m=134282758408187&w=2

Mark A. Greer (2):
  net: ethernet: davinci_emac: Remove unnecessary #include
  net: ethernet: davinci_emac: add pm_runtime support

 drivers/net/ethernet/ti/davinci_emac.c | 45 +++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 23 deletions(-)

-- 
1.7.11.2


^ permalink raw reply

* Re: [net-next 4/6] e1000: configure and read MDI settings
From: Brandeburg, Jesse @ 2012-07-21  1:17 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jeff Kirsher, davem, netdev, gospo, sassmann, Tushar Dave
In-Reply-To: <1342826859.11373.123.camel@deadeye.wl.decadent.org.uk>



On Fri, 20 Jul 2012, Ben Hutchings wrote:
> Why don't you set ecmd->eth_tp_mdix_ctrl here?
> 
> If you also leave it as 0, it's impossible for userland to tell whether
> the current mode was forced or automatically selected.

Thanks for the review, right now the get interface (and ethtool display) 
doesn't support any way to report if the setting was forced or not.  I 
didn't think about changing the get because I didn't want to modify the 
userland reporting (I also figured it was a simple interface right now, 
and didn't need changing, and was focused on the _set_ which is the part 
fixing the users' reported bugs.)

I think the patches as they currently stand are okay, do you agree? I 
would be glad to submit a followon to implement the new "get" interface if 
we can hash out the interface changes, but I see no reason to gate these 
patches.

^ permalink raw reply

* Re: [PATCH 1/2] net: ethernet: davinci_emac: Remove unnecessary #include
From: Mark A. Greer @ 2012-07-21  1:16 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, linux-arm-kernel, Sekhar Nori
In-Reply-To: <1342833161-3314-1-git-send-email-mgreer@animalcreek.com>

Sigh, please ignore this series.  I'll resend with [0/2].

^ permalink raw reply

* [PATCH 2/2] net: ethernet: davinci_emac: add pm_runtime support
From: Mark A. Greer @ 2012-07-21  1:12 UTC (permalink / raw)
  To: netdev
  Cc: linux-omap, linux-arm-kernel, Mark A. Greer, Sekhar Nori,
	Kevin Hilman
In-Reply-To: <1342833161-3314-1-git-send-email-mgreer@animalcreek.com>

From: "Mark A. Greer" <mgreer@animalcreek.com>

Add pm_runtime support to the TI Davinci EMAC driver.

CC: Sekhar Nori <nsekhar@ti.com>
CC: Kevin Hilman <khilman@ti.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
---
a) Sorry for the bad patch earlier.

b) Now applies on top of net-next.

c) This patch depends on a patch by Kevin Hilman that has been
   accepted for 3.6 and is waiting in arm-soc/for-next:
   ce9dcb8784611c50974d1c6b600c71f5c0a29308
   (ARM: davinci: add runtime PM support for clock management)

 drivers/net/ethernet/ti/davinci_emac.c | 43 +++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 1a9c43f..fce89a0 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -57,6 +57,7 @@
 #include <linux/bitops.h>
 #include <linux/io.h>
 #include <linux/uaccess.h>
+#include <linux/pm_runtime.h>
 #include <linux/davinci_emac.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
@@ -353,10 +354,6 @@ struct emac_priv {
 	void (*int_disable) (void);
 };
 
-/* clock frequency for EMAC */
-static struct clk *emac_clk;
-static unsigned long emac_bus_frequency;
-
 /* EMAC TX Host Error description strings */
 static char *emac_txhost_errcodes[16] = {
 	"No error", "SOP error", "Ownership bit not set in SOP buffer",
@@ -1540,6 +1537,8 @@ static int emac_dev_open(struct net_device *ndev)
 	int k = 0;
 	struct emac_priv *priv = netdev_priv(ndev);
 
+	pm_runtime_get(&priv->pdev->dev);
+
 	netif_carrier_off(ndev);
 	for (cnt = 0; cnt < ETH_ALEN; cnt++)
 		ndev->dev_addr[cnt] = priv->mac_addr[cnt];
@@ -1609,7 +1608,7 @@ static int emac_dev_open(struct net_device *ndev)
 				priv->phy_id);
 			ret = PTR_ERR(priv->phydev);
 			priv->phydev = NULL;
-			return ret;
+			goto err;
 		}
 
 		priv->link = 0;
@@ -1650,7 +1649,11 @@ rollback:
 		res = platform_get_resource(priv->pdev, IORESOURCE_IRQ, k-1);
 		m = res->end;
 	}
-	return -EBUSY;
+
+	ret = -EBUSY;
+err:
+	pm_runtime_put(&priv->pdev->dev);
+	return ret;
 }
 
 /**
@@ -1692,6 +1695,7 @@ static int emac_dev_stop(struct net_device *ndev)
 	if (netif_msg_drv(priv))
 		dev_notice(emac_dev, "DaVinci EMAC: %s stopped\n", ndev->name);
 
+	pm_runtime_put(&priv->pdev->dev);
 	return 0;
 }
 
@@ -1856,6 +1860,9 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 	struct emac_platform_data *pdata;
 	struct device *emac_dev;
 	struct cpdma_params dma_params;
+	struct clk *emac_clk;
+	unsigned long emac_bus_frequency;
+
 
 	/* obtain emac clock from kernel */
 	emac_clk = clk_get(&pdev->dev, NULL);
@@ -1864,12 +1871,14 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 		return -EBUSY;
 	}
 	emac_bus_frequency = clk_get_rate(emac_clk);
+	clk_put(emac_clk);
+
 	/* TODO: Probe PHY here if possible */
 
 	ndev = alloc_etherdev(sizeof(struct emac_priv));
 	if (!ndev) {
 		rc = -ENOMEM;
-		goto free_clk;
+		goto no_ndev;
 	}
 
 	platform_set_drvdata(pdev, ndev);
@@ -1985,15 +1994,13 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 	SET_ETHTOOL_OPS(ndev, &ethtool_ops);
 	netif_napi_add(ndev, &priv->napi, emac_poll, EMAC_POLL_WEIGHT);
 
-	clk_enable(emac_clk);
-
 	/* register the network device */
 	SET_NETDEV_DEV(ndev, &pdev->dev);
 	rc = register_netdev(ndev);
 	if (rc) {
 		dev_err(&pdev->dev, "error in register_netdev\n");
 		rc = -ENODEV;
-		goto netdev_reg_err;
+		goto no_irq_res;
 	}
 
 
@@ -2002,10 +2009,12 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
 			   "(regs: %p, irq: %d)\n",
 			   (void *)priv->emac_base_phys, ndev->irq);
 	}
+
+	pm_runtime_enable(&pdev->dev);
+	pm_runtime_resume(&pdev->dev);
+
 	return 0;
 
-netdev_reg_err:
-	clk_disable(emac_clk);
 no_irq_res:
 	if (priv->txchan)
 		cpdma_chan_destroy(priv->txchan);
@@ -2019,8 +2028,7 @@ no_dma:
 
 probe_quit:
 	free_netdev(ndev);
-free_clk:
-	clk_put(emac_clk);
+no_ndev:
 	return rc;
 }
 
@@ -2054,9 +2062,6 @@ static int __devexit davinci_emac_remove(struct platform_device *pdev)
 	iounmap(priv->remap_addr);
 	free_netdev(ndev);
 
-	clk_disable(emac_clk);
-	clk_put(emac_clk);
-
 	return 0;
 }
 
@@ -2068,8 +2073,6 @@ static int davinci_emac_suspend(struct device *dev)
 	if (netif_running(ndev))
 		emac_dev_stop(ndev);
 
-	clk_disable(emac_clk);
-
 	return 0;
 }
 
@@ -2078,8 +2081,6 @@ static int davinci_emac_resume(struct device *dev)
 	struct platform_device *pdev = to_platform_device(dev);
 	struct net_device *ndev = platform_get_drvdata(pdev);
 
-	clk_enable(emac_clk);
-
 	if (netif_running(ndev))
 		emac_dev_open(ndev);
 
-- 
1.7.11.2

^ permalink raw reply related

* [PATCH 1/2] net: ethernet: davinci_emac: Remove unnecessary #include
From: Mark A. Greer @ 2012-07-21  1:12 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, linux-arm-kernel, Mark A. Greer, Sekhar Nori

From: "Mark A. Greer" <mgreer@animalcreek.com>

The '#include <mach/mux.h>' line in davinci_emac.c
causes a compile error because that header file
isn't found.  It turns out that the #include isn't
needed because the driver isn't (and shoudn't be)
touching the mux anyway, so remove it.

CC: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
---
 drivers/net/ethernet/ti/davinci_emac.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index b298ab0..1a9c43f 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -63,8 +63,6 @@
 #include <linux/of_irq.h>
 #include <linux/of_net.h>
 
-#include <mach/mux.h>
-
 #include <asm/irq.h>
 #include <asm/page.h>
 
-- 
1.7.11.2


^ permalink raw reply related

* Re: ibmveth bug?
From: Benjamin Herrenschmidt @ 2012-07-21  0:52 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: santil, anton, paulus, netdev, linux-kernel, Anton Blanchard
In-Reply-To: <20120720224112.GD19288@linux.vnet.ibm.com>

On Fri, 2012-07-20 at 15:41 -0700, Nishanth Aravamudan wrote:
> Ping on this ... we've tripped the same issue on a different system, it
> would appear. Would appreciate if anyone can provide answers to the
> questions below.

Well, I haven't hit it but your description makes sense. Why not use
dma_alloc_coherent though ? Rather than kmalloc followed by map ?

At least on power we give you page alignment for these, so the problem
is solved magically :-) Another option is GFP + dma_map_page which is
roughly equivalent.

If you think it's a waste of space based on the queue size, then just
add an extra pointer, I wouldn't bother with a cache for something only
allocated when the driver initializes.

Cheers,
Ben.

> Thanks,
> Nish
> 
> On 15.05.2012 [10:01:41 -0700], Nishanth Aravamudan wrote:
> > Hi Santiago,
> > 
> > Are you still working on ibmveth?
> > 
> > I've found a very sporadic bug with ibmveth in some testing. PAPR
> > requires that:
> > 
> > "Validate the Buffer Descriptor of the receive queue buffer (I/O
> > addresses for entire buffer length starting at the spec- ified I/O
> > address are translated by the RTCE table, length is a multiple of 16
> > bytes, and alignment is on a 16 byte boundary) else H_Parameter."
> > 
> > but from what I can tell ibmveth.c is not enforcing this last condition:
> > 
> > 	adapter->rx_queue.queue_addr =
> > 		kmalloc(adapter->rx_queue.queue_len, GFP_KERNEL);
> > 
> > 	...
> > 
> > 	adapter->rx_queue.queue_dma = dma_map_single(dev,
> > 		adapter->rx_queue.queue_addr, adapter->rx_queue.queue_len,
> > 		DMA_BIDIRECTIONAL);
> > 
> > 	...
> > 
> > 	rxq_desc.fields.address = adapter->rx_queue.queue_dma;
> > 
> > 	...
> > 	
> > 
> > 	lpar_rc = ibmveth_register_logical_lan(adapter, rxq_desc,
> > 		mac_address);
> > 	netdev_err(netdev, "buffer TCE:0x%llx filter TCE:0x%llx rxq "
> > 	 	"desc:0x%llx MAC:0x%llx\n", adapter->buffer_list_dma,
> > 	 	adapter->filter_list_dma, rxq_desc.desc, mac_address);
> > 
> > And I got on one install attempt:
> > 
> > [ 39.978430] ibmveth 30000004: eth0: h_register_logical_lan failed with -4
> > [ 39.978449] ibmveth 30000004: eth0: buffer TCE:0x1000 filter TCE:0x10000 rxq desc:0x80006010000200a8 MAC:0x56754de8e904
> > 
> > rxq desc, as you can see is not 16byte aligned. kmalloc() only
> > guarantees 8-byte alignment (as does gcc, I think). Initially, I thought
> > we could just overallocate the queue_addr and ALIGN() down, but then we
> > would need to save the original kmalloc pointer in a new struct member
> > per rx_queue.
> > 
> > So a couple of questions:
> > 
> > 1) Is my analysis accurate? :)
> > 
> > 2) How gross would it be to save an extra pointer for every rx_queue?
> > 
> > 3) Based upon 2), is it better to just go ahead and create our own
> > kmem_cache (which gets an alignment specified)?
> > 
> > For 3), I started coding this, but couldn't find a clean place to
> > allocate the kmem_cache itself, as the size of each object depends on
> > the run-time characteristics (afaict), but needs to be specified at
> > cache creation time. Any insight you could provide would be great!
> > 
> > Thanks,
> > Nish
> >  
> > -- 
> > Nishanth Aravamudan <nacc@us.ibm.com>
> > IBM Linux Technology Center
> 

^ permalink raw reply

* RE: New commands to configure IOV features
From: Rose, Gregory V @ 2012-07-21  0:52 UTC (permalink / raw)
  To: Chris Friesen, Ben Hutchings
  Cc: Don Dutile, David Miller, yuvalmin@broadcom.com,
	netdev@vger.kernel.org, linux-pci@vger.kernel.org
In-Reply-To: <5009ECDF.4090305@genband.com>

> -----Original Message-----
> From: Chris Friesen [mailto:chris.friesen@genband.com]
> Sent: Friday, July 20, 2012 4:42 PM
> To: Ben Hutchings
> Cc: Don Dutile; David Miller; yuvalmin@broadcom.com; Rose, Gregory V;
> netdev@vger.kernel.org; linux-pci@vger.kernel.org
> Subject: Re: New commands to configure IOV features
> 
> On 07/20/2012 02:01 PM, Ben Hutchings wrote:
> > On Fri, 2012-07-20 at 13:29 -0600, Chris Friesen wrote:
> 
> >> Once the device exists, then domain-specific APIs would be used to
> >> configure it the same way that they would configure a physical device.
> >
> > To an extent, but not entirely.
> >
> > Currently, the assigned MAC address and (optional) VLAN tag for each
> > networking VF are configured via the PF net device (though this is
> > done though the rtnetlink API rather than ethtool).
> 
> I actually have a use-case where the guest needs to be able to modify the
> MAC addresses of network devices that are actually VFs.
> 
> The guest is bonding the network devices together, so the bonding driver
> in the guest expects to be able to set all the slaves to the same MAC
> address.
> 
> As I read the ixgbe driver, this should be possible as long as the host
> hasn't explicitly set the MAC address of the VF.  Is that correct?

That is correct.

- Greg
Intel Corp.
LAN Access Division

^ permalink raw reply

* Re: [PATCHv3 0/6] tun zerocopy support
From: David Miller @ 2012-07-21  0:49 UTC (permalink / raw)
  To: mst; +Cc: jasowang, eric.dumazet, netdev, linux-kernel, ebiederm
In-Reply-To: <cover.1342812067.git.mst@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 20 Jul 2012 22:23:03 +0300

> Same as with macvtap, I get single-percentage wins in CPU utilization
> on guest to external from this patchset, and a performance regression on
> guest to host, so more work is needed until this feature can move out of
> experimental status, but I think it's useful for some people already.
> 
> Pls review and consider for 3.6.

It doesn't improve performance in one case, and hurts performance in
another.

You'll have to give me a more compelling argument than that.  You've
just given me every reason not to include these patches in 3.6

^ permalink raw reply

* Re: [PATCH] net-next: minor cleanups for bonding documentation
From: Jay Vosburgh @ 2012-07-21  0:08 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, Andy Gospodarek
In-Reply-To: <20120720205137.84E3D290043B@tardy>

Rick Jones <raj@tardy.cup.hp.com> wrote:

>From: Rick Jones <rick.jones2@hp.com>
>
>The section titled "Configuring Bonding for Maximum Throughput" is
>actually section twelve not thirteen, and there are a couple of words
>spelled incorrectly.
>
>Signed-off-by: Rick Jones <rick.jones2@hp.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: New commands to configure IOV features
From: Chris Friesen @ 2012-07-20 23:42 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Don Dutile, David Miller, yuvalmin, gregory.v.rose, netdev,
	linux-pci
In-Reply-To: <1342814473.2678.65.camel@bwh-desktop.uk.solarflarecom.com>

On 07/20/2012 02:01 PM, Ben Hutchings wrote:
> On Fri, 2012-07-20 at 13:29 -0600, Chris Friesen wrote:

>> Once the device exists, then domain-specific APIs would be used to
>> configure it the same way that they would configure a physical device.
>
> To an extent, but not entirely.
>
> Currently, the assigned MAC address and (optional) VLAN tag for each
> networking VF are configured via the PF net device (though this is done
> though the rtnetlink API rather than ethtool).

I actually have a use-case where the guest needs to be able to modify 
the MAC addresses of network devices that are actually VFs.

The guest is bonding the network devices together, so the bonding driver 
in the guest expects to be able to set all the slaves to the same MAC 
address.

As I read the ixgbe driver, this should be possible as long as the host 
hasn't explicitly set the MAC address of the VF.  Is that correct?

Chris

^ permalink raw reply

* Re: [PATCH] ipv4: show pmtu in route list
From: Julian Anastasov @ 2012-07-20 23:46 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120720.162215.2278959554536542562.davem@davemloft.net>


	Hello,

On Fri, 20 Jul 2012, David Miller wrote:

> From: Julian Anastasov <ja@ssi.bg>
> Date: Sat, 21 Jul 2012 02:26:28 +0300 (EEST)
> 
> > 	I'll try this weekend to reorganize the seqlock
> > usage in tcp_metrics.c and to provide method to feed
> > rt_fill_info with values from this cache.
> 
> Wouldn't it be better to just export the TCP metrics via it's own
> file or netlink facility?
> 
> They keying of the TCP metrics is completely different from how routes
> are key'd.  So I see little value in creating the illusion that these
> two things live in the same keying domain.
> 
> The routing cache will be completely gone and /proc/net/rt_cache will
> be an empty file.

	OK, then I'll try it in this way. When I have some
plan I'll be back for comments before implementing it.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* [PATCH] rtnl: Add #ifdef CONFIG_RPS around num_rx_queues reference
From: Mark A. Greer @ 2012-07-20 23:35 UTC (permalink / raw)
  To: netdev; +Cc: davem, Mark A. Greer, Jiri Pirko

From: "Mark A. Greer" <mgreer@animalcreek.com>

Commit 76ff5cc91935c51fcf1a6a99ffa28b97a6e7a884
(rtnl: allow to specify number of rx and tx queues
on device creation) added a reference to the net_device
structure's 'num_rx_queues' member in

	net/core/rtnetlink.c:rtnl_fill_ifinfo()

However, the definition for 'num_rx_queues' is surrounded
by an '#ifdef CONFIG_RPS' while the new reference to it is
not.  This causes a compile error when CONFIG_RPS is not
defined.

Fix the compile error by surrounding the new reference to
'num_rx_queues' by an '#ifdef CONFIG_RPS'.

CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
---

The problem can be easily reproduced by compiling with
davinci_all_defconfig (ARCH=arm).  I don't know this
area well enough to know whether that (and other)
defconfigs should have CONFIG_RPS enabled or not, or
whether there is some missing Kconfig logic to enable
it.

 net/core/rtnetlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5bb1ebc..334b930 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -892,7 +892,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	    nla_put_u32(skb, IFLA_GROUP, dev->group) ||
 	    nla_put_u32(skb, IFLA_PROMISCUITY, dev->promiscuity) ||
 	    nla_put_u32(skb, IFLA_NUM_TX_QUEUES, dev->num_tx_queues) ||
+#ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
+#endif
 	    (dev->ifindex != dev->iflink &&
 	     nla_put_u32(skb, IFLA_LINK, dev->iflink)) ||
 	    (dev->master &&
-- 
1.7.11.2

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox