Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH V2 net-next 4/4] ptp: clarify the clock_name sysfs attribute
From: Ben Hutchings @ 2012-09-22 15:44 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, David Miller, Jacob Keller, Jeff Kirsher, John Stultz,
	Matthew Vick
In-Reply-To: <6bc914f0607428ab0d8f5d5fc42a5e87bedc95c5.1348299094.git.richardcochran@gmail.com>

On Sat, 2012-09-22 at 09:42 +0200, Richard Cochran wrote:
> There has been some confusion among PHC driver authors about the
> intended purpose of the clock_name attribute. This patch expands the
> documation in order to clarify how the clock_name field should be
> understood.
> 
> Signed-off-by: Richard Cochran <richardcochran@gmail.com>
> ---
>  Documentation/ABI/testing/sysfs-ptp |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-ptp b/Documentation/ABI/testing/sysfs-ptp
> index d40d2b5..c906488 100644
> --- a/Documentation/ABI/testing/sysfs-ptp
> +++ b/Documentation/ABI/testing/sysfs-ptp
> @@ -19,7 +19,10 @@ Date:		September 2010
>  Contact:	Richard Cochran <richardcochran@gmail.com>
>  Description:
>  		This file contains the name of the PTP hardware clock
> -		as a human readable string.
> +		as a human readable string. The purpose of this
> +		attribute is to provide the user with a "friendly
> +		name" and to help distinguish PHY based devices from
> +		MAC based ones.

Might also be worth saying that it is not required to be unique.  And
this explanation should also go in the kernel-doc in ptp_clock_kernel.h,
which is what driver writers are most likely to read.

Ben.

>  What:		/sys/class/ptp/ptpN/max_adjustment
>  Date:		September 2010

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH V2 net-next 3/4] ptp: link the phc device to its parent device
From: Ben Hutchings @ 2012-09-22 15:40 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, David Miller, Jacob Keller, Jeff Kirsher, John Stultz,
	Matthew Vick
In-Reply-To: <847f89f4f5a2757c2487b23dac218e15613c2bb5.1348299094.git.richardcochran@gmail.com>

On Sat, 2012-09-22 at 09:42 +0200, Richard Cochran wrote:
> PTP Hardware Clock devices appear as class devices in sysfs. This patch
> changes the registration API to use the parent device, clarifying the
> clock's relationship to the underlying device.
[...]
> --- a/drivers/net/ethernet/sfc/ptp.c
> +++ b/drivers/net/ethernet/sfc/ptp.c
> @@ -931,7 +931,8 @@ static int efx_ptp_probe_channel(struct efx_channel *channel)
>  	ptp->phc_clock_info.settime = efx_phc_settime;
>  	ptp->phc_clock_info.enable = efx_phc_enable;
>  
> -	ptp->phc_clock = ptp_clock_register(&ptp->phc_clock_info);
> +	ptp->phc_clock = ptp_clock_register(&ptp->phc_clock_info,
> +					    &channel->napi_dev->dev);
[...]

channel->napi_dev is the net device, not the PCI device.  The parent
device should be &efx->pci_dev->dev.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH net-next 4/4] tcp: TCP Fast Open Server - call tcp_validate_incoming() for all packets
From: Neal Cardwell @ 2012-09-22 14:18 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Eric Dumazet, Yuchung Cheng, Jerry Chu, Neal Cardwell
In-Reply-To: <1348323537-30310-1-git-send-email-ncardwell@google.com>

A TCP Fast Open (TFO) passive connection must call both
tcp_check_req() and tcp_validate_incoming() for all incoming ACKs that
are attempting to complete the 3WHS.

This is needed to parallel all the action that happens for a non-TFO
connection, where for an ACK that is attempting to complete the 3WHS
we call both tcp_check_req() and tcp_validate_incoming().

For example, upon receiving the ACK that completes the 3WHS, we need
to call tcp_fast_parse_options() and update ts_recent based on the
incoming timestamp value in the ACK.

One symptom of the problem with the previous code was that for passive
TFO connections using TCP timestamps, the outgoing TS ecr values
ignored the incoming TS val value on the ACK that completed the 3WHS.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bb32668..36e069a1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5969,7 +5969,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,

 		if (tcp_check_req(sk, skb, req, NULL, true) == NULL)
 			goto discard;
-	} else if (!tcp_validate_incoming(sk, skb, th, 0))
+	}
+	if (!tcp_validate_incoming(sk, skb, th, 0))
 		return 0;

 	/* step 5: check the ACK field */
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH net-next 3/4] tcp: TCP Fast Open Server - note timestamps and retransmits for SYNACK RTT
From: Neal Cardwell @ 2012-09-22 14:18 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Eric Dumazet, Yuchung Cheng, Jerry Chu, Neal Cardwell
In-Reply-To: <1348323537-30310-1-git-send-email-ncardwell@google.com>

Previously, when using TCP Fast Open a server would return from
tcp_check_req() before updating snt_synack based on TCP timestamp echo
replies and whether or not we've retransmitted the SYNACK. The result
was that (a) for TFO connections using timestamps we used an incorrect
baseline SYNACK send time (tcp_time_stamp of SYNACK send instead of
rcv_tsecr), and (b) for TFO connections that do not have TCP
timestamps but retransmit the SYNACK we took a SYNACK RTT sample when
we should not take a sample.

This fix merely moves the snt_synack update logic a bit earlier in the
function, so that connections using TCP Fast Open will properly do
these updates when the ACK for the SYNACK arrives.

Moving this snt_synack update logic means that with TCP_DEFER_ACCEPT
enabled we do a few instructions of wasted work on each bare ACK, but
that seems OK.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_minisocks.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 5792577..27536ba 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -692,6 +692,12 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 	if (!(flg & TCP_FLAG_ACK))
 		return NULL;

+	/* Got ACK for our SYNACK, so update baseline for SYNACK RTT sample. */
+	if (tmp_opt.saw_tstamp && tmp_opt.rcv_tsecr)
+		tcp_rsk(req)->snt_synack = tmp_opt.rcv_tsecr;
+	else if (req->retrans) /* don't take RTT sample if retrans && ~TS */
+		tcp_rsk(req)->snt_synack = 0;
+
 	/* For Fast Open no more processing is needed (sk is the
 	 * child socket).
 	 */
@@ -705,10 +711,6 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPDEFERACCEPTDROP);
 		return NULL;
 	}
-	if (tmp_opt.saw_tstamp && tmp_opt.rcv_tsecr)
-		tcp_rsk(req)->snt_synack = tmp_opt.rcv_tsecr;
-	else if (req->retrans) /* don't take RTT sample if retrans && ~TS */
-		tcp_rsk(req)->snt_synack = 0;

 	/* OK, ACK is valid, create big socket and
 	 * feed this segment to it. It will repeat all
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH net-next 2/4] tcp: TCP Fast Open Server - take SYNACK RTT after completing 3WHS
From: Neal Cardwell @ 2012-09-22 14:18 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Eric Dumazet, Yuchung Cheng, Jerry Chu, Neal Cardwell
In-Reply-To: <1348323537-30310-1-git-send-email-ncardwell@google.com>

When taking SYNACK RTT samples for servers using TCP Fast Open, fix
the code to ensure that we only call tcp_valid_rtt_meas() after we
receive the ACK that completes the 3-way handshake.

Previously we were always taking an RTT sample in
tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections
tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we
receive the SYN. So for TFO we must wait until tcp_rcv_state_process()
to take the RTT sample.

To fix this, we wait until after TFO calls tcp_v4_syn_recv_sock()
before we set the snt_synack timestamp, since tcp_synack_rtt_meas()
already ensures that we only take a SYNACK RTT sample if snt_synack is
non-zero. To be careful, we only take a snt_synack timestamp when
a SYNACK transmit or retransmit succeeds.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 include/net/tcp.h    |    1 +
 net/ipv4/tcp_input.c |    1 +
 net/ipv4/tcp_ipv4.c  |   12 +++++++++---
 net/ipv6/tcp_ipv6.c  |    2 +-
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a718d0e..6feeccd 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1125,6 +1125,7 @@ static inline void tcp_openreq_init(struct request_sock *req,
 	req->cookie_ts = 0;
 	tcp_rsk(req)->rcv_isn = TCP_SKB_CB(skb)->seq;
 	tcp_rsk(req)->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
+	tcp_rsk(req)->snt_synack = 0;
 	req->mss = rx_opt->mss_clamp;
 	req->ts_recent = rx_opt->saw_tstamp ? rx_opt->rcv_tsval : 0;
 	ireq->tstamp_ok = rx_opt->tstamp_ok;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e2bec81..bb32668 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5983,6 +5983,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 				 * need req so release it.
 				 */
 				if (req) {
+					tcp_synack_rtt_meas(sk, req);
 					reqsk_fastopen_remove(sk, req, false);
 				} else {
 					/* Make sure socket is routed, for
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1e66f7f..0a7e020 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -868,6 +868,8 @@ static int tcp_v4_send_synack(struct sock *sk, struct dst_entry *dst,
 					    ireq->rmt_addr,
 					    ireq->opt);
 		err = net_xmit_eval(err);
+		if (!tcp_rsk(req)->snt_synack && !err)
+			tcp_rsk(req)->snt_synack = tcp_time_stamp;
 	}
 
 	return err;
@@ -1382,6 +1384,7 @@ static int tcp_v4_conn_req_fastopen(struct sock *sk,
 	struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue;
 	const struct inet_request_sock *ireq = inet_rsk(req);
 	struct sock *child;
+	int err;
 
 	req->retrans = 0;
 	req->sk = NULL;
@@ -1393,8 +1396,11 @@ static int tcp_v4_conn_req_fastopen(struct sock *sk,
 		kfree_skb(skb_synack);
 		return -1;
 	}
-	ip_build_and_send_pkt(skb_synack, sk, ireq->loc_addr,
-			ireq->rmt_addr, ireq->opt);
+	err = ip_build_and_send_pkt(skb_synack, sk, ireq->loc_addr,
+				    ireq->rmt_addr, ireq->opt);
+	err = net_xmit_eval(err);
+	if (!err)
+		tcp_rsk(req)->snt_synack = tcp_time_stamp;
 	/* XXX (TFO) - is it ok to ignore error and continue? */
 
 	spin_lock(&queue->fastopenq->lock);
@@ -1612,7 +1618,6 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		isn = tcp_v4_init_sequence(skb);
 	}
 	tcp_rsk(req)->snt_isn = isn;
-	tcp_rsk(req)->snt_synack = tcp_time_stamp;
 
 	if (dst == NULL) {
 		dst = inet_csk_route_req(sk, &fl4, req);
@@ -1650,6 +1655,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		if (err || want_cookie)
 			goto drop_and_free;
 
+		tcp_rsk(req)->snt_synack = tcp_time_stamp;
 		tcp_rsk(req)->listener = NULL;
 		/* Add the request_sock to the SYN table */
 		inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index cfeeeb7..d6212d6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1169,7 +1169,6 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	}
 have_isn:
 	tcp_rsk(req)->snt_isn = isn;
-	tcp_rsk(req)->snt_synack = tcp_time_stamp;
 
 	if (security_inet_conn_request(sk, skb, req))
 		goto drop_and_release;
@@ -1180,6 +1179,7 @@ have_isn:
 	    want_cookie)
 		goto drop_and_free;
 
+	tcp_rsk(req)->snt_synack = tcp_time_stamp;
 	tcp_rsk(req)->listener = NULL;
 	inet6_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
 	return 0;
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH net-next 1/4] tcp: extract code to compute SYNACK RTT
From: Neal Cardwell @ 2012-09-22 14:18 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Eric Dumazet, Yuchung Cheng, Jerry Chu, Neal Cardwell

In preparation for adding another spot where we compute the SYNACK
RTT, extract this code so that it can be shared.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 include/net/tcp.h   |    9 +++++++++
 net/ipv4/tcp_ipv4.c |    4 +---
 net/ipv6/tcp_ipv6.c |    4 +---
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a8cb00c..a718d0e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1137,6 +1137,15 @@ static inline void tcp_openreq_init(struct request_sock *req,
 	ireq->loc_port = tcp_hdr(skb)->dest;
 }
 
+/* Compute time elapsed between SYNACK and the ACK completing 3WHS */
+static inline void tcp_synack_rtt_meas(struct sock *sk,
+				       struct request_sock *req)
+{
+	if (tcp_rsk(req)->snt_synack)
+		tcp_valid_rtt_meas(sk,
+		    tcp_time_stamp - tcp_rsk(req)->snt_synack);
+}
+
 extern void tcp_enter_memory_pressure(struct sock *sk);
 
 static inline int keepalive_intvl_when(const struct tcp_sock *tp)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e64abed..1e66f7f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1733,9 +1733,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 		newtp->advmss = tcp_sk(sk)->rx_opt.user_mss;
 
 	tcp_initialize_rcv_mss(newsk);
-	if (tcp_rsk(req)->snt_synack)
-		tcp_valid_rtt_meas(newsk,
-		    tcp_time_stamp - tcp_rsk(req)->snt_synack);
+	tcp_synack_rtt_meas(newsk, req);
 	newtp->total_retrans = req->retrans;
 
 #ifdef CONFIG_TCP_MD5SIG
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index f3bfb8b..cfeeeb7 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1348,9 +1348,7 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 		newtp->advmss = tcp_sk(sk)->rx_opt.user_mss;
 
 	tcp_initialize_rcv_mss(newsk);
-	if (tcp_rsk(req)->snt_synack)
-		tcp_valid_rtt_meas(newsk,
-		    tcp_time_stamp - tcp_rsk(req)->snt_synack);
+	tcp_synack_rtt_meas(newsk, req);
 	newtp->total_retrans = req->retrans;
 
 	newinet->inet_daddr = newinet->inet_saddr = LOOPBACK4_IPV6;
-- 
1.7.7.3

^ permalink raw reply related

* Re: [PATCH 2/7][RFC] netfilter: add xt_qtaguid matching module
From: Eric W. Biederman @ 2012-09-22 13:38 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, JP Abgrall, netdev, Ashish Sharma, Peter P Waskiewicz Jr
In-Reply-To: <1348279853-44499-3-git-send-email-john.stultz@linaro.org>


I just took a quick skim through the patch, and spotted a few things.

This patch is going to need to handle current_fsuid returning a kuid_t.

The permission checks are weird, and ancient looking.

There is no network namespace handling, to the point I don't think
the code will function properly if network namespace support is enabled
in the kernel.

The code probably wants to use struct pid *pid, instead of pid_t values.
Both to cope with the pid namespace and to avoid issues with pid wrap
around.

A few more comments below.

Eric



John Stultz <john.stultz@linaro.org> writes:

> From: JP Abgrall <jpa@google.com>
>
> This module allows tracking stats at the socket level for given UIDs.
> It replaces xt_owner.
> If the --uid-owner is not specified, it will just count stats based on
> who the skb belongs to. This will even happen on incoming skbs as it
> looks into the skb via xt_socket magic to see who owns it.
> If an skb is lost, it will be assigned to uid=0.
>
> To control what sockets of what UIDs are tagged by what, one uses:
>   echo t $sock_fd $accounting_tag $the_billed_uid \
>      > /proc/net/xt_qtaguid/ctrl
>  So whenever an skb belongs to a sock_fd, it will be accounted against
>    $the_billed_uid
>   and matching stats will show up under the uid with the given
>    $accounting_tag.
>
> Because the number of allocations for the stats structs is not that big:
>   ~500 apps * 32 per app
> we'll just do it atomic. This avoids walking lists many times, and
> the fancy worker thread handling. Slabs will grow when needed later.
>
> It use netdevice and inetaddr notifications instead of hooks in the core dev
> code to track when a device comes and goes. This removes the need for
> exposed iface_stat.h.
>
> Put procfs dirs in /proc/net/xt_qtaguid/
>   ctrl
>   stats
>   iface_stat/<iface>/...
> The uid stats are obtainable in ./stats.
>
> Cc: netdev@vger.kernel.org
> Cc: JP Abgrall <jpa@google.com>
> Cc: Ashish Sharma <ashishsharma@google.com>
> Cc: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Signed-off-by: JP Abgrall <jpa@google.com>
> [Dropped paranoid network ifdefs and minor compile fixups -jstultz]
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---


> +static struct qtaguid_event_counts qtu_events;
> +/*----------------------------------------------*/
> +static bool can_manipulate_uids(void)
> +{

You probably want something like a test for capable(CAP_SETUID).
Certainly raw check for the uid == 0 fell out of common practice
for this sort of thing a decade or so ago.
> +	/* root pwnd */
> +	return unlikely(!current_fsuid()) || unlikely(!proc_ctrl_write_gid)
> +		|| in_egroup_p(proc_ctrl_write_gid);
> +}
> +
> +static bool can_impersonate_uid(uid_t uid)
> +{
> +	return uid == current_fsuid() || can_manipulate_uids();
> +}
> +
> +static bool can_read_other_uid_stats(uid_t uid)
> +{
Perhaps may_ptrace?
> +	/* root pwnd */
> +	return unlikely(!current_fsuid()) || uid == current_fsuid()
> +		|| unlikely(!proc_stats_readall_gid)
> +		|| in_egroup_p(proc_stats_readall_gid);
> +}
> +


> +/*
> + * Track tag that this socket is transferring data for, and not necessarily
> + * the uid that owns the socket.
> + * This is the tag against which tag_stat.counters will be billed.
> + * These structs need to be looked up by sock and pid.
> + */
> +struct sock_tag {
> +	struct rb_node sock_node;
> +	struct sock *sk;  /* Only used as a number, never dereferenced */
> +	/* The socket is needed for sockfd_put() */
> +	struct socket *socket;
> +	/* Used to associate with a given pid */
> +	struct list_head list;   /* in proc_qtu_data.sock_tag_list */
> +	pid_t pid;
You probably want this to be "struct pid *pid;"
> +
> +	tag_t tag;
> +};


> + * The qtu uid data is used to track resources that are created directly or
> + * indirectly by processes (uid tracked).
> + * It is shared by the processes with the same uid.
> + * Some of the resource will be counted to prevent further rogue allocations,
> + * some will need freeing once the owner process (uid) exits.
> + */
> +struct uid_tag_data {
> +	struct rb_node node;
> +	uid_t uid;

You probably want to say kuid_t uid here.

> +	/*
> +	 * For the uid, how many accounting tags have been set.
> +	 */
> +	int num_active_tags;
> +	/* Track the number of proc_qtu_data that reference it */
> +	int num_pqd;
> +	struct rb_root tag_ref_tree;
> +	/* No tag_node_tree_lock; use uid_tag_data_tree_lock */
> +};


> +struct proc_qtu_data {
> +	struct rb_node node;
> +	pid_t pid;

I suspect you want to use "struct pid *pid" here.
> +
> +	struct uid_tag_data *parent_tag_data;
> +
> +	/* Tracks the sock_tags that need freeing upon this proc's death */
> +	struct list_head sock_tag_list;
> +	/* No spinlock_t sock_tag_list_lock; use the global one. */
> +};
> +

> diff --git a/net/netfilter/xt_qtaguid.c b/net/netfilter/xt_qtaguid.c
> new file mode 100644
> index 0000000..7c4ac46
> +/*------------------------------------------*/
> +static int __init qtaguid_proc_register(struct proc_dir_entry **res_procdir)
> +{
> +	int ret;
> +	*res_procdir = proc_mkdir(module_procdirname, init_net.proc_net);

You probably want to handle all of the network namespaces.
> +	if (!*res_procdir) {
> +		pr_err("qtaguid: failed to create proc/.../xt_qtaguid\n");
> +		ret = -ENOMEM;
> +		goto no_dir;
> +	}
> +

^ permalink raw reply

* [net-next 7/7] igb: Use dma_unmap_addr and dma_unmap_len defines
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change is meant to improve performance on systems that do not require
the DMA unmap calls.  On those systems we do not need to make use of the
unmap address for Tx or the unmap length so we can drop both thereby
reducing the size of the Tx buffer info structure.

In addition I have changed the logic to check for unmap length instead of
unmap address when checking to see if a buffer needs to be unmapped from
DMA use.  The reasons for this change is that on some platforms it is
possible to receive a valid DMA address of 0 from an IOMMU.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |  4 +-
 drivers/net/ethernet/intel/igb/igb_main.c | 64 +++++++++++++++----------------
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 9cad058..8aad230 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -168,8 +168,8 @@ struct igb_tx_buffer {
 	unsigned int bytecount;
 	u16 gso_segs;
 	__be16 protocol;
-	dma_addr_t dma;
-	u32 length;
+	DEFINE_DMA_UNMAP_ADDR(dma);
+	DEFINE_DMA_UNMAP_LEN(len);
 	u32 tx_flags;
 };
 
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index db6e456..60bf465 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -403,8 +403,8 @@ static void igb_dump(struct igb_adapter *adapter)
 		buffer_info = &tx_ring->tx_buffer_info[tx_ring->next_to_clean];
 		pr_info(" %5d %5X %5X %016llX %04X %p %016llX\n",
 			n, tx_ring->next_to_use, tx_ring->next_to_clean,
-			(u64)buffer_info->dma,
-			buffer_info->length,
+			(u64)dma_unmap_addr(buffer_info, dma),
+			dma_unmap_len(buffer_info, len),
 			buffer_info->next_to_watch,
 			(u64)buffer_info->time_stamp);
 	}
@@ -455,8 +455,8 @@ static void igb_dump(struct igb_adapter *adapter)
 				" %04X  %p %016llX %p%s\n", i,
 				le64_to_cpu(u0->a),
 				le64_to_cpu(u0->b),
-				(u64)buffer_info->dma,
-				buffer_info->length,
+				(u64)dma_unmap_addr(buffer_info, dma),
+				dma_unmap_len(buffer_info, len),
 				buffer_info->next_to_watch,
 				(u64)buffer_info->time_stamp,
 				buffer_info->skb, next_desc);
@@ -465,7 +465,8 @@ static void igb_dump(struct igb_adapter *adapter)
 				print_hex_dump(KERN_INFO, "",
 					DUMP_PREFIX_ADDRESS,
 					16, 1, buffer_info->skb->data,
-					buffer_info->length, true);
+					dma_unmap_len(buffer_info, len),
+					true);
 		}
 	}
 
@@ -3198,20 +3199,20 @@ void igb_unmap_and_free_tx_resource(struct igb_ring *ring,
 {
 	if (tx_buffer->skb) {
 		dev_kfree_skb_any(tx_buffer->skb);
-		if (tx_buffer->dma)
+		if (dma_unmap_len(tx_buffer, len))
 			dma_unmap_single(ring->dev,
-					 tx_buffer->dma,
-					 tx_buffer->length,
+					 dma_unmap_addr(tx_buffer, dma),
+					 dma_unmap_len(tx_buffer, len),
 					 DMA_TO_DEVICE);
-	} else if (tx_buffer->dma) {
+	} else if (dma_unmap_len(tx_buffer, len)) {
 		dma_unmap_page(ring->dev,
-			       tx_buffer->dma,
-			       tx_buffer->length,
+			       dma_unmap_addr(tx_buffer, dma),
+			       dma_unmap_len(tx_buffer, len),
 			       DMA_TO_DEVICE);
 	}
 	tx_buffer->next_to_watch = NULL;
 	tx_buffer->skb = NULL;
-	tx_buffer->dma = 0;
+	dma_unmap_len_set(tx_buffer, len, 0);
 	/* buffer_info must be completely set up in the transmit path */
 }
 
@@ -4206,7 +4207,7 @@ static void igb_tx_map(struct igb_ring *tx_ring,
 		       const u8 hdr_len)
 {
 	struct sk_buff *skb = first->skb;
-	struct igb_tx_buffer *tx_buffer_info;
+	struct igb_tx_buffer *tx_buffer;
 	union e1000_adv_tx_desc *tx_desc;
 	dma_addr_t dma;
 	struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
@@ -4227,8 +4228,8 @@ static void igb_tx_map(struct igb_ring *tx_ring,
 		goto dma_error;
 
 	/* record length, and DMA address */
-	first->length = size;
-	first->dma = dma;
+	dma_unmap_len_set(first, len, size);
+	dma_unmap_addr_set(first, dma, dma);
 	tx_desc->read.buffer_addr = cpu_to_le64(dma);
 
 	for (;;) {
@@ -4270,9 +4271,9 @@ static void igb_tx_map(struct igb_ring *tx_ring,
 		if (dma_mapping_error(tx_ring->dev, dma))
 			goto dma_error;
 
-		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		tx_buffer_info->length = size;
-		tx_buffer_info->dma = dma;
+		tx_buffer = &tx_ring->tx_buffer_info[i];
+		dma_unmap_len_set(tx_buffer, len, size);
+		dma_unmap_addr_set(tx_buffer, dma, dma);
 
 		tx_desc->read.olinfo_status = 0;
 		tx_desc->read.buffer_addr = cpu_to_le64(dma);
@@ -4323,9 +4324,9 @@ dma_error:
 
 	/* clear dma mappings for failed tx_buffer_info map */
 	for (;;) {
-		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		igb_unmap_and_free_tx_resource(tx_ring, tx_buffer_info);
-		if (tx_buffer_info == first)
+		tx_buffer = &tx_ring->tx_buffer_info[i];
+		igb_unmap_and_free_tx_resource(tx_ring, tx_buffer);
+		if (tx_buffer == first)
 			break;
 		if (i == 0)
 			i = tx_ring->count;
@@ -5716,18 +5717,19 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 
 		/* free the skb */
 		dev_kfree_skb_any(tx_buffer->skb);
-		tx_buffer->skb = NULL;
 
 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
-				 tx_buffer->dma,
-				 tx_buffer->length,
+				 dma_unmap_addr(tx_buffer, dma),
+				 dma_unmap_len(tx_buffer, len),
 				 DMA_TO_DEVICE);
 
+		/* clear tx_buffer data */
+		tx_buffer->skb = NULL;
+		dma_unmap_len_set(tx_buffer, len, 0);
+
 		/* clear last DMA location and unmap remaining buffers */
 		while (tx_desc != eop_desc) {
-			tx_buffer->dma = 0;
-
 			tx_buffer++;
 			tx_desc++;
 			i++;
@@ -5738,17 +5740,15 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 			}
 
 			/* unmap any remaining paged data */
-			if (tx_buffer->dma) {
+			if (dma_unmap_len(tx_buffer, len)) {
 				dma_unmap_page(tx_ring->dev,
-					       tx_buffer->dma,
-					       tx_buffer->length,
+					       dma_unmap_addr(tx_buffer, dma),
+					       dma_unmap_len(tx_buffer, len),
 					       DMA_TO_DEVICE);
+				dma_unmap_len_set(tx_buffer, len, 0);
 			}
 		}
 
-		/* clear last DMA location */
-		tx_buffer->dma = 0;
-
 		/* move us one more past the eop_desc for start of next pkt */
 		tx_buffer++;
 		tx_desc++;
-- 
1.7.11.4

^ permalink raw reply related

* [net-next 6/7] igb: Simplify how we populate the RSS key
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Instead of storing the RSS key as a character array we can simplify the
configuration by making it a u32 array.  This allows us to just write one
value per register without any unnecessary operations to construct the
value.

This change will produce the same exact key, the only difference is that I
translated the u8 array to a u32 array which will be correctly ordered on
writes to hardware by the cpu_to_le32 operations that are built into the
writel calls.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 27688d9..db6e456 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2835,20 +2835,14 @@ static void igb_setup_mrqc(struct igb_adapter *adapter)
 	struct e1000_hw *hw = &adapter->hw;
 	u32 mrqc, rxcsum;
 	u32 j, num_rx_queues, shift = 0;
-	static const u8 rsshash[40] = {
-		0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
-		0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
-		0xae, 0x7b, 0x30, 0xb4,	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30,
-		0xf2, 0x0c, 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa };
+	static const u32 rsskey[10] = { 0xDA565A6D, 0xC20E5B25, 0x3D256741,
+					0xB08FA343, 0xCB2BCAD0, 0xB4307BAE,
+					0xA32DCB77, 0x0CF23080, 0x3BB7426A,
+					0xFA01ACBE };
 
 	/* Fill out hash function seeds */
-	for (j = 0; j < 10; j++) {
-		u32 rsskey = rsshash[(j * 4)];
-		rsskey |= rsshash[(j * 4) + 1] << 8;
-		rsskey |= rsshash[(j * 4) + 2] << 16;
-		rsskey |= rsshash[(j * 4) + 3] << 24;
-		array_wr32(E1000_RSSRK(0), j, rsskey);
-	}
+	for (j = 0; j < 10; j++)
+		wr32(E1000_RSSRK(j), rsskey[j]);
 
 	num_rx_queues = adapter->rss_queues;
 
-- 
1.7.11.4

^ permalink raw reply related

* [net-next 5/7] igb: Change how we populate the RSS indirection table
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch cleans up our RSS indirection table configuration so that we
generate the same table regardless of CPU endianness.  In addition it
changes the table setup so that instead of doing a modulo based setup it is
instead a divisor based setup.  The advantage to this is that we should be
able to take the Rx hash and compute the Rx queue with very little CPU
overhead if needed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 55 +++++++++++++++----------------
 1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 91f542c..27688d9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2834,11 +2834,7 @@ static void igb_setup_mrqc(struct igb_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	u32 mrqc, rxcsum;
-	u32 j, num_rx_queues, shift = 0, shift2 = 0;
-	union e1000_reta {
-		u32 dword;
-		u8  bytes[4];
-	} reta;
+	u32 j, num_rx_queues, shift = 0;
 	static const u8 rsshash[40] = {
 		0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
 		0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
@@ -2856,35 +2852,36 @@ static void igb_setup_mrqc(struct igb_adapter *adapter)
 
 	num_rx_queues = adapter->rss_queues;
 
-	if (adapter->vfs_allocated_count) {
-		/* 82575 and 82576 supports 2 RSS queues for VMDq */
-		switch (hw->mac.type) {
-		case e1000_i350:
-		case e1000_82580:
-			num_rx_queues = 1;
-			shift = 0;
-			break;
-		case e1000_82576:
+	switch (hw->mac.type) {
+	case e1000_82575:
+		shift = 6;
+		break;
+	case e1000_82576:
+		/* 82576 supports 2 RSS queues for SR-IOV */
+		if (adapter->vfs_allocated_count) {
 			shift = 3;
 			num_rx_queues = 2;
-			break;
-		case e1000_82575:
-			shift = 2;
-			shift2 = 6;
-		default:
-			break;
 		}
-	} else {
-		if (hw->mac.type == e1000_82575)
-			shift = 6;
+		break;
+	default:
+		break;
 	}
 
-	for (j = 0; j < (32 * 4); j++) {
-		reta.bytes[j & 3] = (j % num_rx_queues) << shift;
-		if (shift2)
-			reta.bytes[j & 3] |= num_rx_queues << shift2;
-		if ((j & 3) == 3)
-			wr32(E1000_RETA(j >> 2), reta.dword);
+	/*
+	 * Populate the indirection table 4 entries at a time.  To do this
+	 * we are generating the results for n and n+2 and then interleaving
+	 * those with the results with n+1 and n+3.
+	 */
+	for (j = 0; j < 32; j++) {
+		/* first pass generates n and n+2 */
+		u32 base = ((j * 0x00040004) + 0x00020000) * num_rx_queues;
+		u32 reta = (base & 0x07800780) >> (7 - shift);
+
+		/* second pass generates n+1 and n+3 */
+		base += 0x00010001 * num_rx_queues;
+		reta |= (base & 0x07800780) << (1 + shift);
+
+		wr32(E1000_RETA(j), reta);
 	}
 
 	/*
-- 
1.7.11.4

^ permalink raw reply related

* [net-next 4/7] igb: Change Tx cleanup loop to do/while instead of for
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change makes it so that Tx cleanup is done in a do/while loop instead
of a for loop.  The main motivation behind this is the fact that we should
never be invoked with a budget less than 1 so we can skip checking the
budget before processing the first descriptor.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c9997d8..91f542c 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5690,7 +5690,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 	struct igb_adapter *adapter = q_vector->adapter;
 	struct igb_ring *tx_ring = q_vector->tx.ring;
 	struct igb_tx_buffer *tx_buffer;
-	union e1000_adv_tx_desc *tx_desc, *eop_desc;
+	union e1000_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
 	unsigned int budget = q_vector->tx.work_limit;
 	unsigned int i = tx_ring->next_to_clean;
@@ -5702,16 +5702,16 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 	tx_desc = IGB_TX_DESC(tx_ring, i);
 	i -= tx_ring->count;
 
-	for (; budget; budget--) {
-		eop_desc = tx_buffer->next_to_watch;
-
-		/* prevent any other reads prior to eop_desc */
-		rmb();
+	do {
+		union e1000_adv_tx_desc *eop_desc = tx_buffer->next_to_watch;
 
 		/* if next_to_watch is not set then there is no work pending */
 		if (!eop_desc)
 			break;
 
+		/* prevent any other reads prior to eop_desc */
+		rmb();
+
 		/* if DD is not set pending work has not been completed */
 		if (!(eop_desc->wb.status & cpu_to_le32(E1000_TXD_STAT_DD)))
 			break;
@@ -5767,7 +5767,13 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 			tx_buffer = tx_ring->tx_buffer_info;
 			tx_desc = IGB_TX_DESC(tx_ring, 0);
 		}
-	}
+
+		/* issue prefetch for next Tx descriptor */
+		prefetch(tx_desc);
+
+		/* update budget accounting */
+		budget--;
+	} while (likely(budget));
 
 	netdev_tx_completed_queue(txring_txq(tx_ring),
 				  total_packets, total_bytes);
@@ -5783,12 +5789,10 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 	if (test_bit(IGB_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) {
 		struct e1000_hw *hw = &adapter->hw;
 
-		eop_desc = tx_buffer->next_to_watch;
-
 		/* Detect a transmit hang in hardware, this serializes the
 		 * check with the clearing of time_stamp and movement of i */
 		clear_bit(IGB_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags);
-		if (eop_desc &&
+		if (tx_buffer->next_to_watch &&
 		    time_after(jiffies, tx_buffer->time_stamp +
 			       (adapter->tx_timeout_factor * HZ)) &&
 		    !(rd32(E1000_STATUS) & E1000_STATUS_TXOFF)) {
@@ -5812,9 +5816,9 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 				tx_ring->next_to_use,
 				tx_ring->next_to_clean,
 				tx_buffer->time_stamp,
-				eop_desc,
+				tx_buffer->next_to_watch,
 				jiffies,
-				eop_desc->wb.status);
+				tx_buffer->next_to_watch->wb.status);
 			netif_stop_subqueue(tx_ring->netdev,
 					    tx_ring->queue_index);
 
-- 
1.7.11.4

^ permalink raw reply related

* [net-next 3/7] igb: Remove logic that was doing NUMA pseudo-aware allocations
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change removes the code that was doing the NUMA allocations for the
q_vectors, rings, and ring resources.  The problem is the logic used assumed
that the NUMA nodes were always interleved and that is not always the case.

At some point I hope to add this functionality back in a more controlled
manner in the future.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |  3 -
 drivers/net/ethernet/intel/igb/igb_main.c | 95 +++++--------------------------
 2 files changed, 13 insertions(+), 85 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 6f17f69..9cad058 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -213,7 +213,6 @@ struct igb_q_vector {
 	struct igb_ring_container rx, tx;
 
 	struct napi_struct napi;
-	int numa_node;
 
 	u16 itr_val;
 	u8 set_itr;
@@ -258,7 +257,6 @@ struct igb_ring {
 	};
 	/* Items past this point are only used during ring alloc / free */
 	dma_addr_t dma;                /* phys address of the ring */
-	int numa_node;                  /* node to alloc ring memory on */
 };
 
 enum e1000_ring_flags_t {
@@ -373,7 +371,6 @@ struct igb_adapter {
 	int vf_rate_link_speed;
 	u32 rss_queues;
 	u32 wvbr;
-	int node;
 	u32 *shadow_vfta;
 
 #ifdef CONFIG_IGB_PTP
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 60cf3eb..c9997d8 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -682,52 +682,29 @@ static int igb_alloc_queues(struct igb_adapter *adapter)
 {
 	struct igb_ring *ring;
 	int i;
-	int orig_node = adapter->node;
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
-		if (orig_node == -1) {
-			int cur_node = next_online_node(adapter->node);
-			if (cur_node == MAX_NUMNODES)
-				cur_node = first_online_node;
-			adapter->node = cur_node;
-		}
-		ring = kzalloc_node(sizeof(struct igb_ring), GFP_KERNEL,
-				    adapter->node);
-		if (!ring)
-			ring = kzalloc(sizeof(struct igb_ring), GFP_KERNEL);
+		ring = kzalloc(sizeof(struct igb_ring), GFP_KERNEL);
 		if (!ring)
 			goto err;
 		ring->count = adapter->tx_ring_count;
 		ring->queue_index = i;
 		ring->dev = &adapter->pdev->dev;
 		ring->netdev = adapter->netdev;
-		ring->numa_node = adapter->node;
 		/* For 82575, context index must be unique per ring. */
 		if (adapter->hw.mac.type == e1000_82575)
 			set_bit(IGB_RING_FLAG_TX_CTX_IDX, &ring->flags);
 		adapter->tx_ring[i] = ring;
 	}
-	/* Restore the adapter's original node */
-	adapter->node = orig_node;
 
 	for (i = 0; i < adapter->num_rx_queues; i++) {
-		if (orig_node == -1) {
-			int cur_node = next_online_node(adapter->node);
-			if (cur_node == MAX_NUMNODES)
-				cur_node = first_online_node;
-			adapter->node = cur_node;
-		}
-		ring = kzalloc_node(sizeof(struct igb_ring), GFP_KERNEL,
-				    adapter->node);
-		if (!ring)
-			ring = kzalloc(sizeof(struct igb_ring), GFP_KERNEL);
+		ring = kzalloc(sizeof(struct igb_ring), GFP_KERNEL);
 		if (!ring)
 			goto err;
 		ring->count = adapter->rx_ring_count;
 		ring->queue_index = i;
 		ring->dev = &adapter->pdev->dev;
 		ring->netdev = adapter->netdev;
-		ring->numa_node = adapter->node;
 		/* set flag indicating ring supports SCTP checksum offload */
 		if (adapter->hw.mac.type >= e1000_82576)
 			set_bit(IGB_RING_FLAG_RX_SCTP_CSUM, &ring->flags);
@@ -741,16 +718,12 @@ static int igb_alloc_queues(struct igb_adapter *adapter)
 
 		adapter->rx_ring[i] = ring;
 	}
-	/* Restore the adapter's original node */
-	adapter->node = orig_node;
 
 	igb_cache_ring_register(adapter);
 
 	return 0;
 
 err:
-	/* Restore the adapter's original node */
-	adapter->node = orig_node;
 	igb_free_queues(adapter);
 
 	return -ENOMEM;
@@ -1116,24 +1089,10 @@ static int igb_alloc_q_vectors(struct igb_adapter *adapter)
 	struct igb_q_vector *q_vector;
 	struct e1000_hw *hw = &adapter->hw;
 	int v_idx;
-	int orig_node = adapter->node;
 
 	for (v_idx = 0; v_idx < adapter->num_q_vectors; v_idx++) {
-		if ((adapter->num_q_vectors == (adapter->num_rx_queues +
-						adapter->num_tx_queues)) &&
-		    (adapter->num_rx_queues == v_idx))
-			adapter->node = orig_node;
-		if (orig_node == -1) {
-			int cur_node = next_online_node(adapter->node);
-			if (cur_node == MAX_NUMNODES)
-				cur_node = first_online_node;
-			adapter->node = cur_node;
-		}
-		q_vector = kzalloc_node(sizeof(struct igb_q_vector), GFP_KERNEL,
-					adapter->node);
-		if (!q_vector)
-			q_vector = kzalloc(sizeof(struct igb_q_vector),
-					   GFP_KERNEL);
+		q_vector = kzalloc(sizeof(struct igb_q_vector),
+				   GFP_KERNEL);
 		if (!q_vector)
 			goto err_out;
 		q_vector->adapter = adapter;
@@ -1142,14 +1101,10 @@ static int igb_alloc_q_vectors(struct igb_adapter *adapter)
 		netif_napi_add(adapter->netdev, &q_vector->napi, igb_poll, 64);
 		adapter->q_vector[v_idx] = q_vector;
 	}
-	/* Restore the adapter's original node */
-	adapter->node = orig_node;
 
 	return 0;
 
 err_out:
-	/* Restore the adapter's original node */
-	adapter->node = orig_node;
 	igb_free_q_vectors(adapter);
 	return -ENOMEM;
 }
@@ -2423,8 +2378,6 @@ static int __devinit igb_sw_init(struct igb_adapter *adapter)
 				  VLAN_HLEN;
 	adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
 
-	adapter->node = -1;
-
 	spin_lock_init(&adapter->stats64_lock);
 #ifdef CONFIG_PCI_IOV
 	switch (hw->mac.type) {
@@ -2671,13 +2624,11 @@ static int igb_close(struct net_device *netdev)
 int igb_setup_tx_resources(struct igb_ring *tx_ring)
 {
 	struct device *dev = tx_ring->dev;
-	int orig_node = dev_to_node(dev);
 	int size;
 
 	size = sizeof(struct igb_tx_buffer) * tx_ring->count;
-	tx_ring->tx_buffer_info = vzalloc_node(size, tx_ring->numa_node);
-	if (!tx_ring->tx_buffer_info)
-		tx_ring->tx_buffer_info = vzalloc(size);
+
+	tx_ring->tx_buffer_info = vzalloc(size);
 	if (!tx_ring->tx_buffer_info)
 		goto err;
 
@@ -2685,18 +2636,10 @@ int igb_setup_tx_resources(struct igb_ring *tx_ring)
 	tx_ring->size = tx_ring->count * sizeof(union e1000_adv_tx_desc);
 	tx_ring->size = ALIGN(tx_ring->size, 4096);
 
-	set_dev_node(dev, tx_ring->numa_node);
 	tx_ring->desc = dma_alloc_coherent(dev,
 					   tx_ring->size,
 					   &tx_ring->dma,
 					   GFP_KERNEL);
-	set_dev_node(dev, orig_node);
-	if (!tx_ring->desc)
-		tx_ring->desc = dma_alloc_coherent(dev,
-						   tx_ring->size,
-						   &tx_ring->dma,
-						   GFP_KERNEL);
-
 	if (!tx_ring->desc)
 		goto err;
 
@@ -2707,8 +2650,8 @@ int igb_setup_tx_resources(struct igb_ring *tx_ring)
 
 err:
 	vfree(tx_ring->tx_buffer_info);
-	dev_err(dev,
-		"Unable to allocate memory for the transmit descriptor ring\n");
+	tx_ring->tx_buffer_info = NULL;
+	dev_err(dev, "Unable to allocate memory for the Tx descriptor ring\n");
 	return -ENOMEM;
 }
 
@@ -2825,34 +2768,23 @@ static void igb_configure_tx(struct igb_adapter *adapter)
 int igb_setup_rx_resources(struct igb_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
-	int orig_node = dev_to_node(dev);
-	int size, desc_len;
+	int size;
 
 	size = sizeof(struct igb_rx_buffer) * rx_ring->count;
-	rx_ring->rx_buffer_info = vzalloc_node(size, rx_ring->numa_node);
-	if (!rx_ring->rx_buffer_info)
-		rx_ring->rx_buffer_info = vzalloc(size);
+
+	rx_ring->rx_buffer_info = vzalloc(size);
 	if (!rx_ring->rx_buffer_info)
 		goto err;
 
-	desc_len = sizeof(union e1000_adv_rx_desc);
 
 	/* Round up to nearest 4K */
-	rx_ring->size = rx_ring->count * desc_len;
+	rx_ring->size = rx_ring->count * sizeof(union e1000_adv_rx_desc);
 	rx_ring->size = ALIGN(rx_ring->size, 4096);
 
-	set_dev_node(dev, rx_ring->numa_node);
 	rx_ring->desc = dma_alloc_coherent(dev,
 					   rx_ring->size,
 					   &rx_ring->dma,
 					   GFP_KERNEL);
-	set_dev_node(dev, orig_node);
-	if (!rx_ring->desc)
-		rx_ring->desc = dma_alloc_coherent(dev,
-						   rx_ring->size,
-						   &rx_ring->dma,
-						   GFP_KERNEL);
-
 	if (!rx_ring->desc)
 		goto err;
 
@@ -2864,8 +2796,7 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring)
 err:
 	vfree(rx_ring->rx_buffer_info);
 	rx_ring->rx_buffer_info = NULL;
-	dev_err(dev, "Unable to allocate memory for the receive descriptor"
-		" ring\n");
+	dev_err(dev, "Unable to allocate memory for the Rx descriptor ring\n");
 	return -ENOMEM;
 }
 
-- 
1.7.11.4

^ permalink raw reply related

* [net-next 2/7] igb: Fix stats output on i210/i211 parts.
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
  To: davem; +Cc: Carolyn Wyborny, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

Due to a hardware issue, on i210 and i211 parts, the TNCRS statistic
provides an invalid value.  This patch changes the update stats function
to increment the stat only for non-i210/i211 parts.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 0730096..60cf3eb 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4776,7 +4776,11 @@ void igb_update_stats(struct igb_adapter *adapter,
 	reg = rd32(E1000_CTRL_EXT);
 	if (!(reg & E1000_CTRL_EXT_LINK_MODE_MASK)) {
 		adapter->stats.rxerrc += rd32(E1000_RXERRC);
-		adapter->stats.tncrs += rd32(E1000_TNCRS);
+
+		/* this stat has invalid values on i210/i211 */
+		if ((hw->mac.type != e1000_i210) &&
+		    (hw->mac.type != e1000_i211))
+			adapter->stats.tncrs += rd32(E1000_TNCRS);
 	}

 	adapter->stats.tsctc += rd32(E1000_TSCTC);
-- 
1.7.11.4

^ permalink raw reply related

* [net-next 1/7] igb: Change how we check for pre-existing and assigned VFs
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
  To: davem; +Cc: Stefan Assmann, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348309836-7107-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Stefan Assmann <sassmann@kpanic.de>

Adapt the pre-existing and assigned VFs code to the ixgbe way introduced
in commit 9297127b9cdd8d30c829ef5fd28b7cc0323a7bcd.

Instead of searching the enabled VFs we use pci_num_vf to determine enabled VFs.
By comparing to which PF an assigned VF is owned it's possible to decide
whether to leave it enabled or not.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |   1 -
 drivers/net/ethernet/intel/igb/igb_main.c | 104 +++++++-----------------------
 2 files changed, 22 insertions(+), 83 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 43c8e29..6f17f69 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -101,7 +101,6 @@ struct vf_data_storage {
 	u16 pf_vlan; /* When set, guest VLAN config not allowed. */
 	u16 pf_qos;
 	u16 tx_rate;
-	struct pci_dev *vfdev;
 };
 
 #define IGB_VF_FLAG_CTS            0x00000001 /* VF is clear to send data */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 246646b..0730096 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -172,8 +172,7 @@ static void igb_check_vf_rate_limit(struct igb_adapter *);
 
 #ifdef CONFIG_PCI_IOV
 static int igb_vf_configure(struct igb_adapter *adapter, int vf);
-static int igb_find_enabled_vfs(struct igb_adapter *adapter);
-static int igb_check_vf_assignment(struct igb_adapter *adapter);
+static bool igb_vfs_are_assigned(struct igb_adapter *adapter);
 #endif
 
 #ifdef CONFIG_PM
@@ -2300,11 +2299,11 @@ static void __devexit igb_remove(struct pci_dev *pdev)
 	/* reclaim resources allocated to VFs */
 	if (adapter->vf_data) {
 		/* disable iov and allow time for transactions to clear */
-		if (!igb_check_vf_assignment(adapter)) {
+		if (igb_vfs_are_assigned(adapter)) {
+			dev_info(&pdev->dev, "Unloading driver while VFs are assigned - VFs will not be deallocated\n");
+		} else {
 			pci_disable_sriov(pdev);
 			msleep(500);
-		} else {
-			dev_info(&pdev->dev, "VF(s) assigned to guests!\n");
 		}
 
 		kfree(adapter->vf_data);
@@ -2344,7 +2343,7 @@ static void __devinit igb_probe_vfs(struct igb_adapter * adapter)
 #ifdef CONFIG_PCI_IOV
 	struct pci_dev *pdev = adapter->pdev;
 	struct e1000_hw *hw = &adapter->hw;
-	int old_vfs = igb_find_enabled_vfs(adapter);
+	int old_vfs = pci_num_vf(adapter->pdev);
 	int i;
 
 	/* Virtualization features not supported on i210 family. */
@@ -5037,102 +5036,43 @@ static int igb_notify_dca(struct notifier_block *nb, unsigned long event,
 static int igb_vf_configure(struct igb_adapter *adapter, int vf)
 {
 	unsigned char mac_addr[ETH_ALEN];
-	struct pci_dev *pdev = adapter->pdev;
-	struct e1000_hw *hw = &adapter->hw;
-	struct pci_dev *pvfdev;
-	unsigned int device_id;
-	u16 thisvf_devfn;
 
 	eth_random_addr(mac_addr);
 	igb_set_vf_mac(adapter, vf, mac_addr);
 
-	switch (adapter->hw.mac.type) {
-	case e1000_82576:
-		device_id = IGB_82576_VF_DEV_ID;
-		/* VF Stride for 82576 is 2 */
-		thisvf_devfn = (pdev->devfn + 0x80 + (vf << 1)) |
-			(pdev->devfn & 1);
-		break;
-	case e1000_i350:
-		device_id = IGB_I350_VF_DEV_ID;
-		/* VF Stride for I350 is 4 */
-		thisvf_devfn = (pdev->devfn + 0x80 + (vf << 2)) |
-				(pdev->devfn & 3);
-		break;
-	default:
-		device_id = 0;
-		thisvf_devfn = 0;
-		break;
-	}
-
-	pvfdev = pci_get_device(hw->vendor_id, device_id, NULL);
-	while (pvfdev) {
-		if (pvfdev->devfn == thisvf_devfn)
-			break;
-		pvfdev = pci_get_device(hw->vendor_id,
-					device_id, pvfdev);
-	}
-
-	if (pvfdev)
-		adapter->vf_data[vf].vfdev = pvfdev;
-	else
-		dev_err(&pdev->dev,
-			"Couldn't find pci dev ptr for VF %4.4x\n",
-			thisvf_devfn);
-	return pvfdev != NULL;
+	return 0;
 }
 
-static int igb_find_enabled_vfs(struct igb_adapter *adapter)
+static bool igb_vfs_are_assigned(struct igb_adapter *adapter)
 {
-	struct e1000_hw *hw = &adapter->hw;
 	struct pci_dev *pdev = adapter->pdev;
-	struct pci_dev *pvfdev;
-	u16 vf_devfn = 0;
-	u16 vf_stride;
-	unsigned int device_id;
-	int vfs_found = 0;
+	struct pci_dev *vfdev;
+	int dev_id;
 
 	switch (adapter->hw.mac.type) {
 	case e1000_82576:
-		device_id = IGB_82576_VF_DEV_ID;
-		/* VF Stride for 82576 is 2 */
-		vf_stride = 2;
+		dev_id = IGB_82576_VF_DEV_ID;
 		break;
 	case e1000_i350:
-		device_id = IGB_I350_VF_DEV_ID;
-		/* VF Stride for I350 is 4 */
-		vf_stride = 4;
+		dev_id = IGB_I350_VF_DEV_ID;
 		break;
 	default:
-		device_id = 0;
-		vf_stride = 0;
-		break;
-	}
-
-	vf_devfn = pdev->devfn + 0x80;
-	pvfdev = pci_get_device(hw->vendor_id, device_id, NULL);
-	while (pvfdev) {
-		if (pvfdev->devfn == vf_devfn &&
-		    (pvfdev->bus->number >= pdev->bus->number))
-			vfs_found++;
-		vf_devfn += vf_stride;
-		pvfdev = pci_get_device(hw->vendor_id,
-					device_id, pvfdev);
+		return false;
 	}
 
-	return vfs_found;
-}
-
-static int igb_check_vf_assignment(struct igb_adapter *adapter)
-{
-	int i;
-	for (i = 0; i < adapter->vfs_allocated_count; i++) {
-		if (adapter->vf_data[i].vfdev) {
-			if (adapter->vf_data[i].vfdev->dev_flags &
-			    PCI_DEV_FLAGS_ASSIGNED)
+	/* loop through all the VFs to see if we own any that are assigned */
+	vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, dev_id, NULL);
+	while (vfdev) {
+		/* if we don't own it we don't care */
+		if (vfdev->is_virtfn && vfdev->physfn == pdev) {
+			/* if it is assigned we cannot release it */
+			if (vfdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED)
 				return true;
 		}
+
+		vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, dev_id, vfdev);
 	}
+
 	return false;
 }
 
-- 
1.7.11.4

^ permalink raw reply related

* [net-next 0/7][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-09-22 10:30 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to igb only.

The following are changes since commit abb17e6c0c7b27693201dc85f75dbb184279fd10:
  netlink: use <linux/export.h> instead of <linux/module.h>
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (5):
  igb: Remove logic that was doing NUMA pseudo-aware allocations
  igb: Change Tx cleanup loop to do/while instead of for
  igb: Change how we populate the RSS indirection table
  igb: Simplify how we populate the RSS key
  igb: Use dma_unmap_addr and dma_unmap_len defines

Carolyn Wyborny (1):
  igb: Fix stats output on i210/i211 parts.

Stefan Assmann (1):
  igb: Change how we check for pre-existing and assigned VFs

 drivers/net/ethernet/intel/igb/igb.h      |   8 +-
 drivers/net/ethernet/intel/igb/igb_main.c | 370 ++++++++++--------------------
 2 files changed, 122 insertions(+), 256 deletions(-)

-- 
1.7.11.4

^ permalink raw reply

* [PATCH] pppoe: drop PPPOX_ZOMBIEs in pppoe_release
From: Xiaodong Xu @ 2012-09-22 10:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev

From: Xiaodong Xu <stid.smth@gmail.com>

When PPPOE is running over a virtual ethernet interface (e.g., a
bonding interface) and the user tries to delete the interface in case
the PPPOE state is ZOMBIE, the kernel will loop forever while
unregistering net_device for the reference count is not decreased to
zero which should have been done with dev_put().

Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>

---

--- linux/drivers/net/ppp/pppoe.c.orig	2012-09-19 11:49:27.921826868 +0800
+++ linux/drivers/net/ppp/pppoe.c	2012-09-22 17:44:03.642730082 +0800
@@ -570,7 +570,7 @@ static int pppoe_release(struct socket *

 	po = pppox_sk(sk);

-	if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+	if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
 		dev_put(po->pppoe_dev);
 		po->pppoe_dev = NULL;
 	}

-- 
Regards,
Xiaodong Xu

^ permalink raw reply

* [PATCH] ipv4: raw: fix icmp_filter()
From: Eric Dumazet @ 2012-09-22 10:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

icmp_filter() should not modify its input, or else its caller
would need to recompute ip_hdr() if skb->head is reallocated.

Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
This is a minimal fix, to meet stable expectations.

 net/ipv4/raw.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ff0f071..d23c657 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -131,18 +131,20 @@ found:
  *	0 - deliver
  *	1 - block
  */
-static __inline__ int icmp_filter(struct sock *sk, struct sk_buff *skb)
+static int icmp_filter(const struct sock *sk, const struct sk_buff *skb)
 {
-	int type;
+	struct icmphdr _hdr;
+	const struct icmphdr *hdr;
 
-	if (!pskb_may_pull(skb, sizeof(struct icmphdr)))
+	hdr = skb_header_pointer(skb, skb_transport_offset(skb),
+				 sizeof(_hdr), &_hdr);
+	if (!hdr)
 		return 1;
 
-	type = icmp_hdr(skb)->type;
-	if (type < 32) {
+	if (hdr->type < 32) {
 		__u32 data = raw_sk(sk)->filter.data;
 
-		return ((1 << type) & data) != 0;
+		return ((1U << hdr->type) & data) != 0;
 	}
 
 	/* Do not block unknown ICMP types */

^ permalink raw reply related

* [PATCH] pppoe: drop PPPOX_ZOMBIEs in pppoe_release
From: Xiaodong Xu @ 2012-09-22  9:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev

From: Xiaodong Xu <stid.smth@gmail.com>

When PPPOE is running over a virtual ethernet interface (e.g., a
bonding interface) and the user tries to delete the interface in case
the PPPOE state is ZOMBIE, the kernel will loop forever while
unregistering net_device for the reference count is not decreased to
zero which should have been done with dev_put().

Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>

---

--- drivers/net/ppp/pppoe.c.orig	2012-09-19 11:49:27.921826868 +0800
+++ drivers/net/ppp/pppoe.c	2012-09-22 17:44:03.642730082 +0800
@@ -570,7 +570,7 @@ static int pppoe_release(struct socket *

 	po = pppox_sk(sk);

-	if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+	if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
 		dev_put(po->pppoe_dev);
 		po->pppoe_dev = NULL;
 	}

-- 
Regards,
Xiaodong Xu

^ permalink raw reply

* Warning! Your mailbox is almost full.
From: Webmail Account Upgrade @ 2012-09-22  8:35 UTC (permalink / raw)


You have exceeded your email limit quota of 450MB. You need to upgrade  
your email limit quota to 2GB within the next 48 hours. Use the below  
web link to upgrade your email account:

click link below:
   http://www.formchamp.com/goform.php?id=38467

Thank you for using our email.
Copyright ©2012 Email Helpdesk Centre.
----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

^ permalink raw reply

* Re: [PATCH V2 net-next 0/4] Two new PTP Hardware Clock features
From: Richard Cochran @ 2012-09-22  9:11 UTC (permalink / raw)
  To: netdev
  Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
	John Stultz, Matthew Vick
In-Reply-To: <20120922074553.GA4143@netboy.at.omicron.at>

On Sat, Sep 22, 2012 at 09:45:53AM +0200, Richard Cochran wrote:
> On Sat, Sep 22, 2012 at 09:42:52AM +0200, Richard Cochran wrote:
> > This patch series adds two new features to the PHC code.
> 
> Forgot to say:
> 
> V2 preserves the clock_name attribute as it was meant to be, instead
> of making any changes to it.

... and covers the registration API change in the brand new solarflare
phc device, which was overlooked in V1.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH net-next v1] net: use a per task frag allocator
From: Eric Dumazet @ 2012-09-21 21:11 UTC (permalink / raw)
  To: Vijay Subramanian
  Cc: David Miller, linux-kernel, netdev, Ben Hutchings,
	Alexander Duyck
In-Reply-To: <CAGK4HS98RG78Auvaai3Ny6zz5c_hccO-ShD3GuT=moRRm4+RUA@mail.gmail.com>

On Fri, 2012-09-21 at 13:27 -0700, Vijay Subramanian wrote:
> I get the following compile error with the newer version of the patch
> 
> net/sched/em_meta.c: In function ‘meta_int_sk_sendmsg_off’:
> net/sched/em_meta.c:464: error: ‘struct sock’ has no member named
> ‘sk_sndmsg_off’
> make[1]: *** [net/sched/em_meta.o] Error 1
> make: *** [net/sched/em_meta.o] Error 2
> 
> 
> 
> Vijay

Oh well, I wonder what's the expected use of this crap...

Thanks, I'll fix this on v3 !

^ permalink raw reply

* Re: [PATCH V2 net-next 0/4] Two new PTP Hardware Clock features
From: Richard Cochran @ 2012-09-22  7:45 UTC (permalink / raw)
  To: netdev
  Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
	John Stultz, Matthew Vick
In-Reply-To: <cover.1348299094.git.richardcochran@gmail.com>

On Sat, Sep 22, 2012 at 09:42:52AM +0200, Richard Cochran wrote:
> This patch series adds two new features to the PHC code.

Forgot to say:

V2 preserves the clock_name attribute as it was meant to be, instead
of making any changes to it.

Thanks,
Richard

^ permalink raw reply

* [PATCH V2 net-next 4/4] ptp: clarify the clock_name sysfs attribute
From: Richard Cochran @ 2012-09-22  7:42 UTC (permalink / raw)
  To: netdev
  Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
	John Stultz, Matthew Vick
In-Reply-To: <cover.1348299094.git.richardcochran@gmail.com>

There has been some confusion among PHC driver authors about the
intended purpose of the clock_name attribute. This patch expands the
documation in order to clarify how the clock_name field should be
understood.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 Documentation/ABI/testing/sysfs-ptp |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-ptp b/Documentation/ABI/testing/sysfs-ptp
index d40d2b5..c906488 100644
--- a/Documentation/ABI/testing/sysfs-ptp
+++ b/Documentation/ABI/testing/sysfs-ptp
@@ -19,7 +19,10 @@ Date:		September 2010
 Contact:	Richard Cochran <richardcochran@gmail.com>
 Description:
 		This file contains the name of the PTP hardware clock
-		as a human readable string.
+		as a human readable string. The purpose of this
+		attribute is to provide the user with a "friendly
+		name" and to help distinguish PHY based devices from
+		MAC based ones.
 
 What:		/sys/class/ptp/ptpN/max_adjustment
 Date:		September 2010
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH V2 net-next 3/4] ptp: link the phc device to its parent device
From: Richard Cochran @ 2012-09-22  7:42 UTC (permalink / raw)
  To: netdev
  Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
	John Stultz, Matthew Vick
In-Reply-To: <cover.1348299094.git.richardcochran@gmail.com>

PTP Hardware Clock devices appear as class devices in sysfs. This patch
changes the registration API to use the parent device, clarifying the
clock's relationship to the underlying device.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 drivers/net/ethernet/freescale/gianfar_ptp.c |    2 +-
 drivers/net/ethernet/intel/igb/igb_ptp.c     |    3 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c |    3 ++-
 drivers/net/ethernet/sfc/ptp.c               |    3 ++-
 drivers/net/phy/dp83640.c                    |    2 +-
 drivers/ptp/ptp_clock.c                      |    5 +++--
 drivers/ptp/ptp_ixp46x.c                     |    2 +-
 drivers/ptp/ptp_pch.c                        |    2 +-
 include/linux/ptp_clock_kernel.h             |    7 +++++--
 9 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar_ptp.c b/drivers/net/ethernet/freescale/gianfar_ptp.c
index c08e5d4..18762a3 100644
--- a/drivers/net/ethernet/freescale/gianfar_ptp.c
+++ b/drivers/net/ethernet/freescale/gianfar_ptp.c
@@ -510,7 +510,7 @@ static int gianfar_ptp_probe(struct platform_device *dev)
 
 	spin_unlock_irqrestore(&etsects->lock, flags);
 
-	etsects->clock = ptp_clock_register(&etsects->caps);
+	etsects->clock = ptp_clock_register(&etsects->caps, &dev->dev);
 	if (IS_ERR(etsects->clock)) {
 		err = PTR_ERR(etsects->clock);
 		goto no_clock;
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index e13ba1d..ee21445 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -752,7 +752,8 @@ void igb_ptp_init(struct igb_adapter *adapter)
 		wr32(E1000_IMS, E1000_IMS_TS);
 	}
 
-	adapter->ptp_clock = ptp_clock_register(&adapter->ptp_caps);
+	adapter->ptp_clock = ptp_clock_register(&adapter->ptp_caps,
+						&adapter->pdev->dev);
 	if (IS_ERR(adapter->ptp_clock)) {
 		adapter->ptp_clock = NULL;
 		dev_err(&adapter->pdev->dev, "ptp_clock_register failed\n");
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index 3456d56..39881cb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -960,7 +960,8 @@ void ixgbe_ptp_init(struct ixgbe_adapter *adapter)
 	/* (Re)start the overflow check */
 	adapter->flags2 |= IXGBE_FLAG2_OVERFLOW_CHECK_ENABLED;
 
-	adapter->ptp_clock = ptp_clock_register(&adapter->ptp_caps);
+	adapter->ptp_clock = ptp_clock_register(&adapter->ptp_caps,
+						&adapter->pdev->dev);
 	if (IS_ERR(adapter->ptp_clock)) {
 		adapter->ptp_clock = NULL;
 		e_dev_err("ptp_clock_register failed\n");
diff --git a/drivers/net/ethernet/sfc/ptp.c b/drivers/net/ethernet/sfc/ptp.c
index 2b07a4e..3ed5d13 100644
--- a/drivers/net/ethernet/sfc/ptp.c
+++ b/drivers/net/ethernet/sfc/ptp.c
@@ -931,7 +931,8 @@ static int efx_ptp_probe_channel(struct efx_channel *channel)
 	ptp->phc_clock_info.settime = efx_phc_settime;
 	ptp->phc_clock_info.enable = efx_phc_enable;
 
-	ptp->phc_clock = ptp_clock_register(&ptp->phc_clock_info);
+	ptp->phc_clock = ptp_clock_register(&ptp->phc_clock_info,
+					    &channel->napi_dev->dev);
 	if (!ptp->phc_clock)
 		goto fail3;
 
diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index b0da022..24e05c4 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -980,7 +980,7 @@ static int dp83640_probe(struct phy_device *phydev)
 
 	if (choose_this_phy(clock, phydev)) {
 		clock->chosen = dp83640;
-		clock->ptp_clock = ptp_clock_register(&clock->caps);
+		clock->ptp_clock = ptp_clock_register(&clock->caps, &phydev->dev);
 		if (IS_ERR(clock->ptp_clock)) {
 			err = PTR_ERR(clock->ptp_clock);
 			goto no_register;
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 6f7009a..b15a376 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -186,7 +186,8 @@ static void delete_ptp_clock(struct posix_clock *pc)
 
 /* public interface */
 
-struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info)
+struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
+				     struct device *parent)
 {
 	struct ptp_clock *ptp;
 	int err = 0, index, major = MAJOR(ptp_devt);
@@ -219,7 +220,7 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info)
 	init_waitqueue_head(&ptp->tsev_wq);
 
 	/* Create a new device in our class. */
-	ptp->dev = device_create(ptp_class, NULL, ptp->devid, ptp,
+	ptp->dev = device_create(ptp_class, parent, ptp->devid, ptp,
 				 "ptp%d", ptp->index);
 	if (IS_ERR(ptp->dev))
 		goto no_device;
diff --git a/drivers/ptp/ptp_ixp46x.c b/drivers/ptp/ptp_ixp46x.c
index e03c406..d49b851 100644
--- a/drivers/ptp/ptp_ixp46x.c
+++ b/drivers/ptp/ptp_ixp46x.c
@@ -298,7 +298,7 @@ static int __init ptp_ixp_init(void)
 
 	ixp_clock.caps = ptp_ixp_caps;
 
-	ixp_clock.ptp_clock = ptp_clock_register(&ixp_clock.caps);
+	ixp_clock.ptp_clock = ptp_clock_register(&ixp_clock.caps, NULL);
 
 	if (IS_ERR(ixp_clock.ptp_clock))
 		return PTR_ERR(ixp_clock.ptp_clock);
diff --git a/drivers/ptp/ptp_pch.c b/drivers/ptp/ptp_pch.c
index 3a9c17e..e624e4d 100644
--- a/drivers/ptp/ptp_pch.c
+++ b/drivers/ptp/ptp_pch.c
@@ -627,7 +627,7 @@ pch_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}
 
 	chip->caps = ptp_pch_caps;
-	chip->ptp_clock = ptp_clock_register(&chip->caps);
+	chip->ptp_clock = ptp_clock_register(&chip->caps, &pdev->dev);
 
 	if (IS_ERR(chip->ptp_clock))
 		return PTR_ERR(chip->ptp_clock);
diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
index a644b29..56c71b2 100644
--- a/include/linux/ptp_clock_kernel.h
+++ b/include/linux/ptp_clock_kernel.h
@@ -21,6 +21,7 @@
 #ifndef _PTP_CLOCK_KERNEL_H_
 #define _PTP_CLOCK_KERNEL_H_
 
+#include <linux/device.h>
 #include <linux/pps_kernel.h>
 #include <linux/ptp_clock.h>
 
@@ -93,10 +94,12 @@ struct ptp_clock;
 /**
  * ptp_clock_register() - register a PTP hardware clock driver
  *
- * @info:  Structure describing the new clock.
+ * @info:   Structure describing the new clock.
+ * @parent: Pointer to the parent device of the new clock.
  */
 
-extern struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info);
+extern struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
+					    struct device *parent);
 
 /**
  * ptp_clock_unregister() - unregister a PTP hardware clock driver
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH V2 net-next 2/4] ptp: provide the clock's adjusted frequency
From: Richard Cochran @ 2012-09-22  7:42 UTC (permalink / raw)
  To: netdev
  Cc: Ben Hutchings, David Miller, Jacob Keller, Jeff Kirsher,
	John Stultz, Matthew Vick
In-Reply-To: <cover.1348299094.git.richardcochran@gmail.com>

If the timex.mode field indicates a query, then we provide the value of
the current frequency adjustment.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 drivers/ptp/ptp_clock.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 67e628e..6f7009a 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -148,6 +148,11 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
 
 		err = ops->adjfreq(ops, scaled_ppm_to_ppb(tx->freq));
 		ptp->dialed_frequency = tx->freq;
+
+	} else if (tx->modes == 0) {
+
+		tx->freq = ptp->dialed_frequency;
+		err = 0;
 	}
 
 	return err;
-- 
1.7.2.5

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox