Netdev List
 help / color / mirror / Atom feed
* [PATCH] ipv6: Fix ipv6_getsockopt for IPV6_2292PKTOPTIONS
From: Daniel Baluta @ 2011-08-19  9:55 UTC (permalink / raw)
  To: davem, kuznet, jmorris, kaber; +Cc: netdev, Daniel Baluta, Sorin Dumitru

IPV6_2292PKTOPTIONS is broken for 32-bit applications running
in COMPAT mode on 64-bit kernels.

The same problem was fixed for IPv4 with the patch:
ipv4: Fix ip_getsockopt for IP_PKTOPTIONS,
commit dd23198e58cd35259dd09e8892bbdb90f1d57748

Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com>
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
---
 net/ipv6/ipv6_sockglue.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 9cb191e..71ce053 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -913,7 +913,7 @@ static int ipv6_getsockopt_sticky(struct sock *sk, struct ipv6_txoptions *opt,
 }
 
 static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
-		    char __user *optval, int __user *optlen)
+		    char __user *optval, int __user *optlen, unsigned flags)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	int len;
@@ -962,7 +962,7 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 
 		msg.msg_control = optval;
 		msg.msg_controllen = len;
-		msg.msg_flags = 0;
+		msg.msg_flags = flags;
 
 		lock_sock(sk);
 		skb = np->pktoptions;
@@ -1222,7 +1222,7 @@ int ipv6_getsockopt(struct sock *sk, int level, int optname,
 	if(level != SOL_IPV6)
 		return -ENOPROTOOPT;
 
-	err = do_ipv6_getsockopt(sk, level, optname, optval, optlen);
+	err = do_ipv6_getsockopt(sk, level, optname, optval, optlen, 0);
 #ifdef CONFIG_NETFILTER
 	/* we need to exclude all possible ENOPROTOOPTs except default case */
 	if (err == -ENOPROTOOPT && optname != IPV6_2292PKTOPTIONS) {
@@ -1264,7 +1264,8 @@ int compat_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		return compat_mc_getsockopt(sk, level, optname, optval, optlen,
 			ipv6_getsockopt);
 
-	err = do_ipv6_getsockopt(sk, level, optname, optval, optlen);
+	err = do_ipv6_getsockopt(sk, level, optname, optval, optlen, 
+				 MSG_CMSG_COMPAT);
 #ifdef CONFIG_NETFILTER
 	/* we need to exclude all possible ENOPROTOOPTs except default case */
 	if (err == -ENOPROTOOPT && optname != IPV6_2292PKTOPTIONS) {
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH] ipv6: Fix ipv6_getsockopt for IPV6_2292PKTOPTIONS
From: David Miller @ 2011-08-19 10:19 UTC (permalink / raw)
  To: dbaluta; +Cc: kuznet, jmorris, kaber, netdev, sdumitru
In-Reply-To: <1313747758-10063-1-git-send-email-dbaluta@ixiacom.com>

From: Daniel Baluta <dbaluta@ixiacom.com>
Date: Fri, 19 Aug 2011 12:55:58 +0300

> IPV6_2292PKTOPTIONS is broken for 32-bit applications running
> in COMPAT mode on 64-bit kernels.
> 
> The same problem was fixed for IPv4 with the patch:
> ipv4: Fix ip_getsockopt for IP_PKTOPTIONS,
> commit dd23198e58cd35259dd09e8892bbdb90f1d57748
> 
> Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com>
> Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>

Applied, but:

> -	err = do_ipv6_getsockopt(sk, level, optname, optval, optlen);
> +	err = do_ipv6_getsockopt(sk, level, optname, optval, optlen, 

This adds trailing whitespace, which GIT catches and warns about,
please be mindful of this in the future.

I took care of it and fixed it up this time.

^ permalink raw reply

* Re: [PATCH v2] Proportional Rate Reduction for TCP.
From: Ilpo Järvinen @ 2011-08-19 10:25 UTC (permalink / raw)
  To: Nandita Dukkipati
  Cc: David S. Miller, netdev, Tom Herbert, Matt Mathis, Yuchung Cheng
In-Reply-To: <1313739212-2315-1-git-send-email-nanditad@google.com>

On Fri, 19 Aug 2011, Nandita Dukkipati wrote:

> +static void tcp_update_cwnd_in_recovery(struct sock *sk, int newly_acked_sacked,
> +					int fast_rexmit, int flag)
> +{
> +	struct tcp_sock *tp = tcp_sk(sk);
> +	int sndcnt = 0;
> +	int delta = tp->snd_ssthresh - tcp_packets_in_flight(tp);
> +
> +	if (tcp_packets_in_flight(tp) > tp->snd_ssthresh) {
> +		if (WARN_ON(!tp->prior_cwnd))
> +			tp->prior_cwnd = 1;

This should still be made larger to avoid problems if it ever will be 
needed.

> +		sndcnt = DIV_ROUND_UP((u64)(tp->prr_delivered *
> +					    tp->snd_ssthresh),
> +				      (u64)tp->prior_cwnd) - tp->prr_out;

I think you should pick one from include/linux/math64.h instead of letting 
gcc to do / operand all by itself. ...Obviosly then the ROUND_UP part 
needs to be done manually (either using the remainder or pre-addition 
like DIV_ROUND_UP does).

-- 
 i.

^ permalink raw reply

* Re: [PATCH v2] Proportional Rate Reduction for TCP.
From: David Miller @ 2011-08-19 10:26 UTC (permalink / raw)
  To: nanditad; +Cc: netdev, therbert, mattmathis, ycheng
In-Reply-To: <1313739212-2315-1-git-send-email-nanditad@google.com>

From: Nandita Dukkipati <nanditad@google.com>
Date: Fri, 19 Aug 2011 00:33:32 -0700

> @@ -2830,9 +2830,14 @@ static int tcp_try_undo_loss(struct sock *sk)
>  static inline void tcp_complete_cwr(struct sock *sk)
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);
> -	/* Do not moderate cwnd if it's already undone in cwr or recovery */
> -	if (tp->undo_marker && tp->snd_cwnd > tp->snd_ssthresh) {
> -		tp->snd_cwnd = tp->snd_ssthresh;
> +
> +	/* Do not moderate cwnd if it's already undone in cwr or recovery. */
> +	if (tp->undo_marker) {
> +
> +		if (inet_csk(sk)->icsk_ca_state == TCP_CA_CWR)

Please get rid of that empty line before the TCP_CA_CWR case.

> +		sndcnt = DIV_ROUND_UP((u64)(tp->prr_delivered *
> +					    tp->snd_ssthresh),
> +				      (u64)tp->prior_cwnd) - tp->prr_out;

This won't link on 32-bit unless __divdi3 libgcc routine is provided
by the architecture.  To portably do 64-bit division you need to use
do_div() or something based upon it.  Perhaps DIV_ROUND_UP_LL() will
work best in this case.



^ permalink raw reply

* Re: [PATCH net-next v4 af-packet 0/2] Enhance af-packet to provide (near zero)lossless packet capture functionality.
From: David Miller @ 2011-08-19 10:33 UTC (permalink / raw)
  To: loke.chetan; +Cc: netdev
In-Reply-To: <1313116260-1000-1-git-send-email-loke.chetan@gmail.com>

From: Chetan Loke <loke.chetan@gmail.com>
Date: Thu, 11 Aug 2011 22:30:58 -0400

> This patch attempts to:
> 1)Improve network capture visibility by increasing packet density
> 2)Assist in analyzing multiple(aggregated) capture ports.

Just some general comments before I say something about the actual
code.

Claiming loss-less, or almost loss-less, packet capture is something
which is highly dependent upon the hardware and traffic load.

You cannot claim near loss-less packet capture for an i386 with a
10gbit ethernet part going line rate, for example.

So I want you to describe this as what it is, an improvement.  More
specifically it's just a more flexible mmap ring buffer scheme that
reduces the wastage incurred by the fixed buffer size scheme we
have currently.

Next, you need to use a different subject line for the two patches.
Right now you say the same thing in both, and someone scanning the
changesets headers will have no idea what is different about one
change vs. the other.

^ permalink raw reply

* Re: [PATCH net-next v4 af-packet 2/2] Enhance af-packet to provide (near zero)lossless packet capture functionality.
From: David Miller @ 2011-08-19 10:36 UTC (permalink / raw)
  To: loke.chetan; +Cc: netdev
In-Reply-To: <1313116260-1000-3-git-send-email-loke.chetan@gmail.com>

From: Chetan Loke <loke.chetan@gmail.com>
Date: Thu, 11 Aug 2011 22:31:00 -0400

> +struct kbdq_ft_ops {
> +	int num_ops;
> +	void (*ft_ops[2])(void *, void *);
> +};
...
> +	struct kbdq_ft_ops kfops;
 ...
> +static void prb_init_ft_ops(struct kbdq_core *p1,
> +			union tpacket_req_u *req_u)
> +{
> +	p1->kfops.ft_ops[p1->kfops.num_ops++] = prb_fill_vlan_info;
> +
> +	if (req_u->req3.tp_feature_req_word) {
> +		if (req_u->req3.tp_feature_req_word & TP_FT_REQ_FILL_RXHASH)
> +			p1->kfops.ft_ops[p1->kfops.num_ops++] = prb_fill_rxhash;
> +		else
> +			p1->kfops.ft_ops[p1->kfops.num_ops++] =
> +			prb_clear_rxhash;
> +	}
> +}

It is a lot cheaper to just test the flags in-line than do indirect calls.
Indirect calls are very expensive on many cpus.

In fact, since the first op (prb_fill_vlan_info) is unconditional, we eat
the indirect call cost for absolutely no reason at all.

This kfops stuff was not present in your previous changes.  And I'm
going to tell you that if you keep adding things each revision instead
of just fixing the specific items you've received feedback about, a
set of changes this invasive and of this size will never get merged.

Please resist the urge to further tinker with the code, and just
address the feedback we give you.

^ permalink raw reply

* Re: [PATCH net-next] ipv4: one more case for non-local saddr in ICMP
From: David Miller @ 2011-08-19 10:43 UTC (permalink / raw)
  To: ja; +Cc: netdev, herbert
In-Reply-To: <alpine.LFD.2.00.1108151857420.1481@ja.ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Mon, 15 Aug 2011 19:21:23 +0300 (EEST)

> 
> 	May be there is one more case that we can avoid using
> non-local source for ICMP errors: xfrm_lookup, num_xfrms = 0 when
> reverse "Flow passes untransformed". Avoid using the input route
> if xfrm_lookup returns same dst.
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---
> 
> 	In fact, should we use local IP in all cases when
> sending ICMP? I'm asking it for the following case:
> 
> 	Large packet is forwarded but is rejected with ICMP FRAG
> NEEDED. We usually send ICMP with local saddr instead of the
> original non-local destination. What is the role of
> this reverse check? May be after xfrm_decode_session_reverse
> we should use 'fl4_dec.saddr = fl4->saddr;' so that xfrm_lookup
> works with ICMP from local IP? What is right thing to do here?
> I don't see code that looks in the embedded header...

Well.. this relookup behavior is guided by a special transform state
XFRM_STATE_ICMP that the user must explicitly create IPSEC rules for.

Presumably they are going to add real transforms to such special IPSEC
rules, not create NOP ones with no transforms.  And if they do create
such IPSEC state with no transforms, perhaps the intention is to trigger
to use of the non-local source.

The whole thing revolves around how Herbert envisions people implementing
RFC4301 support using this new XFRM_STATE_ICMP thing.

Right?


^ permalink raw reply

* Re: [PATCH] ipv6: Fix ipv6_getsockopt for IPV6_2292PKTOPTIONS
From: Daniel Baluta @ 2011-08-19 11:23 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, jmorris, kaber, netdev, sdumitru
In-Reply-To: <20110819.031957.1143398061283788162.davem@davemloft.net>

On Fri, Aug 19, 2011 at 1:19 PM, David Miller <davem@davemloft.net> wrote:
> From: Daniel Baluta <dbaluta@ixiacom.com>
> Date: Fri, 19 Aug 2011 12:55:58 +0300
>
>> IPV6_2292PKTOPTIONS is broken for 32-bit applications running
>> in COMPAT mode on 64-bit kernels.
>>
>> The same problem was fixed for IPv4 with the patch:
>> ipv4: Fix ip_getsockopt for IP_PKTOPTIONS,
>> commit dd23198e58cd35259dd09e8892bbdb90f1d57748
>>
>> Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com>
>> Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
>
> Applied, but:
>
>> -     err = do_ipv6_getsockopt(sk, level, optname, optval, optlen);
>> +     err = do_ipv6_getsockopt(sk, level, optname, optval, optlen,
>
> This adds trailing whitespace, which GIT catches and warns about,
> please be mindful of this in the future.
>
> I took care of it and fixed it up this time.

Thanks David. I will pay more attention to this next time.

Daniel.

^ permalink raw reply

* Traceroute and "ping" sockets: some questions
From: Dmitry Butskoy @ 2011-08-19 11:19 UTC (permalink / raw)
  To: netdev

Hi

I've released new version of the Linux traceroute 2.0.18, which supports 
new (SOCK_DGRAM, IPPROTO_ICMP) sockets, appeared in the kernel 3.0 . 
This way users might perform icmp tracerouting (`-I') without setuid bit 
or cap_net_raw settings of the executable.
(See it at 
http://sourceforge.net/projects/traceroute/files/traceroute/traceroute-2.0.18/)

I would like to ask some questions:

- Currently such "ping" sockets implemented for IPv4 only. Are there any 
plans to implement it for IPv6 as well?

The traceroute-2.0.18 is ready for IPv6 anyway -- just wait for 
appearing it in the kernel without recompile (I hope :) ) -- but I would 
prefer to test it as soon as possible.

- Are there any plans to implement some "rate control" (maybe 
sysctl-configurable too), to restrict unprivileged users to send icmp 
echoes too fast (ie. faster than 200 ms -- the current ping(8) restriction)?


Regards,
Dmitry Butskoy
http://www.fedoraproject.org/wiki/DmitryButskoy

^ permalink raw reply

* Re: Traceroute and "ping" sockets: some questions
From: David Miller @ 2011-08-19 11:38 UTC (permalink / raw)
  To: buc; +Cc: netdev
In-Reply-To: <4E4E46B4.9010909@odu.neva.ru>

From: Dmitry Butskoy <buc@odusz.so-cdu.ru>
Date: Fri, 19 Aug 2011 15:19:16 +0400

> I would like to ask some questions:
> 
> - Currently such "ping" sockets implemented for IPv4 only. Are there any
> - plans to implement it for IPv6 as well?

I think the original authors of the kernel component said they would
work on this, but it hasn't materialized yet.

> - Are there any plans to implement some "rate control" (maybe
> - sysctl-configurable too), to restrict unprivileged users to send icmp
> - echoes too fast (ie. faster than 200 ms -- the current ping(8)
> - restriction)?

Why limit?  He can spam with UDP socket just as easily at any rate
he pleases, and rate limiting is policy issue and there relegated
to netfilter and/or the packet scheduler layer.

^ permalink raw reply

* winner
From: Microsoft @ 2011-08-19  9:07 UTC (permalink / raw)


You have won 500.000 GBP
send your phone number
and address

^ permalink raw reply

* Re: net: rps: support 802.1Q
From: Ben Hutchings @ 2011-08-19 11:54 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Eric Dumazet, Tom Herbert, netdev
In-Reply-To: <1313730353-25379-1-git-send-email-xiaosuo@gmail.com>

On Fri, 2011-08-19 at 13:05 +0800, Changli Gao wrote:
> For the 802.1Q packets, if the NIC doesn't support hw-accel-vlan-rx, RPS
> won't inspect the internal 4 tuples to generate skb->rxhash, so this kind
> of traffic can't get any benefit from RPS.
> 
> This patch adds the support for 802.1Q to RPS.
[...]
> @@ -2565,6 +2566,13 @@ again:
>  		addr2 = (__force u32) ip6->daddr.s6_addr32[3];
>  		nhoff += 40;
>  		break;
> +	case __constant_htons(ETH_P_8021Q):
> +		if (!pskb_may_pull(skb, sizeof(*vlan) + nhoff))
> +			goto done;
> +		vlan = (const struct vlan_hdr *) (skb->data + nhoff);
> +		proto = vlan->h_vlan_encapsulated_proto;
> +		nhoff += sizeof(*vlan);
> +		goto again;
>  	default:
>  		goto done;
>  	}

Should this really be reading an unlimited number of tags?  What if an
attacker starts sending packets full of VLAN tags?  Since this runs
before netfilter, there would be no way to prevent those packets burning
our CPU time.  And if there are legitimately multiple VLAN tags, they
presumably won't all have the 802.1q Ethertype.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: Traceroute and "ping" sockets: some questions
From: Dmitry Butskoy @ 2011-08-19 12:22 UTC (permalink / raw)
  Cc: netdev
In-Reply-To: <20110819.043816.1648681777223816477.davem@davemloft.net>

David Miller wrote:
>> - Are there any plans to implement some "rate control" (maybe
>> - sysctl-configurable too), to restrict unprivileged users to send icmp
>> - echoes too fast (ie. faster than 200 ms -- the current ping(8)
>> - restriction)?
>>      
> Why limit?  He can spam with UDP socket just as easily at any rate
> he pleases,
>    
Yes, but most cases such UDP is "one-way" spam (until services like 
"echo 7/udp" are enabled).
For icmp ping, we normally receive icmp replies, hence it is 
bidirectional crap. Which was not present before.

Besides that ping(8) is normally present in the system even if C 
development is not installed (ie. user cannot build its spam software at 
the host etc...)

Regards,
Dmitry
http://www.fedoraproject.org/wiki/DmitryButskoy

^ permalink raw reply

* Re: [RFC 0/0] Introducing a generic socket offload framework
From: jamal @ 2011-08-19 12:49 UTC (permalink / raw)
  To: San Mehat
  Cc: davem, mst, rusty, linux-kernel, virtualization, netdev,
	digitaleric, mikew, miche, maccarro
In-Reply-To: <20110818220756.5C93E5C80B@san.sea.corp.google.com>

On Thu, 2011-08-18 at 15:07 -0700, San Mehat wrote:
> TL;DR
> -----
> In this RFC we propose the introduction of the concept of hardware socket
> offload to the Linux kernel. Patches will accompany this RFC in a few days,
> but we felt we had enough on the design to solicit constructive discussion
> from the community at-large.
> 

[..]

> ALTERNATIVE STRATEGIES
> ----------------------
> 
> An alternative strategy for providing similar functionality involves either
> modifying glibc or using LD_PRELOAD tricks to intercept socket calls. We were
> forced to rule this out due to the complexity (and fragility) involved with
> attempting to maintain a general solution compatible across various
> distributions where platform-libraries differ.

Above should have been in your TL;DR;->

LD_PRELOAD is also horrible because of the granularity of the socket
calls;
Having things in the kernel and specifically tagging socket as needing
this feature is much much more manageable.

Tying things to virtualization may miss the big picture because there
are many other use cases for intercepting socket calls, example:
Samir Bellabes <sam@synack.fr> has been trying to get what he refers to
as "personal firewall" (equivalent to the silly windows firewall) which
prompts the user "ping from blah, do you want to allow a response?"
That requires intercepting send/recv calls, prompt the user in possibly
some pop-up and act based on response. It requires looking at content.
He is trying to use selinux for that interface,
but i think this would be the right abstraction.
I hope you plan to support send/recv.
I also hope you add support for SOCK_RAW (and maybe SOCK_PACKET).

Q: If you want this to be transparent to the apps, who/what is doing
the tagging of SOCK_HWASSIST? clearly not the app if you dont want to
change it.

I like the uri abstraction if it doesnt come at the expense of the
app transparency.

cheers,
jamal

^ permalink raw reply

* [net-next 0/6][pull request] Intel Wired LAN Driver Update
From: Jeff Kirsher @ 2011-08-19 13:11 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo

The following series contains updates to e1000e and ixgbe.

The following are changes since commit ae1511bf769cafeae5ab61aaf9947a16a22cbd10:
  net: rps: support PPPOE session messages
and are available in the git repository at:
  master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (3):
  ixgbe: Refactor transmit map and cleanup routines
  ixgbe: replace reference to CONFIG_FCOE with IXGBE_FCOE
  ixgbe: Cleanup FCOE and VLAN handling in xmit_frame_ring

Amir Hanania (1):
  ixgbe - DDP last user buffer - error to warn

Bruce Allan (2):
  e1000e: convert driver to use extended descriptors
  e1000e: bump driver version number

 drivers/net/ethernet/intel/e1000e/e1000.h       |    3 +-
 drivers/net/ethernet/intel/e1000e/ethtool.c     |    9 +-
 drivers/net/ethernet/intel/e1000e/netdev.c      |  199 +++++----
 drivers/net/ethernet/intel/ixgbe/ixgbe.h        |   27 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c |    2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c   |   10 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |  554 ++++++++++++----------
 7 files changed, 445 insertions(+), 359 deletions(-)

-- 
1.7.6


^ permalink raw reply

* [net-next 1/6] e1000e: convert driver to use extended descriptors
From: Jeff Kirsher @ 2011-08-19 13:11 UTC (permalink / raw)
  To: davem; +Cc: Bruce Allan, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313759486-23575-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

Some features currently not supported by the driver (e.g. RSS) require the
use of extended descriptors, but the driver is setup to only use legacy
descriptors in all modes except for when jumbo frames are enabled on some
parts.  Convert the driver to always use extended descriptors in order to
enable the forthcoming support of these other features.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/e1000.h   |    3 +-
 drivers/net/ethernet/intel/e1000e/ethtool.c |    9 +-
 drivers/net/ethernet/intel/e1000e/netdev.c  |  197 +++++++++++++++------------
 3 files changed, 120 insertions(+), 89 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index 638d175..cbbbff4 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -456,8 +456,9 @@ struct e1000_info {
 
 #define E1000_RX_DESC_PS(R, i)	    \
 	(&(((union e1000_rx_desc_packet_split *)((R).desc))[i]))
+#define E1000_RX_DESC_EXT(R, i)	    \
+	(&(((union e1000_rx_desc_extended *)((R).desc))[i]))
 #define E1000_GET_DESC(R, i, type)	(&(((struct type *)((R).desc))[i]))
-#define E1000_RX_DESC(R, i)		E1000_GET_DESC(R, i, e1000_rx_desc)
 #define E1000_TX_DESC(R, i)		E1000_GET_DESC(R, i, e1000_tx_desc)
 #define E1000_CONTEXT_DESC(R, i)	E1000_GET_DESC(R, i, e1000_context_desc)
 
diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 06d88f3..8d3ca85 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -1195,7 +1195,7 @@ static int e1000_setup_desc_rings(struct e1000_adapter *adapter)
 		goto err_nomem;
 	}
 
-	rx_ring->size = rx_ring->count * sizeof(struct e1000_rx_desc);
+	rx_ring->size = rx_ring->count * sizeof(union e1000_rx_desc_extended);
 	rx_ring->desc = dma_alloc_coherent(&pdev->dev, rx_ring->size,
 					   &rx_ring->dma, GFP_KERNEL);
 	if (!rx_ring->desc) {
@@ -1220,7 +1220,7 @@ static int e1000_setup_desc_rings(struct e1000_adapter *adapter)
 	ew32(RCTL, rctl);
 
 	for (i = 0; i < rx_ring->count; i++) {
-		struct e1000_rx_desc *rx_desc = E1000_RX_DESC(*rx_ring, i);
+		union e1000_rx_desc_extended *rx_desc;
 		struct sk_buff *skb;
 
 		skb = alloc_skb(2048 + NET_IP_ALIGN, GFP_KERNEL);
@@ -1238,8 +1238,9 @@ static int e1000_setup_desc_rings(struct e1000_adapter *adapter)
 			ret_val = 8;
 			goto err_nomem;
 		}
-		rx_desc->buffer_addr =
-			cpu_to_le64(rx_ring->buffer_info[i].dma);
+		rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
+		rx_desc->read.buffer_addr =
+		    cpu_to_le64(rx_ring->buffer_info[i].dma);
 		memset(skb->data, 0x00, skb->len);
 	}
 
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index d0fdb51..55c3cc1 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -192,7 +192,7 @@ static void e1000e_dump(struct e1000_adapter *adapter)
 	struct e1000_buffer *buffer_info;
 	struct e1000_ring *rx_ring = adapter->rx_ring;
 	union e1000_rx_desc_packet_split *rx_desc_ps;
-	struct e1000_rx_desc *rx_desc;
+	union e1000_rx_desc_extended *rx_desc;
 	struct my_u1 {
 		u64 a;
 		u64 b;
@@ -399,41 +399,70 @@ rx_ring_summary:
 		break;
 	default:
 	case 0:
-		/* Legacy Receive Descriptor Format
+		/* Extended Receive Descriptor (Read) Format
 		 *
-		 * +-----------------------------------------------------+
-		 * |                Buffer Address [63:0]                |
-		 * +-----------------------------------------------------+
-		 * | VLAN Tag | Errors | Status 0 | Packet csum | Length |
-		 * +-----------------------------------------------------+
-		 * 63       48 47    40 39      32 31         16 15      0
+		 *   +-----------------------------------------------------+
+		 * 0 |                Buffer Address [63:0]                |
+		 *   +-----------------------------------------------------+
+		 * 8 |                      Reserved                       |
+		 *   +-----------------------------------------------------+
 		 */
-		printk(KERN_INFO "Rl[desc]     [address 63:0  ] "
-		       "[vl er S cks ln] [bi->dma       ] [bi->skb] "
-		       "<-- Legacy format\n");
-		for (i = 0; rx_ring->desc && (i < rx_ring->count); i++) {
-			rx_desc = E1000_RX_DESC(*rx_ring, i);
+		printk(KERN_INFO "R  [desc]      [buf addr 63:0 ] "
+		       "[reserved 63:0 ] [bi->dma       ] "
+		       "[bi->skb] <-- Ext (Read) format\n");
+		/* Extended Receive Descriptor (Write-Back) Format
+		 *
+		 *   63       48 47    32 31    24 23            4 3        0
+		 *   +------------------------------------------------------+
+		 *   |     RSS Hash      |        |               |         |
+		 * 0 +-------------------+  Rsvd  |   Reserved    | MRQ RSS |
+		 *   | Packet   | IP     |        |               |  Type   |
+		 *   | Checksum | Ident  |        |               |         |
+		 *   +------------------------------------------------------+
+		 * 8 | VLAN Tag | Length | Extended Error | Extended Status |
+		 *   +------------------------------------------------------+
+		 *   63       48 47    32 31            20 19               0
+		 */
+		printk(KERN_INFO "RWB[desc]      [cs ipid    mrq] "
+		       "[vt   ln xe  xs] "
+		       "[bi->skb] <-- Ext (Write-Back) format\n");
+
+		for (i = 0; i < rx_ring->count; i++) {
 			buffer_info = &rx_ring->buffer_info[i];
-			u0 = (struct my_u0 *)rx_desc;
-			printk(KERN_INFO "Rl[0x%03X]    %016llX %016llX "
-			       "%016llX %p", i,
-			       (unsigned long long)le64_to_cpu(u0->a),
-			       (unsigned long long)le64_to_cpu(u0->b),
-			       (unsigned long long)buffer_info->dma,
-			       buffer_info->skb);
+			rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
+			u1 = (struct my_u1 *)rx_desc;
+			staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
+			if (staterr & E1000_RXD_STAT_DD) {
+				/* Descriptor Done */
+				printk(KERN_INFO "RWB[0x%03X]     %016llX "
+				       "%016llX ---------------- %p", i,
+				       (unsigned long long)le64_to_cpu(u1->a),
+				       (unsigned long long)le64_to_cpu(u1->b),
+				       buffer_info->skb);
+			} else {
+				printk(KERN_INFO "R  [0x%03X]     %016llX "
+				       "%016llX %016llX %p", i,
+				       (unsigned long long)le64_to_cpu(u1->a),
+				       (unsigned long long)le64_to_cpu(u1->b),
+				       (unsigned long long)buffer_info->dma,
+				       buffer_info->skb);
+
+				if (netif_msg_pktdata(adapter))
+					print_hex_dump(KERN_INFO, "",
+						       DUMP_PREFIX_ADDRESS, 16,
+						       1,
+						       phys_to_virt
+						       (buffer_info->dma),
+						       adapter->rx_buffer_len,
+						       true);
+			}
+
 			if (i == rx_ring->next_to_use)
 				printk(KERN_CONT " NTU\n");
 			else if (i == rx_ring->next_to_clean)
 				printk(KERN_CONT " NTC\n");
 			else
 				printk(KERN_CONT "\n");
-
-			if (netif_msg_pktdata(adapter))
-				print_hex_dump(KERN_INFO, "",
-					       DUMP_PREFIX_ADDRESS,
-					       16, 1,
-					       phys_to_virt(buffer_info->dma),
-					       adapter->rx_buffer_len, true);
 		}
 	}
 
@@ -519,7 +548,7 @@ static void e1000_rx_checksum(struct e1000_adapter *adapter, u32 status_err,
 }
 
 /**
- * e1000_alloc_rx_buffers - Replace used receive buffers; legacy & extended
+ * e1000_alloc_rx_buffers - Replace used receive buffers
  * @adapter: address of board private structure
  **/
 static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
@@ -528,7 +557,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
 	struct net_device *netdev = adapter->netdev;
 	struct pci_dev *pdev = adapter->pdev;
 	struct e1000_ring *rx_ring = adapter->rx_ring;
-	struct e1000_rx_desc *rx_desc;
+	union e1000_rx_desc_extended *rx_desc;
 	struct e1000_buffer *buffer_info;
 	struct sk_buff *skb;
 	unsigned int i;
@@ -562,8 +591,8 @@ map_skb:
 			break;
 		}
 
-		rx_desc = E1000_RX_DESC(*rx_ring, i);
-		rx_desc->buffer_addr = cpu_to_le64(buffer_info->dma);
+		rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
+		rx_desc->read.buffer_addr = cpu_to_le64(buffer_info->dma);
 
 		if (unlikely(!(i & (E1000_RX_BUFFER_WRITE - 1)))) {
 			/*
@@ -697,7 +726,7 @@ static void e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
 {
 	struct net_device *netdev = adapter->netdev;
 	struct pci_dev *pdev = adapter->pdev;
-	struct e1000_rx_desc *rx_desc;
+	union e1000_rx_desc_extended *rx_desc;
 	struct e1000_ring *rx_ring = adapter->rx_ring;
 	struct e1000_buffer *buffer_info;
 	struct sk_buff *skb;
@@ -738,8 +767,8 @@ check_page:
 			                                PAGE_SIZE,
 							DMA_FROM_DEVICE);
 
-		rx_desc = E1000_RX_DESC(*rx_ring, i);
-		rx_desc->buffer_addr = cpu_to_le64(buffer_info->dma);
+		rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
+		rx_desc->read.buffer_addr = cpu_to_le64(buffer_info->dma);
 
 		if (unlikely(++i == rx_ring->count))
 			i = 0;
@@ -774,28 +803,27 @@ static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 	struct pci_dev *pdev = adapter->pdev;
 	struct e1000_hw *hw = &adapter->hw;
 	struct e1000_ring *rx_ring = adapter->rx_ring;
-	struct e1000_rx_desc *rx_desc, *next_rxd;
+	union e1000_rx_desc_extended *rx_desc, *next_rxd;
 	struct e1000_buffer *buffer_info, *next_buffer;
-	u32 length;
+	u32 length, staterr;
 	unsigned int i;
 	int cleaned_count = 0;
 	bool cleaned = 0;
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 
 	i = rx_ring->next_to_clean;
-	rx_desc = E1000_RX_DESC(*rx_ring, i);
+	rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
+	staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 	buffer_info = &rx_ring->buffer_info[i];
 
-	while (rx_desc->status & E1000_RXD_STAT_DD) {
+	while (staterr & E1000_RXD_STAT_DD) {
 		struct sk_buff *skb;
-		u8 status;
 
 		if (*work_done >= work_to_do)
 			break;
 		(*work_done)++;
 		rmb();	/* read descriptor and rx_buffer_info after status DD */
 
-		status = rx_desc->status;
 		skb = buffer_info->skb;
 		buffer_info->skb = NULL;
 
@@ -804,7 +832,7 @@ static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 		i++;
 		if (i == rx_ring->count)
 			i = 0;
-		next_rxd = E1000_RX_DESC(*rx_ring, i);
+		next_rxd = E1000_RX_DESC_EXT(*rx_ring, i);
 		prefetch(next_rxd);
 
 		next_buffer = &rx_ring->buffer_info[i];
@@ -817,7 +845,7 @@ static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 				 DMA_FROM_DEVICE);
 		buffer_info->dma = 0;
 
-		length = le16_to_cpu(rx_desc->length);
+		length = le16_to_cpu(rx_desc->wb.upper.length);
 
 		/*
 		 * !EOP means multiple descriptors were used to store a single
@@ -826,7 +854,7 @@ static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 		 * next frame that _does_ have the EOP bit set, as it is by
 		 * definition only a frame fragment
 		 */
-		if (unlikely(!(status & E1000_RXD_STAT_EOP)))
+		if (unlikely(!(staterr & E1000_RXD_STAT_EOP)))
 			adapter->flags2 |= FLAG2_IS_DISCARDING;
 
 		if (adapter->flags2 & FLAG2_IS_DISCARDING) {
@@ -834,12 +862,12 @@ static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 			e_dbg("Receive packet consumed multiple buffers\n");
 			/* recycle */
 			buffer_info->skb = skb;
-			if (status & E1000_RXD_STAT_EOP)
+			if (staterr & E1000_RXD_STAT_EOP)
 				adapter->flags2 &= ~FLAG2_IS_DISCARDING;
 			goto next_desc;
 		}
 
-		if (rx_desc->errors & E1000_RXD_ERR_FRAME_ERR_MASK) {
+		if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
 			/* recycle */
 			buffer_info->skb = skb;
 			goto next_desc;
@@ -877,15 +905,15 @@ static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 		skb_put(skb, length);
 
 		/* Receive Checksum Offload */
-		e1000_rx_checksum(adapter,
-				  (u32)(status) |
-				  ((u32)(rx_desc->errors) << 24),
-				  le16_to_cpu(rx_desc->csum), skb);
+		e1000_rx_checksum(adapter, staterr,
+				  le16_to_cpu(rx_desc->wb.lower.hi_dword.
+					      csum_ip.csum), skb);
 
-		e1000_receive_skb(adapter, netdev, skb,status,rx_desc->special);
+		e1000_receive_skb(adapter, netdev, skb, staterr,
+				  rx_desc->wb.upper.vlan);
 
 next_desc:
-		rx_desc->status = 0;
+		rx_desc->wb.upper.status_error &= cpu_to_le32(~0xFF);
 
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= E1000_RX_BUFFER_WRITE) {
@@ -897,6 +925,8 @@ next_desc:
 		/* use prefetched values */
 		rx_desc = next_rxd;
 		buffer_info = next_buffer;
+
+		staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 	}
 	rx_ring->next_to_clean = i;
 
@@ -1280,35 +1310,34 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
 	struct net_device *netdev = adapter->netdev;
 	struct pci_dev *pdev = adapter->pdev;
 	struct e1000_ring *rx_ring = adapter->rx_ring;
-	struct e1000_rx_desc *rx_desc, *next_rxd;
+	union e1000_rx_desc_extended *rx_desc, *next_rxd;
 	struct e1000_buffer *buffer_info, *next_buffer;
-	u32 length;
+	u32 length, staterr;
 	unsigned int i;
 	int cleaned_count = 0;
 	bool cleaned = false;
 	unsigned int total_rx_bytes=0, total_rx_packets=0;
 
 	i = rx_ring->next_to_clean;
-	rx_desc = E1000_RX_DESC(*rx_ring, i);
+	rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
+	staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 	buffer_info = &rx_ring->buffer_info[i];
 
-	while (rx_desc->status & E1000_RXD_STAT_DD) {
+	while (staterr & E1000_RXD_STAT_DD) {
 		struct sk_buff *skb;
-		u8 status;
 
 		if (*work_done >= work_to_do)
 			break;
 		(*work_done)++;
 		rmb();	/* read descriptor and rx_buffer_info after status DD */
 
-		status = rx_desc->status;
 		skb = buffer_info->skb;
 		buffer_info->skb = NULL;
 
 		++i;
 		if (i == rx_ring->count)
 			i = 0;
-		next_rxd = E1000_RX_DESC(*rx_ring, i);
+		next_rxd = E1000_RX_DESC_EXT(*rx_ring, i);
 		prefetch(next_rxd);
 
 		next_buffer = &rx_ring->buffer_info[i];
@@ -1319,23 +1348,22 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
 			       DMA_FROM_DEVICE);
 		buffer_info->dma = 0;
 
-		length = le16_to_cpu(rx_desc->length);
+		length = le16_to_cpu(rx_desc->wb.upper.length);
 
 		/* errors is only valid for DD + EOP descriptors */
-		if (unlikely((status & E1000_RXD_STAT_EOP) &&
-		    (rx_desc->errors & E1000_RXD_ERR_FRAME_ERR_MASK))) {
-				/* recycle both page and skb */
-				buffer_info->skb = skb;
-				/* an error means any chain goes out the window
-				 * too */
-				if (rx_ring->rx_skb_top)
-					dev_kfree_skb_irq(rx_ring->rx_skb_top);
-				rx_ring->rx_skb_top = NULL;
-				goto next_desc;
+		if (unlikely((staterr & E1000_RXD_STAT_EOP) &&
+			     (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK))) {
+			/* recycle both page and skb */
+			buffer_info->skb = skb;
+			/* an error means any chain goes out the window too */
+			if (rx_ring->rx_skb_top)
+				dev_kfree_skb_irq(rx_ring->rx_skb_top);
+			rx_ring->rx_skb_top = NULL;
+			goto next_desc;
 		}
 
 #define rxtop (rx_ring->rx_skb_top)
-		if (!(status & E1000_RXD_STAT_EOP)) {
+		if (!(staterr & E1000_RXD_STAT_EOP)) {
 			/* this descriptor is only the beginning (or middle) */
 			if (!rxtop) {
 				/* this is the beginning of a chain */
@@ -1390,10 +1418,9 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
 		}
 
 		/* Receive Checksum Offload XXX recompute due to CRC strip? */
-		e1000_rx_checksum(adapter,
-		                  (u32)(status) |
-		                  ((u32)(rx_desc->errors) << 24),
-		                  le16_to_cpu(rx_desc->csum), skb);
+		e1000_rx_checksum(adapter, staterr,
+				  le16_to_cpu(rx_desc->wb.lower.hi_dword.
+					      csum_ip.csum), skb);
 
 		/* probably a little skewed due to removing CRC */
 		total_rx_bytes += skb->len;
@@ -1406,11 +1433,11 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
 			goto next_desc;
 		}
 
-		e1000_receive_skb(adapter, netdev, skb, status,
-		                  rx_desc->special);
+		e1000_receive_skb(adapter, netdev, skb, staterr,
+				  rx_desc->wb.upper.vlan);
 
 next_desc:
-		rx_desc->status = 0;
+		rx_desc->wb.upper.status_error &= cpu_to_le32(~0xFF);
 
 		/* return some buffers to hardware, one at a time is too slow */
 		if (unlikely(cleaned_count >= E1000_RX_BUFFER_WRITE)) {
@@ -1422,6 +1449,8 @@ next_desc:
 		/* use prefetched values */
 		rx_desc = next_rxd;
 		buffer_info = next_buffer;
+
+		staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 	}
 	rx_ring->next_to_clean = i;
 
@@ -2820,6 +2849,10 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter)
 		break;
 	}
 
+	/* Enable Extended Status in all Receive Descriptors */
+	rfctl = er32(RFCTL);
+	rfctl |= E1000_RFCTL_EXTEN;
+
 	/*
 	 * 82571 and greater support packet-split where the protocol
 	 * header is placed in skb->data and the packet data is
@@ -2845,9 +2878,6 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter)
 	if (adapter->rx_ps_pages) {
 		u32 psrctl = 0;
 
-		/* Configure extra packet-split registers */
-		rfctl = er32(RFCTL);
-		rfctl |= E1000_RFCTL_EXTEN;
 		/*
 		 * disable packet split support for IPv6 extension headers,
 		 * because some malformed IPv6 headers can hang the Rx
@@ -2855,8 +2885,6 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter)
 		rfctl |= (E1000_RFCTL_IPV6_EX_DIS |
 			  E1000_RFCTL_NEW_IPV6_EXT_DIS);
 
-		ew32(RFCTL, rfctl);
-
 		/* Enable Packet split descriptors */
 		rctl |= E1000_RCTL_DTYP_PS;
 
@@ -2879,6 +2907,7 @@ static void e1000_setup_rctl(struct e1000_adapter *adapter)
 		ew32(PSRCTL, psrctl);
 	}
 
+	ew32(RFCTL, rfctl);
 	ew32(RCTL, rctl);
 	/* just started the receive unit, no need to restart */
 	adapter->flags &= ~FLAG_RX_RESTART_NOW;
@@ -2904,11 +2933,11 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
 		adapter->clean_rx = e1000_clean_rx_irq_ps;
 		adapter->alloc_rx_buf = e1000_alloc_rx_buffers_ps;
 	} else if (adapter->netdev->mtu > ETH_FRAME_LEN + ETH_FCS_LEN) {
-		rdlen = rx_ring->count * sizeof(struct e1000_rx_desc);
+		rdlen = rx_ring->count * sizeof(union e1000_rx_desc_extended);
 		adapter->clean_rx = e1000_clean_jumbo_rx_irq;
 		adapter->alloc_rx_buf = e1000_alloc_jumbo_rx_buffers;
 	} else {
-		rdlen = rx_ring->count * sizeof(struct e1000_rx_desc);
+		rdlen = rx_ring->count * sizeof(union e1000_rx_desc_extended);
 		adapter->clean_rx = e1000_clean_rx_irq;
 		adapter->alloc_rx_buf = e1000_alloc_rx_buffers;
 	}
-- 
1.7.6


^ permalink raw reply related

* [net-next 2/6] e1000e: bump driver version number
From: Jeff Kirsher @ 2011-08-19 13:11 UTC (permalink / raw)
  To: davem; +Cc: Bruce Allan, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313759486-23575-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 55c3cc1..6ea342e 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -56,7 +56,7 @@
 
 #define DRV_EXTRAVERSION "-k"
 
-#define DRV_VERSION "1.3.16" DRV_EXTRAVERSION
+#define DRV_VERSION "1.5.1" DRV_EXTRAVERSION
 char e1000e_driver_name[] = "e1000e";
 const char e1000e_driver_version[] = DRV_VERSION;
 
-- 
1.7.6


^ permalink raw reply related

* [net-next 3/6] ixgbe - DDP last user buffer - error to warn
From: Jeff Kirsher @ 2011-08-19 13:11 UTC (permalink / raw)
  To: davem; +Cc: Amir Hanania, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313759486-23575-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Amir Hanania <amir.hanania@intel.com>

Change the error message in the last DDP user buffer to warn_once

Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
index 824edae..e9b992f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
@@ -241,10 +241,12 @@ static int ixgbe_fcoe_ddp_setup(struct net_device *netdev, u16 xid,
 	 */
 	if (lastsize == bufflen) {
 		if (j >= IXGBE_BUFFCNT_MAX) {
-			e_err(drv, "xid=%x:%d,%d,%d:addr=%llx "
-				"not enough user buffers. We need an extra "
-				"buffer because lastsize is bufflen.\n",
-				xid, i, j, dmacount, (u64)addr);
+			printk_once("Will NOT use DDP since there are not "
+				    "enough user buffers. We need an  extra "
+				    "buffer because lastsize is bufflen. "
+				    "xid=%x:%d,%d,%d:addr=%llx\n",
+				    xid, i, j, dmacount, (u64)addr);
+
 			goto out_noddp_free;
 		}
 
-- 
1.7.6


^ permalink raw reply related

* [net-next 5/6] ixgbe: replace reference to CONFIG_FCOE with IXGBE_FCOE
From: Jeff Kirsher @ 2011-08-19 13:11 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313759486-23575-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

CONFIG_FCOE is not the correct define to check since it is possible for it
to be CONFIG_FCOE_MODULE, as such the reference to it should be replaced
with IXGBE_FCOE.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c |    2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
index 0ace6ce..da6d53e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
@@ -414,7 +414,7 @@ static u8 ixgbe_dcbnl_set_all(struct net_device *netdev)
 		u8 prio_tc[MAX_TRAFFIC_CLASS] = {0, 1, 2, 3, 4, 5, 6, 7};
 		int max_frame = adapter->netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
 
-#ifdef CONFIG_FCOE
+#ifdef IXGBE_FCOE
 		if (adapter->netdev->features & NETIF_F_FCOE_MTU)
 			max_frame = max(max_frame, IXGBE_FCOE_JUMBO_FRAME_SIZE);
 #endif
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d9c1625..9a2d2d4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3615,7 +3615,7 @@ static void ixgbe_configure_dcb(struct ixgbe_adapter *adapter)
 
 	/* reconfigure the hardware */
 	if (adapter->dcbx_cap & DCB_CAP_DCBX_VER_CEE) {
-#ifdef CONFIG_FCOE
+#ifdef IXGBE_FCOE
 		if (adapter->netdev->features & NETIF_F_FCOE_MTU)
 			max_frame = max(max_frame, IXGBE_FCOE_JUMBO_FRAME_SIZE);
 #endif
-- 
1.7.6


^ permalink raw reply related

* [net-next 4/6] ixgbe: Refactor transmit map and cleanup routines
From: Jeff Kirsher @ 2011-08-19 13:11 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313759486-23575-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch implements a partial refactor of the TX map/queue and cleanup
routines.  It merges the map and queue functionality and as a result
improves the transmit performance by avoiding unnecessary reads from memory.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   13 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  450 +++++++++++++------------
 2 files changed, 247 insertions(+), 216 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index e04a8e4..a12fd9f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -96,6 +96,7 @@
 #define IXGBE_TX_FLAGS_IPV4		(u32)(1 << 3)
 #define IXGBE_TX_FLAGS_FCOE		(u32)(1 << 4)
 #define IXGBE_TX_FLAGS_FSO		(u32)(1 << 5)
+#define IXGBE_TX_FLAGS_MAPPED_AS_PAGE	(u32)(1 << 6)
 #define IXGBE_TX_FLAGS_VLAN_MASK	0xffff0000
 #define IXGBE_TX_FLAGS_VLAN_PRIO_MASK   0x0000e000
 #define IXGBE_TX_FLAGS_VLAN_SHIFT	16
@@ -141,14 +142,14 @@ struct vf_macvlans {
 /* wrapper around a pointer to a socket buffer,
  * so a DMA handle can be stored along with the buffer */
 struct ixgbe_tx_buffer {
-	struct sk_buff *skb;
-	dma_addr_t dma;
+	union ixgbe_adv_tx_desc *next_to_watch;
 	unsigned long time_stamp;
-	u16 length;
-	u16 next_to_watch;
-	unsigned int bytecount;
+	dma_addr_t dma;
+	u32 length;
+	u32 tx_flags;
+	struct sk_buff *skb;
+	u32 bytecount;
 	u16 gso_segs;
-	u8 mapped_as_page;
 };
 
 struct ixgbe_rx_buffer {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index faa83ce..d9c1625 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -385,7 +385,7 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
 		tx_ring = adapter->tx_ring[n];
 		tx_buffer_info =
 			&tx_ring->tx_buffer_info[tx_ring->next_to_clean];
-		pr_info(" %5d %5X %5X %016llX %04X %3X %016llX\n",
+		pr_info(" %5d %5X %5X %016llX %04X %p %016llX\n",
 			   n, tx_ring->next_to_use, tx_ring->next_to_clean,
 			   (u64)tx_buffer_info->dma,
 			   tx_buffer_info->length,
@@ -424,7 +424,7 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
 			tx_buffer_info = &tx_ring->tx_buffer_info[i];
 			u0 = (struct my_u0 *)tx_desc;
 			pr_info("T [0x%03X]    %016llX %016llX %016llX"
-				" %04X  %3X %016llX %p", i,
+				" %04X  %p %016llX %p", i,
 				le64_to_cpu(u0->a),
 				le64_to_cpu(u0->b),
 				(u64)tx_buffer_info->dma,
@@ -643,27 +643,31 @@ static inline void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter,
 	}
 }
 
-void ixgbe_unmap_and_free_tx_resource(struct ixgbe_ring *tx_ring,
-				      struct ixgbe_tx_buffer *tx_buffer_info)
+static inline void ixgbe_unmap_tx_resource(struct ixgbe_ring *ring,
+					   struct ixgbe_tx_buffer *tx_buffer)
 {
-	if (tx_buffer_info->dma) {
-		if (tx_buffer_info->mapped_as_page)
-			dma_unmap_page(tx_ring->dev,
-				       tx_buffer_info->dma,
-				       tx_buffer_info->length,
-				       DMA_TO_DEVICE);
+	if (tx_buffer->dma) {
+		if (tx_buffer->tx_flags & IXGBE_TX_FLAGS_MAPPED_AS_PAGE)
+			dma_unmap_page(ring->dev,
+			               tx_buffer->dma,
+			               tx_buffer->length,
+			               DMA_TO_DEVICE);
 		else
-			dma_unmap_single(tx_ring->dev,
-					 tx_buffer_info->dma,
-					 tx_buffer_info->length,
-					 DMA_TO_DEVICE);
-		tx_buffer_info->dma = 0;
+			dma_unmap_single(ring->dev,
+			                 tx_buffer->dma,
+			                 tx_buffer->length,
+			                 DMA_TO_DEVICE);
 	}
-	if (tx_buffer_info->skb) {
+	tx_buffer->dma = 0;
+}
+
+void ixgbe_unmap_and_free_tx_resource(struct ixgbe_ring *tx_ring,
+				      struct ixgbe_tx_buffer *tx_buffer_info)
+{
+	ixgbe_unmap_tx_resource(tx_ring, tx_buffer_info);
+	if (tx_buffer_info->skb)
 		dev_kfree_skb_any(tx_buffer_info->skb);
-		tx_buffer_info->skb = NULL;
-	}
-	tx_buffer_info->time_stamp = 0;
+	tx_buffer_info->skb = NULL;
 	/* tx_buffer_info must be completely set up in the transmit path */
 }
 
@@ -797,56 +801,72 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			       struct ixgbe_ring *tx_ring)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
-	union ixgbe_adv_tx_desc *tx_desc, *eop_desc;
-	struct ixgbe_tx_buffer *tx_buffer_info;
+	struct ixgbe_tx_buffer *tx_buffer;
+	union ixgbe_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
-	u16 i, eop, count = 0;
+	u16 i = tx_ring->next_to_clean;
+	u16 count;
 
-	i = tx_ring->next_to_clean;
-	eop = tx_ring->tx_buffer_info[i].next_to_watch;
-	eop_desc = IXGBE_TX_DESC_ADV(tx_ring, eop);
+	tx_buffer = &tx_ring->tx_buffer_info[i];
+	tx_desc = IXGBE_TX_DESC_ADV(tx_ring, i);
 
-	while ((eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)) &&
-	       (count < q_vector->tx.work_limit)) {
-		bool cleaned = false;
-		rmb(); /* read buffer_info after eop_desc */
-		for ( ; !cleaned; count++) {
-			tx_desc = IXGBE_TX_DESC_ADV(tx_ring, i);
-			tx_buffer_info = &tx_ring->tx_buffer_info[i];
+	for (count = 0; count < q_vector->tx.work_limit; count++) {
+		union ixgbe_adv_tx_desc *eop_desc = tx_buffer->next_to_watch;
+
+		/* if next_to_watch is not set then there is no work pending */
+		if (!eop_desc)
+			break;
+
+		/* if DD is not set pending work has not been completed */
+		if (!(eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
+			break;
+
+		/* count the packet as being completed */
+		tx_ring->tx_stats.completed++;
+
+		/* clear next_to_watch to prevent false hangs */
+		tx_buffer->next_to_watch = NULL;
 
+		/* prevent any other reads prior to eop_desc being verified */
+		rmb();
+
+		do {
+			ixgbe_unmap_tx_resource(tx_ring, tx_buffer);
 			tx_desc->wb.status = 0;
-			cleaned = (i == eop);
+			if (likely(tx_desc == eop_desc)) {
+				eop_desc = NULL;
+				dev_kfree_skb_any(tx_buffer->skb);
+				tx_buffer->skb = NULL;
+
+				total_bytes += tx_buffer->bytecount;
+				total_packets += tx_buffer->gso_segs;
+			}
 
+			tx_buffer++;
+			tx_desc++;
 			i++;
-			if (i == tx_ring->count)
+			if (unlikely(i == tx_ring->count)) {
 				i = 0;
 
-			if (cleaned && tx_buffer_info->skb) {
-				total_bytes += tx_buffer_info->bytecount;
-				total_packets += tx_buffer_info->gso_segs;
+				tx_buffer = tx_ring->tx_buffer_info;
+				tx_desc = IXGBE_TX_DESC_ADV(tx_ring, 0);
 			}
 
-			ixgbe_unmap_and_free_tx_resource(tx_ring,
-							 tx_buffer_info);
-		}
-
-		tx_ring->tx_stats.completed++;
-		eop = tx_ring->tx_buffer_info[i].next_to_watch;
-		eop_desc = IXGBE_TX_DESC_ADV(tx_ring, eop);
+		} while (eop_desc);
 	}
 
 	tx_ring->next_to_clean = i;
+	u64_stats_update_begin(&tx_ring->syncp);
 	tx_ring->stats.bytes += total_bytes;
 	tx_ring->stats.packets += total_packets;
-	u64_stats_update_begin(&tx_ring->syncp);
+	u64_stats_update_end(&tx_ring->syncp);
 	q_vector->tx.total_bytes += total_bytes;
 	q_vector->tx.total_packets += total_packets;
-	u64_stats_update_end(&tx_ring->syncp);
 
 	if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
 		/* schedule immediate reset if we believe we hung */
 		struct ixgbe_hw *hw = &adapter->hw;
-		tx_desc = IXGBE_TX_DESC_ADV(tx_ring, eop);
+		tx_desc = IXGBE_TX_DESC_ADV(tx_ring, i);
 		e_err(drv, "Detected Tx Unit Hang\n"
 			"  Tx Queue             <%d>\n"
 			"  TDH, TDT             <%x>, <%x>\n"
@@ -858,8 +878,8 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			tx_ring->queue_index,
 			IXGBE_READ_REG(hw, IXGBE_TDH(tx_ring->reg_idx)),
 			IXGBE_READ_REG(hw, IXGBE_TDT(tx_ring->reg_idx)),
-			tx_ring->next_to_use, eop,
-			tx_ring->tx_buffer_info[eop].time_stamp, jiffies);
+			tx_ring->next_to_use, i,
+			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
 
 		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
 
@@ -6406,185 +6426,179 @@ static bool ixgbe_tx_csum(struct ixgbe_ring *tx_ring,
 	return (skb->ip_summed == CHECKSUM_PARTIAL);
 }
 
-static int ixgbe_tx_map(struct ixgbe_adapter *adapter,
-			struct ixgbe_ring *tx_ring,
-			struct sk_buff *skb, u32 tx_flags,
-			unsigned int first, const u8 hdr_len)
+static __le32 ixgbe_tx_cmd_type(u32 tx_flags)
 {
-	struct device *dev = tx_ring->dev;
-	struct ixgbe_tx_buffer *tx_buffer_info;
-	unsigned int len;
-	unsigned int total = skb->len;
-	unsigned int offset = 0, size, count = 0;
-	unsigned int nr_frags = skb_shinfo(skb)->nr_frags;
-	unsigned int f;
-	unsigned int bytecount = skb->len;
-	u16 gso_segs = 1;
-	u16 i;
+	/* set type for advanced descriptor with frame checksum insertion */
+	__le32 cmd_type = cpu_to_le32(IXGBE_ADVTXD_DTYP_DATA |
+				      IXGBE_ADVTXD_DCMD_IFCS |
+				      IXGBE_ADVTXD_DCMD_DEXT);
 
-	i = tx_ring->next_to_use;
+	/* set HW vlan bit if vlan is present */
+	if (tx_flags & IXGBE_TX_FLAGS_VLAN)
+		cmd_type |= cpu_to_le32(IXGBE_ADVTXD_DCMD_VLE);
 
-	if (tx_flags & IXGBE_TX_FLAGS_FCOE)
-		/* excluding fcoe_crc_eof for FCoE */
-		total -= sizeof(struct fcoe_crc_eof);
+	/* set segmentation enable bits for TSO/FSO */
+#ifdef IXGBE_FCOE
+	if ((tx_flags & IXGBE_TX_FLAGS_TSO) || (tx_flags & IXGBE_TX_FLAGS_FSO))
+#else
+	if (tx_flags & IXGBE_TX_FLAGS_TSO)
+#endif
+		cmd_type |= cpu_to_le32(IXGBE_ADVTXD_DCMD_TSE);
 
-	len = min(skb_headlen(skb), total);
-	while (len) {
-		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		size = min(len, (uint)IXGBE_MAX_DATA_PER_TXD);
-
-		tx_buffer_info->length = size;
-		tx_buffer_info->mapped_as_page = false;
-		tx_buffer_info->dma = dma_map_single(dev,
-						     skb->data + offset,
-						     size, DMA_TO_DEVICE);
-		if (dma_mapping_error(dev, tx_buffer_info->dma))
-			goto dma_error;
-		tx_buffer_info->time_stamp = jiffies;
-		tx_buffer_info->next_to_watch = i;
+	return cmd_type;
+}
 
-		len -= size;
-		total -= size;
-		offset += size;
-		count++;
+static __le32 ixgbe_tx_olinfo_status(u32 tx_flags, unsigned int paylen)
+{
+	__le32 olinfo_status =
+		cpu_to_le32(paylen << IXGBE_ADVTXD_PAYLEN_SHIFT);
 
-		if (len) {
-			i++;
-			if (i == tx_ring->count)
-				i = 0;
-		}
+	if (tx_flags & IXGBE_TX_FLAGS_TSO) {
+		olinfo_status |= cpu_to_le32(IXGBE_ADVTXD_POPTS_TXSM |
+					    (1 << IXGBE_ADVTXD_IDX_SHIFT));
+		/* enble IPv4 checksum for TSO */
+		if (tx_flags & IXGBE_TX_FLAGS_IPV4)
+			olinfo_status |= cpu_to_le32(IXGBE_ADVTXD_POPTS_IXSM);
 	}
 
-	for (f = 0; f < nr_frags; f++) {
-		struct skb_frag_struct *frag;
+	/* enable L4 checksum for TSO and TX checksum offload */
+	if (tx_flags & IXGBE_TX_FLAGS_CSUM)
+		olinfo_status |= cpu_to_le32(IXGBE_ADVTXD_POPTS_TXSM);
 
-		frag = &skb_shinfo(skb)->frags[f];
-		len = min((unsigned int)frag->size, total);
-		offset = frag->page_offset;
+#ifdef IXGBE_FCOE
+	/* use index 1 context for FCOE/FSO */
+	if (tx_flags & IXGBE_TX_FLAGS_FCOE)
+		olinfo_status |= cpu_to_le32(IXGBE_ADVTXD_CC |
+					    (1 << IXGBE_ADVTXD_IDX_SHIFT));
 
-		while (len) {
-			i++;
-			if (i == tx_ring->count)
-				i = 0;
+#endif
+	return olinfo_status;
+}
 
-			tx_buffer_info = &tx_ring->tx_buffer_info[i];
-			size = min(len, (uint)IXGBE_MAX_DATA_PER_TXD);
-
-			tx_buffer_info->length = size;
-			tx_buffer_info->dma = dma_map_page(dev,
-							   frag->page,
-							   offset, size,
-							   DMA_TO_DEVICE);
-			tx_buffer_info->mapped_as_page = true;
-			if (dma_mapping_error(dev, tx_buffer_info->dma))
-				goto dma_error;
-			tx_buffer_info->time_stamp = jiffies;
-			tx_buffer_info->next_to_watch = i;
-
-			len -= size;
-			total -= size;
-			offset += size;
-			count++;
+#define IXGBE_TXD_CMD (IXGBE_TXD_CMD_EOP | \
+		       IXGBE_TXD_CMD_RS)
+
+static void ixgbe_tx_map(struct ixgbe_ring *tx_ring,
+			 struct sk_buff *skb,
+			 struct ixgbe_tx_buffer *first,
+			 u32 tx_flags,
+			 const u8 hdr_len)
+{
+	struct device *dev = tx_ring->dev;
+	struct ixgbe_tx_buffer *tx_buffer_info;
+	union ixgbe_adv_tx_desc *tx_desc;
+	dma_addr_t dma;
+	__le32 cmd_type, olinfo_status;
+	struct skb_frag_struct *frag;
+	unsigned int f = 0;
+	unsigned int data_len = skb->data_len;
+	unsigned int size = skb_headlen(skb);
+	u32 offset = 0;
+	u32 paylen = skb->len - hdr_len;
+	u16 i = tx_ring->next_to_use;
+	u16 gso_segs;
+
+#ifdef IXGBE_FCOE
+	if (tx_flags & IXGBE_TX_FLAGS_FCOE) {
+		if (data_len >= sizeof(struct fcoe_crc_eof)) {
+			data_len -= sizeof(struct fcoe_crc_eof);
+		} else {
+			size -= sizeof(struct fcoe_crc_eof) - data_len;
+			data_len = 0;
 		}
-		if (total == 0)
-			break;
 	}
 
-	if (tx_flags & IXGBE_TX_FLAGS_TSO)
-		gso_segs = skb_shinfo(skb)->gso_segs;
-#ifdef IXGBE_FCOE
-	/* adjust for FCoE Sequence Offload */
-	else if (tx_flags & IXGBE_TX_FLAGS_FSO)
-		gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
-					skb_shinfo(skb)->gso_size);
-#endif /* IXGBE_FCOE */
-	bytecount += (gso_segs - 1) * hdr_len;
+#endif
+	dma = dma_map_single(dev, skb->data, size, DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, dma))
+		goto dma_error;
 
-	/* multiply data chunks by size of headers */
-	tx_ring->tx_buffer_info[i].bytecount = bytecount;
-	tx_ring->tx_buffer_info[i].gso_segs = gso_segs;
-	tx_ring->tx_buffer_info[i].skb = skb;
-	tx_ring->tx_buffer_info[first].next_to_watch = i;
+	cmd_type = ixgbe_tx_cmd_type(tx_flags);
+	olinfo_status = ixgbe_tx_olinfo_status(tx_flags, paylen);
 
-	return count;
+	tx_desc = IXGBE_TX_DESC_ADV(tx_ring, i);
 
-dma_error:
-	e_dev_err("TX DMA map failed\n");
+	for (;;) {
+		while (size > IXGBE_MAX_DATA_PER_TXD) {
+			tx_desc->read.buffer_addr = cpu_to_le64(dma + offset);
+			tx_desc->read.cmd_type_len =
+				cmd_type | cpu_to_le32(IXGBE_MAX_DATA_PER_TXD);
+			tx_desc->read.olinfo_status = olinfo_status;
 
-	/* clear timestamp and dma mappings for failed tx_buffer_info map */
-	tx_buffer_info->dma = 0;
-	tx_buffer_info->time_stamp = 0;
-	tx_buffer_info->next_to_watch = 0;
-	if (count)
-		count--;
+			offset += IXGBE_MAX_DATA_PER_TXD;
+			size -= IXGBE_MAX_DATA_PER_TXD;
+
+			tx_desc++;
+			i++;
+			if (i == tx_ring->count) {
+				tx_desc = IXGBE_TX_DESC_ADV(tx_ring, 0);
+				i = 0;
+			}
+		}
 
-	/* clear timestamp and dma mappings for remaining portion of packet */
-	while (count--) {
-		if (i == 0)
-			i += tx_ring->count;
-		i--;
 		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		ixgbe_unmap_and_free_tx_resource(tx_ring, tx_buffer_info);
-	}
+		tx_buffer_info->length = offset + size;
+		tx_buffer_info->tx_flags = tx_flags;
+		tx_buffer_info->dma = dma;
 
-	return 0;
-}
+		tx_desc->read.buffer_addr = cpu_to_le64(dma + offset);
+		tx_desc->read.cmd_type_len = cmd_type | cpu_to_le32(size);
+		tx_desc->read.olinfo_status = olinfo_status;
 
-static void ixgbe_tx_queue(struct ixgbe_ring *tx_ring,
-			   int tx_flags, int count, u32 paylen, u8 hdr_len)
-{
-	union ixgbe_adv_tx_desc *tx_desc = NULL;
-	struct ixgbe_tx_buffer *tx_buffer_info;
-	u32 olinfo_status = 0, cmd_type_len = 0;
-	unsigned int i;
-	u32 txd_cmd = IXGBE_TXD_CMD_EOP | IXGBE_TXD_CMD_RS | IXGBE_TXD_CMD_IFCS;
-
-	cmd_type_len |= IXGBE_ADVTXD_DTYP_DATA;
+		if (!data_len)
+			break;
 
-	cmd_type_len |= IXGBE_ADVTXD_DCMD_IFCS | IXGBE_ADVTXD_DCMD_DEXT;
+		frag = &skb_shinfo(skb)->frags[f];
+#ifdef IXGBE_FCOE
+		size = min_t(unsigned int, data_len, frag->size);
+#else
+		size = frag->size;
+#endif
+		data_len -= size;
+		f++;
 
-	if (tx_flags & IXGBE_TX_FLAGS_VLAN)
-		cmd_type_len |= IXGBE_ADVTXD_DCMD_VLE;
+		offset = 0;
+		tx_flags |= IXGBE_TX_FLAGS_MAPPED_AS_PAGE;
 
-	if (tx_flags & IXGBE_TX_FLAGS_TSO) {
-		cmd_type_len |= IXGBE_ADVTXD_DCMD_TSE;
+		dma = dma_map_page(dev, frag->page, frag->page_offset,
+				   size, DMA_TO_DEVICE);
+		if (dma_mapping_error(dev, dma))
+			goto dma_error;
 
-		olinfo_status |= IXGBE_TXD_POPTS_TXSM <<
-				 IXGBE_ADVTXD_POPTS_SHIFT;
+		tx_desc++;
+		i++;
+		if (i == tx_ring->count) {
+			tx_desc = IXGBE_TX_DESC_ADV(tx_ring, 0);
+			i = 0;
+		}
+	}
 
-		/* use index 1 context for tso */
-		olinfo_status |= (1 << IXGBE_ADVTXD_IDX_SHIFT);
-		if (tx_flags & IXGBE_TX_FLAGS_IPV4)
-			olinfo_status |= IXGBE_TXD_POPTS_IXSM <<
-					 IXGBE_ADVTXD_POPTS_SHIFT;
+	tx_desc->read.cmd_type_len |= cpu_to_le32(IXGBE_TXD_CMD);
 
-	} else if (tx_flags & IXGBE_TX_FLAGS_CSUM)
-		olinfo_status |= IXGBE_TXD_POPTS_TXSM <<
-				 IXGBE_ADVTXD_POPTS_SHIFT;
+	i++;
+	if (i == tx_ring->count)
+		i = 0;
 
-	if (tx_flags & IXGBE_TX_FLAGS_FCOE) {
-		olinfo_status |= IXGBE_ADVTXD_CC;
-		olinfo_status |= (1 << IXGBE_ADVTXD_IDX_SHIFT);
-		if (tx_flags & IXGBE_TX_FLAGS_FSO)
-			cmd_type_len |= IXGBE_ADVTXD_DCMD_TSE;
-	}
+	tx_ring->next_to_use = i;
 
-	olinfo_status |= ((paylen - hdr_len) << IXGBE_ADVTXD_PAYLEN_SHIFT);
+	if (tx_flags & IXGBE_TX_FLAGS_TSO)
+		gso_segs = skb_shinfo(skb)->gso_segs;
+#ifdef IXGBE_FCOE
+	/* adjust for FCoE Sequence Offload */
+	else if (tx_flags & IXGBE_TX_FLAGS_FSO)
+		gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
+					skb_shinfo(skb)->gso_size);
+#endif /* IXGBE_FCOE */
+	else
+		gso_segs = 1;
 
-	i = tx_ring->next_to_use;
-	while (count--) {
-		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		tx_desc = IXGBE_TX_DESC_ADV(tx_ring, i);
-		tx_desc->read.buffer_addr = cpu_to_le64(tx_buffer_info->dma);
-		tx_desc->read.cmd_type_len =
-			cpu_to_le32(cmd_type_len | tx_buffer_info->length);
-		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
-		i++;
-		if (i == tx_ring->count)
-			i = 0;
-	}
+	/* multiply data chunks by size of headers */
+	tx_buffer_info->bytecount = paylen + (gso_segs * hdr_len);
+	tx_buffer_info->gso_segs = gso_segs;
+	tx_buffer_info->skb = skb;
 
-	tx_desc->read.cmd_type_len |= cpu_to_le32(txd_cmd);
+	/* set the timestamp */
+	first->time_stamp = jiffies;
 
 	/*
 	 * Force memory writes to complete before letting h/w
@@ -6594,8 +6608,30 @@ static void ixgbe_tx_queue(struct ixgbe_ring *tx_ring,
 	 */
 	wmb();
 
-	tx_ring->next_to_use = i;
+	/* set next_to_watch value indicating a packet is present */
+	first->next_to_watch = tx_desc;
+
+	/* notify HW of packet */
 	writel(i, tx_ring->tail);
+
+	return;
+dma_error:
+	dev_err(dev, "TX DMA map failed\n");
+
+	/* clear dma mappings for failed tx_buffer_info map */
+	for (;;) {
+		tx_buffer_info = &tx_ring->tx_buffer_info[i];
+		ixgbe_unmap_tx_resource(tx_ring, tx_buffer_info);
+		if (tx_buffer_info == first)
+			break;
+		if (i == 0)
+			i = tx_ring->count;
+		i--;
+	}
+
+	dev_kfree_skb_any(skb);
+
+	tx_ring->next_to_use = i;
 }
 
 static void ixgbe_atr(struct ixgbe_ring *ring, struct sk_buff *skb,
@@ -6742,12 +6778,12 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 			  struct ixgbe_adapter *adapter,
 			  struct ixgbe_ring *tx_ring)
 {
+	struct ixgbe_tx_buffer *first;
 	int tso;
-	u32  tx_flags = 0;
+	u32 tx_flags = 0;
 #if PAGE_SIZE > IXGBE_MAX_DATA_PER_TXD
 	unsigned short f;
 #endif
-	u16 first;
 	u16 count = TXD_USE_COUNT(skb_headlen(skb));
 	__be16 protocol;
 	u8 hdr_len = 0;
@@ -6796,7 +6832,7 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 
 #endif
 	/* record the location of the first descriptor for this packet */
-	first = tx_ring->next_to_use;
+	first = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
 
 	if (tx_flags & IXGBE_TX_FLAGS_FCOE) {
 #ifdef IXGBE_FCOE
@@ -6817,22 +6853,16 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 			tx_flags |= IXGBE_TX_FLAGS_TSO;
 		else if (ixgbe_tx_csum(tx_ring, skb, tx_flags, protocol))
 			tx_flags |= IXGBE_TX_FLAGS_CSUM;
-	}
 
-	count = ixgbe_tx_map(adapter, tx_ring, skb, tx_flags, first, hdr_len);
-	if (count) {
 		/* add the ATR filter if ATR is on */
 		if (test_bit(__IXGBE_TX_FDIR_INIT_DONE, &tx_ring->state))
 			ixgbe_atr(tx_ring, skb, tx_flags, protocol);
-		ixgbe_tx_queue(tx_ring, tx_flags, count, skb->len, hdr_len);
-		ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
-
-	} else {
-		tx_ring->tx_buffer_info[first].time_stamp = 0;
-		tx_ring->next_to_use = first;
-		goto out_drop;
 	}
 
+	ixgbe_tx_map(tx_ring, skb, first, tx_flags, hdr_len);
+
+	ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
+
 	return NETDEV_TX_OK;
 
 out_drop:
-- 
1.7.6


^ permalink raw reply related

* [net-next 6/6] ixgbe: Cleanup FCOE and VLAN handling in xmit_frame_ring
From: Jeff Kirsher @ 2011-08-19 13:11 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313759486-23575-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change is meant to further cleanup the transmit path by streamlining
some of the VLAN and FCOE/DCB tasks in the transmit path.  In addition it
adds code for support software VLANs in the event that they are used in
conjunction with DCB and/or FCOE.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   16 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  108 +++++++++++++++----------
 2 files changed, 73 insertions(+), 51 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index a12fd9f..378ce46 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -91,14 +91,16 @@
 #define IXGBE_RX_BUFFER_WRITE	16	/* Must be power of 2 */
 
 #define IXGBE_TX_FLAGS_CSUM		(u32)(1)
-#define IXGBE_TX_FLAGS_VLAN		(u32)(1 << 1)
-#define IXGBE_TX_FLAGS_TSO		(u32)(1 << 2)
-#define IXGBE_TX_FLAGS_IPV4		(u32)(1 << 3)
-#define IXGBE_TX_FLAGS_FCOE		(u32)(1 << 4)
-#define IXGBE_TX_FLAGS_FSO		(u32)(1 << 5)
-#define IXGBE_TX_FLAGS_MAPPED_AS_PAGE	(u32)(1 << 6)
+#define IXGBE_TX_FLAGS_HW_VLAN		(u32)(1 << 1)
+#define IXGBE_TX_FLAGS_SW_VLAN		(u32)(1 << 2)
+#define IXGBE_TX_FLAGS_TSO		(u32)(1 << 3)
+#define IXGBE_TX_FLAGS_IPV4		(u32)(1 << 4)
+#define IXGBE_TX_FLAGS_FCOE		(u32)(1 << 5)
+#define IXGBE_TX_FLAGS_FSO		(u32)(1 << 6)
+#define IXGBE_TX_FLAGS_MAPPED_AS_PAGE	(u32)(1 << 7)
 #define IXGBE_TX_FLAGS_VLAN_MASK	0xffff0000
-#define IXGBE_TX_FLAGS_VLAN_PRIO_MASK   0x0000e000
+#define IXGBE_TX_FLAGS_VLAN_PRIO_MASK	0xe0000000
+#define IXGBE_TX_FLAGS_VLAN_PRIO_SHIFT  29
 #define IXGBE_TX_FLAGS_VLAN_SHIFT	16
 
 #define IXGBE_MAX_RSC_INT_RATE          162760
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 9a2d2d4..44ded0c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6369,7 +6369,7 @@ static bool ixgbe_tx_csum(struct ixgbe_ring *tx_ring,
 	u32 type_tucmd = 0;
 
 	if (skb->ip_summed != CHECKSUM_PARTIAL) {
-	    if (!(tx_flags & IXGBE_TX_FLAGS_VLAN))
+	    if (!(tx_flags & IXGBE_TX_FLAGS_HW_VLAN))
 			return false;
 	} else {
 		u8 l4_hdr = 0;
@@ -6434,7 +6434,7 @@ static __le32 ixgbe_tx_cmd_type(u32 tx_flags)
 				      IXGBE_ADVTXD_DCMD_DEXT);
 
 	/* set HW vlan bit if vlan is present */
-	if (tx_flags & IXGBE_TX_FLAGS_VLAN)
+	if (tx_flags & IXGBE_TX_FLAGS_HW_VLAN)
 		cmd_type |= cpu_to_le32(IXGBE_ADVTXD_DCMD_VLE);
 
 	/* set segmentation enable bits for TSO/FSO */
@@ -6670,8 +6670,8 @@ static void ixgbe_atr(struct ixgbe_ring *ring, struct sk_buff *skb,
 
 	th = tcp_hdr(skb);
 
-	/* skip this packet since the socket is closing */
-	if (th->fin)
+	/* skip this packet since it is invalid or the socket is closing */
+	if (!th || th->fin)
 		return;
 
 	/* sample on all syn packets or once every atr sample count */
@@ -6696,7 +6696,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring, struct sk_buff *skb,
 	 * since src port and flex bytes occupy the same word XOR them together
 	 * and write the value to source port portion of compressed dword
 	 */
-	if (vlan_id)
+	if (tx_flags & (IXGBE_TX_FLAGS_SW_VLAN | IXGBE_TX_FLAGS_HW_VLAN))
 		common.port.src ^= th->dest ^ __constant_htons(ETH_P_8021Q);
 	else
 		common.port.src ^= th->dest ^ protocol;
@@ -6785,7 +6785,7 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 	unsigned short f;
 #endif
 	u16 count = TXD_USE_COUNT(skb_headlen(skb));
-	__be16 protocol;
+	__be16 protocol = skb->protocol;
 	u8 hdr_len = 0;
 
 	/*
@@ -6806,59 +6806,79 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 		return NETDEV_TX_BUSY;
 	}
 
-	protocol = vlan_get_protocol(skb);
-
+	/* if we have a HW VLAN tag being added default to the HW one */
 	if (vlan_tx_tag_present(skb)) {
-		tx_flags |= vlan_tx_tag_get(skb);
-		if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
-			tx_flags &= ~IXGBE_TX_FLAGS_VLAN_PRIO_MASK;
-			tx_flags |= tx_ring->dcb_tc << 13;
+		tx_flags |= vlan_tx_tag_get(skb) << IXGBE_TX_FLAGS_VLAN_SHIFT;
+		tx_flags |= IXGBE_TX_FLAGS_HW_VLAN;
+	/* else if it is a SW VLAN check the next protocol and store the tag */
+	} else if (protocol == __constant_htons(ETH_P_8021Q)) {
+		struct vlan_hdr *vhdr, _vhdr;
+		vhdr = skb_header_pointer(skb, ETH_HLEN, sizeof(_vhdr), &_vhdr);
+		if (!vhdr)
+			goto out_drop;
+
+		protocol = vhdr->h_vlan_encapsulated_proto;
+		tx_flags |= ntohs(vhdr->h_vlan_TCI) << IXGBE_TX_FLAGS_VLAN_SHIFT;
+		tx_flags |= IXGBE_TX_FLAGS_SW_VLAN;
+	}
+
+	if ((adapter->flags & IXGBE_FLAG_DCB_ENABLED) &&
+	    skb->priority != TC_PRIO_CONTROL) {
+		tx_flags &= ~IXGBE_TX_FLAGS_VLAN_PRIO_MASK;
+		tx_flags |= tx_ring->dcb_tc <<
+			    IXGBE_TX_FLAGS_VLAN_PRIO_SHIFT;
+		if (tx_flags & IXGBE_TX_FLAGS_SW_VLAN) {
+			struct vlan_ethhdr *vhdr;
+			if (skb_header_cloned(skb) &&
+			    pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
+				goto out_drop;
+			vhdr = (struct vlan_ethhdr *)skb->data;
+			vhdr->h_vlan_TCI = htons(tx_flags >>
+						 IXGBE_TX_FLAGS_VLAN_SHIFT);
+		} else {
+			tx_flags |= IXGBE_TX_FLAGS_HW_VLAN;
 		}
-		tx_flags <<= IXGBE_TX_FLAGS_VLAN_SHIFT;
-		tx_flags |= IXGBE_TX_FLAGS_VLAN;
-	} else if (adapter->flags & IXGBE_FLAG_DCB_ENABLED &&
-		   skb->priority != TC_PRIO_CONTROL) {
-		tx_flags |= tx_ring->dcb_tc << 13;
-		tx_flags <<= IXGBE_TX_FLAGS_VLAN_SHIFT;
-		tx_flags |= IXGBE_TX_FLAGS_VLAN;
 	}
 
-#ifdef IXGBE_FCOE
-	/* for FCoE with DCB, we force the priority to what
-	 * was specified by the switch */
-	if (adapter->flags & IXGBE_FLAG_FCOE_ENABLED &&
-	    (protocol == htons(ETH_P_FCOE)))
-		tx_flags |= IXGBE_TX_FLAGS_FCOE;
-
-#endif
 	/* record the location of the first descriptor for this packet */
 	first = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
 
-	if (tx_flags & IXGBE_TX_FLAGS_FCOE) {
 #ifdef IXGBE_FCOE
-		/* setup tx offload for FCoE */
+	/* setup tx offload for FCoE */
+	if ((protocol == __constant_htons(ETH_P_FCOE)) &&
+	    (adapter->flags & IXGBE_FLAG_FCOE_ENABLED)) {
 		tso = ixgbe_fso(tx_ring, skb, tx_flags, &hdr_len);
 		if (tso < 0)
 			goto out_drop;
 		else if (tso)
-			tx_flags |= IXGBE_TX_FLAGS_FSO;
-#endif /* IXGBE_FCOE */
-	} else {
-		if (protocol == htons(ETH_P_IP))
-			tx_flags |= IXGBE_TX_FLAGS_IPV4;
-		tso = ixgbe_tso(tx_ring, skb, tx_flags, protocol, &hdr_len);
-		if (tso < 0)
-			goto out_drop;
-		else if (tso)
-			tx_flags |= IXGBE_TX_FLAGS_TSO;
-		else if (ixgbe_tx_csum(tx_ring, skb, tx_flags, protocol))
-			tx_flags |= IXGBE_TX_FLAGS_CSUM;
+			tx_flags |= IXGBE_TX_FLAGS_FSO |
+				    IXGBE_TX_FLAGS_FCOE;
+		else
+			tx_flags |= IXGBE_TX_FLAGS_FCOE;
 
-		/* add the ATR filter if ATR is on */
-		if (test_bit(__IXGBE_TX_FDIR_INIT_DONE, &tx_ring->state))
-			ixgbe_atr(tx_ring, skb, tx_flags, protocol);
+		goto xmit_fcoe;
 	}
 
+#endif /* IXGBE_FCOE */
+	/* setup IPv4/IPv6 offloads */
+	if (protocol == __constant_htons(ETH_P_IP))
+		tx_flags |= IXGBE_TX_FLAGS_IPV4;
+
+	tso = ixgbe_tso(tx_ring, skb, tx_flags, protocol, &hdr_len);
+	if (tso < 0)
+		goto out_drop;
+	else if (tso)
+		tx_flags |= IXGBE_TX_FLAGS_TSO;
+	else if (ixgbe_tx_csum(tx_ring, skb, tx_flags, protocol))
+		tx_flags |= IXGBE_TX_FLAGS_CSUM;
+
+	/* add the ATR filter if ATR is on */
+	if (test_bit(__IXGBE_TX_FDIR_INIT_DONE, &tx_ring->state))
+		ixgbe_atr(tx_ring, skb, tx_flags, protocol);
+
+#ifdef IXGBE_FCOE
+xmit_fcoe:
+#endif /* IXGBE_FCOE */
 	ixgbe_tx_map(tx_ring, skb, first, tx_flags, hdr_len);
 
 	ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
-- 
1.7.6


^ permalink raw reply related

* [PATCH/RFC v3 0/75] enable SKB paged fragment lifetime visibility
From: Ian Campbell @ 2011-08-19 13:26 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi,

This is v3 of my series to enable visibility into SKB paged fragment's
lifecycles, v1 is at [0] and contains some more background and rationale
but basically the series allows entities which inject pages into the
networking stack to receive a notification when the stack has really
finished with those pages (i.e. including retransmissions, clones,
pull-ups etc) and not just when the original skb is finished with, which
is beneficial to many subsystems which wish to inject pages into the
network stack without giving up full ownership of those page's
lifecycle. It implements something broadly along the lines of what was
described in [1].

I have updated based on the feedback from v2. In particular:
      * Rebase to v3.2-rc2
      * Rework to move reliance on core struct page const'ness change to
        end or series (David Miller). 
      * Remove inadvertent semantic change to drivers/atm/eni.c (David)
      * Add skb_frag_set_destructor instead of adding a parameter to
        skb_fill_page_desc (Michał Mirosław)
      * Resolved conflict with skb zero-copy buffers (nb: I revolved
        this with no functional change but I have a suspicion that this
        could be vastly simplified using the new destructor hook,
        although I've not looked all that closely)
      * Split drivers/* patches up from the huge monolithic patch.
      * Build tested defconfig on a bunch architectures (arm i386 m68k
        mips64 mips s390x sparc64 x86_64) and fixed a variety of issues
        arising from this.
      * Unfortunately I found an interesting breakage in
        kernel/signals.c arising from the additional #include of
        highmem.h in skbuff.h which. See:
        http://lkml.kernel.org/r/1313675198-7413-1-git-send-email-ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org I split the addition of skb_frag_kmap out and moved it to the end of the series to avoid this blocking the rest.
      * s/kmap_skb_frag/skb_frag_kmap_atomic/ (similar for kunmap) since
        the names kmap_skb_frag and skb_frag_kmap were confusingly
        similar. At the same time I moved them to skbuff.h since they
        are used in a couple of places outside net/core (open coded in
        drivers/scsi/fcoe/fcoe.c and #include "../core/kmap_skb.h" in
        net/appletalk/ddp.c)

Changes between v1 and v2:
      * Added destructor directly to sendpage() protocol hooks instead
        of inventing sendpage_destructor() (David Miller)
      * Dropped skb_frag_pci_map in favour of skb_frag_dma_map on Michał
        Mirosław's advice
      * Pushed the NFS fix down into the RPC layer. (Trond)
      * I also split out the patches to make protocols use the new
        interface into per-protocol patches. I held back on splitting up
        the patch to drivers/* (since that will cause an explosion in
        the series length) -- I'll do this split for v3.

I've a bit of an open dilemma regarding what to do about the small
number of drivers which use skb_frag_t as part of their internal
datastructures (it's a convenient page+offset+len container). It's a bit
ugly sprinkling the ".p" in around those drivers. I've considered
leaving skb_frag_t alone and defining skb_shared_info->frag as an
anonymous struct with the new layout (i.e. repurposing the existing
skb_frag_t for the alternative use and making a new struct for the
traditional/proper use). This would need a new accessor to copy one to
the other. Alternatively perhaps a generic struct subpage would be
useful in its own right and I could transition those drivers to that?
Opinions?

For v4 I will investigate that issue, collect maintainers ACKs and
rebase over the big drivers/net re-org which is going on (lets see what
git rebase is made of ;-))

Cheers,
Ian.

[0] http://marc.info/?l=linux-netdev&m=131072801125521&w=2
[1] http://marc.info/?l=linux-netdev&m=130925719513084&w=2

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 01/75] net: add APIs for manipulating skb page fragments.
From: Ian Campbell @ 2011-08-19 13:26 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Ian Campbell, David S. Miller, Eric Dumazet,
	Michał Mirosław
In-Reply-To: <1313760393.5010.356.camel@zakaz.uk.xensource.com>

The primary aim is to add skb_frag_(ref|unref) in order to remove the use of
bare get/put_page on SKB pages fragments and to isolate users from subsequent
changes to the skb_frag_t data structure.

Also included are helper APIs for passing a paged fragment to kmap and
dma_map_page since I was seeing the same pattern a lot. A helper for
pci_map_page is ommitted due to Michał Mirosław's recommendation that users
should transition to pci_map_page instead.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
---
 include/linux/skbuff.h |  170 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 168 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7b996ed..2562764 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -29,6 +29,7 @@
 #include <linux/rcupdate.h>
 #include <linux/dmaengine.h>
 #include <linux/hrtimer.h>
+#include <linux/dma-mapping.h>
 
 /* Don't change this without changing skb_csum_unnecessary! */
 #define CHECKSUM_NONE 0
@@ -1126,14 +1127,47 @@ static inline int skb_pagelen(const struct sk_buff *skb)
 	return len + skb_headlen(skb);
 }
 
-static inline void skb_fill_page_desc(struct sk_buff *skb, int i,
-				      struct page *page, int off, int size)
+/**
+ * __skb_fill_page_desc - initialise a paged fragment in an skb
+ * @skb: buffer containing fragment to be initialised
+ * @i: paged fragment index to initialise
+ * @page: the page to use for this fragment
+ * @off: the offset to the data with @page
+ * @size: the length of the data
+ *
+ * Initialises the @i'th fragment of @skb to point to &size bytes at
+ * offset @off within @page.
+ *
+ * Does not take any additional reference on the fragment.
+ */
+static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
+					struct page *page, int off, int size)
 {
 	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 
 	frag->page		  = page;
 	frag->page_offset	  = off;
 	frag->size		  = size;
+}
+
+/**
+ * skb_fill_page_desc - initialise a paged fragment in an skb
+ * @skb: buffer containing fragment to be initialised
+ * @i: paged fragment index to initialise
+ * @page: the page to use for this fragment
+ * @off: the offset to the data with @page
+ * @size: the length of the data
+ *
+ * As per __skb_fill_page_desc() -- initialises the @i'th fragment of
+ * @skb to point to &size bytes at offset @off within @page. In
+ * addition updates @skb such that @i is the last fragment.
+ *
+ * Does not take any additional reference on the fragment.
+ */
+static inline void skb_fill_page_desc(struct sk_buff *skb, int i,
+				      struct page *page, int off, int size)
+{
+	__skb_fill_page_desc(skb, i, page, off, size);
 	skb_shinfo(skb)->nr_frags = i + 1;
 }
 
@@ -1628,6 +1662,138 @@ static inline void netdev_free_page(struct net_device *dev, struct page *page)
 }
 
 /**
+ * skb_frag_page - retrieve the page refered to by a paged fragment
+ * @frag: the paged fragment
+ *
+ * Returns the &struct page associated with @frag.
+ */
+static inline struct page *skb_frag_page(const skb_frag_t *frag)
+{
+	return frag->page;
+}
+
+/**
+ * __skb_frag_ref - take an addition reference on a paged fragment.
+ * @frag: the paged fragment
+ *
+ * Takes an additional reference on the paged fragment @frag.
+ */
+static inline void __skb_frag_ref(skb_frag_t *frag)
+{
+	get_page(skb_frag_page(frag));
+}
+
+/**
+ * skb_frag_ref - take an addition reference on a paged fragment of an skb.
+ * @skb: the buffer
+ * @f: the fragment offset.
+ *
+ * Takes an additional reference on the @f'th paged fragment of @skb.
+ */
+static inline void skb_frag_ref(struct sk_buff *skb, int f)
+{
+	__skb_frag_ref(&skb_shinfo(skb)->frags[f]);
+}
+
+/**
+ * __skb_frag_unref - release a reference on a paged fragment.
+ * @frag: the paged fragment
+ *
+ * Releases a reference on the paged fragment @frag.
+ */
+static inline void __skb_frag_unref(skb_frag_t *frag)
+{
+	put_page(skb_frag_page(frag));
+}
+
+/**
+ * skb_frag_unref - release a reference on a paged fragment of an skb.
+ * @skb: the buffer
+ * @f: the fragment offset
+ *
+ * Releases a reference on the @f'th paged fragment of @skb.
+ */
+static inline void skb_frag_unref(struct sk_buff *skb, int f)
+{
+	__skb_frag_unref(&skb_shinfo(skb)->frags[f]);
+}
+
+/**
+ * skb_frag_address - gets the address of the data contained in a paged fragment
+ * @frag: the paged fragment buffer
+ *
+ * Returns the address of the data within @frag. The page must already
+ * be mapped.
+ */
+static inline void *skb_frag_address(const skb_frag_t *frag)
+{
+	return page_address(skb_frag_page(frag)) + frag->page_offset;
+}
+
+/**
+ * skb_frag_address_safe - gets the address of the data contained in a paged fragment
+ * @frag: the paged fragment buffer
+ *
+ * Returns the address of the data within @frag. Checks that the page
+ * is mapped and returns %NULL otherwise.
+ */
+static inline void *skb_frag_address_safe(const skb_frag_t *frag)
+{
+	void *ptr = page_address(skb_frag_page(frag));
+	if (unlikely(!ptr))
+		return NULL;
+
+	return ptr + frag->page_offset;
+}
+
+/**
+ * __skb_frag_set_page - sets the page contained in a paged fragment
+ * @frag: the paged fragment
+ * @page: the page to set
+ *
+ * Sets the fragment @frag to contain @page.
+ */
+static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page)
+{
+	frag->page = page;
+	__skb_frag_ref(frag);
+}
+
+/**
+ * skb_frag_set_page - sets the page contained in a paged fragment of an skb
+ * @skb: the buffer
+ * @f: the fragment offset
+ * @page: the page to set
+ *
+ * Sets the @f'th fragment of @skb to contain @page.
+ */
+static inline void skb_frag_set_page(struct sk_buff *skb, int f,
+				     struct page *page)
+{
+	__skb_frag_set_page(&skb_shinfo(skb)->frags[f], page);
+}
+
+/**
+ * skb_frag_dma_map - maps a paged fragment via the DMA API
+ * @device: the device to map the fragment to
+ * @frag: the paged fragment to map
+ * @offset: the offset within the fragment (starting at the
+ *          fragment's own offset)
+ * @size: the number of bytes to map
+ * @direction: the direction of the mapping (%PCI_DMA_*)
+ *
+ * Maps the page associated with @frag to @device.
+ */
+static inline dma_addr_t skb_frag_dma_map(struct device *dev,
+					  const skb_frag_t *frag,
+					  size_t offset, size_t size,
+					  enum dma_data_direction dir)
+{
+	return dma_map_page(dev, skb_frag_page(frag),
+			    frag->page_offset + offset, size, dir);
+}
+
+/**
  *	skb_clone_writable - is the header of a clone writable
  *	@skb: buffer to check
  *	@len: length up to which to write
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH 02/75] net: convert core to skb paged frag APIs
From: Ian Campbell @ 2011-08-19 13:26 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Ian Campbell, David S. Miller, Eric Dumazet,
	Michał Mirosław
In-Reply-To: <1313760393.5010.356.camel@zakaz.uk.xensource.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
---
 include/linux/skbuff.h |    4 ++--
 net/core/datagram.c    |    8 ++++----
 net/core/dev.c         |   16 +++++++++-------
 net/core/kmap_skb.h    |    2 +-
 net/core/pktgen.c      |    3 +--
 net/core/skbuff.c      |   29 +++++++++++++++--------------
 net/core/sock.c        |   12 +++++-------
 net/core/user_dma.c    |    2 +-
 8 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2562764..c015e61 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1895,12 +1895,12 @@ static inline int skb_add_data(struct sk_buff *skb,
 }
 
 static inline int skb_can_coalesce(struct sk_buff *skb, int i,
-				   struct page *page, int off)
+				   const struct page *page, int off)
 {
 	if (i) {
 		struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1];
 
-		return page == frag->page &&
+		return page == skb_frag_page(frag) &&
 		       off == frag->page_offset + frag->size;
 	}
 	return 0;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 18ac112..6449bed 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -332,7 +332,7 @@ int skb_copy_datagram_iovec(const struct sk_buff *skb, int offset,
 			int err;
 			u8  *vaddr;
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-			struct page *page = frag->page;
+			struct page *page = skb_frag_page(frag);
 
 			if (copy > len)
 				copy = len;
@@ -418,7 +418,7 @@ int skb_copy_datagram_const_iovec(const struct sk_buff *skb, int offset,
 			int err;
 			u8  *vaddr;
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-			struct page *page = frag->page;
+			struct page *page = skb_frag_page(frag);
 
 			if (copy > len)
 				copy = len;
@@ -508,7 +508,7 @@ int skb_copy_datagram_from_iovec(struct sk_buff *skb, int offset,
 			int err;
 			u8  *vaddr;
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-			struct page *page = frag->page;
+			struct page *page = skb_frag_page(frag);
 
 			if (copy > len)
 				copy = len;
@@ -594,7 +594,7 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
 			int err = 0;
 			u8  *vaddr;
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-			struct page *page = frag->page;
+			struct page *page = skb_frag_page(frag);
 
 			if (copy > len)
 				copy = len;
diff --git a/net/core/dev.c b/net/core/dev.c
index 17d67b5..1380325 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1947,9 +1947,11 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
 #ifdef CONFIG_HIGHMEM
 	int i;
 	if (!(dev->features & NETIF_F_HIGHDMA)) {
-		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-			if (PageHighMem(skb_shinfo(skb)->frags[i].page))
+		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+			if (PageHighMem(skb_frag_page(frag)))
 				return 1;
+		}
 	}
 
 	if (PCI_DMA_BUS_IS_PHYS) {
@@ -1958,7 +1960,8 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
 		if (!pdev)
 			return 0;
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-			dma_addr_t addr = page_to_phys(skb_shinfo(skb)->frags[i].page);
+			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+			dma_addr_t addr = page_to_phys(skb_frag_page(frag));
 			if (!pdev->dma_mask || addr + PAGE_SIZE - 1 > *pdev->dma_mask)
 				return 1;
 		}
@@ -3424,7 +3427,7 @@ pull:
 		skb_shinfo(skb)->frags[0].size -= grow;
 
 		if (unlikely(!skb_shinfo(skb)->frags[0].size)) {
-			put_page(skb_shinfo(skb)->frags[0].page);
+			skb_frag_unref(skb, 0);
 			memmove(skb_shinfo(skb)->frags,
 				skb_shinfo(skb)->frags + 1,
 				--skb_shinfo(skb)->nr_frags * sizeof(skb_frag_t));
@@ -3488,10 +3491,9 @@ void skb_gro_reset_offset(struct sk_buff *skb)
 	NAPI_GRO_CB(skb)->frag0_len = 0;
 
 	if (skb->mac_header == skb->tail &&
-	    !PageHighMem(skb_shinfo(skb)->frags[0].page)) {
+	    !PageHighMem(skb_frag_page(&skb_shinfo(skb)->frags[0]))) {
 		NAPI_GRO_CB(skb)->frag0 =
-			page_address(skb_shinfo(skb)->frags[0].page) +
-			skb_shinfo(skb)->frags[0].page_offset;
+			skb_frag_address(&skb_shinfo(skb)->frags[0]);
 		NAPI_GRO_CB(skb)->frag0_len = skb_shinfo(skb)->frags[0].size;
 	}
 }
diff --git a/net/core/kmap_skb.h b/net/core/kmap_skb.h
index 283c2b9..81e1ed7 100644
--- a/net/core/kmap_skb.h
+++ b/net/core/kmap_skb.h
@@ -7,7 +7,7 @@ static inline void *kmap_skb_frag(const skb_frag_t *frag)
 
 	local_bh_disable();
 #endif
-	return kmap_atomic(frag->page, KM_SKB_DATA_SOFTIRQ);
+	return kmap_atomic(skb_frag_page(frag), KM_SKB_DATA_SOFTIRQ);
 }
 
 static inline void kunmap_skb_frag(void *vaddr)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index e35a6fb..796044a 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2602,8 +2602,7 @@ static void pktgen_finalize_skb(struct pktgen_dev *pkt_dev, struct sk_buff *skb,
 				if (!pkt_dev->page)
 					break;
 			}
-			skb_shinfo(skb)->frags[i].page = pkt_dev->page;
-			get_page(pkt_dev->page);
+			skb_frag_set_page(skb, i, pkt_dev->page);
 			skb_shinfo(skb)->frags[i].page_offset = 0;
 			/*last fragment, fill rest of data*/
 			if (i == (frags - 1))
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 27002df..418f3435 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -326,7 +326,7 @@ static void skb_release_data(struct sk_buff *skb)
 		if (skb_shinfo(skb)->nr_frags) {
 			int i;
 			for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-				put_page(skb_shinfo(skb)->frags[i].page);
+				skb_frag_unref(skb, i);
 		}
 
 		/*
@@ -807,7 +807,7 @@ struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
-			get_page(skb_shinfo(n)->frags[i].page);
+			skb_frag_ref(skb, i);
 		}
 		skb_shinfo(n)->nr_frags = i;
 	}
@@ -899,7 +899,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 			skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-			get_page(skb_shinfo(skb)->frags[i].page);
+			skb_frag_ref(skb, i);
 
 		if (skb_has_frag_list(skb))
 			skb_clone_fraglist(skb);
@@ -1179,7 +1179,7 @@ drop_pages:
 		skb_shinfo(skb)->nr_frags = i;
 
 		for (; i < nfrags; i++)
-			put_page(skb_shinfo(skb)->frags[i].page);
+			skb_frag_unref(skb, i);
 
 		if (skb_has_frag_list(skb))
 			skb_drop_fraglist(skb);
@@ -1348,7 +1348,7 @@ pull_pages:
 	k = 0;
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 		if (skb_shinfo(skb)->frags[i].size <= eat) {
-			put_page(skb_shinfo(skb)->frags[i].page);
+			skb_frag_unref(skb, i);
 			eat -= skb_shinfo(skb)->frags[i].size;
 		} else {
 			skb_shinfo(skb)->frags[k] = skb_shinfo(skb)->frags[i];
@@ -1607,7 +1607,8 @@ static int __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
 	for (seg = 0; seg < skb_shinfo(skb)->nr_frags; seg++) {
 		const skb_frag_t *f = &skb_shinfo(skb)->frags[seg];
 
-		if (__splice_segment(f->page, f->page_offset, f->size,
+		if (__splice_segment(skb_frag_page(f),
+				     f->page_offset, f->size,
 				     offset, len, skb, spd, 0, sk, pipe))
 			return 1;
 	}
@@ -2152,7 +2153,7 @@ static inline void skb_split_no_header(struct sk_buff *skb,
 				 *    where splitting is expensive.
 				 * 2. Split is accurately. We make this.
 				 */
-				get_page(skb_shinfo(skb)->frags[i].page);
+				skb_frag_ref(skb, i);
 				skb_shinfo(skb1)->frags[0].page_offset += len - pos;
 				skb_shinfo(skb1)->frags[0].size -= len - pos;
 				skb_shinfo(skb)->frags[i].size	= len - pos;
@@ -2227,7 +2228,8 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
 	 * commit all, so that we don't have to undo partial changes
 	 */
 	if (!to ||
-	    !skb_can_coalesce(tgt, to, fragfrom->page, fragfrom->page_offset)) {
+	    !skb_can_coalesce(tgt, to, skb_frag_page(fragfrom),
+			      fragfrom->page_offset)) {
 		merge = -1;
 	} else {
 		merge = to - 1;
@@ -2274,7 +2276,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
 			to++;
 
 		} else {
-			get_page(fragfrom->page);
+			__skb_frag_ref(fragfrom);
 			fragto->page = fragfrom->page;
 			fragto->page_offset = fragfrom->page_offset;
 			fragto->size = todo;
@@ -2296,7 +2298,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
 		fragto = &skb_shinfo(tgt)->frags[merge];
 
 		fragto->size += fragfrom->size;
-		put_page(fragfrom->page);
+		__skb_frag_unref(fragfrom);
 	}
 
 	/* Reposition in the original skb */
@@ -2541,8 +2543,7 @@ int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb,
 		left = PAGE_SIZE - frag->page_offset;
 		copy = (length > left)? left : length;
 
-		ret = getfrag(from, (page_address(frag->page) +
-			    frag->page_offset + frag->size),
+		ret = getfrag(from, skb_frag_address(frag) + frag->size,
 			    offset, copy, 0, skb);
 		if (ret < 0)
 			return -EFAULT;
@@ -2694,7 +2695,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, u32 features)
 
 		while (pos < offset + len && i < nfrags) {
 			*frag = skb_shinfo(skb)->frags[i];
-			get_page(frag->page);
+			__skb_frag_ref(frag);
 			size = frag->size;
 
 			if (pos < offset) {
@@ -2917,7 +2918,7 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 
 			if (copy > len)
 				copy = len;
-			sg_set_page(&sg[elt], frag->page, copy,
+			sg_set_page(&sg[elt], skb_frag_page(frag), copy,
 					frag->page_offset+offset-start);
 			elt++;
 			if (!(len -= copy))
diff --git a/net/core/sock.c b/net/core/sock.c
index bc745d0..72b8f06 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1533,7 +1533,6 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 				skb_shinfo(skb)->nr_frags = npages;
 				for (i = 0; i < npages; i++) {
 					struct page *page;
-					skb_frag_t *frag;
 
 					page = alloc_pages(sk->sk_allocation, 0);
 					if (!page) {
@@ -1543,12 +1542,11 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 						goto failure;
 					}
 
-					frag = &skb_shinfo(skb)->frags[i];
-					frag->page = page;
-					frag->page_offset = 0;
-					frag->size = (data_len >= PAGE_SIZE ?
-						      PAGE_SIZE :
-						      data_len);
+					__skb_fill_page_desc(skb, i,
+							page, 0,
+							(data_len >= PAGE_SIZE ?
+							 PAGE_SIZE :
+							 data_len));
 					data_len -= PAGE_SIZE;
 				}
 
diff --git a/net/core/user_dma.c b/net/core/user_dma.c
index 25d717e..34e9664 100644
--- a/net/core/user_dma.c
+++ b/net/core/user_dma.c
@@ -78,7 +78,7 @@ int dma_skb_copy_datagram_iovec(struct dma_chan *chan,
 		copy = end - offset;
 		if (copy > 0) {
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-			struct page *page = frag->page;
+			struct page *page = skb_frag_page(frag);
 
 			if (copy > len)
 				copy = len;
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH 03/75] net: ipv4: convert to SKB frag APIs
From: Ian Campbell @ 2011-08-19 13:26 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Ian Campbell, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy
In-Reply-To: <1313760393.5010.356.camel@zakaz.uk.xensource.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
---
 net/ipv4/inet_lro.c   |    2 +-
 net/ipv4/ip_output.c  |    7 ++++---
 net/ipv4/tcp.c        |    3 ++-
 net/ipv4/tcp_output.c |    2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/inet_lro.c b/net/ipv4/inet_lro.c
index ef7ae60..8e6be5a 100644
--- a/net/ipv4/inet_lro.c
+++ b/net/ipv4/inet_lro.c
@@ -433,7 +433,7 @@ static struct sk_buff *__lro_proc_segment(struct net_lro_mgr *lro_mgr,
 	if (!lro_mgr->get_frag_header ||
 	    lro_mgr->get_frag_header(frags, (void *)&mac_hdr, (void *)&iph,
 				     (void *)&tcph, &flags, priv)) {
-		mac_hdr = page_address(frags->page) + frags->page_offset;
+		mac_hdr = skb_frag_address(frags);
 		goto out1;
 	}
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8c65633..ae3bb14 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -989,13 +989,13 @@ alloc_new_skb:
 			if (page && (left = PAGE_SIZE - off) > 0) {
 				if (copy >= left)
 					copy = left;
-				if (page != frag->page) {
+				if (page != skb_frag_page(frag)) {
 					if (i == MAX_SKB_FRAGS) {
 						err = -EMSGSIZE;
 						goto error;
 					}
-					get_page(page);
 					skb_fill_page_desc(skb, i, page, off, 0);
+					skb_frag_ref(skb, i);
 					frag = &skb_shinfo(skb)->frags[i];
 				}
 			} else if (i < MAX_SKB_FRAGS) {
@@ -1015,7 +1015,8 @@ alloc_new_skb:
 				err = -EMSGSIZE;
 				goto error;
 			}
-			if (getfrag(from, page_address(frag->page)+frag->page_offset+frag->size, offset, copy, skb->len, skb) < 0) {
+			if (getfrag(from, skb_frag_address(frag)+frag->size,
+				    offset, copy, skb->len, skb) < 0) {
 				err = -EFAULT;
 				goto error;
 			}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 46febca..5fe632c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3035,7 +3035,8 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
 
 	for (i = 0; i < shi->nr_frags; ++i) {
 		const struct skb_frag_struct *f = &shi->frags[i];
-		sg_set_page(&sg, f->page, f->size, f->page_offset);
+		struct page *page = skb_frag_page(f);
+		sg_set_page(&sg, page, f->size, f->page_offset);
 		if (crypto_hash_update(desc, &sg, f->size))
 			return 1;
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 882e0b0..0377c06 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1095,7 +1095,7 @@ static void __pskb_trim_head(struct sk_buff *skb, int len)
 	k = 0;
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 		if (skb_shinfo(skb)->frags[i].size <= eat) {
-			put_page(skb_shinfo(skb)->frags[i].page);
+			skb_frag_unref(skb, i);
 			eat -= skb_shinfo(skb)->frags[i].size;
 		} else {
 			skb_shinfo(skb)->frags[k] = skb_shinfo(skb)->frags[i];
-- 
1.7.2.5

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox