Netdev List
* Re: [PATCH V10 12/15] ptp: Added a brand new class driver for ptp clocks.
From: Richard Cochran @ 2011-02-11  8:15 UTC (permalink / raw)
  To: John Stultz
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner, Benjamin Herrenschmidt, H. Peter Anvin,
	Ingo Molnar, Mike Frysinger, Paul Mackerras, Russell King
In-Reply-To: <1296612031.3336.201.camel@work-vm>

On Tue, Feb 01, 2011 at 06:00:31PM -0800, John Stultz wrote:
> On Thu, 2011-01-27 at 11:59 +0100, Richard Cochran wrote:

> You may want to tweak the kconfig a bit here. If I don't have pps
> enabled, if I go into the "PTP clock support" page, I get an empty
> screen.
> 
> Similarly, it's not easy to discover what you need to
> enable to get the driver options to show up, as they depend on the drivers
> enabled in the networking device section.

Okay, I'll see what I can come up with.

> > +#define PTP_MAX_ALARMS 4
> > +#define PTP_MAX_CLOCKS (MAX_CLOCKS/2)
> 
> Why MAX_CLOCKS/2 ? Should this scale as MAX_CLOCKS is increased?
> Or do you really just mean 8?

This is just left over from when I thought the PHCs would use the
static clock IDs. I'll fix that.

> > +static void enqueue_external_timestamp(struct timestamp_event_queue *queue,
> > +				       struct ptp_clock_event *src)
> > +{
> > +	struct ptp_extts_event *dst;
> > +	u32 remainder;
> > +
> > +	dst = &queue->buf[queue->tail];
> > +
> > +	dst->index = src->index;
> > +	dst->t.sec = div_u64_rem(src->timestamp, 1000000000, &remainder);
> > +	dst->t.nsec = remainder;
> > +
> > +	if (!queue_free(queue))
> > +		queue->overflow++;
> > +
> > +	queue->tail = (queue->tail + 1) % PTP_MAX_TIMESTAMPS;
> > +}
> 
> So what is serializing access to the timestamp_event_queue here? I don't
> see any usage of tsevq_mux by the callers. Am I missing it? It looks
> like it's called from interrupt context, so do you really need a spinlock
> and not a mutex here?

The external timestamp FIFO is written only from interrupt context.
The readers are from user space via read() or sysfs. The readers must
hold a mutex. As you know, FIFOs with exactly one reader and one
writer don't need locking.

However, looking again at my own code (after spending a long time in
the posix clock stuff), I notice that, although FIFO overflow is
detected, I do not offer a way for user space to find this out or to
clear the error. I'll fix that.
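
For illustration only, here is a minimal user-space sketch of the
one-reader/one-writer FIFO idea described above. It is not the driver
code. The names (struct fifo, fifo_put, fifo_get, fifo_cnt) are
hypothetical, C11 atomics stand in for whatever ordering the kernel code
relies on, and, unlike the quoted enqueue function, it drops a new event
when the buffer is full instead of advancing the tail.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 128	/* same depth as PTP_MAX_TIMESTAMPS in the patch */

struct event {		/* hypothetical stand-in for ptp_extts_event */
	int64_t sec;
	uint32_t nsec;
	unsigned int index;
};

struct fifo {
	struct event buf[FIFO_SIZE];
	atomic_uint head;	/* advanced only by the reader */
	atomic_uint tail;	/* advanced only by the writer */
	atomic_uint overflow;	/* events dropped because the FIFO was full */
};

/* Number of queued events; one slot always stays empty. */
static unsigned int fifo_cnt(struct fifo *q)
{
	unsigned int head = atomic_load_explicit(&q->head, memory_order_acquire);
	unsigned int tail = atomic_load_explicit(&q->tail, memory_order_acquire);

	return (tail + FIFO_SIZE - head) % FIFO_SIZE;
}

/* Writer side: returns false and counts an overflow when full. */
static bool fifo_put(struct fifo *q, const struct event *ev)
{
	unsigned int tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
	unsigned int next = (tail + 1) % FIFO_SIZE;

	if (next == atomic_load_explicit(&q->head, memory_order_acquire)) {
		atomic_fetch_add_explicit(&q->overflow, 1, memory_order_relaxed);
		return false;
	}
	q->buf[tail] = *ev;
	/* Publish the slot only after its contents are written. */
	atomic_store_explicit(&q->tail, next, memory_order_release);
	return true;
}

/* Reader side: returns false when the FIFO is empty. */
static bool fifo_get(struct fifo *q, struct event *ev)
{
	unsigned int head = atomic_load_explicit(&q->head, memory_order_relaxed);

	if (head == atomic_load_explicit(&q->tail, memory_order_acquire))
		return false;
	*ev = q->buf[head];
	atomic_store_explicit(&q->head, (head + 1) % FIFO_SIZE,
			      memory_order_release);
	return true;
}
```

Because each index has exactly one writer, the two sides never need a
shared lock. A mutex is still needed if several readers may call
fifo_get() concurrently. The overflow counter is the piece that would
have to be exposed to user space.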

> > +#define PTP_MAX_TIMESTAMPS 128
> > +
> > +struct timestamp_event_queue {
> > +	struct ptp_extts_event buf[PTP_MAX_TIMESTAMPS];
> > +	int head;
> > +	int tail;
> > +	int overflow;
> > +};
> > +
> > +struct ptp_clock {
> > +	struct posix_clock clock;
> > +	struct device *dev;
> > +	struct ptp_clock_info *info;
> > +	dev_t devid;
> > +	int index; /* index into clocks.map */
> > +	struct pps_device *pps_source;
> > +	struct timestamp_event_queue tsevq; /* simple fifo for time stamps */
> > +	struct mutex tsevq_mux; /* one process at a time reading the fifo */
> > +	wait_queue_head_t tsev_wq;
> > +};
> > +
> > +static inline int queue_cnt(struct timestamp_event_queue *q)
> > +{
> > +	int cnt = q->tail - q->head;
> > +	return cnt < 0 ? PTP_MAX_TIMESTAMPS + cnt : cnt;
> > +}
> 
> This probably needs a comment as to its locking rules. Something like
> "Callers must hold tsevq_mux."

I'll add a comment explaining the readers' mutex and why no
reader/writer locking is needed.

> > +struct ptp_clock_time {
> > +	__s64 sec;  /* seconds */
> > +	__u32 nsec; /* nanoseconds */
> > +	__u32 reserved;
> > +};
> > +
> > +struct ptp_clock_caps {
> > +	int max_adj;   /* Maximum frequency adjustment in parts per billion. */
> > +	int n_alarm;   /* Number of programmable alarms. */
> > +	int n_ext_ts;  /* Number of external time stamp channels. */
> > +	int n_per_out; /* Number of programmable periodic signals. */
> > +	int pps;       /* Whether the clock supports a PPS callback. */
> > +};
> > +
> > +struct ptp_extts_request {
> > +	unsigned int index; /* Which channel to configure. */
> > +	unsigned int flags; /* Bit field for PTP_xxx flags. */
> > +};
> > +
> > +struct ptp_perout_request {
> > +	struct ptp_clock_time start;  /* Absolute start time. */
> > +	struct ptp_clock_time period; /* Desired period, zero means disable. */
> > +	unsigned int index;           /* Which channel to configure. */
> > +	unsigned int flags;           /* Reserved for future use. */
> > +};
> 
> Since these are all new API/ABI structures, would it be wise to pad
> these out a bit more? You've got a couple of reserved fields, which is
> good, but if you think we're going to expand this at all, we may want to
> have a bit more wiggle room. The timex structure had something like 12
> unused ints (which came in handy when the tai field was added).
> 
> Not sure what the wider opinion is on this though.

Okay, I'll pad them a bit more.

However, I don't intend to ever offer more than simple functionality
here. A general purpose DAQ API is not so easy to define (look at
comedi, for example). Also, the capabilities of the current crop of
clocks vary quite a bit.

So, I think the PHC should offer a PPS, simple periodic outputs, and
simple external timestamping. If someone wants more complex DAQ-like
functions, then they can offer that through comedi or whatever.
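
On the padding point, a rough sketch of what a padded request structure
and its validation could look like. The struct and function names below
are hypothetical (this is not the final PTP ABI), and the amount of
padding (four extra words) is only an assumption for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct example_clock_time {
	int64_t sec;		/* seconds */
	uint32_t nsec;		/* nanoseconds */
	uint32_t reserved;	/* must be zero */
};

struct example_perout_request {
	struct example_clock_time start;	/* absolute start time */
	struct example_clock_time period;	/* zero means disable */
	unsigned int index;			/* which channel to configure */
	unsigned int flags;			/* reserved for future use */
	unsigned int rsv[4];			/* padding for future fields */
};

/*
 * Reject requests with non-zero reserved fields, so that the padding
 * can be given a meaning later without breaking old binaries.
 * Returns 1 if the request is acceptable, 0 otherwise.
 */
static int example_request_valid(const struct example_perout_request *req)
{
	size_t i;

	if (req->start.reserved || req->period.reserved)
		return 0;
	for (i = 0; i < sizeof(req->rsv) / sizeof(req->rsv[0]); i++)
		if (req->rsv[i])
			return 0;
	return 1;
}
```

Checking the reserved words at the entry point is what makes the padding
usable later: a field can only be assigned a meaning if old callers are
known to have passed zero in it.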

> > +struct ptp_extts_event {
> > +	struct ptp_clock_time t; /* Time event occurred. */
> > +	unsigned int index;      /* Which channel produced the event. */
> > +};
> 
> Same padding suggestion for this as well.

Okay, and thanks for the review,

Richard


* Re: [PATCH net-2.6] xfrm: avoid possible oopse in xfrm_alloc_dst
From: David Miller @ 2011-02-11  7:08 UTC (permalink / raw)
  To: shimoda.hiroaki; +Cc: netdev, timo.teras, herbert
In-Reply-To: <1297406495.18018.76.camel@vega>

From: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Date: Fri, 11 Feb 2011 15:41:35 +0900

> Commit 80c802f3073e84 (xfrm: cache bundles instead of policies for
> outgoing flows) introduced a possible oops when dst_alloc returns NULL.
> 
> Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>

Good catch, applied.
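
For readers unfamiliar with the pattern, a user-space sketch of the bug
class being fixed: once an allocation result has been converted into an
error pointer, it must not be dereferenced. The helpers below
(err_ptr, ptr_err, is_err, struct bundle, bundle_alloc) are hypothetical
stand-ins written for this example, modeled loosely on the kernel's
ERR_PTR()/IS_ERR() idiom.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO_EX 4095	/* assumed size of the error-pointer range */

static inline void *err_ptr(intptr_t error)
{
	return (void *)error;
}

static inline intptr_t ptr_err(const void *p)
{
	return (intptr_t)p;
}

/* True if p lies in the top MAX_ERRNO_EX addresses, i.e. encodes -errno. */
static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_EX;
}

struct bundle {
	const char *ops;
};

/*
 * Same ordering as the fix: test the raw allocation result first,
 * initialize the object only when it is valid, and convert NULL into
 * an error pointer last.
 */
static struct bundle *bundle_alloc(int simulate_failure)
{
	struct bundle *b = simulate_failure ? NULL : calloc(1, sizeof(*b));

	if (b)
		b->ops = "bundle_ops";
	else
		b = err_ptr(-ENOBUFS);
	return b;
}
```

Callers then test the result with is_err() before touching any field,
and recover the error code with ptr_err().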


* Re: GRO/GSO hiding PMTU?
From: David Miller @ 2011-02-11  7:07 UTC (permalink / raw)
  To: herbert; +Cc: netdev, netfilter-devel
In-Reply-To: <20110211063753.GA29940@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 11 Feb 2011 17:37:53 +1100

> What I wanted to do if I ever get enough time to work on this is
> to record the transport header length in a gso_hlen field so we
> can fix this properly.
> 
> We currently have a useless gso_segs field that only has one or
> two users that don't even need it.  We could easily get rid of it
> and use that space for gso_hlen instead.
> 
> The gso_hlen field only needs to be filled in at the few spots
> that generate GSO packets, i.e.,
> 
> 1) TCP
> 2) Virt backends like tun.c
> 3) GRO

Yep, that's a good idea.

And even if we needed to add one more u32 to skb_shared_info()
that's still sort-of "free" because of SLAB slack space.

I'll look into doing this.

Thanks!


* Re: GRO/GSO hiding PMTU?
From: David Miller @ 2011-02-11  7:06 UTC (permalink / raw)
  To: herbert; +Cc: netdev, netfilter-devel
In-Reply-To: <20110211064138.GB29940@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 11 Feb 2011 17:41:38 +1100

> I think we need to do some length verifications here because for
> a malicious guest-generated packet the TCP header may not be present.

Indeed, good catch.


* Re: netfilter is not a filesystem
From: Richard Cochran @ 2011-02-11  6:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: netdev, linux-kernel
In-Reply-To: <20110210141119.56d789fc.akpm@linux-foundation.org>

On Thu, Feb 10, 2011 at 02:11:19PM -0800, Andrew Morton wrote:
> On Thu, 10 Feb 2011 21:55:26 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=28862
> > 
> >            Summary: /proc/net/ip_conntrack: no space left on device
> >                     systematically
> 
> This is why I'm forever nagging people to not just grab some errno
> because its name happens to sound similar to the error you just detected.

Today my brain has thrown an -EMIXEDMESSAGES:

   https://lkml.org/lkml/2011/2/10/172

Sorry, couldn't resist,

Richard


* Re: [PATCH 2/2] network: Allow af_packet to transmit +4 bytes for VLAN packets.
From: Eric Dumazet @ 2011-02-11  6:57 UTC (permalink / raw)
  To: greearb; +Cc: netdev
In-Reply-To: <1297375149-18458-2-git-send-email-greearb@candelatech.com>

On Thursday, 10 February 2011 at 13:59 -0800, greearb@candelatech.com
wrote:
> From: Ben Greear <greearb@candelatech.com>
> 
> This allows user-space to send a '1500' MTU VLAN packet on a
> 1500 MTU ethernet frame.  The extra 4 bytes of a VLAN header are
> not usually charged against the MTU when other parts of the
> network stack are transmitting VLANs...
> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
> :100644 100644 91cb1d7... ef7f378... M	net/packet/af_packet.c
>  net/packet/af_packet.c |   31 +++++++++++++++++++++++++++++--
>  1 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 91cb1d7..ef7f378 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -466,7 +466,7 @@ retry:
>  	 */
>  
>  	err = -EMSGSIZE;
> -	if (len > dev->mtu + dev->hard_header_len)
> +	if (len > dev->mtu + dev->hard_header_len + VLAN_HLEN)
>  		goto out_unlock;
>  
>  	if (!skb) {
> @@ -497,6 +497,19 @@ retry:
>  		goto retry;
>  	}
>  
> +	if (len > (dev->mtu + dev->hard_header_len)) {
> +		/* Earlier code assumed this would be a VLAN pkt,
> +		 * double-check this now that we have the actual
> +		 * packet in hand.
> +		 */
> +		struct ethhdr *ehdr;
> +		skb_reset_mac_header(skb);
> +		ehdr = eth_hdr(skb);
> +		if (ehdr->h_proto != htons(ETH_P_8021Q)) {
> +			err = -EMSGSIZE;
> +			goto out_unlock;

This would leak skb.

> +		}
> +	}
>  
>  	skb->protocol = proto;
>  	skb->dev = dev;
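
The length rule the patch is aiming for can be sketched in user space as
follows. This is only an illustration of the check, not the af_packet
code: the function name and the _EX constants are made up for the
example, and it assumes a plain Ethernet header with the EtherType at
bytes 12 and 13.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_EX	14	/* Ethernet header length */
#define VLAN_HLEN_EX	4	/* extra bytes added by an 802.1Q tag */
#define ETH_P_8021Q_EX	0x8100	/* 802.1Q EtherType */

/*
 * Returns 1 if a frame of 'len' bytes may be sent on a device with the
 * given MTU. Frames up to mtu + header are always fine; up to four more
 * bytes are allowed only when the frame really carries a VLAN tag.
 */
static int frame_len_ok(const uint8_t *frame, size_t len, size_t mtu)
{
	size_t limit = mtu + ETH_HLEN_EX;
	uint16_t proto;

	if (len <= limit)
		return 1;
	if (len > limit + VLAN_HLEN_EX)
		return 0;
	/* EtherType is stored big-endian at offset 12. */
	proto = (uint16_t)((frame[12] << 8) | frame[13]);
	return proto == ETH_P_8021Q_EX;
}
```

With an MTU of 1500, a 1518-byte frame passes only if its EtherType is
0x8100, and anything longer than 1518 bytes is rejected either way.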




* Re: GRO/GSO hiding PMTU?
From: Herbert Xu @ 2011-02-11  6:41 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, netfilter-devel
In-Reply-To: <20110210.223544.189709102.davem@davemloft.net>

On Thu, Feb 10, 2011 at 10:35:44PM -0800, David Miller wrote:
>
> Herbert how does this look for now?

This should work.

> Of course, we need to do something similar in all kinds of other spots.
> 
> Even places like bridging :-/

Yeah every place that does skb->len and skb_is_gso checks will need
this.

> +static bool send_frag_needed(struct sk_buff *skb, struct rtable *rt)
> +{
> +	unsigned int len_to_check = skb->len;
> +
> +	if (skb_is_gso(skb)) {
> +		unsigned int gso_size = skb_shinfo(skb)->gso_size;
> +		unsigned int ihl = ip_hdr(skb)->ihl * 4;
> +		struct tcphdr th_stack, *th;
> +
> +		if (WARN_ON_ONCE(ip_hdr(skb)->protocol != IPPROTO_TCP))
> +			return false;
> +
> +		th = skb_header_pointer(skb, ihl, sizeof(th_stack),
> +					&th_stack);
> +		if (!th)
> +			return false;
> +
> +		len_to_check = gso_size + ihl + (th->doff * 4);

I think we need to do some length verifications here because for
a malicious guest-generated packet the TCP header may not be present.
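
A user-space sketch of the kind of validation meant here, operating on a
raw byte buffer instead of an skb. The function name and its interface
are hypothetical, and it assumes IPv4 carrying TCP. It computes the
per-segment length (gso_size + IP header + TCP header) only after
checking that both headers are plausible and fully present.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 'ip' points at the IPv4 header, 'avail' is the number of readable
 * bytes. On success, *seg_len receives the length to compare against
 * the path MTU.
 */
static bool gso_seg_len(unsigned int gso_size, const uint8_t *ip,
			size_t avail, unsigned int *seg_len)
{
	unsigned int ihl, doff;

	if (avail < 20 || (ip[0] >> 4) != 4)
		return false;		/* too short or not IPv4 */
	ihl = (ip[0] & 0x0f) * 4;
	if (ihl < 20)
		return false;		/* bogus IP header length */
	if (ip[9] != 6)
		return false;		/* protocol is not TCP */
	if (avail < ihl + 20)
		return false;		/* TCP header not fully present */
	doff = (ip[ihl + 12] >> 4) * 4;
	if (doff < 20)
		return false;		/* bogus TCP data offset */

	*seg_len = gso_size + ihl + doff;
	return true;
}
```

For example, an MSS of 1448 with a 20-byte IP header and a 32-byte TCP
header (timestamps enabled) gives 1448 + 20 + 32 = 1500.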

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* [PATCH net-2.6] xfrm: avoid possible oopse in xfrm_alloc_dst
From: Hiroaki SHIMODA @ 2011-02-11  6:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, timo.teras, herbert

Commit 80c802f3073e84 (xfrm: cache bundles instead of policies for
outgoing flows) introduced a possible oops when dst_alloc returns NULL.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
---
 net/xfrm/xfrm_policy.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 8b3ef40..6459588 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1340,10 +1340,13 @@ static inline struct xfrm_dst *xfrm_alloc_dst(struct net *net, int family)
 	default:
 		BUG();
 	}
-	xdst = dst_alloc(dst_ops) ?: ERR_PTR(-ENOBUFS);
+	xdst = dst_alloc(dst_ops);
 	xfrm_policy_put_afinfo(afinfo);
 
-	xdst->flo.ops = &xfrm_bundle_fc_ops;
+	if (likely(xdst))
+		xdst->flo.ops = &xfrm_bundle_fc_ops;
+	else
+		xdst = ERR_PTR(-ENOBUFS);
 
 	return xdst;
 }



* Re: GRO/GSO hiding PMTU?
From: Herbert Xu @ 2011-02-11  6:37 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, netfilter-devel
In-Reply-To: <20110210.222216.104050992.davem@davemloft.net>

On Thu, Feb 10, 2011 at 10:22:16PM -0800, David Miller wrote:
>
> I gave it a shot but it isn't easy.  We can figure out the length of
> the IP headers just fine, but the rest of the value we need to add
> to the MSS (the TCP header length) is transport specific which kind
> of implies a transport dependent gso proto op of some sort.

That's pretty much where I gave up :)

> Or we just hack it, admit that only TCP creates GSO packets, and
> directly check for TCP protocol and then inspect the TCP header
> length :-)

Sure we can do that for now.

What I wanted to do if I ever get enough time to work on this is
to record the transport header length in a gso_hlen field so we
can fix this properly.

We currently have a useless gso_segs field that only has one or
two users that don't even need it.  We could easily get rid of it
and use that space for gso_hlen instead.

The gso_hlen field only needs to be filled in at the few spots
that generate GSO packets, i.e.,

1) TCP
2) Virt backends like tun.c
3) GRO

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: GRO/GSO hiding PMTU?
From: David Miller @ 2011-02-11  6:35 UTC (permalink / raw)
  To: herbert; +Cc: netdev, netfilter-devel
In-Reply-To: <20110210.222216.104050992.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 10 Feb 2011 22:22:16 -0800 (PST)

> I gave it a shot but it isn't easy.  We can figure out the length of
> the IP headers just fine, but the rest of the value we need to add
> to the MSS (the TCP header length) is transport specific which kind
> of implies a transport dependent gso proto op of some sort.
> 
> Or we just hack it, admit that only TCP creates GSO packets, and
> directly check for TCP protocol and then inspect the TCP header
> length :-)

Herbert how does this look for now?

Of course, we need to do something similar in all kinds of other spots.

Even places like bridging :-/

--------------------
ipv4: Check MSS properly in ip_forward() GSO check.

When we forward packets we decide whether we should send
a frag-needed ICMP back based upon the skb length.

But if this is a GSO packet, we wholesale elide the length
check entirely.

This is wrong, we do have to check things.  Except that the
length validation in this case is not straightforward.

We have to take the gso_size (which is the MSS) and add in
the IP and TCP header to arrive at the length we should use
to compare against the MTU.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 99461f0..7449890 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -51,6 +51,36 @@ static int ip_forward_finish(struct sk_buff *skb)
 	return dst_output(skb);
 }
 
+static bool send_frag_needed(struct sk_buff *skb, struct rtable *rt)
+{
+	unsigned int len_to_check = skb->len;
+
+	if (skb_is_gso(skb)) {
+		unsigned int gso_size = skb_shinfo(skb)->gso_size;
+		unsigned int ihl = ip_hdr(skb)->ihl * 4;
+		struct tcphdr th_stack, *th;
+
+		if (WARN_ON_ONCE(ip_hdr(skb)->protocol != IPPROTO_TCP))
+			return false;
+
+		th = skb_header_pointer(skb, ihl, sizeof(th_stack),
+					&th_stack);
+		if (!th)
+			return false;
+
+		len_to_check = gso_size + ihl + (th->doff * 4);
+	}
+
+	if (len_to_check <= dst_mtu(&rt->dst))
+		return false;
+	if (!(ip_hdr(skb)->frag_off & htons(IP_DF)))
+		return false;
+	if (skb->local_df)
+		return false;
+
+	return true;
+}
+
 int ip_forward(struct sk_buff *skb)
 {
 	struct iphdr *iph;	/* Our header */
@@ -87,8 +117,7 @@ int ip_forward(struct sk_buff *skb)
 	if (opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
 		goto sr_failed;
 
-	if (unlikely(skb->len > dst_mtu(&rt->dst) && !skb_is_gso(skb) &&
-		     (ip_hdr(skb)->frag_off & htons(IP_DF))) && !skb->local_df) {
+	if (unlikely(send_frag_needed(skb, rt))) {
 		IP_INC_STATS(dev_net(rt->dst.dev), IPSTATS_MIB_FRAGFAILS);
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 			  htonl(dst_mtu(&rt->dst)));


* Re: GRO/GSO hiding PMTU?
From: David Miller @ 2011-02-11  6:22 UTC (permalink / raw)
  To: herbert; +Cc: netdev, netfilter-devel
In-Reply-To: <20110210235022.GA25293@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 11 Feb 2011 10:50:22 +1100

> On Thu, Feb 10, 2011 at 02:55:55PM -0800, David Miller wrote:
>>
>> I suspect that the packet arrives on eth1, accumulates into GRO, and
>> is thus marked as GSO as well, then GSO/TSO on output to eth0 is
>> re-segmenting things transparently, and we're not getting the ICMP
>> frag-needed message and the packet drop because of the skb_is_gso()
>> check in ip_forward().
>> 
>> 	if (unlikely(skb->len > dst_mtu(&rt->dst) && !skb_is_gso(skb) &&
>> 		     (ip_hdr(skb)->frag_off & htons(IP_DF))) && !skb->local_df) {
>> 		IP_INC_STATS(dev_net(rt->dst.dev), IPSTATS_MIB_FRAGFAILS);
>> 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
>> 			  htonl(dst_mtu(&rt->dst)));
>> 		goto drop;
>> 	}
>> 
>> So if that's what is happening, that's cute, but I think we need to
>> fix this :-)
> 
> Yes this is a known problem and we do need to fix this, even if
> it doesn't appear to be the cause of your immediate issue :)

I gave it a shot but it isn't easy.  We can figure out the length of
the IP headers just fine, but the rest of the value we need to add
to the MSS (the TCP header length) is transport specific which kind
of implies a transport dependent gso proto op of some sort.

Or we just hack it, admit that only TCP creates GSO packets, and
directly check for TCP protocol and then inspect the TCP header
length :-)


* Re: [PATCH] net: fix ifenslave build flags
From: David Miller @ 2011-02-11  4:04 UTC (permalink / raw)
  To: randy.dunlap; +Cc: netdev, alexey.salmin
In-Reply-To: <4D54826D.3060406@oracle.com>

From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Thu, 10 Feb 2011 16:27:25 -0800

> From: Randy Dunlap <randy.dunlap@oracle.com>
> 
> -I (include path) should be specified for host builds.
> This one was overlooked somehow.  Fixes
> https://bugzilla.kernel.org/show_bug.cgi?id=25902
> 
> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
> Reported-by: Alexey Salmin <alexey.salmin@gmail.com>

I'll apply this, thanks Randy.


* Re: [PATCH] tg3: Expand 5719 workaround
From: David Miller @ 2011-02-11  4:04 UTC (permalink / raw)
  To: mcarlson; +Cc: netdev
In-Reply-To: <1297389447-5324-1-git-send-email-mcarlson@broadcom.com>

From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Thu, 10 Feb 2011 17:57:27 -0800

> As a precautionary measure, expand the fix submitted in commit
> 4d163b75e979833979cc401ae433cb1d7743d57e entitled "tg3: Fix 5719 A0 tx
> completion bug" to apply to all 5719 revisions.
> 
> Signed-off-by: Matt Carlson <mcarlson@broadcom.com>

Applied to net-next-2.6, since that is where this applied cleanly.

If you want this in net-2.6 too, just submit a patch which applies
cleanly there.

Thanks.


* Re: [PATCH] Add JMEMCMP to Berkeley Packet Filters
From: Ian Molton @ 2011-02-11  2:02 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: Eric Dumazet, netdev, rdunlap, isdn, paulus, arnd, davem, herbert,
	ebiederm, alban.crequy, astanciu
In-Reply-To: <AANLkTimOzvMx0FUaqxaa2QVV2FzsuhP8gQVX_CFmhiNR@mail.gmail.com>

On 10/02/11 15:27, Octavian Purdila wrote:
> On Thu, Feb 10, 2011 at 3:35 PM, Ian Molton<ian.molton@collabora.co.uk>  wrote:
>> On 10/02/11 13:24, Eric Dumazet wrote:
>>
>> Hi!
>>
>> Thanks for reviewing! :)
>>
>>>> * Can sk_run_filter() be called in a context where kmalloc(GFP_KERNEL) is
>>>>    not allowed (I think not)
>>>
>>> You cannot use GFP_KERNEL in sk_run_filter() : We run in {soft}irq mode,
>>> in input path.
>>
>> Ok, that can be sorted.
>>
>>>> * Data section allocated with second call to sock_kmalloc().
>>>> * Should the patch be broken into two - one to add the data uploading,
>>>>    one to add the JMEMCMP insn. ?
>>>
>>> May I ask why it is needed at all ?
>>
>> So we can match strings in packet filters... I don't think I understand the
>> question...
>>
>
> Adding a data section (some sort of persistent memory storage that the
> filter can access) is also useful for creating capture triggers, e.g.
> starting capturing after a marker packet has arrived.

Nice to see that people are thinking of more use-cases.

Eric, I think I understand what you meant now...

Our use case is experimental for now, so I wanted to see if other people 
would find this useful, as adding an experimental feature that is never 
used seems pointless.

In our case, we need to match strings in D-Bus packets. If the packet is
not intended for the recipient, it gets dropped, thus avoiding a
pointless context switch.
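
As a rough illustration of the kind of comparison involved, here is a
bounds-checked "does the packet contain this string at this offset"
helper in plain C. It is not the proposed JMEMCMP instruction or its
semantics. The function name and interface are invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if the pat_len bytes at pkt + off equal pat.
 * Any comparison that would run past the end of the packet fails
 * rather than reading out of bounds.
 */
static bool pkt_match_at(const unsigned char *pkt, size_t pkt_len,
			 size_t off, const void *pat, size_t pat_len)
{
	if (off > pkt_len || pat_len > pkt_len - off)
		return false;
	return memcmp(pkt + off, pat, pat_len) == 0;
}
```

A filter built on such a test can reject a message from its header
fields alone, before the receiving process is ever woken up.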

-Ian


* [PATCH] tg3: Expand 5719 workaround
From: Matt Carlson @ 2011-02-11  1:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, mcarlson

As a precautionary measure, expand the fix submitted in commit
4d163b75e979833979cc401ae433cb1d7743d57e entitled "tg3: Fix 5719 A0 tx
completion bug" to apply to all 5719 revisions.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 drivers/net/tg3.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index cc06952..ecb3eb0 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -13318,7 +13318,7 @@ static int __devinit tg3_get_invariants(struct tg3 *tp)
 	}
 
 	/* Determine TSO capabilities */
-	if (tp->pci_chip_rev_id == CHIPREV_ID_5719_A0)
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5719)
 		; /* Do nothing. HW bug. */
 	else if (tp->tg3_flags3 & TG3_FLG3_5717_PLUS)
 		tp->tg3_flags2 |= TG3_FLG2_HW_TSO_3;
@@ -13372,7 +13372,7 @@ static int __devinit tg3_get_invariants(struct tg3 *tp)
 	}
 
 	if ((tp->tg3_flags3 & TG3_FLG3_5717_PLUS) &&
-	    tp->pci_chip_rev_id != CHIPREV_ID_5719_A0)
+	    GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5719)
 		tp->tg3_flags3 |= TG3_FLG3_USE_JUMBO_BDFLAG;
 
 	if (!(tp->tg3_flags2 & TG3_FLG2_5705_PLUS) ||
-- 
1.7.3.4




* [ethtool PATCH 2/2] Add RX packet classification interface
From: Alexander Duyck @ 2011-02-11  1:18 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev
In-Reply-To: <20110211010806.23554.98333.stgit@gitlad.jf.intel.com>

From: Santwona Behera <santwona.behera@sun.com>

This patch was originally introduced as:
  [PATCH 1/3] [ethtool] Add rx pkt classification interface
  Signed-off-by: Santwona Behera <santwona.behera@sun.com>
  http://patchwork.ozlabs.org/patch/23223/

I have updated it to address a number of issues.  As a result I removed the
local caching of rules due to the fact that there were memory leaks in this
code and the rule manager would consume over 1Mb of space for an 8K table
when all that was needed was 1K in order to store which rules were active
and which were not.

In addition I dropped the use of regions as there were multiple issues found
including the fact that the regions were not properly expanding beyond 2
and the fact that the regions required reading all of the rules in order to
correctly expand beyond 2.  By dropping the regions from the rule manager
it is possible to write a much cleaner interface leaving region management
to be done by either the driver or by external management scripts.

I also added an ethtool bitops interface to allow for simple bit set and
test activities since the rule manager can most efficiently store the list
of active rules via a bitmap.
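
To make the size argument concrete, here is a small stand-alone sketch
of a rule bitmap. It is separate from the ethtool-bitops.h added in this
patch: the ex_ and EX_ names are hypothetical, and it derives the word
size from sizeof() and CHAR_BIT.

```c
#include <limits.h>
#include <stddef.h>

#define EX_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define EX_BITS_TO_LONGS(nr)	(((nr) + EX_BITS_PER_LONG - 1) / EX_BITS_PER_LONG)

/* Mark rule 'nr' as active. */
static void ex_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / EX_BITS_PER_LONG] |= 1UL << (nr % EX_BITS_PER_LONG);
}

/* Mark rule 'nr' as inactive. */
static void ex_clear_bit(unsigned int nr, unsigned long *map)
{
	map[nr / EX_BITS_PER_LONG] &= ~(1UL << (nr % EX_BITS_PER_LONG));
}

/* Returns non-zero if rule 'nr' is active. */
static int ex_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / EX_BITS_PER_LONG] >> (nr % EX_BITS_PER_LONG)) & 1UL;
}
```

An 8K-entry table needs EX_BITS_TO_LONGS(8192) words, which is
8192 / 8 = 1024 bytes, matching the 1K figure above.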

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 Makefile.am      |    3 
 ethtool-bitops.h |   25 ++
 ethtool-util.h   |   13 +
 ethtool.8.in     |  101 ++++++-
 ethtool.c        |  144 +++++++++-
 rxclass.c        |  809 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1077 insertions(+), 18 deletions(-)
 create mode 100644 ethtool-bitops.h
 create mode 100644 rxclass.c

diff --git a/Makefile.am b/Makefile.am
index a0d2116..0262c31 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -8,7 +8,8 @@ ethtool_SOURCES = ethtool.c ethtool-copy.h ethtool-util.h	\
 		  amd8111e.c de2104x.c e100.c e1000.c igb.c	\
 		  fec_8xx.c ibm_emac.c ixgb.c ixgbe.c natsemi.c	\
 		  pcnet32.c realtek.c tg3.c marvell.c vioc.c	\
-		  smsc911x.c at76c50x-usb.c sfc.c stmmac.c
+		  smsc911x.c at76c50x-usb.c sfc.c stmmac.c	\
+		  rxclass.c
 
 dist-hook:
 	cp $(top_srcdir)/ethtool.spec $(distdir)
diff --git a/ethtool-bitops.h b/ethtool-bitops.h
new file mode 100644
index 0000000..7101056
--- /dev/null
+++ b/ethtool-bitops.h
@@ -0,0 +1,25 @@
+#ifndef ETHTOOL_BITOPS_H__
+#define ETHTOOL_BITOPS_H__
+
+#define BITS_PER_LONG		__WORDSIZE
+#define BITS_PER_BYTE		8
+#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
+#define BITS_TO_LONGS(nr)	DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
+
+static inline void set_bit(int nr, unsigned long *addr)
+{
+	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
+}
+
+static inline void clear_bit(int nr, unsigned long *addr)
+{
+	addr[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
+}
+
+static __always_inline int test_bit(unsigned int nr, const unsigned long *addr)
+{
+	return ((1UL << (nr % BITS_PER_LONG)) &
+		(((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0UL;
+}
+
+#endif
diff --git a/ethtool-util.h b/ethtool-util.h
index f053028..e9300e2 100644
--- a/ethtool-util.h
+++ b/ethtool-util.h
@@ -103,4 +103,17 @@ int sfc_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs);
 int st_mac100_dump_regs(struct ethtool_drvinfo *info,
 			struct ethtool_regs *regs);
 int st_gmac_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs);
+
+/* Rx flow classification */
+#include <sys/ioctl.h>
+#include <net/if.h>
+
+int rxclass_parse_ruleopts(char **optstr, int opt_cnt,
+			   struct ethtool_rx_flow_spec *fsp, __u8 *loc_valid);
+int rxclass_rule_getall(int fd, struct ifreq *ifr);
+int rxclass_rule_get(int fd, struct ifreq *ifr, __u32 loc);
+int rxclass_rule_ins(int fd, struct ifreq *ifr,
+		     struct ethtool_rx_flow_spec *fsp, __u8 loc_valid);
+int rxclass_rule_del(int fd, struct ifreq *ifr, __u32 loc);
+
 #endif
diff --git a/ethtool.8.in b/ethtool.8.in
index 133825b..c183a3d 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -40,21 +40,36 @@
 [\\fB\\$1\\fP\ \\fIN\\fP]
 ..
 .\"
+.\"	.BM - same as above but has a mask field for format "[value N [m N]]"
+.\"
+.de BM
+[\\fB\\$1\\fP\ \\fIN\\fP\ [\\fBm\\fP\ \\fIN\\fP]]
+..
+.\"
 .\"	\(*MA - mac address
 .\"
 .ds MA \fIxx\fP\fB:\fP\fIyy\fP\fB:\fP\fIzz\fP\fB:\fP\fIaa\fP\fB:\fP\fIbb\fP\fB:\fP\fIcc\fP
 .\"
+.\"	\(*PA - IP address
+.\"
+.ds PA \fIx\fP\fB.\fP\fIx\fP\fB.\fP\fIx\fP\fB.\fP\fIx\fP
+.\"
 .\"	\(*WO - wol flags
 .\"
 .ds WO \fBp\fP|\fBu\fP|\fBm\fP|\fBb\fP|\fBa\fP|\fBg\fP|\fBs\fP|\fBd\fP...
 .\"
 .\"	\(*FL - flow type values
 .\"
-.ds FL \fBtcp4\fP|\fBudp4\fP|\fBah4\fP|\fBsctp4\fP|\fBtcp6\fP|\fBudp6\fP|\fBah6\fP|\fBsctp6\fP
+.ds FL \fBtcp4\fP|\fBudp4\fP|\fBah4\fP|\fBesp4\fP|\fBsctp4\fP|\fBtcp6\fP|\fBudp6\fP|\fBah6\fP|\fBesp6\fP|\fBsctp6\fP
 .\"
 .\"	\(*HO - hash options
 .\"
 .ds HO \fBm\fP|\fBv\fP|\fBt\fP|\fBs\fP|\fBd\fP|\fBf\fP|\fBn\fP|\fBr\fP...
+.\"
+.\"	\(*L4 - L4 proto options
+.\"
+.ds L4 \fBtcp\fP|\fBudp\fP|\fBsctp\fP|\fBah\fP|\fBesp\fP|\fIN\fP
+.\"
 .\" Start URL.
 .de UR
 .  ds m1 \\$1\"
@@ -224,11 +239,27 @@ ethtool \- query or control network driver and hardware settings
 .B ethtool \-n
 .I ethX
 .RB [ rx-flow-hash \ \*(FL]
+.RB [ rx-rings ]
+.RB [ rx-class-rule-all ]
+.RB [ rx-class-rule
+.IR N ]
 
 .B ethtool \-N
 .I ethX
-.RB [ rx-flow-hash \ \*(FL
-.RB \ \*(HO]
+.RB [ rx-flow-hash \ \*(FL \  \*(HO]
+.BN rx-class-rule-del
+.RB [ rx-class-rule-add \  ip4 | ip6 \ \*(L4
+.RB [ sip \ \*(PA\ [ m \ \*(PA]]
+.RB [ dip \ \*(PA\ [ m \ \*(PA]]
+.BM tos
+.BM sport
+.BM dport
+.BM spi
+.RB [ ring
+.IR N |
+.BR drop ]
+.RB [ loc
+.IR N ]]
 
 .B ethtool \-x|\-\-show\-rxfh\-indir
 .I ethX
@@ -624,6 +655,15 @@ Retrieves the hash options for the specified network traffic type.
 .PD
 .RE
 .TP
+.B rx-rings
+Retrieves the number of RX rings available for this interface.
+.TP
+.B rx-class-rule-all
+Retrieves all the RX classification rules programmed for this interface.
+.TP
+.BI rx-class-rule \ N
+Retrieves the RX classification rule with the given ID.
+.TP
 .B \-N \-\-config-nfc
 Configures the receive network flow classification.
 .TP
@@ -654,10 +694,63 @@ Hash on bytes 0 and 1 of the Layer 4 header of the rx packet.
 Hash on bytes 2 and 3 of the Layer 4 header of the rx packet.
 .TP 3
 .B r
-Discard all packets of this flow type. When this option is set, all other options are ignored.
+Discard all packets of this flow type. When this option is set, all
+other options are ignored.
 .PD
 .RE
 .TP
+.BI rx-class-rule-del \ N
+Deletes the RX classification rule with the given ID.
+.TP
+.BR rx-class-rule-add
+Adds an RX packet classification rule.
+.TP
+.A1 ip4 ip6
+Select IP version for the rule - IPv4 or IPv6
+.TP
+.RB \*(L4
+Select the L4 protocol for the rule. An integer value for a user defined
+protocol can be specified.
+.TP
+.BR sip \ \*(PA\ [ m \ \*(PA]
+Specify the source IP address of the incoming packet to
+match along with an optional mask.
+.TP
+.BR dip \ \*(PA\ [ m \ \*(PA]
+Specify the destination IP address of the incoming packet to
+match along with an optional mask.
+.TP
+.BI tos \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Specify the value of the Type of Service field in the incoming packet to
+match along with an optional mask.
+.TP
+.BI sport \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Specify the value of the source port field (applicable to
+TCP/UDP packets) in the incoming packet to match along with an
+optional mask.
+.TP
+.BI dport \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Specify the value of the destination port field (applicable to
+TCP/UDP packets) in the incoming packet to match along with an
+optional mask.
+.TP
+.BI spi \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Specify the value of the SPI field (applicable to
+AH/ESP packets) in the incoming packet to match along with an
+optional mask.
+.TP
+.BI ring \ N
+.B |
+.BR drop
+Specify the RX ring index to which a packet matching this
+rule should be steered, or specify if the matching packet
+should be dropped.
+.TP
+.BI loc \ \ \ N
+Specify the location/ID to insert the rule. This will overwrite
+any rule present in that location and will not go through any
+of the rule ordering process.
+.TP
 .B \-x \-\-show\-rxfh\-indir
 Retrieves the receive flow hash indirection table.
 .TP
diff --git a/ethtool.c b/ethtool.c
index 1afdfe4..b624980 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -6,6 +6,7 @@
  * Kernel 2.4 update Copyright 2001 Jeff Garzik <jgarzik@mandrakesoft.com>
  * Wake-on-LAN,natsemi,misc support by Tim Hockin <thockin@sun.com>
  * Portions Copyright 2002 Intel
+ * Portions Copyright (C) Sun Microsystems 2008
  * do_test support by Eli Kupermann <eli.kupermann@intel.com>
  * ETHTOOL_PHYS_ID support by Chris Leech <christopher.leech@intel.com>
  * e1000 support by Scott Feldman <scott.feldman@intel.com>
@@ -14,6 +15,7 @@
  * amd8111e support by Reeja John <reeja.john@amd.com>
  * long arguments by Andi Kleen.
  * SMSC LAN911x support by Steve Glendinning <steve.glendinning@smsc.com>
+ * Rx Network Flow Control configuration support <santwona.behera@sun.com>
  * Various features by Ben Hutchings <bhutchings@solarflare.com>;
  *	Copyright 2009, 2010 Solarflare Communications
  *
@@ -32,7 +34,7 @@
 #include <sys/ioctl.h>
 #include <sys/stat.h>
 #include <stdio.h>
-#include <string.h>
+#include <strings.h>
 #include <errno.h>
 #include <net/if.h>
 #include <sys/utsname.h>
@@ -44,6 +46,8 @@
 #include <arpa/inet.h>
 
 #include <linux/sockios.h>
+#include <sys/socket.h>
+#include <arpa/inet.h>
 #include "ethtool-util.h"
 
 
@@ -232,15 +236,28 @@ static struct option {
     { "-S", "--statistics", MODE_GSTATS, "Show adapter statistics" },
     { "-n", "--show-nfc", MODE_GNFC, "Show Rx network flow classification "
 		"options",
-		"		[ rx-flow-hash tcp4|udp4|ah4|sctp4|"
-		"tcp6|udp6|ah6|sctp6 ]\n" },
+		"		[ rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|"
+		"tcp6|udp6|ah6|esp6|sctp6 ]\n"
+		"		[ rx-rings ]\n"
+		"		[ rx-class-rule-all ]\n"
+		"		[ rx-class-rule %d ]\n"},
     { "-f", "--flash", MODE_FLASHDEV, "FILENAME " "Flash firmware image "
     		"from the specified file to a region on the device",
 		"               [ REGION-NUMBER-TO-FLASH ]\n" },
     { "-N", "--config-nfc", MODE_SNFC, "Configure Rx network flow "
 		"classification options",
-		"		[ rx-flow-hash tcp4|udp4|ah4|sctp4|"
-		"tcp6|udp6|ah6|sctp6 m|v|t|s|d|f|n|r... ]\n" },
+		"		[ rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|"
+		"tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r... ]\n"
+		"		[ rx-class-rule-del %d ]\n"
+		"		[ rx-class-rule-add ip4|ip6 tcp|udp|sctp|ah|esp|%d\n"
+		"			[sip %d.%d.%d.%d [m %d.%d.%d.%d]]\n"
+		"			[dip %d.%d.%d.%d [m %d.%d.%d.%d]]\n"
+		"			[tos %d [m %x]]\n"
+		"			[sport %d [m %x]]\n"
+		"			[dport %d [m %x]]\n"
+		"			[spi %d [m %x]]\n"
+		"			[ring %d | drop]\n"
+		"			[loc %d]]\n"},
     { "-x", "--show-rxfh-indir", MODE_GRXFHINDIR, "Show Rx flow hash "
 		"indirection" },
     { "-X", "--set-rxfh-indir", MODE_SRXFHINDIR, "Set Rx flow hash indirection",
@@ -408,6 +425,14 @@ static int msglvl_changed;
 static u32 msglvl_wanted = 0;
 static u32 msglvl_mask = 0;
 
+static int rx_rings_get = 0;
+static int rx_class_rule_get = -1;
+static int rx_class_rule_getall = 0;
+static int rx_class_rule_del = -1;
+static int rx_class_rule_added = 0;
+static struct ethtool_rx_flow_spec rx_rule_fs;
+static u8 rxclass_loc_valid = 0;
+
 static enum {
 	ONLINE=0,
 	OFFLINE,
@@ -777,7 +802,9 @@ static int rxflow_str_to_type(const char *str)
 	else if (!strcmp(str, "udp4"))
 		flow_type = UDP_V4_FLOW;
 	else if (!strcmp(str, "ah4"))
-		flow_type = AH_ESP_V4_FLOW;
+		flow_type = AH_V4_FLOW;
+	else if (!strcmp(str, "esp4"))
+		flow_type = ESP_V4_FLOW;
 	else if (!strcmp(str, "sctp4"))
 		flow_type = SCTP_V4_FLOW;
 	else if (!strcmp(str, "tcp6"))
@@ -785,7 +812,9 @@ static int rxflow_str_to_type(const char *str)
 	else if (!strcmp(str, "udp6"))
 		flow_type = UDP_V6_FLOW;
 	else if (!strcmp(str, "ah6"))
-		flow_type = AH_ESP_V6_FLOW;
+		flow_type = AH_V6_FLOW;
+	else if (!strcmp(str, "esp6"))
+		flow_type = ESP_V6_FLOW;
 	else if (!strcmp(str, "sctp6"))
 		flow_type = SCTP_V6_FLOW;
 	else if (!strcmp(str, "ether"))
@@ -945,6 +974,23 @@ static void parse_cmdline(int argc, char **argp)
 						rxflow_str_to_type(argp[i]);
 					if (!rx_fhash_get)
 						show_usage(1);
+				} else if (!strcmp(argp[i], "rx-rings")) {
+					i += 1;
+					rx_rings_get = 1;
+				} else if (!strcmp(argp[i],
+						   "rx-class-rule-all")) {
+					i += 1;
+					rx_class_rule_getall = 1;
+				} else if (!strcmp(argp[i], "rx-class-rule")) {
+					i += 1;
+					if (i >= argc) {
+						show_usage(1);
+						break;
+					}
+					rx_class_rule_get =
+						strtol(argp[i], NULL, 0);
+					if (rx_class_rule_get < 0)
+						show_usage(1);
 				} else
 					show_usage(1);
 				break;
@@ -978,8 +1024,37 @@ static void parse_cmdline(int argc, char **argp)
 						show_usage(1);
 					else
 						rx_fhash_changed = 1;
-				} else
+				} else if (!strcmp(argp[i],
+						   "rx-class-rule-del")) {
+					i += 1;
+					if (i >= argc) {
+						show_usage(1);
+						break;
+					}
+					rx_class_rule_del =
+						strtol(argp[i], NULL, 0);
+					if (rx_class_rule_del < 0)
+						show_usage(1);
+				} else if (!strcmp(argp[i],
+						   "rx-class-rule-add")) {
+					i += 1;
+					if (i >= argc) {
+						show_usage(1);
+						break;
+					}
+					if (rxclass_parse_ruleopts(&argp[i],
+								   argc - i,
+								   &rx_rule_fs,
+								   &rxclass_loc_valid) < 0) {
+						show_usage(1);
+					} else {
+						i = argc;
+						rx_class_rule_added = 1;
+					}
+				} else {
 					show_usage(1);
+				}
+
 				break;
 			}
 			if (mode == MODE_SRXFHINDIR) {
@@ -1917,9 +1992,12 @@ static int dump_rxfhash(int fhash, u64 val)
 	case SCTP_V4_FLOW:
 		fprintf(stdout, "SCTP over IPV4 flows");
 		break;
-	case AH_ESP_V4_FLOW:
+	case AH_V4_FLOW:
 		fprintf(stdout, "IPSEC AH over IPV4 flows");
 		break;
+	case ESP_V4_FLOW:
+		fprintf(stdout, "IPSEC ESP over IPV4 flows");
+		break;
 	case TCP_V6_FLOW:
 		fprintf(stdout, "TCP over IPV6 flows");
 		break;
@@ -1929,9 +2007,12 @@ static int dump_rxfhash(int fhash, u64 val)
 	case SCTP_V6_FLOW:
 		fprintf(stdout, "SCTP over IPV6 flows");
 		break;
-	case AH_ESP_V6_FLOW:
+	case AH_V6_FLOW:
 		fprintf(stdout, "IPSEC AH over IPV6 flows");
 		break;
+	case ESP_V6_FLOW:
+		fprintf(stdout, "IPSEC ESP over IPV6 flows");
+		break;
 	default:
 		break;
 	}
@@ -2911,14 +2992,12 @@ static int do_gstats(int fd, struct ifreq *ifr)
 	return 0;
 }
 
-
 static int do_srxclass(int fd, struct ifreq *ifr)
 {
 	int err;
+	struct ethtool_rxnfc nfccmd;
 
 	if (rx_fhash_changed) {
-		struct ethtool_rxnfc nfccmd;
-
 		nfccmd.cmd = ETHTOOL_SRXFH;
 		nfccmd.flow_type = rx_fhash_set;
 		nfccmd.data = rx_fhash_val;
@@ -2930,6 +3009,20 @@ static int do_srxclass(int fd, struct ifreq *ifr)
 
 	}
 
+	if (rx_class_rule_added) {
+		err = rxclass_rule_ins(fd, ifr, &rx_rule_fs,
+				       rxclass_loc_valid);
+		if (err < 0)
+			fprintf(stderr, "Cannot insert RX classification rule\n");
+	}
+
+	if (rx_class_rule_del >= 0) {
+		err = rxclass_rule_del(fd, ifr, rx_class_rule_del);
+
+		if (err < 0)
+			fprintf(stderr, "Cannot delete RX classification rule\n");
+	}
+
 	return 0;
 }
 
@@ -2950,6 +3043,31 @@ static int do_grxclass(int fd, struct ifreq *ifr)
 			dump_rxfhash(rx_fhash_get, nfccmd.data);
 	}
 
+	if (rx_rings_get) {
+		struct ethtool_rxnfc nfccmd;
+
+		nfccmd.cmd = ETHTOOL_GRXRINGS;
+		ifr->ifr_data = (caddr_t)&nfccmd;
+		err = ioctl(fd, SIOCETHTOOL, ifr);
+		if (err < 0)
+			perror("Cannot get RX rings");
+		else
+			fprintf(stdout, "%d RX rings available\n",
+				(int)nfccmd.data);
+	}
+
+	if (rx_class_rule_get >= 0) {
+		err = rxclass_rule_get(fd, ifr, rx_class_rule_get);
+		if (err < 0)
+			fprintf(stderr, "Cannot get RX classification rule\n");
+	}
+
+	if (rx_class_rule_getall) {
+		err = rxclass_rule_getall(fd, ifr);
+		if (err < 0)
+			fprintf(stderr, "RX classification rule retrieval failed\n");
+	}
+
 	return 0;
 }
 
diff --git a/rxclass.c b/rxclass.c
new file mode 100644
index 0000000..fd01a32
--- /dev/null
+++ b/rxclass.c
@@ -0,0 +1,809 @@
+/*
+ * Copyright (C) 2008 Sun Microsystems, Inc. All rights reserved.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <strings.h>
+
+#include <linux/sockios.h>
+#include <arpa/inet.h>
+#include "ethtool-util.h"
+#include "ethtool-bitops.h"
+
+/*
+ * This is a rule manager implementation for ordering rx flow
+ * classification rules in a longest prefix first match order.
+ * The assumption is that this rule manager is the only one adding rules to
+ * the device's hardware classifier.
+ */
+
+struct rmgr_ctrl {
+	/* slot contains a bitmap indicating which filters are valid */
+	unsigned long		*slot;
+	__u32			n_rules;
+	__u32			size;
+};
+
+static struct rmgr_ctrl rmgr;
+static int rmgr_init_done = 0;
+
+#ifndef SIOCETHTOOL
+#define SIOCETHTOOL     0x8946
+#endif
+
+static void rmgr_print_ipv4_rule(struct ethtool_rx_flow_spec *fsp)
+{
+	char		chan[16];
+	char		l4_proto[16];
+	__u32		sip, dip, sipm, dipm;
+
+	sip = ntohl(fsp->h_u.tcp_ip4_spec.ip4src);
+	dip = ntohl(fsp->h_u.tcp_ip4_spec.ip4dst);
+	sipm = ntohl(fsp->m_u.tcp_ip4_spec.ip4src);
+	dipm = ntohl(fsp->m_u.tcp_ip4_spec.ip4dst);
+
+	if (fsp->ring_cookie != RX_CLS_FLOW_DISC)
+		sprintf(chan, "Rx Ring [%d]", (int)fsp->ring_cookie);
+	else
+		sprintf(chan, "Discard");
+
+	switch (fsp->flow_type) {
+	case TCP_V4_FLOW:
+	case UDP_V4_FLOW:
+	case SCTP_V4_FLOW:
+	case AH_V4_FLOW:
+	case ESP_V4_FLOW:
+	case IP_USER_FLOW:
+		fprintf(stdout,
+			"      IPv4 Rule:  ID[%d] Target[%s]\n"
+			"      IP src addr[%d.%d.%d.%d] mask[%d.%d.%d.%d]\n"
+			"      IP dst addr[%d.%d.%d.%d] mask[%d.%d.%d.%d]\n"
+			"      IP TOS[0x%x] mask[0x%x]\n",
+			fsp->location, chan,
+			(sip & 0xff000000) >> 24,
+			(sip & 0xff0000) >> 16,
+			(sip & 0xff00) >> 8,
+			sip & 0xff,
+			(sipm & 0xff000000) >> 24,
+			(sipm & 0xff0000) >> 16,
+			(sipm & 0xff00) >> 8,
+			sipm & 0xff,
+			(dip & 0xff000000) >> 24,
+			(dip & 0xff0000) >> 16,
+			(dip & 0xff00) >> 8,
+			dip & 0xff,
+			(dipm & 0xff000000) >> 24,
+			(dipm & 0xff0000) >> 16,
+			(dipm & 0xff00) >> 8,
+			dipm & 0xff,
+			fsp->h_u.tcp_ip4_spec.tos,
+			fsp->m_u.tcp_ip4_spec.tos);
+
+		switch (fsp->flow_type) {
+		case TCP_V4_FLOW:
+			sprintf(l4_proto, "TCP");
+			break;
+		case UDP_V4_FLOW:
+			sprintf(l4_proto, "UDP");
+			break;
+		case SCTP_V4_FLOW:
+			sprintf(l4_proto, "SCTP");
+			break;
+		case AH_V4_FLOW:
+			sprintf(l4_proto, "AH");
+			break;
+		case ESP_V4_FLOW:
+			sprintf(l4_proto, "ESP");
+			break;
+		default:
+			break;
+		}
+		switch (fsp->flow_type) {
+		case TCP_V4_FLOW:
+		case UDP_V4_FLOW:
+		case SCTP_V4_FLOW:
+			fprintf(stdout,
+				"      L4 proto[%s]\n"
+				"      L4 src port[%d] mask[0x%x]\n"
+				"      L4 dst port[%d] mask[0x%x]\n",
+				l4_proto,
+				ntohs(fsp->h_u.tcp_ip4_spec.psrc),
+				ntohs(fsp->m_u.tcp_ip4_spec.psrc),
+				ntohs(fsp->h_u.tcp_ip4_spec.pdst),
+				ntohs(fsp->m_u.tcp_ip4_spec.pdst));
+			break;
+		case AH_V4_FLOW:
+		case ESP_V4_FLOW:
+			fprintf(stdout,
+				"      L4 proto[%s]\n"
+				"      L4 SPI[%d] mask[0x%x]\n",
+				l4_proto,
+				ntohl(fsp->h_u.esp_ip4_spec.spi),
+				ntohl(fsp->m_u.esp_ip4_spec.spi));
+			break;
+		case IP_USER_FLOW:
+			fprintf(stdout,
+				"      L4 user proto[%d]\n"
+				"      L4 first 4 bytes[0x%x] mask[0x%x]\n",
+				fsp->h_u.usr_ip4_spec.proto,
+				ntohl(fsp->h_u.usr_ip4_spec.l4_4_bytes),
+				ntohl(fsp->m_u.usr_ip4_spec.l4_4_bytes));
+			break;
+		default:
+			break;
+		}
+		break;
+	default:
+		fprintf(stdout,
+			"      Unknown L4 proto, type[%d]\n", fsp->flow_type);
+		break;
+	}
+
+	fprintf(stdout, "\n\n");
+}
+
+static void rmgr_print_rule(struct ethtool_rx_flow_spec *fsp)
+{
+	/* print the rule in this location */
+	switch (fsp->flow_type) {
+	case TCP_V4_FLOW:
+	case UDP_V4_FLOW:
+	case SCTP_V4_FLOW:
+	case AH_V4_FLOW:
+	case ESP_V4_FLOW:
+		rmgr_print_ipv4_rule(fsp);
+		break;
+	case IP_USER_FLOW:
+		if (fsp->h_u.usr_ip4_spec.ip_ver == ETH_RX_NFC_IP4) {
+			rmgr_print_ipv4_rule(fsp);
+			break;
+		}
+		/* IPv6 User Flow falls through to the case below */
+	case TCP_V6_FLOW:
+	case UDP_V6_FLOW:
+	case SCTP_V6_FLOW:
+	case AH_V6_FLOW:
+	case ESP_V6_FLOW:
+		fprintf(stderr, "IPv6 flows not implemented\n");
+		break;
+	default:
+		fprintf(stderr, "rmgr: Unknown flow type\n");
+		break;
+	}
+}
+
+static int rmgr_ins(__u32 loc)
+{
+	/* verify location is in rule manager range */
+	if (loc >= rmgr.size) {
+		fprintf(stderr, "rmgr: Location out of range\n");
+		return -1;
+	}
+
+	/* set bit for the rule */
+	set_bit(loc, rmgr.slot);
+
+	return 0;
+}
+
+static int rmgr_find(__u32 loc)
+{
+	/* verify location is in rule manager range */
+	if (loc >= rmgr.size) {
+		fprintf(stderr, "rmgr: Location out of range\n");
+		return -1;
+	}
+
+	/* if slot is found return 0 indicating success */
+	if (test_bit(loc, rmgr.slot))
+		return 0;
+
+	/* rule not found */
+	fprintf(stderr, "rmgr: No such rule\n");
+	return -1;
+}
+
+static int rmgr_del(__u32 loc)
+{
+	/* verify rule exists before attempting to delete */
+	int err = rmgr_find(loc);
+	if (err)
+		return err;
+
+	/* clear bit for the rule */
+	clear_bit(loc, rmgr.slot);
+
+	return 0;
+}
+
+static int rmgr_add(struct ethtool_rx_flow_spec *fsp, __u8 loc_valid)
+{
+	__u32 loc = fsp->location;
+
+	/* location provided, insert rule and update regions to match rule */
+	if (loc_valid)
+		return rmgr_ins(loc);
+
+	/* find an open slot */
+	for (loc = 0; loc < rmgr.size; loc += BITS_PER_LONG) {
+		if ((rmgr.slot[loc / BITS_PER_LONG]) != ~0UL)
+			break;
+	}
+
+	/* find and use available location in slot */
+	for (; loc < rmgr.size; loc++) {
+		if (!test_bit(loc, rmgr.slot)) {
+			fsp->location = loc;
+			return rmgr_ins(loc);
+		}
+	}
+
+	/* No space to add this rule */
+	fprintf(stderr, "rmgr: Cannot find appropriate slot to insert rule\n");
+
+	return -1;
+}
+
+static int rmgr_init(int fd, struct ifreq *ifr)
+{
+	struct ethtool_rxnfc *nfccmd;
+	int err, i;
+	__u32 *rule_locs;
+
+	if (rmgr_init_done)
+		return 0;
+
+	/* clear rule manager settings */
+	bzero((void *)&rmgr, sizeof(struct rmgr_ctrl));
+
+	/* allocate memory for count request */
+	nfccmd = calloc(1, sizeof(*nfccmd));
+	if (!nfccmd) {
+		perror("rmgr: Cannot allocate memory for RX class rule data");
+		return -1;
+	}
+
+	/* request count and store in rmgr.n_rules */
+	nfccmd->cmd = ETHTOOL_GRXCLSRLCNT;
+	ifr->ifr_data = (caddr_t)nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	rmgr.n_rules = nfccmd->rule_cnt;
+	free(nfccmd);
+	if (err < 0) {
+		perror("rmgr: Cannot get RX class rule count");
+		return -1;
+	}
+
+	/* alloc memory for request of location list */
+	nfccmd = calloc(1, sizeof(*nfccmd) + (rmgr.n_rules * sizeof(__u32)));
+	if (!nfccmd) {
+		perror("rmgr: Cannot allocate memory for RX class rule locations");
+		return -1;
+	}
+
+	/* request location list */
+	nfccmd->cmd = ETHTOOL_GRXCLSRLALL;
+	nfccmd->rule_cnt = rmgr.n_rules;
+	ifr->ifr_data = (caddr_t)nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rmgr: Cannot get RX class rules");
+		free(nfccmd);
+		return -1;
+	}
+
+	/* initialize bitmap for storage of valid locations */
+	rmgr.size = nfccmd->data;
+	rmgr.slot = calloc(1, BITS_TO_LONGS(rmgr.size) * sizeof(long));
+	if (!rmgr.slot) {
+		perror("rmgr: Cannot allocate memory for RX class rules");
+		return -1;
+	}
+
+	/* write locations to bitmap */
+	rule_locs = nfccmd->rule_locs;
+	for (i = 0; i < rmgr.n_rules; i++) {
+		err = rmgr_ins(rule_locs[i]);
+		if (err < 0)
+			break;
+	}
+
+	/* free memory and set flag to avoid reinit */
+	free(nfccmd);
+	rmgr_init_done = 1;
+
+	return err;
+}
+
+static void rmgr_cleanup(void)
+{
+	if (!rmgr_init_done)
+		return;
+
+	rmgr_init_done = 0;
+
+	free(rmgr.slot);
+	rmgr.slot = NULL;
+	rmgr.size = 0;
+}
+
+int rxclass_rule_getall(int fd, struct ifreq *ifr)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err, i, j;
+
+	/* init table of available rules */
+	err = rmgr_init(fd, ifr);
+	if (err < 0)
+		return err;
+
+	fprintf(stdout, "Total %d rules\n\n", rmgr.n_rules);
+
+	/* fetch and display all available rules */
+	for (i = 0; i < rmgr.size; i += BITS_PER_LONG) {
+		if (rmgr.slot[i / BITS_PER_LONG] == 0UL)
+			continue;
+		for (j = 0; j < BITS_PER_LONG; j++) {
+			if (!test_bit(i + j, rmgr.slot))
+				continue;
+			nfccmd.cmd = ETHTOOL_GRXCLSRULE;
+			bzero(&nfccmd.fs, sizeof(struct ethtool_rx_flow_spec));
+			nfccmd.fs.location = i + j;
+			ifr->ifr_data = (caddr_t)&nfccmd;
+			err = ioctl(fd, SIOCETHTOOL, ifr);
+			if (err < 0) {
+				perror("rmgr: Cannot get RX class rule");
+				return -1;
+			}
+			rmgr_print_rule(&nfccmd.fs);
+		}
+	}
+
+	rmgr_cleanup();
+
+	return 0;
+}
+
+int rxclass_rule_get(int fd, struct ifreq *ifr, __u32 loc)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	/* init table of available rules */
+	err = rmgr_init(fd, ifr);
+	if (err < 0)
+		return err;
+
+	/* verify rule exists before attempting to display */
+	err = rmgr_find(loc);
+	if (err < 0)
+		return err;
+
+	/* fetch rule from netdev and display */
+	nfccmd.cmd = ETHTOOL_GRXCLSRULE;
+	bzero(&nfccmd.fs, sizeof(struct ethtool_rx_flow_spec));
+	nfccmd.fs.location = loc;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rmgr: Cannot get RX class rule");
+		return -1;
+	}
+	rmgr_print_rule(&nfccmd.fs);
+
+	rmgr_cleanup();
+
+	return 0;
+}
+
+int rxclass_rule_ins(int fd, struct ifreq *ifr,
+		     struct ethtool_rx_flow_spec *fsp, __u8 loc_valid)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	/* init table of available rules */
+	err = rmgr_init(fd, ifr);
+	if (err < 0)
+		return err;
+
+	/* verify rule location */
+	err = rmgr_add(fsp, loc_valid);
+	if (err < 0)
+		return err;
+
+	/* notify netdev of new rule */
+	nfccmd.cmd = ETHTOOL_SRXCLSRLINS;
+	nfccmd.fs = *fsp;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rmgr: Cannot insert RX class rule");
+		return -1;
+	}
+	rmgr.n_rules++;
+
+	printf("Added rule with ID %d\n", fsp->location);
+
+	rmgr_cleanup();
+
+	return 0;
+}
+
+int rxclass_rule_del(int fd, struct ifreq *ifr, __u32 loc)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	/* init table of available rules */
+	err = rmgr_init(fd, ifr);
+	if (err < 0)
+		return err;
+
+	/* verify rule exists */
+	err = rmgr_del(loc);
+	if (err < 0)
+		return err;
+
+	/* notify netdev of rule removal */
+	nfccmd.cmd = ETHTOOL_SRXCLSRLDEL;
+	nfccmd.fs.location = loc;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rmgr: Cannot delete RX class rule");
+		return -1;
+	}
+	rmgr.n_rules--;
+
+	rmgr_cleanup();
+
+	return 0;
+}
+
+int rxclass_parse_ruleopts(char **optstr, int opt_cnt,
+			   struct ethtool_rx_flow_spec *fsp,
+			   u_int8_t *loc_valid)
+{
+	int i = 0;
+
+	u_int8_t discard, ring_set;
+	u_int32_t ipsa, ipsm, ipda, ipdm, spi, spim;
+	u_int16_t sp, spm, dp, dpm;
+	u_int8_t ip_ver, proto, tos, tm;
+	struct in_addr in_addr;
+
+	if (*optstr == NULL || **optstr == '\0' || opt_cnt < 2) {
+		fprintf(stdout, "Add rule, invalid syntax\n");
+		return -1;
+	}
+
+	bzero(fsp, sizeof(struct ethtool_rx_flow_spec));
+	ipsa = ipda = ipsm = ipdm = spi = spim = 0x0;
+	sp = dp = spm = dpm = 0x0;
+	ip_ver = proto = tos = tm = 0x0;
+	discard = ring_set = 0;
+
+	if (!strcmp(optstr[i], "ip4")) {
+		ip_ver = ETH_RX_NFC_IP4;
+	} else if (!strcmp(optstr[i], "ip6")) {
+		fprintf(stdout, "IPv6 not yet implemented\n");
+		return -1;
+	} else {
+		fprintf(stdout, "Add rule, invalid syntax for IP version\n");
+		return -1;
+	}
+
+	i++;
+
+	switch (ip_ver) {
+	case ETH_RX_NFC_IP4:
+		if (!strcmp(optstr[i], "tcp"))
+			fsp->flow_type = TCP_V4_FLOW;
+		else if (!strcmp(optstr[i], "udp"))
+			fsp->flow_type = UDP_V4_FLOW;
+		else if (!strcmp(optstr[i], "sctp"))
+			fsp->flow_type = SCTP_V4_FLOW;
+		else if (!strcmp(optstr[i], "ah"))
+			fsp->flow_type = AH_V4_FLOW;
+		else if (!strcmp(optstr[i], "esp"))
+			fsp->flow_type = ESP_V4_FLOW;
+		break;
+	default:
+		fprintf(stdout, "Add rule, Invalid IP version %d\n", ip_ver);
+		return -1;
+	}
+
+	if (fsp->flow_type == 0) {
+		proto = (u_int8_t)strtoul(optstr[i], (char **)NULL, 0);
+		if (proto != 0) {
+			fprintf(stdout, "Add rule, user defined proto %d\n",
+				proto);
+			fsp->flow_type = IP_USER_FLOW;
+			fsp->h_u.usr_ip4_spec.proto = proto;
+			fsp->h_u.usr_ip4_spec.ip_ver = ip_ver;
+		} else {
+			fprintf(stdout, "Add rule, invalid IP proto %s\n",
+				optstr[i]);
+			return -1;
+		}
+	}
+
+	for (i = 2; i < opt_cnt;) {
+		if (!strcmp(optstr[i], "tos")) {
+			tos = (u_int8_t)strtoul(optstr[i+1], (char **)NULL,
+						 0);
+			tm = 0xff;
+			fsp->h_u.tcp_ip4_spec.tos = tos;
+
+			i += 2;
+			if (opt_cnt > (i+1)) {
+				if (!strcmp(optstr[i], "m")) {
+					tm = (u_int8_t)strtoul(optstr[i+1],
+							       (char **)NULL,
+							       16);
+					i += 2;
+				}
+			}
+			fsp->m_u.tcp_ip4_spec.tos = tm;
+		} else if (!strcmp(optstr[i], "sip")) {
+			if (strchr(optstr[i+1], '.') == NULL) {
+				ipsa = strtoul(optstr[i+1], (char **)NULL, 16);
+			} else {
+				if (!inet_pton(AF_INET, optstr[i+1], &in_addr)) {
+					fprintf(stdout,
+						"Invalid src address [%s]\n" ,
+						optstr[i+1]);
+					return -1;
+				}
+				ipsa = ntohl(in_addr.s_addr);
+			}
+			ipsm = 0xffffffff;
+			fsp->h_u.tcp_ip4_spec.ip4src = ipsa;
+
+			i += 2;
+			if (opt_cnt > (i+1)) {
+				if (!strcmp(optstr[i], "m")) {
+					if (strchr(optstr[i+1], '.') == NULL) {
+						ipsm = strtoul(optstr[i+1],
+							       (char **)NULL,
+							       16);
+					} else {
+						if (!inet_pton(AF_INET,
+							       optstr[i+1],
+							       &in_addr)) {
+							fprintf(stdout,
+								"Invalid smask"
+								"[%s]\n",
+								optstr[i+1]);
+							return -1;
+						}
+						ipsm = ntohl(in_addr.s_addr);
+					}
+					i += 2;
+				}
+			}
+			fsp->m_u.tcp_ip4_spec.ip4src = ipsm;
+		} else if (!strcmp(optstr[i], "dip")) {
+			if (strchr(optstr[i+1], '.') == NULL) {
+				ipda = strtoul(optstr[i+1], (char **)NULL, 16);
+			} else {
+				if (!inet_pton(AF_INET, optstr[i+1], &in_addr)) {
+					fprintf(stdout,
+						"Invalid dst address [%s]\n",
+						optstr[i+1]);
+					return -1;
+				}
+				ipda = ntohl(in_addr.s_addr);
+			}
+			ipdm = 0xffffffff;
+			fsp->h_u.tcp_ip4_spec.ip4dst = ipda;
+
+			i += 2;
+			if (opt_cnt > (i+1)) {
+				if (!strcmp(optstr[i], "m")) {
+					if (strchr(optstr[i+1], '.') == NULL) {
+						ipdm = strtoul(optstr[i+1],
+							       (char **)NULL,
+							       16);
+					} else {
+						if (!inet_pton(AF_INET,
+							       optstr[i+1],
+							       &in_addr)) {
+							fprintf(stdout,
+								"Invalid dmask"
+								"[%s]\n",
+								optstr[i+1]);
+							return -1;
+						}
+						ipdm = ntohl(in_addr.s_addr);
+					}
+					i += 2;
+				}
+			}
+			fsp->m_u.tcp_ip4_spec.ip4dst = ipdm;
+		} else if (!strcmp(optstr[i], "sport")) {
+			switch (fsp->flow_type) {
+			case TCP_V4_FLOW:
+			case UDP_V4_FLOW:
+			case SCTP_V4_FLOW:
+			case TCP_V6_FLOW:
+			case UDP_V6_FLOW:
+			case SCTP_V6_FLOW:
+			case IP_USER_FLOW:
+				break;
+			default:
+				fprintf(stdout, "Invalid option <sport> "
+					"for this flow\n");
+				return -1;
+			}
+			sp = (u_int16_t)strtoul(optstr[i+1], (char **)NULL,
+						 0);
+			spm = 0xffff;
+			if (fsp->flow_type == IP_USER_FLOW) {
+				fsp->h_u.usr_ip4_spec.l4_4_bytes |=
+					((u32)sp << 16);
+			} else {
+				fsp->h_u.tcp_ip4_spec.psrc = sp;
+			}
+			i += 2;
+			if (opt_cnt > (i+1)) {
+				if (!strcmp(optstr[i], "m")) {
+					spm = (u_int16_t)strtoul(optstr[i+1],
+								 (char **)NULL,
+								 16);
+					i += 2;
+				}
+			}
+			if (fsp->flow_type == IP_USER_FLOW) {
+				fsp->m_u.usr_ip4_spec.l4_4_bytes |=
+					((u32)spm << 16);
+			} else {
+				fsp->m_u.tcp_ip4_spec.psrc = spm;
+			}
+		} else if (!strcmp(optstr[i], "dport")) {
+			switch (fsp->flow_type) {
+			case TCP_V4_FLOW:
+			case UDP_V4_FLOW:
+			case SCTP_V4_FLOW:
+			case TCP_V6_FLOW:
+			case UDP_V6_FLOW:
+			case SCTP_V6_FLOW:
+			case IP_USER_FLOW:
+				break;
+			default:
+				fprintf(stdout, "Invalid option <dport> "
+					"for this flow\n");
+				return -1;
+			}
+			dp = (u_int16_t)strtoul(optstr[i+1], (char **)NULL,
+						 0);
+			dpm = 0xffff;
+			if (fsp->flow_type == IP_USER_FLOW)
+				fsp->h_u.usr_ip4_spec.l4_4_bytes |= dp;
+			else
+				fsp->h_u.tcp_ip4_spec.pdst = dp;
+
+			i += 2;
+			if (opt_cnt > (i+1)) {
+				if (!strcmp(optstr[i], "m")) {
+					dpm = (u_int16_t)strtoul(optstr[i+1],
+								 (char **)NULL,
+								 16);
+					i += 2;
+				}
+			}
+			if (fsp->flow_type == IP_USER_FLOW)
+				fsp->m_u.usr_ip4_spec.l4_4_bytes |= dpm;
+			else
+				fsp->m_u.tcp_ip4_spec.pdst = dpm;
+		} else if (!strcmp(optstr[i], "spi")) {
+			switch (fsp->flow_type) {
+			case AH_V4_FLOW:
+			case ESP_V4_FLOW:
+			case AH_V6_FLOW:
+			case ESP_V6_FLOW:
+			case IP_USER_FLOW:
+				break;
+			case TCP_V4_FLOW:
+			case UDP_V4_FLOW:
+			case SCTP_V4_FLOW:
+			case TCP_V6_FLOW:
+			case UDP_V6_FLOW:
+			case SCTP_V6_FLOW:
+			default:
+				fprintf(stdout, "Invalid option <spi> "
+					"for this flow\n");
+				return -1;
+			}
+			spi = (u_int32_t)strtoul(optstr[i+1], (char **)NULL,
+						 0);
+			spim = 0xffffffff;
+			if (fsp->flow_type == IP_USER_FLOW)
+				fsp->h_u.usr_ip4_spec.l4_4_bytes = spi;
+			else
+				fsp->h_u.esp_ip4_spec.spi = spi;
+
+			i += 2;
+			if (opt_cnt > (i+1)) {
+				if (!strcmp(optstr[i], "m")) {
+					spim = (u_int32_t)strtoul(optstr[i+1],
+								 (char **)NULL,
+								 16);
+					i += 2;
+				}
+			}
+			if (fsp->flow_type == IP_USER_FLOW)
+				fsp->m_u.usr_ip4_spec.l4_4_bytes = spim;
+			else
+				fsp->m_u.esp_ip4_spec.spi = spim;
+		} else if (!strcmp(optstr[i], "ring")) {
+			if (discard == 1) {
+				fprintf(stdout, "Invalid syntax - <drop> "
+					"option already specified\n");
+				return -1;
+			}
+			fsp->ring_cookie = strtol(optstr[i+1], (char **)NULL,
+						  0);
+			i += 2;
+			ring_set = 1;
+		} else if (!strcmp(optstr[i], "drop")) {
+			if (ring_set == 1) {
+				fprintf(stdout, "Invalid syntax - <ring> "
+					"option already specified\n");
+				return -1;
+			}
+			fsp->ring_cookie = RX_CLS_FLOW_DISC;
+			i++;
+			discard = 1;
+		} else if (!strcmp(optstr[i], "loc")) {
+			fsp->location = strtol(optstr[i+1], (char **)NULL,
+						  0);
+			i += 2;
+			*loc_valid = 1;
+		} else {
+			fprintf(stdout, "Add rule, invalid syntax\n");
+			return -1;
+		}
+	}
+
+	/* Convert all multibyte network fields to network byte order */
+	fsp->h_u.tcp_ip4_spec.ip4src = htonl(fsp->h_u.tcp_ip4_spec.ip4src);
+	fsp->h_u.tcp_ip4_spec.ip4dst = htonl(fsp->h_u.tcp_ip4_spec.ip4dst);
+	fsp->m_u.tcp_ip4_spec.ip4src = htonl(fsp->m_u.tcp_ip4_spec.ip4src);
+	fsp->m_u.tcp_ip4_spec.ip4dst = htonl(fsp->m_u.tcp_ip4_spec.ip4dst);
+
+	switch (fsp->flow_type) {
+	case TCP_V4_FLOW:
+	case UDP_V4_FLOW:
+	case SCTP_V4_FLOW:
+	case TCP_V6_FLOW:
+	case UDP_V6_FLOW:
+	case SCTP_V6_FLOW:
+		fsp->h_u.tcp_ip4_spec.psrc = htons(fsp->h_u.tcp_ip4_spec.psrc);
+		fsp->h_u.tcp_ip4_spec.pdst = htons(fsp->h_u.tcp_ip4_spec.pdst);
+		fsp->m_u.tcp_ip4_spec.psrc = htons(fsp->m_u.tcp_ip4_spec.psrc);
+		fsp->m_u.tcp_ip4_spec.pdst = htons(fsp->m_u.tcp_ip4_spec.pdst);
+		break;
+	case AH_V4_FLOW:
+	case ESP_V4_FLOW:
+	case AH_V6_FLOW:
+	case ESP_V6_FLOW:
+		fsp->h_u.esp_ip4_spec.spi = htonl(fsp->h_u.esp_ip4_spec.spi);
+		fsp->m_u.esp_ip4_spec.spi = htonl(fsp->m_u.esp_ip4_spec.spi);
+		break;
+	case IP_USER_FLOW:
+		fsp->h_u.usr_ip4_spec.l4_4_bytes =
+			htonl(fsp->h_u.usr_ip4_spec.l4_4_bytes);
+		fsp->m_u.usr_ip4_spec.l4_4_bytes =
+			htonl(fsp->m_u.usr_ip4_spec.l4_4_bytes);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
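
For IP_USER_FLOW rules, the sport and dport branches above fold both ports into the l4_4_bytes word (source port in the upper 16 bits) and the final switch converts that word to network byte order. The following is a minimal standalone sketch of that packing, not code from the patch; pack_l4_ports(), l4_bytes_to_wire() and unpack_l4_ports() are hypothetical names used only for illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Combine two host-order ports into one 32-bit word, source port in the
 * upper half, mirroring "l4_4_bytes |= (u32)sp << 16" and
 * "l4_4_bytes |= dp" in the parser.
 */
uint32_t pack_l4_ports(uint16_t sport, uint16_t dport)
{
	return ((uint32_t)sport << 16) | dport;
}

/* Convert the packed word to network byte order, as done after parsing. */
uint32_t l4_bytes_to_wire(uint32_t host_val)
{
	return htonl(host_val);
}

/* Recover both host-order ports from a network-order word. */
void unpack_l4_ports(uint32_t wire, uint16_t *sport, uint16_t *dport)
{
	uint32_t host = ntohl(wire);

	*sport = (uint16_t)(host >> 16);
	*dport = (uint16_t)(host & 0xffff);
}
```

After the conversion the four bytes appear in memory in the order a TCP or UDP header carries them: source port first, then destination port, each big-endian.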


^ permalink raw reply related

* [ethtool PATCH 1/2] Add macro for displaying [value N] formatting to manpage
From: Alexander Duyck @ 2011-02-11  1:18 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev
In-Reply-To: <20110211010806.23554.98333.stgit@gitlad.jf.intel.com>

This change adds a macro for displaying optional values that take a numeric
value to the manpage.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 ethtool.8.in |  102 ++++++++++++++++++++++------------------------------------
 1 files changed, 38 insertions(+), 64 deletions(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index 0ee91a0..133825b 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -34,6 +34,12 @@
 [\\fB\\$1\\fP\ \\fB\\$2\\fP|\\fB\\$3\\fP|\\fB\\$4\\fP|\\fB\\$5\\fP]
 ..
 .\"
+.\"	.BN - value with a numeric input as in "[value N]"
+.\"
+.de BN
+[\\fB\\$1\\fP\ \\fIN\\fP]
+..
+.\"
 .\"	\(*MA - mac address
 .\"
 .ds MA \fIxx\fP\fB:\fP\fIyy\fP\fB:\fP\fIzz\fP\fB:\fP\fIaa\fP\fB:\fP\fIbb\fP\fB:\fP\fIcc\fP
@@ -110,60 +116,36 @@ ethtool \- query or control network driver and hardware settings
 .I ethX
 .B2 adaptive-rx on off
 .B2 adaptive-tx on off
-.RB [ rx-usecs
-.IR N ]
-.RB [ rx-frames
-.IR N ]
-.RB [ rx-usecs-irq
-.IR N ]
-.RB [ rx-frames-irq
-.IR N ]
-.RB [ tx-usecs
-.IR N ]
-.RB [ tx-frames
-.IR N ]
-.RB [ tx-usecs-irq
-.IR N ]
-.RB [ tx-frames-irq
-.IR N ]
-.RB [ stats-block-usecs
-.IR N ]
-.RB [ pkt-rate-low
-.IR N ]
-.RB [ rx-usecs-low
-.IR N ]
-.RB [ rx-frames-low
-.IR N ]
-.RB [ tx-usecs-low
-.IR N ]
-.RB [ tx-frames-low
-.IR N ]
-.RB [ pkt-rate-high
-.IR N ]
-.RB [ rx-usecs-high
-.IR N ]
-.RB [ rx-frames-high
-.IR N ]
-.RB [ tx-usecs-high
-.IR N ]
-.RB [ tx-frames-high
-.IR N ]
-.RB [ sample-interval
-.IR N ]
+.BN rx-usecs
+.BN rx-frames
+.BN rx-usecs-irq
+.BN rx-frames-irq
+.BN tx-usecs
+.BN tx-frames
+.BN tx-usecs-irq
+.BN tx-frames-irq
+.BN stats-block-usecs
+.BN pkt-rate-low
+.BN rx-usecs-low
+.BN rx-frames-low
+.BN tx-usecs-low
+.BN tx-frames-low
+.BN pkt-rate-high
+.BN rx-usecs-high
+.BN rx-frames-high
+.BN tx-usecs-high
+.BN tx-frames-high
+.BN sample-interval
 
 .B ethtool \-g|\-\-show\-ring
 .I ethX
 
 .B ethtool \-G|\-\-set\-ring
 .I ethX
-.RB [ rx
-.IR N ]
-.RB [ rx-mini
-.IR N ]
-.RB [ rx-jumbo
-.IR N ]
-.RB [ tx
-.IR N ]
+.BN rx
+.BN rx-mini
+.BN rx-jumbo
+.BN tx
 
 .B ethtool \-i|\-\-driver
 .I ethX
@@ -178,21 +160,15 @@ ethtool \- query or control network driver and hardware settings
 .B ethtool \-e|\-\-eeprom\-dump
 .I ethX
 .B2 raw on off
-.RB [ offset
-.IR N ]
-.RB [ length
-.IR N ]
+.BN offset
+.BN length
 
 .B ethtool \-E|\-\-change\-eeprom
 .I ethX
-.RB [ magic
-.IR N ]
-.RB [ offset
-.IR N ]
-.RB [ length
-.IR N ]
-.RB [ value
-.IR N ]
+.BN magic
+.BN offset
+.BN length
+.BN value
 
 .B ethtool \-k|\-\-show\-offload
 .I ethX
@@ -234,10 +210,8 @@ ethtool \- query or control network driver and hardware settings
 .B2 duplex half full
 .B4 port tp aui bnc mii fibre
 .B2 autoneg on off
-.RB [ advertise
-.IR N ]
-.RB [ phyad
-.IR N ]
+.BN advertise
+.BN phyad
 .B2 xcvr internal external
 .RB [ wol \ \*(WO]
 .RB [ sopass \ \*(MA]


^ permalink raw reply related

* [ethtool PATCH 0/2] Add support for RX network flow classifier rules
From: Alexander Duyck @ 2011-02-11  1:18 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev

These two patches add support for the RX network flow classifier set and get
rules calls already supported by the kernel.  The first patch is actually just
a cleanup patch in the manpage to make it a bit easier to do the value/mask
pairing follow ons in the second patch.

The second patch is the meat of the changes and implements the packet
classifier interface.  In updating and testing it I found a number of issues
that had to be addressed.  In resolving them I also found several features
that I felt were better off removed.  One example is prioritization of
filters, which was causing multiple issues, including memory leaks and
non-constructive blocking of filter insertions because the state of expanded
priorities was not being saved.

I'm still currently exploring my options in resolving the needs for ixgbe to
have the ability to retain and display its filters and plan to submit RFC
patches in the next few days to get some feedback on that.

Thanks,

Alex

---

Alexander Duyck (1):
      Add macro for displaying [value N] formatting to manpage

Santwona Behera (1):
      Add RX packet classification interface


 Makefile.am      |    3 
 ethtool-bitops.h |   25 ++
 ethtool-util.h   |   13 +
 ethtool.8.in     |  203 +++++++++-----
 ethtool.c        |  144 +++++++++-
 rxclass.c        |  809 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1115 insertions(+), 82 deletions(-)
 create mode 100644 ethtool-bitops.h
 create mode 100644 rxclass.c
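
The diffstat lists ethtool-bitops.h, which is not quoted in this thread. With prioritization removed, the rule manager in rxclass.c reduces to a bitmap of occupied locations plus a first-free search. A rough sketch of that idea follows. The helper bodies are assumptions modelled on the names rxclass.c uses (set_bit, clear_bit, test_bit, BITS_PER_LONG, BITS_TO_LONGS) and are not the contents of the real header; find_free_slot() is a hypothetical name for the search that rmgr_add() performs.

```c
#include <limits.h>
#include <stdlib.h>

/* Assumed definitions; the real ethtool-bitops.h is not shown here. */
#define BITS_PER_LONG		(CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

void set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

void clear_bit(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

int test_bit(unsigned int nr, const unsigned long *map)
{
	return (int)((map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/*
 * Return the lowest unused rule location below size, or -1 if every
 * location is taken. Whole words that are full are skipped first, then
 * the remaining bits are scanned one at a time.
 */
int find_free_slot(const unsigned long *map, unsigned int size)
{
	unsigned int loc;

	for (loc = 0; loc < size; loc += BITS_PER_LONG) {
		if (map[loc / BITS_PER_LONG] != ~0UL)
			break;
	}

	for (; loc < size; loc++) {
		if (!test_bit(loc, map))
			return (int)loc;
	}

	return -1;
}
```

The map would be allocated with calloc(BITS_TO_LONGS(size), sizeof(unsigned long)) so that all locations start out free, then populated from the location list the driver reports.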

-- 

^ permalink raw reply

* [PATCH] net: fix ifenslave build flags
From: Randy Dunlap @ 2011-02-11  0:27 UTC (permalink / raw)
  To: Netdev, David Miller; +Cc: Alexey Salmin

From: Randy Dunlap <randy.dunlap@oracle.com>

-I (include path) should be specified for host builds.
This one was overlooked somehow.  Fixes
https://bugzilla.kernel.org/show_bug.cgi?id=25902

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Reported-by: Alexey Salmin <alexey.salmin@gmail.com>
---
 Documentation/networking/Makefile |    2 ++
 1 file changed, 2 insertions(+)

--- lnx-2638-rc4.orig/Documentation/networking/Makefile
+++ lnx-2638-rc4/Documentation/networking/Makefile
@@ -4,6 +4,8 @@ obj- := dummy.o
 # List of programs to build
 hostprogs-y := ifenslave
 
+HOSTCFLAGS_ifenslave.o += -I$(objtree)/usr/include
+
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
 

^ permalink raw reply

* Re: GRO/GSO hiding PMTU?
From: Herbert Xu @ 2011-02-11  0:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, netfilter-devel
In-Reply-To: <20110210.150759.59671055.davem@davemloft.net>

On Thu, Feb 10, 2011 at 03:07:59PM -0800, David Miller wrote:
>
> Nevermind, I turned off gso/tso on eth0 (outgoing interface) and it still
> happens.

Turning off GSO/TSO has no effect if it's GRO generating the packet
since it still has to be segmented regardless.

So I think this is the problem that you found and we need to fix it
by checking the gso_size parameter.
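
For illustration only, the gso_size check suggested above could be modelled
in user space roughly as follows.  This is a sketch under stated assumptions,
not the eventual kernel fix: struct pkt_model, max_seg_len() and
needs_frag_needed() are hypothetical names rather than kernel APIs, and the
model assumes a fixed header length per segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few sk_buff fields that matter here. */
struct pkt_model {
	size_t len;       /* total length of the (possibly GRO-merged) packet */
	size_t hdr_len;   /* assumed header bytes carried by every segment */
	size_t gso_size;  /* payload bytes per segment; 0 means "not GSO" */
	bool df;          /* IP_DF is set in the header */
	bool local_df;    /* local fragmentation is allowed */
};

/* Largest size of any single segment this packet will be split into. */
static size_t max_seg_len(const struct pkt_model *p)
{
	size_t seg;

	if (p->gso_size == 0)
		return p->len;          /* not GSO: goes out as one packet */

	seg = p->hdr_len + p->gso_size;
	return seg < p->len ? seg : p->len;
}

/*
 * True when the forwarding path should drop the packet and send an
 * ICMP frag-needed.  Unlike a bare "!skb_is_gso()" test, a GSO packet
 * is not exempt: its per-segment size is compared against the MTU.
 */
static bool needs_frag_needed(const struct pkt_model *p, size_t mtu)
{
	assert(p != NULL && mtu > 0);
	return max_seg_len(p) > mtu && p->df && !p->local_df;
}
```

With this model, a GRO-merged packet whose segments are 1500 bytes would
trigger frag-needed on a 1400-byte MTU path but pass on a 1500-byte one.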

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 1/3] tg3: Avoid setting power.can_wakeup for devices that cannot wake up
From: Matt Carlson @ 2011-02-11  0:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Matthew Carlson, netdev@vger.kernel.org, David Miller,
	Michael Chan, Linux PM mailing list, Thomas Fjellstrom,
	Jay Cliburn, Chris Snook, Jie Yang
In-Reply-To: <201102102208.42410.rjw@sisk.pl>

On Thu, Feb 10, 2011 at 01:08:42PM -0800, Rafael J. Wysocki wrote:
> On Thursday, February 10, 2011, Matt Carlson wrote:
> > On Thu, Feb 10, 2011 at 08:53:09AM -0800, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > 
> > > The tg3 driver uses device_init_wakeup() in such a way that the
> > > device's power.can_wakeup flag may be set even though the PCI
> > > subsystem cleared it before, in which case the device cannot wake
> > > up the system from sleep states.  Modify the driver to only change
> > > the power.can_wakeup flag if the device is not capable of generating
> > > wakeup signals.
> > > 
> > > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > > ---
> > >  drivers/net/tg3.c |    6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > Index: linux-2.6/drivers/net/tg3.c
> > > ===================================================================
> > > --- linux-2.6.orig/drivers/net/tg3.c
> > > +++ linux-2.6/drivers/net/tg3.c
> > > @@ -12403,9 +12403,11 @@ static void __devinit tg3_get_eeprom_hw_
> > >  			tp->tg3_flags3 |= TG3_FLG3_RGMII_EXT_IBND_TX_EN;
> > >  	}
> > >  done:
> > > -	device_init_wakeup(&tp->pdev->dev, tp->tg3_flags & TG3_FLAG_WOL_CAP);
> > > -	device_set_wakeup_enable(&tp->pdev->dev,
> > > +	if (tp->tg3_flags & TG3_FLAG_WOL_CAP)
> > > +		device_set_wakeup_enable(&tp->pdev->dev,
> > >  				 tp->tg3_flags & TG3_FLAG_WOL_ENABLE);
> > > +	else
> > > +		device_set_wakeup_capable(&tp->pdev->dev, false);
> > 
> > I did this because I couldn't see where 'can_wakeup' gets set.  I don't
> > see a call to device_init_wakeup() that would be relevant to tg3
> > devices.  I do see a couple calls to device_set_wakeup_capable() in
> > acpi/glue.c and acpi/scan.c.  Is that the place?
> 
> No, it's pci_pm_init() or platform_pci_wakeup_init() and they both use
> device_set_wakeup_capable() rather than device_init_wakeup(), which is just a
> combination of device_set_wakeup_capable() and device_set_wakeup_enable()
> anyway.
> 
> And there's no reason why PCI drivers should use device_pm_init() at all.

O.K.

Acked-by: Matt Carlson <mcarlson@broadcom.com>

> > >  }
> > >  
> > >  static int __devinit tg3_issue_otp_command(struct tg3 *tp, u32 cmd)
> > 
> > This is something I was always curious about too.  The TG3_FLAG_WOL_CAP
> > tracks whether or not the device can handle WOL.  Is it safe to do away
> > with this flag and lean on the 'can_wakeup' flag instead?
> 
> I don't think so.  power.can_wakeup only tracks the capability to generate
> wakeup signals from the PCI perspective and it is set by PCI if the device
> appears to be able to generate wakeup signals.
> 
> So, I think the driver should work as in the $subject patch - reset the
> power.can_wakeup flag if TG3_FLAG_WOL_CAP is unset and don't touch it
> otherwise.

That makes sense.  Thanks.
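
To make the difference concrete, here is a simplified user-space model of
the old and patched tg3 behaviour.  It is only a sketch: struct dev_model
and the model_* functions are hypothetical stand-ins for the
power.can_wakeup / power.should_wakeup handling, not the real kernel
implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two wakeup flags discussed in this thread. */
struct dev_model {
	bool can_wakeup;    /* capability, decided by the bus (PCI) code */
	bool should_wakeup; /* policy: wakeup is actually enabled */
};

static void model_set_wakeup_capable(struct dev_model *d, bool capable)
{
	assert(d != NULL);
	d->can_wakeup = capable;
}

/* In this model, enabling is refused unless the device is capable. */
static int model_set_wakeup_enable(struct dev_model *d, bool enable)
{
	assert(d != NULL);
	if (!d->can_wakeup)
		return -1;
	d->should_wakeup = enable;
	return 0;
}

static bool model_may_wakeup(const struct dev_model *d)
{
	return d->can_wakeup && d->should_wakeup;
}

/* Old behaviour: an init_wakeup-style call overwrites can_wakeup. */
static void model_old_tg3(struct dev_model *d, bool wol_cap, bool wol_enable)
{
	model_set_wakeup_capable(d, wol_cap);
	model_set_wakeup_enable(d, wol_cap);
	model_set_wakeup_enable(d, wol_enable);
}

/* Patched behaviour: can_wakeup is only ever cleared, never set. */
static void model_new_tg3(struct dev_model *d, bool wol_cap, bool wol_enable)
{
	if (wol_cap)
		model_set_wakeup_enable(d, wol_enable);
	else
		model_set_wakeup_capable(d, false);
}
```

In this model, a device for which PCI left can_wakeup clear ends up marked
wakeup-capable by model_old_tg3() whenever wol_cap is true, whereas
model_new_tg3() leaves PCI's decision alone.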


^ permalink raw reply

* Re: GRO/GSO hiding PMTU?
From: Herbert Xu @ 2011-02-10 23:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, netfilter-devel
In-Reply-To: <20110210.145555.39165146.davem@davemloft.net>

On Thu, Feb 10, 2011 at 02:55:55PM -0800, David Miller wrote:
>
> I suspect that the packet arrives on eth1, accumulates into GRO, and
> thus marked as GSO as well, then GSO/TSO on output to eth0 is
> re-segmenting things transparently, and we're not getting the ICMP
> frag-needed message and the packet drop because of the skb_is_gso()
> check in ip_forward().
> 
> 	if (unlikely(skb->len > dst_mtu(&rt->dst) && !skb_is_gso(skb) &&
> 		     (ip_hdr(skb)->frag_off & htons(IP_DF))) && !skb->local_df) {
> 		IP_INC_STATS(dev_net(rt->dst.dev), IPSTATS_MIB_FRAGFAILS);
> 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
> 			  htonl(dst_mtu(&rt->dst)));
> 		goto drop;
> 	}
> 
> So if that's what is happening, that's cute, but I think we need to
> fix this :-)

Yes this is a known problem and we do need to fix this, even if
it doesn't appear to be the cause of your immediate issue :)

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: jme driver loses connection after resuming from suspend
From: Guo-Fu Tseng @ 2011-02-10 23:26 UTC (permalink / raw)
  To: Leonardo  L. P. da Mata; +Cc: linux-kernel, netdev
In-Reply-To: <AANLkTikV9Ncy3Jki1G20SrzA_HMiAk_hF3qY8U+vpZ1D@mail.gmail.com>

Hi Leonardo  L. P. da Mata:
The value of the register that holds the receive unicast MAC address
somehow gets corrupted after resume!

Could you check whether this source fixes the issue:
http://bbs.cooldavid.org/git/?p=jme.git;a=snapshot;h=refs/heads/resumefix;sf=tbz2

The shortlog is here:
http://bbs.cooldavid.org/git/?p=jme.git;a=shortlog;h=refs/heads/resumefix


On Thu, 10 Feb 2011 19:07:31 -0200, Leonardo  L. P. da Mata wrote
> Updated information on the bug: Guo-Fu Tseng says that it might not be a
> bug in the driver, but I've tested other network cards and they don't
> share the same issue.
> 
> Also, one interesting update that people may want to consider is that after
> running tcpdump on the device, the network starts working again.
> The information has been updated in the bug.
> 
> On Fri, Jan 28, 2011 at 10:33 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > (cc's added)
> >
> > On Fri, 28 Jan 2011 16:03:11 -0200
> > "Leonardo  L. P. da Mata" <barroca@gmail.com> wrote:
> >
> >> Hello, I'm testing kernel 2.6.37 on my hardware. Once connected to a
> >> wired network, I suspend with:
> >> echo "mem" >/sys/power/state
> >>
> >> The system suspends. After resuming from suspend, the network card
> >> cannot be used anymore.
> >>
> >>
> >> The bug is reported here:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=27692
> >>
> >> Can you please point me to similar problems on other network cards so
> >> I can look for possible solutions to this.
> >
> >
> 
> -- 
> Leonardo Luiz Padovani da Mata
> barroca@gmail.com
> 
> "May the force be with you, always"
> "Nerd Pride... eu tenho. Voce tem?"
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Guo-Fu Tseng

^ permalink raw reply

