Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: Divide error in bictcp_cong_avoid ?
From: Randy Dunlap @ 2011-05-04 17:49 UTC (permalink / raw)
  To: TB, netdev; +Cc: linux-kernel
In-Reply-To: <4DC178D3.6030308@techboom.com>

[add cc to netdev]


On Wed, 04 May 2011 12:03:31 -0400 TB wrote:

> We're having this issue sporadically on a few servers and this is the 
> backtrace we get from netconsole.
> 
> 
> [28522.642419] divide error: 0000 [#1] SMP
> [28522.642457] last sysfs file: 
> /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/vendor
> [28522.642504] CPU 0
> [28522.642511] Modules linked in:
>   i2c_i801
>   i2c_core
>   evdev
>   button
> [28522.642570]
> [28522.642590] Pid: 0, comm: swapper Not tainted 2.6.38.5 #6
> 
>   Supermicro X8DTH-i/6/iF/6F
> /
>   X8DTH
> 
> [28522.642651] RIP: 0010:[<ffffffff8150b27b>] [<ffffffff8150b27b>] 
> bictcp_cong_avoid+0x21a/0x247
> [28522.642708] RSP: 0018:ffff8800bf403a90  EFLAGS: 00010202
> [28522.642735] RAX: 0000000000000010 RBX: ffff880352aa6400 RCX: 
> 0000000000000000
> [28522.642765] RDX: 0000000000000000 RSI: ffff880352aa67c0 RDI: 
> 0000000000001607
> [28522.642795] RBP: 000000007caa5a1b R08: 00000000000035c2 R09: 
> 00000000000000e6
> [28522.642825] R10: ffff88003d499c00 R11: ffff880109831b00 R12: 
> ffffffff817cecd0
> [28522.642855] R13: 0000000000000004 R14: 000000000001001b R15: 
> 0000000000000123
> [28522.642886] FS:  0000000000000000(0000) GS:ffff8800bf400000(0000) 
> knlGS:0000000000000000
> [28522.642932] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [28522.642959] CR2: 00007fb4c6ffd000 CR3: 000000042e4a1000 CR4: 
> 00000000000006f0
> [28522.642990] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [28522.643020] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
> 0000000000000400
> [28522.643050] Process swapper (pid: 0, threadinfo ffffffff8176c000, 
> task ffffffff81777020)
> [28522.643095] Stack:
> [28522.643116]  ffff880352aa6400
>   ffffffff817cecd0
>   0000000000000004
>   0000000000000406
> 
> [28522.643171]  ffff880352aa6400
>   ffffffff814e3dc5
>   000000000000111c
>   ffff88022a9b3200
> 
> [28522.643226]  0000000000000000
>   7caa5a1b7caa4ec3
>   0000000000000000
>   0000000000000000
> 
> [28522.643281] Call Trace:
> [28522.643303]  <IRQ>
> 
> [28522.643330]  [<ffffffff814e3dc5>] ? tcp_ack+0x18b5/0x1a89
> [28522.643359]  [<ffffffff814e45c2>] ? tcp_rcv_established+0xd1/0xa13
> [28522.643389]  [<ffffffff814ec60b>] ? tcp_v4_do_rcv+0x1b2/0x382
> [28522.643418]  [<ffffffff814c95d4>] ? nf_iterate+0x40/0x78
> [28522.643446]  [<ffffffff814ecc5f>] ? tcp_v4_rcv+0x484/0x797
> [28522.643475]  [<ffffffff814d11c7>] ? ip_local_deliver_finish+0xab/0x139
> [28522.643505]  [<ffffffff814ae2b3>] ? __netif_receive_skb+0x31c/0x349
> [28522.643535]  [<ffffffff814aec82>] ? netif_receive_skb+0x67/0x6d
> [28522.643564]  [<ffffffff814af1fb>] ? napi_gro_receive+0x9d/0xab
> [28522.643592]  [<ffffffff814aed57>] ? napi_skb_finish+0x1c/0x31
> [28522.643623]  [<ffffffff813e4248>] ? igb_poll+0x7d5/0xb2e
> [28522.643653]  [<ffffffff812b6b22>] ? blk_run_queue+0x23/0x37
> [28522.643683]  [<ffffffff813520d4>] ? scsi_run_queue+0x2ee/0x381
> [28522.643712]  [<ffffffff81353810>] ? scsi_io_completion+0x3e0/0x409
> [28522.643741]  [<ffffffff814af337>] ? net_rx_action+0xa7/0x212
> [28522.643771]  [<ffffffff8103b6c2>] ? __do_softirq+0xbe/0x184
> [28522.643800]  [<ffffffff8100364c>] ? call_softirq+0x1c/0x28
> [28522.643828]  [<ffffffff81005085>] ? do_softirq+0x31/0x63
> [28522.643856]  [<ffffffff8103b56c>] ? irq_exit+0x36/0x78
> [28522.643883]  [<ffffffff81004784>] ? do_IRQ+0x98/0xae
> [28522.643912]  [<ffffffff81562c13>] ? ret_from_intr+0x0/0xe
> [28522.643938]  <EOI>
> 
> [28522.643963]  [<ffffffff81009a41>] ? mwait_idle+0xb9/0xf3
> [28522.643991]  [<ffffffff81001c6e>] ? cpu_idle+0x57/0x8d
> [28522.644019]  [<ffffffff81801c49>] ? start_kernel+0x34e/0x35a
> [28522.644048]  [<ffffffff81801398>] ? x86_64_start_kernel+0xf3/0xf9
> [28522.644075] Code:
>   39  c9  76  18  44  29  c9  31  d2  44  89  c8  f7  f1  39  83  c0  03 
>   00  00
>   76  06  89  83  c0  03  00  00  8b  83  c0  03  00  00  31  d2  c1  e0 
>   04  0f
>   b7  4e  2c
>   f7>  f1  ba  01  00  00  00  85  c0  0f  45  d0  89  93  c0  03  00 
> 00  8b  b3
>   c0
> 
> [28522.644338] RIP
>   [<ffffffff8150b27b>] bictcp_cong_avoid+0x21a/0x247
> [28522.644371]  RSP <ffff8800bf403a90>
> [28522.644733] ---[ end trace 9db294ef7ff3a7b5 ]---
> [28522.644800] Kernel panic - not syncing: Fatal exception in interrupt
> [28522.644871] Pid: 0, comm: swapper Tainted: G      D     2.6.38.5 #6
> [28522.644942] Call Trace:
> [28522.645012]  <IRQ>
>   [<ffffffff81560690>] ? panic+0x9d/0x1a0
> [28522.645131]  [<ffffffff81562c13>] ? ret_from_intr+0x0/0xe
> [28522.645200]  [<ffffffff810365bb>] ? kmsg_dump+0x46/0xec
> [28522.645268]  [<ffffffff81006176>] ? oops_end+0x9f/0xac
> [28522.645335]  [<ffffffff810040d8>] ? do_divide_error+0x7f/0x89
> [28522.645404]  [<ffffffff8150b27b>] ? bictcp_cong_avoid+0x21a/0x247
> [28522.645473]  [<ffffffff814b057c>] ? dev_queue_xmit+0x4a4/0x4b2
> [28522.645545]  [<ffffffff814d5390>] ? ip_queue_xmit+0x2e9/0x32f
> [28522.645614]  [<ffffffff81003375>] ? divide_error+0x15/0x20
> [28522.645685]  [<ffffffff8150b27b>] ? bictcp_cong_avoid+0x21a/0x247
> [28522.645754]  [<ffffffff814e3dc5>] ? tcp_ack+0x18b5/0x1a89
> [28522.645823]  [<ffffffff814e45c2>] ? tcp_rcv_established+0xd1/0xa13
> [28522.645892]  [<ffffffff814ec60b>] ? tcp_v4_do_rcv+0x1b2/0x382
> [28522.645961]  [<ffffffff814c95d4>] ? nf_iterate+0x40/0x78
> [28522.646029]  [<ffffffff814ecc5f>] ? tcp_v4_rcv+0x484/0x797
> [28522.646097]  [<ffffffff814d11c7>] ? ip_local_deliver_finish+0xab/0x139
> [28522.646167]  [<ffffffff814ae2b3>] ? __netif_receive_skb+0x31c/0x349
> [28522.646240]  [<ffffffff814aec82>] ? netif_receive_skb+0x67/0x6d
> [28522.646308]  [<ffffffff814af1fb>] ? napi_gro_receive+0x9d/0xab
> [28522.646377]  [<ffffffff814aed57>] ? napi_skb_finish+0x1c/0x31
> [28522.646445]  [<ffffffff813e4248>] ? igb_poll+0x7d5/0xb2e
> [28522.646513]  [<ffffffff812b6b22>] ? blk_run_queue+0x23/0x37
> [28522.646582]  [<ffffffff813520d4>] ? scsi_run_queue+0x2ee/0x381
> [28522.646651]  [<ffffffff81353810>] ? scsi_io_completion+0x3e0/0x409
> [28522.646721]  [<ffffffff814af337>] ? net_rx_action+0xa7/0x212
> [28522.646791]  [<ffffffff8103b6c2>] ? __do_softirq+0xbe/0x184
> [28522.646884]  [<ffffffff8100364c>] ? call_softirq+0x1c/0x28
> [28522.646953]  [<ffffffff81005085>] ? do_softirq+0x31/0x63
> [28522.647021]  [<ffffffff8103b56c>] ? irq_exit+0x36/0x78
> [28522.647089]  [<ffffffff81004784>] ? do_IRQ+0x98/0xae
> [28522.647164]  [<ffffffff81562c13>] ? ret_from_intr+0x0/0xe
> [28522.647239]  <EOI>
>   [<ffffffff81009a41>] ? mwait_idle+0xb9/0xf3
> [28522.647354]  [<ffffffff81001c6e>] ? cpu_idle+0x57/0x8d
> [28522.647422]  [<ffffffff81801c49>] ? start_kernel+0x34e/0x35a
> [28522.647491]  [<ffffffff81801398>] ? x86_64_start_kernel+0xf3/0xf9
> --


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* Re: Support e1000 M88 PHY registers in -d
From: Jeff Kirsher @ 2011-05-04 18:00 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Anthony DeRobertis, netdev, e1000-devel@lists.sourceforge.net,
	574574@bugs.debian.org
In-Reply-To: <1304530875.2926.29.camel@bwh-desktop>

[-- Attachment #1: Type: text/plain, Size: 1474 bytes --]

On Wed, 2011-05-04 at 10:41 -0700, Ben Hutchings wrote:
> On Mon, 2011-04-04 at 10:41 +0100, Ben Hutchings wrote:
> > On Mon, 2011-04-04 at 01:36 -0700, Jeff Kirsher wrote:
> > > On Sat, Apr 2, 2011 at 09:24, Ben Hutchings <bhutchings@solarflare.com> wrote:
> > > > Anthony,
> > > >
> > > > I'm now upstream maintainer for ethtool so I've picked up your patch
> > > > again.
> > > >
> > > > On Fri, 2010-03-19 at 00:32 -0400, Anthony DeRobertis wrote:
> > > >> Package: ethtool
> > > >> Version: 1:2.6.33-1
> > > >> Severity: wishlist
> > > >>
> > > >> The M88 PHY registers contain useful information like the cable length
> > > >> estimate and the MDI/MDIX status. The attached patch makes -d dump
> > > >> them.
> > > >
> > > > Patches for ethtool should include a commit message and Signed-off-by
> > > > line, as in the Linux kernel.  See sections 2 and 12 of
> > > > <http://www.kernel.org/doc/Documentation/SubmittingPatches>.  They
> > > > should be sent to this address and to netdev.
> > > >
> > > > I'm forwarding this patch to netdev and the e1000 developers for review.
> > > >
> > > > Ben.
> > > 
> > > Thanks Ben.  Just to be clear, have you applied these e1000 changes to
> > > the ethtool?
> > [...]
> > 
> > I have not applied these changes either anywhere.
> 
> ...but I will if I don't hear back from you soon.
> 
> Ben.
> 

Go ahead and apply the changes.  I apologize, I thought I had already
pushed this patch.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [ethtool PATCH] FW dump support
From: Anirban Chakraborty @ 2011-05-04 18:02 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, David Miller
In-Reply-To: <1304530807.2926.28.camel@bwh-desktop>


On May 4, 2011, at 10:40 AM, Ben Hutchings wrote:

> On Mon, 2011-05-02 at 16:29 -0700, anirban.chakraborty@qlogic.com wrote:
>> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>> 
>> Added support to take FW dump via ethtool.
> [...]
>> --- a/ethtool.c
>> +++ b/ethtool.c
> [...]
>> @@ -263,6 +270,12 @@ static struct option {
>> 		"Get Rx ntuple filters and actions\n" },
>>     { "-P", "--show-permaddr", MODE_PERMADDR,
>> 		"Show permanent hardware address" },
>> +    { "-W", "--get-dump", MODE_GET_DUMP,
>> +		"Get dump level\n" },
>> +    { "-Wd", "--get-dump-data", MODE_GET_DUMP_DATA,
>> +		"Get dump data", "FILENAME " "Name of the dump file\n" },
> 
> The short options should only include one letter.  Also the general
> pattern is that 'get' options use lower-case letters and 'set' options
> use upper-case letters.  No, I'm not sure how best to handle a set of 3
> options.  Maybe you can combine --get-dump and --get-dump-data, making
> the filename optional?

Thanks for the info, I was not aware of it. Will address it.

> 
>> +    { "-w", "--set-dump", MODE_SET_DUMP,
>> +		"Set dump level", "DUMPLEVEL " "Dump level for the device\n" },
> 
> The field this sets is described as 'flags' so does it consist of flags
> or is it a level?
The idea is to have an opaque flag for the driver that it uses to set the dump level
(and hence the size) of it. I will use flag instead of level here so that there is no
ambiguity.

> 
> [...]
>> @@ -3241,6 +3270,86 @@ static int do_grxntuple(int fd, struct ifreq *ifr)
>> 	return 0;
>> }
>> 
>> +static void do_writedump(struct ethtool_dump *dump)
>> +{
>> +	FILE *f = fopen(dump_file, "wb+");
>> +	size_t bytes;
>> +
>> +	if (!f ) {
>> +		fprintf(stderr, "Can't open file %s: %s\n",
>> +			dump_file, strerror(errno));
>> +		return;
> 
> On error, we must exit with code 1.
Will do.

> 
>> +	}
>> +
>> +	bytes = fwrite(dump->data, 1, dump->len, f);
>> +	fclose(f);
> [...]
> 
> These functions can also fail and need to be checked.  (Yes, fclose()
> can fail, since it may have to flush buffered data.)

I agree. Will fix it. Thanks for the feedback.

-Anirban



^ permalink raw reply

* Re: [ethtool PATCH 4/4] v5 Add RX packet classification interface
From: Ben Hutchings @ 2011-05-04 18:05 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Dimitris Michailidis, davem@davemloft.net, Kirsher, Jeffrey T,
	netdev@vger.kernel.org
In-Reply-To: <4DC18FB2.8060604@intel.com>

On Wed, 2011-05-04 at 10:41 -0700, Alexander Duyck wrote:
> On 5/4/2011 10:33 AM, Dimitris Michailidis wrote:
> > On 05/04/2011 10:24 AM, Ben Hutchings wrote:
> >> On Wed, 2011-05-04 at 10:09 -0700, Dimitris Michailidis wrote:
> >>> On 05/03/2011 04:34 PM, Ben Hutchings wrote:
> >>>> On Tue, 2011-05-03 at 16:23 -0700, Dimitris Michailidis wrote:
> >>>>> I think RX_CLS_LOC_UNSPEC should be passed to the driver, where there is
> >>>>> enough knowledge to pick an appropriate slot.  So I'd remove the
> >>>>>
> >>>>>       if (loc == RX_CLS_LOC_UNSPEC)
> >>>>>
> >>>>> block above, let the driver pick a slot, and then pass the selected location
> >>>>> back for ethtool to report.
> >>>> But first we have to specify this in the ethtool API.  So please propose
> >>>> a patch to ethtool.h.
> >>> In the past we discussed that being able to specify the first available slot or
> >>> the last available would be useful, so something like the below?
> >>>
> >>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> >>> index 4194a20..909ef79 100644
> >>> --- a/include/linux/ethtool.h
> >>> +++ b/include/linux/ethtool.h
> >>> @@ -442,7 +442,8 @@ struct ethtool_flow_ext {
> >>>     *	includes the %FLOW_EXT flag.
> >>>     * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
> >>>     *	if packets should be discarded
> >>> - * @location: Index of filter in hardware table
> >>> + * @location: Index of filter in hardware table, or %RX_CLS_FLOW_FIRST_LOC for
> >>> + *	first available index, or %RX_CLS_FLOW_LAST_LOC for last available
> >> [...]
> >>
> >> I think that's reasonable.  We should also explicitly state that
> >> location determines priority, i.e. if a packet matches two filters then
> >> the one with the lower location wins.
> >
> > Easy and true for a TCAM.  For hashing would you use the location to decide how
> > to order filters that fall in the same bucket?
> >
> 
> The problem is none of this is backwards compatible.  The niu driver has 
> supported the network flow classifier rules since 2.6.30.  Adding this 
> would cause all rule setups for niu to fail because these locations 
> would have to exist outside of the current rule locations.

Those rule setups would have to be using some unofficial patched
ethtool.  I don't think that should be a major concern for us.
However...

> This is why I was suggesting that the best approach would be to update 
> the kernel to add a separate ioctl for letting the driver setup the 
> location.
>
> We can just attempt to make that call and when we get the 
> EOPNOTSUPP errno we know the device driver doesn't support it and can 
> then let the rule manager take over.

How about having ETHTOOL_GRXCLSRLCNT set a flag in the 'data' field to
indicate that the driver can assign locations?  (We would have to
specify that for compatibility with older kernels the application must
initialise this filed to 0.)

rmgr_init() would then check for this flag.

I hope someone is actually going to test this on niu, as it sounds like
that will be the only driver using a TCAM... David?

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* bug fix for out-of-tree igb driver
From: Ben Greear @ 2011-05-04 18:06 UTC (permalink / raw)
  To: netdev

Just in case someone is using the out-of-tree igb driver, I
think I found a bug.

The tx_queue_mapping method should have r_idx >= adapter->num_tx_queues
as shown below.  The driver had: r_idx > adapter->num_tx_queues originally.

Its possible my code is buggy for sending queue_mapping of 1
when num_tx_queues is 1, but either way, the driver
shouldn't crash.

Upstream igb driver in mainline kernels does not have this problem.

static inline struct igb_ring *igb_tx_queue_mapping(struct igb_adapter *adapter,
                                                     struct sk_buff *skb)
{
	unsigned int r_idx = skb->queue_mapping & (IGB_MAX_TX_QUEUES - 1);

	if (r_idx >= adapter->num_tx_queues)
		r_idx = r_idx % adapter->num_tx_queues;

	return adapter->tx_ring[r_idx];
}

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH -next] x86/net: only select BPF_JIT when NET is enabled
From: David Miller @ 2011-05-04 18:06 UTC (permalink / raw)
  To: randy.dunlap; +Cc: sfr, netdev, linux-next, linux-kernel
In-Reply-To: <20110504095651.60082594.randy.dunlap@oracle.com>

From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Wed, 4 May 2011 09:56:51 -0700

> From: Randy Dunlap <randy.dunlap@oracle.com>
> 
> Fix kconfig unmet dependency warning: HAVE_BPF_JIT depends on NET, so
> make the "select" of it depend on NET also.
> 
> warning: (X86) selects HAVE_BPF_JIT which has unmet direct dependencies (NET)
> 
> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>

Applied, thanks Randy.

^ permalink raw reply

* Re: [ethtool PATCH] FW dump support
From: Ben Hutchings @ 2011-05-04 18:07 UTC (permalink / raw)
  To: anirban.chakraborty; +Cc: netdev, --no-chain-reply-to, davem
In-Reply-To: <1304530807.2926.28.camel@bwh-desktop>

One more thing: please can you update the manual page as well as the
built-in usage information.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: 2.6.38.2, kernel panic, probably related to framentation handling
From: Denys Fedoryshchenko @ 2011-05-04 18:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1304528581.32152.15.camel@edumazet-laptop>

 On Wed, 04 May 2011 19:03:01 +0200, Eric Dumazet wrote:
> Le mercredi 04 mai 2011 à 14:36 +0300, Denys Fedoryshchenko a écrit :
>> Seems once more, during trying to bring another type of tunnel (this
>>  time userspace, working over tun device) and switching routes got 
>> one
>>  more kernel panic
>>  It is vanilla kernel, but many source routing rules, firewall, QoS 
>> and
>>  etc, including this tunnel now also. Here is what i got on 
>> netconsole:
>>  Any other info required?
>>
>>  netc [1192230.881002]
>>  netc [1192230.881002] Pid: 0, comm: kworker/0:1 Not tainted
>>  2.6.38.2-devel2 #2
>>  netc
>>  netc Dell Inc. PowerEdge 1950
>>  netc /
>>  netc 0D8635
>>  netc
>>  netc [1192230.881002] EIP: 0060:[<c03c0847>] EFLAGS: 00010206 CPU: 
>> 3
>>  netc [1192230.881002] EIP is at icmp_send+0x39/0x396
>>  netc [1192230.881002] EAX: 121a8aca EBX: d1d28600 ECX: 00000001 
>> EDX:
>>  c63b6600
>>  netc [1192230.881002] ESI: d1d28600 EDI: c33438a0 EBP: f2a41840 
>> ESP:
>>  f64b5e8c
>>  netc [1192230.881002]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>>  netc [1192230.881002] Process kworker/0:1 (pid: 0, ti=f64b4000
>>  task=f64a4a80 task.ti=f64b0000)
>>  netc [1192230.881002] Stack:
>>  netc [1192230.881002]  c0113ea1
>>  netc 00000001
>>  netc 0000000b
>>  netc 00000000
>>  netc efed48c7
>>  netc 0000114a
>>  netc 00000000
>>  netc f6b01fa8
>>  netc
>>  netc [1192230.881002]  e21be5e1
>>  netc c0148bf3
>>  netc e217d5d0
>>  netc 00043c53
>>  netc 00000001
>>  netc f6b02d14
>>  netc e21be5e1
>>  netc 00043c53
>>  netc
>>  netc [1192230.881002]  e21be5e1
>>  netc 00043c53
>>  netc c0148d2e
>>  netc 00000000
>>  netc 00000058
>>  netc 00000000
>>  netc c0140779
>>  netc ce9f5aa9
>>  netc
>>  netc [1192230.881002] Call Trace:
>>  netc [1192230.881002]  [<c0113ea1>] ? lapic_next_event+0x13/0x16
>>  netc [1192230.881002]  [<c0148bf3>] ? 
>> tick_dev_program_event+0x26/0x116
>>  netc [1192230.881002]  [<c0148d2e>] ? tick_program_event+0x1b/0x1f
>>  netc [1192230.881002]  [<c0140779>] ? hrtimer_interrupt+0x10c/0x1ca
>>  netc [1192230.881002]  [<c0140e49>] ? hrtimer_start+0x20/0x25
>>  netc [1192230.881002]  [<c012f18e>] ? irq_exit+0x36/0x59
>>  netc [1192230.881002]  [<c0114933>] ?
>>  smp_apic_timer_interrupt+0x71/0x7d
>>  netc [1192230.881002]  [<c03f2752>] ? 
>> apic_timer_interrupt+0x2a/0x30
>>  netc [1192230.881002]  [<c039f527>] ? ip_expire+0xf2/0x11b
>>  netc [1192230.881002]  [<c039f435>] ? ip_expire+0x0/0x11b
>>  netc [1192230.881002]  [<c0133421>] ? run_timer_softirq+0x140/0x1c7
>>  netc [1192230.881002]  [<c012f28f>] ? __do_softirq+0x6b/0x104
>>  netc [1192230.881002]  [<c012f224>] ? __do_softirq+0x0/0x104
>>  netc [1192230.881002]  [<c012f224>] ? __do_softirq+0x0/0x104
>>  netc [1192230.881002]  <IRQ>
>>  netc
>>  netc [1192230.881002]  [<c012f17e>] ? irq_exit+0x26/0x59
>>  netc [1192230.881002]  [<c0103b3d>] ? do_IRQ+0x81/0x95
>>  netc [1192230.881002]  [<c0114933>] ?
>>  smp_apic_timer_interrupt+0x71/0x7d
>>  netc [1192230.881002]  [<c0102ca9>] ? common_interrupt+0x29/0x30
>>  netc [1192230.881002]  [<c010807a>] ? mwait_idle+0x51/0x56
>>  netc [1192230.881002]  [<c0101a97>] ? cpu_idle+0x41/0x5e
>>  netc [1192230.881002] Code:
>>  netc 08
>>  netc 89
>>  netc c6
>>  netc 89
>>  netc 4c
>>  netc 24
>>  netc 04
>>  netc 8b
>>  netc 40
>>  netc 48
>>  netc 89
>>                        netc c2
>>  netc 83
>>  netc e2
>>  netc fe
>>  netc 0f
>>  netc 84
>>  netc 66
>>  netc 03
>>  netc 00
>>  netc 00
>>  netc 89
>>  netc 94
>>  netc 24
>>  netc c0
>>  netc 00
>>  netc 00
>>  netc 00
>>  netc 8b
>>  netc 42
>>  netc 0c
>>  netc 8b
>>  netc be
>>  netc 94
>>  netc 00
>>  netc 00
>>  netc 00
>>  netc 3b
>>  netc be
>>  netc a4
>>  netc 00
>>  netc 00
>>  netc 00
>>  May  4 11:17:12 217.151.224.119 unparseable log message: "<8b> "
>>  netc 80
>>  netc 80
>>  netc 02
>>  netc 00
>>  netc 00
>>  netc 89
>>  netc 44
>>  netc 24
>>  netc 10
>>  netc 0f
>>  netc 82
>>  netc 40
>>  netc 03
>>  netc 00
>>  netc 00
>>  netc 8d
>>  netc 47
>>  netc 14
>>  netc 39
>>  netc 86
>>  netc
>>  netc [1192230.881002] EIP: [<c03c0847>]
>>  netc icmp_send+0x39/0x396
>>  netc SS:ESP 0068:f64b5e8c
>>  netc [1192230.881002] CR2: 00000000121a8d4a
>>  netc [1192230.910072] ---[ end trace 42aae79d7fb08725 ]---
>>  netc [1192230.910354] Kernel panic - not syncing: Fatal exception 
>> in
>>  interrupt
>>  netc [1192230.911062] Rebooting in 5 seconds..
>>
>
> Hi Denys
>
> Is it reproductible, and possibly on latest kernel ?
>
> We fixed some bugs lately (assuming you also use a bridge ?)
>
> Could you send the disassembled code on your kernel of icmp_send() ?
>
> Thanks !
 No bridge. Tunnel was used in TUN mode, not TAP.
 I will try to reproduce at night on latest kernel.



^ permalink raw reply

* Re: 2.6.38.2, kernel panic, probably related to framentation handling
From: Eric Dumazet @ 2011-05-04 18:11 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netdev
In-Reply-To: <1304528581.32152.15.camel@edumazet-laptop>

Le mercredi 04 mai 2011 à 19:03 +0200, Eric Dumazet a écrit :

> Hi Denys
> 
> Is it reproductible, and possibly on latest kernel ?
> 
> We fixed some bugs lately (assuming you also use a bridge ?)
> 
> Could you send the disassembled code on your kernel of icmp_send() ?

Oh well, I think I found the problem, I am working on a patch and send
it shortly.

Thanks



^ permalink raw reply

* Re: bug fix for out-of-tree igb driver
From: Jeff Kirsher @ 2011-05-04 18:12 UTC (permalink / raw)
  To: Ben Greear, Carolyn Wyborny; +Cc: netdev
In-Reply-To: <4DC1959B.4010609@candelatech.com>

On Wed, May 4, 2011 at 11:06, Ben Greear <greearb@candelatech.com> wrote:
> Just in case someone is using the out-of-tree igb driver, I
> think I found a bug.
>
> The tx_queue_mapping method should have r_idx >= adapter->num_tx_queues
> as shown below.  The driver had: r_idx > adapter->num_tx_queues originally.
>
> Its possible my code is buggy for sending queue_mapping of 1
> when num_tx_queues is 1, but either way, the driver
> shouldn't crash.
>
> Upstream igb driver in mainline kernels does not have this problem.
>
> static inline struct igb_ring *igb_tx_queue_mapping(struct igb_adapter
> *adapter,
>                                                    struct sk_buff *skb)
> {
>        unsigned int r_idx = skb->queue_mapping & (IGB_MAX_TX_QUEUES - 1);
>
>        if (r_idx >= adapter->num_tx_queues)
>                r_idx = r_idx % adapter->num_tx_queues;
>
>        return adapter->tx_ring[r_idx];
> }
>
> Thanks,
> Ben
>

Thanks Ben, I have copied Carolyn so that she is aware of it.

-- 
Cheers,
Jeff

^ permalink raw reply

* RE: [ethtool PATCH] FW dump support
From: Ajit.Khaparde @ 2011-05-04 18:15 UTC (permalink / raw)
  To: anirban.chakraborty; +Cc: netdev, --no-chain-reply-to, davem, bhutchings
In-Reply-To: <1304530807.2926.28.camel@bwh-desktop>

________________________________________
From: netdev-owner@vger.kernel.org [netdev-owner@vger.kernel.org] On Behalf Of Ben Hutchings [bhutchings@solarflare.com]
Sent: Wednesday, May 04, 2011 12:40 PM
To: anirban.chakraborty@qlogic.com
Cc: netdev@vger.kernel.org; --no-chain-reply-to@mv.qlogic.com; davem@davemloft.net
Subject: Re: [ethtool PATCH] FW dump support

> On Mon, 2011-05-02 at 16:29 -0700, anirban.chakraborty@qlogic.com wrote:
>> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>>
>> Added support to take FW dump via ethtool.
> [...]
>> --- a/ethtool.c
>> +++ b/ethtool.c
> [...]
>> @@ -263,6 +270,12 @@ static struct option {
>>               "Get Rx ntuple filters and actions\n" },
>>      { "-P", "--show-permaddr", MODE_PERMADDR,
>>               "Show permanent hardware address" },
>> +    { "-W", "--get-dump", MODE_GET_DUMP,
>> +             "Get dump level\n" },
>> +    { "-Wd", "--get-dump-data", MODE_GET_DUMP_DATA,
>> +             "Get dump data", "FILENAME " "Name of the dump file\n" },
>
> The short options should only include one letter.  Also the general
> pattern is that 'get' options use lower-case letters and 'set' options
> use upper-case letters.  No, I'm not sure how best to handle a set of 3
> options.  Maybe you can combine --get-dump and --get-dump-data, making
> the filename optional?

ethtool already has "-f" option to flash/write the FW image.
Can you use "-F" to get the FW dump data out?
And then may be, these options can be extended to get the get/set dump levels?

-Ajit

^ permalink raw reply

* tap/bridge: Dropping NETIF_F_GSO/NETIF_F_SG
From: Michael S. Tsirkin @ 2011-05-04 18:18 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev, Ben Hutchings, herbert

BTW, I just noticed that net-next spits out
many of the following when I run any VMs:

tap0: Dropping NETIF_F_SG since no checksum feature.
tap0: Dropping NETIF_F_GSO since no SG feature.
tap0: Features changed: 0x401b4849 -> 0x40004040
device msttap0 entered promiscuous mode
br0: Dropping NETIF_F_GSO since no SG feature.
br0: port 1(msttap0) entering forwarding state
br0: port 1(msttap0) entering forwarding state
tap0: Features changed: 0x40004040 -> 0x40024849
tap0: Dropping NETIF_F_SG since no checksum feature.
tap0: Dropping NETIF_F_GSO since no SG feature.
tap0: Features changed: 0x40024849 -> 0x40004040
br0: Dropping NETIF_F_GSO since no SG feature.
tap0: Dropping NETIF_F_SG since no checksum feature.
tap0: Dropping NETIF_F_GSO since no SG feature.
tap0: Dropping NETIF_F_SG since no checksum feature.
tap0: Dropping NETIF_F_GSO since no SG feature.
tap0: Dropping NETIF_F_SG since no checksum feature.
tap0: Dropping NETIF_F_GSO since no SG feature.
tap0: Dropping NETIF_F_SG since no checksum feature.
tap0: Dropping NETIF_F_GSO since no SG feature.
tap0: Features changed: 0x40004040 -> 0x401b4849
tap0: Dropping NETIF_F_SG since no checksum feature.
tap0: Dropping NETIF_F_GSO since no SG feature.
tap0: Features changed: 0x401b4849 -> 0x40004040
br0: Dropping NETIF_F_GSO since no SG feature.

My problem is not primarily with warnings:

will that linearize all packets and disable GSO
for tap and bridge? If yes it can't be good
for performance...

-- 
MST

^ permalink raw reply

* Re: [ethtool PATCH 4/4] v5 Add RX packet classification interface
From: Dimitris Michailidis @ 2011-05-04 18:18 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Ben Hutchings, davem@davemloft.net, Kirsher, Jeffrey T,
	netdev@vger.kernel.org
In-Reply-To: <4DC18FB2.8060604@intel.com>

On 05/04/2011 10:41 AM, Alexander Duyck wrote:
> On 5/4/2011 10:33 AM, Dimitris Michailidis wrote:
>> On 05/04/2011 10:24 AM, Ben Hutchings wrote:
>>> On Wed, 2011-05-04 at 10:09 -0700, Dimitris Michailidis wrote:
>>>> On 05/03/2011 04:34 PM, Ben Hutchings wrote:
>>>>> On Tue, 2011-05-03 at 16:23 -0700, Dimitris Michailidis wrote:
>>>>>> I think RX_CLS_LOC_UNSPEC should be passed to the driver, where 
>>>>>> there is
>>>>>> enough knowledge to pick an appropriate slot.  So I'd remove the
>>>>>>
>>>>>>       if (loc == RX_CLS_LOC_UNSPEC)
>>>>>>
>>>>>> block above, let the driver pick a slot, and then pass the 
>>>>>> selected location
>>>>>> back for ethtool to report.
>>>>> But first we have to specify this in the ethtool API.  So please 
>>>>> propose
>>>>> a patch to ethtool.h.
>>>> In the past we discussed that being able to specify the first 
>>>> available slot or
>>>> the last available would be useful, so something like the below?
>>>>
>>>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>>>> index 4194a20..909ef79 100644
>>>> --- a/include/linux/ethtool.h
>>>> +++ b/include/linux/ethtool.h
>>>> @@ -442,7 +442,8 @@ struct ethtool_flow_ext {
>>>>     *    includes the %FLOW_EXT flag.
>>>>     * @ring_cookie: RX ring/queue index to deliver to, or 
>>>> %RX_CLS_FLOW_DISC
>>>>     *    if packets should be discarded
>>>> - * @location: Index of filter in hardware table
>>>> + * @location: Index of filter in hardware table, or 
>>>> %RX_CLS_FLOW_FIRST_LOC for
>>>> + *    first available index, or %RX_CLS_FLOW_LAST_LOC for last 
>>>> available
>>> [...]
>>>
>>> I think that's reasonable.  We should also explicitly state that
>>> location determines priority, i.e. if a packet matches two filters then
>>> the one with the lower location wins.
>>
>> Easy and true for a TCAM.  For hashing would you use the location to 
>> decide how
>> to order filters that fall in the same bucket?
>>
> 
> The problem is none of this is backwards compatible.  The niu driver has 
> supported the network flow classifier rules since 2.6.30.  Adding this 
> would cause all rule setups for niu to fail because these locations 
> would have to exist outside of the current rule locations.

Looking at niu it already has a problem with its handling of location because 
it does

	u16 idx;

	idx = nfc->fs.location;

i.e., it disregards the upper 16 bits of location.  With the current code niu 
would return -EINVAL for the two new constants, which seems correct.

> 
> This is why I was suggesting that the best approach would be to update 
> the kernel to add a separate ioctl for letting the driver setup the 
> location.  We can just attempt to make that call and when we get the 
> EOPNOTSUPP errno we know the device driver doesn't support it and can 
> then let the rule manager take over.

The problem with this is the location is dependent on the type of filter being 
added.  I.e., the ioctl would need to get all the information the existing 
ioctl carries making the new ioctl a small superset of the current one.
Additionally, if the driver only allocates a location in a separate ioctl how 
does it know that the app is actually going to use it?

^ permalink raw reply

* Re: [ethtool PATCH 4/4] v5 Add RX packet classification interface
From: Alexander Duyck @ 2011-05-04 18:21 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Dimitris Michailidis, davem@davemloft.net, Kirsher, Jeffrey T,
	netdev@vger.kernel.org
In-Reply-To: <1304532342.2926.46.camel@bwh-desktop>

On 5/4/2011 11:05 AM, Ben Hutchings wrote:
> On Wed, 2011-05-04 at 10:41 -0700, Alexander Duyck wrote:
>> On 5/4/2011 10:33 AM, Dimitris Michailidis wrote:
>>> On 05/04/2011 10:24 AM, Ben Hutchings wrote:
>>>> On Wed, 2011-05-04 at 10:09 -0700, Dimitris Michailidis wrote:
>>>>> On 05/03/2011 04:34 PM, Ben Hutchings wrote:
>>>>>> On Tue, 2011-05-03 at 16:23 -0700, Dimitris Michailidis wrote:
>>>>>>> I think RX_CLS_LOC_UNSPEC should be passed to the driver, where there is
>>>>>>> enough knowledge to pick an appropriate slot.  So I'd remove the
>>>>>>>
>>>>>>>        if (loc == RX_CLS_LOC_UNSPEC)
>>>>>>>
>>>>>>> block above, let the driver pick a slot, and then pass the selected location
>>>>>>> back for ethtool to report.
>>>>>> But first we have to specify this in the ethtool API.  So please propose
>>>>>> a patch to ethtool.h.
>>>>> In the past we discussed that being able to specify the first available slot or
>>>>> the last available would be useful, so something like the below?
>>>>>
>>>>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>>>>> index 4194a20..909ef79 100644
>>>>> --- a/include/linux/ethtool.h
>>>>> +++ b/include/linux/ethtool.h
>>>>> @@ -442,7 +442,8 @@ struct ethtool_flow_ext {
>>>>>      *	includes the %FLOW_EXT flag.
>>>>>      * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
>>>>>      *	if packets should be discarded
>>>>> - * @location: Index of filter in hardware table
>>>>> + * @location: Index of filter in hardware table, or %RX_CLS_FLOW_FIRST_LOC for
>>>>> + *	first available index, or %RX_CLS_FLOW_LAST_LOC for last available
>>>> [...]
>>>>
>>>> I think that's reasonable.  We should also explicitly state that
>>>> location determines priority, i.e. if a packet matches two filters then
>>>> the one with the lower location wins.
>>>
>>> Easy and true for a TCAM.  For hashing would you use the location to decide how
>>> to order filters that fall in the same bucket?
>>>
>>
>> The problem is none of this is backwards compatible.  The niu driver has
>> supported the network flow classifier rules since 2.6.30.  Adding this
>> would cause all rule setups for niu to fail because these locations
>> would have to exist outside of the current rule locations.
>
> Those rule setups would have to be using some unofficial patched
> ethtool.  I don't think that should be a major concern for us.
> However...

No, what I am saying is that if we were to add those locations to the 
ETHTOOL_SRXCLSRLINS then niu would not work anymore as it would treat 
them as invalid locations.  This existing setup will at least work with 
the existing niu from what I can tell.  If we were to try adding rules 
with locations outside of actual allowable rules then we would likely 
receive an indication that we provided an invalid argument.

>> This is why I was suggesting that the best approach would be to update
>> the kernel to add a separate ioctl for letting the driver setup the
>> location.
>>
>> We can just attempt to make that call and when we get the
>> EOPNOTSUPP errno we know the device driver doesn't support it and can
>> then let the rule manager take over.
>
> How about having ETHTOOL_GRXCLSRLCNT set a flag in the 'data' field to
> indicate that the driver can assign locations?  (We would have to
> specify that for compatibility with older kernels the application must
> initialise this filed to 0.)
>
> rmgr_init() would then check for this flag.
>
> I hope someone is actually going to test this on niu, as it sounds like
> that will be the only driver using a TCAM... David?
>
> Ben.
>

Honestly what I would prefer to see is a seperate call added such as an 
ETHTOOL_GRSCLSRLLOC that we could pass the flow specifier to and perhaps 
include first/last location call in that and then let the driver return 
where it wants to drop the rule.  That way we can avoid having to create 
an overly complicated rule manager that can handle all the bizarre rule 
ordering options that I am sure all the different network devices support.

The only reason I am not implementing this now is because there aren't 
any drivers in place that would currently use it.  I figure once cxgb 
has a means in place of supporting flow classifier rules then Dimitris 
can add the necessary code to ethtool and the kernel to allow the driver 
to specify rule locations.  I would prefer to avoid adding features 
based on speculation of what will be needed and would like to be able to 
actually see how the features will be used.

Thanks,

Alex


^ permalink raw reply

* RE: [ethtool PATCH] FW dump support
From: Ben Hutchings @ 2011-05-04 18:25 UTC (permalink / raw)
  To: Ajit.Khaparde; +Cc: anirban.chakraborty, netdev, --no-chain-reply-to, davem
In-Reply-To: <49395329523DD64492581B505F80C86D5A5BCD3990@EXMAIL.ad.emulex.com>

On Wed, 2011-05-04 at 11:15 -0700, Ajit.Khaparde@Emulex.Com wrote:
> ________________________________________
> From: netdev-owner@vger.kernel.org [netdev-owner@vger.kernel.org] On Behalf Of Ben Hutchings [bhutchings@solarflare.com]
> Sent: Wednesday, May 04, 2011 12:40 PM
> To: anirban.chakraborty@qlogic.com
> Cc: netdev@vger.kernel.org; --no-chain-reply-to@mv.qlogic.com; davem@davemloft.net
> Subject: Re: [ethtool PATCH] FW dump support
> 
> > On Mon, 2011-05-02 at 16:29 -0700, anirban.chakraborty@qlogic.com wrote:
> >> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
> >>
> >> Added support to take FW dump via ethtool.
> > [...]
> >> --- a/ethtool.c
> >> +++ b/ethtool.c
> > [...]
> >> @@ -263,6 +270,12 @@ static struct option {
> >>               "Get Rx ntuple filters and actions\n" },
> >>      { "-P", "--show-permaddr", MODE_PERMADDR,
> >>               "Show permanent hardware address" },
> >> +    { "-W", "--get-dump", MODE_GET_DUMP,
> >> +             "Get dump level\n" },
> >> +    { "-Wd", "--get-dump-data", MODE_GET_DUMP_DATA,
> >> +             "Get dump data", "FILENAME " "Name of the dump file\n" },
> >
> > The short options should only include one letter.  Also the general
> > pattern is that 'get' options use lower-case letters and 'set' options
> > use upper-case letters.  No, I'm not sure how best to handle a set of 3
> > options.  Maybe you can combine --get-dump and --get-dump-data, making
> > the filename optional?
> 
> ethtool already has "-f" option to flash/write the FW image.
> Can you use "-F" to get the FW dump data out?
> And then may be, these options can be extended to get the get/set dump levels?

Although '-F' would be nicely mnemonic, firmware update and dump are
different enough that I don't think they make sense as a pair.  If a
driver implements both then there is the risk of a typo causing the
firmware image file to be overwritten with a firmware dump.  I also
don't think it's possible to extend the '-f' option without breaking
compatibility with scripts that already use it.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: Divide error in bictcp_cong_avoid ?
From: Stephen Hemminger @ 2011-05-04 18:33 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: TB, netdev, linux-kernel
In-Reply-To: <20110504104940.cd0d9e9a.rdunlap@xenotime.net>

On Wed, 4 May 2011 10:49:40 -0700
Randy Dunlap <rdunlap@xenotime.net> wrote:

> [add cc to netdev]
> 
> 
> On Wed, 04 May 2011 12:03:31 -0400 TB wrote:
> 
> > We're having this issue sporadically on a few servers and this is the 
> > backtrace we get from netconsole.
> > 
> > 
> > [28522.642419] divide error: 0000 [#1] SMP
> > [28522.642457] last sysfs file: 
> > /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/vendor
> > [28522.642504] CPU 0
> > [28522.642511] Modules linked in:
> >   i2c_i801
> >   i2c_core
> >   evdev
> >   button
> > [28522.642570]
> > [28522.642590] Pid: 0, comm: swapper Not tainted 2.6.38.5 #6
> > 
> >   Supermicro X8DTH-i/6/iF/6F
> > /
> >   X8DTH
> > 

What kernel version? I suspect they are running something really old
kernel like RHEL 5 since BIC has not been the default congestion control for
several years.

-- 

^ permalink raw reply

* Re: Divide error in bictcp_cong_avoid ?
From: Randy Dunlap @ 2011-05-04 18:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: TB, netdev, linux-kernel
In-Reply-To: <20110504113351.4643a0c9@nehalam>

On Wed, 4 May 2011 11:33:51 -0700 Stephen Hemminger wrote:

> On Wed, 4 May 2011 10:49:40 -0700
> Randy Dunlap <rdunlap@xenotime.net> wrote:
> 
> > [add cc to netdev]
> > 
> > 
> > On Wed, 04 May 2011 12:03:31 -0400 TB wrote:
> > 
> > > We're having this issue sporadically on a few servers and this is the 
> > > backtrace we get from netconsole.
> > > 
> > > 
> > > [28522.642419] divide error: 0000 [#1] SMP
> > > [28522.642457] last sysfs file: 
> > > /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/vendor
> > > [28522.642504] CPU 0
> > > [28522.642511] Modules linked in:
> > >   i2c_i801
> > >   i2c_core
> > >   evdev
> > >   button
> > > [28522.642570]
> > > [28522.642590] Pid: 0, comm: swapper Not tainted 2.6.38.5 #6   <<<<<<<<<<<<<<
> > > 
> > >   Supermicro X8DTH-i/6/iF/6F
> > > /
> > >   X8DTH
> > > 
> 
> What kernel version? I suspect they are running something really old
> kernel like RHEL 5 since BIC has not been the default congestion control for
> several years.

Please see about 6 lines above.... at the "<<<<<<<<<<<".


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* Re: [ethtool PATCH 4/4] v5 Add RX packet classification interface
From: Alexander Duyck @ 2011-05-04 18:35 UTC (permalink / raw)
  To: Dimitris Michailidis
  Cc: Ben Hutchings, davem@davemloft.net, Kirsher, Jeffrey T,
	netdev@vger.kernel.org
In-Reply-To: <4DC1986D.5020706@chelsio.com>

On 5/4/2011 11:18 AM, Dimitris Michailidis wrote:
> On 05/04/2011 10:41 AM, Alexander Duyck wrote:
>> On 5/4/2011 10:33 AM, Dimitris Michailidis wrote:
>>> On 05/04/2011 10:24 AM, Ben Hutchings wrote:
>>>> On Wed, 2011-05-04 at 10:09 -0700, Dimitris Michailidis wrote:
>>>>> On 05/03/2011 04:34 PM, Ben Hutchings wrote:
>>>>>> On Tue, 2011-05-03 at 16:23 -0700, Dimitris Michailidis wrote:
>>>>>>> I think RX_CLS_LOC_UNSPEC should be passed to the driver, where
>>>>>>> there is
>>>>>>> enough knowledge to pick an appropriate slot.  So I'd remove the
>>>>>>>
>>>>>>>        if (loc == RX_CLS_LOC_UNSPEC)
>>>>>>>
>>>>>>> block above, let the driver pick a slot, and then pass the
>>>>>>> selected location
>>>>>>> back for ethtool to report.
>>>>>> But first we have to specify this in the ethtool API.  So please
>>>>>> propose
>>>>>> a patch to ethtool.h.
>>>>> In the past we discussed that being able to specify the first
>>>>> available slot or
>>>>> the last available would be useful, so something like the below?
>>>>>
>>>>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>>>>> index 4194a20..909ef79 100644
>>>>> --- a/include/linux/ethtool.h
>>>>> +++ b/include/linux/ethtool.h
>>>>> @@ -442,7 +442,8 @@ struct ethtool_flow_ext {
>>>>>      *    includes the %FLOW_EXT flag.
>>>>>      * @ring_cookie: RX ring/queue index to deliver to, or
>>>>> %RX_CLS_FLOW_DISC
>>>>>      *    if packets should be discarded
>>>>> - * @location: Index of filter in hardware table
>>>>> + * @location: Index of filter in hardware table, or
>>>>> %RX_CLS_FLOW_FIRST_LOC for
>>>>> + *    first available index, or %RX_CLS_FLOW_LAST_LOC for last
>>>>> available
>>>> [...]
>>>>
>>>> I think that's reasonable.  We should also explicitly state that
>>>> location determines priority, i.e. if a packet matches two filters then
>>>> the one with the lower location wins.
>>>
>>> Easy and true for a TCAM.  For hashing would you use the location to
>>> decide how
>>> to order filters that fall in the same bucket?
>>>
>>
>> The problem is none of this is backwards compatible.  The niu driver has
>> supported the network flow classifier rules since 2.6.30.  Adding this
>> would cause all rule setups for niu to fail because these locations
>> would have to exist outside of the current rule locations.
>
> Looking at niu it already has a problem with its handling of location because
> it does
>
> 	u16 idx;
>
> 	idx = nfc->fs.location;
>
> i.e., it disregards the upper 16 bits of location.  With the current code niu
> would return -EINVAL for the two new constants, which seems correct.

Actually it looks like this is going to trigger multiple bugs. 
Specifically since the upper 16 bits are being discarded it is possible 
to specify a value such as 0x10001 hex and it will misread that as 
0x0001.  This is something that will probably need to be fixed in niu.

>>
>> This is why I was suggesting that the best approach would be to update
>> the kernel to add a separate ioctl for letting the driver setup the
>> location.  We can just attempt to make that call and when we get the
>> EOPNOTSUPP errno we know the device driver doesn't support it and can
>> then let the rule manager take over.
>
> The problem with this is the location is dependent on the type of filter being
> added.  I.e., the ioctl would need to get all the information the existing
> ioctl carries making the new ioctl a small superset of the current one.
> Additionally, if the driver only allocates a location in a separate ioctl how
> does it know that the app is actually going to use it?

It doesn't know that the application is actually going to use it.  What 
should happen is that the location should be verified by the driver when 
it is used in the rule insertion call.  After all it is fully possible 
for the user to specify a location out of range since the insert call 
does no validation in ethtool if the user specified the location.  That 
responsibility now lies with the driver.

Thanks,

Alex


^ permalink raw reply

* RE: bug fix for out-of-tree igb driver
From: Wyborny, Carolyn @ 2011-05-04 18:36 UTC (permalink / raw)
  To: Ben Greear, netdev
In-Reply-To: <4DC1959B.4010609@candelatech.com>



>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>On Behalf Of Ben Greear
>Sent: Wednesday, May 04, 2011 11:06 AM
>To: netdev
>Subject: bug fix for out-of-tree igb driver
>
>Just in case someone is using the out-of-tree igb driver, I
>think I found a bug.
>
>The tx_queue_mapping method should have r_idx >= adapter->num_tx_queues
>as shown below.  The driver had: r_idx > adapter->num_tx_queues
>originally.
>
>Its possible my code is buggy for sending queue_mapping of 1
>when num_tx_queues is 1, but either way, the driver
>shouldn't crash.
>
>Upstream igb driver in mainline kernels does not have this problem.
>
>static inline struct igb_ring *igb_tx_queue_mapping(struct igb_adapter
>*adapter,
>                                                     struct sk_buff
>*skb)
>{
>	unsigned int r_idx = skb->queue_mapping & (IGB_MAX_TX_QUEUES - 1);
>
>	if (r_idx >= adapter->num_tx_queues)
>		r_idx = r_idx % adapter->num_tx_queues;
>
>	return adapter->tx_ring[r_idx];
>}
>
>Thanks,
>Ben
>
>--
>Ben Greear <greearb@candelatech.com>
>Candela Technologies Inc  http://www.candelatech.com
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks Ben,

I'll be fixing that ASAP and updating the driver version there.

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation




^ permalink raw reply

* [ethtool PATCH 0/4] v6 Add support for network flow classifier
From: Alexander Duyck @ 2011-05-04 18:41 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, dm, bhutchings; +Cc: netdev

This series is nearly identical to the last series with only a few changes in
the RX packet classification interface patch.  Specifically I have narrowed
down the rule manager to the point where it is only used to identify a
location for a rule if the rule location is unspecified.  The display
functionality for rules now just pulls the rule list in and goes through the
list from top to bottom displaying the rules instead of initializing the rule
manager and pulling the rules form there.

The main advantage to this approach is that the rule manager could easily be
replaced by a future ioctl call and on the failure of that ioctl call with a
return of EOPNOTSUPP it should be easy to fall back and just use the rule
manager.

Thanks,

Alex

---

Alexander Duyck (3):
      Add support for __be64 and bitops, centralize several needed macros
      Cleanup defines and header includes to address several issues
      ethtool: remove strings based approach for displaying n-tuple

Santwona Behera (1):
      v6 Add RX packet classification interface

 Makefile.am      |    3 
 ethtool-bitops.h |   25 +
 ethtool-util.h   |   47 ++
 ethtool.8.in     |  194 +++++-----
 ethtool.c        |  449 +++++++++++------------
 rxclass.c        | 1073 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1467 insertions(+), 324 deletions(-)
 create mode 100644 ethtool-bitops.h
 create mode 100644 rxclass.c

-- 

^ permalink raw reply

* [ethtool PATCH 2/4] Cleanup defines and header includes to address several issues
From: Alexander Duyck @ 2011-05-04 18:41 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, dm, bhutchings; +Cc: netdev
In-Reply-To: <20110504183752.26394.29231.stgit@gitlad.jf.intel.com>

This change is meant to address several issues.  First it moves the check
for ethtool-config.h into ethtool-util.h the reason for this change is so
that any references to ethtool-util.h outside of ethtool.c will use the
correct defines for the endian types.

In addition I have pulled several headers that will be common to both
ethtool.c and rxclass.c into the ethtool-util.h header file.  I am also
centralizing several macros that will be needed across multiple files when
I implement the network flow classifier rules.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 ethtool-util.h |   17 +++++++++++++++--
 ethtool.c      |   17 +----------------
 2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/ethtool-util.h b/ethtool-util.h
index f053028..6a4f3f4 100644
--- a/ethtool-util.h
+++ b/ethtool-util.h
@@ -3,8 +3,13 @@
 #ifndef ETHTOOL_UTIL_H__
 #define ETHTOOL_UTIL_H__
 
+#ifdef HAVE_CONFIG_H
+#include "ethtool-config.h"
+#endif
 #include <sys/types.h>
 #include <endian.h>
+#include <sys/ioctl.h>
+#include <net/if.h>
 
 /* ethtool.h expects these to be defined by <linux/types.h> */
 #ifndef HAVE_BE_TYPES
@@ -12,14 +17,14 @@ typedef __uint16_t __be16;
 typedef __uint32_t __be32;
 #endif
 
-#include "ethtool-copy.h"
-
 typedef unsigned long long u64;
 typedef __uint32_t u32;
 typedef __uint16_t u16;
 typedef __uint8_t u8;
 typedef __int32_t s32;
 
+#include "ethtool-copy.h"
+
 #if __BYTE_ORDER == __BIG_ENDIAN
 static inline u16 cpu_to_be16(u16 value)
 {
@@ -40,6 +45,14 @@ static inline u32 cpu_to_be32(u32 value)
 }
 #endif
 
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+#ifndef SIOCETHTOOL
+#define SIOCETHTOOL     0x8946
+#endif
+
 /* National Semiconductor DP83815, DP83816 */
 int natsemi_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs);
 int natsemi_dump_eeprom(struct ethtool_drvinfo *info,
diff --git a/ethtool.c b/ethtool.c
index 9ad7000..24d4e4f 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -21,19 +21,12 @@
  *   * show settings for all devices
  */
 
-#ifdef HAVE_CONFIG_H
-#  include "ethtool-config.h"
-#endif
-
-#include <sys/types.h>
+#include "ethtool-util.h"
 #include <string.h>
 #include <stdlib.h>
-#include <sys/types.h>
-#include <sys/ioctl.h>
 #include <sys/stat.h>
 #include <stdio.h>
 #include <errno.h>
-#include <net/if.h>
 #include <sys/utsname.h>
 #include <limits.h>
 #include <ctype.h>
@@ -43,18 +36,10 @@
 #include <arpa/inet.h>
 
 #include <linux/sockios.h>
-#include "ethtool-util.h"
 
-
-#ifndef SIOCETHTOOL
-#define SIOCETHTOOL     0x8946
-#endif
 #ifndef MAX_ADDR_LEN
 #define MAX_ADDR_LEN	32
 #endif
-#ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-#endif
 
 #ifndef HAVE_NETIF_MSG
 enum {


^ permalink raw reply related

* [ethtool PATCH 4/4] v6 Add RX packet classification interface
From: Alexander Duyck @ 2011-05-04 18:41 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, dm, bhutchings; +Cc: netdev
In-Reply-To: <20110504183752.26394.29231.stgit@gitlad.jf.intel.com>

From: Santwona Behera <santwona.behera@sun.com>

This patch was originally introduced as:
  [PATCH 1/3] [ethtool] Add rx pkt classification interface
  Signed-off-by: Santwona Behera <santwona.behera@sun.com>
  http://patchwork.ozlabs.org/patch/23223/

v2:
I have updated it to address a number of issues.  As a result I removed the
local caching of rules due to the fact that there were memory leaks in this
code and the rule manager would consume over 1Mb of space for an 8K table
when all that was needed was 1K in order to store which rules were active
and which were not.

In addition I dropped the use of regions as there were multiple issue found
including the fact that the regions were not properly expanding beyond 2
and the fact that the regions required reading all of the rules in order to
correctly expand beyond 2.  By dropping the regions from the rule manager
it is possible to write a much cleaner interface leaving region management
to be done by either the driver or by external management scripts.

v3:
The latest update to this patch now inverts the masks to match the mask
types used for n-tuple.  As such a network flow classifier is defined using
the exact same syntax as n-tuple, and the tool will correct for the fact
that NFC uses the 1's compliment of the n-tuple mask.

I also updated the ordering of new rules being added.  All new rules will
take the highest numbered open rule when no location is specified.

Since NFC now uses the same syntax as n-tuple I added code such that now
when location is not specified the -U option will first try to add a new
n-tuple rule, and if that fails with a ENOTSUPP it will then try to add the
rule via the NFC interface.

Finally I split out the addition of bitops and the updates to documentation
into separate patches.  This makes the total patch size a bit more
manageable since the addition of NFC and the merging of it with n-tuple were
combined into this patch.

v4:
This change merges the ntuple and network flow classifier rules so that if
we setup a rule and the device has the NTUPLE flag set we will first try to
use set_rx_ntuple.  If that fails with EOPNOTSUPP we then will attempt to
use the network flow classifier rule insertion.  This way we can support
legacy configurations such as niu on kernels prior to 2.6.40 that support
network flow classifier but not ntuple, but for drivers such as ixgbe we
can test for ntuple first, and then network flow classifier.

This patch has also updated the output to make use of the updated network
flow classifier extensions that have been accepted into the kernel.

v5:
This change merged the documentation update into this patch.  In addition the
documentation changes were made such that there is only one listing of the
individual options and they are all listed as optional.

In addition this patch contains several fixes to address things such as the
fact that we were maintaining the table logic even though we only need it for
displaying all of the rules, or when adding a rule with no location specified.
As such all of the logic for deleting or finding rules in the table has been
removed.

v6:
This change further reduced the use of the rule manager so that it is only
initialized and used when a rule is added without a location specified.  For
future expansion it should be possible to add a call to a in kernel ethtool
ioctl that can effectively replace the rule manager entirely.  However this
basic rule manager will be necessary to support legacy kernels.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 Makefile.am    |    3 
 ethtool-util.h |   14 +
 ethtool.8.in   |  194 +++++-----
 ethtool.c      |  400 ++++++++++++---------
 rxclass.c      | 1073 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1418 insertions(+), 266 deletions(-)
 create mode 100644 rxclass.c

diff --git a/Makefile.am b/Makefile.am
index a0d2116..0262c31 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -8,7 +8,8 @@ ethtool_SOURCES = ethtool.c ethtool-copy.h ethtool-util.h	\
 		  amd8111e.c de2104x.c e100.c e1000.c igb.c	\
 		  fec_8xx.c ibm_emac.c ixgb.c ixgbe.c natsemi.c	\
 		  pcnet32.c realtek.c tg3.c marvell.c vioc.c	\
-		  smsc911x.c at76c50x-usb.c sfc.c stmmac.c
+		  smsc911x.c at76c50x-usb.c sfc.c stmmac.c	\
+		  rxclass.c
 
 dist-hook:
 	cp $(top_srcdir)/ethtool.spec $(distdir)
diff --git a/ethtool-util.h b/ethtool-util.h
index d8b621c..79be7f2 100644
--- a/ethtool-util.h
+++ b/ethtool-util.h
@@ -65,6 +65,8 @@ static inline u64 cpu_to_be64(u64 value)
 #define SIOCETHTOOL     0x8946
 #endif
 
+#define	RX_CLS_LOC_UNSPEC	0xffffffffUL
+
 /* National Semiconductor DP83815, DP83816 */
 int natsemi_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs);
 int natsemi_dump_eeprom(struct ethtool_drvinfo *info,
@@ -128,4 +130,14 @@ int sfc_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs);
 int st_mac100_dump_regs(struct ethtool_drvinfo *info,
 			struct ethtool_regs *regs);
 int st_gmac_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs);
-#endif
+
+/* Rx flow classification */
+int rxclass_parse_ruleopts(char **optstr, int opt_cnt,
+			   struct ethtool_rx_flow_spec *fsp);
+int rxclass_rule_getall(int fd, struct ifreq *ifr);
+int rxclass_rule_get(int fd, struct ifreq *ifr, __u32 loc);
+int rxclass_rule_ins(int fd, struct ifreq *ifr,
+		     struct ethtool_rx_flow_spec *fsp);
+int rxclass_rule_del(int fd, struct ifreq *ifr, __u32 loc);
+
+#endif /* ETHTOOL_UTIL_H__ */
diff --git a/ethtool.8.in b/ethtool.8.in
index 9f484fb..c923319 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -42,10 +42,20 @@
 [\\fB\\$1\\fP\ \\fIN\\fP]
 ..
 .\"
+.\"	.BM - same as above but has a mask field for format "[value N [m N]]"
+.\"
+.de BM
+[\\fB\\$1\\fP\ \\fIN\\fP\ [\\fBm\\fP\ \\fIN\\fP]]
+..
+.\"
 .\"	\(*MA - mac address
 .\"
 .ds MA \fIxx\fP\fB:\fP\fIyy\fP\fB:\fP\fIzz\fP\fB:\fP\fIaa\fP\fB:\fP\fIbb\fP\fB:\fP\fIcc\fP
 .\"
+.\"	\(*PA - IP address
+.\"
+.ds PA \fIx\fP\fB.\fP\fIx\fP\fB.\fP\fIx\fP\fB.\fP\fIx\fP
+.\"
 .\"	\(*WO - wol flags
 .\"
 .ds WO \fBp\fP|\fBu\fP|\fBm\fP|\fBb\fP|\fBa\fP|\fBg\fP|\fBs\fP|\fBd\fP...
@@ -57,6 +67,12 @@
 .\"	\(*HO - hash options
 .\"
 .ds HO \fBm\fP|\fBv\fP|\fBt\fP|\fBs\fP|\fBd\fP|\fBf\fP|\fBn\fP|\fBr\fP...
+.\"
+.\"	\(*NC - Network Classifier type values
+.\"
+.ds NC \fBether\fP|\fBip4\fP|\fBtcp4\fP|\fBudp4\fP|\fBsctp4\fP|\fBah4\fP|\fBesp4\fP
+
+.\"
 .\" Start URL.
 .de UR
 .  ds m1 \\$1\"
@@ -236,8 +252,7 @@ ethtool \- query or control network driver and hardware settings
 .HP
 .B ethtool \-N
 .I ethX
-.RB [ rx\-flow\-hash \ \*(FL
-.RB \ \*(HO]
+.RB [ rx\-flow\-hash \ \*(FL \  \*(HO]
 .HP
 .B ethtool \-x|\-\-show\-rxfh\-indir
 .I ethX
@@ -257,50 +272,28 @@ ethtool \- query or control network driver and hardware settings
 .HP
 .B ethtool \-u|\-\-show\-ntuple
 .I ethX
-.TP
+.BN rule
+.HP
+
 .BI ethtool\ \-U|\-\-config\-ntuple \ ethX
-.RB {
-.A3 flow\-type tcp4 udp4 sctp4
-.RB [ src\-ip
-.IR addr
-.RB [ src\-ip\-mask
-.IR mask ]]
-.RB [ dst\-ip
-.IR addr
-.RB [ dst\-ip\-mask
-.IR mask ]]
-.RB [ src\-port
-.IR port
-.RB [ src\-port\-mask
-.IR mask ]]
-.RB [ dst\-port
-.IR port
-.RB [ dst\-port\-mask
-.IR mask ]]
-.br
-.RB | \ flow\-type\ ether
-.RB [ src
-.IR mac\-addr
-.RB [ src\-mask
-.IR mask ]]
-.RB [ dst
-.IR mac\-addr
-.RB [ dst\-mask
-.IR mask ]]
-.RB [ proto
-.IR N
-.RB [ proto\-mask
-.IR mask ]]\ }
-.br
-.RB [ vlan
-.IR VLAN\-tag
-.RB [ vlan\-mask
-.IR mask ]]
-.RB [ user\-def
-.IR data
-.RB [ user\-def\-mask
-.IR mask ]]
-.RI action \ N
+.BN delete
+.RB [\  flow\-type \ \*(NC
+.RB [ src \ \*(MA\ [ m \ \*(MA]]
+.RB [ dst \ \*(MA\ [ m \ \*(MA]]
+.BM proto
+.RB [ src\-ip \ \*(PA\ [ m \ \*(PA]]
+.RB [ dst\-ip \ \*(PA\ [ m \ \*(PA]]
+.BM tos
+.BM l4proto
+.BM src\-port
+.BM dst\-port
+.BM spi
+.BM vlan\-etype
+.BM vlan
+.BM user\-def
+.BN action
+.BN loc
+.RB ]
 .
 .\" Adjust lines (i.e. full justification) and hyphenate.
 .ad
@@ -630,77 +623,90 @@ Default region is 0 which denotes all regions in the flash.
 .TP
 .B \-u \-\-show\-ntuple
 Get Rx ntuple filters and actions, then display them to the user.
+.TP
+.BI rule \ N
+Retrieves the RX classification rule with the given ID.
 .PD
 .RE
 .TP
 .B \-U \-\-config\-ntuple
 Configure Rx ntuple filters and actions
 .TP
-.B flow\-type tcp4|udp4|sctp4|ether
+.BI delete \ N
+Deletes the RX classification rule with the given ID.
+.TP
+.B flow\-type \*(NC
 .TS
 nokeep;
 lB	l.
+ether	Ethernet
+ip4	Raw IPv4
 tcp4	TCP over IPv4
 udp4	UDP over IPv4
 sctp4	SCTP over IPv4
-ether	Ethernet
+ah4	IPSEC AH over IPv4
+esp4	IPSEC ESP over IPv4
 .TE
+.PP
+All fields below that include a mask option may either use "m" to
+indicate a mask, or may use the full name of the field with a "-mask"
+appended to indicate that this is the mask for a given field.
+.PD
+.RE
 .TP
-.BI src\-ip \ addr
-Includes the source IP address, specified using dotted-quad notation
-or as a single 32-bit number.
-.TP
-.BI src\-ip\-mask \ mask
-Specify a mask for the source IP address.
-.TP
-.BI dst\-ip \ addr
-Includes the destination IP address.
-.TP
-.BI dst\-ip\-mask \ mask
-Specify a mask for the destination IP address.
-.TP
-.BI src\-port \ port
-Includes the source port.
-.TP
-.BI src\-port\-mask \ mask
-Specify a mask for the source port.
+.BR src \ \*(MA\ [ m \ \*(MA]
+Includes the source MAC address, specified as 6 bytes in hexadecimal
+separated by colons, along with an optional mask.  Valid only for
+flow-type ether.
 .TP
-.BI dst\-port \ port
-Includes the destination port.
+.BR dst \ \*(MA\ [ m \ \*(MA]
+Includes the destination MAC address, specified as 6 bytes in hexadecimal
+separated by colons, along with an optional mask.  Valid only for
+flow-type ether.
 .TP
-.BI dst\-port\-mask \ mask
-Specify a mask for the destination port.
+.BI proto \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Includes the Ethernet protocol number (ethertype) and an optional mask.
+Valid only for flow-type ether.
 .TP
-.BI src \ mac\-addr
-Includes the source MAC address, specified as 6 bytes in hexadecimal
-separated by colons.
+.BR src\-ip \ \*(PA\ [ m \ \*(PA]
+Specify the source IP address of the incoming packet to match along with
+an optional mask.  Valid for all IPv4 based flow-types.
 .TP
-.BI src\-mask \ mask
-Specify a mask for the source MAC address.
+.BR dst\-ip \ \*(PA\ [ m \ \*(PA]
+Specify the destination IP address of the incoming packet to match along
+with an optional mask.  Valid for all IPv4 based flow-types.
 .TP
-.BI dst \ mac\-addr
-Includes the destination MAC address.
+.BI tos \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Specify the value of the Type of Service field in the incoming packet to
+match along with an optional mask.  Applies to all IPv4 based flow-types.
 .TP
-.BI dst\-mask \ mask
-Specify a mask for the destination MAC address.
+.BI l4proto \ N \\fR\ [\\fPl4m \ N \\fR]\\fP
+Includes the layer 4 protocol number and optional mask.  Valid only for
+flow-type ip4.
 .TP
-.BI proto \ N
-Includes the Ethernet protocol number (ethertype).
+.BI src\-port \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Specify the value of the source port field (applicable to TCP/UDP packets)
+in the incoming packet to match along with an optional mask.  Valid for
+flow-types ip4, tcp4, udp4, and sctp4.
 .TP
-.BI proto\-mask \ mask
-Specify a mask for the Ethernet protocol number.
+.BI dst\-port \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Specify the value of the destination port field (applicable to TCP/UDP
+packets)in the incoming packet to match along with an optional mask.
+Valid for flow-types ip4, tcp4, udp4, and sctp4.
 .TP
-.BI vlan \ VLAN\-tag
-Includes the VLAN tag.
+.BI spi \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Specify the value of the security parameter index field (applicable to
+AH/ESP packets)in the incoming packet to match along with an optional
+mask.  Valid for flow-types ip4, ah4, and esp4.
 .TP
-.BI vlan\-mask \ mask
-Specify a mask for the VLAN tag.
+.BI vlan\-etype \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Includes the VLAN tag Ethertype and an optional mask.
 .TP
-.BI user\-def \ data
-Includes 64-bits of user-specific data.
+.BI vlan \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Includes the VLAN tag and an optional mask.
 .TP
-.BI user\-def\-mask \ mask
-Specify a mask for the user-specific data.
+.BI user\-def \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Includes 64-bits of user-specific data and an optional mask.
 .TP
 .BI action \ N
 Specifies the Rx queue to send packets to, or some other action.
@@ -711,6 +717,11 @@ lB	l.
 -1	Drop the matched flow
 0 or higher	Rx queue to route the flow
 .TE
+.TP
+.BI loc \ N
+Specify the location/ID to insert the rule. This will overwrite
+any rule present in that location and will not go through any
+of the rule ordering process.
 .SH BUGS
 Not supported (in part or whole) on all network drivers.
 .SH AUTHOR
@@ -724,7 +735,8 @@ Jakub Jelinek,
 Andre Majorel,
 Eli Kupermann,
 Scott Feldman,
-Andi Kleen.
+Andi Kleen,
+Alexander Duyck.
 .SH AVAILABILITY
 .B ethtool
 is available from
diff --git a/ethtool.c b/ethtool.c
index 24d4e4f..2e04d87 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -6,6 +6,7 @@
  * Kernel 2.4 update Copyright 2001 Jeff Garzik <jgarzik@mandrakesoft.com>
  * Wake-on-LAN,natsemi,misc support by Tim Hockin <thockin@sun.com>
  * Portions Copyright 2002 Intel
+ * Portions Copyright (C) Sun Microsystems 2008
  * do_test support by Eli Kupermann <eli.kupermann@intel.com>
  * ETHTOOL_PHYS_ID support by Chris Leech <christopher.leech@intel.com>
  * e1000 support by Scott Feldman <scott.feldman@intel.com>
@@ -14,6 +15,7 @@
  * amd8111e support by Reeja John <reeja.john@amd.com>
  * long arguments by Andi Kleen.
  * SMSC LAN911x support by Steve Glendinning <steve.glendinning@smsc.com>
+ * Rx Network Flow Control configuration support <santwona.behera@sun.com>
  * Various features by Ben Hutchings <bhutchings@solarflare.com>;
  *	Copyright 2009, 2010 Solarflare Communications
  *
@@ -85,14 +87,13 @@ static int do_gstats(int fd, struct ifreq *ifr);
 static int rxflow_str_to_type(const char *str);
 static int parse_rxfhashopts(char *optstr, u32 *data);
 static char *unparse_rxfhashopts(u64 opts);
-static void parse_rxntupleopts(int argc, char **argp, int first_arg);
 static int dump_rxfhash(int fhash, u64 val);
 static int do_srxclass(int fd, struct ifreq *ifr);
 static int do_grxclass(int fd, struct ifreq *ifr);
 static int do_grxfhindir(int fd, struct ifreq *ifr);
 static int do_srxfhindir(int fd, struct ifreq *ifr);
-static int do_srxntuple(int fd, struct ifreq *ifr);
-static int do_grxntuple(int fd, struct ifreq *ifr);
+static int do_srxclsrule(int fd, struct ifreq *ifr);
+static int do_grxclsrule(int fd, struct ifreq *ifr);
 static int do_flash(int fd, struct ifreq *ifr);
 static int do_permaddr(int fd, struct ifreq *ifr);
 
@@ -123,8 +124,8 @@ static enum {
 	MODE_SNFC,
 	MODE_GRXFHINDIR,
 	MODE_SRXFHINDIR,
-	MODE_SNTUPLE,
-	MODE_GNTUPLE,
+	MODE_SCLSRULE,
+	MODE_GCLSRULE,
 	MODE_FLASHDEV,
 	MODE_PERMADDR,
 } mode = MODE_GSET;
@@ -230,22 +231,28 @@ static struct option {
 		"indirection" },
     { "-X", "--set-rxfh-indir", MODE_SRXFHINDIR, "Set Rx flow hash indirection",
 		"		equal N | weight W0 W1 ...\n" },
-    { "-U", "--config-ntuple", MODE_SNTUPLE, "Configure Rx ntuple filters "
+    { "-U", "--config-ntuple", MODE_SCLSRULE, "Configure Rx ntuple filters "
 		"and actions",
-		"		{ flow-type tcp4|udp4|sctp4\n"
-		"		  [ src-ip ADDR [src-ip-mask MASK] ]\n"
-		"		  [ dst-ip ADDR [dst-ip-mask MASK] ]\n"
-		"		  [ src-port PORT [src-port-mask MASK] ]\n"
-		"		  [ dst-port PORT [dst-port-mask MASK] ]\n"
-		"		| flow-type ether\n"
-		"		  [ src MAC-ADDR [src-mask MASK] ]\n"
-		"		  [ dst MAC-ADDR [dst-mask MASK] ]\n"
-		"		  [ proto N [proto-mask MASK] ] }\n"
-		"		[ vlan VLAN-TAG [vlan-mask MASK] ]\n"
-		"		[ user-def DATA [user-def-mask MASK] ]\n"
-		"		action N\n" },
-    { "-u", "--show-ntuple", MODE_GNTUPLE,
-		"Get Rx ntuple filters and actions\n" },
+		"		[ delete %d ] |\n"
+		"		[ flow-type ether|ip4|tcp4|udp4|sctp4|ah4|esp4\n"
+		"			[ src %x:%x:%x:%x:%x:%x [m %x:%x:%x:%x:%x:%x] ]\n"
+		"			[ dst %x:%x:%x:%x:%x:%x [m %x:%x:%x:%x:%x:%x] ]\n"
+		"			[ proto %d [m %x] ]\n"
+		"			[ src-ip %d.%d.%d.%d [m %d.%d.%d.%d] ]\n"
+		"			[ dst-ip %d.%d.%d.%d [m %d.%d.%d.%d] ]\n"
+		"			[ tos %d [m %x] ]\n"
+		"			[ l4proto %d [m %x] ]\n"
+		"			[ src-port %d [m %x] ]\n"
+		"			[ dst-port %d [m %x] ]\n"
+		"			[ spi %d [m %x] ]\n"
+		"			[ vlan-etype %x [m %x] ]\n"
+		"			[ vlan %x [m %x] ]\n"
+		"			[ user-def %x [m %x] ]\n"
+		"			[ action %d ]\n"
+		"			[ loc %d]]\n" },
+    { "-u", "--show-ntuple", MODE_GCLSRULE,
+		"Get Rx ntuple filters and actions",
+		"		[ rule %d ]\n"},
     { "-P", "--show-permaddr", MODE_PERMADDR,
 		"Show permanent hardware address" },
     { "-h", "--help", MODE_HELP, "Show this help" },
@@ -371,26 +378,6 @@ static u32 rx_fhash_val = 0;
 static int rx_fhash_changed = 0;
 static int rxfhindir_equal = 0;
 static char **rxfhindir_weight = NULL;
-static int sntuple_changed = 0;
-static struct ethtool_rx_ntuple_flow_spec ntuple_fs;
-static int ntuple_ip4src_seen = 0;
-static int ntuple_ip4src_mask_seen = 0;
-static int ntuple_ip4dst_seen = 0;
-static int ntuple_ip4dst_mask_seen = 0;
-static int ntuple_psrc_seen = 0;
-static int ntuple_psrc_mask_seen = 0;
-static int ntuple_pdst_seen = 0;
-static int ntuple_pdst_mask_seen = 0;
-static int ntuple_ether_dst_seen = 0;
-static int ntuple_ether_dst_mask_seen = 0;
-static int ntuple_ether_src_seen = 0;
-static int ntuple_ether_src_mask_seen = 0;
-static int ntuple_ether_proto_seen = 0;
-static int ntuple_ether_proto_mask_seen = 0;
-static int ntuple_vlan_tag_seen = 0;
-static int ntuple_vlan_tag_mask_seen = 0;
-static int ntuple_user_def_seen = 0;
-static int ntuple_user_def_mask_seen = 0;
 static char *flash_file = NULL;
 static int flash = -1;
 static int flash_region = -1;
@@ -399,6 +386,11 @@ static int msglvl_changed;
 static u32 msglvl_wanted = 0;
 static u32 msglvl_mask = 0;
 
+static int rx_class_rule_get = -1;
+static int rx_class_rule_del = -1;
+static int rx_class_rule_added = 0;
+static struct ethtool_rx_flow_spec rx_rule_fs;
+
 static enum {
 	ONLINE=0,
 	OFFLINE,
@@ -511,58 +503,6 @@ static struct cmdline_info cmdline_coalesce[] = {
 	{ "tx-frames-high", CMDL_S32, &coal_tx_frames_high_wanted, &ecoal.tx_max_coalesced_frames_high },
 };
 
-static struct cmdline_info cmdline_ntuple_tcp_ip4[] = {
-	{ "src-ip", CMDL_IP4, &ntuple_fs.h_u.tcp_ip4_spec.ip4src, NULL,
-	  0, &ntuple_ip4src_seen },
-	{ "src-ip-mask", CMDL_IP4, &ntuple_fs.m_u.tcp_ip4_spec.ip4src, NULL,
-	  0, &ntuple_ip4src_mask_seen },
-	{ "dst-ip", CMDL_IP4, &ntuple_fs.h_u.tcp_ip4_spec.ip4dst, NULL,
-	  0, &ntuple_ip4dst_seen },
-	{ "dst-ip-mask", CMDL_IP4, &ntuple_fs.m_u.tcp_ip4_spec.ip4dst, NULL,
-	  0, &ntuple_ip4dst_mask_seen },
-	{ "src-port", CMDL_BE16, &ntuple_fs.h_u.tcp_ip4_spec.psrc, NULL,
-	  0, &ntuple_psrc_seen },
-	{ "src-port-mask", CMDL_BE16, &ntuple_fs.m_u.tcp_ip4_spec.psrc, NULL,
-	  0, &ntuple_psrc_mask_seen },
-	{ "dst-port", CMDL_BE16, &ntuple_fs.h_u.tcp_ip4_spec.pdst, NULL,
-	  0, &ntuple_pdst_seen },
-	{ "dst-port-mask", CMDL_BE16, &ntuple_fs.m_u.tcp_ip4_spec.pdst, NULL,
-	  0, &ntuple_pdst_mask_seen },
-	{ "vlan", CMDL_U16, &ntuple_fs.vlan_tag, NULL,
-	  0, &ntuple_vlan_tag_seen },
-	{ "vlan-mask", CMDL_U16, &ntuple_fs.vlan_tag_mask, NULL,
-	  0, &ntuple_vlan_tag_mask_seen },
-	{ "user-def", CMDL_U64, &ntuple_fs.data, NULL,
-	  0, &ntuple_user_def_seen },
-	{ "user-def-mask", CMDL_U64, &ntuple_fs.data_mask, NULL,
-	  0, &ntuple_user_def_mask_seen },
-	{ "action", CMDL_S32, &ntuple_fs.action, NULL },
-};
-
-static struct cmdline_info cmdline_ntuple_ether[] = {
-	{ "dst", CMDL_MAC, ntuple_fs.h_u.ether_spec.h_dest, NULL,
-	  0, &ntuple_ether_dst_seen },
-	{ "dst-mask", CMDL_MAC, ntuple_fs.m_u.ether_spec.h_dest, NULL,
-	  0, &ntuple_ether_dst_mask_seen },
-	{ "src", CMDL_MAC, ntuple_fs.h_u.ether_spec.h_source, NULL,
-	  0, &ntuple_ether_src_seen },
-	{ "src-mask", CMDL_MAC, ntuple_fs.m_u.ether_spec.h_source, NULL,
-	  0, &ntuple_ether_src_mask_seen },
-	{ "proto", CMDL_BE16, &ntuple_fs.h_u.ether_spec.h_proto, NULL,
-	  0, &ntuple_ether_proto_seen },
-	{ "proto-mask", CMDL_BE16, &ntuple_fs.m_u.ether_spec.h_proto, NULL,
-	  0, &ntuple_ether_proto_mask_seen },
-	{ "vlan", CMDL_U16, &ntuple_fs.vlan_tag, NULL,
-	  0, &ntuple_vlan_tag_seen },
-	{ "vlan-mask", CMDL_U16, &ntuple_fs.vlan_tag_mask, NULL,
-	  0, &ntuple_vlan_tag_mask_seen },
-	{ "user-def", CMDL_U64, &ntuple_fs.data, NULL,
-	  0, &ntuple_user_def_seen },
-	{ "user-def-mask", CMDL_U64, &ntuple_fs.data_mask, NULL,
-	  0, &ntuple_user_def_mask_seen },
-	{ "action", CMDL_S32, &ntuple_fs.action, NULL },
-};
-
 static struct cmdline_info cmdline_msglvl[] = {
 	{ "drv", CMDL_FLAG, &msglvl_wanted, NULL,
 	  NETIF_MSG_DRV, &msglvl_mask },
@@ -833,8 +773,8 @@ static void parse_cmdline(int argc, char **argp)
 			    (mode == MODE_SNFC) ||
 			    (mode == MODE_GRXFHINDIR) ||
 			    (mode == MODE_SRXFHINDIR) ||
-			    (mode == MODE_SNTUPLE) ||
-			    (mode == MODE_GNTUPLE) ||
+			    (mode == MODE_SCLSRULE) ||
+			    (mode == MODE_GCLSRULE) ||
 			    (mode == MODE_PHYS_ID) ||
 			    (mode == MODE_FLASHDEV) ||
 			    (mode == MODE_PERMADDR)) {
@@ -918,16 +858,45 @@ static void parse_cmdline(int argc, char **argp)
 				i = argc;
 				break;
 			}
-			if (mode == MODE_SNTUPLE) {
+			if (mode == MODE_SCLSRULE) {
 				if (!strcmp(argp[i], "flow-type")) {
 					i += 1;
 					if (i >= argc) {
 						exit_bad_args();
 						break;
 					}
-					parse_rxntupleopts(argc, argp, i);
-					i = argc;
-					break;
+					if (rxclass_parse_ruleopts(&argp[i],
+							argc - i,
+							&rx_rule_fs) < 0) {
+						exit_bad_args();
+					} else {
+						i = argc;
+						rx_class_rule_added = 1;
+					}
+				} else if (!strcmp(argp[i], "delete")) {
+					i += 1;
+					if (i >= argc) {
+						exit_bad_args();
+						break;
+					}
+					rx_class_rule_del =
+						get_uint_range(argp[i], 0,
+							       INT_MAX);
+				} else {
+					exit_bad_args();
+				}
+				break;
+			}
+			if (mode == MODE_GCLSRULE) {
+				if (!strcmp(argp[i], "rule")) {
+					i += 1;
+					if (i >= argc) {
+						exit_bad_args();
+						break;
+					}
+					rx_class_rule_get =
+						get_uint_range(argp[i], 0,
+							       INT_MAX);
 				} else {
 					exit_bad_args();
 				}
@@ -1598,66 +1567,6 @@ static char *unparse_rxfhashopts(u64 opts)
 	return buf;
 }
 
-static void parse_rxntupleopts(int argc, char **argp, int i)
-{
-	ntuple_fs.flow_type = rxflow_str_to_type(argp[i]);
-
-	switch (ntuple_fs.flow_type) {
-	case TCP_V4_FLOW:
-	case UDP_V4_FLOW:
-	case SCTP_V4_FLOW:
-		parse_generic_cmdline(argc, argp, i + 1,
-				      &sntuple_changed,
-				      cmdline_ntuple_tcp_ip4,
-				      ARRAY_SIZE(cmdline_ntuple_tcp_ip4));
-		if (!ntuple_ip4src_seen)
-			ntuple_fs.m_u.tcp_ip4_spec.ip4src = 0xffffffff;
-		if (!ntuple_ip4dst_seen)
-			ntuple_fs.m_u.tcp_ip4_spec.ip4dst = 0xffffffff;
-		if (!ntuple_psrc_seen)
-			ntuple_fs.m_u.tcp_ip4_spec.psrc = 0xffff;
-		if (!ntuple_pdst_seen)
-			ntuple_fs.m_u.tcp_ip4_spec.pdst = 0xffff;
-		ntuple_fs.m_u.tcp_ip4_spec.tos = 0xff;
-		break;
-	case ETHER_FLOW:
-		parse_generic_cmdline(argc, argp, i + 1,
-				      &sntuple_changed,
-				      cmdline_ntuple_ether,
-				      ARRAY_SIZE(cmdline_ntuple_ether));
-		if (!ntuple_ether_dst_seen)
-			memset(ntuple_fs.m_u.ether_spec.h_dest, 0xff, ETH_ALEN);
-		if (!ntuple_ether_src_seen)
-			memset(ntuple_fs.m_u.ether_spec.h_source, 0xff,
-			       ETH_ALEN);
-		if (!ntuple_ether_proto_seen)
-			ntuple_fs.m_u.ether_spec.h_proto = 0xffff;
-		break;
-	default:
-		fprintf(stderr, "Unsupported flow type \"%s\"\n", argp[i]);
-		exit(106);
-		break;
-	}
-
-	if (!ntuple_vlan_tag_seen)
-		ntuple_fs.vlan_tag_mask = 0xffff;
-	if (!ntuple_user_def_seen)
-		ntuple_fs.data_mask = 0xffffffffffffffffULL;
-
-	if ((ntuple_ip4src_mask_seen && !ntuple_ip4src_seen) ||
-	    (ntuple_ip4dst_mask_seen && !ntuple_ip4dst_seen) ||
-	    (ntuple_psrc_mask_seen && !ntuple_psrc_seen) ||
-	    (ntuple_pdst_mask_seen && !ntuple_pdst_seen) ||
-	    (ntuple_ether_dst_mask_seen && !ntuple_ether_dst_seen) ||
-	    (ntuple_ether_src_mask_seen && !ntuple_ether_src_seen) ||
-	    (ntuple_ether_proto_mask_seen && !ntuple_ether_proto_seen) ||
-	    (ntuple_vlan_tag_mask_seen && !ntuple_vlan_tag_seen) ||
-	    (ntuple_user_def_mask_seen && !ntuple_user_def_seen)) {
-		fprintf(stderr, "Cannot specify mask without value\n");
-		exit(107);
-	}
-}
-
 static struct {
 	const char *name;
 	int (*func)(struct ethtool_drvinfo *info, struct ethtool_regs *regs);
@@ -2019,10 +1928,10 @@ static int doit(void)
 		return do_grxfhindir(fd, &ifr);
 	} else if (mode == MODE_SRXFHINDIR) {
 		return do_srxfhindir(fd, &ifr);
-	} else if (mode == MODE_SNTUPLE) {
-		return do_srxntuple(fd, &ifr);
-	} else if (mode == MODE_GNTUPLE) {
-		return do_grxntuple(fd, &ifr);
+	} else if (mode == MODE_SCLSRULE) {
+		return do_srxclsrule(fd, &ifr);
+	} else if (mode == MODE_GCLSRULE) {
+		return do_grxclsrule(fd, &ifr);
 	} else if (mode == MODE_FLASHDEV) {
 		return do_flash(fd, &ifr);
 	} else if (mode == MODE_PERMADDR) {
@@ -3155,21 +3064,143 @@ static int do_permaddr(int fd, struct ifreq *ifr)
 	return err;
 }
 
+static int flow_spec_to_ntuple(struct ethtool_rx_flow_spec *fsp,
+			       struct ethtool_rx_ntuple_flow_spec *ntuple)
+{
+	size_t i;
+
+	/* verify location is not specified */
+	if (fsp->location != RX_CLS_LOC_UNSPEC)
+		return -1;
+
+	/* verify ring cookie can transfer to action */
+	if (fsp->ring_cookie > INT_MAX && fsp->ring_cookie < (u64)(-2))
+		return -1;
+
+	/* verify only one field is setting data field */
+	if ((fsp->flow_type & FLOW_EXT) &&
+	    (fsp->m_ext.data[0] || fsp->m_ext.data[1]) &&
+	    fsp->m_ext.vlan_etype)
+		return -1;
+
+	/* Set entire ntuple to ~0 to guarantee all masks are set */
+	memset(ntuple, ~0, sizeof(*ntuple));
+
+	/* set non-filter values */
+	ntuple->flow_type = fsp->flow_type;
+	ntuple->action = fsp->ring_cookie;
+
+	/*
+	 * Copy over header union, they are identical in layout however
+	 * the ntuple union contains additional padding on the end
+	 */
+	memcpy(&ntuple->h_u, &fsp->h_u, sizeof(fsp->h_u));
+
+	/*
+	 * The same rule mentioned above applies to the mask union.  However,
+	 * in addition we need to invert the mask bits to match the ntuple
+	 * mask which is 1 for masked, versus 0 for masked as seen in nfc.
+	 */
+	memcpy(&ntuple->m_u, &fsp->m_u, sizeof(fsp->m_u));
+	for (i = 0; i < sizeof(fsp->m_u); i++)
+		ntuple->m_u.hdata[i] ^= 0xFF;
+
+	/* copy extended fields */
+	if (fsp->flow_type & FLOW_EXT) {
+		ntuple->vlan_tag =
+			ntohs(fsp->h_ext.vlan_tci);
+		ntuple->vlan_tag_mask =
+			~ntohs(fsp->m_ext.vlan_tci);
+		if (fsp->m_ext.vlan_etype) {
+			/*
+			 * vlan_etype and user data are mutually exclusive
+			 * in ntuple configuration as they occupy the same
+			 * space.
+			 */
+			if (fsp->m_ext.data[0] || fsp->m_ext.data[1])
+				return -1;
+			ntuple->data =
+				ntohl(fsp->h_ext.vlan_etype);
+			ntuple->data_mask =
+				~(u64)ntohl(fsp->m_ext.vlan_etype);
+		} else {
+			ntuple->data =
+				(u64)ntohl(fsp->h_ext.data[0]) << 32;
+			ntuple->data |=
+				(u64)ntohl(fsp->h_ext.data[1]);
+			ntuple->data_mask =
+				(u64)ntohl(~fsp->m_ext.data[0]) << 32;
+			ntuple->data_mask |=
+				(u64)ntohl(~fsp->m_ext.data[1]);
+		}
+	}
+
+	return 0;
+}
+
 static int do_srxntuple(int fd, struct ifreq *ifr)
 {
+	struct ethtool_rx_ntuple ntuplecmd;
+	struct ethtool_value eval;
 	int err;
 
-	if (sntuple_changed) {
-		struct ethtool_rx_ntuple ntuplecmd;
+	/* attempt to convert the flow classifier to an ntuple classifier */
+	err = flow_spec_to_ntuple(&rx_rule_fs, &ntuplecmd.fs);
+	if (err)
+		return -1;
 
-		ntuplecmd.cmd = ETHTOOL_SRXNTUPLE;
-		memcpy(&ntuplecmd.fs, &ntuple_fs,
-		       sizeof(struct ethtool_rx_ntuple_flow_spec));
+	/*
+	 * Check to see if the flag is set for N-tuple, this allows
+	 * us to avoid the possible EINVAL response for the N-tuple
+	 * flag not being set on the device
+	 */
+	eval.cmd = ETHTOOL_GFLAGS;
+	ifr->ifr_data = (caddr_t)&eval;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err || !(eval.data & ETH_FLAG_NTUPLE))
+		return -1;
 
-		ifr->ifr_data = (caddr_t)&ntuplecmd;
-		err = ioctl(fd, SIOCETHTOOL, ifr);
-		if (err < 0)
-			perror("Cannot add new RX n-tuple filter");
+	/* send rule via N-tuple */
+	ntuplecmd.cmd = ETHTOOL_SRXNTUPLE;
+	ifr->ifr_data = (caddr_t)&ntuplecmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+
+	/*
+	 * Display error only if reponse is something other than op not
+	 * supported.  It is possible that the interface uses the network
+	 * flow classifier interface instead of N-tuple. 
+	 */ 
+	if (err && errno != EOPNOTSUPP)
+		perror("Cannot add new rule via N-tuple");
+
+	return err;
+}
+
+static int do_srxclsrule(int fd, struct ifreq *ifr)
+{
+	int err;
+
+	if (rx_class_rule_added) {
+		/* attempt to add rule via N-tuple specifier */
+		err = do_srxntuple(fd, ifr);
+		if (!err)
+			return 0;
+
+		/* attempt to add rule via network flow classifier */
+		err = rxclass_rule_ins(fd, ifr, &rx_rule_fs);
+		if (err < 0) {
+			fprintf(stderr, "Cannot insert"
+				" classification rule\n");
+			return 1;
+		}
+	} else if (rx_class_rule_del >= 0) {
+		err = rxclass_rule_del(fd, ifr, rx_class_rule_del);
+
+		if (err < 0) {
+			fprintf(stderr, "Cannot delete"
+				" classification rule\n");
+			return 1;
+		}
 	} else {
 		exit_bad_args();
 	}
@@ -3177,9 +3208,32 @@ static int do_srxntuple(int fd, struct ifreq *ifr)
 	return 0;
 }
 
-static int do_grxntuple(int fd, struct ifreq *ifr)
+static int do_grxclsrule(int fd, struct ifreq *ifr)
 {
-	return 0;
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	if (rx_class_rule_get >= 0) {
+		err = rxclass_rule_get(fd, ifr, rx_class_rule_get);
+		if (err < 0)
+			fprintf(stderr, "Cannot get RX classification rule\n");
+		return err ? 1 : 0;
+	}
+
+	nfccmd.cmd = ETHTOOL_GRXRINGS;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0)
+		perror("Cannot get RX rings");
+	else
+		fprintf(stdout, "%d RX rings available\n",
+			(int)nfccmd.data);
+
+	err = rxclass_rule_getall(fd, ifr);
+	if (err < 0)
+		fprintf(stderr, "RX classification rule retrieval failed\n");
+
+	return err ? 1 : 0;
 }
 
 static int send_ioctl(int fd, struct ifreq *ifr)
diff --git a/rxclass.c b/rxclass.c
new file mode 100644
index 0000000..ee486f7
--- /dev/null
+++ b/rxclass.c
@@ -0,0 +1,1073 @@
+/*
+ * Copyright (C) 2008 Sun Microsystems, Inc. All rights reserved.
+ */
+#include <stdio.h>
+#include <stdint.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <linux/sockios.h>
+#include <arpa/inet.h>
+#include "ethtool-util.h"
+#include "ethtool-bitops.h"
+
+static void invert_flow_mask(struct ethtool_rx_flow_spec *fsp)
+{
+	size_t i;
+
+	for (i = 0; i < sizeof(fsp->m_u); i++)
+		fsp->m_u.hdata[i] ^= 0xFF;
+}
+
+static void rxclass_print_ipv4_rule(__be32 sip, __be32 sipm, __be32 dip,
+				    __be32 dipm, u8 tos, u8 tosm)
+{
+	char sip_str[INET_ADDRSTRLEN];
+	char sipm_str[INET_ADDRSTRLEN];
+	char dip_str[INET_ADDRSTRLEN];
+	char dipm_str[INET_ADDRSTRLEN];
+
+	fprintf(stdout,
+		"\tSrc IP addr: %s mask: %s\n"
+		"\tDest IP addr: %s mask: %s\n"
+		"\tTOS: 0x%x mask: 0x%x\n",
+		inet_ntop(AF_INET, &sip, sip_str, INET_ADDRSTRLEN),
+		inet_ntop(AF_INET, &sipm, sipm_str, INET_ADDRSTRLEN),
+		inet_ntop(AF_INET, &dip, dip_str, INET_ADDRSTRLEN),
+		inet_ntop(AF_INET, &dipm, dipm_str, INET_ADDRSTRLEN),
+		tos, tosm);
+}
+
+static void rxclass_print_nfc_spec_ext(struct ethtool_rx_flow_spec *fsp)
+{
+	u64 data, datam;
+	__u16 etype, etypem, tci, tcim;
+
+	if (!(fsp->flow_type & FLOW_EXT))
+		return;
+
+	etype = ntohs(fsp->h_ext.vlan_etype);
+	etypem = ntohs(~fsp->m_ext.vlan_etype);
+	tci = ntohs(fsp->h_ext.vlan_tci);
+	tcim = ntohs(~fsp->m_ext.vlan_tci);
+	data = (u64)ntohl(fsp->h_ext.data[0]) << 32;
+	data = (u64)ntohl(fsp->h_ext.data[1]);
+	datam = (u64)ntohl(~fsp->m_ext.data[0]) << 32;
+	datam |= (u64)ntohl(~fsp->m_ext.data[1]);
+
+	fprintf(stdout,
+		"\tVLAN EtherType: 0x%x mask: 0x%x\n"
+		"\tVLAN: 0x%x mask: 0x%x\n"
+		"\tUser-defined: 0x%llx mask: 0x%llx\n",
+		etype, etypem, tci, tcim, data, datam);
+}
+
+static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
+{
+	unsigned char	*smac, *smacm, *dmac, *dmacm;
+	__u32		flow_type;
+
+	if (fsp->location != RX_CLS_LOC_UNSPEC)
+		fprintf(stdout,	"Filter: %d\n", fsp->location);
+	else
+		fprintf(stdout,	"Filter: Unspecified\n");
+
+	flow_type = fsp->flow_type & ~FLOW_EXT;
+
+	invert_flow_mask(fsp);
+
+	switch (flow_type) {
+	case TCP_V4_FLOW:
+	case UDP_V4_FLOW:
+	case SCTP_V4_FLOW:
+		if (flow_type == TCP_V4_FLOW)
+			fprintf(stdout, "\tRule Type: TCP over IPv4\n");
+		else if (flow_type == UDP_V4_FLOW)
+			fprintf(stdout, "\tRule Type: UDP over IPv4\n");
+		else
+			fprintf(stdout, "\tRule Type: SCTP over IPv4\n");
+		rxclass_print_ipv4_rule(fsp->h_u.tcp_ip4_spec.ip4src,
+				     fsp->m_u.tcp_ip4_spec.ip4src,
+				     fsp->h_u.tcp_ip4_spec.ip4dst,
+				     fsp->m_u.tcp_ip4_spec.ip4dst,
+				     fsp->h_u.tcp_ip4_spec.tos,
+				     fsp->m_u.tcp_ip4_spec.tos);
+		fprintf(stdout,
+			"\tSrc port: %d mask: 0x%x\n"
+			"\tDest port: %d mask: 0x%x\n",
+			ntohs(fsp->h_u.tcp_ip4_spec.psrc),
+			ntohs(fsp->m_u.tcp_ip4_spec.psrc),
+			ntohs(fsp->h_u.tcp_ip4_spec.pdst),
+			ntohs(fsp->m_u.tcp_ip4_spec.pdst));
+		break;
+	case AH_V4_FLOW:
+	case ESP_V4_FLOW:
+		if (flow_type == AH_V4_FLOW)
+			fprintf(stdout, "\tRule Type: IPSEC AH over IPv4\n");
+		else
+			fprintf(stdout, "\tRule Type: IPSEC ESP over IPv4\n");
+		rxclass_print_ipv4_rule(fsp->h_u.ah_ip4_spec.ip4src,
+				     fsp->m_u.ah_ip4_spec.ip4src,
+				     fsp->h_u.ah_ip4_spec.ip4dst,
+				     fsp->m_u.ah_ip4_spec.ip4dst,
+				     fsp->h_u.ah_ip4_spec.tos,
+				     fsp->m_u.ah_ip4_spec.tos);
+		fprintf(stdout,
+			"\tSPI: %d mask: 0x%x\n",
+			ntohl(fsp->h_u.esp_ip4_spec.spi),
+			ntohl(fsp->m_u.esp_ip4_spec.spi));
+		break;
+	case IP_USER_FLOW:
+		fprintf(stdout, "\tRule Type: Raw IPv4\n");
+		rxclass_print_ipv4_rule(fsp->h_u.usr_ip4_spec.ip4src,
+				     fsp->m_u.usr_ip4_spec.ip4src,
+				     fsp->h_u.usr_ip4_spec.ip4dst,
+				     fsp->m_u.usr_ip4_spec.ip4dst,
+				     fsp->h_u.usr_ip4_spec.tos,
+				     fsp->m_u.usr_ip4_spec.tos);
+		fprintf(stdout,
+			"\tProtocol: %d mask: 0x%x\n"
+			"\tL4 bytes: 0x%x mask: 0x%x\n",
+			fsp->h_u.usr_ip4_spec.proto,
+			fsp->m_u.usr_ip4_spec.proto,
+			ntohl(fsp->h_u.usr_ip4_spec.l4_4_bytes),
+			ntohl(fsp->m_u.usr_ip4_spec.l4_4_bytes));
+		break;
+	case ETHER_FLOW:
+		dmac = fsp->h_u.ether_spec.h_dest;
+		dmacm = fsp->m_u.ether_spec.h_dest;
+		smac = fsp->h_u.ether_spec.h_source;
+		smacm = fsp->m_u.ether_spec.h_source;
+
+		fprintf(stdout,
+			"\tFlow Type: Raw Ethernet\n"
+			"\tSrc MAC addr: %02X:%02X:%02X:%02X:%02X:%02X"
+			" mask: %02X:%02X:%02X:%02X:%02X:%02X\n"
+			"\tDest MAC addr: %02X:%02X:%02X:%02X:%02X:%02X"
+			" mask: %02X:%02X:%02X:%02X:%02X:%02X\n"
+			"\tEthertype: 0x%X mask: 0x%X\n",
+			smac[0], smac[1], smac[2], smac[3], smac[4], smac[5],
+			smacm[0], smacm[1], smacm[2], smacm[3], smacm[4],
+			smacm[5], dmac[0], dmac[1], dmac[2], dmac[3], dmac[4],
+			dmac[5], dmacm[0], dmacm[1], dmacm[2], dmacm[3],
+			dmacm[4], dmacm[5],
+			ntohs(fsp->h_u.ether_spec.h_proto),
+			ntohs(fsp->m_u.ether_spec.h_proto));
+		break;
+	default:
+		fprintf(stdout,
+			"\tUnknown Flow type: %d\n", flow_type);
+		break;
+	}
+
+	rxclass_print_nfc_spec_ext(fsp);
+
+	if (fsp->ring_cookie != RX_CLS_FLOW_DISC)
+		fprintf(stdout, "\tAction: Direct to queue %llu\n",
+			fsp->ring_cookie);
+	else
+		fprintf(stdout, "\tAction: Drop\n");
+
+	fprintf(stdout, "\n");
+}
+
+static void rxclass_print_rule(struct ethtool_rx_flow_spec *fsp)
+{
+	/* print the rule in this location */
+	switch (fsp->flow_type & ~FLOW_EXT) {
+	case TCP_V4_FLOW:
+	case UDP_V4_FLOW:
+	case SCTP_V4_FLOW:
+	case AH_V4_FLOW:
+	case ESP_V4_FLOW:
+	case ETHER_FLOW:
+		rxclass_print_nfc_rule(fsp);
+		break;
+	case IP_USER_FLOW:
+		if (fsp->h_u.usr_ip4_spec.ip_ver == ETH_RX_NFC_IP4) {
+			rxclass_print_nfc_rule(fsp);
+			break;
+		}
+		/* IPv6 User Flow falls through to the case below */
+	case TCP_V6_FLOW:
+	case UDP_V6_FLOW:
+	case SCTP_V6_FLOW:
+	case AH_V6_FLOW:
+	case ESP_V6_FLOW:
+		fprintf(stderr, "IPv6 flows not implemented\n");
+		break;
+	default:
+		fprintf(stderr, "rxclass: Unknown flow type\n");
+		break;
+	}
+}
+
+static int rxclass_get_count(int fd, struct ifreq *ifr, __u32 *count)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	/* request count and store */
+	nfccmd.cmd = ETHTOOL_GRXCLSRLCNT;
+	nfccmd.rule_cnt = 0;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	*count = nfccmd.rule_cnt;
+	if (err < 0)
+		perror("rxclass: Cannot get RX class rule count");
+
+	return err;
+}
+
+int rxclass_rule_get(int fd, struct ifreq *ifr, __u32 loc)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	/* fetch rule from netdev */
+	nfccmd.cmd = ETHTOOL_GRXCLSRULE;
+	memset(&nfccmd.fs, 0, sizeof(struct ethtool_rx_flow_spec));
+	nfccmd.fs.location = loc;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rxclass: Cannot get RX class rule");
+		return err;
+	}
+
+	/* display rule */
+	rxclass_print_rule(&nfccmd.fs);
+	return err;
+}
+
+int rxclass_rule_getall(int fd, struct ifreq *ifr)
+{
+	struct ethtool_rxnfc *nfccmd;
+	__u32 *rule_locs;
+	int err, i;
+	__u32 count;
+
+	/* determine rule count */
+	err = rxclass_get_count(fd, ifr, &count);
+	if (err < 0)
+		return err;
+
+	fprintf(stdout, "Total %d rules\n\n", count);
+
+	/* alloc memory for request of location list */
+	nfccmd = calloc(1, sizeof(*nfccmd) + (count * sizeof(__u32)));
+	if (!nfccmd) {
+		perror("rxclass: Cannot allocate memory for"
+		       " RX class rule locations");
+		return -ENOMEM;
+	}
+
+	/* request location list */
+	nfccmd->cmd = ETHTOOL_GRXCLSRLALL;
+	nfccmd->rule_cnt = count;
+	ifr->ifr_data = (caddr_t)nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rxclass: Cannot get RX class rules");
+		free(nfccmd);
+		return err;
+	}
+
+	/* write locations to bitmap */
+	rule_locs = nfccmd->rule_locs;
+	for (i = 0; i < count; i++) {
+		err = rxclass_rule_get(fd, ifr, rule_locs[i]);
+		if (err < 0)
+			break;
+	}
+
+	/* free memory and set flag to avoid reinit */
+	free(nfccmd);
+
+	return err;
+}
+
+/*
+ * This is a simple rule manager implementation for ordering rx flow
+ * classification rules based on newest rules being first in the list.
+ * The assumption is that this rule manager is the only one adding rules to
+ * the device's hardware classifier.
+ */
+
+struct rmgr_ctrl {
+	/* slot contains a bitmap indicating which filters are valid */
+	unsigned long		*slot;
+	__u32			n_rules;
+	__u32			size;
+};
+
+static struct rmgr_ctrl rmgr;
+static int rmgr_init_done = 0;
+
+static int rmgr_ins(__u32 loc)
+{
+	/* verify location is in rule manager range */
+	if (loc >= rmgr.size) {
+		fprintf(stderr, "rmgr: Location out of range\n");
+		return -1;
+	}
+
+	/* set bit for the rule */
+	set_bit(loc, rmgr.slot);
+
+	return 0;
+}
+
+static int rmgr_find_empty_slot(struct ethtool_rx_flow_spec *fsp)
+{
+	__u32 loc;
+	__u32 slot_num;
+
+	/* start at the end of the list since it is lowest priority */
+	loc = rmgr.size - 1;
+
+	/* locate the first slot a rule can be placed in */
+	slot_num = loc / BITS_PER_LONG;
+
+	/*
+	 * Avoid testing individual bits by inverting the word and checking
+	 * to see if any bits are left set, if so there are empty spots.  By
+	 * moving 1 + loc % BITS_PER_LONG we align ourselves to the last bit
+	 * in the previous word.
+	 *
+	 * If loc rolls over it should be greater than or equal to rmgr.size
+	 * and as such we know we have reached the end of the list.
+	 */
+	if (!~(rmgr.slot[slot_num] | (~1UL << rmgr.size % BITS_PER_LONG))) {
+		loc -= 1 + (loc % BITS_PER_LONG);
+		slot_num--;
+	}
+
+	/*
+	 * Now that we are aligned with the last bit in each long we can just
+	 * go though and eliminate all the longs with no free bits
+	 */
+	while (loc < rmgr.size && !~(rmgr.slot[slot_num])) {
+		loc -= BITS_PER_LONG;
+		slot_num--;
+	}
+
+	/*
+	 * If we still are inside the range, test individual bits as one is
+	 * likely available for our use.
+	 */
+	while (loc < rmgr.size && test_bit(loc, rmgr.slot))
+		loc--;
+
+	/* location found, insert rule */
+	if (loc < rmgr.size) {
+		fsp->location = loc;
+		return rmgr_ins(loc);
+	}
+
+	/* No space to add this rule */
+	fprintf(stderr, "rmgr: Cannot find appropriate slot to insert rule\n");
+
+	return -1;
+}
+
+static int rmgr_init(int fd, struct ifreq *ifr)
+{
+	struct ethtool_rxnfc *nfccmd;
+	int err, i;
+	__u32 *rule_locs;
+
+	if (rmgr_init_done)
+		return 0;
+
+	/* clear rule manager settings */
+	memset(&rmgr, 0, sizeof(struct rmgr_ctrl));
+
+	/* request count and store in rmgr.n_rules */
+	err = rxclass_get_count(fd, ifr, &rmgr.n_rules);
+	if (err < 0)
+		return err;
+
+	/* alloc memory for request of location list */
+	nfccmd = calloc(1, sizeof(*nfccmd) + (rmgr.n_rules * sizeof(__u32)));
+	if (!nfccmd) {
+		perror("rmgr: Cannot allocate memory for"
+		       " RX class rule locations");
+		return -1;
+	}
+
+	/* request location list */
+	nfccmd->cmd = ETHTOOL_GRXCLSRLALL;
+	nfccmd->rule_cnt = rmgr.n_rules;
+	ifr->ifr_data = (caddr_t)nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0) {
+		perror("rmgr: Cannot get RX class rules");
+		free(nfccmd);
+		return err;
+	}
+
+	/* make certain the table size is valid */
+	rmgr.size = nfccmd->data;
+	if (rmgr.size == 0 || rmgr.size < rmgr.n_rules) {
+		perror("rmgr: Invalid RX class rules table size");
+		return -1;
+	}
+
+	/* initialize bitmap for storage of valid locations */
+	rmgr.slot = calloc(1, BITS_TO_LONGS(rmgr.size) * sizeof(long));
+	if (!rmgr.slot) {
+		perror("rmgr: Cannot allocate memory for RX class rules");
+		return -1;
+	}
+
+	/* write locations to bitmap */
+	rule_locs = nfccmd->rule_locs;
+	for (i = 0; i < rmgr.n_rules; i++) {
+		err = rmgr_ins(rule_locs[i]);
+		if (err < 0)
+			break;
+	}
+
+	/* free memory and set flag to avoid reinit */
+	free(nfccmd);
+	rmgr_init_done = 1;
+
+	return err;
+}
+
+static void rmgr_cleanup(void)
+{
+	if (!rmgr_init_done)
+		return;
+
+	rmgr_init_done = 0;
+
+	free(rmgr.slot);
+	rmgr.slot = NULL;
+	rmgr.size = 0;
+}
+
+static int rmgr_set_location(int fd, struct ifreq *ifr,
+			     struct ethtool_rx_flow_spec *fsp)
+{
+	int err;
+
+	/* init table of available rules */
+	err = rmgr_init(fd, ifr);
+	if (err < 0)
+		return err;
+
+	/* verify rule location */
+	err = rmgr_find_empty_slot(fsp);
+
+	/* cleanup table and free resources */
+	rmgr_cleanup();
+
+	return err;
+}
+
+int rxclass_rule_ins(int fd, struct ifreq *ifr,
+		     struct ethtool_rx_flow_spec *fsp)
+{
+	struct ethtool_rxnfc nfccmd;
+	__u32 loc = fsp->location;
+	int err;
+
+	/*
+	 * if location is unspecified pull rules from device
+	 * and allocate a free rule for our use
+	 */
+	if (loc == RX_CLS_LOC_UNSPEC) {
+		err = rmgr_set_location(fd, ifr, fsp);
+		if (err < 0)
+			return err;
+	}
+
+	/* notify netdev of new rule */
+	nfccmd.cmd = ETHTOOL_SRXCLSRLINS;
+	nfccmd.fs = *fsp;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0)
+		perror("rmgr: Cannot insert RX class rule");
+	else if (loc == RX_CLS_LOC_UNSPEC)
+		printf("Added rule with ID %d\n", fsp->location);
+
+	return 0;
+}
+
+int rxclass_rule_del(int fd, struct ifreq *ifr, __u32 loc)
+{
+	struct ethtool_rxnfc nfccmd;
+	int err;
+
+	/* notify netdev of rule removal */
+	nfccmd.cmd = ETHTOOL_SRXCLSRLDEL;
+	nfccmd.fs.location = loc;
+	ifr->ifr_data = (caddr_t)&nfccmd;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err < 0)
+		perror("rmgr: Cannot delete RX class rule");
+
+	return err;
+}
+
+typedef enum {
+	OPT_NONE = 0,
+	OPT_S32,
+	OPT_U8,
+	OPT_U16,
+	OPT_U32,
+	OPT_U64,
+	OPT_BE16,
+	OPT_BE32,
+	OPT_BE64,
+	OPT_IP4,
+	OPT_MAC,
+} rule_opt_type_t;
+
+#define NFC_FLAG_RING		0x001
+#define NFC_FLAG_LOC		0x002
+#define NFC_FLAG_SADDR		0x004
+#define NFC_FLAG_DADDR		0x008
+#define NFC_FLAG_SPORT		0x010
+#define NFC_FLAG_DPORT		0x020
+#define NFC_FLAG_SPI		0x030
+#define NFC_FLAG_TOS		0x040
+#define NFC_FLAG_PROTO		0x080
+#define NTUPLE_FLAG_VLAN	0x100
+#define NTUPLE_FLAG_UDEF	0x200
+#define NTUPLE_FLAG_VETH	0x400
+
+struct rule_opts {
+	const char	*name;
+	rule_opt_type_t	type;
+	u32		flag;
+	int		offset;
+	int		moffset;
+};
+
+static struct rule_opts rule_nfc_tcp_ip4[] = {
+	{ "src-ip", OPT_IP4, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.ip4src),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.ip4src) },
+	{ "dst-ip", OPT_IP4, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.ip4dst),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.ip4dst) },
+	{ "tos", OPT_U8, NFC_FLAG_TOS,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.tos),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.tos) },
+	{ "src-port", OPT_BE16, NFC_FLAG_SPORT,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.psrc),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.psrc) },
+	{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.pdst),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.pdst) },
+	{ "action", OPT_U64, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
+	{ "loc", OPT_U32, NFC_FLAG_LOC,
+	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+	{ "vlan-etype", OPT_BE16, NTUPLE_FLAG_VETH,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.vlan_etype),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.vlan_etype) },
+	{ "vlan", OPT_BE16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.vlan_tci),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.vlan_tci) },
+	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.data),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.data) },
+};
+
+static struct rule_opts rule_nfc_esp_ip4[] = {
+	{ "src-ip", OPT_IP4, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.ip4src),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.ip4src) },
+	{ "dst-ip", OPT_IP4, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.ip4dst),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.ip4dst) },
+	{ "tos", OPT_U8, NFC_FLAG_TOS,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.tos),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.tos) },
+	{ "spi", OPT_BE32, NFC_FLAG_SPI,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.spi),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.spi) },
+	{ "action", OPT_U64, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
+	{ "loc", OPT_U32, NFC_FLAG_LOC,
+	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+	{ "vlan-etype", OPT_BE16, NTUPLE_FLAG_VETH,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.vlan_etype),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.vlan_etype) },
+	{ "vlan", OPT_BE16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.vlan_tci),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.vlan_tci) },
+	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.data),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.data) },
+};
+
+static struct rule_opts rule_nfc_usr_ip4[] = {
+	{ "src-ip", OPT_IP4, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.ip4src),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.ip4src) },
+	{ "dst-ip", OPT_IP4, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.ip4dst),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.ip4dst) },
+	{ "tos", OPT_U8, NFC_FLAG_TOS,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.tos),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.tos) },
+	{ "l4proto", OPT_U8, NFC_FLAG_PROTO,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.proto),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.proto) },
+	{ "spi", OPT_BE32, NFC_FLAG_SPI,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.l4_4_bytes),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) },
+	{ "src-port", OPT_BE16, NFC_FLAG_SPORT,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.l4_4_bytes),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) },
+	{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.l4_4_bytes) + 2,
+	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) + 2 },
+	{ "action", OPT_U64, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
+	{ "loc", OPT_U32, NFC_FLAG_LOC,
+	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+	{ "vlan-etype", OPT_BE16, NTUPLE_FLAG_VETH,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.vlan_etype),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.vlan_etype) },
+	{ "vlan", OPT_BE16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.vlan_tci),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.vlan_tci) },
+	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.data),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.data) },
+};
+
+static struct rule_opts rule_nfc_ether[] = {
+	{ "src", OPT_MAC, NFC_FLAG_SADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ether_spec.h_dest),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ether_spec.h_dest) },
+	{ "dst", OPT_MAC, NFC_FLAG_DADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ether_spec.h_source),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ether_spec.h_source) },
+	{ "proto", OPT_BE16, NFC_FLAG_PROTO,
+	  offsetof(struct ethtool_rx_flow_spec, h_u.ether_spec.h_proto),
+	  offsetof(struct ethtool_rx_flow_spec, m_u.ether_spec.h_proto) },
+	{ "action", OPT_U64, NFC_FLAG_RING,
+	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
+	{ "loc", OPT_U32, NFC_FLAG_LOC,
+	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
+	{ "vlan-etype", OPT_BE16, NTUPLE_FLAG_VETH,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.vlan_etype),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.vlan_etype) },
+	{ "vlan", OPT_BE16, NTUPLE_FLAG_VLAN,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.vlan_tci),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.vlan_tci) },
+	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.data),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.data) },
+};
+
+static int rxclass_get_long(char *str, long long *val, int size)
+{
+	long long max = ~0ULL >> (65 - size);
+	char *endp;
+
+	errno = 0;
+
+	*val = strtoll(str, &endp, 0);
+
+	if (*endp || errno || (*val > max) || (*val < ~max))
+		return -1;
+
+	return 0;
+}
+
+static int rxclass_get_ulong(char *str, unsigned long long *val, int size)
+{
+	long long max = ~0ULL >> (64 - size);
+	char *endp;
+
+	errno = 0;
+
+	*val = strtoull(str, &endp, 0);
+
+	if (*endp || errno || (*val > max))
+		return -1;
+
+	return 0;
+}
+
+static int rxclass_get_ipv4(char *str, __be32 *val)
+{
+	if (!inet_pton(AF_INET, str, val))
+		return -1;
+
+	return 0;
+}
+
+static int rxclass_get_ether(char *str, unsigned char *val)
+{
+	unsigned int buf[ETH_ALEN];
+	int count;
+
+	if (!strchr(str, ':'))
+		return -1;
+
+	count = sscanf(str, "%2x:%2x:%2x:%2x:%2x:%2x",
+		       &buf[0], &buf[1], &buf[2],
+		       &buf[3], &buf[4], &buf[5]);
+
+	if (count != ETH_ALEN)
+		return -1;
+
+	do {
+		count--;
+		val[count] = buf[count];
+	} while (count);
+
+	return 0;
+}
+
+static int rxclass_get_val(char *str, unsigned char *p, u32 *flags,
+			   const struct rule_opts *opt)
+{
+	unsigned long long mask = ~0ULL;
+	int err = 0;
+
+	if (*flags & opt->flag)
+		return -1;
+
+	*flags |= opt->flag;
+
+	switch (opt->type) {
+	case OPT_S32: {
+		long long val;
+		err = rxclass_get_long(str, &val, 32);
+		if (err)
+			return -1;
+		*(int *)&p[opt->offset] = (int)val;
+		if (opt->moffset >= 0)
+			*(int *)&p[opt->moffset] = (int)mask;
+		break;
+	}
+	case OPT_U8: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 8);
+		if (err)
+			return -1;
+		*(u8 *)&p[opt->offset] = (u8)val;
+		if (opt->moffset >= 0)
+			*(u8 *)&p[opt->moffset] = (u8)mask;
+		break;
+	}
+	case OPT_U16: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 16);
+		if (err)
+			return -1;
+		*(u16 *)&p[opt->offset] = (u16)val;
+		if (opt->moffset >= 0)
+			*(u16 *)&p[opt->moffset] = (u16)mask;
+		break;
+	}
+	case OPT_U32: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 32);
+		if (err)
+			return -1;
+		*(u32 *)&p[opt->offset] = (u32)val;
+		if (opt->moffset >= 0)
+			*(u32 *)&p[opt->moffset] = (u32)mask;
+		break;
+	}
+	case OPT_U64: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 64);
+		if (err)
+			return -1;
+		*(u64 *)&p[opt->offset] = (u64)val;
+		if (opt->moffset >= 0)
+			*(u64 *)&p[opt->moffset] = (u64)mask;
+		break;
+	}
+	case OPT_BE16: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 16);
+		if (err)
+			return -1;
+		*(__be16 *)&p[opt->offset] = htons((u16)val);
+		if (opt->moffset >= 0)
+			*(__be16 *)&p[opt->moffset] = (__be16)mask;
+		break;
+	}
+	case OPT_BE32: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 32);
+		if (err)
+			return -1;
+		*(__be32 *)&p[opt->offset] = htonl((u32)val);
+		if (opt->moffset >= 0)
+			*(__be32 *)&p[opt->moffset] = (__be32)mask;
+		break;
+	}
+	case OPT_BE64: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 64);
+		if (err)
+			return -1;
+		*(__be64 *)&p[opt->offset] = htonll((u64)val);
+		if (opt->moffset >= 0)
+			*(__be64 *)&p[opt->moffset] = (__be64)mask;
+		break;
+	}
+	case OPT_IP4: {
+		__be32 val;
+		err = rxclass_get_ipv4(str, &val);
+		if (err)
+			return -1;
+		*(__be32 *)&p[opt->offset] = val;
+		if (opt->moffset >= 0)
+			*(__be32 *)&p[opt->moffset] = (__be32)mask;
+		break;
+	}
+	case OPT_MAC: {
+		unsigned char val[ETH_ALEN];
+		err = rxclass_get_ether(str, val);
+		if (err)
+			return -1;
+		memcpy(&p[opt->offset], val, ETH_ALEN);
+		if (opt->moffset >= 0)
+			memcpy(&p[opt->moffset], &mask, ETH_ALEN);
+		break;
+	}
+	case OPT_NONE:
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+static int rxclass_get_mask(char *str, unsigned char *p,
+			    const struct rule_opts *opt)
+{
+	int err = 0;
+
+	if (opt->moffset < 0)
+		return -1;
+
+	switch (opt->type) {
+	case OPT_S32: {
+		long long val;
+		err = rxclass_get_long(str, &val, 32);
+		if (err)
+			return -1;
+		*(int *)&p[opt->moffset] = ~(int)val;
+		break;
+	}
+	case OPT_U8: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 8);
+		if (err)
+			return -1;
+		*(u8 *)&p[opt->moffset] = ~(u8)val;
+		break;
+	}
+	case OPT_U16: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 16);
+		if (err)
+			return -1;
+		*(u16 *)&p[opt->moffset] = ~(u16)val;
+		break;
+	}
+	case OPT_U32: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 32);
+		if (err)
+			return -1;
+		*(u32 *)&p[opt->moffset] = ~(u32)val;
+		break;
+	}
+	case OPT_U64: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 64);
+		if (err)
+			return -1;
+		*(u64 *)&p[opt->moffset] = ~(u64)val;
+		break;
+	}
+	case OPT_BE16: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 16);
+		if (err)
+			return -1;
+		*(__be16 *)&p[opt->moffset] = ~htons((u16)val);
+		break;
+	}
+	case OPT_BE32: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 32);
+		if (err)
+			return -1;
+		*(__be32 *)&p[opt->moffset] = ~htonl((u32)val);
+		break;
+	}
+	case OPT_BE64: {
+		unsigned long long val;
+		err = rxclass_get_ulong(str, &val, 64);
+		if (err)
+			return -1;
+		*(__be64 *)&p[opt->moffset] = ~htonll((u64)val);
+		break;
+	}
+	case OPT_IP4: {
+		__be32 val;
+		err = rxclass_get_ipv4(str, &val);
+		if (err)
+			return -1;
+		*(__be32 *)&p[opt->moffset] = ~val;
+		break;
+	}
+	case OPT_MAC: {
+		unsigned char val[ETH_ALEN];
+		int i;
+		err = rxclass_get_ether(str, val);
+		if (err)
+			return -1;
+
+		for (i = 0; i < ETH_ALEN; i++)
+			val[i] = ~val[i];
+
+		memcpy(&p[opt->moffset], val, ETH_ALEN);
+		break;
+	}
+	case OPT_NONE:
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+int rxclass_parse_ruleopts(char **argp, int argc,
+			   struct ethtool_rx_flow_spec *fsp)
+{
+	const struct rule_opts *options;
+	unsigned char *p = (unsigned char *)fsp;
+	int i = 0, n_opts, err;
+	u32 flags = 0;
+	int flow_type;
+
+	if (argc < 1)
+		goto syntax_err;
+
+	if (!strcmp(argp[0], "tcp4"))
+		flow_type = TCP_V4_FLOW;
+	else if (!strcmp(argp[0], "udp4"))
+		flow_type = UDP_V4_FLOW;
+	else if (!strcmp(argp[0], "sctp4"))
+		flow_type = SCTP_V4_FLOW;
+	else if (!strcmp(argp[0], "ah4"))
+		flow_type = AH_V4_FLOW;
+	else if (!strcmp(argp[0], "esp4"))
+		flow_type = ESP_V4_FLOW;
+	else if (!strcmp(argp[0], "ip4"))
+		flow_type = IP_USER_FLOW;
+	else if (!strcmp(argp[0], "ether"))
+		flow_type = ETHER_FLOW;
+	else
+		goto syntax_err;
+
+	switch (flow_type) {
+	case TCP_V4_FLOW:
+	case UDP_V4_FLOW:
+	case SCTP_V4_FLOW:
+		options = rule_nfc_tcp_ip4;
+		n_opts = ARRAY_SIZE(rule_nfc_tcp_ip4);
+		break;
+	case AH_V4_FLOW:
+	case ESP_V4_FLOW:
+		options = rule_nfc_esp_ip4;
+		n_opts = ARRAY_SIZE(rule_nfc_esp_ip4);
+		break;
+	case IP_USER_FLOW:
+		options = rule_nfc_usr_ip4;
+		n_opts = ARRAY_SIZE(rule_nfc_usr_ip4);
+		break;
+	case ETHER_FLOW:
+		options = rule_nfc_ether;
+		n_opts = ARRAY_SIZE(rule_nfc_ether);
+		break;
+	default:
+		fprintf(stderr, "Add rule, invalid rule type[%s]\n", argp[0]);
+		return -1;
+	}
+
+	memset(p, 0, sizeof(*fsp));
+	fsp->flow_type = flow_type;
+	fsp->location = RX_CLS_LOC_UNSPEC;
+
+	for (i = 1; i < argc;) {
+		const struct rule_opts *opt;
+		int idx;
+		for (opt = options, idx = 0; idx < n_opts; idx++, opt++) {
+			char mask_name[16];
+
+			if (strcmp(argp[i], opt->name))
+				continue;
+
+			i++;
+			if (i >= argc)
+				break;
+
+			err = rxclass_get_val(argp[i], p, &flags, opt);
+			if (err) {
+				fprintf(stderr, "Invalid %s value[%s]\n",
+					opt->name, argp[i]);
+				return -1;
+			}
+
+			i++;
+			if (i >= argc)
+				break;
+
+			sprintf(mask_name, "%s-mask", opt->name);
+			if (strcmp(argp[i], "m") && strcmp(argp[i], mask_name))
+				break;
+
+			i++;
+			if (i >= argc)
+				goto syntax_err;
+
+			err = rxclass_get_mask(argp[i], p, opt);
+			if (err) {
+				fprintf(stderr, "Invalid %s mask[%s]\n",
+					opt->name, argp[i]);
+				return -1;
+			}
+
+			i++;
+
+			break;
+		}
+		if (idx == n_opts) {
+			fprintf(stdout, "Add rule, unrecognized option[%s]\n",
+				argp[i]);
+			return -1;
+		}
+	}
+
+	if (flags & (NTUPLE_FLAG_VLAN | NTUPLE_FLAG_UDEF | NTUPLE_FLAG_VETH))
+		fsp->flow_type |= FLOW_EXT;
+
+	return 0;
+
+syntax_err:
+	fprintf(stderr, "Add rule, invalid syntax\n");
+	return -1;
+}


^ permalink raw reply related

* [ethtool PATCH 1/4] ethtool: remove strings based approach for displaying n-tuple
From: Alexander Duyck @ 2011-05-04 18:41 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, dm, bhutchings; +Cc: netdev
In-Reply-To: <20110504183752.26394.29231.stgit@gitlad.jf.intel.com>

This change is meant to remove the strings based approach for displaying
n-tuple filters.  A follow-on patch will replace that functionality with a
network flow classification based approach that will get the number of
filters, get their locations, and then request and display them
individually.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 ethtool.c |   44 --------------------------------------------
 1 files changed, 0 insertions(+), 44 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index cfdac65..9ad7000 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3194,50 +3194,6 @@ static int do_srxntuple(int fd, struct ifreq *ifr)
 
 static int do_grxntuple(int fd, struct ifreq *ifr)
 {
-	struct ethtool_sset_info *sset_info;
-	struct ethtool_gstrings *strings;
-	int sz_str, n_strings, err, i;
-
-	sset_info = malloc(sizeof(struct ethtool_sset_info) + sizeof(u32));
-	sset_info->cmd = ETHTOOL_GSSET_INFO;
-	sset_info->sset_mask = (1ULL << ETH_SS_NTUPLE_FILTERS);
-	ifr->ifr_data = (caddr_t)sset_info;
-	err = send_ioctl(fd, ifr);
-
-	if ((err < 0) ||
-	    (!(sset_info->sset_mask & (1ULL << ETH_SS_NTUPLE_FILTERS)))) {
-		perror("Cannot get driver strings info");
-		return 100;
-	}
-
-	n_strings = sset_info->data[0];
-	free(sset_info);
-	sz_str = n_strings * ETH_GSTRING_LEN;
-
-	strings = calloc(1, sz_str + sizeof(struct ethtool_gstrings));
-	if (!strings) {
-		fprintf(stderr, "no memory available\n");
-		return 95;
-	}
-
-	strings->cmd = ETHTOOL_GRXNTUPLE;
-	strings->string_set = ETH_SS_NTUPLE_FILTERS;
-	strings->len = n_strings;
-	ifr->ifr_data = (caddr_t) strings;
-	err = send_ioctl(fd, ifr);
-	if (err < 0) {
-		perror("Cannot get Rx n-tuple information");
-		free(strings);
-		return 101;
-	}
-
-	n_strings = strings->len;
-	fprintf(stdout, "Rx n-tuple filters:\n");
-	for (i = 0; i < n_strings; i++)
-		fprintf(stdout, "%s", &strings->data[i * ETH_GSTRING_LEN]);
-
-	free(strings);
-
 	return 0;
 }
 


^ permalink raw reply related

* [ethtool PATCH 3/4] Add support for __be64 and bitops, centralize several needed macros
From: Alexander Duyck @ 2011-05-04 18:41 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher, dm, bhutchings; +Cc: netdev
In-Reply-To: <20110504183752.26394.29231.stgit@gitlad.jf.intel.com>

This change is meant to add support for __be64 values and bitops to
ethtool.  In addition the patch pulls the SIOCETHTOOL define and the
ARRAY_SIZE define into ethtool-util.h for later use by the rxclass files.
These changes will be needed in order to support network flow
classifier rule configuration.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 ethtool-bitops.h |   25 +++++++++++++++++++++++++
 ethtool-util.h   |   16 ++++++++++++++--
 2 files changed, 39 insertions(+), 2 deletions(-)
 create mode 100644 ethtool-bitops.h

diff --git a/ethtool-bitops.h b/ethtool-bitops.h
new file mode 100644
index 0000000..b1eb426
--- /dev/null
+++ b/ethtool-bitops.h
@@ -0,0 +1,25 @@
+#ifndef ETHTOOL_BITOPS_H__
+#define ETHTOOL_BITOPS_H__
+
+#define BITS_PER_BYTE		8
+#define BITS_PER_LONG		(BITS_PER_BYTE * sizeof(long))
+#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
+#define BITS_TO_LONGS(nr)	DIV_ROUND_UP(nr, BITS_PER_LONG)
+
+static inline void set_bit(int nr, unsigned long *addr)
+{
+	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
+}
+
+static inline void clear_bit(int nr, unsigned long *addr)
+{
+	addr[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
+}
+
+static inline int test_bit(unsigned int nr, const unsigned long *addr)
+{
+	return !!((1UL << (nr % BITS_PER_LONG)) &
+		  (((unsigned long *)addr)[nr / BITS_PER_LONG]));
+}
+
+#endif
diff --git a/ethtool-util.h b/ethtool-util.h
index 6a4f3f4..d8b621c 100644
--- a/ethtool-util.h
+++ b/ethtool-util.h
@@ -15,6 +15,7 @@
 #ifndef HAVE_BE_TYPES
 typedef __uint16_t __be16;
 typedef __uint32_t __be32;
+typedef unsigned long long __be64;
 #endif
 
 typedef unsigned long long u64;
@@ -28,11 +29,15 @@ typedef __int32_t s32;
 #if __BYTE_ORDER == __BIG_ENDIAN
 static inline u16 cpu_to_be16(u16 value)
 {
-    return value;
+	return value;
 }
 static inline u32 cpu_to_be32(u32 value)
 {
-    return value;
+	return value;
+}
+static inline u64 cpu_to_be64(u64 value)
+{
+	return value;
 }
 #else
 static inline u16 cpu_to_be16(u16 value)
@@ -43,8 +48,15 @@ static inline u32 cpu_to_be32(u32 value)
 {
 	return cpu_to_be16(value >> 16) | (cpu_to_be16(value) << 16);
 }
+static inline u64 cpu_to_be64(u64 value)
+{
+	return cpu_to_be32(value >> 32) | ((u64)cpu_to_be32(value) << 32);
+}
 #endif
 
+#define ntohll cpu_to_be64
+#define htonll cpu_to_be64
+
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 #endif


^ permalink raw reply related

* Re: [ethtool PATCH 4/4] v5 Add RX packet classification interface
From: Ben Hutchings @ 2011-05-04 18:45 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Dimitris Michailidis, davem@davemloft.net, Kirsher, Jeffrey T,
	netdev@vger.kernel.org
In-Reply-To: <4DC19912.3000803@intel.com>

On Wed, 2011-05-04 at 11:21 -0700, Alexander Duyck wrote:
[...]
> Honestly what I would prefer to see is a seperate call added such as an 
> ETHTOOL_GRSCLSRLLOC that we could pass the flow specifier to and perhaps 
> include first/last location call in that and then let the driver return 
> where it wants to drop the rule.

This must not be done as a separate operation because it's racy (in fact
that's an inherent problem with the rule manager).  In the sfc driver
(and probably others in future) filters could be inserted for RFS at any
time.

> That way we can avoid having to create 
> an overly complicated rule manager that can handle all the bizarre rule 
> ordering options that I am sure all the different network devices support.

Right, the rule manager can't implement that.

> The only reason I am not implementing this now is because there aren't 
> any drivers in place that would currently use it.  I figure once cxgb 
> has a means in place of supporting flow classifier rules then Dimitris 
> can add the necessary code to ethtool and the kernel to allow the driver 
> to specify rule locations.  I would prefer to avoid adding features 
> based on speculation of what will be needed and would like to be able to 
> actually see how the features will be used.

If you are going to implement the same interface in ixgbe as in niu
(modulo bugs), then I have no objection to going ahead with this and
then adding the option for driver-assigned locations later.

Please can you confirm that the location specified for
ETHTOOL_SRXCLSRLINS will indeed be used as a priority in case of
overlapping filters?

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox