Netdev List
 help / color / mirror / Atom feed
* [PATCH v4] kptr_restrict for hiding kernel pointers
From: Dan Rosenberg @ 2010-12-18 21:41 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-security-module
  Cc: jmorris, eric.dumazet, tgraf, eugeneteo, kees.cook, mingo, davem,
	a.p.zijlstra, akpm, eparis

Add the %pK printk format specifier and
the /proc/sys/kernel/kptr_restrict sysctl.

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces.  Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers.  The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, if the current user (intended to
be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (which is
currently in the LSM tree), kernel pointers using %pK are printed as
0's.  If kptr_restrict is set to 2, kernel pointers using %pK are
printed as 0's regardless of privileges.  Replacing with 0's was chosen
over the default "(null)", which cannot be parsed by userland %p, which
expects "(nil)".


v4 incorporates Eric Paris' suggestion of using
has_capability_noaudit(), since failing this capability check is not a
policy violation but rather a code path choice and shouldn't generate
potentially excessive log noise.  Adjusted IRQ comment for clarity.

v3 adds the "2" setting, cleans up documentation, removes the CONFIG,
and incorporates changes and suggestions from Andrew Morton.

v2 improves checking for inappropriate context, on suggestion by Peter
Zijlstra.  Thanks to Thomas Graf for suggesting use of a centralized
format specifier.


Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Paris <eparis@parisplace.org>
---

 Documentation/sysctl/kernel.txt |   14 ++++++++++++++
 include/linux/kernel.h          |    2 ++
 kernel/sysctl.c                 |    9 +++++++++
 lib/vsprintf.c                  |   24 ++++++++++++++++++++++++
 4 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 209e158..3cff47e 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
 - hotplug
 - java-appletviewer           [ binfmt_java, obsolete ]
 - java-interpreter            [ binfmt_java, obsolete ]
+- kptr_restrict
 - kstack_depth_to_print       [ X86 only ]
 - l2cr                        [ PPC only ]
 - modprobe                    ==> Documentation/debugging-modules.txt
@@ -261,6 +262,19 @@ This flag controls the L2 cache of G3 processor boards. If
 
 ==============================================================
 
+kptr_restrict:
+
+This toggle indicates whether restrictions are placed on
+exposing kernel addresses via /proc and other interfaces.  When
+kptr_restrict is set to (0), the default, there are no
+restrictions.  When kptr_restrict is set to (1), kernel pointers
+printed using the %pK format specifier will be replaced with 0's
+unless the user has CAP_SYSLOG.  When kptr_restrict is set to
+(2), kernel pointers printed using %pK will be replaced with 0's
+regardless of privileges.
+
+==============================================================
+
 kstack_depth_to_print: (X86 only)
 
 Controls the number of words to print when dumping the raw
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index b6de9a6..b4f4863 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -201,6 +201,8 @@ extern int sscanf(const char *, const char *, ...)
 extern int vsscanf(const char *, const char *, va_list)
 	__attribute__ ((format (scanf, 2, 0)));
 
+extern int kptr_restrict;	/* for sysctl */
+
 extern int get_option(char **str, int *pint);
 extern char *get_options(const char *str, int nints, int *ints);
 extern unsigned long long memparse(const char *ptr, char **retptr);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5abfa15..236fa91 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -713,6 +713,15 @@ static struct ctl_table kern_table[] = {
 	},
 #endif
 	{
+		.procname	= "kptr_restrict",
+		.data		= &kptr_restrict,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
+	},
+	{
 		.procname	= "ngroups_max",
 		.data		= &ngroups_max,
 		.maxlen		= sizeof (int),
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c150d3d..313f767 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -936,6 +936,8 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
 	return string(buf, end, uuid, spec);
 }
 
+int kptr_restrict;
+
 /*
  * Show a '%p' thing.  A kernel extension is that the '%p' is followed
  * by an extra set of alphanumeric characters that are extended format
@@ -979,6 +981,7 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
  *       Implements a "recursive vsnprintf".
  *       Do not use this feature without some mechanism to verify the
  *       correctness of the format string and va_list arguments.
+ * - 'K' For a kernel pointer that should be hidden from unprivileged users
  *
  * Note: The difference between 'S' and 'F' is that on ia64 and ppc64
  * function pointers are really function descriptors, which contain a
@@ -1035,6 +1038,27 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 		return buf + vsnprintf(buf, end - buf,
 				       ((struct va_format *)ptr)->fmt,
 				       *(((struct va_format *)ptr)->va));
+	case 'K':
+		/*
+		 * %pK cannot be used in IRQ context because its test
+		 * for CAP_SYSLOG would be meaningless.
+		 */
+		if (in_irq() || in_serving_softirq() || in_nmi())
+			WARN_ONCE(1, "%%pK used in interrupt context.\n");
+
+		if (!kptr_restrict)
+			break;		/* %pK does not obscure pointers */
+
+		if ((kptr_restrict != 2) &&
+		    has_capability_noaudit(current, CAP_SYSLOG))
+			break;		/* privileged apps expose pointers,
+					   unless kptr_restrict is 2 */
+
+		if (spec.field_width == -1) {
+			spec.field_width = 2 * sizeof(void *);
+			spec.flags |= ZEROPAD;
+		}
+		return number(buf, end, 0, spec);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {



^ permalink raw reply related

* Re: "x86: allocate space within a region top-down" causes bar0 access issue
From: Jon Mason @ 2010-12-18 22:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Ramkrishna Vepa
In-Reply-To: <201012171717.55300.bjorn.helgaas@hp.com>

On Fri, Dec 17, 2010 at 04:17:54PM -0800, Bjorn Helgaas wrote:
> On Friday, December 17, 2010 04:12:11 pm Jon Mason wrote:
> > On Fri, Dec 17, 2010 at 12:16:12PM -0800, Bjorn Helgaas wrote:
> > > On Friday, December 17, 2010 12:44:58 pm Jon Mason wrote:
> > > > The following patch is causing problem with the vxge driver/adapter on
> > > > HP x86-64 systems. Reads to bar0 to return 0xffffffffffffffff instead
> > > > of their intended value.  This prevents the vxge module from loading
> > > > by failing sanity checks in the driver for certain values in bar0.  We
> > > > are not seeing any issues with this patch on non-HP systems in our
> > > > lab.
> > > >
> > > > Can this patch be removed from 2.6.37 until a better solution can be
> > > > found?
> > >
> > > There were several issues related to that patch, and it's about to
> > > be reverted.  I am curious about the failure you're seeing, though,
> > > and I'd like to understand the cause and make sure it's one of the
> > > issues I've already investigated.
> > >
> > > Can you send me the complete dmesg log of a failing boot?
> >
> > Below is the dmesg of a failing system.
>
> Thanks.  This is interesting.  All the reported PCI windows are below 4GB:
>
> > ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> > pci_root PNP0A08:00: host bridge window [io  0x0000-0x0bff]
> > pci_root PNP0A08:00: host bridge window [io  0x0d00-0xffff]
> > pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
> > pci_root PNP0A08:00: host bridge window [mem 0x000d0000-0x000dffff]
> > pci_root PNP0A08:00: host bridge window [mem 0xf0000000-0xffffffff]
>
> But the BIOS configured many devices *above* 4GB (and they probably
> work fine there), so we complain about them, zero out their resources,
> then think they conflict with some PNP devices (which they really
> don't):
>
> > pci 0000:00:1f.3: reg 10: [mem 0xffffffc00-0xffffffcff 64bit]
> > pci 0000:05:00.0: reg 10: [mem 0xfff000000-0xfff7fffff 64bit pref]
> > pci 0000:05:00.0: reg 18: [mem 0xfffcfe000-0xfffcfffff 64bit pref]
> > pci 0000:05:00.0: reg 20: [mem 0xfffcfc000-0xfffcfdfff 64bit pref]
> > pci 0000:00:06.0: PCI bridge to [bus 05-05]
> > pci 0000:00:06.0:   bridge window [mem 0xfff000000-0xfffcfffff 64bit pref]
> > pci 0000:00:1c.0: PCI bridge to [bus 09-0b]
> > pci 0000:00:1c.0:   bridge window [mem 0xfffd00000-0xfffefffff 64bit pref]
> > pci 0000:0b:04.0: reg 10: [mem 0xfffef8000-0xfffefffff 64bit pref]
> > pci 0000:0b:04.0: reg 18: [mem 0xfffd00000-0xfffdfffff 64bit pref]
> > pci 0000:0b:04.0: reg 20: [mem 0xfffef7800-0xfffef7fff 64bit pref]
> > pci 0000:09:00.0: PCI bridge to [bus 0b-0b]
> > pci 0000:09:00.0:   bridge window [mem 0xfffd00000-0xfffefffff 64bit pref]
> ...
> > pci 0000:00:06.0: no compatible bridge window for [mem 0xfff000000-0xfffcfffff 64bit pref]
> > pci 0000:00:1c.0: no compatible bridge window for [mem 0xfffd00000-0xfffefffff 64bit pref]
> > pci 0000:09:00.0: no compatible bridge window for [mem 0xfffd00000-0xfffefffff 64bit pref]
> > pci 0000:00:1f.3: no compatible bridge window for [mem 0xffffffc00-0xffffffcff 64bit]
> > pci 0000:05:00.0: no compatible bridge window for [mem 0xfff000000-0xfff7fffff 64bit pref]
> > pci 0000:05:00.0: no compatible bridge window for [mem 0xfffcfe000-0xfffcfffff 64bit pref]
> > pci 0000:05:00.0: no compatible bridge window for [mem 0xfffcfc000-0xfffcfdfff 64bit pref]
> > pci 0000:0b:04.0: no compatible bridge window for [mem 0xfffef8000-0xfffefffff 64bit pref]
> > pci 0000:0b:04.0: no compatible bridge window for [mem 0xfffd00000-0xfffdfffff 64bit pref]
> > pci 0000:0b:04.0: no compatible bridge window for [mem 0xfffef7800-0xfffef7fff 64bit pref]
> ...
> > pnp 00:0e: disabling [mem 0x00000000-0x0009ffff] because it overlaps 0000:05:00.0 BAR 0 [mem 0x00000000-0x007fffff 64bit pref]
> > pnp 00:0e: disabling [mem 0x000c0000-0x000cffff] because it overlaps 0000:05:00.0 BAR 0 [mem 0x00000000-0x007fffff 64bit pref]
>
> ACPI helpfully tells us that the high 6MB below 4GB is reserved, but
> we don't handle that correctly:
>
> > pnp 00:08: [mem 0xffa00000-0xfffffffe]
> > system 00:08: [mem 0xffa00000-0xfffffffe] could not be reserved
>
> And finally, we drop some of those PCI devices, including the vxge
> device on top of that ACPI PNP0C02 device, which of course doesn't
> work:
>
> > pci 0000:00:06.0: BAR 9: assigned [mem 0xff000000-0xffbfffff 64bit pref]
> > pci 0000:05:00.0: BAR 0: assigned [mem 0xff000000-0xff7fffff 64bit pref]
> > vxge: Reading of hardware info failed.Please try upgrading the firmware.
> > vxge: probe of 0000:05:00.0 failed with error -22
>
> So there's probably a BIOS bug (not reporting the windows above 4GB),
> and definitely a Linux bus (allowing PCI to allocate things on top
> of ACPI devices).
>
> This is a known Linux issue, and the top-down allocation scheme made
> it much more likely that we'd run into problems like this.  Reverting
> to bottom-up allocation doesn't fix the problem, but makes it much less
> likely that we'll trip over it.
>
> Thanks a lot for reporting this and collecting the dmesg!

Thanks for taking a look at it.  Let me know if there is anything I
can do to help resolve this issue.

Thanks,
Jon

>
> Bjorn

The information and any attached documents contained in this message
may be confidential and/or legally privileged.  The message is
intended solely for the addressee(s).  If you are not the intended
recipient, you are hereby notified that any use, dissemination, or
reproduction is strictly prohibited and may be unlawful.  If you are
not the intended recipient, please contact the sender immediately by
return e-mail and destroy all copies of the original message.

^ permalink raw reply

* Re: "x86: allocate space within a region top-down" causes bar0 access issue
From: Jon Mason @ 2010-12-18 22:01 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Bjorn Helgaas, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, Ramkrishna Vepa
In-Reply-To: <20101217114721.04eafeca@jbarnes-desktop>

On Fri, Dec 17, 2010 at 11:47:21AM -0800, Jesse Barnes wrote:
> On Fri, 17 Dec 2010 13:44:58 -0600
> Jon Mason <jon.mason@exar.com> wrote:
>
> > The following patch is causing problem with the vxge driver/adapter on
> > HP x86-64 systems. Reads to bar0 to return 0xffffffffffffffff instead
> > of their intended value.  This prevents the vxge module from loading
> > by failing sanity checks in the driver for certain values in bar0.  We
> > are not seeing any issues with this patch on non-HP systems in our
> > lab.
> >
> > Can this patch be removed from 2.6.37 until a better solution can be
> > found?
>
> Can you try my for-linus branch and see if the problem still exists
> there?  It's at
> git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6.git.

The for-linus branch does NOT have the issue
The master branch does have the issue

Thanks,
Jon

>
> Thanks,
> --
> Jesse Barnes, Intel Open Source Technology Center

The information and any attached documents contained in this message
may be confidential and/or legally privileged.  The message is
intended solely for the addressee(s).  If you are not the intended
recipient, you are hereby notified that any use, dissemination, or
reproduction is strictly prohibited and may be unlawful.  If you are
not the intended recipient, please contact the sender immediately by
return e-mail and destroy all copies of the original message.

^ permalink raw reply

* [PATCH] ISDN cmx: Avoid potential NULL deref in dsp_cmx_send_member() and shrink code size.
From: Jesper Juhl @ 2010-12-18 22:33 UTC (permalink / raw)
  To: Karsten Keil
  Cc: David S. Miller, Julia Lawall, Tejun Heo, netdev, linux-kernel,
	Andreas Eversberg

Hi there,

In drivers/isdn/mISDN/dsp_cmx.c::dsp_cmx_send_member() we currently have 
this code:

           if (dsp->conf && dsp->conf->software && dsp->conf->hardware)
                   tx_data_only = 1;
           if (dsp->conf->software && dsp->echo.hardware)
                   tx_data_only = 1;

The first line implies that 'dsp->conf' may be NULL. If it is, then the 
third line will dereference a NULL pointer.

This patch reworks the code so that we avoid the potential NULL deref.
It also has the added benefit that the object file size shrinks a bit.

before:
   text    data     bss     dec     hex filename
  18840     112    5784   24736    60a0 drivers/isdn/mISDN/dsp_cmx.o
after:
   text    data     bss     dec     hex filename
  18816     112    5776   24704    6080 drivers/isdn/mISDN/dsp_cmx.o


Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
 dsp_cmx.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

 compile tested only

diff --git a/drivers/isdn/mISDN/dsp_cmx.c b/drivers/isdn/mISDN/dsp_cmx.c
index 76d9e67..f76f595 100644
--- a/drivers/isdn/mISDN/dsp_cmx.c
+++ b/drivers/isdn/mISDN/dsp_cmx.c
@@ -1326,10 +1326,9 @@ dsp_cmx_send_member(struct dsp *dsp, int len, s32 *c, int members)
 			dsp->last_tx = 0;
 			return;
 		}
-		if (dsp->conf && dsp->conf->software && dsp->conf->hardware)
-			tx_data_only = 1;
-		if (dsp->conf->software && dsp->echo.hardware)
-			tx_data_only = 1;
+		if (dsp->conf && dsp->conf->software)
+			if (dsp->conf->hardware || dsp->echo.hardware)
+				tx_data_only = 1;
 	}
 
 #ifdef CMX_DEBUG



-- 
Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.

^ permalink raw reply related

* Re: [PATCHv4] fragment locally generated tunnel-mode IPSec6 packets as needed
From: David Miller @ 2010-12-18 22:34 UTC (permalink / raw)
  To: herbert; +Cc: dlstevens, netdev
In-Reply-To: <20101218021606.GA26888@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 18 Dec 2010 10:16:06 +0800

> On Fri, Dec 17, 2010 at 01:42:42PM -0800, David L Stevens wrote:
>> This patch modifies IPsec6 to fragment IPv6 packets that are
>> locally generated as needed.
>> 
>> This version of the patch only fragments in tunnel mode, so that fragment
>> headers will not be obscured by ESP in transport mode.
>> 
>> Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

So is the TAHI test regression caused by v3 fixed here in v4?

^ permalink raw reply

* Re: [RFC PATCH 02/12] net: Introduce new feature setting ops
From: Michał Mirosław @ 2010-12-19  0:49 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1292541186.18294.16.camel@bwh-desktop>

On Thu, Dec 16, 2010 at 11:13:06PM +0000, Ben Hutchings wrote:
> On Wed, 2010-12-15 at 23:24 +0100, Michał Mirosław wrote:
> > This introduces a new framework to handle device features setting.
> > It consists of:
> >   - new fields in struct net_device:
> > 	+ hw_features - features that hw/driver supports toggling
> > 	+ wanted_features - features that user wants enabled, when possible
> >   - new netdev_ops:
> > 	+ feat = ndo_fix_features(dev, feat) - API checking constraints for
> > 		enabling features or their combinations
> > 	+ ndo_set_features(dev) - API updating hardware state to match
> > 		changed dev->features
> >   - new ethtool commands:
> > 	+ ETHTOOL_GFEATURES/ETHTOOL_SFEATURES: get/set dev->wanted_features
> > 		and trigger device reconfiguration if resulting dev->features
> > 		changed
> > 	[TODO: this might be extended to support device-specific flags, and
> > 	keep NETIF_F flags from becoming part of ABI by using GET_STRINGS
> > 	for describing the bits]
> We already have ETHTOOL_{G,S}PFLAGS for that, though.

Since we're almost out of bits in features, I thought that I might as well
just take care of that in one go. I'll implement the GET_STRINGS part to
show what I mean about bits not becoming an ABI.

> > 	[Note: ETHTOOL_GFEATURES and ETHTOOL_SFEATURES' data is supposed to
> > 	be 'compatible', so that you can R/M/W without additional copying]
> But if you expect userland to do that, what is the point of the 'valid'
> mask?  Shouldn't userland do something like:
> 
> 	struct ethtool_features feat = { ETHTOOL_SFEATURES };
> 	...
> 	if (off_tso_wanted >= 0)
> 		feat.features[0].valid |= NETIF_F_TSO;
> 	if (off_tso_wanted > 0)
> 		feat.features[0].requested |= NETIF_F_TSO;
> 	...

That looks even better than what I thought originally. So this means I can
just get rid of the combined ethtool cmd struct.

> [...]
> > --- a/include/linux/ethtool.h
> > +++ b/include/linux/ethtool.h
> > @@ -523,6 +523,31 @@ struct ethtool_flash {
> >         char    data[ETHTOOL_FLASH_MAX_FILENAME];
> >  };
> >  
> > +/* for returning feature sets */
> > +#define ETHTOOL_DEV_FEATURE_WORDS      1
> > +
> > +struct ethtool_get_features_block {
> > +       __u32   available;      /* features togglable */
> > +       __u32   requested;      /* features requested to be enabled */
> > +       __u32   active;         /* features currently enabled */
> > +       __u32   __pad[1];
> > +};
> > +
> > +struct ethtool_set_features_block {
> > +       __u32   valid;          /* bits valid in .requested */
> > +       __u32   requested;      /* features requested */
> > +       __u32   __pad[2];
> > +};
> > +
> > +struct ethtool_features {
> > +       __u32   cmd;
> > +       __u32   count;          /* blocks */
> > +       union {
> > +               struct ethtool_get_features_block get;
> > +               struct ethtool_set_features_block set;
> > +       } features[0];
> > +};
> I want kernel-doc comments with a proper description of semantics.

Will do.

> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> [...]
> > @@ -934,6 +949,14 @@ struct net_device {
> >  				 NETIF_F_SG | NETIF_F_HIGHDMA |		\
> >  				 NETIF_F_FRAGLIST)
> >  
> > +	/* toggable features with no driver requirements */
> > +#define NETIF_F_SOFT_FEATURES	(NETIF_F_GSO | NETIF_F_GRO)
> > +
> > +	/* ethtool-toggable features */
> 
> The verb is 'toggle' so this adjective should be either 'togglable' or
> 'toggleable'.  Or you could choose a different adjective.

'changed'. Or fixed. ;-)

> > +	unsigned long		hw_features;
> > +	/* ethtool-requested features */
> > +	unsigned long		wanted_features;
> > +
> While you're at it, you could change all these 'features' fields and
> parameters to u32, since we presumably won't be defining features that
> can only be enabled on 64-bit architectures.

Done. I also went through functions that pass features along and fixed
their types. Patch will be in the next version of this series.

> [...]
> > diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> > index 1774178..f08e7f1 100644
> > --- a/net/core/ethtool.c
> > +++ b/net/core/ethtool.c
> > @@ -171,6 +171,55 @@ EXPORT_SYMBOL(ethtool_ntuple_flush);
> >  
> >  /* Handlers for each ethtool command */
> >  
> > +static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
> > +{
> > +	struct ethtool_features cmd = {
> > +		.cmd = ETHTOOL_GFEATURES,
> > +		.count = ETHTOOL_DEV_FEATURE_WORDS,
> > +	};
> > +	struct ethtool_get_features_block features[ETHTOOL_DEV_FEATURE_WORDS] = {
> > +		{
> > +			.available = dev->hw_features,
> > +			.requested = dev->wanted_features,
> > +			.active = dev->features,
> > +		},
> > +	};
> > +
> > +	if (copy_to_user(useraddr, &cmd, sizeof(cmd)))
> > +		return -EFAULT;
> > +	useraddr += sizeof(cmd);
> > +	if (copy_to_user(useraddr, features, sizeof(features)))
> > +		return -EFAULT;
> If ETHTOOL_DEV_FEATURE_WORDS increases, how do you know the user buffer
> will be big enough?

I'll change that to use count from GET_STRINGS.

> > +	return 0;
> > +}
> > +
> > +static int ethtool_set_features(struct net_device *dev, void __user *useraddr)
> > +{
> > +	struct ethtool_features cmd;
> > +	struct ethtool_set_features_block features[ETHTOOL_DEV_FEATURE_WORDS];
> > +
> > +	if (copy_from_user(&cmd, useraddr, sizeof(cmd)))
> > +		return -EFAULT;
> > +	useraddr += sizeof(cmd);
> > +
> > +	if (cmd.count > ETHTOOL_DEV_FEATURE_WORDS)
> > +		cmd.count = ETHTOOL_DEV_FEATURE_WORDS;
> So additional feature words will be silently ignored...
> > +	if (copy_from_user(features, useraddr, sizeof(*features) * cmd.count))
> > +		return -EFAULT;
> > +	memset(features + cmd.count, 0,
> > +		sizeof(features) - sizeof(*features) * cmd.count);
> > +
> > +	features[0].valid &= dev->hw_features | NETIF_F_SOFT_FEATURES;
> [...]
> 
> ...as will any other unsupported features.  This is not a good idea.
> (However, remembering which features are wanted does seem like a good
> idea.)

That's intentional. Unsupported features can't be enabled anyway.
hw_features is supposed to contain all features that the device can support
and is able to enable/disable. This set should be constant and anything that
is in the wanted_features set but is not supported because of other conditions
will be stripped by ndo_fix_features() call.

Other way would be to return EINVAL when bits not changeable are present in
the valid mask. I don't want to do that, since then your example of changing
a feature without GFEATURES first will not work.

Best Regards,
Michał Mirosław
 

^ permalink raw reply

* Re: [RFC PATCH 03/12] net: ethtool: use ndo_fix_features for offload setting
From: Michał Mirosław @ 2010-12-19  0:54 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1292541829.18294.23.camel@bwh-desktop>

On Thu, Dec 16, 2010 at 11:23:49PM +0000, Ben Hutchings wrote:
> On Wed, 2010-12-15 at 23:24 +0100, Michał Mirosław wrote:
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > ---
> >  include/linux/netdevice.h |    2 +
> >  net/core/dev.c            |    8 ++
> >  net/core/ethtool.c        |  252 +++++++++++++++++++-------------------------
> >  3 files changed, 119 insertions(+), 143 deletions(-)
> > 
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 4b20944..7634cab 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -941,6 +941,8 @@ struct net_device {
> >  #define NETIF_F_V6_CSUM		(NETIF_F_GEN_CSUM | NETIF_F_IPV6_CSUM)
> >  #define NETIF_F_ALL_CSUM	(NETIF_F_V4_CSUM | NETIF_F_V6_CSUM)
> >  
> > +#define NETIF_F_ALL_TSO 	(NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
> > +
> >  	/*
> >  	 * If one device supports one of these features, then enable them
> >  	 * for all in netdev_increment_features.
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 1e616bb..95d0a49 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -5054,6 +5054,14 @@ unsigned long netdev_fix_features(unsigned long features, const char *name)
> >  		features &= ~NETIF_F_TSO;
> >  	}
> >  
> > +	/* Software GSO depends on SG. */
> > +	if ((features & NETIF_F_GSO) && !(features & NETIF_F_SG)) {
> > +		if (name)
> > +			printk(KERN_NOTICE "%s: Dropping NETIF_F_GSO since no "
> > +			       "SG feature.\n", name);
> > +		features &= ~NETIF_F_GSO;
> > +	}
> > +
> >  	if (features & NETIF_F_UFO) {
> >  		/* maybe split UFO into V4 and V6? */
> >  		if (!((features & NETIF_F_GEN_CSUM) ||
> The severity of these messages will need to be reduced if ethtool relies
> on this function to propagate feature changes.  (And I wonder why some
> of them are ERR and some NOTICE.)

Patch is ready (all converted to netdev_info).

> > diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> > index f08e7f1..b068738 100644
> > --- a/net/core/ethtool.c
> > +++ b/net/core/ethtool.c
> > @@ -55,6 +55,7 @@ int ethtool_op_set_tx_csum(struct net_device *dev, u32 data)
> >  
> >  	return 0;
> >  }
> > +EXPORT_SYMBOL(ethtool_op_set_tx_csum);
> >  
> >  int ethtool_op_set_tx_hw_csum(struct net_device *dev, u32 data)
> >  {
> > @@ -220,6 +221,85 @@ static int ethtool_set_features(struct net_device *dev, void __user *useraddr)
> >  	return 0;
> >  }
> >  
> > +static unsigned long ethtool_get_feature_mask(u32 eth_cmd)
> > +{
> > +	switch (eth_cmd) {
> > +	case ETHTOOL_GTXCSUM:
> > +	case ETHTOOL_STXCSUM:
> > +		return NETIF_F_ALL_CSUM|NETIF_F_SCTP_CSUM;
> I wonder whether this should cover NETIF_F_FCOE_CRC as well (ixgbe
> currently doesn't seem to allow toggling it).

This mask is only for 'legacy' set_tx_csum ethtool commands. Other
checksumming options - or whatever bits are included in hw_features - can be
toggled by ETHTOOL_SFEATURES.

> There should be spaces around the '|' operator.

Fixed.

> > +static int __ethtool_set_tx_csum(struct net_device *dev, u32 data);
> > +static int __ethtool_set_sg(struct net_device *dev, u32 data);
> > +static int __ethtool_set_tso(struct net_device *dev, u32 data);
> > +static int __ethtool_set_ufo(struct net_device *dev, u32 data);
> > +
> > +static int ethtool_set_one_feature(struct net_device *dev,
> > +	void __user *useraddr, u32 ethcmd)
> > +{
> > +	struct ethtool_value edata;
> > +	unsigned long mask;
> > +
> > +	if (copy_from_user(&edata, useraddr, sizeof(edata)))
> > +		return -EFAULT;
> > +
> > +	mask = ethtool_get_feature_mask(ethcmd);
> > +	mask &= dev->hw_features | NETIF_F_SOFT_FEATURES;
> > +	if (mask) {
> > +		if (edata.data)
> > +			dev->wanted_features |= mask;
> > +		else
> > +			dev->wanted_features &= ~mask;
> > +
> > +		netdev_update_features(dev);
> > +		return 0;
> > +	}
> > +
> > +	switch (ethcmd) {
> > +	case ETHTOOL_STXCSUM:
> > +		return __ethtool_set_tx_csum(dev, edata.data);
> > +	case ETHTOOL_SSG:
> > +		return __ethtool_set_sg(dev, edata.data);
> > +	case ETHTOOL_STSO:
> > +		return __ethtool_set_tso(dev, edata.data);
> > +	case ETHTOOL_SUFO:
> > +		return __ethtool_set_ufo(dev, edata.data);
> > +	default:
> > +		return -EOPNOTSUPP;
> > +	}
> > +}
> [...]
> 
> This deserves some comments.

Sure.

This is compatibility layer for current ethtool tools, and drivers
not converted to ndo_fix_features() and friends.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [RFC PATCH 04/12] net: introduce NETIF_F_RXCSUM
From: Michał Mirosław @ 2010-12-19  0:57 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1292542049.18294.25.camel@bwh-desktop>

On Thu, Dec 16, 2010 at 11:27:29PM +0000, Ben Hutchings wrote:
> On Wed, 2010-12-15 at 23:24 +0100, Michał Mirosław wrote:
> > Introduce NETIF_F_RXCSUM to replace device-private flags for RX checksum
> > offload. Integrate it with ndo_fix_features.
> > 
> > ethtool_op_get_rx_csum() is removed altogether as nothing in-tree uses it.
> Not explicitly, but what about drivers that turn TX and RX checksumming
> on and off together?  They are implicitly covered by the case which
> you're removing:
> [...]
> > -	case ETHTOOL_GRXCSUM:
> > -		rc = ethtool_get_value(dev, useraddr, ethcmd,
> > -				       (dev->ethtool_ops->get_rx_csum ?
> > -					dev->ethtool_ops->get_rx_csum :
> > -					ethtool_op_get_rx_csum));
> [...]

I had an impression that drivers not defining get_rx_csum() had no
RX checksumming or didn't allow it to be toggled. I'll have a closer look
on this.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [RFC PATCH 07/12] vlan: convert VLAN devices to use ndo_fix_features()
From: Michał Mirosław @ 2010-12-19  1:01 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1292542618.18294.29.camel@bwh-desktop>

On Thu, Dec 16, 2010 at 11:36:58PM +0000, Ben Hutchings wrote:
> On Wed, 2010-12-15 at 23:24 +0100, Michał Mirosław wrote:
> > Note: get_flags was actually broken, because it should return the
> > flags capped with vlan_features. This is now done implicitly by
> > limiting netdev->hw_features.
> > 
> > RX checksumming offload control is (and was) broken, as there was no way
> > before to say whether it's done for tagged packets.
> > 
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > ---
> >  net/8021q/vlan.c     |    3 +-
> >  net/8021q/vlan_dev.c |   51 ++++++++++++++-----------------------------------
> >  2 files changed, 16 insertions(+), 38 deletions(-)
> > 
> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> > index 6e64f7c..583d47b 100644
> > --- a/net/8021q/vlan.c
> > +++ b/net/8021q/vlan.c
> > @@ -329,8 +329,7 @@ static void vlan_transfer_features(struct net_device *dev,
> >  {
> >  	unsigned long old_features = vlandev->features;
> >  
> > -	vlandev->features &= ~dev->vlan_features;
> > -	vlandev->features |= dev->features & dev->vlan_features;
> > +	netdev_update_features(vlandev);
> >  	vlandev->gso_max_size = dev->gso_max_size;
> >  
> >  	if (dev->features & NETIF_F_HW_VLAN_TX)
> > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> > index be73753..468c899 100644
> > --- a/net/8021q/vlan_dev.c
> > +++ b/net/8021q/vlan_dev.c
> > @@ -691,8 +691,8 @@ static int vlan_dev_init(struct net_device *dev)
> >  					  (1<<__LINK_STATE_DORMANT))) |
> >  		      (1<<__LINK_STATE_PRESENT);
> >  
> > -	dev->features |= real_dev->features & real_dev->vlan_features;
> > -	dev->features |= NETIF_F_LLTX;
> > +	dev->hw_features = real_dev->vlan_features;
> [...]
> net_device::hw_features is supposed to represent features that can be
> toggled, but the inclusion of a flag in net_device::vlan_features does
> not mean the feature can be toggled.

All TX offloads are by definition togglable since it just a matter of not
issuing skb's that need them. RX offloads on the other hand are controlled
by underlying device, so probably should be removed from this set and
forced on or left alone in ndo_fix_features().

> If this is to be a straight conversion, that line should be:
> 	dev->hw_features = real_dev->vlan_features & NETIF_F_TSO;

I'll mask this with ALL_CSUM+SG+ALL_TSO if you don't mind extending the
possibilities anyway.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCHv4] fragment locally generated tunnel-mode IPSec6 packets as needed
From: Herbert Xu @ 2010-12-19  1:03 UTC (permalink / raw)
  To: David Miller; +Cc: dlstevens, netdev
In-Reply-To: <20101218.143423.189688024.davem@davemloft.net>

On Sat, Dec 18, 2010 at 02:34:23PM -0800, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Sat, 18 Dec 2010 10:16:06 +0800
> 
> > On Fri, Dec 17, 2010 at 01:42:42PM -0800, David L Stevens wrote:
> >> This patch modifies IPsec6 to fragment IPv6 packets that are
> >> locally generated as needed.
> >> 
> >> This version of the patch only fragments in tunnel mode, so that fragment
> >> headers will not be obscured by ESP in transport mode.
> >> 
> >> Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
> > 
> > Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> So is the TAHI test regression caused by v3 fixed here in v4?

I think so since that was probably caused by the incorrect
fragmentation of transport-mode packets before encapsulation.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 2.6.36] vlan: Avoid hwaccel vlan packets when vid not used
From: Jesse Gross @ 2010-12-19  3:38 UTC (permalink / raw)
  To: Michael Leun
  Cc: Matt Carlson, Michael Chan, Eric Dumazet, David Miller,
	Ben Greear, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20101215081606.18e12fc9@xenia.leun.net>

On Tue, Dec 14, 2010 at 11:16 PM, Michael Leun
<lkml20101129@newton.leun.net> wrote:
> OK - all tests done on that DL320G5:
>
> For completeness, 2.6.37-rc5 unpatched:
>
> eth0, no vlan configured: totally broken - see double tagged vlans
> without tag, single or untagged packets missing at all

Random behavior?  This one is somewhat hard to explain - maybe there
are some other factors.  eth0 has ASF on, so it always strips tags.  I
would expect it to behave like the vlan configured case.

>
> eth0, vlan configured: see packets without vlan tag (see double tagged
> packets with one vlan tag)

Both ASF and vlan group configured cause tag stripping to be enabled.
Missing tag.

>
> eth1 same as originally reported:
> without vlan configured see vlan tags (single and double tagged as
> expected)

No ASF and no vlan group means tag stripping is disabled.  Have tag.

> with vlan configured: see packets without vlan tag (see double tagged
> packets with one vlan tag)

Configuring vlan group causes stripping to be enabled.  Missing tag.

>
>
> 2.6.37-rc5, your tg3 use new vlan-code patch:
>
> eth0, no vlan configured:  see packets without vlan tag (see double
> tagged packets with one vlan tag)

ASF enables tag stripping.  Missing tag.

> eth1, no vlan configured: see vlan tags (single and double tagged as
> expected)

No ASF, no vlan group means no stripping.  Have tag.

>
>
> eth0, vlan configured: as without vlan

ASF enables stripping.  Missing tag.

> eth1, vlan configured: as without vlan

With this patch vlan stripping is only enabled when ASF is on, so no
stripping.  Have tag.

>
> 2.6.37-rc5, your tg3 use new vlan-code patch with test patch ontop
>
> eth1 no vlan configured: see packets without vlan tag (see double tagged
> packets with one vlan tag)

With the second patch, vlan stripping is always enabled.  Missing tag.

> eth1 with vlan: the same

Stripping still always enabled.  Missing tag.

The bottom line is whenever vlan stripping is enabled we're missing
the outer tag.  It might be worth adding some debugging in the area
before napi_gro_receive/vlan_gro_receive (depending on version).  My
guess is that (desc->type_flags & RXD_FLAG_VLAN) is false even for
vlan packets on this NIC.

You said that everything works on the 5752?  Matt, is it possible that
the 5714 either has a problem with vlan stripping or a different way
of reporting it?

Also, why does ASF require vlan stripping?

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: Add Nic partitioning mode (57712 devices)
From: Matt Domsch @ 2010-12-19  5:49 UTC (permalink / raw)
  To: Dimitris Michailidis
  Cc: Eilon Greenstein, Dmitry Kravkov, davem@davemloft.net,
	netdev@vger.kernel.org, narendra_k@dell.com,
	jordan_hargrave@dell.com
In-Reply-To: <4D0BEE9A.5070505@chelsio.com>

On Fri, Dec 17, 2010 at 03:13:30PM -0800, Dimitris Michailidis wrote:
> Matt Domsch wrote:
> >On Thu, Dec 09, 2010 at 04:49:25PM +0200, Eilon Greenstein wrote:
> >>On Mon, 2010-12-06 at 10:21 -0800, Dimitris Michailidis wrote:
> >>>Matt Domsch wrote:
> >>...
> >>>/sys/class/net/<ifname>/dev_id indicates the physical port <ifname> is 
> >>>associated with.  At least a few drivers set up dev_id this way.
> >>>
> >>>
> >>So we are on agreement? This can satisf all needs? If so, we will add
> >>this scheme to the bnx2x as well.
> >
> >I don't think that's enough.  Necessary, but not sufficient.
> >
> >If dev_id is a field that starts over with each PCI device (e.g. is
> >used to distinguish multiple ports that share the same PCI
> >device), that's enough to handle the Chelsio case, but not the NPAR &
> >SR-IOV case.
> 
> My understanding is that dev_id indicates the physical port of the card 
> associated with an interface.  It does not reset when you move to a new 
> function of the device.
> 
> >
> >If the above is true, then a value of dev_id=0 for all 1:1 PCI Device
> >: Port relations is fine, leaving the three drivers that set dev_id
> >non-zero are all multi-port, single PCI device controllers.
> >
> >cxgb4/t4_hw.c:          adap->port[i]->dev_id = j;
> 
> The HW cxgb4 deals with is multi-function (actually the driver uses 
> primarily function 4 nowadays) but it's virtualizable and the association 
> between functions and ports very flexible.  For example, you may have a 
> 2-port card but maybe the driver will be given just (a slice of) port 1.  
> So the driver will create one netdev with dev_id==1 and there won't be 
> anything with dev_id 0.  You cannot determine this by looking at anything 
> PCI-related or any static table.
>
> For this driver you can get two pieces of information for an interface:
> - /sys/class/net/<interface>/device points to the PCI function handling the 
> interface
> - /sys/class/net/<interface>/dev_id indicates the physical port of the 
> interface
>
> You can have several interfaces with same device link and different dev_id. 
>  While the current driver doesn't do it you could also have several 
> interfaces with different device links but same dev_id (NPAR situation, 
> notice again that dev_ids are not per PCI function), or interfaces with 
> different device and dev_id, or even interfaces with same device and dev_id.

What is the scope of dev_id then?  It's not per PCI device like I
thought.  It sounds like it's per card, but how can I know the card
boundary?

If I have 2 cards driven by cxgb4 in the system, each with say 4
ports.  I could see a minimum of 8 PCI devices (fine), but the dev_id
values would be?  0,1,2,3; 0,1,2,3 ?  How can I tell that these are
two different cards, with two different sets of dev_id values, rather
than one card with 4 ports, 8 (NPAR or SR-IOV) interfaces, with each 2
interfaces mapping to the same port?

dev_id is not system-wide unique.  It's not even slot unique best as I
can tell.  If I had a PCI slot extender, with 2 PCI slots, and I put
two of the above cards in, I would see 0,1,2,3; 0,1,2,3.  To be fair,
my naming scheme doesn't really account for such an extender, though
currently it would go pci<slot>#<12345678>.

 
> dev_ids can handle NPAR but I do understand that dev_id 0 is ambiguous.  
> Two functions with dev_id 0 mean one thing for a driver that sets dev_id 
> and a very different thing for one that doesn't.

yeah, that sucks.

> >What if we did something like this?
> >
> >/sys/devices/net_ports/port0/
> >/sys/devices/pci0000:00/0000:00:1c.0/0000:0b:00.0/net/eth0/port -> 
> >    /../../../../../net_ports/port0
> >/sys/devices/pci0000:00/0000:00:1c.0/0000:0b:00.1/net/eth1/port -> 
> >    /../../../../../net_ports/port0
> >
> >
> >In this case, the port0 "name" is simply a way to group interfaces
> >into ports, it's not how ports are labeled on the chassis.
> 
> If I understand you right a "port" is a group of interfaces sharing one 
> physical port without saying which one.  I think dev_id does the same and 
> specifies which physical port.

And I don't think it does, or at least, not in an unambiguous way, the
dev_id=0 case and even != 0. Understanding the boundary of dev_id
domains is the key, and I clearly don't.

Please advise.

Thanks,
Matt

-- 
Matt Domsch
Technology Strategist
Dell | Office of the CTO

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: Add Nic partitioning mode (57712 devices)
From: Matt Domsch @ 2010-12-19  5:57 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Eilon Greenstein, Dimitris Michailidis, Dmitry Kravkov,
	davem@davemloft.net, netdev@vger.kernel.org, narendra_k@dell.com,
	jordan_hargrave@dell.com
In-Reply-To: <1292592157.3136.819.camel@localhost>

On Fri, Dec 17, 2010 at 01:22:37PM +0000, Ben Hutchings wrote:
> On Thu, 2010-12-16 at 20:45 -0600, Matt Domsch wrote:
> > On Thu, Dec 09, 2010 at 04:49:25PM +0200, Eilon Greenstein wrote:
> In the case of sfc, each port has a separate PCI function.  We read this
> register field to find out which port we're talking to, as
> virtualisation can alter the function number.  I don't know about the
> others.

For a single card then, this makes sense.

pci<slot>#<port>  where port = dev_id

If I have 2 such cards on a PCI extender though, I think this breaks.
Here, I'd see duplicate dev_id values, yes?

Do you label the ports on your cards in any fashion?  Do they have
labels like port 0, port 1, port 2, ... ?  Does it matter if we give
names starting at 0, or starting at 1?  latest biosdevname starts them
at 1, or uses whatever value BIOS actually provides, which on systems
I've tried, all start at 1.

Maybe the PCI extender case I should just ignore as being unsolvable
right now...  If I think N PCI devices are in the same slot, then
using the dev_id value as a per-slot value is fine.  If there's an
extender that would break this scheme, biosdevname now returns nothing
for any names it would otherwise suggest duplicate names for,
essentially throwing its hands up "I don't know".

-- 
Matt Domsch
Technology Strategist
Dell | Office of the CTO

^ permalink raw reply

* Re: [PATCH net-next] ipv6: fib6_ifdown cleanup
From: David Miller @ 2010-12-19  6:02 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20101216194240.2b231449@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 16 Dec 2010 19:42:40 -0800

> Remove (unnecessary) casts to make code cleaner.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] ipv6: remove duplicate neigh_ifdown
From: David Miller @ 2010-12-19  6:02 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20101216194254.0f7c7e7b@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 16 Dec 2010 19:42:54 -0800

> When device is being set to down, neigh_ifdown was being called
> twice. Once from addrconf notifier and once from ndisc notifier.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply

* Re: BUG: while bridging Ethernet and wireless device:
From: Tomas Winkler @ 2010-12-19  9:07 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-netdev, linux-wireless
In-Reply-To: <1292501797.3612.12.camel-8upI4CBIZJIJvtFkdXX2HixXY32XiHfO@public.gmane.org>

On Thu, Dec 16, 2010 at 2:16 PM, Johannes Berg
<johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> wrote:
> On Thu, 2010-12-16 at 14:11 +0200, Tomas Winkler wrote:
>
>> Dec 15 14:36:41 User-PC kernel: [175576.120452] kernel BUG at include/linux/skbuff.h:1178!
>
>> Dec 15 14:36:41 User-PC kernel: [175576.123193] EIP is at br_multicast_rcv+0xc95/0xe1c [bridge]
>
> So as I said to Tomas in private before -- it kinda looks like something
> here is not handling paged SKBs correctly? But I would imaging that
> causing more issues, unless there was a bug here that made bridging
> require more data in the skb header than we put in there right now -- it
> can end up being empty I believe.
>
> Thing is, I looked at the code and it seemed fine.
>
> johannes
>
opened bug https://bugzilla.kernel.org/show_bug.cgi?id=25202

>
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/2] net: phy: Fixed some checkpatch errors
From: Joe Perches @ 2010-12-19  9:19 UTC (permalink / raw)
  To: Jean-Michel Hautbois
  Cc: davem, richard.cochran, shemminger, tj, randy.dunlap, netdev,
	linux-kernel
In-Reply-To: <1292697704-2886-2-git-send-email-jhautbois@gmail.com>

On Sat, 2010-12-18 at 19:41 +0100, Jean-Michel Hautbois wrote:
> Fixes some coding style issues (errors and warnings).

Hi Jean-Michel.

> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
[]
> @@ -47,11 +47,11 @@ void phy_print_status(struct phy_device *phydev)
>  	pr_info("PHY: %s - Link is %s", dev_name(&phydev->dev),
>  			phydev->link ? "Up" : "Down");
>  	if (phydev->link)
> -		printk(" - %d/%s", phydev->speed,
> +		pr_info(" - %d/%s", phydev->speed,
>  				DUPLEX_FULL == phydev->duplex ?
>  				"Full" : "Half");
>  
> -	printk("\n");
> +	pr_info("\n");

This isn't the right way to fix this actually.

The first pr_info is not terminated with a newline,
so the second printk should be printk(KERN_CONT or
pr_cont instead.

Ideally, you would make a different change.
There should be just one printk/pr_info.
Fewer messages, couldn't be interleaved with other
simultaneous output, easier to grep, etc...

I think something like this would be better:

 	if (phydev->link)
		pr_info("PHY: %s - Link is down\n", dev_name(&phydev->dev));
	else
		pr_info("PHY: %s - Link is up - %d/%s\n",
			dev_name(&phydev->dev),
			phydev->speed,
			phydev->duplex == DUPLEX_FULL ? "Full" : "Half");

or maybe use dev_info like this:

 	if (phydev->link)
		dev_info(&phydev->dev, "PHY: Link is down\n");
	else
		dev_info(&phydev->dev, "PHY: Link is up - %d/%s\n",
			 phydev->speed,
			 phydev->duplex == DUPLEX_FULL ? "Full" : "Half");

cheers, Joe

^ permalink raw reply

* Re: [PATCH] iproute2: ip: add wilcard support for device matching
From: Vlad Dogaru @ 2010-12-19  9:32 UTC (permalink / raw)
  To: hadi
  Cc: Octavian Purdila, Eric Dumazet, Stephen Hemminger, netdev,
	Lucian Adrian Grijincu
In-Reply-To: <1292593088.2668.43.camel@mojatatu>

On 12/17/2010 5:38 AM, jamal wrote:
> On Fri, 2010-12-10 at 20:06 +0200, Octavian Purdila wrote:
>
>>> $ ip link add link bond0 "vlan\*199" type vlan id 199
>>> $ ifconfig "vlan\*199"
>>
>> :) Then use a special dev keyword like dev* ?
>>
>> $ ip link set dev* dummy set
>>
>> Or use a new flag to allow expansion?
>>
>> $ ip -e link set dev dummy* set
>
> There was something ive always wanted to do but
> havent had time. It will cut time in a big way the
> user-kernel interaction in precisely your situation.
>
> Add a new general purpose netdev 32 bit tag. You can use this
> feature to "group" netdevs. The group "all netdevs" is 0 - which
> is the default.
> I can group individual dummy interfaces into group 1.
> ip link dev dummy0 set group 1
> ..
> ..
> ip link dev dummy99 set group 1
>
> Then i can send a query to only ifup and ignore
> the 1000 vlans that exist.
>
> ip link dev ls group 1
>
> or if i didnt list the group, then group 0 is assumed.

I'll try to implement this approach in the next few days.

Vlad

^ permalink raw reply

* [PATCH 1/4] net: phy: balance disable/enable irq on change
From: Jean-Michel Hautbois @ 2010-12-19 10:58 UTC (permalink / raw)
  To: davem
  Cc: richard.cochran, shemminger, tj, randy.dunlap, netdev,
	linux-kernel, Jean-Michel Hautbois

When phy interface changes its status, it calls phy_change() function.
This function calls the interrupt disabling functions for the driver
registered, but if this driver doesn't implement it, there is no IRQ
disabling. After doing the work, we call enable_irq and not the
respective driver function. This fixes it, as it could lead to an
unbalanced IRQ. Error code changed to EOPNOTSUPP.

Signed-off-by: Jean-Michel Hautbois <jhautbois@gmail.com>
---
 drivers/net/phy/phy.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 7670aac..5f23e8e 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -89,7 +89,8 @@ static int phy_config_interrupt(struct phy_device *phydev, u32 interrupts)
 	phydev->interrupts = interrupts;
 	if (phydev->drv->config_intr)
 		err = phydev->drv->config_intr(phydev);
-
+	else
+		err = -EOPNOTSUPP;
 	return err;
 }
 
@@ -541,6 +542,10 @@ static int phy_enable_interrupts(struct phy_device *phydev)
 		return err;
 
 	err = phy_config_interrupt(phydev, PHY_INTERRUPT_ENABLED);
+	if (err == -EOPNOTSUPP) {
+		err = 0;
+		enable_irq(phydev->irq);
+	}
 
 	return err;
 }
@@ -556,7 +561,10 @@ static int phy_disable_interrupts(struct phy_device *phydev)
 	/* Disable PHY interrupts */
 	err = phy_config_interrupt(phydev, PHY_INTERRUPT_DISABLED);
 
-	if (err)
+	if (err == -EOPNOTSUPP) {
+		err = 0;
+		disable_irq(phydev->irq);
+	} else if (err != 0)
 		goto phy_err;
 
 	/* Clear the interrupt */
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH 3/4] net: phy: White space correction and switch/case braces
From: Jean-Michel Hautbois @ 2010-12-19 10:58 UTC (permalink / raw)
  To: davem
  Cc: richard.cochran, shemminger, tj, randy.dunlap, netdev,
	linux-kernel, Jean-Michel Hautbois
In-Reply-To: <1292756331-3735-1-git-send-email-jhautbois@gmail.com>

Little corrections in the file :
- use linux and not asm for some include files
- Trailing whitespaces removed
- no lines over 80 characters
- switch and case on the same column

Signed-off-by: Jean-Michel Hautbois <jhautbois@gmail.com>
---
 drivers/net/phy/phy.c |  304 ++++++++++++++++++++++++------------------------
 1 files changed, 152 insertions(+), 152 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 4ab48d7..87a64c0 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -33,10 +33,10 @@
 #include <linux/timer.h>
 #include <linux/workqueue.h>
 
-#include <asm/atomic.h>
-#include <asm/io.h>
+#include <linux/atomic.h>
+#include <linux/io.h>
 #include <asm/irq.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
 
 /**
  * phy_print_status - Convenience function to print out the current phy status
@@ -48,7 +48,8 @@ void phy_print_status(struct phy_device *phydev)
 		pr_info("PHY: %s - Link is up - %d/%s\n",
 				dev_name(&phydev->dev),
 				phydev->speed,
-				phydev->duplex == DUPLEX_FULL ? "Full" : "Half");
+				phydev->duplex == DUPLEX_FULL ?
+				"Full" : "Half");
 	else
 		pr_info("PHY: %s - Link is down\n",
 				dev_name(&phydev->dev));
@@ -325,7 +326,7 @@ int phy_mii_ioctl(struct phy_device *phydev,
 
 	case SIOCSMIIREG:
 		if (mii_data->phy_id == phydev->addr) {
-			switch(mii_data->reg_num) {
+			switch (mii_data->reg_num) {
 			case MII_BMCR:
 				if ((val & (BMCR_RESET|BMCR_ANENABLE)) == 0)
 					phydev->autoneg = AUTONEG_DISABLE;
@@ -352,7 +353,6 @@ int phy_mii_ioctl(struct phy_device *phydev,
 		}
 
 		phy_write(phydev, mii_data->reg_num, val);
-		
 		if (mii_data->reg_num == MII_BMCR &&
 		    val & BMCR_RESET &&
 		    phydev->drv->config_init) {
@@ -471,7 +471,7 @@ static void phy_force_reduction(struct phy_device *phydev)
 	int idx;
 
 	idx = phy_find_setting(phydev->speed, phydev->duplex);
-	
+
 	idx++;
 
 	idx = phy_find_valid(idx, phydev->supported);
@@ -749,16 +749,16 @@ void phy_start(struct phy_device *phydev)
 	mutex_lock(&phydev->lock);
 
 	switch (phydev->state) {
-		case PHY_STARTING:
-			phydev->state = PHY_PENDING;
-			break;
-		case PHY_READY:
-			phydev->state = PHY_UP;
-			break;
-		case PHY_HALTED:
-			phydev->state = PHY_RESUMING;
-		default:
-			break;
+	case PHY_STARTING:
+		phydev->state = PHY_PENDING;
+		break;
+	case PHY_READY:
+		phydev->state = PHY_UP;
+		break;
+	case PHY_HALTED:
+		phydev->state = PHY_RESUMING;
+	default:
+		break;
 	}
 	mutex_unlock(&phydev->lock);
 }
@@ -782,172 +782,157 @@ void phy_state_machine(struct work_struct *work)
 	if (phydev->adjust_state)
 		phydev->adjust_state(phydev->attached_dev);
 
-	switch(phydev->state) {
-		case PHY_DOWN:
-		case PHY_STARTING:
-		case PHY_READY:
-		case PHY_PENDING:
-			break;
-		case PHY_UP:
-			needs_aneg = 1;
+	switch (phydev->state) {
+	case PHY_DOWN:
+	case PHY_STARTING:
+	case PHY_READY:
+	case PHY_PENDING:
+		break;
+	case PHY_UP:
+		needs_aneg = 1;
 
-			phydev->link_timeout = PHY_AN_TIMEOUT;
+		phydev->link_timeout = PHY_AN_TIMEOUT;
+
+		break;
+	case PHY_AN:
+		err = phy_read_status(phydev);
 
+		if (err < 0)
 			break;
-		case PHY_AN:
-			err = phy_read_status(phydev);
 
-			if (err < 0)
-				break;
+		/* If the link is down, give up on
+		 * negotiation for now */
+		if (!phydev->link) {
+			phydev->state = PHY_NOLINK;
+			netif_carrier_off(phydev->attached_dev);
+			phydev->adjust_link(phydev->attached_dev);
+			break;
+		}
 
-			/* If the link is down, give up on
-			 * negotiation for now */
-			if (!phydev->link) {
-				phydev->state = PHY_NOLINK;
-				netif_carrier_off(phydev->attached_dev);
-				phydev->adjust_link(phydev->attached_dev);
-				break;
-			}
+		/* Check if negotiation is done.  Break
+		 * if there's an error */
+		err = phy_aneg_done(phydev);
+		if (err < 0)
+			break;
 
-			/* Check if negotiation is done.  Break
-			 * if there's an error */
-			err = phy_aneg_done(phydev);
-			if (err < 0)
-				break;
+		/* If AN is done, we're running */
+		if (err > 0) {
+			phydev->state = PHY_RUNNING;
+			netif_carrier_on(phydev->attached_dev);
+			phydev->adjust_link(phydev->attached_dev);
 
-			/* If AN is done, we're running */
-			if (err > 0) {
-				phydev->state = PHY_RUNNING;
-				netif_carrier_on(phydev->attached_dev);
-				phydev->adjust_link(phydev->attached_dev);
+		} else if (0 == phydev->link_timeout--) {
+			int idx;
 
-			} else if (0 == phydev->link_timeout--) {
-				int idx;
+			needs_aneg = 1;
+			/* If we have the magic_aneg bit,
+			 * we try again */
+			if (phydev->drv->flags & PHY_HAS_MAGICANEG)
+				break;
 
-				needs_aneg = 1;
-				/* If we have the magic_aneg bit,
-				 * we try again */
-				if (phydev->drv->flags & PHY_HAS_MAGICANEG)
-					break;
+			/* The timer expired, and we still
+			 * don't have a setting, so we try
+			 * forcing it until we find one that
+			 * works, starting from the fastest speed,
+			 * and working our way down */
+			idx = phy_find_valid(0, phydev->supported);
 
-				/* The timer expired, and we still
-				 * don't have a setting, so we try
-				 * forcing it until we find one that
-				 * works, starting from the fastest speed,
-				 * and working our way down */
-				idx = phy_find_valid(0, phydev->supported);
+			phydev->speed = settings[idx].speed;
+			phydev->duplex = settings[idx].duplex;
 
-				phydev->speed = settings[idx].speed;
-				phydev->duplex = settings[idx].duplex;
+			phydev->autoneg = AUTONEG_DISABLE;
 
-				phydev->autoneg = AUTONEG_DISABLE;
+			pr_info("Trying %d/%s\n", phydev->speed,
+					DUPLEX_FULL ==
+					phydev->duplex ?
+					"FULL" : "HALF");
+		}
+		break;
+	case PHY_NOLINK:
+		err = phy_read_status(phydev);
 
-				pr_info("Trying %d/%s\n", phydev->speed,
-						DUPLEX_FULL ==
-						phydev->duplex ?
-						"FULL" : "HALF");
-			}
+		if (err)
 			break;
-		case PHY_NOLINK:
-			err = phy_read_status(phydev);
 
-			if (err)
-				break;
+		if (phydev->link) {
+			phydev->state = PHY_RUNNING;
+			netif_carrier_on(phydev->attached_dev);
+			phydev->adjust_link(phydev->attached_dev);
+		}
+		break;
+	case PHY_FORCING:
+		err = genphy_update_link(phydev);
 
-			if (phydev->link) {
-				phydev->state = PHY_RUNNING;
-				netif_carrier_on(phydev->attached_dev);
-				phydev->adjust_link(phydev->attached_dev);
-			}
+		if (err)
 			break;
-		case PHY_FORCING:
-			err = genphy_update_link(phydev);
-
-			if (err)
-				break;
 
-			if (phydev->link) {
-				phydev->state = PHY_RUNNING;
-				netif_carrier_on(phydev->attached_dev);
-			} else {
-				if (0 == phydev->link_timeout--) {
-					phy_force_reduction(phydev);
-					needs_aneg = 1;
-				}
+		if (phydev->link) {
+			phydev->state = PHY_RUNNING;
+			netif_carrier_on(phydev->attached_dev);
+		} else {
+			if (0 == phydev->link_timeout--) {
+				phy_force_reduction(phydev);
+				needs_aneg = 1;
 			}
+		}
 
-			phydev->adjust_link(phydev->attached_dev);
-			break;
-		case PHY_RUNNING:
-			/* Only register a CHANGE if we are
-			 * polling */
-			if (PHY_POLL == phydev->irq)
-				phydev->state = PHY_CHANGELINK;
+		phydev->adjust_link(phydev->attached_dev);
+		break;
+	case PHY_RUNNING:
+		/* Only register a CHANGE if we are
+		 * polling */
+		if (PHY_POLL == phydev->irq)
+			phydev->state = PHY_CHANGELINK;
+		break;
+	case PHY_CHANGELINK:
+		err = phy_read_status(phydev);
+
+		if (err)
 			break;
-		case PHY_CHANGELINK:
-			err = phy_read_status(phydev);
 
-			if (err)
-				break;
+		if (phydev->link) {
+			phydev->state = PHY_RUNNING;
+			netif_carrier_on(phydev->attached_dev);
+		} else {
+			phydev->state = PHY_NOLINK;
+			netif_carrier_off(phydev->attached_dev);
+		}
 
-			if (phydev->link) {
-				phydev->state = PHY_RUNNING;
-				netif_carrier_on(phydev->attached_dev);
-			} else {
-				phydev->state = PHY_NOLINK;
-				netif_carrier_off(phydev->attached_dev);
-			}
+		phydev->adjust_link(phydev->attached_dev);
 
+		if (PHY_POLL != phydev->irq)
+			err = phy_config_interrupt(phydev,
+					PHY_INTERRUPT_ENABLED);
+		break;
+	case PHY_HALTED:
+		if (phydev->link) {
+			phydev->link = 0;
+			netif_carrier_off(phydev->attached_dev);
 			phydev->adjust_link(phydev->attached_dev);
+		}
+		break;
+	case PHY_RESUMING:
 
-			if (PHY_POLL != phydev->irq)
-				err = phy_config_interrupt(phydev,
-						PHY_INTERRUPT_ENABLED);
-			break;
-		case PHY_HALTED:
-			if (phydev->link) {
-				phydev->link = 0;
-				netif_carrier_off(phydev->attached_dev);
-				phydev->adjust_link(phydev->attached_dev);
-			}
-			break;
-		case PHY_RESUMING:
+		err = phy_clear_interrupt(phydev);
 
-			err = phy_clear_interrupt(phydev);
+		if (err)
+			break;
 
-			if (err)
-				break;
+		err = phy_config_interrupt(phydev,
+				PHY_INTERRUPT_ENABLED);
 
-			err = phy_config_interrupt(phydev,
-					PHY_INTERRUPT_ENABLED);
+		if (err)
+			break;
 
-			if (err)
+		if (AUTONEG_ENABLE == phydev->autoneg) {
+			err = phy_aneg_done(phydev);
+			if (err < 0)
 				break;
 
-			if (AUTONEG_ENABLE == phydev->autoneg) {
-				err = phy_aneg_done(phydev);
-				if (err < 0)
-					break;
-
-				/* err > 0 if AN is done.
-				 * Otherwise, it's 0, and we're
-				 * still waiting for AN */
-				if (err > 0) {
-					err = phy_read_status(phydev);
-					if (err)
-						break;
-
-					if (phydev->link) {
-						phydev->state = PHY_RUNNING;
-						netif_carrier_on(phydev->attached_dev);
-					} else
-						phydev->state = PHY_NOLINK;
-					phydev->adjust_link(phydev->attached_dev);
-				} else {
-					phydev->state = PHY_AN;
-					phydev->link_timeout = PHY_AN_TIMEOUT;
-				}
-			} else {
+			/* err > 0 if AN is done.
+			 * Otherwise, it's 0, and we're
+			 * still waiting for AN */
+			if (err > 0) {
 				err = phy_read_status(phydev);
 				if (err)
 					break;
@@ -958,8 +943,23 @@ void phy_state_machine(struct work_struct *work)
 				} else
 					phydev->state = PHY_NOLINK;
 				phydev->adjust_link(phydev->attached_dev);
+			} else {
+				phydev->state = PHY_AN;
+				phydev->link_timeout = PHY_AN_TIMEOUT;
 			}
-			break;
+		} else {
+			err = phy_read_status(phydev);
+			if (err)
+				break;
+
+			if (phydev->link) {
+				phydev->state = PHY_RUNNING;
+				netif_carrier_on(phydev->attached_dev);
+			} else
+				phydev->state = PHY_NOLINK;
+			phydev->adjust_link(phydev->attached_dev);
+		}
+		break;
 	}
 
 	mutex_unlock(&phydev->lock);
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH 4/4] net: phy: EXPORT_SYMBOL not following the function
From: Jean-Michel Hautbois @ 2010-12-19 10:58 UTC (permalink / raw)
  To: davem
  Cc: richard.cochran, shemminger, tj, randy.dunlap, netdev,
	linux-kernel, Jean-Michel Hautbois
In-Reply-To: <1292756331-3735-1-git-send-email-jhautbois@gmail.com>

The symbol phy_stop was not following the related function.

Signed-off-by: Jean-Michel Hautbois <jhautbois@gmail.com>
---
 drivers/net/phy/phy.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 87a64c0..ed503cd 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -732,6 +732,7 @@ out_unlock:
 	 * will not reenable interrupts.
 	 */
 }
+EXPORT_SYMBOL(phy_stop);
 
 
 /**
@@ -762,7 +763,6 @@ void phy_start(struct phy_device *phydev)
 	}
 	mutex_unlock(&phydev->lock);
 }
-EXPORT_SYMBOL(phy_stop);
 EXPORT_SYMBOL(phy_start);
 
 /**
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH 2/4] net: phy: printk issues fixed
From: Jean-Michel Hautbois @ 2010-12-19 10:58 UTC (permalink / raw)
  To: davem
  Cc: richard.cochran, shemminger, tj, randy.dunlap, netdev,
	linux-kernel, Jean-Michel Hautbois
In-Reply-To: <1292756331-3735-1-git-send-email-jhautbois@gmail.com>

When calling net link change, pr_info and printk where interleaved.
Changed the way to print messages.

Signed-off-by: Jean-Michel Hautbois <jhautbois@gmail.com>
---
 drivers/net/phy/phy.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 5f23e8e..4ab48d7 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -44,14 +44,14 @@
  */
 void phy_print_status(struct phy_device *phydev)
 {
-	pr_info("PHY: %s - Link is %s", dev_name(&phydev->dev),
-			phydev->link ? "Up" : "Down");
 	if (phydev->link)
-		printk(" - %d/%s", phydev->speed,
-				DUPLEX_FULL == phydev->duplex ?
-				"Full" : "Half");
-
-	printk("\n");
+		pr_info("PHY: %s - Link is up - %d/%s\n",
+				dev_name(&phydev->dev),
+				phydev->speed,
+				phydev->duplex == DUPLEX_FULL ? "Full" : "Half");
+	else
+		pr_info("PHY: %s - Link is down\n",
+				dev_name(&phydev->dev));
 }
 EXPORT_SYMBOL(phy_print_status);
 
-- 
1.7.0.4


^ permalink raw reply related

* Kernel panic eth2 mirred redirect to ifb0
From: Paweł Staszewski @ 2010-12-19 11:35 UTC (permalink / raw)
  To: Linux Network Development list

[-- Attachment #1: Type: text/plain, Size: 1849 bytes --]

Hi all

I have panic with kernel 2.6.37-rc6-git2 when use iproute2 
mirred/redirect action


host1 (kernel 2.6.36.2)
netperf client -> eth3 (82598EB 10-Gigabit AT CX4) - directly connected 
to eth2 of host2
ethtool -k eth3
Offload parameters for eth3:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off

ethtool -i eth3
driver: ixgbe
version: 2.0.84-k2
firmware-version: 1.12-2
bus-info: 0000:03:00.1


host2 (kernel-2.6.37-rc6-git2)
netserver -> eth2 (82598EB 10-Gigabit AT CX4) - directly connected to 
eth3 of host1

ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

ethtool -i eth2
driver: ixgbe
version: 2.0.84-k2
firmware-version: 1.12-2
bus-info: 0000:03:00.0



Normally without ifb and redirect netperf show:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

  87380  16384  16384    10.00    9042.14


Steps to reproduce panic:
ip link set dev ifb0 up

tc qdisc add dev eth2 ingress

tc filter add dev eth2 parent ffff: protocol ip prio 10 u32 \
match ip src 0.0.0.0/0 flowid 1:1 \
action mirred egress redirect dev ifb0


After this when i start netperf on host1 I have panic (screenshot in 
attached image).

Thanks
Pawel

[-- Attachment #2: ifb-eth2-mirred-redirect-panic.JPG --]
[-- Type: image/jpeg, Size: 105170 bytes --]

^ permalink raw reply

* Re: Kernel panic eth2 mirred redirect to ifb0
From: Paweł Staszewski @ 2010-12-19 11:39 UTC (permalink / raw)
  To: Linux Network Development list
In-Reply-To: <4D0DEDF9.7020102@itcare.pl>

[-- Attachment #1: Type: text/plain, Size: 2162 bytes --]

W dniu 2010-12-19 12:35, Paweł Staszewski pisze:
> Hi all
>
> I have panic with kernel 2.6.37-rc6-git2 when use iproute2 
> mirred/redirect action
>
>
> host1 (kernel 2.6.36.2)
> netperf client -> eth3 (82598EB 10-Gigabit AT CX4) - directly 
> connected to eth2 of host2
> ethtool -k eth3
> Offload parameters for eth3:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off
> rx-vlan-offload: off
> tx-vlan-offload: off
> ntuple-filters: off
> receive-hashing: off
>
> ethtool -i eth3
> driver: ixgbe
> version: 2.0.84-k2
> firmware-version: 1.12-2
> bus-info: 0000:03:00.1
>
>
> host2 (kernel-2.6.37-rc6-git2)
> netserver -> eth2 (82598EB 10-Gigabit AT CX4) - directly connected to 
> eth3 of host1
>
> ethtool -k eth2
> Offload parameters for eth2:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: off
> receive-hashing: off
>
> ethtool -i eth2
> driver: ixgbe
> version: 2.0.84-k2
> firmware-version: 1.12-2
> bus-info: 0000:03:00.0
>
>
>
> Normally without ifb and redirect netperf show:
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2 
> (192.168.0.2) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
>  87380  16384  16384    10.00    9042.14
>
>
> Steps to reproduce panic:
> ip link set dev ifb0 up
>
> tc qdisc add dev eth2 ingress
>
> tc filter add dev eth2 parent ffff: protocol ip prio 10 u32 \
> match ip src 0.0.0.0/0 flowid 1:1 \
> action mirred egress redirect dev ifb0
>
>
> After this when i start netperf on host1 I have panic (screenshot in 
> attached image).
>
> Thanks
> Pawel
In next attached image is second screenshot (different - trace) - but 
with the same configuration


Regards
Pawel

[-- Attachment #2: ifb-eth2-mirred-redirect-panic-part2.JPG --]
[-- Type: image/jpeg, Size: 91665 bytes --]

^ permalink raw reply

* 2.6.37-rc6-git4: Reported regressions from 2.6.36
From: Rafael J. Wysocki @ 2010-12-19 12:28 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Linux SCSI List, Linux ACPI, Network Development,
	Linux Wireless List, DRI, Florian Mickler, Andrew Morton,
	Kernel Testers List, Linus Torvalds, Linux PM List,
	Maciej Rutecki

This message contains a list of some regressions from 2.6.36,
for which there are no fixes in the mainline known to the tracking team.
If any of them have been fixed already, please let us know.

If you know of any other unresolved regressions from 2.6.36, please let us
know either and we'll add them to the list.  Also, please let us know
if any of the entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply
to this message with CCs to the people involved in reporting and handling
the issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2010-12-19       73       28          24
  2010-12-03       55       25          19
  2010-11-19       39       29          25


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=25012
Subject		: BUG: i915 causes NULL pointer dereference in 2.6.37-rc5-git4
Submitter	: Tõnu Raitviir <jussuf@linux.ee>
Date		: 2010-12-15 12:48 (5 days old)
Message-ID	: <alpine.DEB.2.00.1012151238570.4797@jbbyvx.ohzcpyho.rr>
References	: http://www.spinics.net/lists/dri-devel/msg06282.html


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24892
Subject		: Wireless warning, system locks up
Submitter	: Heinz Diehl <htd@fancy-poultry.org>
Date		: 2010-12-14 15:16 (6 days old)
Handled-By	: John W. Linville <linville@tuxdriver.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24882
Subject		: PM/Hibernate: Memory corruption patch introduces regression (2.6.36.2)
Submitter	:  <akwatts@ymail.com>
Date		: 2010-12-14 04:00 (6 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24822
Subject		: Embedded DisplayPort is detected wrongly on HP ProBook 5320m
Submitter	: Takashi Iwai <tiwai@suse.de>
Date		: 2010-12-13 11:09 (7 days old)
Handled-By	: Chris Wilson <chris@chris-wilson.co.uk>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24772
Subject		: Crash with btrfs rootfs on dm-crypt [ kernel BUG at fs/btrfs/inode.c:806! ] on linux 2.6.37-rc5
Submitter	: Fabio Comolli <fabio.comolli@gmail.com>
Date		: 2010-12-10 20:30 (10 days old)
Message-ID	: <AANLkTi=j9zsaYNcu=NgGV=HfE-3rNHzVswov8VrgwjQp@mail.gmail.com>
References	: http://marc.info/?l=linux-kernel&m=129201308706568&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24762
Subject		: BUG at perf_ctx_adjust_freq (kernel/perf_event.c:1582)
Submitter	: Chris Wilson <chris@chris-wilson.co.uk>
Date		: 2010-12-10 12:00 (10 days old)
Message-ID	: <c6d829$pqibha@fmsmga001.fm.intel.com>
References	: http://marc.info/?l=linux-kernel&m=129198247531612&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24722
Subject		: Disconnecting my USB mouse hangs the machine and issues kernel warning
Submitter	: Heinz Diehl <htd@fancy-poultry.org>
Date		: 2010-12-12 12:42 (8 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24592
Subject		: 2.6.37-rc5: NULL pointer oops in selinux_socket_unix_stream_connect
Submitter	: Jeremy Fitzhardinge <jeremy@goop.org>
Date		: 2010-12-08 21:09 (12 days old)
Message-ID	: <4CFFF3F3.90100@goop.org>
References	: http://marc.info/?l=linux-kernel&m=129184256629712&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24582
Subject		: Kernel Oops at tty_buffer_request_room when using pppd program (2.6.37-rc4)
Submitter	: baoyb <baoyb@avit.org.cn>
Date		: 2010-12-08 13:55 (12 days old)
Message-ID	: <EF6DDE218DB34702B1FA84D6CD7EA771@baoyb>
References	: http://marc.info/?l=linux-kernel&m=129181763525738&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24372
Subject		: kdump broken on 2.6.37-rc4
Submitter	: Stanislaw Gruszka <sgruszka@redhat.com>
Date		: 2010-12-03 11:16 (17 days old)
Message-ID	: <20101203111623.GA2741@redhat.com>
References	: http://marc.info/?l=linux-kernel&m=129137502323003&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24362
Subject		: perf hw  in kexeced kernel broken in tip
Submitter	: Yinghai Lu <yinghai@kernel.org>
Date		: 2010-12-01 8:00 (19 days old)
First-Bad-Commit: http://git.kernel.org/linus/33c6d6a7ad0ffab9b1b15f8e4107a2af072a05a0
Message-ID	: <4CF60095.1020900@kernel.org>
References	: http://marc.info/?l=linux-kernel&m=129119055922065&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24272
Subject		: iotop reports insane per-process disk read/write statistics
Submitter	: Brian Rogers <brian@xyzw.org>
Date		: 2010-12-03 12:00 (17 days old)
First-Bad-Commit: http://git.kernel.org/linus/85893120699f8bae8caa12a8ee18ab5fceac978e


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24142
Subject		: Crash at boot in udev while accessing bt848 device in 2.6.37-rc*
Submitter	: Christian Casteyde <casteyde.christian@free.fr>
Date		: 2010-11-30 20:50 (20 days old)
References	: https://bugzilla.kernel.org/show_bug.cgi?id=24602


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=23902
Subject		: [BUG] 2.6.37-rc3 massive interactivity regression on ARM
Submitter	: Mikael Pettersson <mikpe@it.uu.se>
Date		: 2010-11-27 15:16 (23 days old)
First-Bad-Commit: http://git.kernel.org/linus/305e6835e05513406fa12820e40e4a8ecb63743c
Message-ID	: <<19697.8378.717761.236202@pilspetsen.it.uu.se>>
References	: http://marc.info/?l=linux-kernel&m=129087098911837&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=23472
Subject		: 2.6.37-rc2 vs. 2.6.36 laptop backlight changes?
Submitter	: Patrick Schaaf <netdev@bof.de>
Date		: 2010-11-17 13:41 (33 days old)
Message-ID	: <1290001262.5727.2.camel@lat1>
References	: http://marc.info/?l=linux-kernel&m=129000127920912&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=23102
Subject		: [bisected] i915 regression in post 2.6.36 kernels
Submitter	: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Date		: 2010-11-10 7:02 (40 days old)
Message-ID	: <201011100802.20332.johannes.hirte@fem.tu-ilmenau.de>
References	: http://marc.info/?l=linux-kernel&m=128937310017057&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=22942
Subject		: [2.6.37-rc1, OOM] virtblk: OOM in do_virtblk_request()
Submitter	: Dave Chinner <david@fromorbit.com>
Date		: 2010-11-05 1:30 (45 days old)
Message-ID	: <20101105013003.GE13830@dastard>
References	: http://marc.info/?l=linux-kernel&m=128892062917641&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=22912
Subject		: spi_lm70llp module crash on unload (2.6.37-rc1)
Submitter	: Randy Dunlap <randy.dunlap@oracle.com>
Date		: 2010-11-05 0:16 (45 days old)
Message-ID	: <20101104171620.00d8c95d.randy.dunlap@oracle.com>
References	: http://marc.info/?l=linux-kernel&m=128891627913647&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=22882
Subject		: (2.6.37-rc1) amd64-agp module crashed on second load
Submitter	: Randy Dunlap <randy.dunlap@oracle.com>
Date		: 2010-11-05 0:13 (45 days old)
Message-ID	: <20101104171333.fea1f498.randy.dunlap@oracle.com>
References	: http://marc.info/?l=linux-kernel&m=128891605213447&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=22812
Subject		: kernel oops on 2.6.37-rc1
Submitter	: Andrew <atswartz@gmail.com>
Date		: 2010-11-12 16:05 (38 days old)
First-Bad-Commit: http://git.kernel.org/linus/a68c439b1966c91f0ef474e2bf275d6792312726


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=22642
Subject		: 2.6.37-rc1: Disk takes 10 seconds to resume - MacBook2,1
Submitter	: Tobias <devnull@plzk.org>
Date		: 2010-11-10 19:33 (40 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=22562
Subject		: Regression in 2.6.37-rc1 - logs spammed with "unable to enumerate USB port" - bisected to commit 3df7169e
Submitter	: Larry Finger <Larry.Finger@lwfinger.net>
Date		: 2010-11-02 22:32 (48 days old)
First-Bad-Commit: http://git.kernel.org/linus/3df7169e73fc1d71a39cffeacc969f6840cdf52b
Message-ID	: <4CD09166.4060202@lwfinger.net>
References	: http://marc.info/?l=linux-kernel&m=128873713207906&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=22542
Subject		: [2.6.37-rc1] drm:i195 errors
Submitter	: Paul Rolland <rol@witbe.net>
Date		: 2010-11-02 14:58 (48 days old)
Message-ID	: <20101102155813.09cb2c6e@tux.DEF.witbe.net>
References	: http://marc.info/?l=linux-kernel&m=128870991628970&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=22472
Subject		: vga_switcheroo fails to switch from intel to ati
Submitter	: Radu Andries <admiral0@tuxfamily.org>
Date		: 2010-11-08 16:46 (42 days old)


Regressions with patches
------------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24602
Subject		: modprobe bttv crashes system
Submitter	: Sergej Pupykin <pupykin.s@gmail.com>
Date		: 2010-12-09 21:48 (11 days old)
References	: https://bugzilla.kernel.org/show_bug.cgi?id=24142
Patch		: https://patchwork.kernel.org/patch/414671/


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=24462
Subject		: r600: spread spectrum: flickering screen (bisected)
Submitter	: Luca Tettamanti <kronos.it@gmail.com>
Date		: 2010-12-08 16:18 (12 days old)
First-Bad-Commit: http://git.kernel.org/linus/ba032a58d1f320039e7850fb6e8651695c1aa571
Handled-By	: Alex Deucher <alexdeucher@gmail.com>
Patch		: https://bugzilla.kernel.org/attachment.cgi?id=39372


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=22672
Subject		: Regression in 2.6.37-rc1 for Intel 945 Graphics Adapter - bisected to commit e9e331a
Submitter	: Larry Finger <Larry.Finger@lwfinger.net>
Date		: 2010-11-11 01:56 (39 days old)
References	: https://bugs.freedesktop.org/show_bug.cgi?id=31803
		  http://marc.info/?l=linux-kernel&m=128944001311444&w=2
		  http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg02235.html
Patch		: https://patchwork.kernel.org/patch/359472/
		  https://patchwork.kernel.org/patch/359502/


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=22662
Subject		: divide error in select_task_rq_fair()
Submitter	: Myron Stowe <myron.stowe@hp.com>
Date		: 2010-11-10 23:58 (40 days old)
Patch		: http://lkml.org/lkml/2010/11/13/176
		  http://lkml.org/lkml/2010/11/13/181


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.36,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=21782

Please let the tracking team know if there are any Bugzilla entries that
should be added to the list in there.

Thanks!

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox