Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] bridge: fix icmpv6 endian bug and other sparse warnings
From: David Miller @ 2012-12-13 18:02 UTC (permalink / raw)
  To: shemminger; +Cc: fengguang.wu, amwang, netdev
In-Reply-To: <20121213085128.1db11ee4@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 13 Dec 2012 08:51:28 -0800

> Fix the warnings reported by sparse on recent bridge multicast
> changes. Mostly just rcu annotation issues but in this case
> sparse found a real bug! The ICMPv6 mld2 query mrc
> values is in network byte order.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply

* Re: netconsole fun
From: Neil Horman @ 2012-12-13 18:08 UTC (permalink / raw)
  To: Peter Hurley; +Cc: Cong Wang, netdev
In-Reply-To: <1355410171.2605.17.camel@thor>

On Thu, Dec 13, 2012 at 09:49:31AM -0500, Peter Hurley wrote:
> On Thu, 2012-12-13 at 07:36 -0500, Neil Horman wrote:
> > On Wed, Dec 12, 2012 at 03:59:17PM -0500, Peter Hurley wrote:
> > > On Tue, 2012-12-11 at 11:45 -0500, Neil Horman wrote:
> > > > On Tue, Dec 11, 2012 at 10:16:51AM -0500, Peter Hurley wrote:
> > > > > On Tue, 2012-12-11 at 09:30 -0500, Neil Horman wrote:
> > > > > > On Tue, Dec 11, 2012 at 09:19:52AM -0500, Peter Hurley wrote:
> > > > > > > On Tue, 2012-12-11 at 04:51 +0000, Cong Wang wrote:
> > > > > > > > On Mon, 10 Dec 2012 at 14:17 GMT, Peter Hurley <peter@hurleysoftware.com> wrote:
> > > > > > > > > Now that netpoll has been disabled for slaved devices, is there a
> > > > > > > > > recommended method of running netconsole on a machine that has a slaved
> > > > > > > > > device?
> > > > > > > > >
> > > > > > > > 
> > > > > > > > Yes, running it on the master device instead.
> > > > > > > 
> > > > > > > Thanks for the suggestion, but:
> > > > > > > 
> > > > > > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.10.99/br0,30000@192.168.10.100/xx:xx:xx:xx:xx:xx
> > > > > > > ...
> > > > > > > [ 5.289869] netpoll: netconsole: local port 6665
> > > > > > > [ 5.289885] netpoll: netconsole: local IP 192.168.10.99
> > > > > > > [ 5.289892] netpoll: netconsole: interface 'br0'
> > > > > > > [ 5.289898] netpoll: netconsole: remote port 30000
> > > > > > > [ 5.289907] netpoll: netconsole: remote IP 192.168.10.100
> > > > > > > [ 5.289914] netpoll: netconsole: remote ethernet address xx:xx:xx:xx:xx:xx
> > > > > > > [ 5.289922] netpoll: netconsole: br0 doesn't exist, aborting
> > > > > > > [ 5.289929] netconsole: cleaning up
> > > > > > > ...
> > > > > > > [ 9.392291] Bridge firewalling registered
> > > > > > > [ 9.396805] device eth1 entered promiscuous mode
> > > > > > > [ 9.418350] eth1:  setting full-duplex.
> > > > > > > [ 9.421268] br0: port 1(eth1) entered forwarding state
> > > > > > > [ 9.423354] br0: port 1(eth1) entered forwarding state
> > > > > > > 
> > > > > > > 
> > > > > > > Is there a way to control or associate network device names prior to
> > > > > > > udev renaming?
> > > > > > > 
> > > > > > That looks like a systemd problem (or more specifically a boot dependency
> > > > > > problem).  You need to modify your netconsole unit/service file to start after
> > > > > > all your networking is up.  NetworkManager provides a dummy service file for
> > > > > > this purpose, called networkmanager-wait-online.service
> > > > > 
> > > > > Ok. So with a single physical network interface that will be bridged,
> > > > > netconsole cannot used for kernel boot messages.
> > > > > 
> > > > > With a machine with multiple nics, is there a way to control device
> > > > > naming so that the interface name to be used by netconsole specified on
> > > > > the boot command line will actually corresponding to the intended
> > > > > device. For example,
> > > > > 
> > > > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.1.123/eth0,30000@192.168.1.139/xx:xx:xx:xx:xx:xx
> > > > > ....
> > > > > [ 4.092184] 3c59x: Donald Becker and others.
> > > > > [ 4.092204] 0000:07:05.0: 3Com PCI 3c905C Tornado at ffffc9000186cf80.
> > > > > [ 4.094035] tg3.c:v3.125 (September 26, 2012)
> > > > > ....
> > > > > [ 4.125038] tg3 0000:08:00.0 eth1: Tigon3 [partno(BCM95754) rev b002] (PCI Express) MAC address xx:xx:xx:xx:xx:xx
> > > > > [ 4.125055] tg3 0000:08:00.0 eth1: attached PHY is 5787 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
> > > > > [ 4.125062] tg3 0000:08:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> > > > > [ 4.125068] tg3 0000:08:00.0 eth1: dma_rwctrl[76180000] dma_mask[64-bit]
> > > > > 
> > > > > This is attaching netconsole to the wrong device because bus
> > > > > enumeration, and therefore load order, is not consistent from boot to
> > > > > boot.
> > > > > 
> > > > No, theres no way to do that.  As you note device ennumeration isn't consistent
> > > > accross boots, thats why udev creates rules to rename devices based on immutable
> > > > (or semi-immutable) data, like mac addresses, or pci bus locations).  Once that
> > > > happens, you'll have consistent names for your interfaces, and that work will be
> > > > guaranteed to be done after networkmanager has finished opening all the
> > > > interfaces that it needs (hence my suggestion to make netconsole service
> > > > dependent on networkmanager service startup completing).
> > > 
> > > Just wondering if you think something like the patch below is
> > > suitable/acceptable for insulating netconsole from inconsistent device
> > > name scenarios without changing the existing semantics. The basic idea
> > > is to allow an ethernet MAC address in the <dev> field of the
> > > netconsole= options, and if a MAC address was specified rather than a
> > > device name, to do the dev lookup from the MAC address instead.
> > > 
> > > This doesn't extend to, but also doesn't interfere with, the dynamic
> > > config of netconsole via configfs.
> > > 
> > > Would you mind reviewing it?
> > > 
> > > Regards,
> > > Peter
> > > 
> > This looks like a pretty good idea to me.  That said, something occured to me
> > when you wrote your summary above.  Have you looked at the netconsole service
> > scripts that most distros provide in their packaging?  I'm almost positive Red
> > Hat/Fedora (and also like Suse and Ubuntu), already implement this functionality
> > from user space.  Basically, instead of people just modprobing netconsole, they
> > create a service script that parses a config file that has contains all the
> > options needed to load the netconsole module, and it has the intellegence to see
> > if you specified a mac address rather than a device.  If you did that it finds
> > the corresponding device mac address and uses that as the device.  I'm sorry, I
> > don't know why I didn't think of that before.  Check that out though, that will
> > likey give you exactly what you need
> 
> Even with a udev rule to load netconsole that runs immediately after
> device renaming (so before scripting), most of the dynamic module
> loading has already happened so netconsole misses it. At least with the
> patch, netconsole will load and attach to the proper interface much
> earlier in the boot so that module-load-time messages will be caught.
> 
I'm not sure what you mean by this.  I get that other drivers are loaded and
that if the netconsole service runs too soon, you might not have a network path
to the netconsole host you want, but as long as the network is up, the service
script (if properly written), shouldn't make you specify a network device.
Instead of having to specify the device to the netconsole module directly, the
config file for the service script lets you specify the destination ip address
of the netconsole server, and figures out the output device based on the routing
tables, whatever that device name happens to be on that boot.  So you don't
really need this patch, because user space can do this work (and more) for you.
You just have to wait for the network to come up, which you need to do anyway.

> There is an unforeseen consequence of the patch: it breaks device
> renaming because the device will already be in use by netconsole. Which
> is the whole problem with userspace device renaming to begin with...
> 
That is bad, but see above, the netconsole service can work around this for you,
allowing you to never have to specify a particular device at all.

> I guess that leaves only the option of building in netconsole and the
> driver that supplies the interface.
> 

> Oh well.
> 
> Regards,
> Peter
> 
> 

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Stephen Hemminger @ 2012-12-13 18:09 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: John Fastabend, Jiri Pirko, netdev, davem, edumazet, bhutchings,
	mirqus, greearb
In-Reply-To: <20121213155423.07c47307@obelix.rh>

On Thu, 13 Dec 2012 15:54:23 -0200
Flavio Leitner <fbl@redhat.com> wrote:

> I am saying this because people are used to and there are scripts out
> there using something like:
> # ethtool <iface> | grep 'Link'
> to react an interface failure.

Then the script is broken. It is asking about hardware state.

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Flavio Leitner @ 2012-12-13 18:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: John Fastabend, Jiri Pirko, netdev, davem, edumazet, bhutchings,
	mirqus, greearb
In-Reply-To: <20121213100933.6d68a8e5@nehalam.linuxnetplumber.net>

On Thu, 13 Dec 2012 10:09:33 -0800
Stephen Hemminger <shemminger@vyatta.com> wrote:

> On Thu, 13 Dec 2012 15:54:23 -0200
> Flavio Leitner <fbl@redhat.com> wrote:
> 
> > I am saying this because people are used to and there are scripts out
> > there using something like:
> > # ethtool <iface> | grep 'Link'
> > to react an interface failure.
> 
> Then the script is broken. It is asking about hardware state.

I was talking about the team master interface, so it makes sense
to check its 'hardware' state. Just think on 'bond0' interface
with no slaves. It should report Link detected: no.

See bond_release(), what happens if bond->slave_cnt == 0, for instance.

-- 
fbl

^ permalink raw reply

* Re: [RFC] net : add tx timestamp to packet mmap.
From: Richard Cochran @ 2012-12-13 18:17 UTC (permalink / raw)
  To: Paul Chavent; +Cc: davem, edumazet, daniel.borkmann, xemul, ebiederm, netdev
In-Reply-To: <50C9FEC4.5050804@onera.fr>

On Thu, Dec 13, 2012 at 05:13:56PM +0100, Paul Chavent wrote:
> >
> >In order for time stamps to appear, somebody has to call
> >skb_tx_timestamp() ...
> Yes. "Somebody" means "the hardware driver" after completing xmit.
> That's true ?

Yes, the MAC driver must call this helper function, but not many
drivers do this yet. You didn't say which MAC driver you are using and
whether it supports Tx SO_TIMESTAMPING or not.

> Yes, it only sets some flags. I thought that those flags was
> required by the skb_tx_timestamp() in order to make the appropriate
> timestamping (hardware, software, etc).
> 
> So in order to have tx timestamp that work, both calls are needed ?

Yes.

> Why sock_tx_timestamp is called in packet_fill_skb and
> packet_sendmsg_spkt and not in tpacket_fill_skb ?
> Why i can retrieve timestamps when i add this call ?

Sorry, I don't know much about packet mmap. Last time I tried it, some
years ago, it wasn't really working.

Richard

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Stephen Hemminger @ 2012-12-13 18:20 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: John Fastabend, Jiri Pirko, netdev, davem, edumazet, bhutchings,
	mirqus, greearb
In-Reply-To: <20121213161733.7bffe897@obelix.rh>

On Thu, 13 Dec 2012 16:17:33 -0200
Flavio Leitner <fbl@redhat.com> wrote:

> On Thu, 13 Dec 2012 10:09:33 -0800
> Stephen Hemminger <shemminger@vyatta.com> wrote:
> 
> > On Thu, 13 Dec 2012 15:54:23 -0200
> > Flavio Leitner <fbl@redhat.com> wrote:
> > 
> > > I am saying this because people are used to and there are scripts out
> > > there using something like:
> > > # ethtool <iface> | grep 'Link'
> > > to react an interface failure.
> > 
> > Then the script is broken. It is asking about hardware state.
> 
> I was talking about the team master interface, so it makes sense
> to check its 'hardware' state. Just think on 'bond0' interface
> with no slaves. It should report Link detected: no.
> 
> See bond_release(), what happens if bond->slave_cnt == 0, for instance.
> 

I was thinking more that ethtool operation for reporting link on
the team device should use the proper check rather than just using netif_carrier_ok(),
the team ethtool operation for get_link should be check IFF_RUNNING flag
in dev->flags which is controlled by operstate transistions.

^ permalink raw reply

* Re: [RFC PATCH net-next 4/4 V4] try to fix performance regression
From: Rick Jones @ 2012-12-13 18:25 UTC (permalink / raw)
  To: Weiping Pan; +Cc: David Laight, davem, brutus, netdev
In-Reply-To: <50C9E0A0.2040409@redhat.com>

On 12/13/2012 06:05 AM, Weiping Pan wrote:
> But if I just run normal tcp loopback for each message size, then the
> performance is stable.
> [root@intel-s3e3432-01 ~]# cat base.sh
> for s in 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768
> 65536 131072 262144 524288 1048576
> do
> netperf -i -2,10 -I 95,20 -- -m $s -M $s | tail -n1
> done

The -i option goes max,min iterations:

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#index-g_t_002di_002c-Global-28

and src/netsh.c will apply some silent clipping to that:

     case 'i':
       /* set the iterations min and max for confidence intervals */
       break_args(optarg,arg1,arg2);
       if (arg1[0]) {
	iteration_max = convert(arg1);
       }
       if (arg2[0] ) {
	iteration_min = convert(arg2);
       }
       /* if the iteration_max is < iteration_min make iteration_max
	 equal iteration_min */
       if (iteration_max < iteration_min) iteration_max = iteration_min;
       /* limit minimum to 3 iterations */
       if (iteration_max < 3) iteration_max = 3;
       if (iteration_min < 3) iteration_min = 3;
       /* limit maximum to 30 iterations */
       if (iteration_max > 30) iteration_max = 30;
       if (iteration_min > 30) iteration_min = 30;
       if (confidence_level == 0) confidence_level = 99;
       if (interval == 0.0) interval = 0.05; /* five percent */
       break;

So, what will happen with your netperf command line above is it will set 
iteration max to 10 iterations and it will always run 10 iterations 
since min will equal max.  If you want it to possibly terminate sooner 
upon hitting the confidence intervals you would want to go with -i 10,3. 
  That will have netperf always run at least three and no more than 10 
iterations.

If I'm not mistaken, the use of the "| tail -n 1" there will cause the 
"classic" confidence intervals not met warning to be tossed (unless I 
suppose it is actually going to stderr?).

If you use the "omni" tests directly rather than via "migration" you 
will no longer get warnings about not hitting the confidence interval, 
but you can have netperf emit the confidence level it actually achieved 
as well as the number of iterations it took to get there.  You would use 
the omni output selection to do that.

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection

These may have been mentioned before...

Judging from that command line you have the potential variability of the 
socket buffer auto-tuning.  Does AF_UNIX do the same sort of auto 
tuning?  It may be desirable to add some test-specific -s and -S options 
to have a fixed socket buffer size.

Since the MTU for loopback is ~16K, the send sizes below that will 
probably have differing interactions with the Nagle algorithm. 
Particularly as I suspect the timing will differ between friends and no 
friends.

I would guess the most "consistent" comparison with AF_UNIX would be 
when Nagle is disabled for the TCP_STREAM tests.  That would be a 
test-specific -D option.

Perhaps a more "stable" way to compare friends, no-friends and unix 
would be to use the _RR tests.  That will be a more direct, less-prone 
to other heuristics measure of path-length differences - both in the 
reported transactions per second and in any CPU utilization/service 
demand if you enable that via -c.  I'm not sure it would be necessary to 
take the request/response size out beyond a couple KB.  Take it out to 
the MB level and you will probably return to the question of auto-tuning 
of the socket buffer sizes.

happy benchmarking,

rick jones

^ permalink raw reply

* [PATCH] Fix comment for packets without data
From: Florent Fourcot @ 2012-12-13 18:27 UTC (permalink / raw)
  To: pablo, yoshfuji; +Cc: netdev, netfilter-devel, Florent Fourcot

The double negation is probably a typo error

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
---
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 00ee17c..d9efe32 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -81,7 +81,7 @@ static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
 	}
 	protoff = ipv6_skip_exthdr(skb, extoff, &nexthdr, &frag_off);
 	/*
-	 * (protoff == skb->len) mean that the packet doesn't have no data
+	 * (protoff == skb->len) mean that the packet does not have any data
 	 * except of IPv6 & ext headers. but it's tracked anyway. - YK
 	 */
 	if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
-- 
1.7.10.4


^ permalink raw reply related

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Dan Williams @ 2012-12-13 18:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Flavio Leitner, John Fastabend, Jiri Pirko, netdev, davem,
	edumazet, bhutchings, mirqus, greearb
In-Reply-To: <20121213102051.72ad69aa@nehalam.linuxnetplumber.net>

On Thu, 2012-12-13 at 10:20 -0800, Stephen Hemminger wrote:
> On Thu, 13 Dec 2012 16:17:33 -0200
> Flavio Leitner <fbl@redhat.com> wrote:
> 
> > On Thu, 13 Dec 2012 10:09:33 -0800
> > Stephen Hemminger <shemminger@vyatta.com> wrote:
> > 
> > > On Thu, 13 Dec 2012 15:54:23 -0200
> > > Flavio Leitner <fbl@redhat.com> wrote:
> > > 
> > > > I am saying this because people are used to and there are scripts out
> > > > there using something like:
> > > > # ethtool <iface> | grep 'Link'
> > > > to react an interface failure.
> > > 
> > > Then the script is broken. It is asking about hardware state.
> > 
> > I was talking about the team master interface, so it makes sense
> > to check its 'hardware' state. Just think on 'bond0' interface
> > with no slaves. It should report Link detected: no.
> > 
> > See bond_release(), what happens if bond->slave_cnt == 0, for instance.
> > 
> 
> I was thinking more that ethtool operation for reporting link on
> the team device should use the proper check rather than just using netif_carrier_ok(),
> the team ethtool operation for get_link should be check IFF_RUNNING flag
> in dev->flags which is controlled by operstate transistions.

That's not entirely sufficient, because not everything uses ethtool to
deterine whether the link/carrier is active.  eg, anything listenting to
netlink for IFF_RUNNING won't know anything about this.  I may have
missed previous conversation, but IFF_RUNNING is AFAIK the sole
mechanism to indicate to userspace that the link is operational and
ready for general traffic.  Can the team drivers simply not set
IFF_RUNNING until the interface is actually ready for general traffic?

I'd just caution people to be careful when it comes to carrier/link
states, as a lot of stuff relies on it, and people keep trying to
slightly change the meaning of it.  We need it to be consistent across
net interfaces and driver implementations, not have specific meanings to
certain drivers that don't apply to others.

If teamd needs to manage the carrier state of the teamX interface,
perhaps instead of munging the actual carrier state itself, it can flip
some team-private value that is one component of determining whether
IFF_RUNNING gets set or not?

Dan

^ permalink raw reply

* Re: [RFC][PATCH] bgmac: driver for GBit MAC core on BCMA bus
From: Joe Perches @ 2012-12-13 18:46 UTC (permalink / raw)
  To: Rafał Miłecki; +Cc: netdev, David S. Miller
In-Reply-To: <1355420611-25764-1-git-send-email-zajec5@gmail.com>

On Thu, 2012-12-13 at 18:43 +0100, Rafał Miłecki wrote:
> BCMA is a Broadcom specific bus with devices AKA cores. All recent BCMA
> based SoCs have gigabit ethernet provided by the GBit MAC core. This
> patch adds driver for such a cores registering itself as a netdev. It
> has been tested on a BCM4706 and BCM4718 chipsets.
[]
> It's just a RFC, so be aware of two ugly parts in this version:

Just some trivial notes.  Feel free to ignore.

> diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c

[]

> +#define pr_fmt(fmt)		KBUILD_MODNAME ": " fmt
> +
> +#define bgmac_err(bgmac, fmt, ...) \
> +	pr_err("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
> +#define bgmac_warn(bgmac, fmt, ...) \
> +	pr_warn("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
> +#define bgmac_info(bgmac, fmt, ...) \
> +	pr_info("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
> +#define bgmac_debug(bgmac, fmt, ...) \
> +	pr_debug("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)

Most subsystems use prefix_dbg now instead of prefix_debug.
Why not change these from pr_<level> to dev_<level>?
Maybe these should be in the bgmac.h file.

> +static bool bgmac_wait_value(struct bcma_device *core, u16 reg, u32 mask,
> +			     u32 value, int timeout)
> +{
[]
> +	pr_err("Timeout waiting for reg 0x%X\n", reg);

Maybe add new bmca_dev_(err|warn|info|dbg) macros too?

> +/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipphyinit */
> +static void bgmac_phy_init(struct bgmac *bgmac)
> +{
> +	struct bcma_chipinfo *ci = &bgmac->core->bus->chipinfo;
> +	struct bcma_drv_cc *cc = &bgmac->core->bus->drv_cc;
> +	u8 i;
> +
> +	if (ci->id == BCMA_CHIP_ID_BCM5356) {
> +		for (i = 0; i < 5; i++) {
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x008b);
> +			bgmac_phy_write(bgmac, i, 0x15, 0x0100);
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
> +			bgmac_phy_write(bgmac, i, 0x12, 0x2aaa);
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000b);
> +		}
> +	}
> +	if ((ci->id == BCMA_CHIP_ID_BCM5357 && ci->pkg != 10) ||
> +	    (ci->id == BCMA_CHIP_ID_BCM4749 && ci->pkg != 10) ||
> +	    (ci->id == BCMA_CHIP_ID_BCM53572 && ci->pkg != 9)) {
> +		bcma_chipco_chipctl_maskset(cc, 2, ~0xc0000000, 0);
> +		bcma_chipco_chipctl_maskset(cc, 4, ~0x80000000, 0);
> +		for (i = 0; i < 5; i++) {
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
> +			bgmac_phy_write(bgmac, i, 0x16, 0x5284);
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000b);
> +			bgmac_phy_write(bgmac, i, 0x17, 0x0010);
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
> +			bgmac_phy_write(bgmac, i, 0x16, 0x5296);
> +			bgmac_phy_write(bgmac, i, 0x17, 0x1073);
> +			bgmac_phy_write(bgmac, i, 0x17, 0x9073);
> +			bgmac_phy_write(bgmac, i, 0x16, 0x52b6);
> +			bgmac_phy_write(bgmac, i, 0x17, 0x9273);
> +			bgmac_phy_write(bgmac, i, 0x1f, 0x000b);

That's a lot of magic numbers.

> +static void bgmac_chip_stats_update(struct bgmac *bgmac)
> +{
> +	int i;
> +
> +	if (bgmac->core->id.id != BCMA_CORE_4706_MAC_GBIT) {

Save an indent level by returning a call to
a new bgmac_4706_chip_stats_update instead?

	if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT)
		return bgmac_407_chip_stats_update(bgmac);

> +		for (i = 0; i < BGMAC_NUM_MIB_TX_REGS; i++)
> +			bgmac->mib_tx_regs[i] =
> +				bgmac_read(bgmac,
> +					   BGMAC_TX_GOOD_OCTETS + (i * 4));
> +		for (i = 0; i < BGMAC_NUM_MIB_RX_REGS; i++)
> +			bgmac->mib_rx_regs[i] =
> +				bgmac_read(bgmac,
> +					   BGMAC_RX_GOOD_OCTETS + (i * 4));
> +	}

unindent

> +
> +	/* TODO: what else? how to handle BCM4706? */

and delete?

> +}
> +
> +static void bgmac_clear_mib(struct bgmac *bgmac)
> +{
> +	int i;
> +
> +	if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT)
> +		return;

Something like what this function does.

[]

> diff --git a/drivers/net/ethernet/broadcom/bgmac.h b/drivers/net/ethernet/broadcom/bgmac.h

[]

> +struct bgmac {
> +	struct bcma_device *core;
> +	struct bcma_device *cmn; /* Reference to CMN core for BCM4706 */
> +	struct net_device *net_dev;
> +	struct napi_struct napi;
> +
> +	u8 phyaddr;
> +	bool has_robosw;
> +
> +	u32 int_mask;
> +	u32 int_status;
> +
> +	bool loopback;
> +
> +	bool autoneg;
> +	bool full_duplex;
> +	int speed;
> +
> +	/* DMA */
> +	struct bgmac_dma_ring tx_ring[BGMAC_MAX_TX_RINGS];
> +	struct bgmac_dma_ring rx_ring[BGMAC_MAX_RX_RINGS];
> +
> +	/* Stats */
> +	bool stats_grabbed;
> +	u32 mib_tx_regs[BGMAC_NUM_MIB_TX_REGS];
> +	u32 mib_rx_regs[BGMAC_NUM_MIB_RX_REGS];
> +};

Maybe some minor reordering of the u8/bools to reduce padding.

^ permalink raw reply

* Re: [PATCH] Fix comment for packets without data
From: Rick Jones @ 2012-12-13 18:51 UTC (permalink / raw)
  To: Florent Fourcot; +Cc: pablo, yoshfuji, netdev, netfilter-devel
In-Reply-To: <1355423248-12448-1-git-send-email-florent.fourcot@enst-bretagne.fr>

On 12/13/2012 10:27 AM, Florent Fourcot wrote:
> The double negation is probably a typo error
>
> Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
> ---
>   net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> index 00ee17c..d9efe32 100644
> --- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> +++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> @@ -81,7 +81,7 @@ static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
>   	}
>   	protoff = ipv6_skip_exthdr(skb, extoff, &nexthdr, &frag_off);
>   	/*
> -	 * (protoff == skb->len) mean that the packet doesn't have no data
> +	 * (protoff == skb->len) mean that the packet does not have any data
>   	 * except of IPv6 & ext headers. but it's tracked anyway. - YK
>   	 */
>   	if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
>

Might as well take it to completion and have it read "(protoff == 
skb->len) means the packet does not have any data"  or "(protoff == 
skb->len) means the packet has no data"  Not sure about that next line 
with the except.  Perhaps "(protoff == skb->len) means the packet has no 
data, just IPv6 and extension headers, but it is tracked anyway."

rick jones

^ permalink raw reply

* Re: [PATCH 00/11] Add basic VLAN support to bridges
From: Vlad Yasevich @ 2012-12-13 18:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Or Gerlitz, netdev, davem, mst, john.r.fastabend
In-Reply-To: <20121213094719.3a7a9408@nehalam.linuxnetplumber.net>

On 12/13/2012 12:47 PM, Stephen Hemminger wrote:
> On Wed, 12 Dec 2012 18:36:38 -0500
> Vlad Yasevich <vyasevic@redhat.com> wrote:
>
>> On 12/12/2012 05:54 PM, Or Gerlitz wrote:
>>> On Wed, Dec 12, 2012 at 10:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:
>>>> This series of patches provides an ability to add VLANs to the bridge
>>>>
>
> The bigger question is why is this impossible or too awkward with existing
> netfilter (ebtables) functionality? As a practical matter, I like to keep
> the bridging code as simple as possible and move the complexity away from
> the core.

Basic filtering can be achieved, but it is awkward and possible slow. 
We've seen results where long chains (due to a lot of running VMs) cause
a drastic regression (about 20%).   I suppose vlan tagging/stripping 
could also be done by the tables, but that would involve ever more chains.

Really interesting thing is that some products, when using vlans under 
the bridge, explicitly don't include the physical device into the bridge 
so that they don't have to specify a ton of rules or open security holes.

The desire here was to provide basic VLAN switching functionality on the 
bridge.

>
> Also, if the functionality lived in netfilter rules, the developer and user
> would have a more freedom to implement complex rulesets.

I guess some of the issue.  Complex rulesets to get something relatively 
simple causes performance regressions.  Also, in a VM environment, as 
the number of VMs increases, the number of rules also
increases causing management and more performance issues.

When I started working on this a while ago, I've asked in the first RFC 
series if this was worth pursuing.  Here is a quote from you:

> Initial reaction is that this is a useful. You can already do the same thing
> with ebtables, and ebtables allows more flexibility. But ebtables does slow
> things down, and is harder to configure.

Additionally, this is just an option.  If the new filtering is not 
configured, the bridge behaves exactly like it did before.  All the 
extra things one can do with the ebtables are still there as well..

-vlad
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* ethtool 3.7 released
From: Ben Hutchings @ 2012-12-13 18:57 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 623 bytes --]

ethtool version 3.7 has been released.

Home page: https://ftp.kernel.org/pub/software/network/ethtool/
Download link:
https://ftp.kernel.org/pub/software/network/ethtool/ethtool-3.7.tar.xz

Release notes:

	* Fix: Gracefully handle failure of register pretty-printer (-d option)
	* Feature: Add support for et131x registers (-d option)
	* Feature: Basic optical diagnostics for SFF-8472 modules (-m option)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.




[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH 00/11] Add basic VLAN support to bridges
From: David Miller @ 2012-12-13 19:00 UTC (permalink / raw)
  To: shemminger; +Cc: vyasevic, or.gerlitz, netdev, mst, john.r.fastabend
In-Reply-To: <20121213094719.3a7a9408@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 13 Dec 2012 09:47:19 -0800

> On Wed, 12 Dec 2012 18:36:38 -0500
> Vlad Yasevich <vyasevic@redhat.com> wrote:
> 
>> On 12/12/2012 05:54 PM, Or Gerlitz wrote:
>> > On Wed, Dec 12, 2012 at 10:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:  
>> >> This series of patches provides an ability to add VLANs to the bridge
>> >> 
> 
> The bigger question is why is this impossible or too awkward with existing
> netfilter (ebtables) functionality? As a practical matter, I like to keep
> the bridging code as simple as possible and move the complexity away from
> the core.
> 
> Also, if the functionality lived in netfilter rules, the developer and user
> would have a more freedom to implement complex rulesets.

I do not consider it wise to create more, rather then fewer, users
of ebtables.

It is one of the most poorly constructed subsystems in the entire
networking.

Just my $0.02

^ permalink raw reply

* Re: [PATCH 00/11] Add basic VLAN support to bridges
From: Stephen Hemminger @ 2012-12-13 19:04 UTC (permalink / raw)
  To: David Miller; +Cc: vyasevic, or.gerlitz, netdev, mst, john.r.fastabend
In-Reply-To: <20121213.140023.2131448980265576282.davem@davemloft.net>

On Thu, 13 Dec 2012 14:00:23 -0500 (EST)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Thu, 13 Dec 2012 09:47:19 -0800
> 
> > On Wed, 12 Dec 2012 18:36:38 -0500
> > Vlad Yasevich <vyasevic@redhat.com> wrote:
> > 
> >> On 12/12/2012 05:54 PM, Or Gerlitz wrote:
> >> > On Wed, Dec 12, 2012 at 10:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:  
> >> >> This series of patches provides an ability to add VLANs to the bridge
> >> >> 
> > 
> > The bigger question is why is this impossible or too awkward with existing
> > netfilter (ebtables) functionality? As a practical matter, I like to keep
> > the bridging code as simple as possible and move the complexity away from
> > the core.
> > 
> > Also, if the functionality lived in netfilter rules, the developer and user
> > would have a more freedom to implement complex rulesets.
> 
> I do not consider it wise to create more, rather then fewer, users
> of ebtables.
> 
> It is one of the most poorly constructed subsystems in the entire
> networking.

Maybe a better filtering architecture (at the bridge level) would
be a good project.

^ permalink raw reply

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
From: Eric W. Biederman @ 2012-12-13 19:08 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem, aatteka
In-Reply-To: <50CA135A.7060802@6wind.com>

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> Le 12/12/2012 22:48, Eric W. Biederman a écrit :
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>>
>>> It is very wrong to presume that without context you know the reason for
>>> the exsitence of any network namespace and that you should or even that
>>> you can manage it.  Think of running your multi-network namespace
>>> managing application in a container.
>>
>> A good example of a network namespace you don't want to mess with are
>> the network namespaces created by vsftp and chrome for security purposes
>> to remove any possibility of creating new connections to the network.
>>
> Ok, I get the point.
>
> A last question: from an administration point of view, is it intended to
> not be able to monitor which netns are currently used? Like it can be done
> for sockets, files, ...

No.  The difficulty monitoring which network namespaces are being used
is an unintended side effect.

My pending changes to /proc/<pid>/ns/net and friends that allow you to
stat those files and compare if two network are the same network
namespace should make that monitoring much easier.  It isn't perfect as
there currently isn't a way to take a socket and say which network
namespace is this socket in.  But the current solution should tell you
what is happening most of the time.

struct net allocates it's own slab type so /proc/slabinfo on a good day
can tell you how many network namespace structures have been allocated
and are in use.

Eric

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: John Fastabend @ 2012-12-13 19:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Stephen Hemminger, Flavio Leitner, John Fastabend, Jiri Pirko,
	netdev, davem, edumazet, bhutchings, mirqus, greearb
In-Reply-To: <1355423612.21918.34.camel@dcbw.foobar.com>

On 12/13/2012 10:33 AM, Dan Williams wrote:
> On Thu, 2012-12-13 at 10:20 -0800, Stephen Hemminger wrote:
>> On Thu, 13 Dec 2012 16:17:33 -0200
>> Flavio Leitner <fbl@redhat.com> wrote:
>>
>>> On Thu, 13 Dec 2012 10:09:33 -0800
>>> Stephen Hemminger <shemminger@vyatta.com> wrote:
>>>
>>>> On Thu, 13 Dec 2012 15:54:23 -0200
>>>> Flavio Leitner <fbl@redhat.com> wrote:
>>>>
>>>>> I am saying this because people are used to and there are scripts out
>>>>> there using something like:
>>>>> # ethtool <iface> | grep 'Link'
>>>>> to react an interface failure.
>>>>
>>>> Then the script is broken. It is asking about hardware state.
>>>
>>> I was talking about the team master interface, so it makes sense
>>> to check its 'hardware' state. Just think on 'bond0' interface
>>> with no slaves. It should report Link detected: no.
>>>
>>> See bond_release(), what happens if bond->slave_cnt == 0, for instance.
>>>
>>
>> I was thinking more that ethtool operation for reporting link on
>> the team device should use the proper check rather than just using netif_carrier_ok(),
>> the team ethtool operation for get_link should be check IFF_RUNNING flag
>> in dev->flags which is controlled by operstate transistions.
>
> That's not entirely sufficient, because not everything uses ethtool to
> deterine whether the link/carrier is active.  eg, anything listenting to
> netlink for IFF_RUNNING won't know anything about this.  I may have
> missed previous conversation, but IFF_RUNNING is AFAIK the sole
> mechanism to indicate to userspace that the link is operational and
> ready for general traffic.  Can the team drivers simply not set
> IFF_RUNNING until the interface is actually ready for general traffic?
>
> I'd just caution people to be careful when it comes to carrier/link
> states, as a lot of stuff relies on it, and people keep trying to
> slightly change the meaning of it.  We need it to be consistent across
> net interfaces and driver implementations, not have specific meanings to
> certain drivers that don't apply to others.
>
> If teamd needs to manage the carrier state of the teamX interface,
> perhaps instead of munging the actual carrier state itself, it can flip
> some team-private value that is one component of determining whether
> IFF_RUNNING gets set or not?
>
> Dan

Assuming the operstate documentation is correct IFF_RUNNING should only
be set if the operstate is up or unknown. Quoting the text again,

  32	ifinfomsg::if_flags & IFF_RUNNING:
  33	 Interface is in RFC2863 operational state UP or UNKNOWN. This is for
  34	 backward compatibility, routing daemons, dhcp clients can use this
  35	 flag to determine whether they should use the interface.

So my guess is having a team ethtool get_link handler AND managing the
operstate from teamd would satisfy the requirement. But maybe I missed
something.

.John

^ permalink raw reply

* Re: [PATCH] xfrm: do not check x->km.state
From: David Miller @ 2012-12-13 19:19 UTC (permalink / raw)
  To: steffen.klassert; +Cc: roy.qing.li, netdev
In-Reply-To: <20121213101948.GG18940@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 13 Dec 2012 11:19:48 +0100

> On Thu, Dec 13, 2012 at 05:06:00PM +0800, roy.qing.li@gmail.com wrote:
>> From: Li RongQing <roy.qing.li@gmail.com>
>> 
>> do not check x->km.state, it will be checked by succedent
>> xfrm_state_check_expire()
>> 
>> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
 ...
> This would remove the only place where the LINUX_MIB_XFRMINSTATEINVALID
> statistics counter is incremented. I think it would be better to ensure
> a valid state before we call xfrm_state_check_expire(). This would make
> the statistics more accurate and we can remove the x->km.state check
> from xfrm_state_check_expire().

Agreed.

^ permalink raw reply

* Re: [RFC][PATCH] bgmac: driver for GBit MAC core on BCMA bus
From: Rafał Miłecki @ 2012-12-13 19:24 UTC (permalink / raw)
  To: Joe Perches; +Cc: netdev, David S. Miller
In-Reply-To: <1355424373.13796.25.camel@joe-AO722>

2012/12/13 Joe Perches <joe@perches.com>:
> On Thu, 2012-12-13 at 18:43 +0100, Rafał Miłecki wrote:
>> BCMA is a Broadcom specific bus with devices AKA cores. All recent BCMA
>> based SoCs have gigabit ethernet provided by the GBit MAC core. This
>> patch adds driver for such a cores registering itself as a netdev. It
>> has been tested on a BCM4706 and BCM4718 chipsets.
> []
>> It's just a RFC, so be aware of two ugly parts in this version:
>
> Just some trivial notes.  Feel free to ignore.

Thanks :) I can always try to discuss some points, but I appreciate
every review!


>> diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
>
> []
>
>> +#define pr_fmt(fmt)          KBUILD_MODNAME ": " fmt
>> +
>> +#define bgmac_err(bgmac, fmt, ...) \
>> +     pr_err("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
>> +#define bgmac_warn(bgmac, fmt, ...) \
>> +     pr_warn("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
>> +#define bgmac_info(bgmac, fmt, ...) \
>> +     pr_info("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
>> +#define bgmac_debug(bgmac, fmt, ...) \
>> +     pr_debug("u%d: " fmt, (bgmac)->core->core_unit, ##__VA_ARGS__)
>
> Most subsystems use prefix_dbg now instead of prefix_debug.

Thanks, I didn't know about that. Seeing "pr_debug" suggested I also
should use *_debug.


> Why not change these from pr_<level> to dev_<level>?

One more thing I didn't know/think about (dev_*).


> Maybe these should be in the bgmac.h file.

OK.


>> +static bool bgmac_wait_value(struct bcma_device *core, u16 reg, u32 mask,
>> +                          u32 value, int timeout)
>> +{
> []
>> +     pr_err("Timeout waiting for reg 0x%X\n", reg);
>
> Maybe add new bmca_dev_(err|warn|info|dbg) macros too?

It's the only place where we are working on a specific coreunit, but I
didn't bother to print that. Was too lazy to add special code for this
single case, ups ;)


>> +/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipphyinit */
>> +static void bgmac_phy_init(struct bgmac *bgmac)
>> +{
>> +     struct bcma_chipinfo *ci = &bgmac->core->bus->chipinfo;
>> +     struct bcma_drv_cc *cc = &bgmac->core->bus->drv_cc;
>> +     u8 i;
>> +
>> +     if (ci->id == BCMA_CHIP_ID_BCM5356) {
>> +             for (i = 0; i < 5; i++) {
>> +                     bgmac_phy_write(bgmac, i, 0x1f, 0x008b);
>> +                     bgmac_phy_write(bgmac, i, 0x15, 0x0100);
>> +                     bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
>> +                     bgmac_phy_write(bgmac, i, 0x12, 0x2aaa);
>> +                     bgmac_phy_write(bgmac, i, 0x1f, 0x000b);
>> +             }
>> +     }
>> +     if ((ci->id == BCMA_CHIP_ID_BCM5357 && ci->pkg != 10) ||
>> +         (ci->id == BCMA_CHIP_ID_BCM4749 && ci->pkg != 10) ||
>> +         (ci->id == BCMA_CHIP_ID_BCM53572 && ci->pkg != 9)) {
>> +             bcma_chipco_chipctl_maskset(cc, 2, ~0xc0000000, 0);
>> +             bcma_chipco_chipctl_maskset(cc, 4, ~0x80000000, 0);
>> +             for (i = 0; i < 5; i++) {
>> +                     bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
>> +                     bgmac_phy_write(bgmac, i, 0x16, 0x5284);
>> +                     bgmac_phy_write(bgmac, i, 0x1f, 0x000b);
>> +                     bgmac_phy_write(bgmac, i, 0x17, 0x0010);
>> +                     bgmac_phy_write(bgmac, i, 0x1f, 0x000f);
>> +                     bgmac_phy_write(bgmac, i, 0x16, 0x5296);
>> +                     bgmac_phy_write(bgmac, i, 0x17, 0x1073);
>> +                     bgmac_phy_write(bgmac, i, 0x17, 0x9073);
>> +                     bgmac_phy_write(bgmac, i, 0x16, 0x52b6);
>> +                     bgmac_phy_write(bgmac, i, 0x17, 0x9273);
>> +                     bgmac_phy_write(bgmac, i, 0x1f, 0x000b);
>
> That's a lot of magic numbers.

I agree, unfortunately there's nothing I can do :( I don't know names
for that registers not to mention the values (bits). We don't have
datasheets for that devices and I don't think we will ever have.


>> +static void bgmac_chip_stats_update(struct bgmac *bgmac)
>> +{
>> +     int i;
>> +
>> +     if (bgmac->core->id.id != BCMA_CORE_4706_MAC_GBIT) {
>
> Save an indent level by returning a call to
> a new bgmac_4706_chip_stats_update instead?
>
>         if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT)
>                 return bgmac_407_chip_stats_update(bgmac);
>
>> +             for (i = 0; i < BGMAC_NUM_MIB_TX_REGS; i++)
>> +                     bgmac->mib_tx_regs[i] =
>> +                             bgmac_read(bgmac,
>> +                                        BGMAC_TX_GOOD_OCTETS + (i * 4));
>> +             for (i = 0; i < BGMAC_NUM_MIB_RX_REGS; i++)
>> +                     bgmac->mib_rx_regs[i] =
>> +                             bgmac_read(bgmac,
>> +                                        BGMAC_RX_GOOD_OCTETS + (i * 4));
>> +     }
>
> unindent
>
>> +
>> +     /* TODO: what else? how to handle BCM4706? */
>
> and delete?

I got an info from Hauke that we can still do/read sth on BCM4706.
It's just not documented yet, that's why I left "TODO". I can note the
specs are lacking to make it clearer.


>> +}
>> +
>> +static void bgmac_clear_mib(struct bgmac *bgmac)
>> +{
>> +     int i;
>> +
>> +     if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT)
>> +             return;
>
> Something like what this function does.


>> +     /* Stats */
>> +     bool stats_grabbed;
>> +     u32 mib_tx_regs[BGMAC_NUM_MIB_TX_REGS];
>> +     u32 mib_rx_regs[BGMAC_NUM_MIB_RX_REGS];
>> +};
>
> Maybe some minor reordering of the u8/bools to reduce padding.

Really? Compiler won't do that for me? I hope that compiler optimizes
structs (until using __packed) :(

-- 
Rafał

^ permalink raw reply

* Re: netconsole fun
From: Peter Hurley @ 2012-12-13 19:27 UTC (permalink / raw)
  To: Neil Horman; +Cc: Cong Wang, netdev
In-Reply-To: <20121213180815.GA14796@hmsreliant.think-freely.org>

On Thu, 2012-12-13 at 13:08 -0500, Neil Horman wrote:
> On Thu, Dec 13, 2012 at 09:49:31AM -0500, Peter Hurley wrote:
> > On Thu, 2012-12-13 at 07:36 -0500, Neil Horman wrote:
> > > On Wed, Dec 12, 2012 at 03:59:17PM -0500, Peter Hurley wrote:
> > > > On Tue, 2012-12-11 at 11:45 -0500, Neil Horman wrote:
> > > > > On Tue, Dec 11, 2012 at 10:16:51AM -0500, Peter Hurley wrote:
> > > > > > On Tue, 2012-12-11 at 09:30 -0500, Neil Horman wrote:
> > > > > > > On Tue, Dec 11, 2012 at 09:19:52AM -0500, Peter Hurley wrote:
> > > > > > > > On Tue, 2012-12-11 at 04:51 +0000, Cong Wang wrote:
> > > > > > > > > On Mon, 10 Dec 2012 at 14:17 GMT, Peter Hurley <peter@hurleysoftware.com> wrote:
> > > > > > > > > > Now that netpoll has been disabled for slaved devices, is there a
> > > > > > > > > > recommended method of running netconsole on a machine that has a slaved
> > > > > > > > > > device?
> > > > > > > > > >
> > > > > > > > > 
> > > > > > > > > Yes, running it on the master device instead.
> > > > > > > > 
> > > > > > > > Thanks for the suggestion, but:
> > > > > > > > 
> > > > > > > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.10.99/br0,30000@192.168.10.100/xx:xx:xx:xx:xx:xx
> > > > > > > > ...
> > > > > > > > [ 5.289869] netpoll: netconsole: local port 6665
> > > > > > > > [ 5.289885] netpoll: netconsole: local IP 192.168.10.99
> > > > > > > > [ 5.289892] netpoll: netconsole: interface 'br0'
> > > > > > > > [ 5.289898] netpoll: netconsole: remote port 30000
> > > > > > > > [ 5.289907] netpoll: netconsole: remote IP 192.168.10.100
> > > > > > > > [ 5.289914] netpoll: netconsole: remote ethernet address xx:xx:xx:xx:xx:xx
> > > > > > > > [ 5.289922] netpoll: netconsole: br0 doesn't exist, aborting
> > > > > > > > [ 5.289929] netconsole: cleaning up
> > > > > > > > ...
> > > > > > > > [ 9.392291] Bridge firewalling registered
> > > > > > > > [ 9.396805] device eth1 entered promiscuous mode
> > > > > > > > [ 9.418350] eth1:  setting full-duplex.
> > > > > > > > [ 9.421268] br0: port 1(eth1) entered forwarding state
> > > > > > > > [ 9.423354] br0: port 1(eth1) entered forwarding state
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Is there a way to control or associate network device names prior to
> > > > > > > > udev renaming?
> > > > > > > > 
> > > > > > > That looks like a systemd problem (or more specifically a boot dependency
> > > > > > > problem).  You need to modify your netconsole unit/service file to start after
> > > > > > > all your networking is up.  NetworkManager provides a dummy service file for
> > > > > > > this purpose, called networkmanager-wait-online.service
> > > > > > 
> > > > > > Ok. So with a single physical network interface that will be bridged,
> > > > > > netconsole cannot used for kernel boot messages.
> > > > > > 
> > > > > > With a machine with multiple nics, is there a way to control device
> > > > > > naming so that the interface name to be used by netconsole specified on
> > > > > > the boot command line will actually corresponding to the intended
> > > > > > device. For example,
> > > > > > 
> > > > > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.1.123/eth0,30000@192.168.1.139/xx:xx:xx:xx:xx:xx
> > > > > > ....
> > > > > > [ 4.092184] 3c59x: Donald Becker and others.
> > > > > > [ 4.092204] 0000:07:05.0: 3Com PCI 3c905C Tornado at ffffc9000186cf80.
> > > > > > [ 4.094035] tg3.c:v3.125 (September 26, 2012)
> > > > > > ....
> > > > > > [ 4.125038] tg3 0000:08:00.0 eth1: Tigon3 [partno(BCM95754) rev b002] (PCI Express) MAC address xx:xx:xx:xx:xx:xx
> > > > > > [ 4.125055] tg3 0000:08:00.0 eth1: attached PHY is 5787 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
> > > > > > [ 4.125062] tg3 0000:08:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> > > > > > [ 4.125068] tg3 0000:08:00.0 eth1: dma_rwctrl[76180000] dma_mask[64-bit]
> > > > > > 
> > > > > > This is attaching netconsole to the wrong device because bus
> > > > > > enumeration, and therefore load order, is not consistent from boot to
> > > > > > boot.
> > > > > > 
> > > > > No, theres no way to do that.  As you note device ennumeration isn't consistent
> > > > > accross boots, thats why udev creates rules to rename devices based on immutable
> > > > > (or semi-immutable) data, like mac addresses, or pci bus locations).  Once that
> > > > > happens, you'll have consistent names for your interfaces, and that work will be
> > > > > guaranteed to be done after networkmanager has finished opening all the
> > > > > interfaces that it needs (hence my suggestion to make netconsole service
> > > > > dependent on networkmanager service startup completing).
> > > > 
> > > > Just wondering if you think something like the patch below is
> > > > suitable/acceptable for insulating netconsole from inconsistent device
> > > > name scenarios without changing the existing semantics. The basic idea
> > > > is to allow an ethernet MAC address in the <dev> field of the
> > > > netconsole= options, and if a MAC address was specified rather than a
> > > > device name, to do the dev lookup from the MAC address instead.
> > > > 
> > > > This doesn't extend to, but also doesn't interfere with, the dynamic
> > > > config of netconsole via configfs.
> > > > 
> > > > Would you mind reviewing it?
> > > > 
> > > > Regards,
> > > > Peter
> > > > 
> > > This looks like a pretty good idea to me.  That said, something occured to me
> > > when you wrote your summary above.  Have you looked at the netconsole service
> > > scripts that most distros provide in their packaging?  I'm almost positive Red
> > > Hat/Fedora (and also like Suse and Ubuntu), already implement this functionality
> > > from user space.  Basically, instead of people just modprobing netconsole, they
> > > create a service script that parses a config file that has contains all the
> > > options needed to load the netconsole module, and it has the intellegence to see
> > > if you specified a mac address rather than a device.  If you did that it finds
> > > the corresponding device mac address and uses that as the device.  I'm sorry, I
> > > don't know why I didn't think of that before.  Check that out though, that will
> > > likey give you exactly what you need
> > 
> > Even with a udev rule to load netconsole that runs immediately after
> > device renaming (so before scripting), most of the dynamic module
> > loading has already happened so netconsole misses it. At least with the
> > patch, netconsole will load and attach to the proper interface much
> > earlier in the boot so that module-load-time messages will be caught.
> > 
> I'm not sure what you mean by this.

This is the beginning of my netconsole log if I use userspace scripts to
start it.

[   19.125314] ip_tables: (C) 2000-2006 Netfilter Core Team
[   20.060925] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   21.829331] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   25.728370] at-spi-registry[1862]: segfault at 18 ip 00007f6dd1dd45f1 sp 00007fff49bcd760 error 4 in libgconf-2.so.4.1.5[7f6dd1dbd000+2d000]
[   26.778848] EXT4-fs (dm-3): re-mounted. Opts: errors=remount-ro,commit=0
[   30.643469] Bluetooth: RFCOMM TTY layer initialized
[   30.643509] Bluetooth: RFCOMM socket layer initialized
[   30.643512] Bluetooth: RFCOMM ver 1.11
[   30.784550] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   30.784567] Bluetooth: BNEP filters: protocol multicast
[   30.784584] Bluetooth: BNEP socket layer initialized
[   34.010813] init: plymouth-stop pre-start process (2205) terminated with status 1

This is the beginning of my netconsole log if I am able to specify
netconsole= options on the boot command line. Netconsole starts logging
much earlier because it is much loaded earlier.

[    8.764336] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[    9.409379] firewire_core 0000:07:06.0: created device fw1: GUID 0800460301c2d69e, S400
[    9.567395] init: ureadahead main process (500) terminated with status 5
[   10.400338] Adding 10996456k swap on /dev/mapper/isw_cbdbfhdjad_Raid0p5.  Priority:-1 extents:1 across:10996456k 
[   10.496974] udevd[541]: starting version 173
[   10.725906] EXT4-fs (dm-4): re-mounted. Opts: errors=remount-ro
[   11.288352] lp: driver loaded but no devices found
[   12.240058] parport_pc 00:05: reported by Plug and Play ACPI
[   12.240145] parport0: PC-style at 0x378 (0x778), irq 7, using FIFO [PCSPP,TRISTATE,COMPAT,ECP]
[   12.336161] lp0: using parport0 (interrupt-driven).
[   12.342867] microcode: CPU0 sig=0x10676, pf=0x40, revision=0x60f
[   12.436657] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[   12.442245] ppdev: user-space parallel port driver
[   12.451592] net firewire0: IPv4 over IEEE 1394 on card 0000:07:06.0

Does that make more sense now?

Thanks again,
Peter

> > There is an unforeseen consequence of the patch: it breaks device
> > renaming because the device will already be in use by netconsole. Which
> > is the whole problem with userspace device renaming to begin with...
> > 
> That is bad, but see above, the netconsole service can work around this for you,
> allowing you to never have to specify a particular device at all.

Just to be clear here,

^ permalink raw reply

* Re: Network namespace bugs in L2TP
From: Eric W. Biederman @ 2012-12-13 19:31 UTC (permalink / raw)
  To: Tom Parkin; +Cc: netdev
In-Reply-To: <20121213165601.GA2423@raven>

Tom Parkin <tparkin@katalix.com> writes:

> On Wed, Dec 12, 2012 at 11:44:36AM -0800, Eric W. Biederman wrote:
>> Tom Parkin <tparkin@katalix.com> writes:
>> >  1. Why do we need to change the namespace of the socket created in
>> >     l2tp_tunnel_sock_create?  So far as I can tell, sock_create
>> >     defaults to the namespace of the calling process.  Is the issue
>> >     here that this code may run from a work queue or similar?
>> 
>> Something similar.  At the very least l2tp_tunnel_create which calls
>> l2tp_tunnel_sock_create gets called from netlink.  The network namespace
>> of a socket is not necessarily the same as the network namespace of the
>> process that uses that socket.
>> 
>> So since current is not necessarily the right network namespace we need
>> push the desired network namespace of the socket down into
>> l2tp_tunnel_sock_create and use that when creating the socket.
>
> Ah, I see.  I hadn't appreciated that a process might swap between
> namespaces.
>
> I think that raises a question in the case of the L2TP tunnel sockets,
> though.  Currently l2tp_tunnel_sock_create uses the namespace of the
> current process for the socket.  The alternative is to pass in the
> desired namespace from l2tp_tunnel_create -- and this makes sense, I
> think.
>
> However, when l2tp_tunnel_create is called from the netlink code, the
> namespace passed is that of the netlink socket.  At the risk of sounding
> silly, what's the benefit of using the netlink socket namespace over the
> process namespace in this case?

Using the netlink socket namespace ensure that if the netlink socket is
passed between processes the semantics of sending messages down the
netlink socket don't change.

There is another thread on netdev discussing another variant of this
right now.  For some cases it is just a waste of resources to have one
copy of a daemon per network namespace.  In which case a controlling
daemon will open one netlink socket per network namespace and send
commands down the appropriate socket for the network namespace the
daemon wishes to control.

>> >  2. You mentioned the need to keep track of sockets allocated within a
>> >     namespace in order to be able to clean them up when the namespace
>> >     is deleted.  Should we be keeping a list of sockets we create and
>> >     then destroying them in the namespace pernet_ops exit function?
>> 
>> I think the issue that I was referring to and certainly the issue I am
>> thinking about is the issue where normal sockets hold a reference to a
>> network namespace and keep the network namespace alive.  Today l2tp uses
>> sock_create when creating a socket, and as such I think it pins it
>> current network namespace.  So I believe we can effectively have a
>> reference counting loop with l2tp sockets pinning the network namespace
>> and the network namespace keeping the l2tp device alive which keeps the
>> l2tp socket alive.
>
> OK, so presumably the way this would usually work is that a process
> creates sockets, and when the process exits those sockets go away.
> When all the processes in the namespace have exited, the namespace
> can close because there are no sockets holding it open.  Is that
> right?
>
> If that's correct, then I suppose the issue with the L2TP tunnel socket
> for an unmanaged tunnel is that it isn't owned by a process, per-se.
> So there's no obvious way to get rid of it, apart from sending a
> netlink message to tell the kernel to tear it down.
>
> But that doesn't seem too unreasonable.  A user would have to take
> explicit action to create an L2TP tunnel socket, and it might seem
> reasonable for that socket to keep the namespace alive until the user
> explicitly tears it down again.

Sending a netlink message to tear down the socket is not unreasonable.

Having a reference counting loop such that it is possible to close all
other sockets and all other references to a network namespace and not
have the network namespace go away because the L2TP tunnel socket holds
a reference to the unreachable and unuusable network namespace is
unreasonable.

We handle this with arp and icmp control sockets by not creating a
reference count.  And having a pernet cleanup routing clean up those
sockets.  Assuming I am right about the reference counting loop being
possible this is something to look at.

>> I don't remeber the specifics of l2tp as it creates some sockets, and
>> has other sockets passed in, and as such has rules that are not at all
>> normal.
>
> Ack.  Sockets are created in the kernel code for "unmanaged" tunnels,
> which don't run the control protocol over the top -- they're just for
> data encapsulation/de-encapsulation.  "Managed" tunnels have a
> userspace process looking after all the L2TP configuration and
> control/keepalive protocol, and in this case the daemon handles the
> creation of the tunnel socket.

Eric

^ permalink raw reply

* Re: [PATCH v2] ipv6: Change skb->data before using icmpv6_notify() to propagate redirect
From: Joe Perches @ 2012-12-13 19:37 UTC (permalink / raw)
  To: David Miller; +Cc: djduanjiong, steffen.klassert, netdev
In-Reply-To: <20121213.125911.566802923957164958.davem@davemloft.net>

On Thu, 2012-12-13 at 12:59 -0500, David Miller wrote:
> > +     hdr = (u8 *)ndopts.nd_opts_rh;
> 
> (u8 *)[SPACE]ndopts...

Really?

checkpatch bleats a --strict message:

CHECK: No space is necessary after a cast
#5: FILE: t.c:5:
+
+	bar = *(int *) foo;

It's 2/1 no space v space in drivers/net and net.


$ git grep -E "\*\) " drivers/net net | wc -l
4293
$ git grep -E "\*\)[^ ;]" drivers/net net | wc -l
8261

There are many false positives in the first case,
not too many for the second.

I was a bit surprised about how many of the
first case existed.

^ permalink raw reply

* Re: [PATCH v2] ipv6: Change skb->data before using icmpv6_notify() to propagate redirect
From: David Miller @ 2012-12-13 19:50 UTC (permalink / raw)
  To: joe; +Cc: djduanjiong, steffen.klassert, netdev
In-Reply-To: <1355427432.13796.43.camel@joe-AO722>

From: Joe Perches <joe@perches.com>
Date: Thu, 13 Dec 2012 11:37:12 -0800

> On Thu, 2012-12-13 at 12:59 -0500, David Miller wrote:
>> > +     hdr = (u8 *)ndopts.nd_opts_rh;
>> 
>> (u8 *)[SPACE]ndopts...
> 
> Really?
> 
> checkpatch bleats a --strict message:
> 
> CHECK: No space is necessary after a cast
> #5: FILE: t.c:5:
> +
> +	bar = *(int *) foo;

Ho hum, I'm actually ambivalent about this so...

^ permalink raw reply

* Re: [RFT PATCH] 8139cp: properly support change of MTU values [v2]
From: John Greene @ 2012-12-13 19:56 UTC (permalink / raw)
  To: Francois Romieu; +Cc: David Miller, netdev, dwmw2
In-Reply-To: <20121203204608.GA9815@electric-eye.fr.zoreil.com>

On 12/03/2012 03:46 PM, Francois Romieu wrote:
> David Miller <davem@davemloft.net> :
> [...]
>> I've applied this to net-next, if it triggers any problems we have
>> some time to work it out before 3.8 is released.
>
> I have bounced the messages to David Woodhouse since he authored the
> last 8139cp changes in net-next and owns the hardware to notice
> regressions.
>
> My message of two days ago was wrong : it is not possible for the irq
> handler to process a Tx event after the rings have been freed. Things
> still look racy wrt netpoll though.
>
> Any objection against the patch below ?
>
> (I did not gotoize the dev == NULL test: it is really unlikely and
> should go away).
>
> diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
> index 0da3f5e..57cd542 100644
> --- a/drivers/net/ethernet/realtek/8139cp.c
> +++ b/drivers/net/ethernet/realtek/8139cp.c
> @@ -577,28 +577,30 @@ static irqreturn_t cp_interrupt (int irq, void *dev_instance)
>   {
>   	struct net_device *dev = dev_instance;
>   	struct cp_private *cp;
> +	int handled = 0;
>   	u16 status;
>
>   	if (unlikely(dev == NULL))
>   		return IRQ_NONE;
>   	cp = netdev_priv(dev);
>
> +	spin_lock(&cp->lock);
> +
>   	status = cpr16(IntrStatus);
>   	if (!status || (status == 0xFFFF))
> -		return IRQ_NONE;
> +		goto out_unlock;
> +
> +	handled = 1;
>
>   	netif_dbg(cp, intr, dev, "intr, status %04x cmd %02x cpcmd %04x\n",
>   		  status, cpr8(Cmd), cpr16(CpCmd));
>
>   	cpw16(IntrStatus, status & ~cp_rx_intr_mask);
>
> -	spin_lock(&cp->lock);
> -
>   	/* close possible race's with dev_close */
>   	if (unlikely(!netif_running(dev))) {
>   		cpw16(IntrMask, 0);
> -		spin_unlock(&cp->lock);
> -		return IRQ_HANDLED;
> +		goto out_unlock;
>   	}
>
>   	if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr))
> @@ -612,8 +614,6 @@ static irqreturn_t cp_interrupt (int irq, void *dev_instance)
>   	if (status & LinkChg)
>   		mii_check_media(&cp->mii_if, netif_msg_link(cp), false);
>
> -	spin_unlock(&cp->lock);
> -
>   	if (status & PciErr) {
>   		u16 pci_status;
>
> @@ -625,7 +625,10 @@ static irqreturn_t cp_interrupt (int irq, void *dev_instance)
>   		/* TODO: reset hardware */
>   	}
>
> -	return IRQ_HANDLED;
> +out_unlock:
> +	spin_unlock(&cp->lock);
> +
> +	return IRQ_RETVAL(handled);
>   }
>
>   #ifdef CONFIG_NET_POLL_CONTROLLER
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Francois,

Have incorporated this and test it on virtual hardware with on top of
  cb64edb6b89491edfdbae52ba7db9a8b8391d339 (my original work now 
upstream).  I plan to submit your work herein upstream as well, with 
attribution to you, if that's ok?


-- 
John Greene
jogreene@redhat.com

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2012-12-13 20:08 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel

A pile of fixes in response to yesterday's big merge.  The SCTP
HMAC thing hasn't been addressed yet, I'll take care of that
myself if Neil and Vlad don't show signs of life by tomorrow.

1) Use after free of SKB in tuntap code.  Fix by Eric Dumazet, reported
   by Dave Jones.

2) NFC LLCP code emits annoying kernel log message, triggerable
   by the user.  From Dave Jones.

3) Fix several endianness bugs noticed by sparse in the bridging code,
   from Stephen Hemminger.

4) Ipv6 NDISC code doesn't take padding into account properly, fix from
   YOSHIFUJI Hideaki.

5) Add missing docs to ethtool_flow_ext struct, from Yan Burman.

Please pull, thanks.

The following changes since commit 6be35c700f742e911ecedd07fcc43d4439922334:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next (2012-12-12 18:07:07 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to eca2a43bb0d2c6ebd528be6acb30a88435abe307:

  bridge: fix icmpv6 endian bug and other sparse warnings (2012-12-13 12:58:11 -0500)

----------------------------------------------------------------
Dave Jones (1):
      nfc: remove noisy message from llcp_sock_sendmsg

Eric Dumazet (1):
      tuntap: dont use skb after netif_rx_ni(skb)

YOSHIFUJI Hideaki / 吉藤英明 (1):
      ndisc: Fix padding error in link-layer address option.

Yan Burman (1):
      net: ethool: Document struct ethtool_flow_ext

stephen hemminger (1):
      bridge: fix icmpv6 endian bug and other sparse warnings

 drivers/net/tun.c            |  7 ++++---
 include/uapi/linux/ethtool.h | 16 ++++++++++++++--
 net/bridge/br_multicast.c    | 10 ++++++----
 net/ipv6/ndisc.c             |  2 +-
 net/nfc/llcp/sock.c          |  4 ----
 5 files changed, 25 insertions(+), 14 deletions(-)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox