* Re: Unable to flush ICMP redirect routes in kernel 3.0+
From: Flavio Leitner @ 2011-11-17 0:33 UTC (permalink / raw)
To: Ivan Zahariev; +Cc: netdev
In-Reply-To: <4EC439F2.3080809@icdsoft.com>
On Thu, 17 Nov 2011 00:32:18 +0200
Ivan Zahariev <famzah@icdsoft.com> wrote:
> On 11/15/2011 11:09 PM, Eric Dumazet wrote:
> > Le mardi 15 novembre 2011 à 22:23 +0200, Ivan Zahariev a écrit :
> >> Hello,
> >>
> >> We have changed nothing in our network infrastructure but only
> >> upgraded from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem
> >> we are experiencing:
> >>
> >> ICMP redirected routes are cached forever, and they can be cleared
> >> only by a reboot.
> >>
> ### (bug #1) even though we flushed the route cache, the <redirected>
> route resurrects from somewhere; even without making any TCP requests
> ### this time what "ip" returns is consistent with the real
> (incorrect) routing behavior of machine5
> root@machine5:~# ip route flush cache
> root@machine5:~# ip route list cache match 8.8.4.4
> root@machine5:~# ip route get 8.8.4.4
> 8.8.4.4 via 192.168.0.120 dev eth0 src 192.168.0.244
> cache <redirected> ipid 0x303a
>
> ### only a reboot clears the cached <redirected> routes
IIRC, the cache flush doesn't affect the inetpeer where the
redirected gateway is now stored, so even after flushing the
route cache, the inetpeer will restore the old info later.
fbl
^ permalink raw reply
* Re: [PATCH] Add ethtool to mii advertisment conversion helpers
From: Ben Hutchings @ 2011-11-17 0:34 UTC (permalink / raw)
To: Matt Carlson; +Cc: davem, netdev, Michael Chan
In-Reply-To: <1321394453-21076-1-git-send-email-mcarlson@broadcom.com>
On Tue, 2011-11-15 at 14:00 -0800, Matt Carlson wrote:
> Translating between ethtool advertisement settings and MII
> advertisements are common operations for ethernet drivers. This patch
> adds a set of helper functions that implements the conversion. The
> patch then modifies a couple of the drivers to use the new functions.
I like this, but:
[....]
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
[...]
> @@ -4903,23 +4880,19 @@ static int tg3_setup_fiber_mii_phy(struct tg3 *tp, int force_reset)
> (tp->phy_flags & TG3_PHYFLG_PARALLEL_DETECT)) {
> /* do nothing, just check for link up at the end */
> } else if (tp->link_config.autoneg == AUTONEG_ENABLE) {
> - u32 adv, new_adv;
> + u32 adv, newadv;
>
> err |= tg3_readphy(tp, MII_ADVERTISE, &adv);
> - new_adv = adv & ~(ADVERTISE_1000XFULL | ADVERTISE_1000XHALF |
> - ADVERTISE_1000XPAUSE |
> - ADVERTISE_1000XPSE_ASYM |
> - ADVERTISE_SLCT);
> -
> - new_adv |= tg3_advert_flowctrl_1000X(tp->link_config.flowctrl);
> + newadv = adv & ~(ADVERTISE_1000XFULL | ADVERTISE_1000XHALF |
> + ADVERTISE_1000XPAUSE |
> + ADVERTISE_1000XPSE_ASYM |
> + ADVERTISE_SLCT);
Not sure why you're renaming the variable here.
> - if (tp->link_config.advertising & ADVERTISED_1000baseT_Half)
> - new_adv |= ADVERTISE_1000XHALF;
> - if (tp->link_config.advertising & ADVERTISED_1000baseT_Full)
> - new_adv |= ADVERTISE_1000XFULL;
> + newadv |= tg3_advert_flowctrl_1000X(tp->link_config.flowctrl);
> + newadv |= ethtool_adv_to_mii_1000X(tp->link_config.advertising);
Doesn't this generate flow control advertising flags twice? Can the
link_config.flowctrl and link_config.advertising fields get out of
synch?
[...]
> --- a/include/linux/mii.h
> +++ b/include/linux/mii.h
> @@ -240,6 +240,171 @@ static inline unsigned int mii_duplex (unsigned int duplex_lock,
> }
>
> /**
> + * ethtool_adv_to_mii_100bt
> + * @ethadv: the ethtool advertisement settings
> + *
> + * A small helper function that translates ethtool advertisement
> + * settings to phy autonegotiation advertisements for the
> + * MII_ADVERTISE register.
> + */
> +static inline u32 ethtool_adv_to_mii_100bt(u32 ethadv)
[...]
> +static inline u32 mii_adv_to_ethtool_100bt(u32 adv)
[...]
> +static inline u32 ethtool_adv_to_mii_1000T(u32 ethadv)
[...]
> +static inline u32 mii_adv_to_ethtool_1000T(u32 adv)
[...]
> +#define mii_lpa_to_ethtool_100bt(lpa) mii_adv_to_ethtool_100bt(lpa)
Shouldn't this additionally translate LPA_LPACK into ADVERTISED_Autoneg?
[...]
> +static inline u32 mii_lpa_to_ethtool_1000T(u32 lpa)
[...]
> +static inline u32 ethtool_adv_to_mii_1000X(u32 ethadv)
[...]
> +static inline u32 mii_adv_to_ethtool_1000X(u32 adv)
[...]
I'm not convinced about the naming convention for these. Would it not
make more sense to name them consistently by register name and signal
type:
ethtool_adv_to_mii_adv_t
mii_adv_to_ethtool_adv_t
ethtool_adv_to_mii_ctrl1000_t
mii_ctrl1000_to_ethtool_adv_t
mii_lpa_to_ethtool_lpa_t
mii_stat1000_to_ethtool_lpa_t
ethtool_adv_to_mii_adv_x
mii_adv_to_ethtool_adv_x
Shouldn't there be mii_lpa_to_ethtool_1000X (or
mii_lpa_to_ethtool_lpa_x)?
Finally, do these need to be inline?
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [PATCH 2/2] net: verify GSO flag bits against netdev features
From: Ben Hutchings @ 2011-11-17 0:35 UTC (permalink / raw)
To: Michał Mirosław; +Cc: netdev, David S. Miller
In-Reply-To: <20111117001312.GB9596@rere.qmqm.pl>
On Thu, 2011-11-17 at 01:13 +0100, Michał Mirosław wrote:
> On Thu, Nov 17, 2011 at 12:09:56AM +0000, Ben Hutchings wrote:
> > On Thu, 2011-11-17 at 01:05 +0100, Michał Mirosław wrote:
> > > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > > ---
> > > include/linux/netdevice.h | 9 +++++++++
> > > 1 files changed, 9 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index b35ffd7..31da3bb 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -2492,6 +2492,15 @@ netdev_features_t netif_skb_features(struct sk_buff *skb);
> > > static inline int net_gso_ok(netdev_features_t features, int gso_type)
> > > {
> > > netdev_features_t feature = gso_type << NETIF_F_GSO_SHIFT;
> > > +
> > > + /* check flags correspondence */
> > > + BUILD_BUG_ON(SKB_GSO_TCPV4 != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
> > > + BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_UFO >> NETIF_F_GSO_SHIFT));
> > > + BUILD_BUG_ON(SKB_GSO_DODGY != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
> > > + BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
> > > + BUILD_BUG_ON(SKB_GSO_TCPV6 != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
> > > + BUILD_BUG_ON(SKB_GSO_FCOE != (NETIF_F_FSO >> NETIF_F_GSO_SHIFT));
> > > +
> > > return (features & feature) == feature;
> > > }
> > This is fine but should still be done at the same time as changing the
> > definitions.
>
> Agreed. But as Dave was quicker, we need to fix this in separate patch.
Sorry, I somehow failed to spot that.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: 3.1.0 rtl8169_open oops.
From: Francois Romieu @ 2011-11-17 0:34 UTC (permalink / raw)
To: Dave Jones; +Cc: netdev, kernel-team, hayeswang
In-Reply-To: <20111115223701.GA31118@redhat.com>
Dave Jones <davej@redhat.com> :
[...]
> (gdb) list *(rtl8169_open)+0x34e
> 0x4c0f is in rtl8169_open (drivers/net/r8169.c:1881).
> 1876 bool rc = false;
> 1877
> 1878 if (fw->size < FW_OPCODE_SIZE)
> 1879 goto out;
> 1880
> 1881 if (!fw_info->magic) {
> 1882 size_t i, size, start;
> 1883 u8 checksum = 0;
> 1884
> 1885 if (fw->size < sizeof(*fw_info))
Give me 24h. I hope it is not as trivial as it seems :o(
Btw the firmware included in F16 is not up-to-date (see
http://marc.info/?l=linux-kernel&m=131295492507686&w=2). The driver should
not mind though.
--
Ueimor
^ permalink raw reply
* Re: 3.1.0 rtl8169_open oops.
From: Dave Jones @ 2011-11-17 0:53 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev, kernel-team, hayeswang, dwmw2
In-Reply-To: <20111117003436.GA3097@electric-eye.fr.zoreil.com>
On Thu, Nov 17, 2011 at 01:34:36AM +0100, Francois Romieu wrote:
> Dave Jones <davej@redhat.com> :
> [...]
> > (gdb) list *(rtl8169_open)+0x34e
> > 0x4c0f is in rtl8169_open (drivers/net/r8169.c:1881).
> > 1876 bool rc = false;
> > 1877
> > 1878 if (fw->size < FW_OPCODE_SIZE)
> > 1879 goto out;
> > 1880
> > 1881 if (!fw_info->magic) {
> > 1882 size_t i, size, start;
> > 1883 u8 checksum = 0;
> > 1884
> > 1885 if (fw->size < sizeof(*fw_info))
>
> Give me 24h. I hope it is not as trivial as it seems :o(
>
> Btw the firmware included in F16 is not up-to-date (see
> http://marc.info/?l=linux-kernel&m=131295492507686&w=2). The driver should
> not mind though.
That reminds me. Since the kernel.org compromise, where does
linux-firmware live now ?
That's probably why the version in Fedora hasn't been updated since July at a guess.
Dave
^ permalink raw reply
* RE: [PATCH net-next 3/5] bna: Convert MAC_ADDRLEN uses to ETH_ALEN
From: Rasesh Mody @ 2011-11-17 0:56 UTC (permalink / raw)
To: Joe Perches; +Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <8a6f9ad67a3d580565cafb7a44cec0165831c35d.1321472143.git.joe@perches.com>
>From: Joe Perches [mailto:joe@perches.com]
>Sent: Wednesday, November 16, 2011 11:38 AM
>
>Reduce the number of #defines, use the normal #define from if_ether.h
Patch looks fine.
Thanks,
Rasesh
Acked-by: Rasesh Mody <rmody@brocade.com>
^ permalink raw reply
* Re: under-performing bonded interfaces
From: Ben Hutchings @ 2011-11-17 0:57 UTC (permalink / raw)
To: Simon Chen; +Cc: Ben Greear, netdev
In-Reply-To: <CANj2EbfrtbysEbj+gMa+fOhWo0EWjBXVyGkw8WyiPAV9nPduuQ@mail.gmail.com>
On Wed, 2011-11-16 at 19:05 -0500, Simon Chen wrote:
> If used independently, I can get around 9.8Gbps.
[...]
> [ 11.572861] ixgbe 0000:03:00.0: (PCI Express:5.0Gb/s:Width x4) e8:9a:8f:23:42:1a
[...]
You need a PCIe 2.0 x8 slot to run 2 10 Gb ports at full speed. An x4
slot is only good for about 12-13 Gb aggregate throughput (dependent on
packet sizes and other details).
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [PATCH] Add ethtool to mii advertisment conversion helpers
From: Matt Carlson @ 2011-11-17 1:16 UTC (permalink / raw)
To: Ben Hutchings
Cc: Matthew Carlson, davem@davemloft.net, netdev@vger.kernel.org,
Michael Chan
In-Reply-To: <1321490078.2709.86.camel@bwh-desktop>
On Wed, Nov 16, 2011 at 04:34:37PM -0800, Ben Hutchings wrote:
> On Tue, 2011-11-15 at 14:00 -0800, Matt Carlson wrote:
> > Translating between ethtool advertisement settings and MII
> > advertisements are common operations for ethernet drivers. This patch
> > adds a set of helper functions that implements the conversion. The
> > patch then modifies a couple of the drivers to use the new functions.
>
> I like this, but:
>
> [....]
> > --- a/drivers/net/ethernet/broadcom/tg3.c
> > +++ b/drivers/net/ethernet/broadcom/tg3.c
> [...]
> > @@ -4903,23 +4880,19 @@ static int tg3_setup_fiber_mii_phy(struct tg3 *tp, int force_reset)
> > (tp->phy_flags & TG3_PHYFLG_PARALLEL_DETECT)) {
> > /* do nothing, just check for link up at the end */
> > } else if (tp->link_config.autoneg == AUTONEG_ENABLE) {
> > - u32 adv, new_adv;
> > + u32 adv, newadv;
> >
> > err |= tg3_readphy(tp, MII_ADVERTISE, &adv);
> > - new_adv = adv & ~(ADVERTISE_1000XFULL | ADVERTISE_1000XHALF |
> > - ADVERTISE_1000XPAUSE |
> > - ADVERTISE_1000XPSE_ASYM |
> > - ADVERTISE_SLCT);
> > -
> > - new_adv |= tg3_advert_flowctrl_1000X(tp->link_config.flowctrl);
> > + newadv = adv & ~(ADVERTISE_1000XFULL | ADVERTISE_1000XHALF |
> > + ADVERTISE_1000XPAUSE |
> > + ADVERTISE_1000XPSE_ASYM |
> > + ADVERTISE_SLCT);
>
> Not sure why you're renaming the variable here.
80 column character violations. By eliminating the underscore, I can
*just* make it fit.
> > - if (tp->link_config.advertising & ADVERTISED_1000baseT_Half)
> > - new_adv |= ADVERTISE_1000XHALF;
> > - if (tp->link_config.advertising & ADVERTISED_1000baseT_Full)
> > - new_adv |= ADVERTISE_1000XFULL;
> > + newadv |= tg3_advert_flowctrl_1000X(tp->link_config.flowctrl);
> > + newadv |= ethtool_adv_to_mii_1000X(tp->link_config.advertising);
>
> Doesn't this generate flow control advertising flags twice? Can the
> link_config.flowctrl and link_config.advertising fields get out of
> synch?
The tg3 driver doesn't store flow control settings through the advertising
field. I thought about changing the driver in that direction and
eliminating the tp->link_config.flowctrl field. That would be a
separate patch though.
> [...]
> > --- a/include/linux/mii.h
> > +++ b/include/linux/mii.h
> > @@ -240,6 +240,171 @@ static inline unsigned int mii_duplex (unsigned int duplex_lock,
> > }
> >
> > /**
> > + * ethtool_adv_to_mii_100bt
> > + * @ethadv: the ethtool advertisement settings
> > + *
> > + * A small helper function that translates ethtool advertisement
> > + * settings to phy autonegotiation advertisements for the
> > + * MII_ADVERTISE register.
> > + */
> > +static inline u32 ethtool_adv_to_mii_100bt(u32 ethadv)
> [...]
> > +static inline u32 mii_adv_to_ethtool_100bt(u32 adv)
> [...]
> > +static inline u32 ethtool_adv_to_mii_1000T(u32 ethadv)
> [...]
> > +static inline u32 mii_adv_to_ethtool_1000T(u32 adv)
> [...]
> > +#define mii_lpa_to_ethtool_100bt(lpa) mii_adv_to_ethtool_100bt(lpa)
>
> Shouldn't this additionally translate LPA_LPACK into ADVERTISED_Autoneg?
You mean, like this?
static inline u32 mii_lpa_to_ethtool_100bt(u32 lpa)
{
u32 result = 0;
if (lpa & LPA_LPACK)
result |= ADVERTISED_Autoneg;
return result | mii_adv_to_ethtool_100bt(lpa);
}
Yes, that looks like a better implementation.
> [...]
> > +static inline u32 mii_lpa_to_ethtool_1000T(u32 lpa)
> [...]
> > +static inline u32 ethtool_adv_to_mii_1000X(u32 ethadv)
> [...]
> > +static inline u32 mii_adv_to_ethtool_1000X(u32 adv)
> [...]
>
> I'm not convinced about the naming convention for these. Would it not
> make more sense to name them consistently by register name and signal
> type:
>
> ethtool_adv_to_mii_adv_t
> mii_adv_to_ethtool_adv_t
> ethtool_adv_to_mii_ctrl1000_t
> mii_ctrl1000_to_ethtool_adv_t
> mii_lpa_to_ethtool_lpa_t
> mii_stat1000_to_ethtool_lpa_t
> ethtool_adv_to_mii_adv_x
> mii_adv_to_ethtool_adv_x
I don't have a strong preference either way. I'll post the change along
with the above modification.
> Shouldn't there be mii_lpa_to_ethtool_1000X (or
> mii_lpa_to_ethtool_lpa_x)?
Yes. You're right. Should it just be a preprocessor definition that
points to mii_adv_to_ethtool_1000X()?
> Finally, do these need to be inline?
I don't have a strong preference here either. Phy code tends to be
slower, so there isn't really a strong performance argument. The
implementations don't seem to be so large to argue against it though.
Would you prefer they not be inlined?
^ permalink raw reply
* Re: [PATCH] bonding: Don't allow mode change via sysfs with slaves present
From: Flavio Leitner @ 2011-11-17 1:16 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Nicolas de Pesloüan, Veaceslav Falico, netdev, Jay Vosburgh,
debian-kernel
In-Reply-To: <20111116220200.GF25132@gospo.rdu.redhat.com>
On Wed, 16 Nov 2011 17:02:00 -0500
Andy Gospodarek <andy@greyhouse.net> wrote:
> On Wed, Nov 16, 2011 at 01:02:21PM +0100, Nicolas de Pesloüan wrote:
> > Le 15/11/2011 21:47, Andy Gospodarek a écrit :
> >> Nicolas,
> >>
> >> I took a look at the ifenslave package for debian more closely and
> >> it actually looks like devices are enslaved last, after mode is
> >> set. Can you please take a look at this package and confirm what
> >> I'm seeing in the 'pre-up' script.
> >>
> >> It appears to me that setup_master sets the mode and
> >> enslave_slaves is called after and enslaves the devices:
> >>
> >> # Option slaves deprecated, replaced by bond-slaves, but still
> >> supported # for backward compatibility.
> >> IF_BOND_SLAVES=${IF_BOND_SLAVES:-$IF_SLAVES}
> >>
> >> if [ "$IF_BOND_MASTER" ] ; then
> >> BOND_MASTER="$IF_BOND_MASTER"
> >> BOND_SLAVES="$IFACE"
> >> else
> >> if [ "$IF_BOND_SLAVES" ] ; then
> >> BOND_MASTER="$IFACE"
> >> BOND_SLAVES="$IF_BOND_SLAVES"
> >> fi
> >> fi
> >>
> >> # Exit if nothing to do...
> >> [ -z "$BOND_MASTER$BOND_SLAVES" ]&& exit
> >>
> >> add_master
> >> early_setup_master
> >> setup_master
> >> enslave_slaves
> >> exit 0
> >
> > Andy,
> >
> > I'm really surprise by this extract. In the most up to date version
> > of the ifenslave-2.6 package (1.1.0-19), the order is:
> >
> > add_master
> > early_setup_master
> > enslave_slaves
> > setup_master
> >
> > early_setup_master was created to be able to do things that
> > absolutely need to be done before enslavement. (See the comment
> > just before this function). The idea was to do every possible setup
> > in setup_master, after enslavement, except those that need to be
> > done in early_setup_master. So having enslave_slaves after
> > setup_master instead of before is definitely a mistake. Some of the
> > operations in setup_master must be done after enslavement, in
> > particular selecting the primary slave.
> >
> > In version 1.1.0-18 (change log below), I checked all the possible
> > order constraints of the sysfs interface and totally reworked most
> > of the setup code, putting everything in the right order to achieve
> > consistent results.
> >
> > ifenslave-2.6 (1.1.0-18) experimental; urgency=low
> >
> > * Apply patch from Nicolas de Pesloüan:
> >
> > - Major change: Check and fix the order in which the
> > configuration is written into /sys files, to ensure reliable
> > results, according to the tests done in the kernel (in
> > drivers/net/bonding/bond_sysfs.c).
> > - Ensure that master is properly brought down when
> > changing a parameter that require it to be down.
> > - Ensure the master is brought up at the end of the
> > setup, in the case where ifup won't bring it up because it is
> > currently configuring a slave.
> > - Add support for some previously unsupported /sys
> > files: ad_select, num_grat_arp, num_unsol_na, primary_reselect and
> > queue_id.
> > - Enhance the documentation (README.Debian), to
> > describe all the currently supported bond-* options.
> > - Many other changes in the documentation.
> > - Reverse the order of the arguments to most sysfs_*
> > internal functions, for better readability.
> >
> > It was a hard work, because there really exist many such
> > constraints. I fail to find enough time to insert the result of
> > this work into Documentation/networking/bonding.txt, but still plan
> > to do so, even if the result is documented in the script you looked
> > at.
> >
> > Of course, it is possible to change the scripts in ifenslave-2.6
> > package to arrange for one more constraint. For as far as I
> > understand, this would cause the Debian kernel that introduce the
> > change we discuss about and all the future Debian kernels to be
> > flagged with 'Breaks: ifenslave-2.6 (<< 1.1.0-20)'. I'm not really
> > comfortable with this and the Debian kernel team need to be
> > involved. I copied them.
> >
> > All that being said, I really advocate for less constraints on the
> > sysfs setup. This is not strictly related to sysfs setup. If we
> > eventually add a NETLINK interface for all the things we can setup
> > using sysfs, we will face the exact same problem.
>
>
> I was looking at ifenslave 1.1.0-20. If you look at Debian bug
> #641250 you will see a very similar report to what prompted Veaceslav
> to come up with this patch and post it here.
>
> ifenslave-2.6 (1.1.0-20) unstable; urgency=low
>
> * Use dashes consistently for bonding options in README.Debian.
> Closes: #639244
> * Enslave slaves only after fully setting up the master. Closes:
> #641250
>
[...]
> > I perfectly understand, as Veaceslav noted in another email that
> > there are a lot of mode-specific initialization stuff that's
> > present only in bond_enslave(), but I think this is what needs to
> > be fixed... Those initialization stuff should be moved out of
> > bond_enslave() and called at appropriate sime, from bond_enslave()
> > and from other locations, in particular when changing mode.
>
> I think Veaceslav is working on this, but there is significant
> re-organization that is needed to make it work properly and make sure
> it is tested. I could be wrong about how long it will take him, but
> to test it properly it will take some time.
Indeed.
> Since this problem seems like a pretty major problem and now Debian,
> Fedora, RHEL, and Ubuntu all seem to have proper initialization
> scripts to handle it, I stand behind my original ACK.
I agree. I think allowing to change the mode after enslavement can
cause unpredictable issues and the follow-up patch will need some
careful work and testing, so you have my ACK here as well.
fbl
^ permalink raw reply
* [PATCH FIX net v1] net-e1000e: Fix default interrupt throttle rate not set in NIC HW
From: David Decotigny @ 2011-11-17 1:18 UTC (permalink / raw)
To: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny,
Don
Cc: Paul Gortmaker, David Decotigny, Ying Cai
In-Reply-To: <cover.1321492611.git.david.decotigny@google.com>
From: Ying Cai <ycai@google.com>
This change ensures that the itr/itr_setting adjustment logic is used,
even for the default/compiled-in value.
Context:
When we changed the default InterruptThrottleRate value from default
(3 = dynamic mode) to 8000 for example, only adapter->itr_setting
(which controls interrupt coalescing mode) was set to 8000, but
adapter->itr (which controls the value set in NIC register) was not
updated accordingly. So from ethtool, it seemed the interrupt
throttling is enabled at 8000 intr/s, but the NIC actually was
running in dynamic mode which has lower CPU efficiency especially
when throughput is not high.
Signed-off-by: David Decotigny <david.decotigny@google.com>
---
drivers/net/ethernet/intel/e1000e/param.c | 98 +++++++++++++++-------------
1 files changed, 52 insertions(+), 46 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000e/param.c b/drivers/net/ethernet/intel/e1000e/param.c
index 20e93b0..41937e5 100644
--- a/drivers/net/ethernet/intel/e1000e/param.c
+++ b/drivers/net/ethernet/intel/e1000e/param.c
@@ -106,7 +106,7 @@ E1000_PARAM(RxAbsIntDelay, "Receive Absolute Interrupt Delay");
/*
* Interrupt Throttle Rate (interrupts/sec)
*
- * Valid Range: 100-100000 (0=off, 1=dynamic, 3=dynamic conservative)
+ * Valid Range: 100-100000 or one of: 0=off, 1=dynamic, 3=dynamic conservative
*/
E1000_PARAM(InterruptThrottleRate, "Interrupt Throttling Rate");
#define DEFAULT_ITR 3
@@ -335,53 +335,59 @@ void __devinit e1000e_check_options(struct e1000_adapter *adapter)
if (num_InterruptThrottleRate > bd) {
adapter->itr = InterruptThrottleRate[bd];
- switch (adapter->itr) {
- case 0:
- e_info("%s turned off\n", opt.name);
- break;
- case 1:
- e_info("%s set to dynamic mode\n", opt.name);
- adapter->itr_setting = adapter->itr;
- adapter->itr = 20000;
- break;
- case 3:
- e_info("%s set to dynamic conservative mode\n",
- opt.name);
- adapter->itr_setting = adapter->itr;
- adapter->itr = 20000;
- break;
- case 4:
- e_info("%s set to simplified (2000-8000 ints) "
- "mode\n", opt.name);
- adapter->itr_setting = 4;
- break;
- default:
- /*
- * Save the setting, because the dynamic bits
- * change itr.
- */
- if (e1000_validate_option(&adapter->itr, &opt,
- adapter) &&
- (adapter->itr == 3)) {
- /*
- * In case of invalid user value,
- * default to conservative mode.
- */
- adapter->itr_setting = adapter->itr;
- adapter->itr = 20000;
- } else {
- /*
- * Clear the lower two bits because
- * they are used as control.
- */
- adapter->itr_setting =
- adapter->itr & ~3;
- }
- break;
- }
+
+ /* Make sure a message is printed for
+ * non-special values. And in case of an
+ * invalid option, display warning, use
+ * default and go through itr/itr_setting
+ * adjustment logic below */
+ if ((adapter->itr < 0 || adapter->itr > 4)
+ && e1000_validate_option(&adapter->itr, &opt,
+ adapter))
+ adapter->itr = opt.def;
} else {
- adapter->itr_setting = opt.def;
+ /* if no option specified, use default value
+ and go through the logic below to adjust
+ itr/itr_setting */
+ adapter->itr = opt.def;
+
+ /* Make sure a message is printed for
+ * non-special default values */
+ if (adapter->itr < 0 || adapter->itr > 4)
+ e_info("%s set to default %d\n",
+ opt.name, adapter->itr);
+ }
+
+
+
+ adapter->itr_setting = adapter->itr;
+ switch (adapter->itr) {
+ case 0:
+ e_info("%s turned off\n", opt.name);
+ break;
+ case 1:
+ e_info("%s set to dynamic mode\n", opt.name);
+ adapter->itr = 20000;
+ break;
+ case 3:
+ e_info("%s set to dynamic conservative mode\n",
+ opt.name);
adapter->itr = 20000;
+ break;
+ case 4:
+ e_info("%s set to simplified (2000-8000 ints) "
+ "mode\n", opt.name);
+ break;
+ default:
+ /*
+ * Save the setting, because the dynamic bits
+ * change itr.
+ *
+ * Clear the lower two bits because
+ * they are used as control.
+ */
+ adapter->itr_setting &= ~3;
+ break;
}
}
{ /* Interrupt Mode */
--
1.7.3.1
^ permalink raw reply related
* Re: [PATCH V4 00/14] add clk_prepare/clk_unprepare to imx drivers
From: Richard Zhao @ 2011-11-17 1:22 UTC (permalink / raw)
To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-i2c-u79uwXL29TY76Z2rM5mHXA,
linux-mmc-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-serial-u79uwXL29TY76Z2rM5mHXA
Cc: amit.kucheria-Z7WLFzj8eWMS+FvcfC7Uqw,
linux-lFZ/pmaqli7XmaaqVzeoHQ, kernel-bIcnvbaLZ9MEGnE8C9+IrQ,
eric.miao-QSEj5FYQhm4dnm+yROfE0A,
ben-linux-elnMNo+KYs3YtjvyW6yDsg, cjb-2X9k7bc8m7Mdnm+yROfE0A,
alan-VuQAYsv1563Yd54FQh9/CA
In-Reply-To: <1321339689-25256-1-git-send-email-richard.zhao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Hi Sascha,
Would you merge this patch series?
Thanks
Richard
^ permalink raw reply
* Re: [PATCH 1/5] tcp: make is_dupack a parameter to tcp_fastretrans_alert()
From: Ilpo Järvinen @ 2011-11-17 1:23 UTC (permalink / raw)
To: Neal Cardwell
Cc: David Miller, Netdev, Ilpo Järvinen, Nandita Dukkipati,
Yuchung Cheng, Tom Herbert
In-Reply-To: <1321469885-10885-1-git-send-email-ncardwell@google.com>
[-- Attachment #1: Type: TEXT/PLAIN, Size: 430 bytes --]
On Wed, 16 Nov 2011, Neal Cardwell wrote:
> Allow callers to decide whether an ACK is a duplicate ACK. This is a
> prerequisite to allowing fastretrans_alert to be called from new
> contexts, such as the no_queue and old_ack code paths, from which we
> have extra info that tells us whether an ACK is a dupack.
>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
--
i.
^ permalink raw reply
* Re: [PATCH] Add ethtool to mii advertisment conversion helpers
From: Ben Hutchings @ 2011-11-17 1:29 UTC (permalink / raw)
To: Matt Carlson; +Cc: davem@davemloft.net, netdev@vger.kernel.org, Michael Chan
In-Reply-To: <20111117011604.GA8683@mcarlson.broadcom.com>
On Wed, 2011-11-16 at 17:16 -0800, Matt Carlson wrote:
> On Wed, Nov 16, 2011 at 04:34:37PM -0800, Ben Hutchings wrote:
[...]
> > > +#define mii_lpa_to_ethtool_100bt(lpa) mii_adv_to_ethtool_100bt(lpa)
> >
> > Shouldn't this additionally translate LPA_LPACK into ADVERTISED_Autoneg?
>
> You mean, like this?
>
> static inline u32 mii_lpa_to_ethtool_100bt(u32 lpa)
> {
> u32 result = 0;
>
> if (lpa & LPA_LPACK)
> result |= ADVERTISED_Autoneg;
>
> return result | mii_adv_to_ethtool_100bt(lpa);
> }
>
> Yes, that looks like a better implementation.
Think so.
And I think the mii_adv_to_ethtool_* functions should add
ADVERTISED_Autoneg unconditionally. But I'm not entirely sure that's
right.
> > [...]
> > > +static inline u32 mii_lpa_to_ethtool_1000T(u32 lpa)
> > [...]
> > > +static inline u32 ethtool_adv_to_mii_1000X(u32 ethadv)
> > [...]
> > > +static inline u32 mii_adv_to_ethtool_1000X(u32 adv)
> > [...]
> >
> > I'm not convinced about the naming convention for these. Would it not
> > make more sense to name them consistently by register name and signal
> > type:
> >
> > ethtool_adv_to_mii_adv_t
> > mii_adv_to_ethtool_adv_t
> > ethtool_adv_to_mii_ctrl1000_t
> > mii_ctrl1000_to_ethtool_adv_t
> > mii_lpa_to_ethtool_lpa_t
> > mii_stat1000_to_ethtool_lpa_t
> > ethtool_adv_to_mii_adv_x
> > mii_adv_to_ethtool_adv_x
>
> I don't have a strong preference either way. I'll post the change along
> with the above modification.
>
> > Shouldn't there be mii_lpa_to_ethtool_1000X (or
> > mii_lpa_to_ethtool_lpa_x)?
>
> Yes. You're right. Should it just be a preprocessor definition that
> points to mii_adv_to_ethtool_1000X()?
I think that would need to handle LPA_LPACK as well.
> > Finally, do these need to be inline?
>
> I don't have a strong preference here either. Phy code tends to be
> slower, so there isn't really a strong performance argument. The
> implementations don't seem to be so large to argue against it though.
> Would you prefer they not be inlined?
I suppose one of us should measure what difference it makes to code
size.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [PATCH] Add ethtool to mii advertisment conversion helpers
From: Michael Chan @ 2011-11-17 1:21 UTC (permalink / raw)
To: Matt Carlson; +Cc: Ben Hutchings, davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <20111117011604.GA8683@mcarlson.broadcom.com>
On Wed, 2011-11-16 at 17:16 -0800, Matt Carlson wrote:
> > Finally, do these need to be inline?
>
> I don't have a strong preference here either. Phy code tends to be
> slower, so there isn't really a strong performance argument. The
> implementations don't seem to be so large to argue against it though.
> Would you prefer they not be inlined?
>
Since we are defining these in .h file, they need to be inline, right?
Otherwise multiple source files including the same .h file will have
conflict.
^ permalink raw reply
* Re: [PATCH 2/5] tcp: use DSACKs that arrive when packets_out is 0
From: Ilpo Järvinen @ 2011-11-17 1:34 UTC (permalink / raw)
To: Neal Cardwell
Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng,
Tom Herbert
In-Reply-To: <1321469885-10885-2-git-send-email-ncardwell@google.com>
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1345 bytes --]
On Wed, 16 Nov 2011, Neal Cardwell wrote:
> The bug: Senders ignored DSACKs after recovery when there were no
> outstanding packets (a common scenario for HTTP servers).
>
> The change: when there are no outstanding packets (the "no_queue" goto
> label), call tcp_fastretrans_alert() in order to use DSACKs to undo
> congestion window reductions.
>
> Other patches in this series will provide other changes that are
> necessary to fully fix this problem.
>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> ---
> net/ipv4/tcp_input.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index f772aaa..b49e418 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3788,6 +3788,10 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
> return 1;
>
> no_queue:
> + /* If data was DSACKed, see if we can undo a cwnd reduction. */
> + if (flag & FLAG_DSACKING_ACK)
> + tcp_fastretrans_alert(sk, pkts_acked, newly_acked_sacked,
> + is_dupack, flag);
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
...however, there could be other undo related gotchas now added by this
patch due to complexity of tcp_fastretrans_alert. It would probably be
safer to just add the dsack undo related code here?
--
i.
^ permalink raw reply
* Re: [PATCH] r6040: fix check against MCRO_HASHEN bit in r6040_multicast_list
From: David Miller @ 2011-11-17 1:35 UTC (permalink / raw)
To: florian; +Cc: netdev
In-Reply-To: <1321459208-28463-1-git-send-email-florian@openwrt.org>
From: Florian Fainelli <florian@openwrt.org>
Date: Wed, 16 Nov 2011 17:00:08 +0100
> We are checking whether the MCR0_HASHEN bit is set using a logical and
> instead of bitwise and, fix that.
>
> Signed-off-by: Florian Fainelli <florian@openwrt.org>
Applied, thank you.
^ permalink raw reply
* Re: [PATCH net-next 0/5] Use ETH_ALEN
From: David Miller @ 2011-11-17 1:35 UTC (permalink / raw)
To: joe; +Cc: netdev, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1321472142.git.joe@perches.com>
From: Joe Perches <joe@perches.com>
Date: Wed, 16 Nov 2011 11:38:01 -0800
> Remove other #defines and uses in favor of ETH_ALEN
>
> Joe Perches (5):
> ethernet: Convert MAC_ADDR_LEN uses to ETH_ALEN
> ethernet: Convert ETHER_ADDR_LEN uses to ETH_ALEN
> bna: Convert MAC_ADDRLEN uses to ETH_ALEN
> amd8111e: Convert ETH_ADDR_LEN uses to ETH_ALEN
> ucc_geth: Convert ENET_NUM_OCTETS_PER_ADDRESS uses to ETH_ALEN
All applied, thanks Joe.
^ permalink raw reply
* Re: [patch net-next 1/3 v2] team: Do not hold rcu_read_lock when running netlink cmds
From: David Miller @ 2011-11-17 1:36 UTC (permalink / raw)
To: jpirko
Cc: eric.dumazet, netdev, bhutchings, shemminger, andy, fbl, jzupka,
ivecera
In-Reply-To: <20111116215554.GB9631@minipsycho>
From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 16 Nov 2011 22:55:54 +0100
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Applied.
^ permalink raw reply
* Re: [patch net-next 2/3] team: convert overall spinlock to mutex
From: David Miller @ 2011-11-17 1:36 UTC (permalink / raw)
To: jpirko
Cc: netdev, eric.dumazet, bhutchings, shemminger, andy, fbl, jzupka,
ivecera
In-Reply-To: <1321477749-1877-3-git-send-email-jpirko@redhat.com>
From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 16 Nov 2011 22:09:08 +0100
> No need to have spinlock for this purpose. So convert this to mutex and
> avoid current schedule while atomic problems in netlink code.
>
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Applied.
^ permalink raw reply
* Re: [patch net-next 3/3] team: replicate options on register
From: David Miller @ 2011-11-17 1:36 UTC (permalink / raw)
To: jpirko
Cc: netdev, eric.dumazet, bhutchings, shemminger, andy, fbl, jzupka,
ivecera
In-Reply-To: <1321477749-1877-4-git-send-email-jpirko@redhat.com>
From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 16 Nov 2011 22:09:09 +0100
> Since multiple team instances are putting defined options into their
> option list, during register each option must be cloned before added
> into list. This resolves uncool memory corruptions when using multiple
> teams.
>
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Applied.
^ permalink raw reply
* Re: [PATCH 0/6] sky2 patches for net-next
From: David Miller @ 2011-11-17 1:36 UTC (permalink / raw)
To: shemminger; +Cc: netdev
In-Reply-To: <20111116234254.319625694@vyatta.com>
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 16 Nov 2011 15:42:54 -0800
> This includes fix for recent regression and other bug fixes (in order
> of decreasing importance). Only the first one is critical to get
> into 3.2.
All applied to 'net', thanks.
^ permalink raw reply
* Re: under-performing bonded interfaces
From: Simon Chen @ 2011-11-17 1:38 UTC (permalink / raw)
To: Ben Hutchings; +Cc: Ben Greear, netdev
In-Reply-To: <1321491449.2709.90.camel@bwh-desktop>
Thanks, Ben. That's good discovery...
Are you saying that both 10G NICs are on the same PCIe x4 slot, so
that they're subject to the 12G throughput bottleneck?
I'm gonna verify this with the hardware vendor...
Thanks.
-Simon
On Wed, Nov 16, 2011 at 7:57 PM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Wed, 2011-11-16 at 19:05 -0500, Simon Chen wrote:
>> If used independently, I can get around 9.8Gbps.
> [...]
>> [ 11.572861] ixgbe 0000:03:00.0: (PCI Express:5.0Gb/s:Width x4) e8:9a:8f:23:42:1a
> [...]
>
> You need a PCIe 2.0 x8 slot to run 2 10 Gb ports at full speed. An x4
> slot is only good for about 12-13 Gb aggregate throughput (dependent on
> packet sizes and other details).
>
> Ben.
>
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>
>
^ permalink raw reply
* Re: [PATCH] Add ethtool to mii advertisment conversion helpers
From: David Miller @ 2011-11-17 1:38 UTC (permalink / raw)
To: mchan; +Cc: mcarlson, bhutchings, netdev
In-Reply-To: <1321492892.9114.22.camel@nseg_linux_HP1.broadcom.com>
From: "Michael Chan" <mchan@broadcom.com>
Date: Wed, 16 Nov 2011 17:21:32 -0800
>
> On Wed, 2011-11-16 at 17:16 -0800, Matt Carlson wrote:
>> > Finally, do these need to be inline?
>>
>> I don't have a strong preference here either. Phy code tends to be
>> slower, so there isn't really a strong performance argument. The
>> implementations don't seem to be so large to argue against it though.
>> Would you prefer they not be inlined?
>>
>
> Since we are defining these in .h file, they need to be inline, right?
> Otherwise multiple source files including the same .h file will have
> conflict.
Yes, if you keep them in the header you have to keep them inline.
Ben, by suggesting to not inline them, is implicitly saying to put
them out in a seperate *.c file somewhere, perhaps net/core/ethtool.c
or similar. With appropriate EXPORT_SYMBOL() added.
^ permalink raw reply
* Re: [PATCH] route: add more relaxed option for secure_redirects
From: Flavio Leitner @ 2011-11-17 1:40 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, netdev
In-Reply-To: <20111116211738.067354c0@asterix.rh>
On Wed, 16 Nov 2011 21:17:38 -0200
Flavio Leitner <fbl@redhat.com> wrote:
> On Wed, 16 Nov 2011 17:02:13 -0500 (EST)
> David Miller <davem@davemloft.net> wrote:
>
> > From: Flavio Leitner <fbl@redhat.com>
> > Date: Wed, 16 Nov 2011 18:46:12 -0200
> >
> > > Thus, the only option at the sender side would be using iptables
> > > to change the ICMP redirect source address to be the float
> > > address, but that is not working as well. (It isn't passing
> > > through -t nat)
> >
> > If it's going to mangle the packet in one direct, the only option
> > for sane operation is to make the exact reverse transformation in
> > the other direction for ICMP messages.
> >
> > I'm sorry to be so difficult about this, but this is the only way to
> > handle this problem. If packet mangling is performed to change the
> > world, that mangling entity has taken on the responsibility to make
> > everything look correct to all entities for the mangled packets
> > and any packets generated in response to such mangled packets.
> >
>
> I'm sorry, I lost you there. There is no transformation happening in
> any side. The iptables is just a work around to force the outgoing
> ICMP redirect to use the correct source address (secondary or alias).
>
> The whole problem is the linux gateway sending ICMP redirects using
> *always* the primary address.
>
To make sure we are in the same page, this simple setup reproduces
the issue.
IP: 10.0.0.1
gw: 10.0.0.100
+--------+ +-----+ primary: 10.0.0.2
| client |----+-----| GW1 | alias: 10.0.0.100
+--------+ | +-----+ gw: 10.0.0.254
+--+--+
| GW2 |---> internet
+-----+
10.0.0.254
1. Client sends TCP SYN to an internet host using
GW1 alias address as default gw address
2. Then GW1 sends the ICMP redirect back to client
using the primary address as source address.
3. GW1 forwards the original packet to GW2
4. client ignores the ICMP redirect because
client.gw != gw1.primary.
Regards,
fbl
^ permalink raw reply
* Re: [PATCH 1/2] net: drivers: use bool type instead of double negation
From: David Miller @ 2011-11-17 1:40 UTC (permalink / raw)
To: mirq-linux; +Cc: netdev, bhutchings
In-Reply-To: <2ec988cd27c6eb5162ea2e41416edcaae207bb5e.1321488025.git.mirq-linux@rere.qmqm.pl>
From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Thu, 17 Nov 2011 01:05:33 +0100 (CET)
> Save some punctuation by using bool type's property equivalent to
> doubled negation operator.
>
> Reported-by: Ben Hutchings <bhutchings@solarflare.com>
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Applied.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox