Netdev List


* Re: [PATCH net-next-2.6] bonding: remove useless assignment
From: David Miller @ 2009-10-07  8:05 UTC (permalink / raw)
  To: nicolas.2p.debian; +Cc: netdev, fubar, bonding-devel
In-Reply-To: <4AB67242.6060803@free.fr>

From: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Date: Sun, 20 Sep 2009 20:19:46 +0200

> The variable old_active is first set to bond->curr_active_slave.
> Then, it is unconditionally set to new_active, without being used in
> between.
> 
> The first assignment, having no side effect, is useless.
> 
> Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>

Nicolas, all of your patches are corrupted by your email client.
It breaks up long lines and transforms tab characters into spaces.

Please correct this, and resubmit all of your pending patches.

Thank you.

* Re: [PATCH] add vif using local interface index instead of IP
From: David Miller @ 2009-10-07  8:23 UTC (permalink / raw)
  To: mail4ilia; +Cc: opurdila, netdev
In-Reply-To: <1b9338490909160853u4d90093fg56453ffff5a67ced@mail.gmail.com>

From: "Ilia K." <mail4ilia@gmail.com>
Date: Wed, 16 Sep 2009 18:53:07 +0300

> When routing daemon wants to enable forwarding of multicast traffic it
> performs something like:
 ...
> This leads (in the kernel) to a call to vif_add(), which searches for
> the (physical) device using the assigned IP address:
>        dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);
> 
> The current API (struct vifctl) does not allow specifying an interface
> in any way other than by its IP address, and if more than one interface
> has the specified IP address, only the first one will be found.
> 
> The attached patch (against 2.6.30.4) allows an interface to be
> specified by its index instead of by its IP address:
...
> Signed-off-by: Ilia K. <mail4ilia@gmail.com>

Applied, thank you.

* Re: [PATCH][RESEND 3] IPv6: 6rd tunnel mode
From: David Miller @ 2009-10-07  8:24 UTC (permalink / raw)
  To: yoshfuji; +Cc: acassen, netdev
In-Reply-To: <20090923184314.a2a2701d.yoshfuji@linux-ipv6.org>

From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date: Wed, 23 Sep 2009 18:43:14 +0900

> Subject: [PATCH] ipv6 sit: 6rd (IPv6 Rapid Deployment) Support.
> 
> IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
> mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
> deploy IPv6 unicast service to IPv4 sites to which it provides
> customer premise equipment.  Like 6to4, it utilizes stateless IPv6 in
> IPv4 encapsulation in order to transit IPv4-only network
> infrastructure.  Unlike 6to4, a 6rd service provider uses an IPv6
> prefix of its own in place of the fixed 6to4 prefix.
> 
> With this option enabled, the SIT driver offers 6rd functionality by
> providing an additional ioctl API to configure the IPv6 prefix to use
> instead of the static 2002::/16 of 6to4.
> 
> Original patch was done by Alexandre Cassen <acassen@freebox.fr>
> based on old Internet-Draft.
> 
> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Applied, thanks everyone.
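
The addressing difference described in the quoted commit message can be illustrated with a short sketch. It is not taken from the patch: the function name demo_delegated_prefix is hypothetical, and the sketch assumes the full 32-bit IPv4 address is embedded directly after a provider prefix of at most 32 bits.

```c
#include <stdint.h>

/*
 * Return the upper 64 bits of the IPv6 prefix delegated to a site, formed by
 * placing the site's 32-bit IPv4 address directly after the provider prefix.
 *
 * prefix_hi:  upper 64 bits of the provider prefix (host byte order)
 * prefix_len: provider prefix length in bits, 1..32 in this sketch
 * v4addr:     site IPv4 address (host byte order)
 *
 * Returns 0 for a prefix length this sketch does not handle.
 */
static uint64_t demo_delegated_prefix(uint64_t prefix_hi,
				      unsigned int prefix_len,
				      uint32_t v4addr)
{
	uint64_t mask;

	if (prefix_len == 0 || prefix_len > 32)
		return 0;

	/* Keep only the provider's prefix bits, then append the IPv4 address. */
	mask = ~(uint64_t)0 << (64 - prefix_len);
	return (prefix_hi & mask) |
	       ((uint64_t)v4addr << (64 - prefix_len - 32));
}
```

With the fixed 6to4 prefix 2002::/16 and the address 192.0.2.1 this gives 2002:c000:201::/48. With a provider prefix such as 2001:db8::/32 (a documentation prefix, used here only as an example) it gives 2001:db8:c000:201::/64.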

* Re: [PATCH net-next-2.6 v2] bonding: introduce primary_reselect option
From: David Miller @ 2009-10-07  8:25 UTC (permalink / raw)
  To: jpirko; +Cc: fubar, netdev, bonding-devel, nicolas.2p.debian
In-Reply-To: <20090925132803.GB3657@psychotron.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Fri, 25 Sep 2009 15:28:09 +0200

> Subject: [PATCH net-2.6 v3] bonding: introduce primary_reselect option
> 
> In some cases it is not desirable to switch back to the primary interface when
> its link recovers; it is better to stay with the currently active one. We need
> to avoid packet loss as much as we can in such cases. This is solved by
> introducing the primary_reselect option. Note that the primary slave, when
> enslaved, is set as the current active slave no matter what.
> 
> Patch modified by Jay Vosburgh as follows: fixed bug in action
> after change of option setting via sysfs, revised the documentation
> update, and bumped the bonding version number.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

Applied, thanks.

* Re: [PATCH] Use sk_mark for routing lookup in more places
From: David Miller @ 2009-10-07  8:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: atis, panther, netdev
In-Reply-To: <4AC58C46.8080408@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 02 Oct 2009 07:14:46 +0200

> Here is a followup on this area, thanks.
> 
> [RFC] af_packet: fill skb->mark at xmit
> 
> skb->mark may be used by classifiers, so fill it in case the user
> set the SO_MARK option on the socket.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks fine, applied, thanks!

* Re: [RFC take2] pkt_sched: gen_estimator: Dont report fake rate estimators
From: David Miller @ 2009-10-07  8:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jarkao2, kaber, netdev
In-Reply-To: <4AC66352.8050500@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 02 Oct 2009 22:32:18 +0200

> Jarek Poplawski a écrit :
>> 
>> 
>> Hmm... So you made me to do some "real" work here, and guess what?:
>> there is one serious checkpatch warning! ;-) Plus, this new parameter
>> should be added to the function description. Otherwise:
>> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>> 
>> Thanks,
>> Jarek P.
>> 
>> PS: I guess full "Don't" would show we really mean it...
> 
> Okay :) Here is the last round, before the night !
> 
> Thanks again
> 
> 
> [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators

Applied, thanks!

* Re: [PATCH] net/ppp: fix comments - ppp_{sync,asynctty}_receive() may sleep
From: David Miller @ 2009-10-07  8:28 UTC (permalink / raw)
  To: tilman; +Cc: alan, alan, paulus, linux-ppp, netdev, jarkao2, linux-kernel
In-Reply-To: <20091001142844.C50E511186D@xenon.ts.pxnet.com>

From: Tilman Schmidt <tilman@imap.cc>
Date: Thu,  1 Oct 2009 16:28:44 +0200 (CEST)

> The receive_buf methods of the N_PPP and N_SYNC_PPP line disciplines,
> ppp_asynctty_receive() and ppp_sync_receive(), call tty_unthrottle()
> which may sleep. Fix the comments claiming otherwise.
> 
> Impact: documentation
> Signed-off-by: Tilman Schmidt <tilman@imap.cc>

Applied, thanks.

* Re: [RFC][PATCH] ethtool: Add reset operation
From: David Miller @ 2009-10-07  8:28 UTC (permalink / raw)
  To: bhutchings; +Cc: ajitk, netdev, linux-net-drivers
In-Reply-To: <1254481219.23350.77.camel@localhost>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 02 Oct 2009 12:00:19 +0100

> On Fri, 2009-10-02 at 16:10 +0530, Ajit Khaparde wrote:
> [...]
>> Can you explain the intention behind this copy_to_user?
>> Do you envision drivers sending some data back to userland, maybe
>> sometime in the future?
> 
> This allows userland to see which components were actually reset.

Patch applied, thanks.

* Re: [PATCH] ethtool: Add reset operation
From: David Miller @ 2009-10-07  8:29 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1254776398.2789.7.camel@achroite>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 05 Oct 2009 21:59:58 +0100

> After updating firmware stored in flash, users may wish to reset the
> relevant hardware and start the new firmware immediately.  This should
> not be completely automatic as it may be disruptive.
> 
> A selective reset may also be useful for debugging or diagnostics.
> 
> This adds a separate reset operation which takes flags indicating the
> components to be reset.  Drivers are allowed to reset only a subset of
> those requested, and must indicate the actual subset.  This allows the
> use of generic component masks and some future expansion.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> This differs from the previous (RFC) version only in the semantics of
> the output value of the reset flags: they should indicate the components
> which were *not* reset.  This should be slightly less error-prone as it
> means implementations do not need to maintain the input and output flags
> separately.

Just so my previous reply doesn't confuse, I did apply this version
of the patch.
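
The output semantics described in the quoted note (flags left set are the components that were *not* reset) can be modelled in a few lines. This is a userspace sketch only: the DEMO_* bits and both function names are hypothetical and are not the flags or callbacks defined by the patch.

```c
#include <stdint.h>

/* Hypothetical component bits for illustration; not the real ethtool flags. */
enum {
	DEMO_RESET_MAC = 1u << 0,
	DEMO_RESET_PHY = 1u << 1,
	DEMO_RESET_DMA = 1u << 2,
};

/* Components this imaginary driver knows how to reset. */
#define DEMO_SUPPORTED (DEMO_RESET_MAC | DEMO_RESET_PHY)

/*
 * Driver side: on entry *flags holds the requested components. The driver
 * clears every bit it actually reset, so on return *flags holds whatever
 * was not reset. Only one flags word has to be maintained.
 */
static int demo_reset(uint32_t *flags)
{
	uint32_t todo = *flags & DEMO_SUPPORTED;

	if (!todo)
		return -1;	/* nothing requested that we can reset */

	/* A real driver would reset the hardware blocks in 'todo' here. */
	*flags &= ~todo;
	return 0;
}

/* Caller side: work out which of the requested components were reset. */
static uint32_t demo_components_reset(uint32_t requested)
{
	uint32_t flags = requested;

	if (demo_reset(&flags) != 0)
		return 0;
	return requested & ~flags;
}
```

A caller asking for MAC and DMA would learn that only the MAC was reset, because the DMA bit comes back still set.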


* Re: [PATCH] Use sk_mark for IPv6 routing lookups
From: David Miller @ 2009-10-07  8:29 UTC (permalink / raw)
  To: brian.haley; +Cc: zenczykowski, eric.dumazet, atis, panther, netdev
In-Reply-To: <4AC64419.6020202@hp.com>

From: Brian Haley <brian.haley@hp.com>
Date: Fri, 02 Oct 2009 14:19:05 -0400

> Add support for IPv6 route lookups using sk_mark.
> 
> Signed-off-by: Brian Haley <brian.haley@hp.com>

Applied, thanks!

* Re: [PATCH] make TLLAO option for NA packets configurable
From: David Miller @ 2009-10-07  8:29 UTC (permalink / raw)
  To: opurdila; +Cc: shemminger, cratiu, netdev
In-Reply-To: <200910030039.15952.opurdila@ixiacom.com>

From: Octavian Purdila <opurdila@ixiacom.com>
Date: Sat, 3 Oct 2009 00:39:15 +0300

> Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs
> 
> Neighbor advertisements responding to unicast neighbor solicitations
> did not include the target link-layer address option. This patch adds
> a new sysctl option (disabled by default) which controls whether this
> option should be sent even with unicast NAs.
> 
> The need for this arose because certain routers expect the TLLAO in
> some situations even as a response to unicast NS packets.
> 
> Moreover, RFC 2461 recommends sending this to avoid a race condition
> (section 4.4, Target link-layer address)
> 
> Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>

Applied, thanks!

* Re: [PATCH v2] net: mark net_proto_ops as const
From: David Miller @ 2009-10-07  8:30 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20091005085839.2b3f82df@s6510>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 5 Oct 2009 08:58:39 -0700

> All usages of structure net_proto_ops should be declared const.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

* Re: [PATCH] dcb: data center bridging ops should be r/o
From: David Miller @ 2009-10-07  8:30 UTC (permalink / raw)
  To: peter.p.waskiewicz.jr
  Cc: shemminger, jeffrey.t.kirsher, jesse.brandeburg, netdev,
	e1000-devel
In-Reply-To: <1254773298.2667.27.camel@localhost.localdomain>

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Date: Mon, 05 Oct 2009 13:08:18 -0700

> On Mon, 2009-10-05 at 09:01 -0700, Stephen Hemminger wrote:
>> The data center bridging ops structure can be const
>> 
>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>> 
> 
> Thanks Stephen, that was an oversight.
> 
> Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>

Applied.

* Re: [PATCH 00/11] Gigaset driver patches for 2.6.32 (v3)
From: David Miller @ 2009-10-07  8:30 UTC (permalink / raw)
  To: kkeil-iHCpqvpFUx0uJkBD2foKsQ
  Cc: isdn-iHCpqvpFUx0uJkBD2foKsQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, tilman-ZTO5kqT2PaM,
	hjlipp-S0/GAf8tV78, isdn4linux-JX7+OpRa80SjiSfgN6Y1Ib39b6g2fGNp,
	keil-shG/GajIFYqbacvFa/9K2g,
	i4ldeveloper-JX7+OpRa80SjiSfgN6Y1Ib39b6g2fGNp
In-Reply-To: <1254904188.9411@nb-hp>

From: Karsten Keil <kkeil-iHCpqvpFUx0uJkBD2foKsQ@public.gmane.org>
Date: Wed, 7 Oct 2009 10:29:48 +0200

> Since the project's hard deadlines are at the end of this month, I hope
> to have more time for ISDN work again after that.

Ok, thanks for the update.

* [PATCH net-next-2.6] be2net: Get rid of net_device_stats from adapter.
From: Ajit Khaparde @ 2009-10-07  9:09 UTC (permalink / raw)
  To: davem, netdev

The adapter does not need to maintain a copy of net_device_stats.
Use the one already available in net_device. This patch takes care of the same.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
---
 drivers/net/benet/be.h         |    1 -
 drivers/net/benet/be_ethtool.c |    2 +-
 drivers/net/benet/be_main.c    |    6 ++----
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index a80da0e..4b61a91 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -181,7 +181,6 @@ struct be_drvr_stats {
 
 struct be_stats_obj {
 	struct be_drvr_stats drvr_stats;
-	struct net_device_stats net_stats;
 	struct be_dma_mem cmd;
 };
 
diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index 77c66da..333729b 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -234,7 +234,7 @@ be_get_ethtool_stats(struct net_device *netdev,
 	struct be_rxf_stats *rxf_stats = &hw_stats->rxf;
 	struct be_port_rxf_stats *port_stats =
 			&rxf_stats->port[adapter->port_num];
-	struct net_device_stats *net_stats = &adapter->stats.net_stats;
+	struct net_device_stats *net_stats = &netdev->stats;
 	struct be_erx_stats *erx_stats = &hw_stats->erx;
 	void *p = NULL;
 	int i;
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 6d5e81f..0e92a1f 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -141,7 +141,7 @@ void netdev_stats_update(struct be_adapter *adapter)
 	struct be_rxf_stats *rxf_stats = &hw_stats->rxf;
 	struct be_port_rxf_stats *port_stats =
 			&rxf_stats->port[adapter->port_num];
-	struct net_device_stats *dev_stats = &adapter->stats.net_stats;
+	struct net_device_stats *dev_stats = &adapter->netdev->stats;
 	struct be_erx_stats *erx_stats = &hw_stats->erx;
 
 	dev_stats->rx_packets = port_stats->rx_total_frames;
@@ -269,9 +269,7 @@ static void be_rx_eqd_update(struct be_adapter *adapter)
 
 static struct net_device_stats *be_get_stats(struct net_device *dev)
 {
-	struct be_adapter *adapter = netdev_priv(dev);
-
-	return &adapter->stats.net_stats;
+	return &dev->stats;
 }
 
 static u32 be_calc_rate(u64 bytes, unsigned long ticks)
-- 
1.6.0.4


* Re: skb_shinfo(skb)->nr_frags > 0 while skb_is_gso(skb) == 0?
From: John Wright @ 2009-10-07  9:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Michael Chan, Bob Montgomery
In-Reply-To: <20091006182131.484d6e5a@nehalam>

Hi Stephen,

On Tue, Oct 06, 2009 at 06:21:31PM -0700, Stephen Hemminger wrote:
> On Tue, 6 Oct 2009 19:03:15 -0600
> John Wright <john.wright@hp.com> wrote:
> > Bob Montgomery and I are debugging an OOPS in the bnx2 driver.  The
> > driver OOPSes in bnx2_tx_int(), getting a NULL pointer dereference when
> > checking if the skb is GSO.  (This is on 2.6.29, before is_gso was
> > cached in the tx_buf (commit d62fda08), but bear with me - while kernels
> > with that commit might not crash in the same place, I think we have
> > discovered a bug that would manifest itself another way.)
> > 
> > So, first, a question for someone who knows more about sk_buff's than I:
> > is it reasonable/legal for an skb for which skb_is_gso(skb) == 0 to also
> > have skb_shinfo(skb)->nr_frags > 0?
> 
> Yes, if the driver supports Scatter/Gather and Checksum offload,
> TCP (especially splice) will hand fragmented frames to the device.

Is there a good way to generate lots of these types of packets?  Is
disabling tso and gso with ethtool and sendmsg()ing big chunks of data
enough?

-- 
+----------------------------------------------------------+
| John Wright <john.wright@hp.com>                         |
| HP Mission Critical OS Enablement & Solution Test (MOST) |
+----------------------------------------------------------+

* Re: [PATCH] include/netdevice.h: fix nanodoc mismatch
From: Wolfram Sang @ 2009-10-07  9:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, trivial
In-Reply-To: <20090812.221625.138548183.davem@davemloft.net>

On Wed, Aug 12, 2009 at 10:16:25PM -0700, David Miller wrote:
> From: Wolfram Sang <w.sang@pengutronix.de>
> Date: Mon, 10 Aug 2009 13:04:49 +0200
> 
> > nanodoc was missing an ndo_-prefix.
> > 
> > Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
> 
> I'll apply this, thanks.

I can't find it in your trees. Slipped through?

-- 
Pengutronix e.K.                           | Wolfram Sang                |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |


* Re: [PATCH] include/netdevice.h: fix nanodoc mismatch
From: David Miller @ 2009-10-07 10:10 UTC (permalink / raw)
  To: w.sang; +Cc: netdev, trivial
In-Reply-To: <20091007091803.GD3177@pengutronix.de>

From: Wolfram Sang <w.sang@pengutronix.de>
Date: Wed, 7 Oct 2009 11:18:03 +0200

> On Wed, Aug 12, 2009 at 10:16:25PM -0700, David Miller wrote:
>> From: Wolfram Sang <w.sang@pengutronix.de>
>> Date: Mon, 10 Aug 2009 13:04:49 +0200
>> 
>> > nanodoc was missing an ndo_-prefix.
>> > 
>> > Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
>> 
>> I'll apply this, thanks.
> 
> I can't find it in your trees. Slipped through?

Looks that way.

Sorry about that, could you please resend?

Thanks!

* Re: [PATCH net-next-2.6] be2net: Get rid of net_device_stats from adapter.
From: David Miller @ 2009-10-07 10:11 UTC (permalink / raw)
  To: ajitk, ajitkhaparde; +Cc: netdev
In-Reply-To: <20091007090903.GA18843@serverengines.com>

From: Ajit Khaparde <ajitkhaparde@gmail.com>
Date: Wed, 7 Oct 2009 14:39:16 +0530

> The adapter does not need to maintain a copy of net_device_stats.
> Use the one already available in net_device. This patch takes care of the same.
> 
> Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>

Applied, thank you.

* Re: [PATCH] ipv4: arp_notify address list bug
From: David Miller @ 2009-10-07 10:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: hannes, shemminger, netdev
In-Reply-To: <4ACAD393.5080909@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 06 Oct 2009 07:20:19 +0200

> From: Stephen Hemminger <shemminger@vyatta.com>
> 
> This fixes a bug with arp_notify.
> 
> If arp_notify is enabled, the kernel will crash if the address is changed
> and no IP address is assigned.
>   http://bugzilla.kernel.org/show_bug.cgi?id=14330
> 
> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks everyone.

* Re: [PATCH] Add sk_mark route lookup support for IPv4 listening sockets, and for IPv4 multicast forwarding
From: David Miller @ 2009-10-07 10:19 UTC (permalink / raw)
  To: atis; +Cc: netdev, panther, eric.dumazet, brian.haley, zenczykowski
In-Reply-To: <200910051646.34770.atis@mikrotik.com>

From: Atis Elsts <atis@mikrotik.com>
Date: Mon, 5 Oct 2009 16:46:34 +0300

> @@ -1238,6 +1238,7 @@ static void ipmr_queue_xmit(struct sk_buff *skb, struct mfc_cache *c, int vifi)
>  
>  	if (vif->flags&VIFF_TUNNEL) {
>  		struct flowi fl = { .oif = vif->link,
> +				    .mark = skb->mark,
>  				    .nl_u = { .ip4_u =
>  					      { .daddr = vif->remote,
>  						.saddr = vif->local,

I'm not so sure if this is right.

I understand what you're trying to do, inherit the socket's
mark when it goes over a multicast tunnel.

But I'm not so sure that's what we want to do, semantically.

Could you split out these skb->mark cases into a separate
patch?  The parts that only use sk->mark are fine and I
would like to apply a patch from you which just does that
while we discuss the skb->mark case.

Thanks.

* [PATCH net-next-2.6] udp: dynamically size hash tables at boot time
From: Eric Dumazet @ 2009-10-07 10:37 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, netdev
In-Reply-To: <20091006.222935.231081303.davem@davemloft.net>

David Miller a écrit :
> 
> That's incredible that it's been that low for so long :-)
> 
> But please, dynamically size this thing, maybe with a cap of say 64K
> to start with.  If you don't have time for it I'll take care of this.

Here we are.

Thank you

[PATCH] udp: dynamically size hash tables at boot time

UDP_HTABLE_SIZE was initially defined to 128, which is a bit small for several setups.

4000 active UDP sockets -> 32 sockets per chain on average. An incoming frame
has to look up all sockets in a chain to find the best match, so long chains hurt latency.

Instead of a fixed size hash table that can't be right for every need,
let the UDP stack choose its table size at boot time, like the TCP and IP route
hash tables, using the alloc_large_system_hash() helper.

Add an optional boot parameter, uhash_entries=x so that an admin can force a size
between 256 and 65536 if needed, like thash_entries and rhash_entries.

dmesg logs two new lines:
[    0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[    0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)


The maximum size on 64-bit arches would be 65536 slots, i.e. 1 MByte with non-debugging spinlocks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 Documentation/kernel-parameters.txt |    3
 include/linux/udp.h                 |    6 -
 include/net/udp.h                   |   13 ++-
 net/ipv4/udp.c                      |   91 ++++++++++++++++++--------
 net/ipv4/udplite.c                  |    4 -
 net/ipv6/udp.c                      |    6 -
 6 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6fa7292..02df20b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2589,6 +2589,9 @@ and is between 256 and 4096 characters. It is defined in the file
 	uart6850=	[HW,OSS]
 			Format: <io>,<irq>
 
+	uhash_entries=	[KNL,NET]
+			Set number of hash buckets for UDP/UDP-Lite connections
+
 	uhci-hcd.ignore_oc=
 			[USB] Ignore overcurrent events (default N).
 			Some badly-designed motherboards generate lots of
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 0cf5c4c..832361e 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -45,11 +45,11 @@ static inline struct udphdr *udp_hdr(const struct sk_buff *skb)
 	return (struct udphdr *)skb_transport_header(skb);
 }
 
-#define UDP_HTABLE_SIZE		128
+#define UDP_HTABLE_SIZE_MIN		(CONFIG_BASE_SMALL ? 128 : 256)
 
-static inline int udp_hashfn(struct net *net, const unsigned num)
+static inline int udp_hashfn(struct net *net, unsigned num, unsigned mask)
 {
-	return (num + net_hash_mix(net)) & (UDP_HTABLE_SIZE - 1);
+	return (num + net_hash_mix(net)) & mask;
 }
 
 struct udp_sock {
diff --git a/include/net/udp.h b/include/net/udp.h
index f98abd2..22aa2e7 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -54,12 +54,19 @@ struct udp_hslot {
 	struct hlist_nulls_head	head;
 	spinlock_t		lock;
 } __attribute__((aligned(2 * sizeof(long))));
+
 struct udp_table {
-	struct udp_hslot	hash[UDP_HTABLE_SIZE];
+	struct udp_hslot	*hash;
+	unsigned int mask;
+	unsigned int log;
 };
 extern struct udp_table udp_table;
-extern void udp_table_init(struct udp_table *);
-
+extern void udp_table_init(struct udp_table *, const char *);
+static inline struct udp_hslot *udp_hashslot(struct udp_table *table,
+					     struct net *net, unsigned num)
+{
+	return &table->hash[udp_hashfn(net, num, table->mask)];
+}
 
 /* Note: this must match 'valbool' in sock_setsockopt */
 #define UDP_CSUM_NOXMIT		1
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ec6a8a..194bcdc 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -106,7 +106,7 @@
 #include <net/xfrm.h>
 #include "udp_impl.h"
 
-struct udp_table udp_table;
+struct udp_table udp_table __read_mostly;
 EXPORT_SYMBOL(udp_table);
 
 int sysctl_udp_mem[3] __read_mostly;
@@ -121,14 +121,16 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
 atomic_t udp_memory_allocated;
 EXPORT_SYMBOL(udp_memory_allocated);
 
-#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)
+#define MAX_UDP_PORTS 65536
+#define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)
 
 static int udp_lib_lport_inuse(struct net *net, __u16 num,
 			       const struct udp_hslot *hslot,
 			       unsigned long *bitmap,
 			       struct sock *sk,
 			       int (*saddr_comp)(const struct sock *sk1,
-						 const struct sock *sk2))
+						 const struct sock *sk2),
+			       unsigned int log)
 {
 	struct sock *sk2;
 	struct hlist_nulls_node *node;
@@ -142,8 +144,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
 			|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
 		    (*saddr_comp)(sk, sk2)) {
 			if (bitmap)
-				__set_bit(sk2->sk_hash / UDP_HTABLE_SIZE,
-					  bitmap);
+				__set_bit(sk2->sk_hash >> log, bitmap);
 			else
 				return 1;
 		}
@@ -180,13 +181,15 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		/*
 		 * force rand to be an odd multiple of UDP_HTABLE_SIZE
 		 */
-		rand = (rand | 1) * UDP_HTABLE_SIZE;
-		for (last = first + UDP_HTABLE_SIZE; first != last; first++) {
-			hslot = &udptable->hash[udp_hashfn(net, first)];
+		rand = (rand | 1) * (udptable->mask + 1);
+		for (last = first + udptable->mask + 1;
+		     first != last;
+		     first++) {
+			hslot = udp_hashslot(udptable, net, first);
 			bitmap_zero(bitmap, PORTS_PER_CHAIN);
 			spin_lock_bh(&hslot->lock);
 			udp_lib_lport_inuse(net, snum, hslot, bitmap, sk,
-					    saddr_comp);
+					    saddr_comp, udptable->log);
 
 			snum = first;
 			/*
@@ -196,7 +199,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 			 */
 			do {
 				if (low <= snum && snum <= high &&
-				    !test_bit(snum / UDP_HTABLE_SIZE, bitmap))
+				    !test_bit(snum >> udptable->log, bitmap))
 					goto found;
 				snum += rand;
 			} while (snum != first);
@@ -204,9 +207,10 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		}
 		goto fail;
 	} else {
-		hslot = &udptable->hash[udp_hashfn(net, snum)];
+		hslot = udp_hashslot(udptable, net, snum);
 		spin_lock_bh(&hslot->lock);
-		if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, saddr_comp))
+		if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk,
+					saddr_comp, 0))
 			goto fail_unlock;
 	}
 found:
@@ -283,7 +287,7 @@ static struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 	struct sock *sk, *result;
 	struct hlist_nulls_node *node;
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash = udp_hashfn(net, hnum);
+	unsigned int hash = udp_hashfn(net, hnum, udptable->mask);
 	struct udp_hslot *hslot = &udptable->hash[hash];
 	int score, badness;
 
@@ -1013,8 +1017,8 @@ void udp_lib_unhash(struct sock *sk)
 {
 	if (sk_hashed(sk)) {
 		struct udp_table *udptable = sk->sk_prot->h.udp_table;
-		unsigned int hash = udp_hashfn(sock_net(sk), sk->sk_hash);
-		struct udp_hslot *hslot = &udptable->hash[hash];
+		struct udp_hslot *hslot = udp_hashslot(udptable, sock_net(sk),
+						     sk->sk_hash);
 
 		spin_lock_bh(&hslot->lock);
 		if (sk_nulls_del_node_init_rcu(sk)) {
@@ -1169,7 +1173,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udp_table *udptable)
 {
 	struct sock *sk;
-	struct udp_hslot *hslot = &udptable->hash[udp_hashfn(net, ntohs(uh->dest))];
+	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
 
 	spin_lock(&hslot->lock);
@@ -1609,9 +1613,14 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
 	struct udp_iter_state *state = seq->private;
 	struct net *net = seq_file_net(seq);
 
-	for (state->bucket = start; state->bucket < UDP_HTABLE_SIZE; ++state->bucket) {
+	for (state->bucket = start; state->bucket <= state->udp_table->mask;
+	     ++state->bucket) {
 		struct hlist_nulls_node *node;
 		struct udp_hslot *hslot = &state->udp_table->hash[state->bucket];
+
+		if (hlist_nulls_empty(&hslot->head))
+			continue;
+
 		spin_lock_bh(&hslot->lock);
 		sk_nulls_for_each(sk, node, &hslot->head) {
 			if (!net_eq(sock_net(sk), net))
@@ -1636,7 +1645,7 @@ static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk)
 	} while (sk && (!net_eq(sock_net(sk), net) || sk->sk_family != state->family));
 
 	if (!sk) {
-		if (state->bucket < UDP_HTABLE_SIZE)
+		if (state->bucket <= state->udp_table->mask)
 			spin_unlock_bh(&state->udp_table->hash[state->bucket].lock);
 		return udp_get_first(seq, state->bucket + 1);
 	}
@@ -1656,7 +1665,7 @@ static struct sock *udp_get_idx(struct seq_file *seq, loff_t pos)
 static void *udp_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	struct udp_iter_state *state = seq->private;
-	state->bucket = UDP_HTABLE_SIZE;
+	state->bucket = MAX_UDP_PORTS;
 
 	return *pos ? udp_get_idx(seq, *pos-1) : SEQ_START_TOKEN;
 }
@@ -1678,7 +1687,7 @@ static void udp_seq_stop(struct seq_file *seq, void *v)
 {
 	struct udp_iter_state *state = seq->private;
 
-	if (state->bucket < UDP_HTABLE_SIZE)
+	if (state->bucket <= state->udp_table->mask)
 		spin_unlock_bh(&state->udp_table->hash[state->bucket].lock);
 }
 
@@ -1738,7 +1747,7 @@ static void udp4_format_sock(struct sock *sp, struct seq_file *f,
 	__u16 destp	  = ntohs(inet->dport);
 	__u16 srcp	  = ntohs(inet->sport);
 
-	seq_printf(f, "%4d: %08X:%04X %08X:%04X"
+	seq_printf(f, "%5d: %08X:%04X %08X:%04X"
 		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d%n",
 		bucket, src, srcp, dest, destp, sp->sk_state,
 		sk_wmem_alloc_get(sp),
@@ -1804,11 +1813,43 @@ void udp4_proc_exit(void)
 }
 #endif /* CONFIG_PROC_FS */
 
-void __init udp_table_init(struct udp_table *table)
+static __initdata unsigned long uhash_entries;
+static int __init set_uhash_entries(char *str)
 {
-	int i;
+	if (!str)
+		return 0;
+	uhash_entries = simple_strtoul(str, &str, 0);
+	if (uhash_entries && uhash_entries < UDP_HTABLE_SIZE_MIN)
+		uhash_entries = UDP_HTABLE_SIZE_MIN;
+	return 1;
+}
+__setup("uhash_entries=", set_uhash_entries);
 
-	for (i = 0; i < UDP_HTABLE_SIZE; i++) {
+void __init udp_table_init(struct udp_table *table, const char *name)
+{
+	unsigned int i;
+
+	if (!CONFIG_BASE_SMALL)
+		table->hash = alloc_large_system_hash(name,
+			sizeof(struct udp_hslot),
+			uhash_entries,
+			21, /* one slot per 2 MB */
+			0,
+			&table->log,
+			&table->mask,
+			64 * 1024);
+	/*
+	 * Make sure hash table has the minimum size
+	 */
+	if (CONFIG_BASE_SMALL || table->mask < UDP_HTABLE_SIZE_MIN - 1) {
+		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
+				      sizeof(struct udp_hslot), GFP_KERNEL);
+		if (!table->hash)
+			panic(name);
+		table->log = ilog2(UDP_HTABLE_SIZE_MIN);
+		table->mask = UDP_HTABLE_SIZE_MIN - 1;
+	}
+	for (i = 0; i <= table->mask; i++) {
 		INIT_HLIST_NULLS_HEAD(&table->hash[i].head, i);
 		spin_lock_init(&table->hash[i].lock);
 	}
@@ -1818,7 +1859,7 @@ void __init udp_init(void)
 {
 	unsigned long nr_pages, limit;
 
-	udp_table_init(&udp_table);
+	udp_table_init(&udp_table, "UDP");
 	/* Set the pressure threshold up by the same strategy of TCP. It is a
 	 * fraction of global memory that is up to 1/2 at 256 MB, decreasing
 	 * toward zero with the amount of memory, with a floor of 128 pages.
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index 95248d7..a495ca8 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -12,7 +12,7 @@
  */
 #include "udp_impl.h"
 
-struct udp_table 	udplite_table;
+struct udp_table 	udplite_table __read_mostly;
 EXPORT_SYMBOL(udplite_table);
 
 static int udplite_rcv(struct sk_buff *skb)
@@ -110,7 +110,7 @@ static inline int udplite4_proc_init(void)
 
 void __init udplite4_register(void)
 {
-	udp_table_init(&udplite_table);
+	udp_table_init(&udplite_table, "UDP-Lite");
 	if (proto_register(&udplite_prot, 1))
 		goto out_register_err;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3a60f12..d42f503 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -132,7 +132,7 @@ static struct sock *__udp6_lib_lookup(struct net *net,
 	struct sock *sk, *result;
 	struct hlist_nulls_node *node;
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash = udp_hashfn(net, hnum);
+	unsigned int hash = udp_hashfn(net, hnum, udptable->mask);
 	struct udp_hslot *hslot = &udptable->hash[hash];
 	int score, badness;
 
@@ -452,7 +452,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 {
 	struct sock *sk, *sk2;
 	const struct udphdr *uh = udp_hdr(skb);
-	struct udp_hslot *hslot = &udptable->hash[udp_hashfn(net, ntohs(uh->dest))];
+	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
 
 	spin_lock(&hslot->lock);
@@ -1195,7 +1195,7 @@ static void udp6_sock_seq_show(struct seq_file *seq, struct sock *sp, int bucket
 	destp = ntohs(inet->dport);
 	srcp  = ntohs(inet->sport);
 	seq_printf(seq,
-		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
+		   "%5d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
 		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
 		   bucket,
 		   src->s6_addr32[0], src->s6_addr32[1],
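
The arithmetic in the patch description (chain length, mask-based bucket selection, and the 1 MByte upper bound) can be checked with a small standalone sketch. It is a userspace model, not kernel code: the demo_* names are hypothetical, mix stands in for net_hash_mix(net), and the 16-byte slot size is an assumption for 64-bit builds with non-debugging spinlocks.

```c
#include <stddef.h>

/*
 * Bucket index for a port when the table size is a power of two.
 * mask is the table size minus one, as stored in the table by the patch.
 */
static unsigned int demo_udp_hashfn(unsigned int mix, unsigned int num,
				    unsigned int mask)
{
	return (num + mix) & mask;
}

/* Average chain length, rounded up, for nsocks sockets spread evenly. */
static unsigned int demo_avg_chain(unsigned int nsocks, unsigned int slots)
{
	return (nsocks + slots - 1) / slots;
}

/* Memory used by the bucket array for a given slot count and slot size. */
static size_t demo_table_bytes(size_t slots, size_t slot_size)
{
	return slots * slot_size;
}
```

With 4000 sockets, 128 slots give chains of about 32 entries, while 512 slots bring that down to about 8. At the 65536-slot cap the average chain holds a single socket, and the bucket array takes 65536 * 16 bytes = 1 MByte under the slot-size assumption above.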

* Re: r8169 chips on some Intel D945GSEJT boards fail to work after PXE boot
From: Simon Farnsworth @ 2009-10-07 10:39 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev
In-Reply-To: <20091006215601.GA10692@electric-eye.fr.zoreil.com>

Francois Romieu wrote:
> Francois Romieu <romieu@fr.zoreil.com> :
> [...]
>> @@ -2200,6 +3075,11 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>>  	tp->pcie_cap = pci_find_capability(pdev, PCI_CAP_ID_EXP);
>>  	if (!tp->pcie_cap && netif_msg_probe(tp))
>>  		dev_info(&pdev->dev, "no PCI Express capability\n");
>> +	else {
>> +		pci_write_config_word(pdev, tp->pcie_cap + PCI_EXP_DEVSTA,
>> +				      PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED |
>> +				      PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD);
>> +	}
>>  
>>  	RTL_W16(IntrMask, 0x0000);
>>  
> 
> Can you check if this part of the patch is required to fix
> your issue ?
> 
> I'd rather avoid including it under the 8168d support banner
> if it is not needed.
> 
I can confirm that I don't need that hunk.
-- 
Simon Farnsworth


* Re: [PATCH net-2.6 V2] tg3: Fix phylib locking strategy
From: David Miller @ 2009-10-07 10:41 UTC (permalink / raw)
  To: mcarlson; +Cc: felix, netdev, mchan, andy
In-Reply-To: <1254801692.18507@xw6200>

From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Mon, 5 Oct 2009 20:55:29 -0700

> O.K.  Here is the latest version.  Felix, can you verify your problem
> is solved with this patch?
> 
> ---
> 
> Felix Radensky noted that chip resets were generating stack trace dumps.
> This is because the driver is attempting to acquire the mdio bus mutex
> while holding the tp->lock spinlock.  The fix is to change the code such
> that every phy access takes the tp->lock spinlock instead.
> 
> Signed-off-by: Matt Carlson <mcarlson@broadcom.com>

I'm going to apply this now to net-2.6, let me know if there are
any updates after testing.

* Re: [PATCH 1/4] ethoc: fix typo to compute number of tx descriptors
From: David Miller @ 2009-10-07 10:41 UTC (permalink / raw)
  To: thomas; +Cc: netdev
In-Reply-To: <1254735200-2718-1-git-send-email-thomas@wytron.com.tw>

From: Thomas Chou <thomas@wytron.com.tw>
Date: Mon,  5 Oct 2009 17:33:17 +0800

> It should be max() instead of min(). Use 1/4 of available
> descriptors for tx, and there should be at least 2 tx
> descriptors.
> 
> Signed-off-by: Thomas Chou <thomas@wytron.com.tw>

Applied.
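
The effect of the min()/max() mix-up described in the quoted commit message is easy to see in isolation. The sketch below is a standalone model with hypothetical function names; the exact expression used in the driver is assumed from the description, not copied from the patch.

```c
/* Intended rule: a quarter of the descriptors for tx, but never fewer than 2. */
static unsigned int demo_num_tx(unsigned int num_bd)
{
	unsigned int quarter = num_bd / 4;

	return quarter > 2 ? quarter : 2;	/* max(2, num_bd / 4) */
}

/* With min() in place of max(), tx never gets more than 2 descriptors. */
static unsigned int demo_num_tx_with_min(unsigned int num_bd)
{
	unsigned int quarter = num_bd / 4;

	return quarter < 2 ? quarter : 2;	/* min(2, num_bd / 4) */
}
```

For 128 descriptors the intended rule yields 32 tx descriptors, whereas the min() variant yields only 2.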
