Netdev List
* Re: [PATCH] sctp: implement SIOCINQ ioctl() (take 3)
From: Diego Elio Pettenò @ 2010-10-01  9:35 UTC
  To: David Miller; +Cc: netdev, linux-sctp
In-Reply-To: <20100930.173557.108787840.davem@davemloft.net>

On Thu, 30/09/2010 at 17:35 -0700, David Miller wrote:
> 
> 
> Please put the break statement inside of the basic block.
> 
> The way you have it here the indentation looks not so nice. 

Okay, will resend. Please note that I copied the style (and most of
the code) from dccp.
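
For illustration only, the layout being asked for might look like the
sketch below. It is not taken from the sctp or dccp code: handle_cmd()
and the CMD_* values are made-up names, and the point is only that the
break sits inside the braces of the case that declares a local.

```c
#include <assert.h>

/* Hypothetical command codes, for illustration only. */
enum { CMD_INQ = 1, CMD_OTHER = 2 };

/* Returns the number of readable bytes for CMD_INQ, -1 otherwise. */
static int handle_cmd(int cmd, int queued_bytes)
{
	int rc = -1;

	switch (cmd) {
	case CMD_INQ: {
		/* The local declaration is why this case needs a block. */
		int amount = queued_bytes;

		rc = amount;
		break;	/* break sits inside the basic block */
	}
	default:
		rc = -1;
		break;
	}
	return rc;
}
```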

-- 
Diego Elio Pettenò — “Flameeyes”
http://blog.flameeyes.eu/

If you found a .asc file in this mail and know not what it is,
it's a GnuPG digital signature: http://www.gnupg.org/




* Re: Regression (ancient), bisected: TCP hangs with certain ESP6 SA.
From: Herbert Xu @ 2010-10-01  9:28 UTC
  To: David Miller; +Cc: nbowler, linux-kernel, netdev, eric.dumazet
In-Reply-To: <20100930.181716.43034538.davem@davemloft.net>

On Thu, Sep 30, 2010 at 06:17:16PM -0700, David Miller wrote:
> From: Nick Bowler <nbowler@elliptictech.com>
> Date: Wed, 29 Sep 2010 10:22:13 -0400
> 
> > b5c15fc004ac83b7ad280acbe0fd4bbed7e2c8d4 is the first bad commit
> > commit b5c15fc004ac83b7ad280acbe0fd4bbed7e2c8d4
> > Author: Herbert Xu <herbert@gondor.apana.org.au>
> > Date:   Thu Feb 14 23:49:37 2008 -0800
> > 
> >     [IPV6]: Fix reversed local_df test in ip6_fragment
> >     
> >     I managed to reverse the local_df test when forward-porting this
> >     patch so it actually makes things worse by never fragmenting at
> >     all.
> >     
> >     Thanks to David Stevens for testing and reporting this bug.
> >     
> >     Bill Fink pointed out that the local_df setting is also the wrong
> >     way around.
> >     
> >     Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> I suspect that Herbert's change is correct, and that PMTU simply doesn't
> work correctly with IPV6 for some reason.
> 
> That matches your observed behavior: ping and UDP work just fine, and
> only TCP with certain ESP6 transport mode settings hangs.

Yeah, I suspect that if you go back even further (before the patch with
the reversed logic referred to above), you'll find it broken again.

I'll try to reproduce this but I may not be able to get to it until
November.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v2 1/2] Phonet: Implement Pipe Controller to support Nokia Slim Modems
From: Rémi Denis-Courmont @ 2010-10-01  9:20 UTC
  To: Kumar SANGHVI
  Cc: Rémi Denis-Courmont, netdev@vger.kernel.org,
	STEricsson_nomadik_linux, Sudeep DIVAKARAN, Gulshan KARMANI,
	Linus WALLEIJ
In-Reply-To: <20101001085553.GA27109@bnru01.bnr.st.com>

   Hello,

On Friday 01 October 2010, Kumar SANGHVI wrote:
> Hi,
> 
> On Fri, Oct 01, 2010 at 10:42:44 +0200, Rémi Denis-Courmont wrote:
> > > I have not introduced any new ioctl()'s as part of Pipe controller
> > > implementation.
> > 
> > Sure. What you did is basically worse than ioctl()'s. You've implemented
> > them as socket options. Socket options are meant to configure parameters
> > with setsockopt and read parameters with getsockopt. They are not meant
> > for 'doing' things - that's what ioctl()'s are for.
> 
> Isn't the existing phonet stack 'doing' something as part of
> PNPIPE_ENCAP rather than simply configuring some socket option or flag ?

It sets (or gets) the delivery path for incoming data.

-- 
Rémi Denis-Courmont
http://www.remlab.net/


* Re: [PATCH v2 1/2] Phonet: Implement Pipe Controller to support Nokia Slim Modems
From: Kumar SANGHVI @ 2010-10-01  8:55 UTC
  To: Rémi Denis-Courmont
  Cc: Rémi Denis-Courmont, netdev@vger.kernel.org,
	STEricsson_nomadik_linux, Sudeep DIVAKARAN, Gulshan KARMANI,
	Linus WALLEIJ
In-Reply-To: <201010011142.45767.remi@remlab.net>

Hi,

On Fri, Oct 01, 2010 at 10:42:44 +0200, Rémi Denis-Courmont wrote:
 > 
> > I have not introduced any new ioctl()'s as part of Pipe controller
> > implementation.
> 
> Sure. What you did is basically worse than ioctl()'s. You've implemented them 
> as socket options. Socket options are meant to configure parameters with 
> setsockopt and read parameters with getsockopt. They are not meant for 'doing'
> things - that's what ioctl()'s are for.

Isn't the existing phonet stack 'doing' something as part of
PNPIPE_ENCAP rather than simply configuring some socket option or flag ?
 
> > Regarding implementing connect() socket call, few queries:
> > 1. It should carry out all the same steps which I am currently doing as
> > part of PIPE_CREATE socket option, right?
> > 2. Currently, as part of Pipe controller implementation, user-space
> >    follows below sequence:-
> > 	socket()
> > 	bind()
> > 	listen()
> > 	setsockopt(PIPE_CREATE)
> > 	accept()
> > 
> >    In the phonet stack pipe controller logic, we wait for PEP_CONNECT_RESP
> >    from host-pep (GPRS socket or video telephony socket is a host-pep.
> >    pep_reply sends out the PEP_CONNECT_RESP) and remote-pep (modem),
> >    negotiate the best flow-control to be used, and then send
> >    PIPE_CREATED_IND, with selected flow-control to both pipe end-points.
> 
> connect() should replace listen(), PIPE_CREATE and accept().

Thanks. I will implement connect and upload the patch.

-Kumar.


* Re: [PATCH v2 1/2] Phonet: Implement Pipe Controller to support Nokia Slim Modems
From: Rémi Denis-Courmont @ 2010-10-01  8:42 UTC
  To: Kumar SANGHVI
  Cc: Rémi Denis-Courmont, netdev@vger.kernel.org,
	STEricsson_nomadik_linux, Sudeep DIVAKARAN, Gulshan KARMANI,
	Linus WALLEIJ
In-Reply-To: <20100930071952.GA21859@bnru01.bnr.st.com>

   Hello,

On Thursday 30 September 2010, Kumar SANGHVI wrote:
> Hi Rémi Denis-Courmont,
> 
> On Wed, Sep 29, 2010 at 20:21:17 +0200, Rémi Denis-Courmont wrote:
> > It seems to me that you really want to implement the connect() socket
> > call, so that one of the two endpoints will stand up for the missing
> > controller.
> 
> Yes, implementing connect() socket call would be nice.
> 
> > That's
> > still much cleaner than CREATE and DESTROY ioctl()'s.
> 
> I have not introduced any new ioctl()'s as part of Pipe controller
> implementation.

Sure. What you did is basically worse than ioctl()'s. You've implemented them 
as socket options. Socket options are meant to configure parameters with 
setsockopt and read parameters with getsockopt. They are not meant for 'doing'
things - that's what ioctl()'s are for.

> The PIPE_CREATE/PIPE_DESTROY/PIPE_ENABLE/PIPE_DISABLE are all provided
> as socket options.
> So, user-space can call setsockopt for creating/enabling or
> disabling/destroying pipe.

That makes absolutely no sense if you consider how setsockopt and getsockopt 
are supposed to work.
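
To make the distinction concrete, here is a minimal userspace sketch of
the configure-and-read-back pattern that socket options are designed for.
It is not Phonet code: it uses the standard SO_REUSEADDR option on an
ordinary AF_INET socket, and the helper name set_and_read_reuseaddr() is
made up for this example.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Set a boolean socket parameter and read it back.
 * Returns 0 or 1 as reported by getsockopt(), or -1 on any error.
 */
static int set_and_read_reuseaddr(int enable)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int val = enable;
	socklen_t len = sizeof(val);

	if (fd < 0)
		return -1;

	/* setsockopt() only stores a parameter; no action is triggered. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val)) < 0) {
		close(fd);
		return -1;
	}

	/* getsockopt() reads the same parameter back. */
	val = -1;
	if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) < 0) {
		close(fd);
		return -1;
	}

	close(fd);
	return val != 0;
}
```

An operation such as creating or destroying a pipe has no value to read
back in this way, which is the mismatch being pointed out.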

> Regarding implementing connect() socket call, few queries:
> 1. It should carry out all the same steps which I am currently doing as
> part of PIPE_CREATE socket option, right?
> 2. Currently, as part of Pipe controller implementation, user-space
>    follows below sequence:-
> 	socket()
> 	bind()
> 	listen()
> 	setsockopt(PIPE_CREATE)
> 	accept()
> 
>    In the phonet stack pipe controller logic, we wait for PEP_CONNECT_RESP
>    from host-pep (GPRS socket or video telephony socket is a host-pep.
>    pep_reply sends out the PEP_CONNECT_RESP) and remote-pep (modem),
>    negotiate the best flow-control to be used, and then send
>    PIPE_CREATED_IND, with selected flow-control to both pipe end-points.

connect() should replace listen(), PIPE_CREATE and accept().

-- 
Rémi Denis-Courmont
http://www.remlab.net/


* Re: VLAN packets silently dropped in promiscuous mode
From: Eric Dumazet @ 2010-10-01  8:41 UTC
  To: Jesse Gross; +Cc: Roger Luethi, netdev, Patrick McHardy
In-Reply-To: <1285909831.2705.41.camel@edumazet-laptop>

On Friday 01 October 2010 at 07:10 +0200, Eric Dumazet wrote:
> On Thursday 30 September 2010 at 19:37 -0700, Jesse Gross wrote:
> 
> > That's true.  Dropping here seems roughly equivalent to the effects of
> > a hardware VLAN filter, which will also not be tracked by a counter,
> > so that seems not too bad to me.
> > 
> > The thing that concerns me though is why so many drivers seem to have
> > this problem with completely dropping the VLAN header.  I know that
> > even several of the ones that work now were broken initially and had
> > to be fixed.  Seeing as the driver drops the VLAN information before
> > it gets to the general networking code I don't see a generic fix to
> > this as it is currently set up.  However, perhaps we could make it so
> > that it is harder to get wrong.  Something like this:
> > 
> > * Allow vlan_gro_receive() to take a NULL VLAN group and a tag of 0
> > (and do the same thing for vlan_hwaccel_rx())
> > * Now that the vlan functions can deal with non-VLAN packets, merge
> > them into their non-VLAN counterparts.
> > * We can now demultiplex between the VLAN/non-VLAN case in core
> > networking.  This is done anyways, it just prevents every driver from
> > needing that code block I copied above and allows us to fix these
> > types of problems centrally.
> > * Dump the VLAN tag into the SKB and hand off the packet to the
> > various consumers: VLAN devices, libpcap, bridge hook (not currently
> > done but should be for trunking).
> > 
> > I see a number of advantages of this:
> > * Fixes all the problems with cards dropping VLAN headers at once.
> > * Avoids having to disable VLAN acceleration when in promiscuous mode
> > (good for bridging since it always puts devices in promiscuous mode).
> > * Keeps VLAN tag separate until given to ultimate consumer, which
> > avoids needing to do header reconstruction as in tg3 unless absolutely
> > necessary.
> > * Consolidates common driver code in core networking.
> 
> This seems very reasonable ;)

Jesse, do you plan to work on this yourself in the near future?
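
As a rough userspace model of the proposal quoted above (not kernel
code), the sketch below has the driver always hand over the tag, with 0
meaning untagged, and lets one central function decide where the packet
goes. All names here (toy_skb, toy_receive, the verdict values) are
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_VLAN_VID_MASK 0x0fff

/* Toy stand-in for an skb that carries the hardware-stripped tag. */
struct toy_skb {
	uint16_t vlan_tci;	/* 0 if the frame was untagged */
};

enum toy_verdict { TO_STACK, TO_VLAN_DEV, TO_TAPS_ONLY };

/*
 * Single receive entry point: the VLAN/non-VLAN decision is made here
 * instead of in every driver.
 */
static enum toy_verdict toy_receive(const struct toy_skb *skb,
				    const uint16_t *vids, size_t n_vids)
{
	uint16_t vid = skb->vlan_tci & TOY_VLAN_VID_MASK;
	size_t i;

	if (vid == 0)
		return TO_STACK;		/* untagged: normal path */

	for (i = 0; i < n_vids; i++)
		if (vids[i] == vid)
			return TO_VLAN_DEV;	/* a VLAN device owns this VID */

	/* Unknown VID: the tag is still attached, so taps can see it. */
	return TO_TAPS_ONLY;
}
```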




* Re: [net-next-2.6 PATCH 3/3] enic: Update MAINTAINERS
From: David Miller @ 2010-10-01  7:37 UTC
  To: vkolluri; +Cc: netdev, shemminger, roprabhu, dwang2
In-Reply-To: <20100930233550.16528.4515.stgit@savbu-pc100.cisco.com>

From: Vasanthy Kolluri <vkolluri@cisco.com>
Date: Thu, 30 Sep 2010 16:36:05 -0700

> From: Vasanthy Kolluri <vkolluri@cisco.com>
> 
> Update MAINTAINERS list
> 
> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
> Signed-off-by: David Wang <dwang2@cisco.com>

Applied.


* Re: [net-next-2.6 PATCH 2/3] enic: Make local functions static
From: David Miller @ 2010-10-01  7:37 UTC
  To: vkolluri; +Cc: netdev, shemminger, roprabhu, dwang2
In-Reply-To: <20100930233540.16528.14130.stgit@savbu-pc100.cisco.com>

From: Vasanthy Kolluri <vkolluri@cisco.com>
Date: Thu, 30 Sep 2010 16:35:45 -0700

> From: Vasanthy Kolluri <vkolluri@cisco.com>
> 
> Make functions that are only used within a file static
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
> Signed-off-by: David Wang <dwang2@cisco.com>

Applied.


* Re: [net-next-2.6 PATCH 1/3] enic: Remove dead code
From: David Miller @ 2010-10-01  7:37 UTC
  To: vkolluri; +Cc: netdev, shemminger, roprabhu, dwang2
In-Reply-To: <20100930233503.16528.47303.stgit@savbu-pc100.cisco.com>

From: Vasanthy Kolluri <vkolluri@cisco.com>
Date: Thu, 30 Sep 2010 16:35:34 -0700

> From: Vasanthy Kolluri <vkolluri@cisco.com>
> 
> Removed code that is unused
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
> Signed-off-by: David Wang <dwang2@cisco.com>

Applied.


* Re: [PATCH net-next] neigh: reorder fields in struct neighbour
From: David Miller @ 2010-10-01  7:37 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1285860989.2615.650.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 30 Sep 2010 17:36:29 +0200

> On 64bit arches, there are two 32bit holes that we can remove.
> 
> sizeof(struct neighbour) shrinks from 0xf8 to 0xf0 bytes
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
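
For readers unfamiliar with this kind of change, here is a generic
sketch of how reordering removes padding holes. The two structs are
hypothetical and do not reflect the real struct neighbour layout. On a
typical LP64 target the first is 40 bytes and the second 32, the same
8-byte saving as above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout with 32-bit members wedged between pointers. */
struct with_holes {
	void *a;
	int   x;	/* followed by 4 bytes of padding on LP64 */
	void *b;
	int   y;	/* followed by 4 bytes of padding on LP64 */
	void *c;
};

/* Same members, with the two 32-bit fields placed side by side. */
struct reordered {
	void *a;
	void *b;
	void *c;
	int   x;
	int   y;
};
```

A tool such as pahole reports these holes directly for a built kernel.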


* Re: [PATCH 0/9] Gigaset patches for net-next
From: David Miller @ 2010-10-01  7:35 UTC
  To: tilman; +Cc: isdn, hjlipp, keil, i4ldeveloper, netdev, linux-kernel
In-Reply-To: <20100930-patch-gigaset-00.tilman@imap.cc>

From: Tilman Schmidt <tilman@imap.cc>
Date: Fri,  1 Oct 2010 01:34:20 +0200 (CEST)

> here's a series of nine patches to the Gigaset driver.
> Please consider for application to net-next-2.6.
> The first three should also go into -stable and are
> tagged "CC: stable" accordingly.

All applied, thanks Tilman.


* Re: [PATCH net-next 0/8] tg3: Bugfixes and updates
From: David Miller @ 2010-10-01  7:26 UTC
  To: mcarlson; +Cc: netdev, andy
In-Reply-To: <1285878877-12148-1-git-send-email-mcarlson@broadcom.com>

From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Thu, 30 Sep 2010 13:34:29 -0700

> This patchset implements some bugfixes, removes the 5724 device
> ID and introduces extended rx buffer rings.

All applied....

But really, I want to hear some real justification for a 2048 entry RX
ring at gigabit speeds.  I even think 512 is way too large for gigabit
parts.

Any machine that gets one of these newer 5717 parts does not need that
much queueing, and too deep queues tend to hurt locality and thus
performance.


* Re: [PATCH v12 06/17] Use callback to deal with skb_release_data() specially.
From: David Miller @ 2010-10-01  7:14 UTC
  To: xiaohui.xin; +Cc: netdev, kvm, linux-kernel, mst, mingo, herbert, jdike
In-Reply-To: <f2bf65a8a7676bbd3e8a749ae93d99e88671d35d.1285853725.git.xiaohui.xin@intel.com>

From: xiaohui.xin@intel.com
Date: Thu, 30 Sep 2010 22:04:23 +0800

> @@ -197,10 +197,11 @@ struct skb_shared_info {
>  	union skb_shared_tx tx_flags;
>  	struct sk_buff	*frag_list;
>  	struct skb_shared_hwtstamps hwtstamps;
> -	skb_frag_t	frags[MAX_SKB_FRAGS];
>  	/* Intermediate layers must ensure that destructor_arg
>  	 * remains valid until skb destructor */
>  	void *		destructor_arg;
> +
> +	skb_frag_t	frags[MAX_SKB_FRAGS];
>  };
>  
>  /* The structure is for a skb which pages may point to

Why are you moving frags[] to the end like this?


* [PATCH net-next] net: add a core netdev->rx_dropped counter
From: Eric Dumazet @ 2010-10-01  7:06 UTC
  To: Jesse Gross, David Miller; +Cc: Roger Luethi, netdev, Patrick McHardy
In-Reply-To: <1285909831.2705.41.camel@edumazet-laptop>

On Friday 01 October 2010 at 07:10 +0200, Eric Dumazet wrote:

> This seems very reasonable ;)
> 
> I'll add a counter, a core generalization of 
> commit 8990f468a (net: rx_dropped accounting)
> 
> Because we can drop packets _after_ netif_rx() if RPS is in action
> anyway.
> 
> 

In this patch I fold the additional dev->rx_dropped into the get_stats()
structure. We might choose not to fold it, and instead provide this
counter in a new /proc/net/dev column and a new rtnetlink attribute (with
the appropriate iproute2 change).

What do you think ?
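
The folding idea can be modelled in userspace as below. This is a
simplified sketch using C11 atomics, not the kernel implementation: the
toy_* names are invented, and the real patch uses atomic_long_t and
dev_get_stats() as shown in the diff that follows.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy device: driver-owned stats plus a core-owned drop counter. */
struct toy_dev {
	unsigned long driver_rx_dropped;	/* what the driver reports */
	atomic_long   core_rx_dropped;		/* bumped by core code only */
};

/* Called wherever the core frees a packet before the protocol stack. */
static void toy_core_drop(struct toy_dev *dev)
{
	atomic_fetch_add(&dev->core_rx_dropped, 1);
}

/* Reported rx_dropped is the driver's value plus the core's counter. */
static unsigned long toy_get_rx_dropped(struct toy_dev *dev)
{
	return dev->driver_rx_dropped +
	       (unsigned long)atomic_load(&dev->core_rx_dropped);
}
```

Because the sum is computed at read time, existing tools that read
rx_dropped see the core drops without any change on their side.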

[PATCH net-next] net: add a core netdev->rx_dropped counter

In various situations, a device provides a packet to our stack and we
drop it before it enters the protocol stack:
- softnet backlog full (accounted in /proc/net/softnet_stat)
- bad vlan tag (not accounted)
- unknown/unregistered protocol (not accounted)

We can handle a per-device counter of such dropped frames at core level,
and automatically add it to the device provided stats (rx_dropped), so
that standard tools can be used (ifconfig, ip link, cat /proc/net/dev).

This is a generalization of commit 8990f468a (net: rx_dropped
accounting), thus reverting it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/loopback.c    |    8 +-------
 include/linux/netdevice.h |    3 +++
 net/8021q/vlan.h          |    2 --
 net/8021q/vlan_core.c     |    2 ++
 net/8021q/vlan_dev.c      |   11 ++++-------
 net/core/dev.c            |   19 +++++++++++--------
 net/ipv4/ip_gre.c         |    3 +--
 net/ipv4/ipip.c           |    3 +--
 net/ipv6/ip6_tunnel.c     |    3 +--
 net/ipv6/ip6mr.c          |    3 +--
 net/ipv6/sit.c            |    3 +--
 11 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 4b0e30b..2d9663a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -64,7 +64,6 @@ struct pcpu_lstats {
 	u64			packets;
 	u64			bytes;
 	struct u64_stats_sync	syncp;
-	unsigned long		drops;
 };
 
 /*
@@ -90,8 +89,7 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 		lb_stats->bytes += len;
 		lb_stats->packets++;
 		u64_stats_update_end(&lb_stats->syncp);
-	} else
-		lb_stats->drops++;
+	}
 
 	return NETDEV_TX_OK;
 }
@@ -101,7 +99,6 @@ static struct rtnl_link_stats64 *loopback_get_stats64(struct net_device *dev,
 {
 	u64 bytes = 0;
 	u64 packets = 0;
-	u64 drops = 0;
 	int i;
 
 	for_each_possible_cpu(i) {
@@ -115,14 +112,11 @@ static struct rtnl_link_stats64 *loopback_get_stats64(struct net_device *dev,
 			tbytes = lb_stats->bytes;
 			tpackets = lb_stats->packets;
 		} while (u64_stats_fetch_retry(&lb_stats->syncp, start));
-		drops   += lb_stats->drops;
 		bytes   += tbytes;
 		packets += tpackets;
 	}
 	stats->rx_packets = packets;
 	stats->tx_packets = packets;
-	stats->rx_dropped = drops;
-	stats->rx_errors  = drops;
 	stats->rx_bytes   = bytes;
 	stats->tx_bytes   = bytes;
 	return stats;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ceed347..444f042 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -884,6 +884,9 @@ struct net_device {
 	int			iflink;
 
 	struct net_device_stats	stats;
+	atomic_long_t		rx_dropped; /* dropped packets by core network
+					     * Do not use this in drivers.
+					     */
 
 #ifdef CONFIG_WIRELESS_EXT
 	/* List of functions to handle Wireless Extensions (instead of ioctl).
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index b26ce34..8d9503a 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -25,7 +25,6 @@ struct vlan_priority_tci_mapping {
  *	@rx_multicast: number of received multicast packets
  *	@syncp: synchronization point for 64bit counters
  *	@rx_errors: number of errors
- *	@rx_dropped: number of dropped packets
  */
 struct vlan_rx_stats {
 	u64			rx_packets;
@@ -33,7 +32,6 @@ struct vlan_rx_stats {
 	u64			rx_multicast;
 	struct u64_stats_sync	syncp;
 	unsigned long		rx_errors;
-	unsigned long		rx_dropped;
 };
 
 /**
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 0eb486d..35a04a1 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -30,6 +30,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
 	return polling ? netif_receive_skb(skb) : netif_rx(skb);
 
 drop:
+	atomic_long_inc(&skb->dev->rx_dropped);
 	dev_kfree_skb_any(skb);
 	return NET_RX_DROP;
 }
@@ -117,6 +118,7 @@ vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
 	return dev_gro_receive(napi, skb);
 
 drop:
+	atomic_long_inc(&skb->dev->rx_dropped);
 	return GRO_DROP;
 }
 
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index f6fbcc0..f54251e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -225,16 +225,15 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
 		}
 	}
 
-	if (unlikely(netif_rx(skb) == NET_RX_DROP)) {
-		if (rx_stats)
-			rx_stats->rx_dropped++;
-	}
+	netif_rx(skb);
+
 	rcu_read_unlock();
 	return NET_RX_SUCCESS;
 
 err_unlock:
 	rcu_read_unlock();
 err_free:
+	atomic_long_inc(&dev->rx_dropped);
 	kfree_skb(skb);
 	return NET_RX_DROP;
 }
@@ -846,15 +845,13 @@ static struct rtnl_link_stats64 *vlan_dev_get_stats64(struct net_device *dev, st
 			accum.rx_packets += rxpackets;
 			accum.rx_bytes   += rxbytes;
 			accum.rx_multicast += rxmulticast;
-			/* rx_errors, rx_dropped are ulong, not protected by syncp */
+			/* rx_errors is ulong, not protected by syncp */
 			accum.rx_errors  += p->rx_errors;
-			accum.rx_dropped += p->rx_dropped;
 		}
 		stats->rx_packets = accum.rx_packets;
 		stats->rx_bytes   = accum.rx_bytes;
 		stats->rx_errors  = accum.rx_errors;
 		stats->multicast  = accum.rx_multicast;
-		stats->rx_dropped = accum.rx_dropped;
 	}
 	return stats;
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index a313bab..5143663 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1483,8 +1483,9 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 	skb_orphan(skb);
 	nf_reset(skb);
 
-	if (!(dev->flags & IFF_UP) ||
-	    (skb->len > (dev->mtu + dev->hard_header_len))) {
+	if (unlikely(!(dev->flags & IFF_UP) ||
+		     (skb->len > (dev->mtu + dev->hard_header_len)))) {
+		atomic_long_inc(&dev->rx_dropped);
 		kfree_skb(skb);
 		return NET_RX_DROP;
 	}
@@ -2548,6 +2549,7 @@ enqueue:
 
 	local_irq_restore(flags);
 
+	atomic_long_inc(&skb->dev->rx_dropped);
 	kfree_skb(skb);
 	return NET_RX_DROP;
 }
@@ -2996,6 +2998,7 @@ ncls:
 	if (pt_prev) {
 		ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 	} else {
+		atomic_long_inc(&skb->dev->rx_dropped);
 		kfree_skb(skb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)
@@ -5431,14 +5434,14 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 
 	if (ops->ndo_get_stats64) {
 		memset(storage, 0, sizeof(*storage));
-		return ops->ndo_get_stats64(dev, storage);
-	}
-	if (ops->ndo_get_stats) {
+		ops->ndo_get_stats64(dev, storage);
+	} else if (ops->ndo_get_stats) {
 		netdev_stats_to_stats64(storage, ops->ndo_get_stats(dev));
-		return storage;
+	} else {
+		netdev_stats_to_stats64(storage, &dev->stats);
+		dev_txq_stats_fold(dev, storage);
 	}
-	netdev_stats_to_stats64(storage, &dev->stats);
-	dev_txq_stats_fold(dev, storage);
+	storage->rx_dropped += atomic_long_read(&dev->rx_dropped);
 	return storage;
 }
 EXPORT_SYMBOL(dev_get_stats);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index fbe2c47..9d421f4 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -679,8 +679,7 @@ static int ipgre_rcv(struct sk_buff *skb)
 		skb_reset_network_header(skb);
 		ipgre_ecn_decapsulate(iph, skb);
 
-		if (netif_rx(skb) == NET_RX_DROP)
-			tunnel->dev->stats.rx_dropped++;
+		netif_rx(skb);
 
 		rcu_read_unlock();
 		return 0;
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 6ad46c2..e9b816e 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -414,8 +414,7 @@ static int ipip_rcv(struct sk_buff *skb)
 
 		ipip_ecn_decapsulate(iph, skb);
 
-		if (netif_rx(skb) == NET_RX_DROP)
-			tunnel->dev->stats.rx_dropped++;
+		netif_rx(skb);
 
 		rcu_read_unlock();
 		return 0;
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 8be3c45..c2c0f89 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -768,8 +768,7 @@ static int ip6_tnl_rcv(struct sk_buff *skb, __u16 protocol,
 
 		dscp_ecn_decapsulate(t, ipv6h, skb);
 
-		if (netif_rx(skb) == NET_RX_DROP)
-			t->dev->stats.rx_dropped++;
+		netif_rx(skb);
 
 		rcu_read_unlock();
 		return 0;
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 2640c9b..6f32ffc 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -666,8 +666,7 @@ static int pim6_rcv(struct sk_buff *skb)
 
 	skb_tunnel_rx(skb, reg_dev);
 
-	if (netif_rx(skb) == NET_RX_DROP)
-		reg_dev->stats.rx_dropped++;
+	netif_rx(skb);
 
 	dev_put(reg_dev);
 	return 0;
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index d770178..367a6cc 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -600,8 +600,7 @@ static int ipip6_rcv(struct sk_buff *skb)
 
 		ipip6_ecn_decapsulate(iph, skb);
 
-		if (netif_rx(skb) == NET_RX_DROP)
-			tunnel->dev->stats.rx_dropped++;
+		netif_rx(skb);
 
 		rcu_read_unlock();
 		return 0;




* [PATCH 17/18] net: Fix endianess issues in IBM newemac driver
From: Ian Munsie @ 2010-10-01  7:06 UTC
  To: linux-kernel, linuxppc-dev, benh
  Cc: paulus, Ian Munsie, David S. Miller, Grant Likely, Jiri Pirko,
	Sean MacLennan, Tejun Heo, netdev, devicetree-discuss
In-Reply-To: <1285916771-18033-1-git-send-email-imunsie@au1.ibm.com>

From: Ian Munsie <imunsie@au1.ibm.com>

This patch fixes the endianness of all the device tree and ring buffer
accesses in the IBM newemac driver.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
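
As a userspace analogue of the conversions in this patch, the sketch
below keeps descriptor fields big-endian in memory and converts at every
access. It is an illustration under stated assumptions: htons()/ntohs()
stand in for the kernel's cpu_to_be16()/be16_to_cpu(), and toy_desc and
the TOY_CTRL_EMPTY value are made up rather than taken from mal.h.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

#define TOY_CTRL_EMPTY 0x4000	/* made-up flag value */

/* Descriptor as the device sees it: fields are big-endian in memory. */
struct toy_desc {
	uint16_t ctrl;		/* big-endian */
	uint16_t data_len;	/* big-endian */
};

/* Writer converts from CPU order before storing. */
static void toy_desc_set(struct toy_desc *d, uint16_t ctrl, uint16_t len)
{
	d->ctrl = htons(ctrl);
	d->data_len = htons(len);
}

/* Reader converts back to CPU order before using the value. */
static uint16_t toy_desc_len(const struct toy_desc *d)
{
	return ntohs(d->data_len);
}

/* A flag test can convert the constant instead of the field. */
static int toy_desc_is_empty(const struct toy_desc *d)
{
	return (d->ctrl & htons(TOY_CTRL_EMPTY)) != 0;
}
```

On a big-endian CPU the conversions compile to nothing, which is why the
unconverted code worked there and only breaks on little-endian builds.
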
 drivers/net/ibm_newemac/core.c |   68 ++++++++++++++++++++--------------------
 drivers/net/ibm_newemac/mal.c  |    6 ++--
 drivers/net/ibm_newemac/mal.h  |    6 ++--
 3 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ibm_newemac/core.c b/drivers/net/ibm_newemac/core.c
index 3506fd6..67238b8 100644
--- a/drivers/net/ibm_newemac/core.c
+++ b/drivers/net/ibm_newemac/core.c
@@ -981,12 +981,12 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
 	 * to simplify error recovery in the case of allocation failure later.
 	 */
 	for (i = 0; i < NUM_RX_BUFF; ++i) {
-		if (dev->rx_desc[i].ctrl & MAL_RX_CTRL_FIRST)
+		if (dev->rx_desc[i].ctrl & cpu_to_be16(MAL_RX_CTRL_FIRST))
 			++dev->estats.rx_dropped_resize;
 
 		dev->rx_desc[i].data_len = 0;
-		dev->rx_desc[i].ctrl = MAL_RX_CTRL_EMPTY |
-		    (i == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
+		dev->rx_desc[i].ctrl = cpu_to_be16(MAL_RX_CTRL_EMPTY |
+		    (i == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0));
 	}
 
 	/* Reallocate RX ring only if bigger skb buffers are required */
@@ -1005,9 +1005,9 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
 		dev_kfree_skb(dev->rx_skb[i]);
 
 		skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
-		dev->rx_desc[i].data_ptr =
+		dev->rx_desc[i].data_ptr = cpu_to_be32(
 		    dma_map_single(&dev->ofdev->dev, skb->data - 2, rx_sync_size,
-				   DMA_FROM_DEVICE) + 2;
+				   DMA_FROM_DEVICE) + 2);
 		dev->rx_skb[i] = skb;
 	}
  skip:
@@ -1067,7 +1067,7 @@ static void emac_clean_tx_ring(struct emac_instance *dev)
 		if (dev->tx_skb[i]) {
 			dev_kfree_skb(dev->tx_skb[i]);
 			dev->tx_skb[i] = NULL;
-			if (dev->tx_desc[i].ctrl & MAL_TX_CTRL_READY)
+			if (dev->tx_desc[i].ctrl & cpu_to_be16(MAL_TX_CTRL_READY))
 				++dev->estats.tx_dropped;
 		}
 		dev->tx_desc[i].ctrl = 0;
@@ -1104,12 +1104,12 @@ static inline int emac_alloc_rx_skb(struct emac_instance *dev, int slot,
 	dev->rx_desc[slot].data_len = 0;
 
 	skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
-	dev->rx_desc[slot].data_ptr =
+	dev->rx_desc[slot].data_ptr = cpu_to_be32(
 	    dma_map_single(&dev->ofdev->dev, skb->data - 2, dev->rx_sync_size,
-			   DMA_FROM_DEVICE) + 2;
+			   DMA_FROM_DEVICE) + 2);
 	wmb();
-	dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
-	    (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
+	dev->rx_desc[slot].ctrl = cpu_to_be16(MAL_RX_CTRL_EMPTY |
+	    (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0));
 
 	return 0;
 }
@@ -1373,12 +1373,12 @@ static int emac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	DBG2(dev, "xmit(%u) %d" NL, len, slot);
 
 	dev->tx_skb[slot] = skb;
-	dev->tx_desc[slot].data_ptr = dma_map_single(&dev->ofdev->dev,
+	dev->tx_desc[slot].data_ptr = cpu_to_be32(dma_map_single(&dev->ofdev->dev,
 						     skb->data, len,
-						     DMA_TO_DEVICE);
-	dev->tx_desc[slot].data_len = (u16) len;
+						     DMA_TO_DEVICE));
+	dev->tx_desc[slot].data_len = cpu_to_be16(len);
 	wmb();
-	dev->tx_desc[slot].ctrl = ctrl;
+	dev->tx_desc[slot].ctrl = cpu_to_be16(ctrl);
 
 	return emac_xmit_finish(dev, len);
 }
@@ -1399,9 +1399,9 @@ static inline int emac_xmit_split(struct emac_instance *dev, int slot,
 			ctrl |= MAL_TX_CTRL_WRAP;
 
 		dev->tx_skb[slot] = NULL;
-		dev->tx_desc[slot].data_ptr = pd;
-		dev->tx_desc[slot].data_len = (u16) chunk;
-		dev->tx_desc[slot].ctrl = ctrl;
+		dev->tx_desc[slot].data_ptr = cpu_to_be32(pd);
+		dev->tx_desc[slot].data_len = cpu_to_be16(chunk);
+		dev->tx_desc[slot].ctrl = cpu_to_be16(ctrl);
 		++dev->tx_cnt;
 
 		if (!len)
@@ -1442,9 +1442,9 @@ static int emac_start_xmit_sg(struct sk_buff *skb, struct net_device *ndev)
 	/* skb data */
 	dev->tx_skb[slot] = NULL;
 	chunk = min(len, MAL_MAX_TX_SIZE);
-	dev->tx_desc[slot].data_ptr = pd =
-	    dma_map_single(&dev->ofdev->dev, skb->data, len, DMA_TO_DEVICE);
-	dev->tx_desc[slot].data_len = (u16) chunk;
+	dev->tx_desc[slot].data_ptr = cpu_to_be32(pd =
+	    dma_map_single(&dev->ofdev->dev, skb->data, len, DMA_TO_DEVICE));
+	dev->tx_desc[slot].data_len = cpu_to_be16(chunk);
 	len -= chunk;
 	if (unlikely(len))
 		slot = emac_xmit_split(dev, slot, pd + chunk, len, !nr_frags,
@@ -1473,7 +1473,7 @@ static int emac_start_xmit_sg(struct sk_buff *skb, struct net_device *ndev)
 	if (dev->tx_slot == NUM_TX_BUFF - 1)
 		ctrl |= MAL_TX_CTRL_WRAP;
 	wmb();
-	dev->tx_desc[dev->tx_slot].ctrl = ctrl;
+	dev->tx_desc[dev->tx_slot].ctrl = cpu_to_be16(ctrl);
 	dev->tx_slot = (slot + 1) % NUM_TX_BUFF;
 
 	return emac_xmit_finish(dev, skb->len);
@@ -1541,7 +1541,7 @@ static void emac_poll_tx(void *param)
 		u16 ctrl;
 		int slot = dev->ack_slot, n = 0;
 	again:
-		ctrl = dev->tx_desc[slot].ctrl;
+		ctrl = be16_to_cpu(dev->tx_desc[slot].ctrl);
 		if (!(ctrl & MAL_TX_CTRL_READY)) {
 			struct sk_buff *skb = dev->tx_skb[slot];
 			++n;
@@ -1583,8 +1583,8 @@ static inline void emac_recycle_rx_skb(struct emac_instance *dev, int slot,
 
 	dev->rx_desc[slot].data_len = 0;
 	wmb();
-	dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
-	    (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
+	dev->rx_desc[slot].ctrl = cpu_to_be16(MAL_RX_CTRL_EMPTY |
+	    (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0));
 }
 
 static void emac_parse_rx_error(struct emac_instance *dev, u16 ctrl)
@@ -1628,7 +1628,7 @@ static inline void emac_rx_csum(struct emac_instance *dev,
 static inline int emac_rx_sg_append(struct emac_instance *dev, int slot)
 {
 	if (likely(dev->rx_sg_skb != NULL)) {
-		int len = dev->rx_desc[slot].data_len;
+		int len = be16_to_cpu(dev->rx_desc[slot].data_len);
 		int tot_len = dev->rx_sg_skb->len + len;
 
 		if (unlikely(tot_len + 2 > dev->rx_skb_size)) {
@@ -1659,14 +1659,14 @@ static int emac_poll_rx(void *param, int budget)
 	while (budget > 0) {
 		int len;
 		struct sk_buff *skb;
-		u16 ctrl = dev->rx_desc[slot].ctrl;
+		u16 ctrl = be16_to_cpu(dev->rx_desc[slot].ctrl);
 
 		if (ctrl & MAL_RX_CTRL_EMPTY)
 			break;
 
 		skb = dev->rx_skb[slot];
 		mb();
-		len = dev->rx_desc[slot].data_len;
+		len = be16_to_cpu(dev->rx_desc[slot].data_len);
 
 		if (unlikely(!MAL_IS_SINGLE_RX(ctrl)))
 			goto sg;
@@ -1757,7 +1757,7 @@ static int emac_poll_rx(void *param, int budget)
 
 	if (unlikely(budget && test_bit(MAL_COMMAC_RX_STOPPED, &dev->commac.flags))) {
 		mb();
-		if (!(dev->rx_desc[slot].ctrl & MAL_RX_CTRL_EMPTY)) {
+		if (!(dev->rx_desc[slot].ctrl & cpu_to_be16(MAL_RX_CTRL_EMPTY))) {
 			DBG2(dev, "rx restart" NL);
 			received = 0;
 			goto again;
@@ -1783,7 +1783,7 @@ static int emac_peek_rx(void *param)
 {
 	struct emac_instance *dev = param;
 
-	return !(dev->rx_desc[dev->rx_slot].ctrl & MAL_RX_CTRL_EMPTY);
+	return !(dev->rx_desc[dev->rx_slot].ctrl & cpu_to_be16(MAL_RX_CTRL_EMPTY));
 }
 
 /* NAPI poll context */
@@ -1793,7 +1793,7 @@ static int emac_peek_rx_sg(void *param)
 
 	int slot = dev->rx_slot;
 	while (1) {
-		u16 ctrl = dev->rx_desc[slot].ctrl;
+		u16 ctrl = be16_to_cpu(dev->rx_desc[slot].ctrl);
 		if (ctrl & MAL_RX_CTRL_EMPTY)
 			return 0;
 		else if (ctrl & MAL_RX_CTRL_LAST)
@@ -2367,14 +2367,14 @@ static int __devinit emac_read_uint_prop(struct device_node *np, const char *nam
 					 u32 *val, int fatal)
 {
 	int len;
-	const u32 *prop = of_get_property(np, name, &len);
+	const __be32 *prop = of_get_property(np, name, &len);
 	if (prop == NULL || len < sizeof(u32)) {
 		if (fatal)
 			printk(KERN_ERR "%s: missing %s property\n",
 			       np->full_name, name);
 		return -ENODEV;
 	}
-	*val = *prop;
+	*val = be32_to_cpup(prop);
 	return 0;
 }
 
@@ -3013,7 +3013,7 @@ static void __init emac_make_bootlist(void)
 
 	/* Collect EMACs */
 	while((np = of_find_all_nodes(np)) != NULL) {
-		const u32 *idx;
+		const __be32 *idx;
 
 		if (of_match_node(emac_match, np) == NULL)
 			continue;
@@ -3022,7 +3022,7 @@ static void __init emac_make_bootlist(void)
 		idx = of_get_property(np, "cell-index", NULL);
 		if (idx == NULL)
 			continue;
-		cell_indices[i] = *idx;
+		cell_indices[i] = be32_to_cpup(idx);
 		emac_boot_list[i++] = of_node_get(np);
 		if (i >= EMAC_BOOT_LIST_SIZE) {
 			of_node_put(np);
diff --git a/drivers/net/ibm_newemac/mal.c b/drivers/net/ibm_newemac/mal.c
index d5717e2..9e4939e 100644
--- a/drivers/net/ibm_newemac/mal.c
+++ b/drivers/net/ibm_newemac/mal.c
@@ -524,7 +524,7 @@ static int __devinit mal_probe(struct platform_device *ofdev,
 	int err = 0, i, bd_size;
 	int index = mal_count++;
 	unsigned int dcr_base;
-	const u32 *prop;
+	const __be32 *prop;
 	u32 cfg;
 	unsigned long irqflags;
 	irq_handler_t hdlr_serr, hdlr_txde, hdlr_rxde;
@@ -550,7 +550,7 @@ static int __devinit mal_probe(struct platform_device *ofdev,
 		err = -ENODEV;
 		goto fail;
 	}
-	mal->num_tx_chans = prop[0];
+	mal->num_tx_chans = be32_to_cpu(prop[0]);
 
 	prop = of_get_property(ofdev->dev.of_node, "num-rx-chans", NULL);
 	if (prop == NULL) {
@@ -560,7 +560,7 @@ static int __devinit mal_probe(struct platform_device *ofdev,
 		err = -ENODEV;
 		goto fail;
 	}
-	mal->num_rx_chans = prop[0];
+	mal->num_rx_chans = be32_to_cpu(prop[0]);
 
 	dcr_base = dcr_resource_start(ofdev->dev.of_node, 0);
 	if (dcr_base == 0) {
diff --git a/drivers/net/ibm_newemac/mal.h b/drivers/net/ibm_newemac/mal.h
index 6608421..b8ee413 100644
--- a/drivers/net/ibm_newemac/mal.h
+++ b/drivers/net/ibm_newemac/mal.h
@@ -147,9 +147,9 @@ static inline int mal_tx_chunks(int len)
 
 /* MAL Buffer Descriptor structure */
 struct mal_descriptor {
-	u16 ctrl;		/* MAL / Commac status control bits */
-	u16 data_len;		/* Max length is 4K-1 (12 bits)     */
-	u32 data_ptr;		/* pointer to actual data buffer    */
+	__be16 ctrl;		/* MAL / Commac status control bits */
+	__be16 data_len;	/* Max length is 4K-1 (12 bits)     */
+	__be32 data_ptr;	/* pointer to actual data buffer    */
 };
 
 /* the following defines are for the MadMAL status and control registers. */
-- 
1.7.1

^ permalink raw reply related

* [PATCH V4] fs: allow for more than 2^31 files
From: Eric Dumazet @ 2010-10-01  5:29 UTC (permalink / raw)
  To: Robin Holt
  Cc: David Miller, dipankar, viro, bcrl, den, mingo, mszeredi, cmm,
	npiggin, xemul, linux-kernel, netdev
In-Reply-To: <1285909434.2705.35.camel@edumazet-laptop>

On Friday, 01 October 2010 at 07:03 +0200, Eric Dumazet wrote:
> On Thursday, 30 September 2010 at 23:34 -0500, Robin Holt wrote:
> 
> > The proc_handler used to be proc_nr_files() which would call
> > get_nr_files() and deposit the result in files_stat.nr_files then cascade
> > to proc_dointvec() which would dump the 3 values.  Now it will dump the
> > three values, but not update the first (nr_files) value beforehand.
> > 
> 
> Ah, I get it now, thanks!
> 
> I'll send a V4 shortly.
> 
> 

In this V4, I call proc_nr_files() again, and proc_nr_files() now calls
proc_doulongvec_minmax() instead of proc_dointvec().

I also added the "cat /proc/sys/fs/file-nr" output to the changelog.

Thanks again, Robin.

[PATCH V4] fs: allow for more than 2^31 files

Robin Holt tried to boot a 16TB system and found af_unix was overflowing
a 32-bit value:

<quote>

We were seeing a failure which prevented boot.  The kernel was incapable
of creating either a named pipe or unix domain socket.  This comes down
to a common kernel function called unix_create1() which does:

        atomic_inc(&unix_nr_socks);
        if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                goto out;

The function get_max_files() is a simple return of files_stat.max_files.
files_stat.max_files is a signed integer and is computed in
fs/file_table.c's files_init().

        n = (mempages * (PAGE_SIZE / 1024)) / 10;
        files_stat.max_files = n;

In our case, mempages (total_ram_pages) is approx 3,758,096,384
(0xe0000000).  That leaves max_files at approximately 1,503,238,553.
This causes 2 * get_max_files() to overflow a signed 32-bit integer.

</quote>
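
As a sanity check of the arithmetic in the quote, here is a minimal
userspace sketch. The helper names are hypothetical, and the 4 KiB page
size is an assumption that the quote does not state.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the files_init() formula quoted above.
 * page_size is passed in because the quote does not state it.
 */
static uint64_t max_files_for(uint64_t mempages, uint64_t page_size)
{
	assert(page_size >= 1024);
	return (mempages * (page_size / 1024)) / 10;
}

/* Returns 1 if 2 * max_files no longer fits in a signed 32-bit int. */
static int doubled_limit_overflows_int(uint64_t max_files)
{
	return 2 * max_files > (uint64_t)INT32_MAX;
}
```

With mempages = 0xe0000000 and 4 KiB pages, max_files_for() gives the
1,503,238,553 figure from the quote, and doubling it exceeds INT32_MAX,
which is why the comparison in unix_create1() misbehaves with int.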

Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
integers, and change af_unix to use an atomic_long_t instead of
atomic_t.

get_max_files() is changed to return an unsigned long.
get_nr_files() is changed to return a long.

unix_nr_socks is changed from atomic_t to atomic_long_t, although this is
not strictly needed to address Robin's problem.
 
Before patch (on a 64-bit kernel):
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
-18446744071562067968

After patch:
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
2147483648
# cat /proc/sys/fs/file-nr
704	0	2147483648
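
For tools that read this file, a minimal userspace sketch of parsing the
three fields could look like the following. parse_file_nr() is a
hypothetical helper, not part of the patch, and it uses unsigned long long
so the values also fit when the tool is built as a 32-bit binary.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one line in the /proc/sys/fs/file-nr format shown above:
 * three whitespace-separated unsigned values (nr_files, nr_free_files,
 * max_files). Returns 0 on success, -1 if fewer than three numbers
 * were found.
 */
static int parse_file_nr(const char *line, unsigned long long out[3])
{
	assert(line != NULL);
	if (sscanf(line, "%llu %llu %llu", &out[0], &out[1], &out[2]) != 3)
		return -1;
	return 0;
}
```

A parser that stored these values in an int would hit the same limit the
patch removes, so wide unsigned types are the safer choice here.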


Reported-by: Robin Holt <holt@sgi.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 fs/file_table.c    |   17 +++++++----------
 include/linux/fs.h |    8 ++++----
 kernel/sysctl.c    |    6 +++---
 net/unix/af_unix.c |   14 +++++++-------
 4 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index a04bdd8..c3dee38 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -60,7 +60,7 @@ static inline void file_free(struct file *f)
 /*
  * Return the total number of open files in the system
  */
-static int get_nr_files(void)
+static long get_nr_files(void)
 {
 	return percpu_counter_read_positive(&nr_files);
 }
@@ -68,7 +68,7 @@ static int get_nr_files(void)
 /*
  * Return the maximum number of open files in the system
  */
-int get_max_files(void)
+unsigned long get_max_files(void)
 {
 	return files_stat.max_files;
 }
@@ -82,7 +82,7 @@ int proc_nr_files(ctl_table *table, int write,
                      void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	files_stat.nr_files = get_nr_files();
-	return proc_dointvec(table, write, buffer, lenp, ppos);
+	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #else
 int proc_nr_files(ctl_table *table, int write,
@@ -105,7 +105,7 @@ int proc_nr_files(ctl_table *table, int write,
 struct file *get_empty_filp(void)
 {
 	const struct cred *cred = current_cred();
-	static int old_max;
+	static long old_max;
 	struct file * f;
 
 	/*
@@ -140,8 +140,7 @@ struct file *get_empty_filp(void)
 over:
 	/* Ran out of filps - report that */
 	if (get_nr_files() > old_max) {
-		printk(KERN_INFO "VFS: file-max limit %d reached\n",
-					get_max_files());
+		pr_info("VFS: file-max limit %lu reached\n", get_max_files());
 		old_max = get_nr_files();
 	}
 	goto fail;
@@ -487,7 +486,7 @@ retry:
 
 void __init files_init(unsigned long mempages)
 { 
-	int n; 
+	unsigned long n;
 
 	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
 			SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
@@ -498,9 +497,7 @@ void __init files_init(unsigned long mempages)
 	 */ 
 
 	n = (mempages * (PAGE_SIZE / 1024)) / 10;
-	files_stat.max_files = n; 
-	if (files_stat.max_files < NR_FILE)
-		files_stat.max_files = NR_FILE;
+	files_stat.max_files = max_t(unsigned long, n, NR_FILE);
 	files_defer_init();
 	lg_lock_init(files_lglock);
 	percpu_counter_init(&nr_files, 0);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 63d069b..8c06590 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -34,9 +34,9 @@
 
 /* And dynamically-tunable limits and defaults: */
 struct files_stat_struct {
-	int nr_files;		/* read only */
-	int nr_free_files;	/* read only */
-	int max_files;		/* tunable */
+	unsigned long nr_files;		/* read only */
+	unsigned long nr_free_files;	/* read only */
+	unsigned long max_files;		/* tunable */
 };
 
 struct inodes_stat_t {
@@ -404,7 +404,7 @@ extern void __init inode_init_early(void);
 extern void __init files_init(unsigned long);
 
 extern struct files_stat_struct files_stat;
-extern int get_max_files(void);
+extern unsigned long get_max_files(void);
 extern int sysctl_nr_open;
 extern struct inodes_stat_t inodes_stat;
 extern int leases_enable, lease_break_time;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f88552c..f789a0a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = {
 	{
 		.procname	= "file-nr",
 		.data		= &files_stat,
-		.maxlen		= 3*sizeof(int),
+		.maxlen		= sizeof(files_stat),
 		.mode		= 0444,
 		.proc_handler	= proc_nr_files,
 	},
 	{
 		.procname	= "file-max",
 		.data		= &files_stat.max_files,
-		.maxlen		= sizeof(int),
+		.maxlen		= sizeof(files_stat.max_files),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_doulongvec_minmax,
 	},
 	{
 		.procname	= "nr_open",
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0b39b24..3e1d7d1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -117,7 +117,7 @@
 
 static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
 static DEFINE_SPINLOCK(unix_table_lock);
-static atomic_t unix_nr_socks = ATOMIC_INIT(0);
+static atomic_long_t unix_nr_socks;
 
 #define unix_sockets_unbound	(&unix_socket_table[UNIX_HASH_SIZE])
 
@@ -360,13 +360,13 @@ static void unix_sock_destructor(struct sock *sk)
 	if (u->addr)
 		unix_release_addr(u->addr);
 
-	atomic_dec(&unix_nr_socks);
+	atomic_long_dec(&unix_nr_socks);
 	local_bh_disable();
 	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
 	local_bh_enable();
 #ifdef UNIX_REFCNT_DEBUG
-	printk(KERN_DEBUG "UNIX %p is destroyed, %d are still alive.\n", sk,
-		atomic_read(&unix_nr_socks));
+	printk(KERN_DEBUG "UNIX %p is destroyed, %ld are still alive.\n", sk,
+		atomic_long_read(&unix_nr_socks));
 #endif
 }
 
@@ -606,8 +606,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
 	struct sock *sk = NULL;
 	struct unix_sock *u;
 
-	atomic_inc(&unix_nr_socks);
-	if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
+	atomic_long_inc(&unix_nr_socks);
+	if (atomic_long_read(&unix_nr_socks) > 2 * get_max_files())
 		goto out;
 
 	sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_proto);
@@ -632,7 +632,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
 	unix_insert_socket(unix_sockets_unbound, sk);
 out:
 	if (sk == NULL)
-		atomic_dec(&unix_nr_socks);
+		atomic_long_dec(&unix_nr_socks);
 	else {
 		local_bh_disable();
 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);

^ permalink raw reply related

* Re: [PATCH 1/2] net-next-2.6: SYN retransmits: Rename threshold variable
From: Damian Lukowski @ 2010-10-01  5:22 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100930.172337.220062330.davem@davemloft.net>

On Thursday, 30.09.2010 at 17:23 -0700, David Miller wrote:
> Damian please don't do things like this.


No problem. I only did it to prevent the merge conflict Stephen
experienced, since I had seen that the parameters had already changed in
net-next-2.6.

Damian

> When we make a change in net-2.6, that change is going to propagate into
> net-next-2.6 the next time I do a merge.
> 
> And in this case here, the addition of the "syn_set" boolean argument to
> retransmits_timed_out() will happen at that point.
> 
> So if anything, you should build on top of the bug fix we put into
> net-2.6 instead of duplicating the change.
> 
> Adding the same change in two different ways to net-2.6 and net-next-2.6
> makes the merge a pain in the neck for me and just makes things look
> real confusing.
> 
> I'm not applying these two patches, please ask me to merge net-2.6 into
> net-next-2.6 and this way you can code them relative to that.
> 
> Thanks!



^ permalink raw reply

* Re: VLAN packets silently dropped in promiscuous mode
From: Eric Dumazet @ 2010-10-01  5:10 UTC (permalink / raw)
  To: Jesse Gross; +Cc: Roger Luethi, netdev, Patrick McHardy
In-Reply-To: <AANLkTi=Vdcn7xzJMPxkugvEVy32N7Bp=KVtir6NESnDF@mail.gmail.com>

On Thursday, 30 September 2010 at 19:37 -0700, Jesse Gross wrote:

> That's true.  Dropping here seems roughly equivalent to the effects of
> a hardware VLAN filter, which will also not be tracked by a counter,
> so that seems not too bad to me.
> 
> The thing that concerns me though is why so many drivers seem to have
> this problem with completely dropping the VLAN header.  I know that
> even several of the ones that work now were broken initially and had
> to be fixed.  Seeing as the driver drops the VLAN information before
> it gets to the general networking code I don't see a generic fix to
> this as it is currently setup.  However, perhaps we could make it so
> that it is harder to get wrong.  Something like this:
> 
> * Allow vlan_gro_receive() to take a NULL VLAN group and a tag of 0
> (and do the same thing for vlan_hwaccel_rx())
> * Now that the vlan functions can deal with non-VLAN packets, merge
> them into their non-VLAN counterparts.
> * We can now demultiplex between the VLAN/non-VLAN case in core
> networking.  This is done anyways, it just prevents every driver from
> needing that code block I copied above and allows us to fix these
> types of problems centrally.
> * Dump the VLAN tag into the SKB and hand off the packet to the
> various consumers: VLAN devices, libpcap, bridge hook (not currently
> done but should be for trunking).
> 
> I see a number of advantages of this:
> * Fixes all the problems with cards dropping VLAN headers at once.
> * Avoids having to disable VLAN acceleration when in promiscuous mode
> (good for bridging since it always puts devices in promiscuous mode).
> * Keeps VLAN tag separate until given to ultimate consumer, which
> avoids needing to do header reconstruction as in tg3 unless absolutely
> necessary.
> * Consolidates common driver code in core networking.

This seems very reasonable ;)

I'll add a counter, as a core generalization of
commit 8990f468a (net: rx_dropped accounting),
because we can drop packets _after_ netif_rx() if RPS is in action
anyway.
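
To make the proposed flow concrete, here is a rough userspace sketch of a
single receive entry point that handles tagged and untagged packets alike.
All names (pkt_sketch, rx_demux, the rx_path values) are hypothetical and
are not existing kernel APIs; only the 12-bit VLAN ID mask comes from the
802.1Q tag layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_VID_MASK_SKETCH 0x0fffu	/* low 12 bits of the 802.1Q TCI */

/* Hypothetical packet: the driver keeps the stripped tag instead of dropping it. */
struct pkt_sketch {
	int      has_vlan_tag;	/* set when the hardware stripped a tag */
	uint16_t vlan_tci;	/* tag control information, host byte order */
};

enum rx_path { RX_PLAIN, RX_VLAN_DEVICE, RX_TAGGED_NO_DEVICE };

/*
 * Single entry point for tagged and untagged packets. vlan_ids lists the
 * VLAN IDs that have a device configured; it may be NULL when n_ids is 0.
 */
static enum rx_path rx_demux(const struct pkt_sketch *p,
			     const uint16_t *vlan_ids, size_t n_ids)
{
	uint16_t vid;
	size_t i;

	assert(p != NULL);
	if (!p->has_vlan_tag)
		return RX_PLAIN;

	vid = p->vlan_tci & VLAN_VID_MASK_SKETCH;
	for (i = 0; i < n_ids; i++)
		if (vlan_ids[i] == vid)
			return RX_VLAN_DEVICE;

	/* The tag is still attached, so a sniffer or bridge could see it. */
	return RX_TAGGED_NO_DEVICE;
}
```

The point of the sketch is that the driver never decides what to do with a
tagged packet; it only records the tag, and the demultiplexing happens in
one central place.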




^ permalink raw reply

* Re: [PATCH V3] fs: allow for more than 2^31 files
From: Eric Dumazet @ 2010-10-01  5:03 UTC (permalink / raw)
  To: Robin Holt
  Cc: David Miller, dipankar, viro, bcrl, den, mingo, mszeredi, cmm,
	npiggin, xemul, linux-kernel, netdev
In-Reply-To: <20101001043413.GN14068@sgi.com>

On Thursday, 30 September 2010 at 23:34 -0500, Robin Holt wrote:

> The proc_handler used to be proc_nr_files() which would call
> get_nr_files() and deposit the result in files_stat.nr_files then cascade
> to proc_dointvec() which would dump the 3 values.  Now it will dump the
> three values, but not update the first (nr_files) value beforehand.
> 

Ah, I get it now, thanks!

I'll send a V4 shortly.




^ permalink raw reply

* Re: [PATCH V3] fs: allow for more than 2^31 files
From: Robin Holt @ 2010-10-01  4:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Robin Holt, David Miller, dipankar, viro, bcrl, den, mingo,
	mszeredi, cmm, npiggin, xemul, linux-kernel, netdev
In-Reply-To: <1285879545.2705.4.camel@edumazet-laptop>

On Thu, Sep 30, 2010 at 10:45:45PM +0200, Eric Dumazet wrote:
> On Thursday, 30 September 2010 at 15:26 -0500, Robin Holt wrote:
> > On Tue, Sep 28, 2010 at 05:46:51AM +0200, Eric Dumazet wrote:
> > > On Monday, 27 September 2010 at 15:36 -0700, David Miller wrote:
> > ...
> > 
> > > Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
> > > integers, and change af_unix to use an atomic_long_t instead of
> > > atomic_t.
> > > 
> > > get_max_files() is changed to return an unsigned long.
> > 
> > I _THINK_ we actually want get_max_files to return a long and have
> > the files_stat_struct definitions be longs.  If we do not have it that
> > way, we could theoretically open enough files on a single cpu to make
> > get_nr_files return a negative without overflowing max_files.  That,
> > of course, would require an insane amount of memory, but I think it is
> > technically more correct.
> > 
> 
> The number of open files is technically a positive (or zero) value; I
> have no idea why you want it to be signed.
> 
> > 
> > > --- a/kernel/sysctl.c
> > > +++ b/kernel/sysctl.c
> > > @@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = {
> > >  	{
> > >  		.procname	= "file-nr",
> > >  		.data		= &files_stat,
> > > -		.maxlen		= 3*sizeof(int),
> > > +		.maxlen		= sizeof(files_stat),
> > >  		.mode		= 0444,
> > > -		.proc_handler	= proc_nr_files,
> > > +		.proc_handler	= proc_doulongvec_minmax,
> > 
> > With this change, don't we lose the current nr_files value?  I think
> > you need proc_nr_files to stay as it was.  If you disagree, we should
> > probably eliminate the definitions for proc_nr_files as I don't believe
> > they are used anywhere else.
> > 
> 
> I have no idea why you think I changed something. I only made the value
> 64 bits wide on 64-bit arches, so that we are no longer limited to 2^31
> files.

The proc_handler used to be proc_nr_files() which would call
get_nr_files() and deposit the result in files_stat.nr_files then cascade
to proc_dointvec() which would dump the 3 values.  Now it will dump the
three values, but not update the first (nr_files) value beforehand.
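
The difference can be sketched in userspace as follows. This is only an
illustration of the "refresh, then delegate" pattern; the struct and
function names are hypothetical stand-ins, not the kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for files_stat; field names mirror the patch. */
struct files_stat_sketch {
	unsigned long nr_files;
	unsigned long nr_free_files;
	unsigned long max_files;
};

static unsigned long live_nr_files;	/* stands in for the live file counter */

/* Generic dumper: prints whatever is stored, like a plain vector handler. */
static int dump_ulongvec(const struct files_stat_sketch *st, char *buf, size_t len)
{
	return snprintf(buf, len, "%lu\t%lu\t%lu\n",
			st->nr_files, st->nr_free_files, st->max_files);
}

/* Wrapper: refresh the cached nr_files first, then delegate to the dumper. */
static int nr_files_handler(struct files_stat_sketch *st, char *buf, size_t len)
{
	assert(st != NULL && buf != NULL);
	st->nr_files = live_nr_files;
	return dump_ulongvec(st, buf, len);
}
```

Wiring the generic dumper directly into the table corresponds to the V3
behaviour: the output is well formed, but nr_files is whatever was last
stored.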

Robin

^ permalink raw reply

* Re: [PATCH net-next 2/2] ipv4: rcu conversion in ip_route_output_slow
From: David Miller @ 2010-10-01  4:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1285853638.2615.520.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 30 Sep 2010 15:33:58 +0200

> ip_route_output_slow() is enclosed in an rcu_read_lock() protected
> section, so that no references are taken/released on device, thanks to
> __ip_dev_find() & dev_get_by_index_rcu()
> 
> Tested with ip route cache disabled, and a stress test :
 ...
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Also applied, thanks!

^ permalink raw reply

* Re: [PATCH net-next 1/2] ipv4: introduce __ip_dev_find()
From: David Miller @ 2010-10-01  4:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1285853516.2615.515.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 30 Sep 2010 15:31:56 +0200

> ip_dev_find(net, addr) finds a device given an IPv4 source address and
> takes a reference on it.
> 
> Introduce __ip_dev_find(), taking a third argument, to optionally take
> the device reference. Callers not asking the reference to be taken
> should be in an rcu_read_lock() protected section.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
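
For readers skimming the archive, the calling convention described in the
quoted changelog can be illustrated with a small userspace sketch. The
types and function names below are hypothetical and only mimic the idea of
an optional reference; they are not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device record; refcnt stands in for the device reference count. */
struct dev_sketch {
	uint32_t addr;
	int refcnt;
};

/*
 * Look up a device by address. When take_ref is false the caller must
 * guarantee the entry stays valid while it uses the pointer (the role
 * rcu_read_lock() plays in the changelog above).
 */
static struct dev_sketch *find_dev(struct dev_sketch *tbl, size_t n,
				   uint32_t addr, bool take_ref)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].addr == addr) {
			if (take_ref)
				tbl[i].refcnt++;
			return &tbl[i];
		}
	}
	return NULL;
}

/* Reference-taking variant, expressed through the optional one. */
static struct dev_sketch *find_dev_hold(struct dev_sketch *tbl, size_t n,
					uint32_t addr)
{
	return find_dev(tbl, n, addr, true);
}
```

Callers on a hot path can skip the reference count update entirely, as
long as they stay inside the protected section for the whole time they
touch the returned pointer.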

^ permalink raw reply

* Re: [net-next-2.6 PATCH 3/3] e1000e: 82579 performance improvements
From: David Miller @ 2010-10-01  4:17 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, bruce.w.allan
In-Reply-To: <20100930073934.13378.44230.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 30 Sep 2010 00:39:37 -0700

> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> The initial support for 82579 was tuned poorly for performance.  Adjust the
> packet buffer allocation appropriately for both standard and jumbo frames;
> and for jumbo frames increase the receive descriptor pre-fetch, disable
> adaptive interrupt moderation and set the DMA latency tolerance.
> 
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/3] e1000e: use hardware writeback batching
From: David Miller @ 2010-10-01  4:16 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, jesse.brandeburg
In-Reply-To: <20100930073814.13378.4212.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 30 Sep 2010 00:38:49 -0700

> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> Most e1000e parts support batching writebacks.  The problem with this is
> that when some of the TADV or TIDV timers are not set, Tx can sit forever.
> 
> This is solved in this patch with write flushes using the Flush Partial
> Descriptors (FPD) bit in TIDV and RDTR.
> 
> This improves bus utilization and removes partial writes on e1000e,
> particularly from 82571 parts in S5500 chipset based machines.
> 
> Only ES2LAN and 82571/2 parts are included in this optimization, to reduce
> testing load.
> 
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] ixgbe: fix link issues and panic with shared interrupts for 82598
From: David Miller @ 2010-10-01  4:16 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, emil.s.tantilov
In-Reply-To: <20100930073251.12750.67720.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 30 Sep 2010 00:35:23 -0700

> From: Emil Tantilov <emil.s.tantilov@intel.com>
> 
> Fix possible panic/hang with shared Legacy interrupts by not enabling
> interrupts when interface is down.
> 
> Also fixes an intermittent link by enabling LSC upon exit from ixgbe_intr()
> 
> This patch adds flags to ixgbe_irq_enable() to allow for some flexibility
> when enabling interrupts.
> 
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Tested-by: Stephen Ko <stephen.s.ko@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

