Netdev List


* Re: [PATCH v2] ipv4: Early TCP socket demux.
From: Eric Dumazet @ 2012-06-20 11:03 UTC
  To: David Miller; +Cc: shemminger, netdev
In-Reply-To: <20120620.031543.1511134879638711616.davem@davemloft.net>

On Wed, 2012-06-20 at 03:15 -0700, David Miller wrote:

> Here's what I have so far; we get the ipv6 implementation nearly
> for free :-)
> 
> Initially I tried to use ->gro_complete() for this as it was more
> natural, but we abort before we get there for a lot of cases where we
> want to use the early demux and cached route (ACKs, FINs, sub-mss
> sized packets, etc.)
> 

Seems very good; I only have one remark:


>  /*
>   *	From tcp_input.c
>   */
> @@ -2576,6 +2530,7 @@ void tcp4_proc_exit(void)
>  struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>  {
>  	const struct iphdr *iph = skb_gro_network_header(skb);
> +	struct sk_buff **pp;
>  
>  	switch (skb->ip_summed) {
>  	case CHECKSUM_COMPLETE:
> @@ -2591,7 +2546,36 @@ struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>  		return NULL;
>  	}
>  
> -	return tcp_gro_receive(head, skb);
> +	pp = tcp_gro_receive(head, skb);
> +
> +	if (!NAPI_GRO_CB(skb)->same_flow) {
> +		const struct tcphdr *th = tcp_hdr(skb);
> +		struct net_device *dev = skb->dev;
> +		struct sock *sk;
> +
> +		sk = __inet_lookup_established(dev_net(dev), &tcp_hashinfo,
> +					       iph->saddr, th->source,
> +					       iph->daddr, th->dest,
> +					       dev->ifindex);
> +		if (sk) {
> +			skb_orphan(skb);
> +			skb->sk = sk;
> +			skb->destructor = sock_edemux;
> +			if (!skb_dst(skb) &&

I am not sure we need the skb_dst(skb) test here; it should be NULL
anyway in the GRO layer? (The loopback device doesn't use GRO ;) )

> +			    sk->sk_state != TCP_TIME_WAIT) {
> +				struct dst_entry *dst = sk->sk_rx_dst;
> +				if (dst)
> +					dst = dst_check(dst, 0);
> +				if (dst) {
> +					struct rtable *rt = (struct rtable *) dst;
> +
> +					if (rt->rt_iif == dev->ifindex)
> +						skb_dst_set_noref(skb, dst);
> +				}
> +			}
> +		}
> +	}
> +	return pp;
>  }
>  
>  int tcp4_gro_complete(struct sk_buff *skb)
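To make the review point concrete, here is a minimal userspace model of the route-caching step in tcp4_gro_receive(). All names (struct route, struct sock_model, route_check(), attach_cached_route()) are hypothetical stand-ins, and dst_check() is approximated by a generation counter, which is an assumption and not how the kernel actually validates a dst. The model leaves out the !skb_dst(skb) test, in line with the remark that skb_dst() should already be NULL at the GRO layer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures in the patch. */
enum sk_state { SK_ESTABLISHED, SK_TIME_WAIT };

struct route {
	int iif;            /* input interface the route was learned on */
	unsigned int gen;   /* generation stamp; stale if it differs from route_gen */
};

struct sock_model {
	enum sk_state state;
	struct route *rx_dst;   /* cached input route, may be NULL */
};

struct pkt {
	int ifindex;            /* interface the packet arrived on */
	struct route *dst;      /* NULL until routed */
};

/* Assumption: bumped on any routing change, invalidating every cached route. */
static unsigned int route_gen = 1;

/* Rough analogue of dst_check(): return the route only if it is still current. */
static struct route *route_check(struct route *rt)
{
	return (rt != NULL && rt->gen == route_gen) ? rt : NULL;
}

/*
 * Attach the socket's cached route to the packet, mirroring the GRO-time
 * logic: skip TIME_WAIT sockets, drop stale routes, and reuse the route
 * only if it was learned on the same input interface (the rt_iif check).
 */
static bool attach_cached_route(struct pkt *p, const struct sock_model *sk)
{
	struct route *rt;

	if (sk->state == SK_TIME_WAIT)
		return false;
	rt = route_check(sk->rx_dst);
	if (rt == NULL || rt->iif != p->ifindex)
		return false;
	p->dst = rt;
	return true;
}
```

When attach_cached_route() returns false, the packet keeps a NULL dst and would fall through to the normal ip_route_input_noref() path in ip_rcv_finish().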

* Re: [PATCH -v1 3/3] usbnet: handle remote wakeup asap
From: Sergei Shtylyov @ 2012-06-20 11:02 UTC
  To: Ming Lei
  Cc: David S. Miller, Greg Kroah-Hartman, Oliver Neukum,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1340176553-32225-4-git-send-email-ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

Hello.

On 20-06-2012 11:15, Ming Lei wrote:

> If usbnet is resumed by remote wakeup, there are generally
> some packets coming in to be handled, so allocate and submit
> rx URBs in usbnet_resume to avoid the delay introduced by the tasklet.
> Otherwise, usbnet may already have been runtime suspended before
> usbnet_bh is executed to schedule Rx URBs.

> Without the patch, usbnet can't receive any packets from the peer
> in runtime suspend state if runtime PM is enabled and
> autosuspend_delay is set as zero.

> Signed-off-by: Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> ---
>   drivers/net/usb/usbnet.c |   42 ++++++++++++++++++++++++++----------------
>   1 file changed, 26 insertions(+), 16 deletions(-)

> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 9bfa775..a89d6c5 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -1201,6 +1201,21 @@ deferred:
>   }
>   EXPORT_SYMBOL_GPL(usbnet_start_xmit);
>
> +static void rx_alloc_submit(struct usbnet *dev, gfp_t flags)
> +{
> +	struct urb	*urb;
> +	int		i;
> +
> +	/* don't refill the queue all at once */
> +	for (i = 0; i < 10 && dev->rxq.qlen < RX_QLEN(dev); i++) {
> +		urb = usb_alloc_urb(0, flags);
> +		if (urb != NULL) {
> +			if (rx_submit(dev, urb, flags) == -ENOLINK)

    The above 2 *if* statements can be collapsed into a single one.

> +				return;
> +		}
> +	}
> +}
> +
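A sketch of the suggested simplification, assuming hypothetical stubs (alloc_urb_stub(), rx_submit_stub(), struct rxdev) in place of usb_alloc_urb(), rx_submit() and struct usbnet so it builds in userspace. Because && short-circuits, rx_submit_stub() is never called with a NULL URB, so the single if behaves like the two nested ones. Unlike the patch, this sketch returns an int so the outcome can be checked.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins so the sketch compiles outside the kernel. */
struct urb { int id; };
struct rxdev { int qlen; int max_qlen; int link_down; };

static struct urb *alloc_urb_stub(void)
{
	return malloc(sizeof(struct urb));
}

/* Pretends to queue the URB (freed here for simplicity) or reports -ENOLINK. */
static int rx_submit_stub(struct rxdev *dev, struct urb *urb)
{
	free(urb);
	if (dev->link_down)
		return -ENOLINK;
	dev->qlen++;
	return 0;
}

/* Refill at most 10 URBs per call; the two checks are collapsed into one. */
static int rx_alloc_submit_sketch(struct rxdev *dev)
{
	int i;

	for (i = 0; i < 10 && dev->qlen < dev->max_qlen; i++) {
		struct urb *urb = alloc_urb_stub();

		/* A failed allocation just moves on, as in the nested version. */
		if (urb != NULL && rx_submit_stub(dev, urb) == -ENOLINK)
			return -ENOLINK;
	}
	return 0;
}
```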

WBR, Sergei

* Re: [PATCH] usbnet: Activate halt interrupt endpoint before re-submit URB
From: Ming Lei @ 2012-06-20 10:56 UTC
  To: Oliver Neukum; +Cc: Huajun Li, David Miller, stern, linux-usb, netdev
In-Reply-To: <201206201221.37211.oneukum@suse.de>

On Wed, Jun 20, 2012 at 6:21 PM, Oliver Neukum <oneukum@suse.de> wrote:

> It probably was halted and cleared. However, the fact that you cleared
> a halt doesn't mean that the reason for stalling went away.
> So you must cope with an endpoint being halted again right after
> it was cleared.

I only suggested that we handle -EPIPE from usb_submit_urb on the
interrupt endpoint; it may only be a first step, but at least it
follows the USB spec.

Also, judging from USB gadget device implementations,
ClearFeature(HALT) generally just clears some halt-related flag in
the endpoint hardware.

It looks like the reason for the interrupt endpoint stalling is
invisible to the usbnet driver, so it is not easy to handle the
situation you described (halted and cleared repeatedly).

>
>> > In that case we'd need to do something more intrusive
>> > like resetting the device, but that cannot be done well
>> > in the generic usbnet part.
>>
>> IMO, resetting is not needed for -EPIPE, but may be needed for
>> -EPROTO failure.
>
> We don't need it for a single failure, but what else would we do
> if we keep getting -EPIPE?

Suppose that case does happen: what is the appropriate action for
usbnet to take on the failure? I am not sure a reset can deal
with it.

Also, is it an actual failure case or only a theoretical one?

Thanks,
-- 
Ming Lei


* [PATCH] netxen : Error return off by one for XG port.
From: santosh nayak @ 2012-06-20 10:52 UTC
  To: sony.chacko, rajesh.borundia; +Cc: netdev, kernel-janitors, Santosh Nayak

From: Santosh Nayak <santoshprasadnayak@gmail.com>

There are NETXEN_NIU_MAX_XG_PORTS ports, and port indexing starts
from zero, so we should also return an error for
'port == NETXEN_NIU_MAX_XG_PORTS'.

Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
---
 .../ethernet/qlogic/netxen/netxen_nic_ethtool.c    |    4 ++--
 drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c
index d4f179f..9103e3e 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_ethtool.c
@@ -511,7 +511,7 @@ netxen_nic_get_pauseparam(struct net_device *dev,
 				break;
 		}
 	} else if (adapter->ahw.port_type == NETXEN_NIC_XGBE) {
-		if ((port < 0) || (port > NETXEN_NIU_MAX_XG_PORTS))
+		if ((port < 0) || (port >= NETXEN_NIU_MAX_XG_PORTS))
 			return;
 		pause->rx_pause = 1;
 		val = NXRD32(adapter, NETXEN_NIU_XG_PAUSE_CTL);
@@ -577,7 +577,7 @@ netxen_nic_set_pauseparam(struct net_device *dev,
 		}
 		NXWR32(adapter, NETXEN_NIU_GB_PAUSE_CTL, val);
 	} else if (adapter->ahw.port_type == NETXEN_NIC_XGBE) {
-		if ((port < 0) || (port > NETXEN_NIU_MAX_XG_PORTS))
+		if ((port < 0) || (port >= NETXEN_NIU_MAX_XG_PORTS))
 			return -EIO;
 		val = NXRD32(adapter, NETXEN_NIU_XG_PAUSE_CTL);
 		if (port == 0) {
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
index de96a94..946160f 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
@@ -365,7 +365,7 @@ static int netxen_niu_disable_xg_port(struct netxen_adapter *adapter)
 	if (NX_IS_REVISION_P3(adapter->ahw.revision_id))
 		return 0;
 
-	if (port > NETXEN_NIU_MAX_XG_PORTS)
+	if (port >= NETXEN_NIU_MAX_XG_PORTS)
 		return -EINVAL;
 
 	mac_cfg = 0;
@@ -392,7 +392,7 @@ static int netxen_p2_nic_set_promisc(struct netxen_adapter *adapter, u32 mode)
 	u32 port = adapter->physical_port;
 	u16 board_type = adapter->ahw.board_type;
 
-	if (port > NETXEN_NIU_MAX_XG_PORTS)
+	if (port >= NETXEN_NIU_MAX_XG_PORTS)
 		return -EINVAL;
 
 	mac_cfg = NXRD32(adapter, NETXEN_NIU_XGE_CONFIG_0 + (0x10000 * port));
-- 
1.7.4.4
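A minimal illustration of the off-by-one, assuming a hypothetical MAX_XG_PORTS of 2 (not necessarily the value of the driver's real NETXEN_NIU_MAX_XG_PORTS): with N ports the valid indices are 0 to N-1, so the old `port > N` test lets `port == N` through and indexes one past the end of a per-port array.

```c
#include <stdbool.h>

/* Hypothetical value for illustration only. */
#define MAX_XG_PORTS 2

/* Hypothetical per-port register shadow, sized by the port count. */
static unsigned int pause_ctl[MAX_XG_PORTS];

/* Old check: accepts port == MAX_XG_PORTS, one past the last valid index. */
static bool port_ok_old(int port)
{
	return !((port < 0) || (port > MAX_XG_PORTS));
}

/* Fixed check: valid indices are 0 .. MAX_XG_PORTS - 1. */
static bool port_ok_new(int port)
{
	return !((port < 0) || (port >= MAX_XG_PORTS));
}

static int set_pause(int port, unsigned int val)
{
	if (!port_ok_new(port))
		return -1;   /* the patched driver returns -EIO/-EINVAL or bails out here */
	pause_ctl[port] = val;
	return 0;
}
```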


* Re: linux-next: build failure after merge of the net-next tree
From: Marc Kleine-Budde @ 2012-06-20 10:33 UTC
  To: Stephen Rothwell
  Cc: David Miller, viresh.kumar2, bhupesh.sharma, netdev, linux-next,
	linux-kernel, federico.vaga, giancarlo.asnaghi, wg, spear-devel,
	Andrew Morton
In-Reply-To: <20120620202604.407af721f746045ae00c8268@canb.auug.org.au>


On 06/20/2012 12:26 PM, Stephen Rothwell wrote:
> Hi all,
> 
> On Wed, 20 Jun 2012 01:20:37 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>>
>> From: viresh kumar <viresh.kumar2@arm.com>
>> Date: Wed, 20 Jun 2012 09:08:34 +0100
>>
>>> Please see the following patchset from me, which got applied in linux-next
>>>
>>> https://lkml.org/lkml/2012/4/24/154
>>>
>>> Please check if this patchset is present in your build repo. I believe it should be
>>> there. If it is, then you shouldn't get these errors.
>>
>> Well, then Stephen shouldn't get those errors either.
>>
>> But obviously he did.
>>
>> But all of this talk about changes existing only in linux-next is
>> entirely moot, because the damn thing MUST build independently inside
>> of net-next which doesn't have those clock layer changes.
>>
>> Someone send me a clean fix for net-next now.
> 
> I get those errors because those patches are in the akpm tree which is
> merged after everything else ...
> 
> One possibility is to put those changes in another (stable) tree and
> merge that into the net-next tree (and any other tree that needs it).

We're about to remove the offending clk_*() functions from the driver,
as they are untested anyway. The hardware the driver was developed for
uses a clock rate hardcoded in the driver, as it cannot be retrieved
from the clock tree. As soon as hardware is available that works with
the clock tree, we can add those functions back.

Sorry for the noise,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



* Re: linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2012-06-20 10:26 UTC
  To: David Miller
  Cc: viresh.kumar2, bhupesh.sharma, netdev, linux-next, linux-kernel,
	federico.vaga, giancarlo.asnaghi, wg, mkl, spear-devel,
	Andrew Morton
In-Reply-To: <20120620.012037.783895812206310043.davem@davemloft.net>


Hi all,

On Wed, 20 Jun 2012 01:20:37 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>
> From: viresh kumar <viresh.kumar2@arm.com>
> Date: Wed, 20 Jun 2012 09:08:34 +0100
> 
> > Please see the following patchset from me, which got applied in linux-next
> > 
> > https://lkml.org/lkml/2012/4/24/154
> > 
> > Please check if this patchset is present in your build repo. I believe it should be
> > there. If it is, then you shouldn't get these errors.
> 
> Well, then Stephen shouldn't get those errors either.
> 
> But obviously he did.
> 
> But all of this talk about changes existing only in linux-next is
> entirely moot, because the damn thing MUST build independently inside
> of net-next which doesn't have those clock layer changes.
> 
> Someone send me a clean fix for net-next now.

I get those errors because those patches are in the akpm tree which is
merged after everything else ...

One possibility is to put those changes in another (stable) tree and
merge that into the net-next tree (and any other tree that needs it).
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au


* Re: [PATCH] usbnet: Activate halt interrupt endpoint before re-submit URB
From: Oliver Neukum @ 2012-06-20 10:21 UTC
  To: Ming Lei; +Cc: Huajun Li, David Miller, stern, linux-usb, netdev
In-Reply-To: <CACVXFVMy6Hrqgw6rmAXv5YXD86sLF+FUPTRRcXKs40uwN6ioCg@mail.gmail.com>

Am Mittwoch, 20. Juni 2012, 12:15:25 schrieb Ming Lei:
> On Wed, Jun 20, 2012 at 4:58 PM, Oliver Neukum <oneukum@suse.de> wrote:
> > Am Mittwoch, 20. Juni 2012, 10:07:55 schrieb Ming Lei:
> >> BTW, maybe it is better to add below
> >>
> >>     usbnet_defer_kevent(dev, EVENT_STS_HALT);
> >>
> >> for -EPIPE returned from usb_submit_urb if it will be resent.
> >
> > Why? If it failed once it'll probably also fail the next time.
> 
> -EPIPE just means the endpoint is halted, either from usb_submit_urb
> or urb->status, so the HALT should be cleared in the situation.

It probably was halted and cleared. However, the fact that you cleared
a halt doesn't mean that the reason for stalling went away.
So you must cope with an endpoint being halted again right after
it was cleared.

> > In that case we'd need to do something more intrusive
> > like resetting the device, but that cannot be done well
> > in the generic usbnet part.
> 
> IMO, resetting is not needed for -EPIPE, but may be needed for
> -EPROTO failure.

We don't need it for a single failure, but what else would we do
if we keep getting -EPIPE?

	Regards
		Oliver
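One possible answer to the question above, sketched under loudly stated assumptions: a hypothetical per-endpoint counter that clears the halt and retries for a few consecutive -EPIPE results, then escalates (for example to a device reset) once clearing evidently does not remove the cause. None of this is existing usbnet code; the threshold, the struct and the action names are invented for illustration.

```c
#include <errno.h>

/* Hypothetical policy constant; not part of usbnet. */
#define MAX_CONSECUTIVE_HALTS 3

/* Hypothetical actions the caller would map onto real recovery steps. */
enum halt_action { ACT_NONE, ACT_CLEAR_HALT, ACT_ESCALATE };

struct halt_tracker {
	int consecutive_epipe;   /* -EPIPE results seen in a row */
};

/*
 * Decide what to do after an interrupt URB completes or fails to submit.
 * A few halts are cleared and retried; a persistent stall is escalated,
 * since clearing the halt evidently did not remove its cause. Any other
 * status resets the counter and is left to the existing error handling.
 */
static enum halt_action on_int_urb_result(struct halt_tracker *t, int status)
{
	if (status != -EPIPE) {
		t->consecutive_epipe = 0;
		return ACT_NONE;
	}
	if (++t->consecutive_epipe > MAX_CONSECUTIVE_HALTS)
		return ACT_ESCALATE;
	return ACT_CLEAR_HALT;
}
```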


* Re: [PATCH] can: c_can_pci: limit compilation to archs with clock support
From: Federico Vaga @ 2012-06-20 10:18 UTC
  To: Marc Kleine-Budde; +Cc: David Miller, netdev, linux-can
In-Reply-To: <4FE1A1A1.3000105@pengutronix.de>

> I think we finally can see the big picture now; I'm preparing a patch
> which removes the clk_*() functions.

Thank you, and sorry for all the trouble.

-- 
Federico Vaga


* Re: linux-next: build failure after merge of the net-next tree
From: Federico Vaga @ 2012-06-20 10:17 UTC
  To: David Miller
  Cc: mkl, bhupesh.sharma, sfr, netdev, linux-next, linux-kernel,
	giancarlo.asnaghi, wg
In-Reply-To: <20120620.025837.721158723130014230.davem@davemloft.net>

> Why would you try to be generic by using an interface currently
> only available on certain platforms?

I know, I was wrong.

> That is how you make drivers non-portable, and not generic.

Now it is fixed in my mind; I have learned the lesson.

-- 
Federico Vaga


* Re: [PATCH] netxen: Error return off by one in 'netxen_nic_set_pauseparam()'.
From: santosh prasad nayak @ 2012-06-20 10:16 UTC
  To: Rajesh Borundia
  Cc: Dan Carpenter, Sony Chacko, netdev,
	kernel-janitors@vger.kernel.org
In-Reply-To: <13A253B3F9BEFE43B93C09CF75F63CAA81A886EF19@MNEXMB1.qlogic.org>

On Wed, Jun 20, 2012 at 3:21 PM, Rajesh Borundia
<rajesh.borundia@qlogic.com> wrote:
> _______________________________________
> From: santosh prasad nayak [santoshprasadnayak@gmail.com]
> Sent: Wednesday, June 20, 2012 1:29 PM
> To: Dan Carpenter; Rajesh Borundia
> Cc: Sony Chacko; netdev; kernel-janitors@vger.kernel.org
> Subject: Re: [PATCH] netxen: Error return off by one in 'netxen_nic_set_pauseparam()'.
>
> On Wed, Jun 20, 2012 at 1:14 PM, Dan Carpenter <dan.carpenter@oracle.com> wrote:
>> On Wed, Jun 20, 2012 at 12:57:39PM +0530, santosh nayak wrote:
>>> From: Santosh Nayak <santoshprasadnayak@gmail.com>
>>>
>>> There are 'NETXEN_NIU_MAX_GBE_PORTS' GBE ports, and port indexing starts
>>> from zero, so we should also return an error for
>>> "port == NETXEN_NIU_MAX_GBE_PORTS".
>>>
>>
>> I don't know this code well enough to say if you are right or not,
>> but what about for port == NETXEN_NIU_MAX_XG_PORTS a few lines later
>> in both functions?
>
>
> I think an error should be returned for "port == NETXEN_NIU_MAX_XG_PORTS".
>
>
> @Rajesh,
>
> Can you please comment on it?
>
>
> regards
> santosh
>
>>
>> regards,
>> dan carpenter
>>
>
> Yes, an error should be returned for both port == NETXEN_NIU_MAX_XG_PORTS and
> port == NETXEN_NIU_MAX_GBE_PORTS.


Ok.

The current patch is for the GBE ports; I will send another patch
for the XG ports.

regards
santosh


>
>
> Rajesh
>


* Re: [PATCH v2] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-20 10:15 UTC
  To: eric.dumazet; +Cc: shemminger, netdev
In-Reply-To: <20120619.231412.1236237191660427779.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 19 Jun 2012 23:14:12 -0700 (PDT)

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 20 Jun 2012 07:59:00 +0200
> 
>> On Tue, 2012-06-19 at 21:46 -0700, David Miller wrote:
>> 
>>> These numbers can be decreased further, because since we're already
>>> looking at the TCP header we can pre-cook the TCP control block in the
>>> SKB and skip much of the stuff that tcp_v4_rcv() does since we've done
>>> it already in the early demux code.
>> 
>> It could be done at the GRO level, removing another demux.
>> 
>> As routers probably have no use for GRO, there is no need for an additional knob.
> 
> That's a great idea.

Here's what I have so far; we get the ipv6 implementation nearly
for free :-)

Initially I tried to use ->gro_complete() for this as it was more
natural, but we abort before we get there for a lot of cases where we
want to use the early demux and cached route (ACKs, FINs, sub-mss
sized packets, etc.)

diff --git a/include/net/protocol.h b/include/net/protocol.h
index 967b926..a1b1b53 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -37,7 +37,6 @@
 
 /* This is used to register protocols. */
 struct net_protocol {
-	int			(*early_demux)(struct sk_buff *skb);
 	int			(*handler)(struct sk_buff *skb);
 	void			(*err_handler)(struct sk_buff *skb, u32 info);
 	int			(*gso_send_check)(struct sk_buff *skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b21522..c1b5626 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2956,6 +2956,12 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		return -ENOMEM;
 
 	__copy_skb_header(nskb, p);
+	if (p->sk) {
+		nskb->sk = p->sk;
+		nskb->destructor = p->destructor;
+		p->sk = NULL;
+		p->destructor = NULL;
+	}
 	nskb->mac_len = p->mac_len;
 
 	skb_reserve(nskb, headroom);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 07a02f6..0aabad7 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1519,7 +1519,6 @@ static const struct net_protocol igmp_protocol = {
 #endif
 
 static const struct net_protocol tcp_protocol = {
-	.early_demux	=	tcp_v4_early_demux,
 	.handler	=	tcp_v4_rcv,
 	.err_handler	=	tcp_v4_err,
 	.gso_send_check	=	tcp_v4_gso_send_check,
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 93b092c..c4fe1d2 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -323,32 +323,19 @@ static int ip_rcv_finish(struct sk_buff *skb)
 	 *	how the packet travels inside Linux networking.
 	 */
 	if (skb_dst(skb) == NULL) {
-		const struct net_protocol *ipprot;
-		int protocol = iph->protocol;
-		int err;
-
-		rcu_read_lock();
-		ipprot = rcu_dereference(inet_protos[protocol]);
-		err = -ENOENT;
-		if (ipprot && ipprot->early_demux)
-			err = ipprot->early_demux(skb);
-		rcu_read_unlock();
-
-		if (err) {
-			err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
-						   iph->tos, skb->dev);
-			if (unlikely(err)) {
-				if (err == -EHOSTUNREACH)
-					IP_INC_STATS_BH(dev_net(skb->dev),
-							IPSTATS_MIB_INADDRERRORS);
-				else if (err == -ENETUNREACH)
-					IP_INC_STATS_BH(dev_net(skb->dev),
-							IPSTATS_MIB_INNOROUTES);
-				else if (err == -EXDEV)
-					NET_INC_STATS_BH(dev_net(skb->dev),
-							 LINUX_MIB_IPRPFILTER);
-				goto drop;
-			}
+		int err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+					       iph->tos, skb->dev);
+		if (unlikely(err)) {
+			if (err == -EHOSTUNREACH)
+				IP_INC_STATS_BH(dev_net(skb->dev),
+						IPSTATS_MIB_INADDRERRORS);
+			else if (err == -ENETUNREACH)
+				IP_INC_STATS_BH(dev_net(skb->dev),
+						IPSTATS_MIB_INNOROUTES);
+			else if (err == -EXDEV)
+				NET_INC_STATS_BH(dev_net(skb->dev),
+						 LINUX_MIB_IPRPFILTER);
+			goto drop;
 		}
 	}
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 13857df..2a483ad 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1671,52 +1671,6 @@ csum_err:
 }
 EXPORT_SYMBOL(tcp_v4_do_rcv);
 
-int tcp_v4_early_demux(struct sk_buff *skb)
-{
-	struct net *net = dev_net(skb->dev);
-	const struct iphdr *iph;
-	const struct tcphdr *th;
-	struct sock *sk;
-	int err;
-
-	err = -ENOENT;
-	if (skb->pkt_type != PACKET_HOST)
-		goto out_err;
-
-	if (!pskb_may_pull(skb, ip_hdrlen(skb) + sizeof(struct tcphdr)))
-		goto out_err;
-
-	iph = ip_hdr(skb);
-	th = (struct tcphdr *) ((char *)iph + ip_hdrlen(skb));
-
-	if (th->doff < sizeof(struct tcphdr) / 4)
-		goto out_err;
-
-	if (!pskb_may_pull(skb, ip_hdrlen(skb) + th->doff * 4))
-		goto out_err;
-
-	sk = __inet_lookup_established(net, &tcp_hashinfo,
-				       iph->saddr, th->source,
-				       iph->daddr, th->dest,
-				       skb->dev->ifindex);
-	if (sk) {
-		skb->sk = sk;
-		skb->destructor = sock_edemux;
-		if (sk->sk_state != TCP_TIME_WAIT) {
-			struct dst_entry *dst = sk->sk_rx_dst;
-			if (dst)
-				dst = dst_check(dst, 0);
-			if (dst) {
-				skb_dst_set_noref(skb, dst);
-				err = 0;
-			}
-		}
-	}
-
-out_err:
-	return err;
-}
-
 /*
  *	From tcp_input.c
  */
@@ -2576,6 +2530,7 @@ void tcp4_proc_exit(void)
 struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 {
 	const struct iphdr *iph = skb_gro_network_header(skb);
+	struct sk_buff **pp;
 
 	switch (skb->ip_summed) {
 	case CHECKSUM_COMPLETE:
@@ -2591,7 +2546,36 @@ struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		return NULL;
 	}
 
-	return tcp_gro_receive(head, skb);
+	pp = tcp_gro_receive(head, skb);
+
+	if (!NAPI_GRO_CB(skb)->same_flow) {
+		const struct tcphdr *th = tcp_hdr(skb);
+		struct net_device *dev = skb->dev;
+		struct sock *sk;
+
+		sk = __inet_lookup_established(dev_net(dev), &tcp_hashinfo,
+					       iph->saddr, th->source,
+					       iph->daddr, th->dest,
+					       dev->ifindex);
+		if (sk) {
+			skb_orphan(skb);
+			skb->sk = sk;
+			skb->destructor = sock_edemux;
+			if (!skb_dst(skb) &&
+			    sk->sk_state != TCP_TIME_WAIT) {
+				struct dst_entry *dst = sk->sk_rx_dst;
+				if (dst)
+					dst = dst_check(dst, 0);
+				if (dst) {
+					struct rtable *rt = (struct rtable *) dst;
+
+					if (rt->rt_iif == dev->ifindex)
+						skb_dst_set_noref(skb, dst);
+				}
+			}
+		}
+	}
+	return pp;
 }
 
 int tcp4_gro_complete(struct sk_buff *skb)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 26a8862..b8ea463 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -797,6 +797,7 @@ static struct sk_buff **tcp6_gro_receive(struct sk_buff **head,
 					 struct sk_buff *skb)
 {
 	const struct ipv6hdr *iph = skb_gro_network_header(skb);
+	struct sk_buff **pp;
 
 	switch (skb->ip_summed) {
 	case CHECKSUM_COMPLETE:
@@ -812,7 +813,32 @@ static struct sk_buff **tcp6_gro_receive(struct sk_buff **head,
 		return NULL;
 	}
 
-	return tcp_gro_receive(head, skb);
+	pp = tcp_gro_receive(head, skb);
+
+	if (!NAPI_GRO_CB(skb)->same_flow) {
+		const struct tcphdr *th = tcp_hdr(skb);
+		struct net_device *dev = skb->dev;
+		struct sock *sk;
+
+		sk = __inet6_lookup_established(dev_net(dev), &tcp_hashinfo,
+						&iph->saddr, th->source,
+						&iph->daddr, th->dest,
+						dev->ifindex);
+		if (sk) {
+			skb_orphan(skb);
+			skb->sk = sk;
+			skb->destructor = sock_edemux;
+			if (!skb_dst(skb) &&
+			    sk->sk_state != TCP_TIME_WAIT) {
+				struct dst_entry *dst = sk->sk_rx_dst;
+				if (dst)
+					dst = dst_check(dst, 0);
+				if (dst)
+					skb_dst_set(skb, dst);
+			}
+		}
+	}
+	return pp;
 }
 
 static int tcp6_gro_complete(struct sk_buff *skb)
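The skbuff.c hunk above moves the socket reference from the original head p to the newly allocated nskb and clears it on p, so the destructor (sock_edemux) runs exactly once. A userspace sketch of that ownership transfer, with hypothetical struct buf, struct sock_stub and release_sk() standing in for sk_buff, sock and sock_edemux:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sock: just a reference count. */
struct sock_stub { int refs; };

/* Hypothetical miniature of the two sk_buff fields the hunk touches. */
struct buf {
	struct sock_stub *sk;
	void (*destructor)(struct buf *b);
};

/* Stands in for sock_edemux: drops the socket reference the buffer holds. */
static void release_sk(struct buf *b)
{
	b->sk->refs--;
	b->sk = NULL;
}

/* Move socket ownership from p to the new head nskb, as in the hunk. */
static void transfer_sk(struct buf *nskb, struct buf *p)
{
	if (p->sk) {
		nskb->sk = p->sk;
		nskb->destructor = p->destructor;
		p->sk = NULL;          /* p no longer owns the reference... */
		p->destructor = NULL;  /* ...so freeing p must not drop it */
	}
}

/* Run a buffer's destructor, if any, as freeing the buffer would. */
static void free_buf(struct buf *b)
{
	if (b->destructor)
		b->destructor(b);
	b->destructor = NULL;
}
```

Without the two NULL assignments, freeing both p and nskb would drop the same socket reference twice.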


* Re: [PATCH] usbnet: Activate halt interrupt endpoint before re-submit URB
From: Ming Lei @ 2012-06-20 10:15 UTC
  To: Oliver Neukum
  Cc: Huajun Li, David Miller, stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <201206201058.55519.oneukum-l3A5Bk7waGM@public.gmane.org>

On Wed, Jun 20, 2012 at 4:58 PM, Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org> wrote:
> Am Mittwoch, 20. Juni 2012, 10:07:55 schrieb Ming Lei:
>> BTW, maybe it is better to add below
>>
>>     usbnet_defer_kevent(dev, EVENT_STS_HALT);
>>
>> for -EPIPE returned from usb_submit_urb if it will be resent.
>
> Why? If it failed once it'll probably also fail the next time.

-EPIPE just means the endpoint is halted, either from usb_submit_urb
or urb->status, so the HALT should be cleared in the situation.

> In that case we'd need to do something more intrusive
> like resetting the device, but that cannot be done well
> in the generic usbnet part.

IMO, resetting is not needed for -EPIPE, but may be needed for
-EPROTO failure.

Thanks,
-- 
Ming Lei

* Re: [PATCH] can: c_can_pci: limit compilation to archs with clock support
From: David Miller @ 2012-06-20 10:12 UTC
  To: mkl; +Cc: netdev, linux-can, federico.vaga
In-Reply-To: <4FE1A1A1.3000105@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 20 Jun 2012 12:10:41 +0200

> I think we finally can see the big picture now; I'm preparing a patch
> which removes the clk_*() functions.

Thank you.


* Re: [PATCH] can: c_can_pci: limit compilation to archs with clock support
From: Marc Kleine-Budde @ 2012-06-20 10:10 UTC
  To: David Miller; +Cc: netdev, linux-can, federico.vaga
In-Reply-To: <20120620.025452.2203668280120884694.davem@davemloft.net>


On 06/20/2012 11:54 AM, David Miller wrote:
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Date: Wed, 20 Jun 2012 11:48:08 +0200
> 
>> In commit:
>>
>>   5b92da0 c_can_pci: generic module for C_CAN/D_CAN on PCI
>>
>> the c_can_pci driver has been added. It uses clk_*() functions
>> unconditionally, resulting in a link error on archs without
>> clock support. This patch adds a "depends on HAVE_CLK" to the
>> Kconfig symbol.
> 
> This is an unreasonable change and I just explained why in my email to
> Federico, did you not see it?

I sent that mail before I received Federico's and your mail.

> He says that this driver was only tested on an architecture that
> currently doesn't even have clock support in any existing tree, and
> therefore completely relies upon local changes they have to add clock
> support to that platform.
> 
> Which means your change is restricting compilation of this driver to
> platforms the driver was never, ever, tested on.
> 
> Can you see what a complete joke this is?

I think we finally can see the big picture now; I'm preparing a patch
which removes the clk_*() functions.

Marc




* Re: 10GBE performance drop with net.ipv4.tcp_timestamps=0
From: Eric Dumazet @ 2012-06-20 10:06 UTC
  To: Stefan Priebe - Profihost AG; +Cc: Linux Netdev List
In-Reply-To: <4FE19CFC.8030408@profihost.ag>

On Wed, 2012-06-20 at 11:50 +0200, Stefan Priebe - Profihost AG wrote:
> Am 20.06.2012 11:47, schrieb Eric Dumazet:
> > On Wed, 2012-06-20 at 11:33 +0200, Stefan Priebe - Profihost AG wrote:
> >
> >> Sure. In that case I get 4 Gbit/s in both variants. I also tried two
> >> other machines, with the same result.
> >>
> >
> > So 3.5 on the receiver is the problem, it seems?
> Yes.
> 
> > And you checked all the IRQ affinity settings, I presume, since a
> > lot of things might have changed between 2.6.32 and 3.5?
> 
> It is a single core E5 Xeon; I've set the affinity like this:

And do you still see the retransmits in the "netstat -s" output?

It might be a firmware or PCI issue; I have the same cards and no problem here.

Can you check that LRO is on?

ethtool -k eth2


* Re: linux-next: build failure after merge of the net-next tree
From: Federico Vaga @ 2012-06-20  9:59 UTC
  To: David Miller
  Cc: mkl, bhupesh.sharma, sfr, netdev, linux-next, linux-kernel,
	giancarlo.asnaghi, wg
In-Reply-To: <20120620.020611.1696375957120854262.davem@davemloft.net>

> Then the driver should NEVER have been submitted without the
> required infrastructure in place first.

This particular driver doesn't use the clk framework at the moment. I put
those clk lines in to try to be as generic as possible, but I see that I
made a mistake: I'm sorry. An alternative to the HAVE_CLK dependency
could be to remove the clk_* lines, because nobody actually uses them.
In the future, if our c_can migrates to clk and our clk framework is
accepted in the kernel, we can re-add the clk_* lines.

-- 
Federico Vaga


* Re: linux-next: build failure after merge of the net-next tree
From: David Miller @ 2012-06-20  9:58 UTC
  To: federico.vaga
  Cc: mkl, bhupesh.sharma, sfr, netdev, linux-next, linux-kernel,
	giancarlo.asnaghi, wg
In-Reply-To: <5059693.iaLYtpk3E4@harkonnen>

From: Federico Vaga <federico.vaga@gmail.com>
Date: Wed, 20 Jun 2012 11:59:26 +0200

>> Then the driver should NEVER have been submitted without the
>> required infrastructure in place first.
> 
> This particular driver doesn't use the clk framework at the moment. I put
> those clk lines in to try to be as generic as possible, but I see that I
> made a mistake: I'm sorry.

Why would you try to be generic by using an interface currently
only available on certain platforms?

That is how you make drivers non-portable, and not generic.


* RE: [PATCH] netxen: Error return off by one in 'netxen_nic_set_pauseparam()'.
From: Rajesh Borundia @ 2012-06-20  9:51 UTC
  To: santosh prasad nayak, Dan Carpenter
  Cc: Sony Chacko, netdev, kernel-janitors@vger.kernel.org
In-Reply-To: <CAOD=uF6Oe1KZqmtUpY57u9GJh9BqxoK0DP0n4yP_DrMUjfXbCQ@mail.gmail.com>

_______________________________________
From: santosh prasad nayak [santoshprasadnayak@gmail.com]
Sent: Wednesday, June 20, 2012 1:29 PM
To: Dan Carpenter; Rajesh Borundia
Cc: Sony Chacko; netdev; kernel-janitors@vger.kernel.org
Subject: Re: [PATCH] netxen: Error return off by one in 'netxen_nic_set_pauseparam()'.

On Wed, Jun 20, 2012 at 1:14 PM, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> On Wed, Jun 20, 2012 at 12:57:39PM +0530, santosh nayak wrote:
>> From: Santosh Nayak <santoshprasadnayak@gmail.com>
>>
>> There are 'NETXEN_NIU_MAX_GBE_PORTS' GBE ports, and port indexing starts
>> from zero, so we should also return an error for
>> "port == NETXEN_NIU_MAX_GBE_PORTS".
>>
>
> I don't know this code well enough to say if you are right or not,
> but what about for port == NETXEN_NIU_MAX_XG_PORTS a few lines later
> in both functions?

I think an error should be returned for "port == NETXEN_NIU_MAX_XG_PORTS".

@Rajesh,

Can you please comment on it?

regards
santosh

>
> regards,
> dan carpenter
>

Yes, an error should be returned for both port == NETXEN_NIU_MAX_XG_PORTS and
port == NETXEN_NIU_MAX_GBE_PORTS.

Rajesh 

* Re: [PATCH] can: c_can_pci: limit compilation to archs with clock support
From: David Miller @ 2012-06-20  9:54 UTC (permalink / raw)
  To: mkl; +Cc: netdev, linux-can, federico.vaga
In-Reply-To: <1340185688-9454-1-git-send-email-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 20 Jun 2012 11:48:08 +0200

> In commit:
> 
>   5b92da0 c_can_pci: generic module for C_CAN/D_CAN on PCI
> 
> the c_can_pci driver has been added. It uses clk_*() functions
> unconditionally, resulting in a link error on archs without
> clock support. This patch adds a "depends on HAVE_CLK" to the
> Kconfig symbol.

This is an unreasonable change and I just explained why in my email to
Federico, did you not see it?

He says that this driver was only tested on an architecture that
currently doesn't even have clock support in any existing tree, and
therefore completely relies upon local changes they have to add clock
support to that platform.

Which means your change is restricting compilation of this driver to
platforms the driver was never, ever, tested on.

Can you see what a complete joke this is?

* Re: 10GBE performance drop with net.ipv4.tcp_timestamps=0
From: Stefan Priebe - Profihost AG @ 2012-06-20  9:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Netdev List
In-Reply-To: <1340185645.4604.853.camel@edumazet-glaptop>

Am 20.06.2012 11:47, schrieb Eric Dumazet:
> On Wed, 2012-06-20 at 11:33 +0200, Stefan Priebe - Profihost AG wrote:
>
>> Sure. In that case I get 4Gbit/s in both variants. I also tried two
>> other machines, with the same result.
>>
>
> So 3.5 on the receiver is the problem, it seems?
Yes.

> And you checked all the IRQ affinity settings, I presume, since a
> lot of things might have changed between 2.6.32 and 3.5?

It is a single-socket E5 Xeon with 8 logical CPUs; I've set the affinity like this:

eth2 mask=1 for /proc/irq/83/smp_affinity
eth2 mask=2 for /proc/irq/84/smp_affinity
eth2 mask=4 for /proc/irq/85/smp_affinity
eth2 mask=8 for /proc/irq/86/smp_affinity
eth2 mask=10 for /proc/irq/87/smp_affinity
eth2 mask=20 for /proc/irq/88/smp_affinity
eth2 mask=40 for /proc/irq/89/smp_affinity
eth2 mask=80 for /proc/irq/90/smp_affinity

> cat /proc/interrupts

# cat /proc/interrupts
             CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
    0:        141          0          0          0          0          0          0          0   IO-APIC-edge      timer
    1:          1          8          0          0          0          0          0          0   IO-APIC-edge      i8042
    9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
   12:          0          3          0          0          0          0          0          0   IO-APIC-edge      i8042
   14:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide0
   15:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide1
   16:          0          0         26          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
   23:          0          0         30          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2
   64:          0          0          0      81979          0          0          0          0   PCI-MSI-edge      ahci
   65:          0          0          0          1          0          0          0          0   PCI-MSI-edge      eth0
   66:          0          0          0          0       1090          0          0          0   PCI-MSI-edge      eth0-TxRx-0
   67:          0          0          0          0        411          0          0          0   PCI-MSI-edge      eth0-TxRx-1
   68:          0          0          0          0        592          0          0          0   PCI-MSI-edge      eth0-TxRx-2
   69:          0          0          0          0        472          0          0          0   PCI-MSI-edge      eth0-TxRx-3
   70:          0          0          0          0          0       1196          0          0   PCI-MSI-edge      eth0-TxRx-4
   71:          0          0          0          0          0        374          0          0   PCI-MSI-edge      eth0-TxRx-5
   72:          0          0          0          0          0        405          0          0   PCI-MSI-edge      eth0-TxRx-6
   73:          0          0          0          0          0        468          0          0   PCI-MSI-edge      eth0-TxRx-7
   83:      31278          0          0         65          0          0          0          0   PCI-MSI-edge      eth2-TxRx-0
   84:          0      36311          0          0         61          0          0          0   PCI-MSI-edge      eth2-TxRx-1
   85:          0          0      46189          0         61          0          0          0   PCI-MSI-edge      eth2-TxRx-2
   86:          0          0          0      28712         67          0          0          0   PCI-MSI-edge      eth2-TxRx-3
   87:          0          0          0          0      28089          0          0          0   PCI-MSI-edge      eth2-TxRx-4
   88:          0          0          0          0          0      34982          0          0   PCI-MSI-edge      eth2-TxRx-5
   89:          0          0          0          0          0         61      32420          0   PCI-MSI-edge      eth2-TxRx-6
   90:          0          0          0          0          0         61          0      25922   PCI-MSI-edge      eth2-TxRx-7
   91:          0          0          0          0          0          3          0          0   PCI-MSI-edge      eth2
  NMI:         13         12         15         22          5          5          5          5   Non-maskable interrupts
  LOC:      58919      61420      65519      82647      35519      40489      27141      30228   Local timer interrupts
  SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
  PMI:         13         12         15         22          5          5          5          5   Performance monitoring interrupts
  IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
  RTR:          6          0          0          0          0          0          0          0   APIC ICR read retries
  RES:      15116       4521       2418       1814       2375       1615       1488       1367   Rescheduling interrupts
  CAL:        134        148        100        162        170        172        172        172   Function call interrupts
  TLB:        422        486        415        483        460        460        476        398   TLB shootdowns
  TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
  THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
  MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
  MCP:          4          4          4          4          4          4          4          4   Machine check polls

> What kind of NIC is it?

# lspci | grep 10-Giga
06:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
06:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)

Stefan

* [PATCH] can: c_can_pci: limit compilation to archs with clock support
From: Marc Kleine-Budde @ 2012-06-20  9:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-can, Marc Kleine-Budde, Federico Vaga

In commit:

  5b92da0 c_can_pci: generic module for C_CAN/D_CAN on PCI

the c_can_pci driver has been added. It uses clk_*() functions
unconditionally, resulting in a link error on archs without
clock support. This patch adds a "depends on HAVE_CLK" to the
Kconfig symbol.

An upcoming patch from Viresh Kumar adds a generic dummy
implementation. As soon as this patch has been merged, this
Kconfig dependency can be removed.

Cc: Federico Vaga <federico.vaga@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/c_can/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/can/c_can/Kconfig b/drivers/net/can/c_can/Kconfig
index 3b83baf..2835277 100644
--- a/drivers/net/can/c_can/Kconfig
+++ b/drivers/net/can/c_can/Kconfig
@@ -17,6 +17,7 @@ config CAN_C_CAN_PLATFORM
 config CAN_C_CAN_PCI
 	tristate "Generic PCI Bus based C_CAN/D_CAN driver"
 	depends on PCI
+	depends on HAVE_CLK
 	---help---
 	  This driver adds support for the C_CAN/D_CAN chips connected
 	  to the PCI bus.
-- 
1.7.10

* Re: 10GBE performance drop with net.ipv4.tcp_timestamps=0
From: Eric Dumazet @ 2012-06-20  9:47 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Linux Netdev List
In-Reply-To: <4FE198FE.5050202@profihost.ag>

On Wed, 2012-06-20 at 11:33 +0200, Stefan Priebe - Profihost AG wrote:

> Sure. In that case I get 4Gbit/s in both variants. I also tried two
> other machines, with the same result.
> 

So 3.5 on the receiver is the problem, it seems?

And you checked all the IRQ affinity settings, I presume, since a
lot of things might have changed between 2.6.32 and 3.5?

cat /proc/interrupts

What kind of NIC is it?

* [PATCH 12/12] Avoid dereferencing bd_disk during swap_entry_free for network storage
From: Mel Gorman @ 2012-06-20  9:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1340185081-22525-1-git-send-email-mgorman@suse.de>

Commit [b3a27d: swap: Add swap slot free callback to
block_device_operations] dereferences p->bdev->bd_disk but this is a
NULL dereference if using swap-over-NFS. This patch checks SWP_BLKDEV
on the swap_info_struct before dereferencing.

With reference to this callback, Christoph Hellwig stated "Please
just remove the callback entirely.  It has no user outside the staging
tree and was added clearly against the rules for that staging tree".
This would also be my preference but there was not an obvious way of
keeping zram in staging/ happy.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/swapfile.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7307fc9..e6c4b13 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -549,7 +549,6 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
 
 	/* free if no reference */
 	if (!usage) {
-		struct gendisk *disk = p->bdev->bd_disk;
 		if (offset < p->lowest_bit)
 			p->lowest_bit = offset;
 		if (offset > p->highest_bit)
@@ -560,9 +559,11 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
 		nr_swap_pages++;
 		p->inuse_pages--;
 		frontswap_invalidate_page(p->type, offset);
-		if ((p->flags & SWP_BLKDEV) &&
-				disk->fops->swap_slot_free_notify)
-			disk->fops->swap_slot_free_notify(p->bdev, offset);
+		if (p->flags & SWP_BLKDEV) {
+			struct gendisk *disk = p->bdev->bd_disk;
+			if (disk->fops->swap_slot_free_notify)
+				disk->fops->swap_slot_free_notify(p->bdev, offset);
+		}
 	}
 
 	return usage;
-- 
1.7.9.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 11/12] nfs: Prevent page allocator recursions with swap over NFS.
From: Mel Gorman @ 2012-06-20  9:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
	Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
	Mike Christie, Eric B Munson, Mel Gorman
In-Reply-To: <1340185081-22525-1-git-send-email-mgorman@suse.de>

GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate
IO, just not of any filesystem data.

The problem is that GFP_NOFS used to be correct because it avoids
recursion into the NFS code. With swap-over-NFS, it is no longer
correct, as swap IO can lead to this recursion.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/nfs/pagelist.c |    2 +-
 fs/nfs/write.c    |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 9ef8b3c..7de1646 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -70,7 +70,7 @@ void nfs_set_pgio_error(struct nfs_pgio_header *hdr, int error, loff_t pos)
 static inline struct nfs_page *
 nfs_page_alloc(void)
 {
-	struct nfs_page	*p = kmem_cache_zalloc(nfs_page_cachep, GFP_KERNEL);
+	struct nfs_page	*p = kmem_cache_zalloc(nfs_page_cachep, GFP_NOIO);
 	if (p)
 		INIT_LIST_HEAD(&p->wb_list);
 	return p;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index f45b9ca..6f90681 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -52,7 +52,7 @@ static mempool_t *nfs_commit_mempool;
 
 struct nfs_commit_data *nfs_commitdata_alloc(void)
 {
-	struct nfs_commit_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOFS);
+	struct nfs_commit_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOIO);
 
 	if (p) {
 		memset(p, 0, sizeof(*p));
@@ -70,7 +70,7 @@ EXPORT_SYMBOL_GPL(nfs_commit_free);
 
 struct nfs_write_header *nfs_writehdr_alloc(void)
 {
-	struct nfs_write_header *p = mempool_alloc(nfs_wdata_mempool, GFP_NOFS);
+	struct nfs_write_header *p = mempool_alloc(nfs_wdata_mempool, GFP_NOIO);
 
 	if (p) {
 		struct nfs_pgio_header *hdr = &p->header;
-- 
1.7.9.2
