Netdev List
 help / color / mirror / Atom feed
* Re: [Ilw] drivers/net/wireless/iwlwifi/dvm/tx.c:456 iwlagn_tx_skb+0x6c5/0x883()
From: Sander Eikelenboom @ 2013-10-15 20:19 UTC (permalink / raw)
  To: Grumbach, Emmanuel
  Cc: John W. Linville, Berg, Johannes,
	ilw-VuQAYsv1563Yd54FQh9/CA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <0BA3FCBA62E2DC44AF3030971E174FB301DC4E24-yLYCJVCiOGwLt2AQoY/u9bfspsVTdybXVpNB7YpNyf8@public.gmane.org>


Tuesday, October 15, 2013, 9:11:36 PM, you wrote:

>> > Please apply this:
>> > diff --git a/drivers/net/wireless/iwlwifi/dvm/tx.c
>> b/drivers/net/wireless/iwlwifi/dvm/tx.c
>> > index d131f85..5968f19 100644
>> > --- a/drivers/net/wireless/iwlwifi/dvm/tx.c
>> > +++ b/drivers/net/wireless/iwlwifi/dvm/tx.c
>> > @@ -457,8 +457,8 @@ int iwlagn_tx_skb(struct iwl_priv *priv,
>> >         WARN_ON_ONCE(is_agg &&
>> >                      priv->queue_to_mac80211[txq_id] != info->hw_queue);
>> >
>> > -       IWL_DEBUG_TX(priv, "TX to [%d|%d] Q:%d - seq: 0x%x\n", sta_id, tid,
>> > -                    txq_id, seq_number);
>> > +       IWL_DEBUG_TX(priv, "TX to [%d|%d] Q:%d info Q %d - seq: 0x%x\n",
>> sta_id, tid,
>> > +                    txq_id, info->hw_queue, seq_number);
>> >
>> >         if (iwl_trans_tx(priv->trans, skb, dev_cmd, txq_id))
>> >                 goto drop_unlock_sta;
>> 
>> > and send the output back to me
>> 
>> > Thanks.
>> 

> Can you please apply the patch attached (and remove the previous change)?
> Thanks.


That seems to make the warning go away :-)

[    7.306696] iwlwifi 0000:02:00.0: L1 Disabled; Enabling L0S
[    7.315790] iwlwifi 0000:02:00.0: Radio type=0x2-0x1-0x0
[    7.362212] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 9 on FIFO 7 WrPtr: 0
[    7.364973] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 0 on FIFO 3 WrPtr: 0
[    7.365090] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 1 on FIFO 2 WrPtr: 0
[    7.365208] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 2 on FIFO 1 WrPtr: 0
[    7.365324] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 3 on FIFO 0 WrPtr: 0
[    7.365440] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 4 on FIFO 0 WrPtr: 0
[    7.365556] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 5 on FIFO 4 WrPtr: 0
[    7.365672] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 6 on FIFO 2 WrPtr: 0
[    7.365789] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 7 on FIFO 5 WrPtr: 0
[    7.365905] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 8 on FIFO 4 WrPtr: 0
[    7.366034] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 10 on FIFO 5 WrPtr: 0
[    7.602726] iwlwifi 0000:02:00.0: L1 Disabled; Enabling L0S
[    7.612133] iwlwifi 0000:02:00.0: Radio type=0x2-0x1-0x0
[    7.658168] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 9 on FIFO 7 WrPtr: 0
[    7.661021] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 0 on FIFO 3 WrPtr: 0
[    7.663693] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 1 on FIFO 2 WrPtr: 0
[    7.666341] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 2 on FIFO 1 WrPtr: 0
[    7.668914] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 3 on FIFO 0 WrPtr: 0
[    7.671464] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 4 on FIFO 0 WrPtr: 0
[    7.673057] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 5 on FIFO 4 WrPtr: 0
[    7.674631] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 6 on FIFO 2 WrPtr: 0
[    7.676174] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 7 on FIFO 5 WrPtr: 0
[    7.677657] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 8 on FIFO 4 WrPtr: 0
[    7.679133] iwlwifi 0000:02:00.0: U iwl_trans_pcie_txq_enable Activate queue 10 on FIFO 5 WrPtr: 0
[    7.730037] device wlan0 entered promiscuous mode
[    7.732390] xen_bridge: port 2(wlan0) entered forwarding state
[    7.733984] xen_bridge: port 2(wlan0) entered forwarding state
[    7.735692] cfg80211: Pending regulatory request, waiting for it to be processed...
[    7.743541] iwlwifi 0000:02:00.0: I iwlagn_tx_skb TX to [14|8] Q:8 - seq: 0x0
[    7.745347] device wlan0 left promiscuous mode
[    7.747088] xen_bridge: port 2(wlan0) entered disabled state
[    7.748034] iwlwifi 0000:02:00.0: I iwl_trans_pcie_reclaim [Q 8] 0 -> 1 (1)
[    7.748039] iwlwifi 0000:02:00.0: I iwlagn_rx_reply_tx TXQ 8 status SUCCESS (0x00000201)
[    7.748042] iwlwifi 0000:02:00.0: I iwlagn_rx_reply_tx                               initial_rate 0x820a retries 0, idx=0 ssn=1 seq_ctl=0x0
[    7.769963] iwlwifi 0000:02:00.0: I iwl_pcie_txq_inc_wr_ptr Q:9 WR: 0x23
[    7.821813] iwlwifi 0000:02:00.0: I iwl_pcie_txq_inc_wr_ptr Q:9 WR: 0x24
[    7.824914] iwlwifi 0000:02:00.0: I iwl_pcie_txq_inc_wr_ptr Q:9 WR: 0x25
[    7.828033] iwlwifi 0000:02:00.0: I iwl_pcie_txq_inc_wr_ptr Q:9 WR: 0x26
[    7.831078] iwlwifi 0000:02:00.0: I iwl_pcie_txq_inc_wr_ptr Q:9 WR: 0x27
[    7.834005] iwlwifi 0000:02:00.0: I iwl_pcie_txq_inc_wr_ptr Q:9 WR: 0x28
[    7.836087] iwlwifi 0000:02:00.0: I iwl_pcie_txq_unmap Q 9 Free 39

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH RFC 4/5] net: macb: Use devm_request_irq()
From: Sören Brinkmann @ 2013-10-15 20:20 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Nicolas Ferre, netdev, linux-kernel, Michal Simek
In-Reply-To: <525D95A4.3020706@cogentembedded.com>

On Tue, Oct 15, 2013 at 11:21:08PM +0400, Sergei Shtylyov wrote:
> Hello.
> 
> On 10/15/2013 03:58 AM, Soren Brinkmann wrote:
> 
> >Use the device managed interface to request the IRQ, simplifying error
> >paths.
> 
> >Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
> >---
> >  drivers/net/ethernet/cadence/macb.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> >diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> >index 436aecc31732..603844b1d483 100644
> >--- a/drivers/net/ethernet/cadence/macb.c
> >+++ b/drivers/net/ethernet/cadence/macb.c
> >@@ -1825,7 +1825,8 @@ static int __init macb_probe(struct platform_device *pdev)
> >  	}
> >
> >  	dev->irq = platform_get_irq(pdev, 0);
> >-	err = request_irq(dev->irq, macb_interrupt, 0, dev->name, dev);
> >+	err = devm_request_irq(&pdev->dev, dev->irq, macb_interrupt, 0,
> >+			dev->name, dev);
> 
>    You should start the continuation line right under &.
Actually this one is a good example why I don't do such alignment: You
do a simple search & replace - in this case request_irq ->
devm_request_irq - and all alignment is gone.

	Sören

^ permalink raw reply

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
From: Eric W. Biederman @ 2013-10-15 20:34 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: Stephen Hemminger, David Miller, yamato, netdev
In-Reply-To: <525D7109.4010004@6wind.com>

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> Le 10/10/2013 02:17, Eric W. Biederman a écrit :
>>
>> Right.
>>
>> IFLA_NET_NS_PID is not invertible as there may be no processes running
>> in a pid namespace.
>>
>> IFLA_NET_NS_FD is in principle invertible.  We just need to add a file
>> descriptor to the callers fd table.  I don't see IFLA_NET_NS_FD being
>> invertible for broadcast messages, but for unicast it looks like a bit
>> of a pain but there are no fundamental problems.
> I'm not sure to understand why it is invertible only for unicast message.
> Or are you saying that it is invertible only for the netns where the
> caller stands (and then not for the veth peer)?

The pain is that it is a special case of SCM_RIGHTS aka passing file
descriptors.  Right now we don't support SCM_RIGHTS on netlink sockets
and so from that perspective IFLA_NET_NS_FD is a bit of a hack.

For unicast messages we can just stuff a file descriptor in the calling
process and be done with it.  For multicast messages we have to be much
more complete.

>> I don't know if we care enough yet to write the code for the
>> IFLA_NET_NS_FD attribute but it is doable.
> I care ;-)
> Has somebody already started to write a patch?

For IFLA_NET_NS_FD not that I know of.

Mostly it is doable but there are some silly cases.
- Do we need to actually implement SCM_RIGHTS to prevent people
  accepting file-descriptors unknowingly and hitting their file
  descriptor limits.

  In which case we need to call the attribute IFLA_NET_NS_SCM_FD
  so we knew it was just an index into the passed file descriptors.n

- Do we need an extra permission check to prevent keeping a network
  namespace alive longer than necessary?  Aka there are some permission
  checks opening and bind mounting /proc/<pid>/ns/net do we need
  a similar check.  Perhaps we would need to require CAP_NET_ADMIN over
  the target network namespace.

Beyond that it is just the logistics to open what is essentially
/proc/<pid>/ns/net and add it to the file descriptor table of the
requesting process.  Exactly which mount of proc we are going to
find the appropriate file to open I don't know.

It isn't likely to be lots of code but it is code that the necessary
infrastructure is not in place for, and a bunch of moderately hairy
corner cases to deal with.

Eric

^ permalink raw reply

* Re: kernel policy routing table src ip not respected since 2.6.37 and commit 9fc3bbb4a752
From: Julian Anastasov @ 2013-10-15 20:36 UTC (permalink / raw)
  To: Vincent Li; +Cc: netdev@vger.kernel.org, Joel Sing
In-Reply-To: <CAK3+h2w13isZEOurBMv57L0H_pkqMdYmPaNE3Kn5vPxfqErOMw@mail.gmail.com>


	Hello,

On Tue, 15 Oct 2013, Vincent Li wrote:

> it is strange though when 10.1.1.9 is unreachable address and the ping
> utility reports error 'Destination Host Unreachable' with source
> 10.1.1.1.  before 2.6.37, it reports 10.1.1..2

	I see, it is the icmp_send() function that uses
inet_select_addr() to select primary source for locally generated
ICMP errors that are sent back to the sender (10.1.1.2).
In your case it is the error_report from the ARP code.

	So, you are correct about the commit but I don't
see any problem with this behavior. IIRC, users preferred
to see primary addresses in the traceroute output.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH RFC 4/5] net: macb: Use devm_request_irq()
From: Sergei Shtylyov @ 2013-10-15 20:38 UTC (permalink / raw)
  To: Sören Brinkmann; +Cc: Nicolas Ferre, netdev, linux-kernel, Michal Simek
In-Reply-To: <ffb65abe-36e2-45c3-9d40-02050545184d@TX2EHSMHS038.ehs.local>

On 10/16/2013 12:20 AM, Sören Brinkmann wrote:

>>> Use the device managed interface to request the IRQ, simplifying error
>>> paths.

>>> Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
>>> ---
>>>   drivers/net/ethernet/cadence/macb.c | 8 +++-----
>>>   1 file changed, 3 insertions(+), 5 deletions(-)

>>> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
>>> index 436aecc31732..603844b1d483 100644
>>> --- a/drivers/net/ethernet/cadence/macb.c
>>> +++ b/drivers/net/ethernet/cadence/macb.c
>>> @@ -1825,7 +1825,8 @@ static int __init macb_probe(struct platform_device *pdev)
>>>   	}
>>>
>>>   	dev->irq = platform_get_irq(pdev, 0);
>>> -	err = request_irq(dev->irq, macb_interrupt, 0, dev->name, dev);
>>> +	err = devm_request_irq(&pdev->dev, dev->irq, macb_interrupt, 0,
>>> +			dev->name, dev);

>>     You should start the continuation line right under &.

> Actually this one is a good example why I don't do such alignment: You
> do a simple search & replace - in this case request_irq ->
> devm_request_irq - and all alignment is gone.

    I didn't understand why this is a good example. In this case you broke the 
line yourself and did it incorrectly, not following the networking coding 
style which assumes Emacs-style alignment for broken lines.

> 	Sören

WBR, Sergei

^ permalink raw reply

* "xfrm: Fix the gc threshold value for ipv4" broke my IPSec connections
From: Damian Pietras @ 2013-10-15 20:40 UTC (permalink / raw)
  To: netdev

I've recently upgraded from 3.4.x to 3.10.x and this broke my IPSec
setup in transport mode. The simplest test case is to setup few such
connections with few boxes like this:

spdadd 192.168.1.100 192.168.2.100 any -P out ipsec
           esp/transport//require
           ah/transport//require;

spdadd 192.168.2.100 192.168.1.100 any -P in ipsec
           esp/transport//require
           ah/transport//require;

Then set up an HTTP server on one box and run ab on the other box to
create come TCP connections:

ab -n 10000 -c 50 http://192.168.1.100/

Then the connect() call will very quickly start returning ENOBUFS. I
haven't seen anything wrong with my simple setup (just copy of
ipsec-howto.org in transport mode and pre shared keys) and started
bisecting. That way I found this commit to break my case:

703fb94ec58e0e8769380c2877a8a34aeb5b6c97
xfrm: Fix the gc threshold value for ipv4

Reverting it on 3.10.15 fixes my issue. This seems to be there from 3.7
and I don't really believe such simple case stayed broken for so long.
Em I missing something or there is really a bug?

If smeone is interested in details of this configuration and commands
I'm running, just let me know. This was reproduced with few VMs under XEN.

-- 
Damian Pietras

^ permalink raw reply

* Re: [PATCH ipsec] xfrm: prevent ipcomp scratch buffer race condition
From: Michal Kubecek @ 2013-10-15 20:55 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Herbert Xu, David S. Miller, netdev
In-Reply-To: <20131015083348.GW7660@secunet.com>

On Tue, Oct 15, 2013 at 10:33:48AM +0200, Steffen Klassert wrote:
> > diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
> > index 2906d52..96946fb 100644
> > --- a/net/xfrm/xfrm_ipcomp.c
> > +++ b/net/xfrm/xfrm_ipcomp.c
> > @@ -48,9 +48,11 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
> >  	const int cpu = get_cpu();
> >  	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
> >  	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
> > -	int err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
> > +	int err;
> >  	int len;
> >  
> > +	local_bh_disable();
> 
> Maybe we could disable the BHs before we fetch the percpu pointers.
> Then we can use smp_processor_id() to get the cpu. With that we
> could get rid of a (now useless) preempt_disable()/preempt_enable()
> pair. Same could be done in ipcomp_compress().

Sounds like a good idea. I'll send v2 after some basic testing.

                                                 Michal Kubecek

^ permalink raw reply

* Re: "xfrm: Fix the gc threshold value for ipv4" broke my IPSec connections
From: Eric Dumazet @ 2013-10-15 21:02 UTC (permalink / raw)
  To: Damian Pietras; +Cc: netdev
In-Reply-To: <525DA855.1010905@daper.net>

On Tue, 2013-10-15 at 22:40 +0200, Damian Pietras wrote:
> I've recently upgraded from 3.4.x to 3.10.x and this broke my IPSec
> setup in transport mode. The simplest test case is to setup few such
> connections with few boxes like this:
> 
> spdadd 192.168.1.100 192.168.2.100 any -P out ipsec
>            esp/transport//require
>            ah/transport//require;
> 
> spdadd 192.168.2.100 192.168.1.100 any -P in ipsec
>            esp/transport//require
>            ah/transport//require;
> 
> Then set up an HTTP server on one box and run ab on the other box to
> create come TCP connections:
> 
> ab -n 10000 -c 50 http://192.168.1.100/
> 
> Then the connect() call will very quickly start returning ENOBUFS. I
> haven't seen anything wrong with my simple setup (just copy of
> ipsec-howto.org in transport mode and pre shared keys) and started
> bisecting. That way I found this commit to break my case:
> 
> 703fb94ec58e0e8769380c2877a8a34aeb5b6c97
> xfrm: Fix the gc threshold value for ipv4
> 
> Reverting it on 3.10.15 fixes my issue. This seems to be there from 3.7
> and I don't really believe such simple case stayed broken for so long.
> Em I missing something or there is really a bug?
> 
> If smeone is interested in details of this configuration and commands
> I'm running, just let me know. This was reproduced with few VMs under XEN.
> 

It looks like you need to tune /proc/sys/net/ipv4/xfrm4_gc_thresh to a
sensible value given your workload.

try :

echo 65536 >/proc/sys/net/ipv4/xfrm4_gc_thresh

Presumably the 1024 default is really too small...

^ permalink raw reply

* Re: kernel policy routing table src ip not respected since 2.6.37 and commit 9fc3bbb4a752
From: Vincent Li @ 2013-10-15 21:38 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev@vger.kernel.org, Joel Sing
In-Reply-To: <alpine.LFD.2.03.1310152242170.1919@ssi.bg>

ok, thanks for the clarification.

On Tue, Oct 15, 2013 at 1:36 PM, Julian Anastasov <ja@ssi.bg> wrote:
>
>         Hello,
>
> On Tue, 15 Oct 2013, Vincent Li wrote:
>
>> it is strange though when 10.1.1.9 is unreachable address and the ping
>> utility reports error 'Destination Host Unreachable' with source
>> 10.1.1.1.  before 2.6.37, it reports 10.1.1..2
>
>         I see, it is the icmp_send() function that uses
> inet_select_addr() to select primary source for locally generated
> ICMP errors that are sent back to the sender (10.1.1.2).
> In your case it is the error_report from the ARP code.
>
>         So, you are correct about the commit but I don't
> see any problem with this behavior. IIRC, users preferred
> to see primary addresses in the traceroute output.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* [PATCH ipsec v2] xfrm: prevent ipcomp scratch buffer race condition
From: Michal Kubecek @ 2013-10-15 21:40 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Herbert Xu, David S. Miller, netdev
In-Reply-To: <20131015083348.GW7660@secunet.com>

In ipcomp_compress(), sortirq is enabled too early, allowing the
per-cpu scratch buffer to be rewritten by ipcomp_decompress()
(called on the same CPU in softirq context) between populating
the buffer and copying the compressed data to the skb.

Add similar protection into ipcomp_decompress() as it can be
called from process context as well (even if such scenario seems
a bit artificial).

v2: as pointed out by Steffen Klassert, if we also move the
local_bh_disable() before reading the per-cpu pointers, we can
get rid of get_cpu()/put_cpu().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
 net/xfrm/xfrm_ipcomp.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index 2906d52..9fe4f42 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -45,12 +45,17 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
 	const int plen = skb->len;
 	int dlen = IPCOMP_SCRATCH_SIZE;
 	const u8 *start = skb->data;
-	const int cpu = get_cpu();
-	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
-	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
-	int err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
+	struct crypto_comp *tfm;
+	u8 *scratch;
+	int cpu;
+	int err;
 	int len;
 
+	local_bh_disable();
+	cpu = smp_processor_id();
+	scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
+	tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+	err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
 	if (err)
 		goto out;
 
@@ -103,7 +108,7 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
 	err = 0;
 
 out:
-	put_cpu();
+	local_bh_enable();
 	return err;
 }
 
@@ -141,14 +146,16 @@ static int ipcomp_compress(struct xfrm_state *x, struct sk_buff *skb)
 	const int plen = skb->len;
 	int dlen = IPCOMP_SCRATCH_SIZE;
 	u8 *start = skb->data;
-	const int cpu = get_cpu();
-	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
-	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+	struct crypto_comp *tfm;
+	u8 *scratch;
+	int cpu;
 	int err;
 
 	local_bh_disable();
+	cpu = smp_processor_id();
+	scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
+	tfm = *per_cpu_ptr(ipcd->tfms, cpu);
 	err = crypto_comp_compress(tfm, start, plen, scratch, &dlen);
-	local_bh_enable();
 	if (err)
 		goto out;
 
@@ -158,13 +165,13 @@ static int ipcomp_compress(struct xfrm_state *x, struct sk_buff *skb)
 	}
 
 	memcpy(start + sizeof(struct ip_comp_hdr), scratch, dlen);
-	put_cpu();
+	local_bh_enable();
 
 	pskb_trim(skb, dlen + sizeof(struct ip_comp_hdr));
 	return 0;
 
 out:
-	put_cpu();
+	local_bh_enable();
 	return err;
 }
 
-- 
1.8.1.4

^ permalink raw reply related

* Re: [PATCH] net: sh_eth: Fix RX packets errors on R8A7740
From: Sergei Shtylyov @ 2013-10-15 21:48 UTC (permalink / raw)
  To: Guennadi Liakhovetski
  Cc: Nguyen Hong Ky, David S. Miller, netdev, Ryusuke Sakato,
	Simon Horman
In-Reply-To: <Pine.LNX.4.64.1310150926220.5601@axis700.grange>

Hello.

On 10/15/2013 11:28 AM, Guennadi Liakhovetski wrote:

>>> This patch will fix RX packets errors when receiving big size
>>> of data by set bit RNC = 1.

>>> RNC - Receive Enable Control

>>> 0: Upon completion of reception of one frame, the E-DMAC writes
>>> the receive status to the descriptor and clears the RR bit in
>>> EDRRR to 0.

>>> 1: Upon completion of reception of one frame, the E-DMAC writes
>>> (writes back) the receive status to the descriptor. In addition,
>>> the E-DMAC reads the next descriptor and prepares for reception
>>> of the next frame.

>>> In addition, for get more stable when receiving packets, I set
>>> maximum size for the transmit/receive FIFO and inserts padding
>>> in receive data.

>>> Signed-off-by: Nguyen Hong Ky <nh-ky@jinso.co.jp>
>>> ---
>>>    drivers/net/ethernet/renesas/sh_eth.c |    4 ++++
>>>    1 files changed, 4 insertions(+), 0 deletions(-)

>>> diff --git a/drivers/net/ethernet/renesas/sh_eth.c
>>> b/drivers/net/ethernet/renesas/sh_eth.c
>>> index a753928..11d34f0 100644
>>> --- a/drivers/net/ethernet/renesas/sh_eth.c
>>> +++ b/drivers/net/ethernet/renesas/sh_eth.c
>>> @@ -649,12 +649,16 @@ static struct sh_eth_cpu_data r8a7740_data = {
>>>    	.eesr_err_check	= EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
>>>    			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
>>>    			  EESR_TDE | EESR_ECI,
>>> +	.fdr_value	= 0x0000070f,
>>> +	.rmcr_value	= 0x00000001,
>>>
>>>    	.apr		= 1,
>>>    	.mpr		= 1,
>>>    	.tpauser	= 1,
>>>    	.bculr		= 1,
>>>    	.hw_swap	= 1,
>>> +	.rpadir		= 1,
>>> +	.rpadir_value   = 2 << 16,
>>>    	.no_trimd	= 1,
>>>    	.no_ade		= 1,
>>>    	.tsu		= 1,

>>     Guennadi, could you check if this patch fixes your issue with NFS. Make
>> sure it applies to 'r8a7740_data' (it was misapplied to DaveM's tree).

> Yes, the current -next, which includes this patch (in a slightly different
> form) boots fine over NFS for me.

    I don't know what you mean by "slightly different form" exactly. Also, I 
was unable to locate the fresh -next tree. 'net-next.git' contains this patch 
in a mismerged form, 'net.git' has Simon's patch that corrects this mismerge.

> Thanks
> Guennadi

WBR, Sergei

^ permalink raw reply

* Re: [PATCH v3 net-next] openvswitch: fix vport-netdev unregister
From: Alexei Starovoitov @ 2013-10-15 21:49 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David S. Miller, Pravin B Shelar, Jiri Pirko, Cong Wang,
	dev@openvswitch.org, netdev
In-Reply-To: <CAMEtUuxysgzNqeBpBQ-ajLtymYCvEF-GMiUcQuy1b-QA=dFhdw@mail.gmail.com>

On Tue, Oct 15, 2013 at 9:53 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Tue, Oct 15, 2013 at 8:31 AM, Jesse Gross <jesse@nicira.com> wrote:
>> On Sun, Oct 13, 2013 at 8:50 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>> diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
>>> index c323567..ffa429a 100644
>>> --- a/net/openvswitch/dp_notify.c
>>> +++ b/net/openvswitch/dp_notify.c
>>> @@ -59,15 +59,9 @@ void ovs_dp_notify_wq(struct work_struct *work)
>>>                         struct hlist_node *n;
>>>
>>>                         hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
>>> -                               struct netdev_vport *netdev_vport;
>>> -
>>>                                 if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
>>>                                         continue;
>>> -
>>> -                               netdev_vport = netdev_vport_priv(vport);
>>> -                               if (netdev_vport->dev->reg_state == NETREG_UNREGISTERED ||
>>> -                                   netdev_vport->dev->reg_state == NETREG_UNREGISTERING)
>>> -                                       dp_detach_port_notify(vport);
>>> +                               dp_detach_port_notify(vport);
>>
>> Doesn't this free *all* ports of type OVS_VPORT_TYPE_NETDEV when any
>> one of them is removed?
>
> sorry. not sure what I was thinking on Sunday evening. will respin

will take it back. the check was removed to prevent hang upon dev netns moves,
since reg_state will still be == netreg_registered,
but yes, different check is needed.
sending v4

^ permalink raw reply

* [PATCH v4 net-next] openvswitch: fix vport-netdev unregister
From: Alexei Starovoitov @ 2013-10-15 21:54 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jesse Gross, Pravin B Shelar, Jiri Pirko, Cong Wang, dev, netdev

The combination of two commits:
commit 8e4e1713e4
("openvswitch: Simplify datapath locking.")
commit 2537b4dd0a
("openvswitch:: link upper device for port devices")

introduced a bug where upper_dev wasn't unlinked upon
netdev_unregister notification

The following steps:

  modprobe openvswitch
  ovs-dpctl add-dp test
  ip tuntap add dev tap1 mode tap
  ovs-dpctl add-if test tap1
  ip tuntap del dev tap1 mode tap

are causing multiple warnings:

[   62.747557] gre: GRE over IPv4 demultiplexor driver
[   62.749579] openvswitch: Open vSwitch switching datapath
[   62.755087] device test entered promiscuous mode
[   62.765911] device tap1 entered promiscuous mode
[   62.766033] IPv6: ADDRCONF(NETDEV_UP): tap1: link is not ready
[   62.769017] ------------[ cut here ]------------
[   62.769022] WARNING: CPU: 1 PID: 3267 at net/core/dev.c:5501 rollback_registered_many+0x20f/0x240()
[   62.769023] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769051] CPU: 1 PID: 3267 Comm: ip Not tainted 3.12.0-rc3+ #60
[   62.769052] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769053]  0000000000000009 ffff8807f25cbd28 ffffffff8175e575 0000000000000006
[   62.769055]  0000000000000000 ffff8807f25cbd68 ffffffff8105314c ffff8807f25cbd58
[   62.769057]  ffff8807f2634000 ffff8807f25cbdc8 ffff8807f25cbd88 ffff8807f25cbdc8
[   62.769059] Call Trace:
[   62.769062]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769065]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769067]  [<ffffffff8105319a>] warn_slowpath_null+0x1a/0x20
[   62.769069]  [<ffffffff8162a04f>] rollback_registered_many+0x20f/0x240
[   62.769071]  [<ffffffff8162a101>] rollback_registered+0x31/0x40
[   62.769073]  [<ffffffff8162a488>] unregister_netdevice_queue+0x58/0x90
[   62.769075]  [<ffffffff8154f900>] __tun_detach+0x140/0x340
[   62.769077]  [<ffffffff8154fb36>] tun_chr_close+0x36/0x60
[   62.769080]  [<ffffffff811bddaf>] __fput+0xff/0x260
[   62.769082]  [<ffffffff811bdf5e>] ____fput+0xe/0x10
[   62.769084]  [<ffffffff8107b515>] task_work_run+0xb5/0xe0
[   62.769087]  [<ffffffff810029b9>] do_notify_resume+0x59/0x80
[   62.769089]  [<ffffffff813a41fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   62.769091]  [<ffffffff81770f5a>] int_signal+0x12/0x17
[   62.769093] ---[ end trace 838756c62e156ffb ]---
[   62.769481] ------------[ cut here ]------------
[   62.769485] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
[   62.769486] sysfs: can not remove 'master', no directory
[   62.769486] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769514] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
[   62.769515] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769518] Workqueue: events ovs_dp_notify_wq [openvswitch]
[   62.769519]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
[   62.769521]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b28
[   62.769523]  0000000000000000 ffffffff81a87a1f ffff8807f2634000 ffff880037038500
[   62.769525] Call Trace:
[   62.769528]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769529]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769531]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
[   62.769533]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
[   62.769535]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
[   62.769538]  [<ffffffff81631ef7>] __netdev_adjacent_dev_remove+0xf7/0x150
[   62.769540]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
[   62.769542]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
[   62.769544]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
[   62.769548]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
[   62.769550]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
[   62.769552]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
[   62.769555]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
[   62.769557]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
[   62.769559]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
[   62.769562]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
[   62.769564]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
[   62.769566]  [<ffffffff8107f44a>] kthread+0xea/0xf0
[   62.769568]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769570]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
[   62.769572]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769573] ---[ end trace 838756c62e156ffc ]---
[   62.769574] ------------[ cut here ]------------
[   62.769576] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
[   62.769577] sysfs: can not remove 'upper_test', no directory
[   62.769577] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769603] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
[   62.769604] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769606] Workqueue: events ovs_dp_notify_wq [openvswitch]
[   62.769607]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
[   62.769609]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b58
[   62.769611]  0000000000000000 ffff880807ad3bd9 ffff8807f2634000 ffff880037038500
[   62.769613] Call Trace:
[   62.769615]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769617]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769619]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
[   62.769621]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
[   62.769622]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
[   62.769624]  [<ffffffff81631f22>] __netdev_adjacent_dev_remove+0x122/0x150
[   62.769627]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
[   62.769629]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
[   62.769631]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
[   62.769633]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
[   62.769636]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
[   62.769638]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
[   62.769640]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
[   62.769642]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
[   62.769644]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
[   62.769646]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
[   62.769648]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
[   62.769650]  [<ffffffff8107f44a>] kthread+0xea/0xf0
[   62.769652]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769654]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
[   62.769656]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769657] ---[ end trace 838756c62e156ffd ]---
[   62.769724] device tap1 left promiscuous mode

This patch also affects moving devices between net namespaces.

OVS used to ignore netns move notifications which caused problems.
Like:
  ovs-dpctl add-if test tap1
  ip link set tap1 netns 3512
and then removing tap1 inside the namespace will cause hang on missing dev_put.

With this patch OVS will detach dev upon receiving netns move event.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 net/openvswitch/dp_notify.c    |    7 +++++--
 net/openvswitch/vport-netdev.c |   16 +++++++++++++---
 net/openvswitch/vport-netdev.h |    1 +
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index c323567..5c2dab2 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -65,8 +65,7 @@ void ovs_dp_notify_wq(struct work_struct *work)
 					continue;
 
 				netdev_vport = netdev_vport_priv(vport);
-				if (netdev_vport->dev->reg_state == NETREG_UNREGISTERED ||
-				    netdev_vport->dev->reg_state == NETREG_UNREGISTERING)
+				if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
 					dp_detach_port_notify(vport);
 			}
 		}
@@ -88,6 +87,10 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
 		return NOTIFY_DONE;
 
 	if (event == NETDEV_UNREGISTER) {
+		/* upper_dev_unlink and decrement promisc immediately */
+		ovs_netdev_detach_dev(vport);
+
+		/* schedule vport destroy, dev_put and genl notification */
 		ovs_net = net_generic(dev_net(dev), ovs_net_id);
 		queue_work(system_wq, &ovs_net->dp_notify_work);
 	}
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 09d93c1..d21f77d 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -150,15 +150,25 @@ static void free_port_rcu(struct rcu_head *rcu)
 	ovs_vport_free(vport_from_priv(netdev_vport));
 }
 
-static void netdev_destroy(struct vport *vport)
+void ovs_netdev_detach_dev(struct vport *vport)
 {
 	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
 
-	rtnl_lock();
+	ASSERT_RTNL();
 	netdev_vport->dev->priv_flags &= ~IFF_OVS_DATAPATH;
 	netdev_rx_handler_unregister(netdev_vport->dev);
-	netdev_upper_dev_unlink(netdev_vport->dev, get_dpdev(vport->dp));
+	netdev_upper_dev_unlink(netdev_vport->dev,
+				netdev_master_upper_dev_get(netdev_vport->dev));
 	dev_set_promiscuity(netdev_vport->dev, -1);
+}
+
+static void netdev_destroy(struct vport *vport)
+{
+	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
+
+	rtnl_lock();
+	if (netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH)
+		ovs_netdev_detach_dev(vport);
 	rtnl_unlock();
 
 	call_rcu(&netdev_vport->rcu, free_port_rcu);
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index dd298b5..8df01c11 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -39,5 +39,6 @@ netdev_vport_priv(const struct vport *vport)
 }
 
 const char *ovs_netdev_get_name(const struct vport *);
+void ovs_netdev_detach_dev(struct vport *);
 
 #endif /* vport_netdev.h */
-- 
1.7.9.5

^ permalink raw reply related

* Re: "xfrm: Fix the gc threshold value for ipv4" broke my IPSec connections
From: Damian Pietras @ 2013-10-15 22:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1381870957.2045.73.camel@edumazet-glaptop.roam.corp.google.com>

On 15.10.2013 23:02, Eric Dumazet wrote:
>> 703fb94ec58e0e8769380c2877a8a34aeb5b6c97
>> xfrm: Fix the gc threshold value for ipv4
>>
>> Reverting it on 3.10.15 fixes my issue. This seems to be there from 3.7
>> and I don't really believe such simple case stayed broken for so long.
>> Em I missing something or there is really a bug?
>>
>> If smeone is interested in details of this configuration and commands
>> I'm running, just let me know. This was reproduced with few VMs under XEN.
>>
> 
> It looks like you need to tune /proc/sys/net/ipv4/xfrm4_gc_thresh to a
> sensible value given your workload.
> 
> try :
> 
> echo 65536 >/proc/sys/net/ipv4/xfrm4_gc_thresh
> 
> Presumably the 1024 default is really too small...

Now it's working in my test setup, I'm changing it on the production
boxes, thanks!


-- 
Damian Pietras

^ permalink raw reply

* [PATCH] sh_eth: add/use RMCR.RNC bit
From: Sergei Shtylyov @ 2013-10-15 22:29 UTC (permalink / raw)
  To: netdev; +Cc: nobuhiro.iwamatsu.yj, linux-sh, davem, horms

Declare 'enum EMCR_BIT' containing the single member for the RMCR.RNC bit and
replace bare numbers in the driver by  this mnemonic.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
This patch is against DaveM's 'net.git' repo but it intended for 'net-next.git'
repo -- it's  because 'net-next.git' doesn't contain the required Simon Horman's
patch yet.

 drivers/net/ethernet/renesas/sh_eth.c |    6 +++---
 drivers/net/ethernet/renesas/sh_eth.h |    3 +++
 2 files changed, 6 insertions(+), 3 deletions(-)

Index: net/drivers/net/ethernet/renesas/sh_eth.c
===================================================================
--- net.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net/drivers/net/ethernet/renesas/sh_eth.c
@@ -483,7 +483,7 @@ static struct sh_eth_cpu_data sh7757_dat
 	.register_type	= SH_ETH_REG_FAST_SH4,
 
 	.eesipr_value	= DMAC_M_RFRMER | DMAC_M_ECI | 0x003fffff,
-	.rmcr_value	= 0x00000001,
+	.rmcr_value	= RMCR_RNC,
 
 	.tx_check	= EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
 	.eesr_err_check	= EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
@@ -561,7 +561,7 @@ static struct sh_eth_cpu_data sh7757_dat
 			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
 			  EESR_TDE | EESR_ECI,
 	.fdr_value	= 0x0000072f,
-	.rmcr_value	= 0x00000001,
+	.rmcr_value	= RMCR_RNC,
 
 	.irq_flags	= IRQF_SHARED,
 	.apr		= 1,
@@ -689,7 +689,7 @@ static struct sh_eth_cpu_data r8a7740_da
 			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
 			  EESR_TDE | EESR_ECI,
 	.fdr_value	= 0x0000070f,
-	.rmcr_value	= 0x00000001,
+	.rmcr_value	= RMCR_RNC,
 
 	.apr		= 1,
 	.mpr		= 1,
Index: net/drivers/net/ethernet/renesas/sh_eth.h
===================================================================
--- net.orig/drivers/net/ethernet/renesas/sh_eth.h
+++ net/drivers/net/ethernet/renesas/sh_eth.h
@@ -321,6 +321,9 @@ enum TD_STS_BIT {
 #define TD_TFP	(TD_TFP1|TD_TFP0)
 
 /* RMCR */
+enum RMCR_BIT {
+	RMCR_RNC = 0x00000001,
+};
 #define DEFAULT_RMCR_VALUE	0x00000000
 
 /* ECMR */

^ permalink raw reply

* Re: [PATCH ipsec v2] xfrm: prevent ipcomp scratch buffer race condition
From: Eric Dumazet @ 2013-10-15 22:44 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, netdev
In-Reply-To: <20131015214030.B0D06E8A60@unicorn.suse.cz>

On Tue, 2013-10-15 at 23:40 +0200, Michal Kubecek wrote:
> In ipcomp_compress(), sortirq is enabled too early, allowing the
> per-cpu scratch buffer to be rewritten by ipcomp_decompress()
> (called on the same CPU in softirq context) between populating
> the buffer and copying the compressed data to the skb.
> 
> Add similar protection into ipcomp_decompress() as it can be
> called from process context as well (even if such scenario seems
> a bit artificial).
> 
> v2: as pointed out by Steffen Klassert, if we also move the
> local_bh_disable() before reading the per-cpu pointers, we can
> get rid of get_cpu()/put_cpu().
> 
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> ---
>  net/xfrm/xfrm_ipcomp.c | 29 ++++++++++++++++++-----------
>  1 file changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
> index 2906d52..9fe4f42 100644
> --- a/net/xfrm/xfrm_ipcomp.c
> +++ b/net/xfrm/xfrm_ipcomp.c
> @@ -45,12 +45,17 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
>  	const int plen = skb->len;
>  	int dlen = IPCOMP_SCRATCH_SIZE;
>  	const u8 *start = skb->data;
> -	const int cpu = get_cpu();
> -	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
> -	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
> -	int err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
> +	struct crypto_comp *tfm;
> +	u8 *scratch;
> +	int cpu;
> +	int err;
>  	int len;
>  
> +	local_bh_disable();
> +	cpu = smp_processor_id();
> +	scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
> +	tfm = *per_cpu_ptr(ipcd->tfms, cpu);

Have you tried this_cpu_ptr() instead ?

^ permalink raw reply

* Re: "xfrm: Fix the gc threshold value for ipv4" broke my IPSec connections
From: Eric Dumazet @ 2013-10-15 22:51 UTC (permalink / raw)
  To: Damian Pietras, Steffen Klassert; +Cc: netdev
In-Reply-To: <525DBE65.1070707@daper.net>

On Wed, 2013-10-16 at 00:15 +0200, Damian Pietras wrote:
> On 15.10.2013 23:02, Eric Dumazet wrote:
> >> 703fb94ec58e0e8769380c2877a8a34aeb5b6c97
> >> xfrm: Fix the gc threshold value for ipv4
> >>
> >> Reverting it on 3.10.15 fixes my issue. This seems to be there from 3.7
> >> and I don't really believe such simple case stayed broken for so long.
> >> Em I missing something or there is really a bug?
> >>
> >> If smeone is interested in details of this configuration and commands
> >> I'm running, just let me know. This was reproduced with few VMs under XEN.
> >>
> > 
> > It looks like you need to tune /proc/sys/net/ipv4/xfrm4_gc_thresh to a
> > sensible value given your workload.
> > 
> > try :
> > 
> > echo 65536 >/proc/sys/net/ipv4/xfrm4_gc_thresh
> > 
> > Presumably the 1024 default is really too small...
> 
> Now it's working in my test setup, I'm changing it on the production
> boxes, thanks!
> 
> 

Steffen, what do you think ?

1024 seems really small, given we had much higher values.

(256 K on a 1GB host)

This sysctl also needs an entry in
Documentation/networking/ip-sysctl.txt

^ permalink raw reply

* Re: [PATCH RFC 0/2] xfrm: Remove ancient sleeping code
From: David Miller @ 2013-10-15 23:14 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <20131015073020.GV7660@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 15 Oct 2013 09:30:20 +0200

> We could fiddle something to get a terminating condition if the
> state is not resolved after some time, but my plan was to disable
> the larval_drop sysctl by default some day again. At best without
> any notable change to userspace. That's why I would prefer to
> remove the sleeping entirely.

Ok I have no problem with removing the sleeping stuff.

^ permalink raw reply

* Re: [PATCH RFC 0/2] xfrm: Remove ancient sleeping code
From: Eric Dumazet @ 2013-10-15 23:58 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev
In-Reply-To: <20131015073020.GV7660@secunet.com>

On Tue, 2013-10-15 at 09:30 +0200, Steffen Klassert wrote:

> Right, that's why I've limited the queue to 100 packets. We can
> queue the SYNs of up to 100 tcp connestions that want to use
> this IPsec state. It surely can happen that we queue multiple
> retransmitted SYNs if the IPsec resolution is slow. But the
> queueing code tries at least to get the packets out before
> the first tcp retransmit. I think there is still room for
> optimizations, maybe reducing the queue lenght or the queue
> timeout to avoid queueing retransmitted SYNs as much as possible.

Note that its totally possible to avoid retransmitting SYN if original
SYN is still in a host queue.

We currently increment a SNMP counter when we detect this, we could
do something else (like not queuing a copy of the packet)

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=0e280af026a5662ffd57c4e623b822df1f7f47ff

Another work in progress is to delay RTO arming at the time TCP
packet leaves the host queues, instead of at the enqueue time.

^ permalink raw reply

* [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  0:02 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明
  Cc: lvs-devel, netdev, Julian Anastasov, Mark Brooks, Simon Horman

In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
such that it creates and uses a neigh entry if none is found.
Subsequently the 'n' field was removed from struct rt6_info.

Unfortunately my analysis is that in the case of IPVS direct routing this
change leads to incorrect behaviour as in this case packets may be output
to a destination other than where they would be output according to the
route table. In particular, the destination address may actually be a local
address and empirically a neighbour lookup seems to result in it becoming
unreachable.

This patch resolves the problem by providing the destination address
determined by IPVS to ip6_finish_output2() in the skb callback.  Although
this seems to work I can see several problems with this approach:

* It is rather ugly, stuffing an IPVS exception right in
  the middle of IPv6 code. The overhead could be eliminated for many users
  by using a staic key. But none the less it is not attractive.

* The use of the skb callback is may not be valid
  as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
  alternative is to add a new field to struct sk_buff.

* This covers all IPv6 packets output by IPVS but actually
  only those output using IPVS Direct-Routing need this.  One way to
  resolve this would be to add a more fine-grained ipvs_property to
  struct sk_buff.

Reported-by: Mark Brooks <mark@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h             | 6 ++++++
 net/ipv6/ip6_output.c           | 9 +++++++--
 net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..11d90a6 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
 		atomic_read(&dest->inactconns);
 }
 
+struct ipvs_skb_cb {
+	struct in6_addr *daddr;
+};
+
+#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
+
 #endif	/* _NET_IP_VS_H */
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..a340180 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -52,6 +52,7 @@
 #include <net/addrconf.h>
 #include <net/rawv6.h>
 #include <net/icmp.h>
+#include <net/ip_vs.h>
 #include <net/xfrm.h>
 #include <net/checksum.h>
 #include <linux/mroute6.h>
@@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
 	struct neighbour *neigh;
-	struct in6_addr *nexthop;
+	struct in6_addr *nexthop, *daddr;
 	int ret;
 
 	skb->protocol = htons(ETH_P_IPV6);
@@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
+	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
+		daddr = IP_VS_SKB_CB(skb)->daddr;
+	else
+		daddr = &ipv6_hdr(skb)->daddr;
+	nexthop = rt6_nexthop((struct rt6_info *)dst, daddr);
 	neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index c47444e..054b679 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -391,6 +391,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
 		rt = (struct rt6_info *) dst;
 	}
 
+	IP_VS_SKB_CB(skb)->daddr = daddr;
+
 	local = __ip_vs_is_local_route6(rt);
 	if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
 	      rt_mode)) {
-- 
1.8.4


^ permalink raw reply related

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Eric Dumazet @ 2013-10-16  0:13 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381881751-6719-1-git-send-email-horms@verge.net.au>

On Wed, 2013-10-16 at 09:02 +0900, Simon Horman wrote:
> In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> such that it creates and uses a neigh entry if none is found.
> Subsequently the 'n' field was removed from struct rt6_info.
> 
> Unfortunately my analysis is that in the case of IPVS direct routing this
> change leads to incorrect behaviour as in this case packets may be output
> to a destination other than where they would be output according to the
> route table. In particular, the destination address may actually be a local
> address and empirically a neighbour lookup seems to result in it becoming
> unreachable.
> 
> This patch resolves the problem by providing the destination address
> determined by IPVS to ip6_finish_output2() in the skb callback.  Although
> this seems to work I can see several problems with this approach:
> 
> * It is rather ugly, stuffing an IPVS exception right in
>   the middle of IPv6 code. The overhead could be eliminated for many users
>   by using a staic key. But none the less it is not attractive.
> 
> * The use of the skb callback is may not be valid
>   as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
>   alternative is to add a new field to struct sk_buff.

Please no ;)

> 
> * This covers all IPv6 packets output by IPVS but actually
>   only those output using IPVS Direct-Routing need this.  One way to
>   resolve this would be to add a more fine-grained ipvs_property to
>   struct sk_buff.
> 
> Reported-by: Mark Brooks <mark@loadbalancer.org>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> ---
>  include/net/ip_vs.h             | 6 ++++++
>  net/ipv6/ip6_output.c           | 9 +++++++--
>  net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index 1c2e1b9..11d90a6 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
>  		atomic_read(&dest->inactconns);
>  }
>  
> +struct ipvs_skb_cb {
> +	struct in6_addr *daddr;
> +};

So we pass a reference.

> +
> +#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
> +
>  #endif	/* _NET_IP_VS_H */
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index a54c45c..a340180 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -52,6 +52,7 @@
>  #include <net/addrconf.h>
>  #include <net/rawv6.h>
>  #include <net/icmp.h>
> +#include <net/ip_vs.h>
>  #include <net/xfrm.h>
>  #include <net/checksum.h>
>  #include <linux/mroute6.h>
> @@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
>  	struct dst_entry *dst = skb_dst(skb);
>  	struct net_device *dev = dst->dev;
>  	struct neighbour *neigh;
> -	struct in6_addr *nexthop;
> +	struct in6_addr *nexthop, *daddr;
>  	int ret;
>  
>  	skb->protocol = htons(ETH_P_IPV6);
> @@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
>  	}
>  
>  	rcu_read_lock_bh();
> -	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
> +	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
> +		daddr = IP_VS_SKB_CB(skb)->daddr;

What guarantee do we have daddr points to valid memory (not already
freed/reused) ?

I guess things like NFQUEUE could happen ?




^ permalink raw reply

* Re: DomU's network interface will hung when Dom0 running 32bit
From: jianhai luan @ 2013-10-16  0:15 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev, ANNIE LI
In-Reply-To: <525D6C16.6010705@oracle.com>


On 2013-10-16 0:23, jianhai luan wrote:
>
> On 2013-10-16 0:03, Wei Liu wrote:
>> On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote:
>> [...]
>>>>>>> * time_after_eq(now, next_credit) -> false
>>>>>>> * time_before(now, expires) -> false
>>>>>> If now is placed in above environment, the result will be correct
>>>>>> (Sending package will be not allowed until next_credit).
>>>>> No, it is not necessarily correct. Keep in mind that "now" wraps 
>>>>> around,
>>>>> which is the issue you try to fix. You still have a window to 
>>>>> stall your
>>>>> frontend.
>>>> Remember that time_after_eq is supposed to work even with wraparound
>>>> occurring, so long as the two times are less than MAX_LONG/2 apart.
>>> Sorry for my misunderstand explanation. I mean that
>>>    * time_after_eq()/time_before_eq() fix the jiffies wraparound, so
>>> please think about  jiffies in line increasing.
>>>    * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2),
>>> the judge will be wrong if out of the range.
>>>
>>> So please think about three kind environment
>>>    -  expires        now        next_credit
>>>       --------time increases this direction ---------->
>>>
>>>    -  expires        [next_credit        now next_credit+MAX_LONG/2
>>>       --------time increase this direction ----------->
>>>
>>>    - expires        next_credit        next_credit+MAX_LONG/2 now
>>>       --------time increadse this direction ---------->
>>>
>>> The first environment should be netfront consume all credit_byte
>>> before next_credit, So we should pending one timer to calculator the
>>> new credit_byte, and don't transmit until next_credit.
>>>
>>> the second environment should be calculator the credit_byte because
>>> netfront don't consume all credit_byte before next_credit, and
>>> time_after_eq() do correct judge.
>>>
>>> the third environment should be calculator in time because netfront
>>> don't consume all credit_byte until next_credit.But time_after_eq do
>>> error judge (time_after_eq(now, next_credit) is false), so the
>>> remaining_byte isn't be increased.
>>>
>>> and I work on the third environment.  You know now >
>>> next_credit+MAX_LONG/2, time_before(now, expire) should be
>>> true(time_before(now, expire) is false in first environment)
>> Thanks for staighten this out for me. I'm just too dumb for this, please
>> be patient with me. :-)
>>
>> Could you prove that time_before(now, expire) is always true in third
>> case? That's where my main cencern lies. Is it because msecs_to_jiffies
>> always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most?
>
> I have wrong judge in third environment. If now large than expires + 
> MAX_UNLONG, time_before(now, expires) will be false.
>   expires    next_credit    next_credit+MAX_UNLONG/2    expires + 
> MAX_UNLONG    now    next_credit+MAX_UNLONG
>   --------------------------------------------------------- time 
> increadse this direction  ---------------------------------->
>
>   In the above environment, time_before(now, expires) will return 
> false. But the jiffies elapsed more time and next_credit will be 
> reachable in soon(time_after_eq(now, next_credit) will be true).

After above talk, the window should be exist (expire+MAX_ULONG 
next_credit+MAX_ULONG, expire + 2MAX_ULONF next_credit+MAX_ULONG 
....,expire+<n>ULONG next_credit+<n>MAX_ULONG). Other time window should 
be not exist (maybe i don't think about).
  If so, please think about the below:
   * If no speed control, vif->credit_usec should be zero. expire = 
next_credit and the window is zero
   * If we have done speed control, the scenario should be very likely 
than first environment except the value is larger (the delta is 
<n>MAX_UNLONG)
      - If do speed control. the window should have been think about
                      speed            worse time
                      100M/s          40s
                      1000M/s       4s
      - the time should be acceptable.
>>
>> Wei.
>

^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  0:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381882426.2045.85.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 15, 2013 at 05:13:46PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 09:02 +0900, Simon Horman wrote:
> > In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> > ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> > such that it creates and uses a neigh entry if none is found.
> > Subsequently the 'n' field was removed from struct rt6_info.
> > 
> > Unfortunately my analysis is that in the case of IPVS direct routing this
> > change leads to incorrect behaviour as in this case packets may be output
> > to a destination other than where they would be output according to the
> > route table. In particular, the destination address may actually be a local
> > address and empirically a neighbour lookup seems to result in it becoming
> > unreachable.
> > 
> > This patch resolves the problem by providing the destination address
> > determined by IPVS to ip6_finish_output2() in the skb callback.  Although
> > this seems to work I can see several problems with this approach:
> > 
> > * It is rather ugly, stuffing an IPVS exception right in
> >   the middle of IPv6 code. The overhead could be eliminated for many users
> >   by using a staic key. But none the less it is not attractive.
> > 
> > * The use of the skb callback is may not be valid
> >   as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
> >   alternative is to add a new field to struct sk_buff.
> 
> Please no ;)

I thought of you when I wrote that comment :)

> > * This covers all IPv6 packets output by IPVS but actually
> >   only those output using IPVS Direct-Routing need this.  One way to
> >   resolve this would be to add a more fine-grained ipvs_property to
> >   struct sk_buff.
> > 
> > Reported-by: Mark Brooks <mark@loadbalancer.org>
> > Signed-off-by: Simon Horman <horms@verge.net.au>
> > ---
> >  include/net/ip_vs.h             | 6 ++++++
> >  net/ipv6/ip6_output.c           | 9 +++++++--
> >  net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
> >  3 files changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> > index 1c2e1b9..11d90a6 100644
> > --- a/include/net/ip_vs.h
> > +++ b/include/net/ip_vs.h
> > @@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
> >  		atomic_read(&dest->inactconns);
> >  }
> >  
> > +struct ipvs_skb_cb {
> > +	struct in6_addr *daddr;
> > +};
> 
> So we pass a reference.
>
> > +
> > +#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
> > +
> >  #endif	/* _NET_IP_VS_H */
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index a54c45c..a340180 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -52,6 +52,7 @@
> >  #include <net/addrconf.h>
> >  #include <net/rawv6.h>
> >  #include <net/icmp.h>
> > +#include <net/ip_vs.h>
> >  #include <net/xfrm.h>
> >  #include <net/checksum.h>
> >  #include <linux/mroute6.h>
> > @@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
> >  	struct dst_entry *dst = skb_dst(skb);
> >  	struct net_device *dev = dst->dev;
> >  	struct neighbour *neigh;
> > -	struct in6_addr *nexthop;
> > +	struct in6_addr *nexthop, *daddr;
> >  	int ret;
> >  
> >  	skb->protocol = htons(ETH_P_IPV6);
> > @@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
> >  	}
> >  
> >  	rcu_read_lock_bh();
> > -	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
> > +	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
> > +		daddr = IP_VS_SKB_CB(skb)->daddr;
> 
> What guarantee do we have daddr points to valid memory (not already
> freed/reused) ?

I can change that to passing a value if there is a risk
the reference could become invalid. To be honest I am more
worried that skb->cb might be clobbered entirely.

> I guess things like NFQUEUE could happen ?

Could you expand a little?

^ permalink raw reply

* Re: [PATCH] sh_eth: add/use RMCR.RNC bit
From: Simon Horman @ 2013-10-16  0:34 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, nobuhiro.iwamatsu.yj, linux-sh, davem
In-Reply-To: <201310160229.58735.sergei.shtylyov@cogentembedded.com>

On Wed, Oct 16, 2013 at 02:29:58AM +0400, Sergei Shtylyov wrote:
> Declare 'enum EMCR_BIT' containing the single member for the RMCR.RNC bit and
> replace bare numbers in the driver by  this mnemonic.
> 
> Suggested-by: David Miller <davem@davemloft.net>
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Thanks Sergei,

this seems to move things in the right direction.

Reviewed-by: Simon Horman <horms+renesas@verge.net.au>

> ---
> This patch is against DaveM's 'net.git' repo but it intended for 'net-next.git'
> repo -- it's  because 'net-next.git' doesn't contain the required Simon Horman's
> patch yet.
> 
>  drivers/net/ethernet/renesas/sh_eth.c |    6 +++---
>  drivers/net/ethernet/renesas/sh_eth.h |    3 +++
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> Index: net/drivers/net/ethernet/renesas/sh_eth.c
> ===================================================================
> --- net.orig/drivers/net/ethernet/renesas/sh_eth.c
> +++ net/drivers/net/ethernet/renesas/sh_eth.c
> @@ -483,7 +483,7 @@ static struct sh_eth_cpu_data sh7757_dat
>  	.register_type	= SH_ETH_REG_FAST_SH4,
>  
>  	.eesipr_value	= DMAC_M_RFRMER | DMAC_M_ECI | 0x003fffff,
> -	.rmcr_value	= 0x00000001,
> +	.rmcr_value	= RMCR_RNC,
>  
>  	.tx_check	= EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
>  	.eesr_err_check	= EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
> @@ -561,7 +561,7 @@ static struct sh_eth_cpu_data sh7757_dat
>  			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
>  			  EESR_TDE | EESR_ECI,
>  	.fdr_value	= 0x0000072f,
> -	.rmcr_value	= 0x00000001,
> +	.rmcr_value	= RMCR_RNC,
>  
>  	.irq_flags	= IRQF_SHARED,
>  	.apr		= 1,
> @@ -689,7 +689,7 @@ static struct sh_eth_cpu_data r8a7740_da
>  			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
>  			  EESR_TDE | EESR_ECI,
>  	.fdr_value	= 0x0000070f,
> -	.rmcr_value	= 0x00000001,
> +	.rmcr_value	= RMCR_RNC,
>  
>  	.apr		= 1,
>  	.mpr		= 1,
> Index: net/drivers/net/ethernet/renesas/sh_eth.h
> ===================================================================
> --- net.orig/drivers/net/ethernet/renesas/sh_eth.h
> +++ net/drivers/net/ethernet/renesas/sh_eth.h
> @@ -321,6 +321,9 @@ enum TD_STS_BIT {
>  #define TD_TFP	(TD_TFP1|TD_TFP0)
>  
>  /* RMCR */
> +enum RMCR_BIT {
> +	RMCR_RNC = 0x00000001,
> +};
>  #define DEFAULT_RMCR_VALUE	0x00000000
>  
>  /* ECMR */
> 

^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Eric Dumazet @ 2013-10-16  0:39 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <20131016002807.GM22321@verge.net.au>

On Wed, 2013-10-16 at 09:28 +0900, Simon Horman wrote:

> > I guess things like NFQUEUE could happen ?
> 
> Could you expand a little?

This was to point that between IPVS and ipv6 stack we might have a
delay, and daddr was maybe pointed to a freed memory.

IP6CB only uses 24 bytes, so I think you would be safe adding 16 bytes.




^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox