Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] ipvs: Remove unused variable "cs" from ip_vs_leave function.
From: Simon Horman @ 2011-10-19 12:08 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Hans Schillström, Krzysztof Wilczynski, Patrick McHardy,
	netdev
In-Reply-To: <20111019111112.GB10008@1984>

On Wed, Oct 19, 2011 at 01:11:12PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Oct 19, 2011 at 07:49:57PM +0900, Simon Horman wrote:
> > Yes, it seems that Pablo has picked up the change but
> > it hasn't propagated to Dave's tree for one reason or another.
> 
> I was waiting for david to take bugfixes for his net tree. This
> happened yesterday. Now I'll send updates for net-next.
> 
> I expect to make it along today, please, be patient, they'll show up
> in david's net-next tree soon :-)

Hi Pablo,

thanks for the clarification. I can be patient :-)

^ permalink raw reply

* Re: [PATCH 0/3] net: time stamping fixes
From: Richard Cochran @ 2011-10-19 11:50 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David Miller, netdev, eric.dumazet
In-Reply-To: <1319001336.4424.8.camel@jlt3.sipsolutions.net>

On Wed, Oct 19, 2011 at 07:15:36AM +0200, Johannes Berg wrote:
> The only thing I'm not completely sure about is whether or not it is
> permissible to sock_hold() at that point. I'm probably just missing
> something, but: if sk_free() was called before hard_start_xmit() which
> will call skb_clone_tx_timestamp(), can we really call sock_hold()?
> 
> The reason I ask is that sock_wfree() doesn't check sk_refcnt, so if it
> is possible for sk_free() to have been called before hard_start_xmit(),
> maybe because the packet was stuck on the qdisc for a while, the socket
> won't be released (sk_free checks sk_wmem_alloc) but the sk_wfree() when
> the original skb is freed will actually free the socket, invalidating
> the clone's sk pointer *even though* we called sock_hold() right after
> making the clone.
> 
> So what guarantees that sk_refcnt is still non-zero when we make the
> clone?

In the non-qdisc path, the kernel is in a send() call, so the initial
reference taken in socket() is held.

I really don't know the qdisc code, whether it is somehow holding the
skb->sk indirectly or not.

Eric? David?

> Alternatively, should sock_wfree() check sk_refcnt?

That would presumably spoil the performance enhancement gained in
commit 2b85a34e.

^ permalink raw reply

* Re: [PATCH] iproute2: Conforming to -D_FORTIFY_SOURCE=2 restrictions
From: Eric Dumazet @ 2011-10-19 11:30 UTC (permalink / raw)
  To: Bin Li; +Cc: Stephen Hemminger, netdev
In-Reply-To: <CAGBH1r66sTJUiq=EaqgsW-3nKzhhJoQ1Bwd9EC6AmsEd-E8SRQ@mail.gmail.com>

Le mercredi 19 octobre 2011 à 17:15 +0800, Bin Li a écrit :
> Stephen,
> 
>  You can reproduce this issue in 2.6.37 like below. And the previous
> gdb log is after the install the debuginfo package in SUSE.
> 
> # ip -6 xfrm state add src 3ffe:501:ffff:ff03:21a:64ff:fe12:e4c1 dst
> 3ffe:501:ffff:ff05:200:ff:fe00:c1c1 proto ah spi 0x1000 mode transport
> auth md5 "TAHITEST89ABCDEF"
> 
> *** buffer overflow detected ***: ip terminated
> ======= Backtrace: =========
> /lib/libc.so.6(__fortify_fail+0x40)[0xb76d0070]
> /lib/libc.so.6(+0xe8e27)[0xb76cde27]
> /lib/libc.so.6(+0xe8317)[0xb76cd317]
> ip[0x806d6c4]
> ip(do_xfrm_state+0x120)[0x806dc70]
> ip(do_xfrm+0x81)[0x806ad51]
> ip[0x804c355]
> ip(main+0x476)[0x804caa6]
> /lib/libc.so.6(__libc_start_main+0xfe)[0xb75fbc2e]
> ip[0x804c261]
> ======= Memory map: ========
> 08048000-08087000 r-xp 00000000 08:01 4465       /sbin/ip
> 08087000-08088000 r--p 0003e000 08:01 4465       /sbin/ip
> 08088000-0808a000 rw-p 0003f000 08:01 4465       /sbin/ip
> 0808a000-080ad000 rw-p 00000000 00:00 0          [heap]
> b75c6000-b75e2000 r-xp 00000000 08:01 131084     /lib/libgcc_s.so.1
> b75e2000-b75e3000 r--p 0001b000 08:01 131084     /lib/libgcc_s.so.1
> b75e3000-b75e4000 rw-p 0001c000 08:01 131084     /lib/libgcc_s.so.1
> b75e4000-b75e5000 rw-p 00000000 00:00 0
> b75e5000-b774b000 r-xp 00000000 08:01 131375     /lib/libc-2.11.3.so
> b774b000-b774c000 ---p 00166000 08:01 131375     /lib/libc-2.11.3.so
> b774c000-b774e000 r--p 00166000 08:01 131375     /lib/libc-2.11.3.so
> b774e000-b774f000 rw-p 00168000 08:01 131375     /lib/libc-2.11.3.so
> b774f000-b7752000 rw-p 00000000 00:00 0
> b7752000-b7755000 r-xp 00000000 08:01 131428     /lib/libdl-2.11.3.so
> b7755000-b7756000 r--p 00002000 08:01 131428     /lib/libdl-2.11.3.so
> b7756000-b7757000 rw-p 00003000 08:01 131428     /lib/libdl-2.11.3.so
> b7774000-b7775000 rw-p 00000000 00:00 0
> b7775000-b7794000 r-xp 00000000 08:01 154467     /lib/ld-2.11.3.so
> b7794000-b7795000 r--p 0001e000 08:01 154467     /lib/ld-2.11.3.so
> b7795000-b7796000 rw-p 0001f000 08:01 154467     /lib/ld-2.11.3.so
> bfa02000-bfa23000 rw-p 00000000 00:00 0          [stack]
> ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
> Aborted
> 
> And If without -D_FORTIFY_SOURCE=2 in gcc, it works fine, so It's a
> bug in iproute2 which is not conforming to -D_FORTIFY_SOURCE=2
> restrictions.
> 

FORTIFY assumes we cant copy a string on alg.u.alg.alg_key !

This completely precludes 0-sized arrays

struct xfrm_algo {
        char            alg_name[64];
        unsigned int    alg_key_len;    /* in bits */
        char            alg_key[0];
};

struct {
      union {
          struct xfrm_algo alg;
          struct xfrm_algo_aead aead;
          struct xfrm_algo_auth auth;
      } u;
      char buf[XFRM_ALGO_KEY_BUF_SIZE];
} alg = {};

I would say its a FORTIFY bug. This kind of construct is perfectly
valid.

^ permalink raw reply

* Re: [PATCH] ipvs: Remove unused variable "cs" from ip_vs_leave function.
From: Pablo Neira Ayuso @ 2011-10-19 11:11 UTC (permalink / raw)
  To: Simon Horman
  Cc: Hans Schillström, Krzysztof Wilczynski, Patrick McHardy,
	netdev
In-Reply-To: <20111019104945.GA17995@verge.net.au>

On Wed, Oct 19, 2011 at 07:49:57PM +0900, Simon Horman wrote:
> Yes, it seems that Pablo has picked up the change but
> it hasn't propagated to Dave's tree for one reason or another.

I was waiting for david to take bugfixes for his net tree. This
happened yesterday. Now I'll send updates for net-next.

I expect to make it along today, please, be patient, they'll show up
in david's net-next tree soon :-)

^ permalink raw reply

* Re: [PATCH] ipvs: Remove unused variable "cs" from ip_vs_leave function.
From: Pablo Neira Ayuso @ 2011-10-19 11:07 UTC (permalink / raw)
  To: Hans Schillström
  Cc: Krzysztof Wilczynski, Simon Horman, Patrick McHardy,
	netdev@vger.kernel.org
In-Reply-To: <C8A6796DE7C66C4ABCBC18106CB6C1CC106D903185@ESESSCMS0356.eemea.ericsson.se>

On Wed, Oct 19, 2011 at 12:37:51PM +0200, Hans Schillström wrote:
> >commit 833656973cde30ba067a994c4802ebab6c9557ee
> >Author: Simon Horman <horms@verge.net.au>
> >Date:   Fri Sep 16 14:11:49 2011 +0900
> >
> >    ipvs: Remove unused return value of protocol state transitions
> >
> >    Acked-by: Julian Anastasov <ja@ssi.bg>
> >    Acked-by Hans Schillstrom <hans@schillstrom.com>
> >    Signed-off-by: Simon Horman <horms@verge.net.au>
> >    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> >
> 
> From what I can see Simon made a pull request and it was Acked by Pablo 28 Sep
> http://www.spinics.net/lists/netfilter-devel/msg19742.html
> 
> I have not seen any output to Miller yet...
> but I might be wrong in that case 

This patch is in my nf-next tree:

http://1984.lsi.us.es/git/?p=net-next/.git;a=shortlog;h=refs/heads/nf-next

David just pulled from my nf tree for bugfix yesterday. I expect to
send him nf-next updates by today.

^ permalink raw reply

* Re: [PATCH] ipvs: Remove unused variable "cs" from ip_vs_leave function.
From: Simon Horman @ 2011-10-19 10:51 UTC (permalink / raw)
  To: Krzysztof Wilczynski; +Cc: Patrick McHardy, netdev
In-Reply-To: <1318967989-14076-1-git-send-email-krzysztof.wilczynski@linux.com>

On Tue, Oct 18, 2011 at 08:59:49PM +0100, Krzysztof Wilczynski wrote:
> This is to address the following warning during compilation time:
> 
>   net/netfilter/ipvs/ip_vs_core.c: In function ‘ip_vs_leave’:
>   net/netfilter/ipvs/ip_vs_core.c:532: warning: unused variable ‘cs’
> 
> This variable is indeed no longer in use.

Thanks, applied to ipvs-next,
currently living at git://github.com/horms/ipvs-next.git

^ permalink raw reply

* Re: [PATCH] ipvs: Remove unused variable "cs" from ip_vs_leave function.
From: Simon Horman @ 2011-10-19 10:49 UTC (permalink / raw)
  To: Hans Schillström
  Cc: Krzysztof Wilczynski, Patrick McHardy, Pablo Neira Ayuso, netdev
In-Reply-To: <C8A6796DE7C66C4ABCBC18106CB6C1CC106D903185@ESESSCMS0356.eemea.ericsson.se>

On Wed, Oct 19, 2011 at 12:37:51PM +0200, Hans Schillström wrote:
> 
> >Hello,
> >
> >[...]
> >> In my code there is another fix needed as well (todays net-next)
> >> i.e.  usage od cs  "cs = ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd)"
> >[...]
> >
> >Well spotted in terms of "net-next" branch and/or repository.
> >
> >In my case, I was working against "ipvs-next" using Simon's tree on
> >github (I have made the fork when kernel.org facilities were not
> >available at the time): https://github.com/horms/ipvs-next, and the
> >code line you still appear to have in Dave's "net-next" (if that is
> >the one) was amended in the following commit made against "ipvs-next":
> 
> Sorry, I didn't use that repo...
> (waiting for ipvs return to kernel.org)

It may take a little while for me to get into the new
kernel.org web of trust which needs to happen before
ipvs moves back to kernel.org.
> >
> >commit 833656973cde30ba067a994c4802ebab6c9557ee
> >Author: Simon Horman <horms@verge.net.au>
> >Date:   Fri Sep 16 14:11:49 2011 +0900
> >
> >    ipvs: Remove unused return value of protocol state transitions
> >
> >    Acked-by: Julian Anastasov <ja@ssi.bg>
> >    Acked-by Hans Schillstrom <hans@schillstrom.com>
> >    Signed-off-by: Simon Horman <horms@verge.net.au>
> >    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> >
> 
> >From what I can see Simon made a pull request and it was Acked by Pablo 28 Sep
> http://www.spinics.net/lists/netfilter-devel/msg19742.html
> 
> I have not seen any output to Miller yet...
> but I might be wrong in that case 

Yes, it seems that Pablo has picked up the change but
it hasn't propagated to Dave's tree for one reason or another.

> >It wasn't there when I went about cleaning unused declaration of "cs".
> >I guess we have to make sure that everything between github (and
> >repositories there) and kernel.org repositories like Dave's "net-next"
> >align and holds water. Just by looking at some files, I can see that
> >"net-next" is missing some commits.
> >
> >KW
> 
> Regards
> Hans

^ permalink raw reply

* RE: [PATCH] ipvs: Remove unused variable "cs" from ip_vs_leave function.
From: Hans Schillström @ 2011-10-19 10:37 UTC (permalink / raw)
  To: Krzysztof Wilczynski
  Cc: Simon Horman, Patrick McHardy, netdev@vger.kernel.org,
	pablo@netfilter.org
In-Reply-To: <CAC52Y8ZvgBVC4_TEBFMALa-xU84PseV6w6sJO64ch+BdAqD2aA@mail.gmail.com>


>Hello,
>
>[...]
>> In my code there is another fix needed as well (todays net-next)
>> i.e.  usage od cs  "cs = ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd)"
>[...]
>
>Well spotted in terms of "net-next" branch and/or repository.
>
>In my case, I was working against "ipvs-next" using Simon's tree on
>github (I have made the fork when kernel.org facilities were not
>available at the time): https://github.com/horms/ipvs-next, and the
>code line you still appear to have in Dave's "net-next" (if that is
>the one) was amended in the following commit made against "ipvs-next":

Sorry, I didn't use that repo...
(waiting for ipvs return to kernel.org)

>
>
>commit 833656973cde30ba067a994c4802ebab6c9557ee
>Author: Simon Horman <horms@verge.net.au>
>Date:   Fri Sep 16 14:11:49 2011 +0900
>
>    ipvs: Remove unused return value of protocol state transitions
>
>    Acked-by: Julian Anastasov <ja@ssi.bg>
>    Acked-by Hans Schillstrom <hans@schillstrom.com>
>    Signed-off-by: Simon Horman <horms@verge.net.au>
>    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>

>From what I can see Simon made a pull request and it was Acked by Pablo 28 Sep
http://www.spinics.net/lists/netfilter-devel/msg19742.html

I have not seen any output to Miller yet...
but I might be wrong in that case 


>
>It wasn't there when I went about cleaning unused declaration of "cs".
>I guess we have to make sure that everything between github (and
>repositories there) and kernel.org repositories like Dave's "net-next"
>align and holds water. Just by looking at some files, I can see that
>"net-next" is missing some commits.
>
>KW

Regards
Hans

^ permalink raw reply

* Re: [PATCH] ipvs: Remove unused variable "cs" from ip_vs_leave function.
From: Krzysztof Wilczynski @ 2011-10-19 10:13 UTC (permalink / raw)
  To: Hans Schillström
  Cc: Simon Horman, Patrick McHardy, netdev@vger.kernel.org
In-Reply-To: <C8A6796DE7C66C4ABCBC18106CB6C1CC106D903183@ESESSCMS0356.eemea.ericsson.se>

Hello,

[...]
> In my code there is another fix needed as well (todays net-next)
> i.e.  usage od cs  "cs = ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd)"
[...]

Well spotted in terms of "net-next" branch and/or repository.

In my case, I was working against "ipvs-next" using Simon's tree on
github (I have made the fork when kernel.org facilities were not
available at the time): https://github.com/horms/ipvs-next, and the
code line you still appear to have in Dave's "net-next" (if that is
the one) was amended in the following commit made against "ipvs-next":


commit 833656973cde30ba067a994c4802ebab6c9557ee
Author: Simon Horman <horms@verge.net.au>
Date:   Fri Sep 16 14:11:49 2011 +0900

    ipvs: Remove unused return value of protocol state transitions

    Acked-by: Julian Anastasov <ja@ssi.bg>
    Acked-by Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


It wasn't there when I went about cleaning unused declaration of "cs".
I guess we have to make sure that everything between github (and
repositories there) and kernel.org repositories like Dave's "net-next"
align and holds water. Just by looking at some files, I can see that
"net-next" is missing some commits.

KW

^ permalink raw reply

* Re: [patch] pktgen: bug when calling ndelay in x86 architectures
From: Eric Dumazet @ 2011-10-19 10:13 UTC (permalink / raw)
  To: Daniel Turull
  Cc: Ben Hutchings, David Miller, netdev, Robert Olsson,
	Voravit Tanyingyong, Jens Laas
In-Reply-To: <4E9E9963.7090209@gmail.com>

Le mercredi 19 octobre 2011 à 11:33 +0200, Daniel Turull a écrit :
> Hi,
> then if we want to use the spin more often.
> maybe we can increase the constant from 100000 (0.1ms) to 1000000 (1ms)?
> How was the current value chosen?
> 

Based on user needs ;)

> I did some measurements of the inter-arrival time between packets
> and with bigger values the maximal is reduced in the rates between
> 2kpps and 20kpps.
> 

ndelay()/udelay() have some inaccuracies, for 'long' values, because of
rounding errors.

If we spin, just call ktime_now() in a loop until spin_until is
reached...

That way you get max possible resolution, given kernel time service
constraints.

Untested patch :

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 38d6577..5c7e900 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2145,9 +2145,11 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
 	}
 
 	start_time = ktime_now();
-	if (remaining < 100000)
-		ndelay(remaining);	/* really small just spin */
-	else {
+	if (remaining < 100000) {
+		do {
+			end_time = ktime_now();
+		} while (ktime_lt(end_time, spin_until));
+	} else {
 		/* see do_nanosleep */
 		hrtimer_init_sleeper(&t, current);
 		do {
@@ -2162,8 +2164,8 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
 			hrtimer_cancel(&t.timer);
 		} while (t.task && pkt_dev->running && !signal_pending(current));
 		__set_current_state(TASK_RUNNING);
+		end_time = ktime_now();
 	}
-	end_time = ktime_now();
 
 	pkt_dev->idle_acc += ktime_to_ns(ktime_sub(end_time, start_time));
 	pkt_dev->next_tx = ktime_add_ns(spin_until, pkt_dev->delay);

^ permalink raw reply related

* RE: [PATCH] ipvs: Remove unused variable "cs" from ip_vs_leave function.
From: Hans Schillström @ 2011-10-19  9:23 UTC (permalink / raw)
  To: Krzysztof Wilczynski, Simon Horman
  Cc: Patrick McHardy, netdev@vger.kernel.org
In-Reply-To: <1318967989-14076-1-git-send-email-krzysztof.wilczynski@linux.com>

Hello
>This is to address the following warning during compilation time:
>
>  net/netfilter/ipvs/ip_vs_core.c: In function ‘ip_vs_leave’:
>  net/netfilter/ipvs/ip_vs_core.c:532: warning: unused variable ‘cs’
>
>This variable is indeed no longer in use.
>
>Signed-off-by: Krzysztof Wilczynski <krzysztof.wilczynski@linux.com>

In my code there is another fix needed as well (todays net-next)
i.e.  usage od cs  "cs = ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd)"


diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 4f77bb1..a4ef9c6 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -530,7 +530,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
           a cache_bypass connection entry */
        ipvs = net_ipvs(net);
        if (ipvs->sysctl_cache_bypass && svc->fwmark && unicast) {
-               int ret, cs;
+               int ret;
                struct ip_vs_conn *cp;
                unsigned int flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
                                      iph.protocol == IPPROTO_UDP)?
@@ -557,7 +557,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
                ip_vs_in_stats(cp, skb);
 
                /* set state */
-               cs = ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd);
+               ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd);
 
                /* transmit the first SYN packet */
                ret = cp->packet_xmit(skb, cp, pd->pp);

>---
> net/netfilter/ipvs/ip_vs_core.c |    2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
>index 00ea1ad..4f7d89d 100644
>--- a/net/netfilter/ipvs/ip_vs_core.c
>+++ b/net/netfilter/ipvs/ip_vs_core.c
>@@ -529,7 +529,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
>           a cache_bypass connection entry */
>        ipvs = net_ipvs(net);
>        if (ipvs->sysctl_cache_bypass && svc->fwmark && unicast) {
>-               int ret, cs;
>+               int ret;
>                struct ip_vs_conn *cp;
>                unsigned int flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
>                                      iph.protocol == IPPROTO_UDP)?
>--
>1.7.7

Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply related

* [PATCH] smsc911x: Enable flow control advertisement
From: Kazunori Kobayashi @ 2011-10-19  9:21 UTC (permalink / raw)
  To: netdev; +Cc: steve.glendinning, dhobsong, Kazunori Kobayashi

Enable the advertisement of both symmetric pause and asymmetric pause 
flow control capability. 

Signed-off-by: Kazunori Kobayashi <kkobayas@igel.co.jp>
---
 drivers/net/smsc911x.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/smsc911x.c b/drivers/net/smsc911x.c
index b9016a3..60c3bc2 100644
--- a/drivers/net/smsc911x.c
+++ b/drivers/net/smsc911x.c
@@ -800,8 +800,13 @@ static void smsc911x_phy_update_flowcontrol(struct smsc911x_data *pdata)
 	struct phy_device *phy_dev = pdata->phy_dev;
 	u32 afc = smsc911x_reg_read(pdata, AFC_CFG);
 	u32 flow;
+	u16 miiadv = smsc911x_mii_read(phy_dev->bus, phy_dev->addr,
+				       MII_ADVERTISE);
 	unsigned long flags;
 
+	miiadv |= (ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM);
+	smsc911x_mii_write(phy_dev->bus, phy_dev->addr, MII_ADVERTISE, miiadv);
+
 	if (phy_dev->duplex == DUPLEX_FULL) {
 		u16 lcladv = phy_read(phy_dev, MII_ADVERTISE);
 		u16 rmtadv = phy_read(phy_dev, MII_LPA);
-- 
1.7.0.4

^ permalink raw reply related

* Re: Problem with ixgbe and TX locked on one cpu
From: Paweł Staszewski @ 2011-10-19  9:21 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Linux Network Development list, e1000-devel
In-Reply-To: <CAEuXFEy89F_=J_PJMSbQ1fDcOEkkpz2vucq3LcToi7rTcZHcDA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5739 bytes --]

W dniu 2011-10-18 20:57, Jesse Brandeburg pisze:
> CC: e1000-devel
>
> 2011/10/14 Paweł Staszewski<pstaszewski@itcare.pl>:
>> Hello
>>
>> I have weird problem with ixgbe and irq affinity / rx-tx queue assignment
> what application are you running, how are you using ixgbe? looks like
> a router.  is something changing the skb->rx_queue entry (like
> netfilter) or is there a layered device above ixgbe (bonding or ...) ?
Yes it is a router. - Quagga - BGP

There is few (10rules) iptables rules in INPUT - and nothing more

There are also namespaces running - and traffic is processed in namespaces.
no bonding

> why do your interrupts move after a long period? did you do it by
> hand? we recommend disabling irqbalance and hand tuning interrupts
> possibly with the set_irq_affinity.sh script.
Yes it was changed by hand to check why TX is processed on one interrupt.
Also now it is impossible to move this TX traffic to other interrupt.

All configuration for affinity is made by hand - no irqbalance running.

>> Statistics for my ethernet - ixgbe driver:
>> ethtool -S eth4
>> NIC statistics:
>>      rx_packets: 5815535848808
>>      tx_packets: 5811202378421
>>      rx_bytes: 4791001750842200
>>      tx_bytes: 4781190419358301
>>      rx_pkts_nic: 5815535848827
>>      tx_pkts_nic: 5811202378510
>>      rx_bytes_nic: 4837563124411799
>>      tx_bytes_nic: 4829987507084013
>>      lsc_int: 8
>>      tx_busy: 0
>>      non_eop_descs: 0
>>      rx_errors: 0
>>      tx_errors: 0
>>      rx_dropped: 0
>>      tx_dropped: 0
>>      multicast: 92494273
>>      broadcast: 268718206
>>      rx_no_buffer_count: 28829
>>      collisions: 0
>>      rx_over_errors: 0
>>      rx_crc_errors: 0
>>      rx_frame_errors: 0
>>      hw_rsc_aggregated: 0
>>      hw_rsc_flushed: 0
>>      fdir_match: 0
>>      fdir_miss: 0
>>      rx_fifo_errors: 0
>>      rx_missed_errors: 307051074
>>      tx_aborted_errors: 0
>>      tx_carrier_errors: 0
>>      tx_fifo_errors: 0
>>      tx_heartbeat_errors: 0
>>      tx_timeout_count: 0
>>      tx_restart_queue: 15926219
>>      rx_long_length_errors: 298
>>      rx_short_length_errors: 0
>>      tx_flow_control_xon: 0
>>      rx_flow_control_xon: 0
>>      tx_flow_control_xoff: 0
>>      rx_flow_control_xoff: 0
>>      rx_csum_offload_errors: 54173917
>>      alloc_rx_page_failed: 0
>>      alloc_rx_buff_failed: 0
>>      rx_no_dma_resources: 0
>>      tx_queue_0_packets: 68694825
>>      tx_queue_0_bytes: 9443750332
>>      tx_queue_1_packets: 8410961
>>      tx_queue_1_bytes: 2527763233
>>      tx_queue_2_packets: 14411252
>>      tx_queue_2_bytes: 1317132394
>>      tx_queue_3_packets: 15013508147
>>      tx_queue_3_bytes: 17364767277348
>>      tx_queue_4_packets: 62779891
>>      tx_queue_4_bytes: 63476596221
>>      tx_queue_5_packets: 11176001
>>      tx_queue_5_bytes: 2763600253
>>      tx_queue_6_packets: 4416357
>>      tx_queue_6_bytes: 611874984
>>      tx_queue_7_packets: 8933405
>>      tx_queue_7_bytes: 1837198524
>>      tx_queue_8_packets: 13292669
>>      tx_queue_8_bytes: 3241333510
>>      tx_queue_9_packets: 10747236
>>      tx_queue_9_bytes: 1805109931
>>      tx_queue_10_packets: 5795935258380
>>      tx_queue_10_bytes: 4763725304722245
>>      tx_queue_11_packets: 12073934
>>      tx_queue_11_bytes: 2982743045
>>      tx_queue_12_packets: 10523764
>>      tx_queue_12_bytes: 2637451199
>>      tx_queue_13_packets: 12480552
>>      tx_queue_13_bytes: 2434827407
>>      tx_queue_14_packets: 7401777
>>      tx_queue_14_bytes: 2413618099
>>      tx_queue_15_packets: 8269270
>>      tx_queue_15_bytes: 2854359576
>>      rx_queue_0_packets: 361373769507
>>      rx_queue_0_bytes: 298565751248279
>>      rx_queue_1_packets: 369901571908
>>      rx_queue_1_bytes: 303414679798160
>>      rx_queue_2_packets: 362508961738
>>      rx_queue_2_bytes: 299852439447157
>>      rx_queue_3_packets: 363449272013
>>      rx_queue_3_bytes: 299738390792515
>>      rx_queue_4_packets: 361876234461
>>      rx_queue_4_bytes: 297483366939732
>>      rx_queue_5_packets: 361402926316
>>      rx_queue_5_bytes: 297633876486533
>>      rx_queue_6_packets: 362261522767
>>      rx_queue_6_bytes: 298026696344647
>>      rx_queue_7_packets: 361248593301
>>      rx_queue_7_bytes: 296756459279986
>>      rx_queue_8_packets: 361654143416
>>      rx_queue_8_bytes: 298272433659520
>>      rx_queue_9_packets: 362781764710
>>      rx_queue_9_bytes: 298804803191595
>>      rx_queue_10_packets: 361386593064
>>      rx_queue_10_bytes: 297434987797644
>>      rx_queue_11_packets: 369886597895
>>      rx_queue_11_bytes: 302353350171712
>>      rx_queue_12_packets: 361582732276
>>      rx_queue_12_bytes: 298670408005971
>>      rx_queue_13_packets: 365248093536
>>      rx_queue_13_bytes: 302573023878287
>>      rx_queue_14_packets: 366571142073
>>      rx_queue_14_bytes: 302396739276514
>>      rx_queue_15_packets: 362401929830
>>      rx_queue_15_bytes: 299024344526029
>>
>> The problem is with queue 10
>>      tx_queue_10_packets: 5795935258380
>>      tx_queue_10_bytes: 4763725304722245
>>
>> as you can see most of the queue processing is used in queue 10
>> Average difference is 1,854271229903958e-6  - compared to other queues
>>
>> and the problem is that almost all TX packet processing is on one CPU
>> cat /proc/interrupts - in attached file
>>
>> Is this driver or kernel problem ?
>>
>> Kernel is: 2.6.38.2
>>
>> ixgbe driver is:
>> ethtool -i eth4
>> driver: ixgbe
>> version: 3.2.9-k2
>> firmware-version: 1.12-2
>> bus-info: 0000:04:00.0
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


-- 


[-- Attachment #2: pstaszewski.vcf --]
[-- Type: text/x-vcard, Size: 336 bytes --]

begin:vcard
fn;quoted-printable:Pawe=C5=82 Staszewski
n;quoted-printable:Staszewski;Pawe=C5=82
org:ITCare
adr;quoted-printable;quoted-printable;dom:;;Sikorskiego 22;Libi=C4=85=C5=BC;Ma=C5=82opolskie;32-590
title:IT Manager
tel;work:+48 32 7203681
tel;fax:+48 32 7203682
tel;cell:+48 0 609911040
url:www.itcare.pl
version:2.1
end:vcard


^ permalink raw reply

* Re: Problem with ixgbe and TX locked on one cpu
From: Paweł Staszewski @ 2011-10-19  9:21 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: e1000-devel, Linux Network Development list
In-Reply-To: <CAEuXFEy89F_=J_PJMSbQ1fDcOEkkpz2vucq3LcToi7rTcZHcDA@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 5738 bytes --]

W dniu 2011-10-18 20:57, Jesse Brandeburg pisze:
> CC: e1000-devel
>
> 2011/10/14 Paweł Staszewski<pstaszewski@itcare.pl>:
>> Hello
>>
>> I have weird problem with ixgbe and irq affinity / rx-tx queue assignment
> what application are you running, how are you using ixgbe? looks like
> a router.  is something changing the skb->rx_queue entry (like
> netfilter) or is there a layered device above ixgbe (bonding or ...) ?
Yes it is a router. - Quagga - BGP

There is few (10rules) iptables rules in INPUT - and nothing more

There are also namespaces running - and traffic is processed in namespaces.
no bonding

> why do your interrupts move after a long period? did you do it by
> hand? we recommend disabling irqbalance and hand tuning interrupts
> possibly with the set_irq_affinity.sh script.
Yes it was changed by hand to check why TX is processed on one interrupt.
Also now it is impossible to move this TX traffic to other interrupt.

All configuration for affinity is made by hand - no irqbalance running.

>> Statistics for my ethernet - ixgbe driver:
>> ethtool -S eth4
>> NIC statistics:
>>      rx_packets: 5815535848808
>>      tx_packets: 5811202378421
>>      rx_bytes: 4791001750842200
>>      tx_bytes: 4781190419358301
>>      rx_pkts_nic: 5815535848827
>>      tx_pkts_nic: 5811202378510
>>      rx_bytes_nic: 4837563124411799
>>      tx_bytes_nic: 4829987507084013
>>      lsc_int: 8
>>      tx_busy: 0
>>      non_eop_descs: 0
>>      rx_errors: 0
>>      tx_errors: 0
>>      rx_dropped: 0
>>      tx_dropped: 0
>>      multicast: 92494273
>>      broadcast: 268718206
>>      rx_no_buffer_count: 28829
>>      collisions: 0
>>      rx_over_errors: 0
>>      rx_crc_errors: 0
>>      rx_frame_errors: 0
>>      hw_rsc_aggregated: 0
>>      hw_rsc_flushed: 0
>>      fdir_match: 0
>>      fdir_miss: 0
>>      rx_fifo_errors: 0
>>      rx_missed_errors: 307051074
>>      tx_aborted_errors: 0
>>      tx_carrier_errors: 0
>>      tx_fifo_errors: 0
>>      tx_heartbeat_errors: 0
>>      tx_timeout_count: 0
>>      tx_restart_queue: 15926219
>>      rx_long_length_errors: 298
>>      rx_short_length_errors: 0
>>      tx_flow_control_xon: 0
>>      rx_flow_control_xon: 0
>>      tx_flow_control_xoff: 0
>>      rx_flow_control_xoff: 0
>>      rx_csum_offload_errors: 54173917
>>      alloc_rx_page_failed: 0
>>      alloc_rx_buff_failed: 0
>>      rx_no_dma_resources: 0
>>      tx_queue_0_packets: 68694825
>>      tx_queue_0_bytes: 9443750332
>>      tx_queue_1_packets: 8410961
>>      tx_queue_1_bytes: 2527763233
>>      tx_queue_2_packets: 14411252
>>      tx_queue_2_bytes: 1317132394
>>      tx_queue_3_packets: 15013508147
>>      tx_queue_3_bytes: 17364767277348
>>      tx_queue_4_packets: 62779891
>>      tx_queue_4_bytes: 63476596221
>>      tx_queue_5_packets: 11176001
>>      tx_queue_5_bytes: 2763600253
>>      tx_queue_6_packets: 4416357
>>      tx_queue_6_bytes: 611874984
>>      tx_queue_7_packets: 8933405
>>      tx_queue_7_bytes: 1837198524
>>      tx_queue_8_packets: 13292669
>>      tx_queue_8_bytes: 3241333510
>>      tx_queue_9_packets: 10747236
>>      tx_queue_9_bytes: 1805109931
>>      tx_queue_10_packets: 5795935258380
>>      tx_queue_10_bytes: 4763725304722245
>>      tx_queue_11_packets: 12073934
>>      tx_queue_11_bytes: 2982743045
>>      tx_queue_12_packets: 10523764
>>      tx_queue_12_bytes: 2637451199
>>      tx_queue_13_packets: 12480552
>>      tx_queue_13_bytes: 2434827407
>>      tx_queue_14_packets: 7401777
>>      tx_queue_14_bytes: 2413618099
>>      tx_queue_15_packets: 8269270
>>      tx_queue_15_bytes: 2854359576
>>      rx_queue_0_packets: 361373769507
>>      rx_queue_0_bytes: 298565751248279
>>      rx_queue_1_packets: 369901571908
>>      rx_queue_1_bytes: 303414679798160
>>      rx_queue_2_packets: 362508961738
>>      rx_queue_2_bytes: 299852439447157
>>      rx_queue_3_packets: 363449272013
>>      rx_queue_3_bytes: 299738390792515
>>      rx_queue_4_packets: 361876234461
>>      rx_queue_4_bytes: 297483366939732
>>      rx_queue_5_packets: 361402926316
>>      rx_queue_5_bytes: 297633876486533
>>      rx_queue_6_packets: 362261522767
>>      rx_queue_6_bytes: 298026696344647
>>      rx_queue_7_packets: 361248593301
>>      rx_queue_7_bytes: 296756459279986
>>      rx_queue_8_packets: 361654143416
>>      rx_queue_8_bytes: 298272433659520
>>      rx_queue_9_packets: 362781764710
>>      rx_queue_9_bytes: 298804803191595
>>      rx_queue_10_packets: 361386593064
>>      rx_queue_10_bytes: 297434987797644
>>      rx_queue_11_packets: 369886597895
>>      rx_queue_11_bytes: 302353350171712
>>      rx_queue_12_packets: 361582732276
>>      rx_queue_12_bytes: 298670408005971
>>      rx_queue_13_packets: 365248093536
>>      rx_queue_13_bytes: 302573023878287
>>      rx_queue_14_packets: 366571142073
>>      rx_queue_14_bytes: 302396739276514
>>      rx_queue_15_packets: 362401929830
>>      rx_queue_15_bytes: 299024344526029
>>
>> The problem is with queue 10
>>      tx_queue_10_packets: 5795935258380
>>      tx_queue_10_bytes: 4763725304722245
>>
>> as you can see most of the queue processing is used in queue 10
>> Average difference is 1,854271229903958e-6  - compared to other queues
>>
>> and the problem is that almost all TX packet processing is on one CPU
>> cat /proc/interrupts - in attached file
>>
>> Is this driver or kernel problem ?
>>
>> Kernel is: 2.6.38.2
>>
>> ixgbe driver is:
>> ethtool -i eth4
>> driver: ixgbe
>> version: 3.2.9-k2
>> firmware-version: 1.12-2
>> bus-info: 0000:04:00.0
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


-- 

[-- Attachment #2: Type: text/plain, Size: 377 bytes --]

------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure contains a
definitive record of customers, application performance, security
threats, fraudulent activity and more. Splunk takes this data and makes
sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2d-oct

[-- Attachment #3: Type: text/plain, Size: 257 bytes --]

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* Re: [PATCH] iproute2: Conforming to -D_FORTIFY_SOURCE=2 restrictions
From: Bin Li @ 2011-10-19  9:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20111017082307.46a994a8@nehalam.linuxnetplumber.net>

Stephen,

 You can reproduce this issue in 2.6.37 like below. And the previous
gdb log is after the install the debuginfo package in SUSE.

# ip -6 xfrm state add src 3ffe:501:ffff:ff03:21a:64ff:fe12:e4c1 dst
3ffe:501:ffff:ff05:200:ff:fe00:c1c1 proto ah spi 0x1000 mode transport
auth md5 "TAHITEST89ABCDEF"

*** buffer overflow detected ***: ip terminated
======= Backtrace: =========
/lib/libc.so.6(__fortify_fail+0x40)[0xb76d0070]
/lib/libc.so.6(+0xe8e27)[0xb76cde27]
/lib/libc.so.6(+0xe8317)[0xb76cd317]
ip[0x806d6c4]
ip(do_xfrm_state+0x120)[0x806dc70]
ip(do_xfrm+0x81)[0x806ad51]
ip[0x804c355]
ip(main+0x476)[0x804caa6]
/lib/libc.so.6(__libc_start_main+0xfe)[0xb75fbc2e]
ip[0x804c261]
======= Memory map: ========
08048000-08087000 r-xp 00000000 08:01 4465       /sbin/ip
08087000-08088000 r--p 0003e000 08:01 4465       /sbin/ip
08088000-0808a000 rw-p 0003f000 08:01 4465       /sbin/ip
0808a000-080ad000 rw-p 00000000 00:00 0          [heap]
b75c6000-b75e2000 r-xp 00000000 08:01 131084     /lib/libgcc_s.so.1
b75e2000-b75e3000 r--p 0001b000 08:01 131084     /lib/libgcc_s.so.1
b75e3000-b75e4000 rw-p 0001c000 08:01 131084     /lib/libgcc_s.so.1
b75e4000-b75e5000 rw-p 00000000 00:00 0
b75e5000-b774b000 r-xp 00000000 08:01 131375     /lib/libc-2.11.3.so
b774b000-b774c000 ---p 00166000 08:01 131375     /lib/libc-2.11.3.so
b774c000-b774e000 r--p 00166000 08:01 131375     /lib/libc-2.11.3.so
b774e000-b774f000 rw-p 00168000 08:01 131375     /lib/libc-2.11.3.so
b774f000-b7752000 rw-p 00000000 00:00 0
b7752000-b7755000 r-xp 00000000 08:01 131428     /lib/libdl-2.11.3.so
b7755000-b7756000 r--p 00002000 08:01 131428     /lib/libdl-2.11.3.so
b7756000-b7757000 rw-p 00003000 08:01 131428     /lib/libdl-2.11.3.so
b7774000-b7775000 rw-p 00000000 00:00 0
b7775000-b7794000 r-xp 00000000 08:01 154467     /lib/ld-2.11.3.so
b7794000-b7795000 r--p 0001e000 08:01 154467     /lib/ld-2.11.3.so
b7795000-b7796000 rw-p 0001f000 08:01 154467     /lib/ld-2.11.3.so
bfa02000-bfa23000 rw-p 00000000 00:00 0          [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
Aborted

And If without -D_FORTIFY_SOURCE=2 in gcc, it works fine, so It's a
bug in iproute2 which is not conforming to -D_FORTIFY_SOURCE=2
restrictions.

Thanks!

On Mon, Oct 17, 2011 at 11:23 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Mon, 17 Oct 2011 15:35:35 +0800
> Bin Li <libin.charles@gmail.com> wrote:
>
>> (gdb) l
>> 161                     len = slen;
>> 162                     if (len > 0) {
>> 163                             if (len > max)
>> 164                                     invarg("\"ALGOKEY\" makes buffer
>> overflow\n", key);
>> 165
>> 166                             strncpy(buf, key, len);
>> 167                     }
>> 168             }
>> 169
>> 170             alg->alg_key_len = len * 8;
>> (gdb) up
>> #8  xfrm_state_modify (cmd=<optimized out>, flags=<optimized out>, argc=1,
>>     argv=0x7fffffffe370) at xfrm_state.c:406
>> 406                                     xfrm_algo_parse((void *)&alg, type,
>> name, key,
>>
>> the compiler passes zero to __builtin___strncpy_chk as the buffer size.
>> xfrm_algo_parse is inlined into xfrm_state_modify.
>
> I don't understand, looks like a compiler bug. Call strncpy with
> 0 length should not be possible since the check was  3 lines
> before for len > 0.
>

^ permalink raw reply

* Re: [PATCH] Fix guest memory leak and panic
From: Eric Dumazet @ 2011-10-19  9:07 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Krishna Kumar, rusty@rustcorp.com.au, mst@redhat.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, davem@davemloft.net
In-Reply-To: <1319014511.3385.72.camel@zakaz.uk.xensource.com>

Le mercredi 19 octobre 2011 à 09:55 +0100, Ian Campbell a écrit :
> On Tue, 2011-10-18 at 14:20 +0100, Eric Dumazet wrote:
> > But I suggest using get_page(pkt_dev->page), this seems more obvious to
> > me [ This was how I wrote the thing ;) ]
> 
> I guess it depends on whether you consider the reference to be on the
> page or to be on the frag (which contains the page). The distinction
> would only matter if pktgen were to transition to using the forthcoming
> destructors though.
> 

Yes, but get_page() is now a clear sign we dirty a cache line, and I am
trying to explain to driver authors this is a contention point they
should address to get best performance from sub-page frags devices.

The hidden __skb_frag_ref() in skb_frag_set_page() was misleading
them ;)

> Here's a version like you describe:
> 
> 8<---------------------------------------------------------------
> 
> From 369022220db31efb9c261cbabcb890a4d216a176 Mon Sep 17 00:00:00 2001
> From: Ian Campbell <ian.campbell@citrix.com>
> Date: Tue, 18 Oct 2011 09:59:37 +0100
> Subject: [PATCH] net: do not take an additional reference in skb_frag_set_page
> 
> I audited all of the callers in the tree and only one of them (pktgen) expects
> it to do so. Taking this reference is pretty obviously confusing and error
> prone.
> 
> In particular I looked at the following commits which switched callers of
> (__)skb_frag_set_page to the skb paged fragment api:
> 
> 6a930b9f163d7e6d9ef692e05616c4ede65038ec cxgb3: convert to SKB paged frag API.
> 5dc3e196ea21e833128d51eb5b788a070fea1f28 myri10ge: convert to SKB paged frag API.
> 0e0634d20dd670a89af19af2a686a6cce943ac14 vmxnet3: convert to SKB paged frag API.
> 86ee8130a46769f73f8f423f99dbf782a09f9233 virtionet: convert to SKB paged frag API.
> 4a22c4c919c201c2a7f4ee09e672435a3072d875 sfc: convert to SKB paged frag API.
> 18324d690d6a5028e3c174fc1921447aedead2b8 cassini: convert to SKB paged frag API.
> b061b39e3ae18ad75466258cf2116e18fa5bbd80 benet: convert to SKB paged frag API.
> b7b6a688d217936459ff5cf1087b2361db952509 bnx2: convert to SKB paged frag API.
> 804cf14ea5ceca46554d5801e2817bba8116b7e5 net: xfrm: convert to SKB frag APIs
> ea2ab69379a941c6f8884e290fdd28c93936a778 net: convert core to skb paged frag APIs
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  include/linux/skbuff.h |    1 -
>  net/core/pktgen.c      |    1 +
>  2 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 64f8695..78741da 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1765,7 +1765,6 @@ static inline void *skb_frag_address_safe(const skb_frag_t *frag)
>  static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page)
>  {
>  	frag->page = page;
> -	__skb_frag_ref(frag);
>  }
>  
>  /**
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 796044a..e81f5d0 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -2602,6 +2602,7 @@ static void pktgen_finalize_skb(struct pktgen_dev *pkt_dev, struct sk_buff *skb,
>  				if (!pkt_dev->page)
>  					break;
>  			}
> +			get_page(pkt_dev->page);
>  			skb_frag_set_page(skb, i, pkt_dev->page);
>  			skb_shinfo(skb)->frags[i].page_offset = 0;
>  			/*last fragment, fill rest of data*/

Thanks !

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

^ permalink raw reply

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
From: Gao feng @ 2011-10-19  9:07 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev
In-Reply-To: <20111017121805.GR1830@secunet.com>

2011.10.17 20:18, Steffen Klassert wrote:
> On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote:
>> On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
>>>
>>> Please find out exactly why dst->obsolete is non-zero on a freshly
>>> looked up route.  It's unexpected.
>>
>> Hm, on a slow path route lookup e.g. __mkroute_output() calls
>> rt_dst_alloc() which initializes dst->obsolete to -1.
> 
> Just a follow up:
> git commit d11a4dc18 (ipv4: check rt_genid in dst_check)
> changed the initialization value of dst->obsolete from
> 0 to -1.
> --

Anybody give any suggestion?
Actually,we can't suppose the dst->obsolete is non-zero.
maybe we should use both dst->obsolete and rt->rt_peer_genid to decide whether to do dst_check or not?

^ permalink raw reply

* Re: [PATCH] route:ip_rt_frag_needed always return unzero
From: Gao feng @ 2011-10-19  8:59 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Eric Dumazet, davem, kuznet, jmorris, netdev
In-Reply-To: <20111019083832.GT1830@secunet.com>

2011.10.19 16:38, Steffen Klassert wrote:
> On Wed, Oct 19, 2011 at 04:07:46PM +0800, Gao feng wrote:
>> 2011.10.19 15:26, Steffen Klassert wrote:
>>
>>>
>>> It is valid in the sense that we should not provide the user
>>> with a mtu information if we know that the value we got from
>>> the icmp packet ist bogus. But perhaps we can think about
>>> making the check for a valid mtu unconditionally and let
>>> ip_rt_frag_needed return a valid mtu in any case.
>>>
>>>
>>
>> I think we should return the pmtu in icmp packet to the raw socket,
>> and the valid mtu to tcp_v4_err or something else.
>>
> 
> Why you want to handle raw sockets different here?
> 

Return the pmtu in icmp packet to the raw socket,
let user space program to decide what to do.
BUT this is not important.

^ permalink raw reply

* Re: [PATCH] Fix guest memory leak and panic
From: Ian Campbell @ 2011-10-19  8:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Krishna Kumar, rusty@rustcorp.com.au, mst@redhat.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, davem@davemloft.net
In-Reply-To: <1318944002.2657.76.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, 2011-10-18 at 14:20 +0100, Eric Dumazet wrote:
> But I suggest using get_page(pkt_dev->page), this seems more obvious to
> me [ This was how I wrote the thing ;) ]

I guess it depends on whether you consider the reference to be on the
page or to be on the frag (which contains the page). The distinction
would only matter if pktgen were to transition to using the forthcoming
destructors though.

Here's a version like you describe:

8<---------------------------------------------------------------

>From 369022220db31efb9c261cbabcb890a4d216a176 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Tue, 18 Oct 2011 09:59:37 +0100
Subject: [PATCH] net: do not take an additional reference in skb_frag_set_page

I audited all of the callers in the tree and only one of them (pktgen) expects
it to do so. Taking this reference is pretty obviously confusing and error
prone.

In particular I looked at the following commits which switched callers of
(__)skb_frag_set_page to the skb paged fragment api:

6a930b9f163d7e6d9ef692e05616c4ede65038ec cxgb3: convert to SKB paged frag API.
5dc3e196ea21e833128d51eb5b788a070fea1f28 myri10ge: convert to SKB paged frag API.
0e0634d20dd670a89af19af2a686a6cce943ac14 vmxnet3: convert to SKB paged frag API.
86ee8130a46769f73f8f423f99dbf782a09f9233 virtionet: convert to SKB paged frag API.
4a22c4c919c201c2a7f4ee09e672435a3072d875 sfc: convert to SKB paged frag API.
18324d690d6a5028e3c174fc1921447aedead2b8 cassini: convert to SKB paged frag API.
b061b39e3ae18ad75466258cf2116e18fa5bbd80 benet: convert to SKB paged frag API.
b7b6a688d217936459ff5cf1087b2361db952509 bnx2: convert to SKB paged frag API.
804cf14ea5ceca46554d5801e2817bba8116b7e5 net: xfrm: convert to SKB frag APIs
ea2ab69379a941c6f8884e290fdd28c93936a778 net: convert core to skb paged frag APIs

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 include/linux/skbuff.h |    1 -
 net/core/pktgen.c      |    1 +
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 64f8695..78741da 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1765,7 +1765,6 @@ static inline void *skb_frag_address_safe(const skb_frag_t *frag)
 static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page)
 {
 	frag->page = page;
-	__skb_frag_ref(frag);
 }
 
 /**
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 796044a..e81f5d0 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2602,6 +2602,7 @@ static void pktgen_finalize_skb(struct pktgen_dev *pkt_dev, struct sk_buff *skb,
 				if (!pkt_dev->page)
 					break;
 			}
+			get_page(pkt_dev->page);
 			skb_frag_set_page(skb, i, pkt_dev->page);
 			skb_shinfo(skb)->frags[i].page_offset = 0;
 			/*last fragment, fill rest of data*/
-- 
1.7.2.5

^ permalink raw reply related

* Re: kenel level packet capturing
From: Ajith Adapa @ 2011-10-19  8:52 UTC (permalink / raw)
  To: David Miller; +Cc: raviraj.j1991, netdev, netfilter-devel, netfilter
In-Reply-To: <20111019.031314.1383303852454510720.davem@davemloft.net>

Yeah you are right David. But ..

For a newbie who just know more about linux userspace level need to
know starting steps like where to start from and flow of packet might
be confusing because of the level of abstraction being used. I guess
it might take some time until you are really serious about making
hands dirty :)

Regards,
Ajith



On Wed, Oct 19, 2011 at 12:43 PM, David Miller <davem@davemloft.net> wrote:
>
> From: raviraj joshi <raviraj.j1991@gmail.com>
> Date: Wed, 19 Oct 2011 12:24:01 +0530
>
> > We are finding it difficult to understand kenel networking code.
>
> That's surprising, because the Linux networking stack is the cleanest
> real world networking implementation on the planet.
>
> Perhaps you should think about re-evaluating your project choice?
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next] igb: fix a compile warning
From: roy.qing.li @ 2011-10-19  8:52 UTC (permalink / raw)
  To: netdev

From: RongQing Li <roy.qing.li@gmail.com>

control these three function declarations and
definitions with same macro CONFIG_PCI_IOV

drivers/net/ethernet/intel/igb/igb_main.c:165:
warning: ‘igb_vf_configure’ declared ‘static’ but never defined
drivers/net/ethernet/intel/igb/igb_main.c:166:
warning: ‘igb_find_enabled_vfs’ declared ‘static’ but never defined
drivers/net/ethernet/intel/igb/igb_main.c:167:
warning: ‘igb_check_vf_assignment’ declared ‘static’ but never defined

Signed-off-by: RongQing Li <roy.qing.li@gmail.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 837adbb..5d16cdc 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -162,9 +162,12 @@ static int igb_ndo_set_vf_bw(struct net_device *netdev, int vf, int tx_rate);
 static int igb_ndo_get_vf_config(struct net_device *netdev, int vf,
 				 struct ifla_vf_info *ivi);
 static void igb_check_vf_rate_limit(struct igb_adapter *);
+
+#ifdef CONFIG_PCI_IOV
 static int igb_vf_configure(struct igb_adapter *adapter, int vf);
 static int igb_find_enabled_vfs(struct igb_adapter *adapter);
 static int igb_check_vf_assignment(struct igb_adapter *adapter);
+#endif
 
 #ifdef CONFIG_PM
 static int igb_suspend(struct pci_dev *, pm_message_t);
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] route:ip_rt_frag_needed always return unzero
From: Steffen Klassert @ 2011-10-19  8:38 UTC (permalink / raw)
  To: Gao feng; +Cc: Eric Dumazet, davem, kuznet, jmorris, netdev
In-Reply-To: <4E9E8552.1050405@cn.fujitsu.com>

On Wed, Oct 19, 2011 at 04:07:46PM +0800, Gao feng wrote:
> 2011.10.19 15:26, Steffen Klassert wrote:
> 
> > 
> > It is valid in the sense that we should not provide the user
> > with a mtu information if we know that the value we got from
> > the icmp packet ist bogus. But perhaps we can think about
> > making the check for a valid mtu unconditionally and let
> > ip_rt_frag_needed return a valid mtu in any case.
> > 
> > 
> 
> I think we should return the pmtu in icmp packet to the raw socket,
> and the valid mtu to tcp_v4_err or something else.
> 

Why you want to handle raw sockets different here?

^ permalink raw reply

* [PATCH net -v2] [BUGFIX] bonding: use flush_delayed_work_sync in bond_close
From: Mitsuo Hayasaka @ 2011-10-19  8:17 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek
  Cc: netdev, linux-kernel, yrl.pp-manager.tt, Mitsuo Hayasaka,
	Jay Vosburgh, Andy Gospodarek, WANG Cong

The bond_close() calls cancel_delayed_work() to cancel delayed works.
It, however, cannot cancel works that were already queued in workqueue.
The bond_open() initializes work->data, and proccess_one_work() refers
get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when
work->data has been initialized. Thus, a panic occurs.

This patch uses flush_delayed_work_sync() instead of cancel_delayed_work()
in bond_close(). It cancels delayed timer and waits for work to finish
execution. So, it can avoid the null pointer dereference due to the
parallel executions of proccess_one_work() and initializing proccess
of bond_open().

Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
---

 drivers/net/bonding/bond_main.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index de3d351..a4353f9 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3504,27 +3504,27 @@ static int bond_close(struct net_device *bond_dev)
 	write_unlock_bh(&bond->lock);
 
 	if (bond->params.miimon) {  /* link check interval, in milliseconds. */
-		cancel_delayed_work(&bond->mii_work);
+		flush_delayed_work_sync(&bond->mii_work);
 	}
 
 	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
-		cancel_delayed_work(&bond->arp_work);
+		flush_delayed_work_sync(&bond->arp_work);
 	}
 
 	switch (bond->params.mode) {
 	case BOND_MODE_8023AD:
-		cancel_delayed_work(&bond->ad_work);
+		flush_delayed_work_sync(&bond->ad_work);
 		break;
 	case BOND_MODE_TLB:
 	case BOND_MODE_ALB:
-		cancel_delayed_work(&bond->alb_work);
+		flush_delayed_work_sync(&bond->alb_work);
 		break;
 	default:
 		break;
 	}
 
 	if (delayed_work_pending(&bond->mcast_work))
-		cancel_delayed_work(&bond->mcast_work);
+		flush_delayed_work_sync(&bond->mcast_work);
 
 	if (bond_is_lb(bond)) {
 		/* Must be called only after all

^ permalink raw reply related

* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops
From: Herbert Xu @ 2011-10-19  8:08 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras
In-Reply-To: <20111019.033052.851679720607111734.davem@davemloft.net>

On Wed, Oct 19, 2011 at 03:30:52AM -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 19 Oct 2011 09:18:33 +0200
> 
> > Seems fine (Maybe do the +15 in caller site ?), but we also have other
> > problematic cases, using alloc_skb() only...
> 
> Ok, which ones operate over these problematic GRE tunnels?

Potentially all of them since we now support Ethernet-over-GRE.

I think Eric's initial patch is probably the safest bet for rc10.
We can then work on the proper fix for the next release.

As to the latter, I've just done a grep over net and it seems that
all users of LL_ALLOCATED_SPACE fall into two cases, alloc_skb users
or sock_alloc_send_skb (including pskb) users.

For alloc_skb we could add a new helper.  While for the other
case we could either create a new helper or just add an extra
dev argument that may be NULL for those that don't care about
LL_ALLOCATED_SPACE.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] route:ip_rt_frag_needed always return unzero
From: Gao feng @ 2011-10-19  8:07 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Eric Dumazet, davem, kuznet, jmorris, netdev
In-Reply-To: <20111019072628.GS1830@secunet.com>

2011.10.19 15:26, Steffen Klassert wrote:

> 
> On udp and raw sockets, the user is responsible to adjust the packet
> size according to the mtu value he may find in the socket's error queue.
> So we shoud provide the user with this information, even in the unlikely
> case where we could not create an inet_peer.
> 

Agree.Maybe we should modify mtu immediately when create inet_peer failed

> 
> It is valid in the sense that we should not provide the user
> with a mtu information if we know that the value we got from
> the icmp packet ist bogus. But perhaps we can think about
> making the check for a valid mtu unconditionally and let
> ip_rt_frag_needed return a valid mtu in any case.
> 
> 

I think we should return the pmtu in icmp packet to the raw socket,
and the valid mtu to tcp_v4_err or something else.

thanks Steffen.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox