Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] ipt_REJECT: can't work with bridges
From: Patrick McHardy @ 2010-07-16 12:16 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, netfilter-devel, netdev
In-Reply-To: <1279251163-2887-1-git-send-email-xiaosuo@gmail.com>

Am 16.07.2010 05:32, schrieb Changli Gao:
> ipt_REJECT: can't work with bridges
> 
> ip_route_me_harder can't create the route cache when the outdev is the same with
> the indev for the skbs whichout a valid protocol set.
> 
> __mkroute_input functions has this check:
> 1998         if (skb->protocol != htons(ETH_P_IP)) {
> 1999                 /* Not IP (i.e. ARP). Do not create route, if it is
> 2000                  * invalid for proxy arp. DNAT routes are always valid.
> 2001                  *
> 2002                  * Proxy arp feature have been extended to allow, ARP
> 2003                  * replies back to the same interface, to support
> 2004                  * Private VLAN switch technologies. See arp.c.
> 2005                  */
> 2006                 if (out_dev == in_dev &&
> 2007                     IN_DEV_PROXY_ARP_PVLAN(in_dev) == 0) {
> 2008                         err = -EINVAL;
> 2009                         goto cleanup;
> 2010                 }
> 2011         }
> 
> This patch gives the new skb a valid protocol to bypass this check. In order to
> make ipt_REJECT work with bridges, you also need to enable ip_forward.

This actually also fixes a regression, back when we used skb_copy_expand
of the old skb, the protocol was properly set, which not only affects
bridges, but also TC classification and packet sockets.

> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  net/ipv4/netfilter/ipt_REJECT.c |    1 +
>  1 file changed, 1 insertion(+)
> diff --git a/net/ipv4/netfilter/ipt_REJECT.c b/net/ipv4/netfilter/ipt_REJECT.c
> index bbbd273..e0bca39 100644
> --- a/net/ipv4/netfilter/ipt_REJECT.c
> +++ b/net/ipv4/netfilter/ipt_REJECT.c
> @@ -111,6 +111,7 @@ static void send_reset(struct sk_buff *oldskb, int hook)
>  	/* ip_route_me_harder expects skb->dst to be set */
>  	skb_dst_set_noref(nskb, skb_dst(oldskb));
>  
> +	nskb->protocol = __constant_htons(ETH_P_IP);

You don't need __constant_htons here however, the compiler is smart
enough to this itself. Please resend and add a note to the changelog
stating that this fixes a regression and I'll toss it into
nf-2.6.git.

Thanks!

>  	if (ip_route_me_harder(nskb, addr_type))
>  		goto free_nskb;
>  
> 


^ permalink raw reply

* Re: [PATCHv4] netfilter: add CHECKSUM target
From: Patrick McHardy @ 2010-07-16 12:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Daniel P. Berrange, Jes Sorensen, David S. Miller, Jan Engelhardt,
	Randy Dunlap, netfilter-devel, netfilter, coreteam, linux-kernel,
	netdev
In-Reply-To: <20100715174216.GA16216@redhat.com>

Am 15.07.2010 19:42, schrieb Michael S. Tsirkin:
> netfilter: correct CHECKSUM header and export it
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Applied, thanks.

^ permalink raw reply

* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15
From: Ilpo Järvinen @ 2010-07-16 12:02 UTC (permalink / raw)
  To: Lennart Schulte
  Cc: Eric Dumazet, Tejun Heo, David S. Miller, lkml,
	netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert
In-Reply-To: <4C3F053F.7090704@nets.rwth-aachen.de>

On Thu, 15 Jul 2010, Lennart Schulte wrote:

> Since tcp_xmit_retransmit_queue also gets skb == NULL I'm pretty sure it is
> the same bug.
> Up to now I only experienced the problem with ACK loss (without ACK loss the
> test ran about 30min without problems, with ACK loss it had paniced within
> 10min).
> The data sender only has a HTB queue for traffic shaping (set to 20 Mbit/s).
> The ACK loss is done by another router.
> The setup looks like this. This way it seems to be the most realistic.
> 
> o sender with HTB
> |
> |
> o netem queue for forward path delay
> |
> o netem queue for a queue limit
> |
> o netem queue for backward path delay
> |
> o netem queue for ACK loss
> |
> |
> o receiver with HTB
> 
> Perhaps now it is a little big clearer.

> > > [ 2754.413150] NULL head, pkts 0
> > > [ 2754.413156] Errors caught so far 1

Thanks for reporting the results.

Could you post the oops too or double check do the timestamps really match 
(and there wasn't more "Errors caught" prints in between)? Since this 
condition doesn't seem to crash the kernel as also send_head should be 
NULL, which saves the day here exiting the loop (unless send head would 
too be corrupt). ...However, I don't like too much anyway that we can end 
up into tcp_xmit_retransmit_queue loop with packets_out being zero and 
only send_head check side-effect causes proper action.

Besides, Tejun has also found that it's hint->next ptr which is NULL in 
his case so this won't solve his case anyway. Tejun, can you confirm 
whether it was retransmit_skb_hint->next being NULL on _entry time_ to 
tcp_xmit_retransmit_queue() or later on in the loop after the updates done 
by the loop itself to the hint (or that your testing didn't conclude 
either)?

-- 
 i.

^ permalink raw reply

* Re: [PATCH 4/4] phylib: Allow reading and writing a mii bus from atomic context.
From: Richard Cochran @ 2010-07-16 11:49 UTC (permalink / raw)
  To: Andy Fleming; +Cc: netdev
In-Reply-To: <20100716112544.GA2996@riccoc20.at.omicron.at>

On Thu, Jul 08, 2010 at 03:40:41PM -0500, Andy Fleming wrote:
> 
> Wait, is the intent for this mdio read to be done for *every* packet?
> MDIO is spec'ed out to go up to 2.5MHz.  Each transaction takes 64
> cycles.  And I see that reading the timestamp from the PHY you
> submitted support for takes at *least* 2 transactions.  The fastest
> you can process packets would then be under 20,000 packets per second.
> Even on a 100Mb link with full-sized packets, you would be 40% done
> receiving the next packet before you had passed the packet on to the
> stack.  Each MDIO transaction takes 25,600 cycles on a gigahertz
> processor.  It's just too long, IMO, and it looks like this code will
> end up doing up to...10?

Well, it is actually worse than that. From my measurements, 64 cycles
at 2.5 MHz (25.6 usec) is too optimistic. On one Xscale IXP platform,
I get 35 to 40 usec per read. Also, the PHYTER needs six reads for a
Rx and four for a Tx time stamp.

So you are right, it is a very long time to leave interrupts off.

However, not every packet needs a time stamp. The PHY devices that I
know of all are selective. That is, they recognize PTP packets in
hardware and only provide time stamps for those special packets. The
packet rate is not that hight. For example, the "sync" message comes
once every two seconds in PTPv1 and up to ten times per second in
PTPv2.

I have been working on an alternative, which I will post soon. It goes
like this... (comments, please, before I get too far into it!)

* General
   - phy registers itself with net_device { opaque pointer }

   - ts_candidate(skb):
     function to detect likely packets, returns true if
       1. phy_device pointer exists in net_device
       2. data look like PTP, using a BPF

   - work queue:
     1. read out time stamps
     2. match times with queued skbs
     3. deliver skbs with time stamps (or without on timeout)

* Rx path
   - netif_receive_skb:
     if ts_candidate(skb), defer(skb).

* Tx path
   - hook in each MAC driver
   - if ts_candidate(skb), clone and defer(skb).

> I wasn't able to find an example of how you were going to use the
> time-stamp functions you provided.  Could you please go into a little
> more detail about how you intended this to work?

My last PTP series included the PHY stuff, too. There you can see how
it all fits together.

   http://marc.info/?l=linux-netdev&m=127661796627120&w=3

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH 4/4] phylib: Allow reading and writing a mii bus from atomic context.
From: Richard Cochran @ 2010-07-16 11:25 UTC (permalink / raw)
  To: Andy Fleming; +Cc: netdev
In-Reply-To: <20100707071832.GA2816@riccoc20.at.omicron.at>

On Wed, Jul 07, 2010 at 09:18:32AM +0200, Richard Cochran wrote:
> Consider the receive path:
> 
> 1. PTP Packet passed through PHY. PHY recognizes it and stores a time
>    stamp along with some UID from the packet.
> 
> 2. Napi calls the MAC driver's poll function.
> 
> 3. MAC driver acquires packet. At this point, if we want to have a
>    hardware time stamp, we must read it out over the mdio bus, before
>    handing the packet over to the stack via netif_receive_skb().
> 
> If we decide to defer the packet delivery (in step 3), we have to know
> whether the PHY will have a time stamp for this packet. The only way
> to do this is to compare the UIDs, but that requires reading it over
> the mdio bus.

(Forwarding Andy's message to me that was missing a CC to the list)

On Thu, Jul 08, 2010 at 03:40:41PM -0500, Andy Fleming wrote:

Wait, is the intent for this mdio read to be done for *every* packet?
MDIO is spec'ed out to go up to 2.5MHz.  Each transaction takes 64
cycles.  And I see that reading the timestamp from the PHY you
submitted support for takes at *least* 2 transactions.  The fastest
you can process packets would then be under 20,000 packets per second.
Even on a 100Mb link with full-sized packets, you would be 40% done
receiving the next packet before you had passed the packet on to the
stack.  Each MDIO transaction takes 25,600 cycles on a gigahertz
processor.  It's just too long, IMO, and it looks like this code will
end up doing up to...10?

I wasn't able to find an example of how you were going to use the
time-stamp functions you provided.  Could you please go into a little
more detail about how you intended this to work?

^ permalink raw reply

* [PATCH] IPv6: fix CoA check in RH2 input handler (mip6_rthdr_input())
From: Arnaud Ebalard @ 2010-07-16 10:38 UTC (permalink / raw)
  To: David S. Miller, Hideaki YOSHIFUJI; +Cc: netdev

Hi,

The input handler for Type 2 Routing Header (mip6_rthdr_input())
checks if the CoA in the packet matches the CoA in the XFRM state.

Current check is buggy: it compares the adddress in the Type 2
Routing Header, i.e. the HoA, against the expected CoA in the state.
The comparison should be made against the address in the destination
field of the IPv6 header.

The bug remained unnoticed because the main (and possibly only current)
user of the code (UMIP MIPv6 Daemon) initializes the XFRM state with the
unspecified address, i.e. explicitly allows everything.

Yoshifuji-san, can you ack that one?

Cheers,

a+

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
---
 net/ipv6/mip6.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c
index 2794b60..d6e9599 100644
--- a/net/ipv6/mip6.c
+++ b/net/ipv6/mip6.c
@@ -347,11 +347,12 @@ static const struct xfrm_type mip6_destopt_type =

 static int mip6_rthdr_input(struct xfrm_state *x, struct sk_buff *skb)
 {
+	struct ipv6hdr *iph = ipv6_hdr(skb);
 	struct rt2_hdr *rt2 = (struct rt2_hdr *)skb->data;
 	int err = rt2->rt_hdr.nexthdr;

 	spin_lock(&x->lock);
-	if (!ipv6_addr_equal(&rt2->addr, (struct in6_addr *)x->coaddr) &&
+	if (!ipv6_addr_equal(&iph->daddr, (struct in6_addr *)x->coaddr) &&
 	    !ipv6_addr_any((struct in6_addr *)x->coaddr))
 		err = -ENOENT;
 	spin_unlock(&x->lock);
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH 01/11] Removing dead RT2800PCI_SOC
From: Helmut Schaa @ 2010-07-16 10:08 UTC (permalink / raw)
  To: Gertjan van Wingerde
  Cc: Bartlomiej Zolnierkiewicz, Felix Fietkau, John W. Linville,
	Ivo Van Doorn, Christoph Egger, linux-wireless, users, netdev,
	linux-kernel, vamos-dev, Luis Correia
In-Reply-To: <4C4007CD.2070504@gmail.com>

On Fri, Jul 16, 2010 at 9:18 AM, Gertjan van Wingerde
<gwingerde@gmail.com> wrote:
>
> On 07/16/10 08:57, Helmut Schaa wrote:
> > On Thu, Jul 15, 2010 at 10:41 AM, Bartlomiej Zolnierkiewicz <bzolnier@gmail.com <mailto:bzolnier@gmail.com>> wrote:
> >
> >     On Wednesday 14 July 2010 04:44:44 pm Felix Fietkau wrote:
> >     > On 2010-07-14 3:15 PM, John W. Linville wrote:
> >     > > On Wed, Jul 14, 2010 at 02:52:14PM +0200, Ivo Van Doorn wrote:
> >     > >> On Wed, Jul 14, 2010 at 2:46 PM, Luis Correia <luis.f.correia@gmail.com <mailto:luis.f.correia@gmail.com>> wrote:
> >     > >> > On Wed, Jul 14, 2010 at 13:39, Christoph Egger <siccegge@cs.fau.de <mailto:siccegge@cs.fau.de>> wrote:
> >     > >> >> While RT2800PCI_SOC exists in Kconfig, it depends on either
> >     > >> >> RALINK_RT288X or RALINK_RT305X which are both not available in Kconfig
> >     > >> >> so all Code depending on that can't ever be selected and, if there's
> >     > >> >> no plan to add these options, should be cleaned up
> >     > >> >>
> >     > >> >> Signed-off-by: Christoph Egger <siccegge@cs.fau.de <mailto:siccegge@cs.fau.de>>
> >     > >> >
> >     > >> > NAK,
> >     > >> >
> >     > >> > this is not dead code, it is needed for the Ralink System-on-Chip
> >     > >> > Platform devices.
> >     > >> >
> >     > >> > While I can't fix Kconfig errors and the current KConfig file may be
> >     > >> > wrong, this code cannot and will not be deleted.
> >     > >>
> >     > >> When the config option was introduced, the config options RALINK_RT288X and
> >     > >> RALINK_RT305X were supposed to be merged as well soon after by somebody (Felix?)
> >     > >>
> >     > >> But since testing is done on SoC boards by Helmut and Felix, I assume the code
> >     > >> isn't dead but actually in use.
> >     > >
> >     > > Perhaps Helmut and Felix can send us the missing code?
> >     > The missing code is a MIPS platform port, which is currently being
> >     > maintained in OpenWrt, but is not ready for upstream submission yet.
> >     > I'm not working on this code at the moment, but I think it will be
> >     > submitted once it's ready.
> >
> >     People are using automatic scripts to catch unused config options nowadays
> >     so the issue is quite likely to come back again sooner or later..
> >
> >     Would it be possible to improve situation somehow till the missing parts
> >     get merged?  Maybe by adding a tiny comment documenting RT2800PCI_SOC
> >     situation to Kconfig (if the config option itself really cannot be removed)
> >     until all code is ready etc.?
> >
> >
> > Or we could just remove RT2800PCI_SOC completely and build the soc specific
> > parts always as part of rt2800pci. I mean it's not much code, just the platform
> > driver stuff and the eeprom access.
> >
>
> I'm not sure if that is feasible. Sure, we can reduce the usage of the variable by
> unconditionally compiling in the generic SOC code, but we should not unconditionally
> register the SOC platform device, which is currently also under the scope of this
> Kconfig variable.

Ehm, no, the platform device is not registered in rt2800pci at all,
it's just the platform
driver that gets registered there. The platform device will be
registered in the according
board init code (that only resides in openwrt at the moment).

Helmut

^ permalink raw reply

* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
From: Eric Dumazet @ 2010-07-16  9:56 UTC (permalink / raw)
  To: divya
  Cc: LKML, linuxppc-dev, sachinp, benh, netdev, David Miller,
	Jan-Bernd Themann
In-Reply-To: <4C401D56.3070108@linux.vnet.ibm.com>

Le vendredi 16 juillet 2010 à 14:20 +0530, divya a écrit :
> Hi ,
> 
> With the latest kernel version 2.6.35-rc5-git1(2f7989efd4398) running on power(p6) box came across the following
> call trace
> 
> Call Trace:
> [c000000006a0e800] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
> [c000000006a0e8b0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
> [c000000006a0ea30] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
> [c000000006a0ead0] [c00000000015b1a0] .new_slab+0xe0/0x314
> [c000000006a0eb70] [c00000000015b6fc] .__slab_alloc+0x328/0x644
> [c000000006a0ec50] [c00000000015cc34] .__kmalloc_node_track_caller+0x114/0x194
> [c000000006a0ed00] [c000000000599f6c] .__alloc_skb+0x94/0x180
> [c000000006a0edb0] [c00000000059af5c] .__netdev_alloc_skb+0x3c/0x74
> [c000000006a0ee30] [c0000000004f9480] .ehea_refill_rq_def+0xf8/0x2d0
> [c000000006a0ef30] [c0000000004fab8c] .ehea_up+0x5b8/0x69c
> [c000000006a0f040] [c0000000004facd4] .ehea_open+0x64/0x118
> [c000000006a0f0e0] [c0000000005a6e9c] .__dev_open+0x100/0x168
> [c000000006a0f170] [c0000000005a3ac0] .__dev_change_flags+0x10c/0x1ac
> [c000000006a0f210] [c0000000005a6d44] .dev_change_flags+0x24/0x7c
> [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
> [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
> [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
> [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
> [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
> [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
> [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
> [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
> [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
> [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
> [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> CPU    1: hi:    0, btch:   1 usd:   0
> CPU    2: hi:    0, btch:   1 usd:   0
> CPU    3: hi:    0, btch:   1 usd:   0
> active_anon:50 inactive_anon:260 isolated_anon:0
>   active_file:159 inactive_file:139 isolated_file:0
>   unevictable:0 dirty:2 writeback:1 unstable:0
>   free:16 slab_reclaimable:66 slab_unreclaimable:502
>   mapped:120 shmem:2 pagetables:37 bounce:0
> Node 0 DMA free:1024kB min:1408kB low:1728kB high:2112kB active_anon:3200kB inactive_anon:16640kB active_file:10176kB inactive_file:8896kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:130944kB mlocked:0kB dirty:128kB writeback:64kB mapped:7680kB shmem:128kB slab_reclaimable:4224kB slab_unreclaimable:32128kB kernel_stack:2528kB pagetables:2368kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0
> Node 0 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
> 496 total pagecache pages
> 178 pages in swap cache
> Swap cache stats: add 780, delete 602, find 467/551
> Free swap  = 1027904kB
> Total swap = 1044160kB
> 2048 pages RAM
> 683 pages reserved
> 582 pages shared
> 1075 pages non-shared
> SLUB: Unable to allocate memory on node -1 (gfp=0x20)
>    cache: kmalloc-16384, object size: 16384, buffer size: 16384, default order: 2, min order: 0
>    node 0: slabs: 28, objs: 292, free: 0
> ip: page allocation failure. order:0, mode:0x8020
> Call Trace:
> [c000000006a0eb40] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
> [c000000006a0ebf0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
> [c000000006a0ed70] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
> [c000000006a0ee10] [c00000000011fca4] .__get_free_pages+0x18/0x90
> [c000000006a0ee90] [c0000000004f7058] .ehea_get_stats+0x4c/0x1bc
> [c000000006a0ef30] [c0000000005a0a04] .dev_get_stats+0x38/0x64
> [c000000006a0efc0] [c0000000005b456c] .rtnl_fill_ifinfo+0x35c/0x85c
> [c000000006a0f150] [c0000000005b5920] .rtmsg_ifinfo+0x164/0x204
> [c000000006a0f210] [c0000000005a6d6c] .dev_change_flags+0x4c/0x7c
> [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
> [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
> [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
> [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
> [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
> [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
> [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
> [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
> [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
> [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
> [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> CPU    1: hi:    0, btch:   1 usd:   0
> CPU    2: hi:    0, btch:   1 usd:   0
> CPU    3: hi:    0, btch:   1 usd:   0
> 
> The mainline 2.6.35-rc5 worked fine.

Maybe you were lucky with 2.6.35-rc5

Anyway ehea should not use GFP_ATOMIC in its ehea_get_stats() method,
called in process context, but GFP_KERNEL.

Another patch is needed for ehea_refill_rq_def() as well.



[PATCH] ehea: ehea_get_stats() should use GFP_KERNEL

ehea_get_stats() is called in process context and should use GFP_KERNEL
allocation instead of GFP_ATOMIC.

Clearing stats at beginning of ehea_get_stats() is racy in case of
concurrent stat readers.

get_stats() can also use netdev net_device_stats, instead of a private
copy.

Reported-by: divya <dipraksh@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/ehea/ehea.h      |    1 -
 drivers/net/ehea/ehea_main.c |    6 ++----
 2 files changed, 2 insertions(+), 5 deletions(-)

^ permalink raw reply

* Re: Raise initial congestion window size / speedup slow  start?
From: Hagen Paul Pfeifer @ 2010-07-16  9:03 UTC (permalink / raw)
  To: Bill Fink
  Cc: David Miller, rick.jones2, lists, davidsen, linux-kernel, netdev
In-Reply-To: <20100714234917.924f420d.billfink@mindspring.com>

On Wed, 14 Jul 2010 23:49:17 -0400, Bill Fink wrote:

> A long, long time ago, I suggested a Path BW Discovery mechanism

> to the IETF, analogous to the Path MTU Discovery mechanism, but

> it didn't get any traction.  Such information could be extremely

> useful to TCP endpoints, to determine a maximum window size to

> use, to effectively rate limit a much stronger sender from

> overpowering a much weaker receiver (for example 10-GigE -> GigE),

> resulting in abominable performance across large RTT paths

> (as low as 12 Mbps), even in the absence of any real network

> contention.

Much weaker middlebox? The windowing mechanism should be sufficient to

avoid endpoints from over-commiting.

Anyway, your proposed draft (I didn't searched for it) sound like a

mechanism similar to RFC 4782: Quick-Start for TCP and IP.

   This document specifies an optional Quick-Start mechanism for

   transport protocols, in cooperation with routers, to determine an

   allowed sending rate at the start and, at times, in the middle of a

   data transfer (e.g., after an idle period).  While Quick-Start is

   designed to be used by a range of transport protocols, in this

   document we only specify its use with TCP.  Quick-Start is designed

   to allow connections to use higher sending rates when there is

   significant unused bandwidth along the path, and the sender and all

   of the routers along the path approve the Quick-Start Request.

Cheers, Hagen

^ permalink raw reply

* [PATCH v2 1/2] bnx2: allocate with GFP_KERNEL flag on RX path init
From: Stanislaw Gruszka @ 2010-07-16  8:55 UTC (permalink / raw)
  To: netdev; +Cc: Michael Chan
In-Reply-To: <C27F8246C663564A84BB7AB3439772421B842BD26F@IRVEXCHCCR01.corp.ad.broadcom.com>

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
v1->v2: use GFP_ATOMIC in bnx2_rx_skb

 drivers/net/bnx2.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index a203f39..a7df539 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2664,13 +2664,13 @@ bnx2_set_mac_addr(struct bnx2 *bp, u8 *mac_addr, u32 pos)
 }
 
 static inline int
-bnx2_alloc_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
+bnx2_alloc_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp_t gfp)
 {
 	dma_addr_t mapping;
 	struct sw_pg *rx_pg = &rxr->rx_pg_ring[index];
 	struct rx_bd *rxbd =
 		&rxr->rx_pg_desc_ring[RX_RING(index)][RX_IDX(index)];
-	struct page *page = alloc_page(GFP_ATOMIC);
+	struct page *page = alloc_page(gfp);
 
 	if (!page)
 		return -ENOMEM;
@@ -2705,7 +2705,7 @@ bnx2_free_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
 }
 
 static inline int
-bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
+bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp_t gfp)
 {
 	struct sk_buff *skb;
 	struct sw_bd *rx_buf = &rxr->rx_buf_ring[index];
@@ -2713,7 +2713,7 @@ bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
 	struct rx_bd *rxbd = &rxr->rx_desc_ring[RX_RING(index)][RX_IDX(index)];
 	unsigned long align;
 
-	skb = netdev_alloc_skb(bp->dev, bp->rx_buf_size);
+	skb = __netdev_alloc_skb(bp->dev, bp->rx_buf_size, gfp);
 	if (skb == NULL) {
 		return -ENOMEM;
 	}
@@ -2974,7 +2974,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 	int err;
 	u16 prod = ring_idx & 0xffff;
 
-	err = bnx2_alloc_rx_skb(bp, rxr, prod);
+	err = bnx2_alloc_rx_skb(bp, rxr, prod, GFP_ATOMIC);
 	if (unlikely(err)) {
 		bnx2_reuse_rx_skb(bp, rxr, skb, (u16) (ring_idx >> 16), prod);
 		if (hdr_len) {
@@ -3039,7 +3039,8 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 			rx_pg->page = NULL;
 
 			err = bnx2_alloc_rx_page(bp, rxr,
-						 RX_PG_RING_IDX(pg_prod));
+						 RX_PG_RING_IDX(pg_prod),
+						 GFP_ATOMIC);
 			if (unlikely(err)) {
 				rxr->rx_pg_cons = pg_cons;
 				rxr->rx_pg_prod = pg_prod;
@@ -5179,7 +5180,7 @@ bnx2_init_rx_ring(struct bnx2 *bp, int ring_num)
 
 	ring_prod = prod = rxr->rx_pg_prod;
 	for (i = 0; i < bp->rx_pg_ring_size; i++) {
-		if (bnx2_alloc_rx_page(bp, rxr, ring_prod) < 0) {
+		if (bnx2_alloc_rx_page(bp, rxr, ring_prod, GFP_KERNEL) < 0) {
 			netdev_warn(bp->dev, "init'ed rx page ring %d with %d/%d pages only\n",
 				    ring_num, i, bp->rx_pg_ring_size);
 			break;
@@ -5191,7 +5192,7 @@ bnx2_init_rx_ring(struct bnx2 *bp, int ring_num)
 
 	ring_prod = prod = rxr->rx_prod;
 	for (i = 0; i < bp->rx_ring_size; i++) {
-		if (bnx2_alloc_rx_skb(bp, rxr, ring_prod) < 0) {
+		if (bnx2_alloc_rx_skb(bp, rxr, ring_prod, GFP_KERNEL) < 0) {
 			netdev_warn(bp->dev, "init'ed rx ring %d with %d/%d skbs only\n",
 				    ring_num, i, bp->rx_ring_size);
 			break;
-- 
1.7.1


^ permalink raw reply related

* [rfc] ipvs: Add ip_vs_onepacket_enabled()
From: Simon Horman @ 2010-07-16  8:30 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Nick Chalk, Wensong Zhang, Julian Anastasov, lvs-devel, netdev

Add ip_vs_onepacket_enabled() instead of open-coding the logic three times.

Signed-off-by: Simon Horman <horms@verge.net.au>

--- 

Compile tested only.

Extracted from a patch by Thomas Graf <tgraf@infradead.org>.
Thomas, feel free to make this patch from you and add a signed-off-by line.

Index: nf-next-2.6/net/netfilter/ipvs/ip_vs_core.c
===================================================================
--- nf-next-2.6.orig/net/netfilter/ipvs/ip_vs_core.c	2010-07-16 16:44:49.000000000 +0900
+++ nf-next-2.6/net/netfilter/ipvs/ip_vs_core.c	2010-07-16 16:45:20.000000000 +0900
@@ -165,7 +165,6 @@ ip_vs_conn_stats(struct ip_vs_conn *cp,
 	spin_unlock(&ip_vs_stats.lock);
 }
 
-
 static inline int
 ip_vs_set_state(struct ip_vs_conn *cp, int direction,
 		const struct sk_buff *skb,
@@ -176,6 +175,12 @@ ip_vs_set_state(struct ip_vs_conn *cp, i
 	return pp->state_transition(cp, direction, skb, pp);
 }
 
+static inline __u16
+ip_vs_onepacket_enabled(struct ip_vs_service *svc, struct ip_vs_iphdr iph)
+{
+	return (svc->flags & IP_VS_SVC_F_ONEPACKET &&
+		iph->protocol == IPPROTO_UDP) ? IP_VS_CONN_F_ONE_PACKET : 0;
+}
 
 /*
  *  IPVS persistent scheduling function
@@ -194,7 +199,6 @@ ip_vs_sched_persist(struct ip_vs_service
 	struct ip_vs_dest *dest;
 	struct ip_vs_conn *ct;
 	__be16  dport;			/* destination port to forward */
-	__be16  flags;
 	union nf_inet_addr snet;	/* source network of the client,
 					   after masking */
 
@@ -341,10 +345,6 @@ ip_vs_sched_persist(struct ip_vs_service
 		dport = ports[1];
 	}
 
-	flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
-		 && iph.protocol == IPPROTO_UDP)?
-		IP_VS_CONN_F_ONE_PACKET : 0;
-
 	/*
 	 *    Create a new connection according to the template
 	 */
@@ -352,7 +352,7 @@ ip_vs_sched_persist(struct ip_vs_service
 			    &iph.saddr, ports[0],
 			    &iph.daddr, ports[1],
 			    &dest->addr, dport,
-			    flags,
+			    ip_vs_onepacket_enabled(svc, &iph),
 			    dest);
 	if (cp == NULL) {
 		ip_vs_conn_put(ct);
@@ -382,7 +382,7 @@ ip_vs_schedule(struct ip_vs_service *svc
 	struct ip_vs_conn *cp = NULL;
 	struct ip_vs_iphdr iph;
 	struct ip_vs_dest *dest;
-	__be16 _ports[2], *pptr, flags;
+	__be16 _ports[2], *pptr;
 
 	ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
 	pptr = skb_header_pointer(skb, iph.len, sizeof(_ports), _ports);
@@ -412,10 +412,6 @@ ip_vs_schedule(struct ip_vs_service *svc
 		return NULL;
 	}
 
-	flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
-		 && iph.protocol == IPPROTO_UDP)?
-		IP_VS_CONN_F_ONE_PACKET : 0;
-
 	/*
 	 *    Create a connection entry.
 	 */
@@ -423,7 +419,7 @@ ip_vs_schedule(struct ip_vs_service *svc
 			    &iph.saddr, pptr[0],
 			    &iph.daddr, pptr[1],
 			    &dest->addr, dest->port ? dest->port : pptr[1],
-			    flags,
+			    ip_vs_onepacket_enabled(svc, &iph),
 			    dest);
 	if (cp == NULL)
 		return NULL;
@@ -473,9 +469,6 @@ int ip_vs_leave(struct ip_vs_service *sv
 	if (sysctl_ip_vs_cache_bypass && svc->fwmark && unicast) {
 		int ret, cs;
 		struct ip_vs_conn *cp;
-		__u16 flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
-				iph.protocol == IPPROTO_UDP)?
-				IP_VS_CONN_F_ONE_PACKET : 0;
 		union nf_inet_addr daddr =  { .all = { 0, 0, 0, 0 } };
 
 		ip_vs_service_put(svc);
@@ -486,7 +479,8 @@ int ip_vs_leave(struct ip_vs_service *sv
 				    &iph.saddr, pptr[0],
 				    &iph.daddr, pptr[1],
 				    &daddr, 0,
-				    IP_VS_CONN_F_BYPASS | flags,
+				    IP_VS_CONN_F_BYPASS,
+				    ip_vs_onepacket_enabled(svc, &iph),
 				    NULL);
 		if (cp == NULL)
 			return NF_DROP;

^ permalink raw reply

* Re: [PATCH 2.6.35-rc1] net-next: vmxnet3 fixes [4/5] Do not reset when the device is not opened
From: Shreyas Bhatewara @ 2010-07-16  8:17 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	pv-drivers@vmware.com, Ronghua Zhang, Matthieu Bucchianeri
In-Reply-To: <20100715.183210.226762655.davem@davemloft.net>

On Thu, 15 Jul 2010, David Miller wrote:

> From: Shreyas Bhatewara <sbhatewara@vmware.com>
> Date: Thu, 15 Jul 2010 18:20:52 -0700 (PDT)
> 
> > Is this what you suggest :
> > 
> > ---
> > 
> > Hold rtnl_lock to get the right link state.
> 
> It ought to work, but make sure that it is legal to take the
> RTNL semaphore in all contexts in which this code block
> might be called.
> 

This code block is called only from the workqueue handler, which runs in
process context, so it is legal to take rtnl semaphore.
Tested this code by simulating event interrupts (which schedule this 
code) at considerable frequency while the interface was brought up and
down in a loop. Similar stress testing had revealed the bug originally. 

^ permalink raw reply

* sky2 mac hung again (or still?)
From: Helmut Grohne @ 2010-07-16  7:44 UTC (permalink / raw)
  To: Stephen Hemminger, netdev

Dear sky2 maintainers,

thanks for all your work on the continuously improving stability issues
of the sky2 driver. Unfortunately this battle is not yet over. To aid in
getting this fully working I'd like to give another data point.

Kernel: vanilla 2.6.33.2

lspci output:
05:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)
        Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
        Flags: bus master, fast devsel, latency 0, IRQ 36
        Memory at d1000000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at a000 [size=256]
        [virtual] Expansion ROM at 80800000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] MSI: Enable- Count=1/2 Maskable- 64bit+
        Capabilities: [e0] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: sky2

dmesg output after rmmod and modprobe (for a currently very hung card):
sky2 driver version 1.26
sky2 0000:05:00.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
sky2 0000:05:00.0: setting latency timer to 64
sky2 0000:05:00.0: PCI: Disallowing DAC for device
sky2 0000:05:00.0: Yukon-2 EC chip revision 2
sky2 0000:05:00.0: irq 53 for MSI/MSI-X
sky2 0000:05:00.0: No interrupt generated using MSI, switching to INTx mode.
sky2 eth0: addr 00:15:f2:36:00:e9

The problem (again/still) is as usual:
sky2 eth0: hung mac 124:91 fifo 194 (182:176)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
sky2 eth0: enabling interface

(I can get you more of these hung mac messages with slightly different
numbers if need be.)

This kind of hang happens about once a day for me and it kills my pptp vpn
connection, cause the card takes too long to recover.

Today in contrast the card took the opportunity to crash a bit harder,
so I did not get it up again (maybe a reboot can help later). I was
lucky to have an unused rtl8139too in that machine...

It seems like a precondition for the card to hang is a bit network
traffic. I don't remember getting it crashed with a load of less than
10Mbit symmetric (receive/send) while using this kernel version. On the
other I have seen the card survive this workload for 8 hours a few
times. It does not seem to have any problems with a quick and high
receive load (downloading Debian packages from the next 1Gbit mirror,
usually around 200Mbit).

To me this sounds like a race condition.

Did I miss out any details?

Thanks in advance

Helmut

^ permalink raw reply

* Re: [PATCH 2.6.35-rc1] net-next: vmxnet3 fixes [3/5] Initialize link state at probe time
From: Shreyas Bhatewara @ 2010-07-16  7:51 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	pv-drivers@vmware.com
In-Reply-To: <20100715.224435.45920963.davem@davemloft.net>



On Thu, 15 Jul 2010, David Miller wrote:

> From: Shreyas Bhatewara <sbhatewara@vmware.com>
> Date: Thu, 15 Jul 2010 18:20:14 -0700 (PDT)
> 
> > 
> > On Wed, 14 Jul 2010, David Miller wrote:
> > 
> >> From: Shreyas Bhatewara <sbhatewara@vmware.com>
> >> Date: Tue, 13 Jul 2010 17:48:55 -0700 (PDT)
> >> 
> >> > 
> >> > Initialize vmxnet3 link state at probe time
> >> > 
> >> > This change initializes the state of link at the time when driver is
> >> > loaded. The ethtool output for 'link detected' and 'link speed'
> >> > is thus valid even before the interface is brought up.
> >> > 
> >> > Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
> >> 
> >> You should never, ever, call netif_start_queue() on a device which has
> >> not been brought up.
> >> 
> >> But that is what this patch is doing.
> >> 
> > 
> > I do not understand why you say so. vmxnet3_check_link() is called in 
> > probe() with affectTxQueue as false. Hence netif_start_queue() will not be 
> > called before device is brought up.
> > vmxnet3_check_link() is again called with affectTxQueue as true in 
> > vmxnet3_activate_dev() after device was activated.
> 
> Aha, I see how the logic works now.
> 
> But still there is a problem with this patch, please remove the
> driver version bump and resubmit.
> 
> You should only version bump at the last patch in a series.
> 
> Thanks.
> 

---

Initialize vmxnet3 link state at probe time

This change initializes the state of link at the time when driver is
loaded. The ethtool output for 'link detected' and 'link speed'
is thus valid even before the interface is brought up.

Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>

---
 drivers/net/vmxnet3/vmxnet3_drv.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 7792a44..1e31d40 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -130,7 +130,7 @@ vmxnet3_tq_stop(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter)
  * Check the link state. This may start or stop the tx queue.
  */
 static void
-vmxnet3_check_link(struct vmxnet3_adapter *adapter)
+vmxnet3_check_link(struct vmxnet3_adapter *adapter, bool affectTxQueue)
 {
 	u32 ret;
 
@@ -143,14 +143,16 @@ vmxnet3_check_link(struct vmxnet3_adapter *adapter)
 		if (!netif_carrier_ok(adapter->netdev))
 			netif_carrier_on(adapter->netdev);
 
-		vmxnet3_tq_start(&adapter->tx_queue, adapter);
+		if (affectTxQueue)
+			vmxnet3_tq_start(&adapter->tx_queue, adapter);
 	} else {
 		printk(KERN_INFO "%s: NIC Link is Down\n",
 		       adapter->netdev->name);
 		if (netif_carrier_ok(adapter->netdev))
 			netif_carrier_off(adapter->netdev);
 
-		vmxnet3_tq_stop(&adapter->tx_queue, adapter);
+		if (affectTxQueue)
+			vmxnet3_tq_stop(&adapter->tx_queue, adapter);
 	}
 }
 
@@ -165,7 +167,7 @@ vmxnet3_process_events(struct vmxnet3_adapter *adapter)
 
 	/* Check if link state has changed */
 	if (events & VMXNET3_ECR_LINK)
-		vmxnet3_check_link(adapter);
+		vmxnet3_check_link(adapter, true);
 
 	/* Check if there is an error on xmit/recv queues */
 	if (events & (VMXNET3_ECR_TQERR | VMXNET3_ECR_RQERR)) {
@@ -1947,7 +1949,7 @@ vmxnet3_activate_dev(struct vmxnet3_adapter *adapter)
 	 * Check link state when first activating device. It will start the
 	 * tx queue if the link is up.
 	 */
-	vmxnet3_check_link(adapter);
+	vmxnet3_check_link(adapter, true);
 
 	napi_enable(&adapter->napi);
 	vmxnet3_enable_all_intrs(adapter);
@@ -2549,6 +2551,7 @@ vmxnet3_probe_device(struct pci_dev *pdev,
 	}
 
 	set_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state);
+	vmxnet3_check_link(adapter, false);
 	atomic_inc(&devices_found);
 	return 0;


^ permalink raw reply related

* Re: Question about way that NICs deliver packets to the kernel
From: Junchang Wang @ 2010-07-16  7:35 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev
In-Reply-To: <20100715211249.GA7278@electric-eye.fr.zoreil.com>

> It is a simple, straightforward fix against a 8169 hardware bug.
>
> See commit c0cd884af045338476b8e69a61fceb3f34ff22f1.
>
Fortunately, it seems my device is unaffected by this issue. :)

Thanks Francois.

-- 
--Junchang

^ permalink raw reply

* Re: [PATCH 1/2] bnx2: allocate with GFP_KERNEL flag on RX path init
From: Stanislaw Gruszka @ 2010-07-16  7:30 UTC (permalink / raw)
  To: Mitchell Erblich; +Cc: netdev, Michael Chan
In-Reply-To: <5AD9FE05-1F9B-44DE-8CC4-4D63F43E79C8@earthlink.net>

On Thu, 15 Jul 2010 11:57:59 -0700
Mitchell Erblich <erblichs@earthlink.net> wrote:

> > @@ -3039,7 +3039,8 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
> > 			rx_pg->page = NULL;
> > 
> > 			err = bnx2_alloc_rx_page(bp, rxr,
> > -						 RX_PG_RING_IDX(pg_prod));
> > +						 RX_PG_RING_IDX(pg_prod),
> > +	
> 
> > 					 GFP_ATOMIC);
> 
> Why not GFP_NOWAIT here?
> This would then not use the last reserved pages of memory.
> This still would remove the possibe sleep asociated with GFP_KERNEL.

There is no GFP_NOWAIT usage in any network driver. I'm not sure if
this flag is intended to driver usage. Anyway I can not judge if
GFP_ATOMIC -> GFP_NOWAIT conversion is good or bad idea, I think you
should ask mm guys about that.

Stanislaw

^ permalink raw reply

* Re: [PATCH 01/11] Removing dead RT2800PCI_SOC
From: Gertjan van Wingerde @ 2010-07-16  7:18 UTC (permalink / raw)
  To: Helmut Schaa
  Cc: Bartlomiej Zolnierkiewicz, Felix Fietkau, John W. Linville,
	Ivo Van Doorn, Christoph Egger, linux-wireless, users, netdev,
	linux-kernel, vamos-dev, Luis Correia
In-Reply-To: <AANLkTilqoYMMKYJT-YYbEzdnytUSYa0EylEiJ4x5xXXH@mail.gmail.com>

On 07/16/10 08:57, Helmut Schaa wrote:
> On Thu, Jul 15, 2010 at 10:41 AM, Bartlomiej Zolnierkiewicz <bzolnier@gmail.com <mailto:bzolnier@gmail.com>> wrote:
> 
>     On Wednesday 14 July 2010 04:44:44 pm Felix Fietkau wrote:
>     > On 2010-07-14 3:15 PM, John W. Linville wrote:
>     > > On Wed, Jul 14, 2010 at 02:52:14PM +0200, Ivo Van Doorn wrote:
>     > >> On Wed, Jul 14, 2010 at 2:46 PM, Luis Correia <luis.f.correia@gmail.com <mailto:luis.f.correia@gmail.com>> wrote:
>     > >> > On Wed, Jul 14, 2010 at 13:39, Christoph Egger <siccegge@cs.fau.de <mailto:siccegge@cs.fau.de>> wrote:
>     > >> >> While RT2800PCI_SOC exists in Kconfig, it depends on either
>     > >> >> RALINK_RT288X or RALINK_RT305X which are both not available in Kconfig
>     > >> >> so all Code depending on that can't ever be selected and, if there's
>     > >> >> no plan to add these options, should be cleaned up
>     > >> >>
>     > >> >> Signed-off-by: Christoph Egger <siccegge@cs.fau.de <mailto:siccegge@cs.fau.de>>
>     > >> >
>     > >> > NAK,
>     > >> >
>     > >> > this is not dead code, it is needed for the Ralink System-on-Chip
>     > >> > Platform devices.
>     > >> >
>     > >> > While I can't fix Kconfig errors and the current KConfig file may be
>     > >> > wrong, this code cannot and will not be deleted.
>     > >>
>     > >> When the config option was introduced, the config options RALINK_RT288X and
>     > >> RALINK_RT305X were supposed to be merged as well soon after by somebody (Felix?)
>     > >>
>     > >> But since testing is done on SoC boards by Helmut and Felix, I assume the code
>     > >> isn't dead but actually in use.
>     > >
>     > > Perhaps Helmut and Felix can send us the missing code?
>     > The missing code is a MIPS platform port, which is currently being
>     > maintained in OpenWrt, but is not ready for upstream submission yet.
>     > I'm not working on this code at the moment, but I think it will be
>     > submitted once it's ready.
> 
>     People are using automatic scripts to catch unused config options nowadays
>     so the issue is quite likely to come back again sooner or later..
> 
>     Would it be possible to improve situation somehow till the missing parts
>     get merged?  Maybe by adding a tiny comment documenting RT2800PCI_SOC
>     situation to Kconfig (if the config option itself really cannot be removed)
>     until all code is ready etc.?
> 
> 
> Or we could just remove RT2800PCI_SOC completely and build the soc specific
> parts always as part of rt2800pci. I mean it's not much code, just the platform
> driver stuff and the eeprom access.
> 

I'm not sure if that is feasible. Sure, we can reduce the usage of the variable by
unconditionally compiling in the generic SOC code, but we should not unconditionally
register the SOC platform device, which is currently also under the scope of this
Kconfig variable.

---
Gertjan.

^ permalink raw reply

* Re: Kexec Reboot Network Issue
From: Simon Horman @ 2010-07-16  7:15 UTC (permalink / raw)
  To: Richard Genthner
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Matt Carlson,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Michael Chan
In-Reply-To: <4C3F2ACF.60601-2CkDue2ckGJOSnsfY10OVw@public.gmane.org>

[ CCed Michael Chan, Matt Carlson and Netdev ]

On Thu, Jul 15, 2010 at 11:35:43AM -0400, Richard Genthner wrote:
> On 07/15/2010 10:20 AM, Simon Horman wrote:
> >On Thu, Jul 15, 2010 at 07:59:07AM -0400, Richard Genthner wrote:
> >>I'm currently using the following string:
> >>
> >>kexex --type=elf-x86_64 --args-linux -l /boot/vmlinuz-2.6.18-12.el5
> >>--initrid=/boot/initrd-2.6.18-128.el6.img --append="`cat
> >>/proc/cmdline`"
> >>kexec -e
> >>
> >>Some times we can get to the box from any subnet, other times we can
> >>only get to the box from the same subnet only. Our solution to this
> >>is to down the iface and then restart networking. Has anyone else
> >>run into this issue?
> >Hi Richard,
> >
> >could you be more specific about which NIC you are using?
> >And is it at all possible to test a newer kernel version?
> >
> >What I suspect is happening is that the NIC is getting into an unknown
> >state. And what I'm hoping is that is a problem thats already been
> >addressed.
>
> I would try a different kenerl but our cluster fs has us locked to
> this kernel until we finish the upgrade to the new cluster fs
> version. heres ethtool on the iface
> 
> from lshw
> *-network
>                 description: Ethernet interface
>                 product: NetXtreme BCM5721 Gigabit Ethernet PCI Express
>                 vendor: Broadcom Corporation
>                 physical id: 0
>                 bus info: pci@0000:03:00.0
>                 logical name: eth0
>                 version: 21
>                 serial: 00:25:64:3b:9c:ae
>                 size: 1GB/s
>                 capacity: 1GB/s
>                 width: 64 bits
>                 clock: 33MHz
>                 capabilities: pm vpd msi pciexpress bus_master
> cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt
> 1000bt-fd autonegotiation
>                 configuration: autonegotiation=on broadcast=yes
> driver=tg3 driverversion=3.93 duplex=full firmware=5721-v3.65,
> ASFIPMI v6.25 ip=172.16.1.123 latency=0 link=yes module=tg3
> multicast=yes port=twisted pair speed=1GB/s

Hi Richard,

first, please don't top-post, its not the done thing in these parts.

I had a quick hunt through the git change log and the onl changed that
jumped out was "[TG3]: Fix msi issue with kexec/kdump"[1], but this
seems to have been back-ported to the initrd-2.6.18-128.el5 (I assume
you meant 5 not 6) kernel.

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ee6a99b539a50b4e9398938a0a6d37f8bf911550

In any case I strongly suspect that the problem is a kernel problem and as
such can't be solved modifying the kernel or at least tg.ko module (which
is probably in the initrd). I suggest logging bug report with Red Hat.

^ permalink raw reply

* Re: [PATCH 1/2] bnx2: allocate with GFP_KERNEL flag on RX path init
From: Stanislaw Gruszka @ 2010-07-16  7:13 UTC (permalink / raw)
  To: David Miller; +Cc: mchan, netdev
In-Reply-To: <20100715.202537.116398412.davem@davemloft.net>

On Thu, 15 Jul 2010 20:25:37 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> > This should be GFP_ATOMIC since it is called from NAPI softirq
> > context.
> 
> This fatal issue gives me doubts about whether this patch was even
> tested at all.
> 
> Immediately the kernel memory allocator should have issued a warning
> due to this GFP_KERNEL allocation in a non-sleep'able context.
> 
> Stanislaw, how did you test this patch?

I run net-next-2.6 kernel with patches on machine with bnx2 device,
but I compiled kernel with CONFIG_DEBUG_KOBJECT and all dmesg was filled
by messages like:

kobject: 'block' (ffff8801663122c0): kobject_add_internal: parent: '2:2:1:0', set: '(null)'
kobject: 'sdc' (ffff8801642ca070): kobject_add_internal: parent: 'block', set: 'devices'
kobject: 'sdc' (ffff8801642ca070): kobject_uevent_env

so I missed the warning, grr...

Stanislaw

^ permalink raw reply

* Re: Question about way that NICs deliver packets to the kernel
From: Junchang Wang @ 2010-07-16  7:05 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: romieu, netdev
In-Reply-To: <1279204417.2118.12.camel@achroite.uk.solarflarecom.com>

>
> You should also compare the CPU usage.
>
> Ben.
>
Hi Ben,
I added options -c -C to netperf's command line. Result is as follows:
                    scheme 1    scheme 2    Imp.
Throughput:     683M        718M       5%
CPU usage:     47.8%       45.6%

That really surprised me because "top" command showed the CPU usage
was fluctuating between 0.5% and 1.5% rather that between 45% and 50%.

How can I get the exact CPU usage?

Thanks.

-- 
--Junchang

^ permalink raw reply

* Re: [PATCH 1/2] bnx2: allocate with GFP_KERNEL flag on RX path init
From: Mitchell Erblich @ 2010-07-16  5:57 UTC (permalink / raw)
  To: David Miller; +Cc: mchan, sgruszka, netdev
In-Reply-To: <20100715.202537.116398412.davem@davemloft.net>


On Jul 15, 2010, at 8:25 PM, David Miller wrote:

> From: "Michael Chan" <mchan@broadcom.com>
> Date: Thu, 15 Jul 2010 07:48:40 -0700
> 
>> Stanislaw Gruszka wrote:
>> 
>>> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
>>> ---
>>> @@ -2974,7 +2974,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct
>>> bnx2_rx_ring_info *rxr, struct sk_buff *skb,
>>>      int err;
>>>      u16 prod = ring_idx & 0xffff;
>>> 
>>> -     err = bnx2_alloc_rx_skb(bp, rxr, prod);
>>> +     err = bnx2_alloc_rx_skb(bp, rxr, prod, GFP_KERNEL);
>> 
>> This should be GFP_ATOMIC since it is called from NAPI softirq
>> context.
> 
> This fatal issue gives me doubts about whether this patch was even
> tested at all.
> 
> Immediately the kernel memory allocator should have issued a warning
> due to this GFP_KERNEL allocation in a non-sleep'able context.
> 
> Stanislaw, how did you test this patch?
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Group,

		Why NOT GFP_NOWAIT. This won't use the last resource pages
		versus GFP_ATOMIC?

		GFP_ATOMIC IMO, SHOULD be used in the paths that cleans
		and frees pages.

Mitchell Erblich



^ permalink raw reply

* Re: [PATCH] cxgb4vf: fix SGE resource resource deallocation bug
From: David Miller @ 2010-07-16  5:47 UTC (permalink / raw)
  To: leedom; +Cc: netdev
In-Reply-To: <201007151554.43269.leedom@chelsio.com>

From: Casey Leedom <leedom@chelsio.com>
Date: Thu, 15 Jul 2010 15:54:43 -0700

>>From 1ecf246196c43d75e75525211450a64d650f4fbc Mon Sep 17 00:00:00 2001
> From: Casey Leedom <leedom@chelsio.com>
> Date: Thu, 15 Jul 2010 15:50:16 -0700
> Subject: [PATCH] cxgb4vf: fix SGE resource resource deallocation bug
> 
> Fix SGE resource resource deallocation bug.  Forgot to increment the RXQ and
> TXQ cursors in the loop ...
> 
> Signed-off-by: Casey Leedom <leedom@chelsio.com>

Applied, thanks Casey.

^ permalink raw reply

* Re: [PATCH 2.6.35-rc1] net-next: vmxnet3 fixes [3/5] Initialize link state at probe time
From: David Miller @ 2010-07-16  5:44 UTC (permalink / raw)
  To: sbhatewara; +Cc: netdev, linux-kernel, pv-drivers
In-Reply-To: <alpine.LRH.2.00.1007150125240.25472@localhost.localdomain>

From: Shreyas Bhatewara <sbhatewara@vmware.com>
Date: Thu, 15 Jul 2010 18:20:14 -0700 (PDT)

> 
> On Wed, 14 Jul 2010, David Miller wrote:
> 
>> From: Shreyas Bhatewara <sbhatewara@vmware.com>
>> Date: Tue, 13 Jul 2010 17:48:55 -0700 (PDT)
>> 
>> > 
>> > Initialize vmxnet3 link state at probe time
>> > 
>> > This change initializes the state of link at the time when driver is
>> > loaded. The ethtool output for 'link detected' and 'link speed'
>> > is thus valid even before the interface is brought up.
>> > 
>> > Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
>> 
>> You should never, ever, call netif_start_queue() on a device which has
>> not been brought up.
>> 
>> But that is what this patch is doing.
>> 
> 
> I do not understand why you say so. vmxnet3_check_link() is called in 
> probe() with affectTxQueue as false. Hence netif_start_queue() will not be 
> called before device is brought up.
> vmxnet3_check_link() is again called with affectTxQueue as true in 
> vmxnet3_activate_dev() after device was activated.

Aha, I see how the logic works now.

But still there is a problem with this patch, please remove the
driver version bump and resubmit.

You should only version bump at the last patch in a series.

Thanks.

^ permalink raw reply

* Re: [ipmr] ipmr: Don't leak memory if fib lookup fails.
From: David Miller @ 2010-07-16  5:39 UTC (permalink / raw)
  To: greearb; +Cc: netdev
In-Reply-To: <4C3FA22A.7090706@candelatech.com>

From: Ben Greear <greearb@candelatech.com>
Date: Thu, 15 Jul 2010 17:04:58 -0700

> On 07/15/2010 04:22 PM, Ben Greear wrote:
>> This was detected using two mcast router tables.  The
>> pimreg for the second interface did not have a specific
>> mrule, so packets received by it were handled by the
>> default table, which had nothing configured.
> 
> I just realized this should be applied against 2.6.35-pre, not 2.6.34.

Fix looks good, applied to net-2.6, thanks!

^ permalink raw reply

* [RFC PATCH] ipv6: Make IP6CB(skb)->nhoff 16-bit.
From: David Miller @ 2010-07-16  5:30 UTC (permalink / raw)
  To: netdev

Even with jumbograms I cannot see any way in which we would need
to records a larger than 65535 valued next-header offset.

The maximum extension header length is (256 << 3) == 2048.
There are only a handful of extension headers specified which
we'd even accept (say 5 or 6), therefore the largest next-header
offset we'd ever have to contend with is something less than
say 16k.

Therefore make it a u16 instead of a u32.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
Can anyone find any holes in this?  I want to do this so we
can legitimately rip ndisc_nodetype out of struct sk_buff
and stick it into the IP6CB where it belongs.  Currently
there isn't enough space, but with this change there will
be.

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 940e215..ab9e9e8 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -246,7 +246,7 @@ struct inet6_skb_parm {
 	__u16			srcrt;
 	__u16			dst1;
 	__u16			lastopt;
-	__u32			nhoff;
+	__u16			nhoff;
 	__u16			flags;
 #if defined(CONFIG_IPV6_MIP6) || defined(CONFIG_IPV6_MIP6_MODULE)
 	__u16			dsthao;

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox