Netdev List
* Re: [RFC PATCH] bridge: netfilter: fix skb->nf_bridge NULL panic in br_nf_forward_finish
From: Massimo Cetra @ 2012-07-10  6:58 UTC (permalink / raw)
  To: Lin Ming
  Cc: Massimo Cetra, Eric Dumazet, netdev, Stephen Hemminger,
	David S. Miller, Julian Anastasov
In-Reply-To: <CAF1ivSZBMWYc5iKxhX5d_ykkMD4LauFP9M10dBwfmqvpYj=pHg@mail.gmail.com>

On 09/07/2012 14:00, Lin Ming wrote:

>> I spent a couple of days trying to figure out how to reproduce it, but you were
>> quicker and smarter than me.
>
> Could you also test it ? :-)
>

Of course.

I have already installed a 3.5-rc and a 3.2.22 with this patch and, so far,
I see no problems.

I'm only waiting a couple of days before reporting, to be sure the issue
is gone.

Massimo

^ permalink raw reply

* Re: [PATCH net-next 6/6] r8169: support RTL8168G
From: Hayes Wang @ 2012-07-10  7:12 UTC (permalink / raw)
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang
In-Reply-To: <1341898590-1253-1-git-send-email-hayeswang@realtek.com>

1. Remove rtl_ocpdr_cond. No waiting is needed for mac_ocp_{write / read}.
2. Set ocp_base to OCP_STD_PHY_BASE after rtl8168g_1_hw_phy_config.
---
 drivers/net/ethernet/realtek/r8169.c |   14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index c29c5fb..7269175 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1043,13 +1043,6 @@ static void rtl_w1w0_phy_ocp(struct rtl8169_private *tp, int reg, int p, int m)
 	r8168_phy_ocp_write(tp, reg, (val | p) & ~m);
 }
 
-DECLARE_RTL_COND(rtl_ocpdr_cond)
-{
-	void __iomem *ioaddr = tp->mmio_addr;
-
-	return RTL_R32(OCPDR) & OCPAR_FLAG;
-}
-
 static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -1058,8 +1051,6 @@ static void r8168_mac_ocp_write(struct rtl8169_private *tp, u32 reg, u32 data)
 		return;
 
 	RTL_W32(OCPDR, OCPAR_FLAG | (reg << 15) | data);
-
-	rtl_udelay_loop_wait_low(tp, &rtl_ocpdr_cond, 25, 10);
 }
 
 static u16 r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg)
@@ -1071,8 +1062,7 @@ static u16 r8168_mac_ocp_read(struct rtl8169_private *tp, u32 reg)
 
 	RTL_W32(OCPDR, reg << 15);
 
-	return rtl_udelay_loop_wait_high(tp, &rtl_ocpdr_cond, 25, 10) ?
-		RTL_R32(OCPDR) : ~0;
+	return RTL_R32(OCPDR);
 }
 
 #define OCP_STD_PHY_BASE	0xa400
@@ -3417,6 +3407,8 @@ static void rtl8168g_1_hw_phy_config(struct rtl8169_private *tp)
 	rtl_w1w0_phy_ocp(tp, 0xa438, 0x8000, 0x0000);
 
 	rtl_w1w0_phy_ocp(tp, 0xc422, 0x4000, 0x2000);
+
+	rtl_writephy(tp, 0x1f, 0x0000);
 }
 
 static void rtl8102e_hw_phy_config(struct rtl8169_private *tp)
-- 
1.7.10.4
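
For readers following the diff, here is a minimal user-space sketch of how the
two MAC OCP accessors pack their OCPDR command word. The helper names are
hypothetical, and the value of OCPAR_FLAG is an assumption: the patch uses the
macro but does not show its definition.

```c
#include <stdint.h>

/* Assumed value; the patch references OCPAR_FLAG without defining it. */
#define OCPAR_FLAG 0x80000000u

/* Hypothetical helper: the word r8168_mac_ocp_write() passes to RTL_W32(OCPDR, ...). */
static uint32_t ocpdr_write_word(uint32_t reg, uint32_t data)
{
	return OCPAR_FLAG | (reg << 15) | data;
}

/* Hypothetical helper: the word r8168_mac_ocp_read() writes before reading OCPDR back. */
static uint32_t ocpdr_read_word(uint32_t reg)
{
	return reg << 15;
}
```

With the wait loops removed, the write path ends after the register write and
the read path returns RTL_R32(OCPDR) directly.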

^ permalink raw reply related

* Re: TCP transmit performance regression
From: Ming Lei @ 2012-07-10  7:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller
In-Reply-To: <1341895143.3265.4049.camel@edumazet-glaptop>

On Tue, Jul 10, 2012 at 12:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Please dont send private messages for discussing general linux stuff.
>
> Next time I wont reply.
>
> On Tue, 2012-07-10 at 12:00 +0800, Ming Lei wrote:
>> On Mon, Jul 9, 2012 at 9:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Mon, 2012-07-09 at 21:23 +0800, Ming Lei wrote:
>> >
>> >> Looks the patch replaces skb_clone with netdev_alloc_skb_ip_align and
>> >> introduces extra copies on incoming data, so would you mind explaining
>> >> it in a bit detail? And why is skb_clone not OK for the purpose?
>> >
>> > Problem with cloning is that some paths will have to make a private copy
>> > of the skb.
>>
>> Looks you convert some private copy into all copy in rx path, :-)
>
> For small speed device, a copy is probably unnoticed.

The copy still has some effect on low-speed devices; for example, your recent
patch to the asix driver can improve tx performance from ~75M to ~92M.

>
> rtl8169 does that (copybreak) for security issues on Gbps link speed,
> and I get Gbps link speed on an old AMD host with no problem.
>
> As you discovered, the slowdown comes from SLAB debug on the 30K huge
> skb. To recover from this we must patch usbnet to not constantly
> allocate/free such big RX skb but recycle them. Once we do that, you'll
> find out that copybreak improves general performance on low ram devices
> by an order of magnitude.

It looks like your copybreak patch doesn't improve tx performance on smsc95xx.

>> >
>> > So you dont see the cost here in the driver, but later in upper stacks.
>> >
>> > Since this driver defaults to a huge RX area of more than 16Kbytes,
>> > a copy to a much smaller skb (we call this 'copybreak' in our jargon )
>> > is more than welcome to avoid OOM problems anyway.
>>
>> Looks 'memory compaction' has been implemented already to address
>> the big buffer allocation problem.
>
> Usually its too late (not enough ram to perform the compaction), and
> a collapse having to compact 3MB is very expensive and blows cpu caches.
>
> I noticed that on machines with 1GB or 2GB ram. These machines are
> called ChromeBooks and every lost network frame is analyzed in Google.
> And we had problems because some wifi adapters use 8KB skbs for incoming
> frames.

The kernel stack size is 8KB or more, so did you also see process creation
failures on your ChromeBook machines at the same time?

> (Not even 32KB !!! This is just crazy !!)
>
> Relying on TCP collapsing is just very lazy. What about other
> protocols ?
>
> I guess that on beagle this can happen very fast.

Previously I only saw usbnet OOMs triggered by
kmalloc(GFP_ATOMIC), while kmalloc(GFP_KERNEL) could succeed.
Some time later, the problem disappeared.

>>
>> Also the allocated huge RX SKB buffer will be freed after all cloned buffers
>> are consumed, so I still don't know what is the real problem with cloned buffer.
>>
>
> IF they are consumed.
>
> But IF they arent because application is not fast enough to drain, you
> end with sockets storing huge amount of data in their receive buffer.
>
> So a single 100 bytes payload holds the 32KB block.
>
> If you allowed your UDP socket to store 130.000 bytes of payload, you
> can consume 1.300 * 32KB = ~40 MB

It looks like that is one advantage of copybreak.
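
The arithmetic in the quoted example can be checked with a small sketch:
130,000 bytes of queued payload in 100-byte datagrams is 1,300 skbs, and if
each one keeps a whole 32KB RX block alive, about 40 MB stays pinned. The
function name is hypothetical; this is only the calculation, not kernel code.

```c
#include <stddef.h>

/* Memory pinned when every queued datagram keeps its whole RX buffer alive.
 * queued_payload: payload bytes the socket is allowed to hold
 * payload_per_skb: payload bytes carried by each datagram
 * rx_buf_size: size of the buffer each skb references */
static size_t pinned_bytes(size_t queued_payload, size_t payload_per_skb,
			   size_t rx_buf_size)
{
	size_t skbs = queued_payload / payload_per_skb;

	return skbs * rx_buf_size;
}
```

pinned_bytes(130000, 100, 32 * 1024) gives 42,598,400 bytes. With a copy into
a right-sized buffer, say 256 bytes per skb as an illustrative figure, the
same queue would hold about 330 KB.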

>
>
>> >
>> > TCP coalescing (skb_try_coalesce) for example wont work for cloned skbs,
>> > so TCP receive window will close pretty fast, and performance sucks in
>> > lossy environments (like the Internet)
>>
>> I didn't observe the above thing, so could you provide a way to reproduce it?
>>
>
> netstat -s can show you interesting TCP counters. But as driver lies on
> skb->truesize, you can also have unexpected crashes with malicious
> senders. With a 64 ratio, its easy to consume all ram.
>
> TCP coalescing is great as soon as you have Out Of Order queueing
> because of packet losses. You avoid expensive collapses and
> dropping/purge of OFO queue. Sender has to resend previously sent data.
>
>> Suppose the above is true, looks skb_clone is useless, isn't it?
>
> cloning has some uses, for example if you dont need to touch packet
> content, only mess with skb->data, skb->len, skb->tail.
>
> But if you need to change a single bit in the payload, or play with skb
> fragments (struct skb_shared_info), you have to make a full copy of the
> 30KB buffer, even if the skb contained only 10 bytes of payload.

So netdev_alloc_skb_ip_align() can be replaced with skb_clone()
in the asix driver, since no bits are touched in asix_rx_fixup? The default MTU is
1500 and rx_urb_size is 2048.

If so, could we use copybreak only when rx_urb_size > 4096?
And for ax88172, dev->rx_urb_size is always 2048, so it looks like the copy
is not needed at all.

> I would just switch off turbo mode by default, I doubt it has any
> advantage.

At least for smsc95xx, I don't think the feature is worth a 32K buffer.

>
> Coalescing up to 16K of incoming frames adds latency for no performance
> gain, once you do it the right way (that is without OOM risks).
> Currently, skb->truesize lie is very bad.
>



Thanks,
-- 
Ming Lei

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Bjørn Mork @ 2012-07-10  7:25 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: David Miller, netdev, linux-next, linux-kernel
In-Reply-To: <20120710130848.1014fbe05e5146a33a3c7d39@canb.auug.org.au>

Stephen Rothwell <sfr@canb.auug.org.au> writes:

> Hi all,
>
> After merging the net-next tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> drivers/net/usb/qmi_wwan.c:381:13: error: 'qmi_wwan_unbind_shared' undeclared here (not in a function)
>
> Caused by a bad automatic merge between commit 6fecd35d4cd7 ("net:
> qmi_wwan: add ZTE MF60") from the net tree and commit 230718bda1be ("net:
> qmi_wwan: bind to both control and data interface") from the net-next
> tree.
>
> I added the following merge fix patch:
>
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Tue, 10 Jul 2012 13:06:01 +1000
> Subject: [PATCH] net: fix for qmi_wwan_unbind_shared changes
>
> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
> ---
>  drivers/net/usb/qmi_wwan.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
> index 06cfcc7..85c983d 100644
> --- a/drivers/net/usb/qmi_wwan.c
> +++ b/drivers/net/usb/qmi_wwan.c
> @@ -378,7 +378,7 @@ static const struct driver_info qmi_wwan_force_int2 = {
>  	.description	= "Qualcomm WWAN/QMI device",
>  	.flags		= FLAG_WWAN,
>  	.bind		= qmi_wwan_bind_shared,
> -	.unbind		= qmi_wwan_unbind_shared,
> +	.unbind		= qmi_wwan_unbind,
>  	.manage_power	= qmi_wwan_manage_power,
>  	.data		= BIT(2), /* interface whitelist bitmap */
>  };


Looks good.  Thanks.


Bjørn

^ permalink raw reply

* net-next kernel NULL pointer dereference at fib_rules_tclass
From: Or Gerlitz @ 2012-07-10  7:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Shlomo Pongratz, Amir Vadai, Erez Shitrit

Hi Dave,

Using the latest net-next (061a5c316b6526dbc729049a16243ec27937cc31) I
get the crash below during the boot cycle. The crash happens on a set of
nodes which use igb for their onboard 1G NIC, as soon as the device goes
up. Another group of nodes, in a second lab, which use bnx2 for the 1G
NIC, doesn't get this crash, but the kernel there is built with a different
.config.

Or.

Bringing up loopback interface:  [  OK  ]
Bringing up interface eth1:
Determining IP information for eth1...IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Starting system logger: BUG: unable to handle kernel NULL pointer dereference at 00000000000000ac
IP: [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
PGD 223171067 PUD 22353e067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in:
 ipv6 dm_mirror dm_region_hash dm_log uinput igb ptp pps_core mlx4_ib ib_mad ib_core mlx4_en mlx4_core sg kvm_intel kvm microcode pcspkr rng_core ioatdma dca shpchp dm_mod button sr_mod ext3 jbd sd_mod usb_storage ata_piix libata scsi_mod ehci_hcd uhci_hcd floppy [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc5-12540-g061a5c3-dirty #94 Supermicro X7DWU/X7DWU
RIP: 0010:[<ffffffff81320393>]  [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
RSP: 0018:ffff88022fc03a30  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88022fc03b54 RCX: 0000000000000050
RDX: 0000000000000020 RSI: 0000000000000001 RDI: ffff88022fc03a40
RBP: ffff88022fc03a30 R08: ffff88022fc03a70 R09: ffff88022fc03a40
R10: 0000000000000020 R11: ffff880225390a80 R12: 0000000000000001
R13: ffff88021cc7a000 R14: 0000000000000000 R15: ffff8802269c26c0
FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000ac CR3: 0000000222aeb000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81613410)
Stack:
 ffff88022fc03ac0 ffffffff81318956 ffff8802fd010010 ffff8802232d5a80
 ffff880222add880 ffff880223269a98 0000000000000020 ffff880200000000
 0000000100000000 ffff000000000000 12311eac2540eaf0 ffff88027e001eac
Call Trace:
 <IRQ>

 [<ffffffff81318956>] fib_validate_source+0x170/0x2a5
 [<ffffffff812e6603>] ip_route_input_common+0x6fe/0xd12
 [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
 [<ffffffff812e8461>] ip_rcv_finish+0x151/0x457
 [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
 [<ffffffff812e89a1>] ip_rcv+0x23a/0x260
 [<ffffffff812beae7>] __netif_receive_skb+0x3ac/0x415
 [<ffffffff812be86f>] ? __netif_receive_skb+0x134/0x415
 [<ffffffff81312ae5>] ? inet_gro_receive+0x81/0x23f
 [<ffffffff812b68da>] ? skb_free_head+0x47/0x49
 [<ffffffff812c035d>] netif_receive_skb+0xee/0xf7
 [<ffffffff812c071d>] ? dev_gro_receive+0x15f/0x2fb
 [<ffffffff812c063a>] ? dev_gro_receive+0x7c/0x2fb
 [<ffffffff81065644>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff812c044c>] napi_skb_finish+0x24/0x56
 [<ffffffff812c0bf0>] napi_gro_receive+0x10f/0x11e
 [<ffffffffa0216e85>] igb_poll+0x843/0xae5 [igb]
 [<ffffffff812c0e01>] ? net_rx_action+0x14c/0x1ee
 [<ffffffff812c0d76>] net_rx_action+0xc1/0x1ee
 [<ffffffff8102f746>] __do_softirq+0xff/0x1de
 [<ffffffff813631cc>] call_softirq+0x1c/0x26
 [<ffffffff81003090>] do_softirq+0x38/0x80
 [<ffffffff8102f41f>] irq_exit+0x4e/0x83
 [<ffffffff810028f9>] do_IRQ+0x98/0xaf
 [<ffffffff8135b52c>] common_interrupt+0x6c/0x6c
 <EOI>

 [<ffffffff810083ec>] ? mwait_idle+0x13c/0x208
 [<ffffffff810083e3>] ? mwait_idle+0x133/0x208
 [<ffffffff810088d1>] cpu_idle+0x6e/0xab
 [<ffffffff81343e13>] rest_init+0xc7/0xce
 [<ffffffff81343d4c>] ? csum_partial_copy_generic+0x16c/0x16c
 [<ffffffff8167fbf3>] start_kernel+0x332/0x33f
 [<ffffffff8167f6f6>] ? kernel_init+0x19d/0x19d
 [<ffffffff8167f2b4>] x86_64_start_reservations+0xb8/0xbd
 [<ffffffff8167f3a6>] x86_64_start_kernel+0xed/0xf4
Code: 81 31 c0 e8 a5 bb dd ff 48 83 c4 28 31 c0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 90 90 90 48 8b 57 20 55 31 c0 48 89 e5 48 85 d2 74 06 <8b> 82 8c 00 00 00 c9 c3 8b 47 7c 33 46 14 85 87 80 00 00 00 55
RIP  [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
 RSP <ffff88022fc03a30>
CR2: 00000000000000ac
---[ end trace e7c6714b8de1c341 ]---
Kernel panic - not syncing: Fatal exception in interrupt
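
One way to read the trace above, as a sketch rather than a diagnosis: the
faulting bytes marked <8b> in the Code: line (8b 82 8c 00 00 00) decode as a
32-bit load from 0x8c(%rdx). With RDX = 0x20 in the register dump, that is
0x20 + 0x8c = 0xac, which matches CR2. So the pointer seems to have passed
the preceding NULL test (48 85 d2 / 74 06) while holding a small bogus value.
The decoder below is a hypothetical helper that handles only this one
encoding.

```c
#include <stdint.h>

struct mov_load {
	int base_reg;	/* x86 register number: 0=rax, 1=rcx, 2=rdx, 3=rbx, ... */
	uint32_t disp;	/* 32-bit displacement */
};

/* Decode "8b /r" with mod=10 (disp32) and no SIB byte. Returns 0 on success. */
static int decode_mov_disp32(const uint8_t *insn, struct mov_load *out)
{
	uint8_t modrm = insn[1];

	if (insn[0] != 0x8b || (modrm >> 6) != 2 || (modrm & 7) == 4)
		return -1;

	out->base_reg = modrm & 7;
	out->disp = (uint32_t)insn[2] | (uint32_t)insn[3] << 8 |
		    (uint32_t)insn[4] << 16 | (uint32_t)insn[5] << 24;
	return 0;
}
```

Feeding it the six bytes from the oops yields base register 2 (rdx) and
displacement 0x8c.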

^ permalink raw reply

* net-next kernel NULL pointer dereference at fib_rules_tclass
From: Or Gerlitz @ 2012-07-10  7:29 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Amir Vadai, Shlomo Pongratz, Erez Shitrit

Hi Dave,

Using the latest net-next (061a5c316b6526dbc729049a16243ec27937cc31) I
get the crash below during the boot cycle. The crash happens on a set of
nodes which use igb for their onboard 1G NIC, as soon as the device goes
up. Another group of nodes, in a second lab, which use bnx2 for the 1G
NIC, doesn't get this crash, but the kernel there is built with a different
.config.

Or.


Bringing up loopback interface:  [  OK  ]
Bringing up interface eth1:
Determining IP information for eth1...IPv6: ADDRCONF(NETDEV_UP): eth1:
link is not ready
igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Starting system logger: BUG: unable to handle kernel NULL pointer
dereference at 00000000000000ac
IP: [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
PGD 223171067 PUD 22353e067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in:
  ipv6 dm_mirror dm_region_hash dm_log uinput igb ptp pps_core mlx4_ib
ib_mad ib_core mlx4_en mlx4_core sg kvm_intel kvm microcode pcspkr
rng_core ioatdma dca shpchp dm_mod button sr_mod ext3 jbd sd_mod
usb_storage ata_piix libata scsi_mod ehci_hcd uhci_hcd floppy [last
unloaded: scsi_wait_scan]

Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc5-12540-g061a5c3-dirty #94
Supermicro X7DWU/X7DWU
RIP: 0010:[<ffffffff81320393>]  [<ffffffff81320393>]
fib_rules_tclass+0xf/0x17
RSP: 0018:ffff88022fc03a30  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88022fc03b54 RCX: 0000000000000050
RDX: 0000000000000020 RSI: 0000000000000001 RDI: ffff88022fc03a40
RBP: ffff88022fc03a30 R08: ffff88022fc03a70 R09: ffff88022fc03a40
R10: 0000000000000020 R11: ffff880225390a80 R12: 0000000000000001
R13: ffff88021cc7a000 R14: 0000000000000000 R15: ffff8802269c26c0
FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000ac CR3: 0000000222aeb000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task
ffffffff81613410)
Stack:
  ffff88022fc03ac0 ffffffff81318956 ffff8802fd010010 ffff8802232d5a80
  ffff880222add880 ffff880223269a98 0000000000000020 ffff880200000000
  0000000100000000 ffff000000000000 12311eac2540eaf0 ffff88027e001eac
Call Trace:
  <IRQ>

  [<ffffffff81318956>] fib_validate_source+0x170/0x2a5
  [<ffffffff812e6603>] ip_route_input_common+0x6fe/0xd12
  [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
  [<ffffffff812e8461>] ip_rcv_finish+0x151/0x457
  [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
  [<ffffffff812e89a1>] ip_rcv+0x23a/0x260
  [<ffffffff812beae7>] __netif_receive_skb+0x3ac/0x415
  [<ffffffff812be86f>] ? __netif_receive_skb+0x134/0x415
  [<ffffffff81312ae5>] ? inet_gro_receive+0x81/0x23f
  [<ffffffff812b68da>] ? skb_free_head+0x47/0x49
  [<ffffffff812c035d>] netif_receive_skb+0xee/0xf7
  [<ffffffff812c071d>] ? dev_gro_receive+0x15f/0x2fb
  [<ffffffff812c063a>] ? dev_gro_receive+0x7c/0x2fb
  [<ffffffff81065644>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff812c044c>] napi_skb_finish+0x24/0x56
  [<ffffffff812c0bf0>] napi_gro_receive+0x10f/0x11e
  [<ffffffffa0216e85>] igb_poll+0x843/0xae5 [igb]
  [<ffffffff812c0e01>] ? net_rx_action+0x14c/0x1ee
  [<ffffffff812c0d76>] net_rx_action+0xc1/0x1ee
  [<ffffffff8102f746>] __do_softirq+0xff/0x1de
  [<ffffffff813631cc>] call_softirq+0x1c/0x26
  [<ffffffff81003090>] do_softirq+0x38/0x80
  [<ffffffff8102f41f>] irq_exit+0x4e/0x83
  [<ffffffff810028f9>] do_IRQ+0x98/0xaf
  [<ffffffff8135b52c>] common_interrupt+0x6c/0x6c
  <EOI>

  [<ffffffff810083ec>] ? mwait_idle+0x13c/0x208
  [<ffffffff810083e3>] ? mwait_idle+0x133/0x208
  [<ffffffff810088d1>] cpu_idle+0x6e/0xab
  [<ffffffff81343e13>] rest_init+0xc7/0xce
  [<ffffffff81343d4c>] ? csum_partial_copy_generic+0x16c/0x16c
  [<ffffffff8167fbf3>] start_kernel+0x332/0x33f
  [<ffffffff8167f6f6>] ? kernel_init+0x19d/0x19d
  [<ffffffff8167f2b4>] x86_64_start_reservations+0xb8/0xbd
  [<ffffffff8167f3a6>] x86_64_start_kernel+0xed/0xf4
Code: 81 31 c0 e8 a5 bb dd ff 48 83 c4 28 31 c0 5b 41 5c 41 5d 41 5e 41
5f c9 c3 90 90 90 48 8b 57 20 55 31 c0 48 89 e5 48 85 d2 74 06 <8b> 82
8c 00 00 00 c9 c3 8b 47 7c 33 46 14 85 87 80 00 00 00 55
RIP  [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
  RSP <ffff88022fc03a30>
CR2: 00000000000000ac
---[ end trace e7c6714b8de1c341 ]---
Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply

* Re: 82571EB: Detected Hardware Unit Hang
From: Joe Jin @ 2012-07-10  7:40 UTC (permalink / raw)
  To: Joe Jin; +Cc: e1000-devel, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <4FFA9B96.6040901@oracle.com>

While debugging the driver, I found that before the "Detected Hardware Unit
Hang" message the driver is unable to clean and reclaim the resources:

1457         while ((eop_desc->upper.data & cpu_to_le32(E1000_TXD_STAT_DD)) &&  <== here upper.data is always 0x300
1458                (count < tx_ring->count)) {
     <--- snip --->
1487         }


I checked all of the driver code and did not find anywhere that sets
E1000_TXD_STAT_DD in upper.data, so I guess upper.data is set by the hardware?
If the OS is a 32-bit system, what would happen?
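
To make the loop condition concrete, here is a minimal user-space sketch of
the test it performs. It assumes E1000_TXD_STAT_DD is bit 0 (0x00000001), as
in the e1000 headers, and that the field is stored little-endian in the
descriptor, as the cpu_to_le32() in the quoted line suggests. The helper names
are hypothetical. Reading the field byte by byte makes the result independent
of host byte order and word size.

```c
#include <stdint.h>

/* Assumed from the e1000 headers: bit 0 of the status is "Descriptor Done". */
#define E1000_TXD_STAT_DD 0x00000001u

/* Load a little-endian 32-bit descriptor field from memory. */
static uint32_t le32_load(const uint8_t b[4])
{
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Hypothetical helper mirroring the first half of the while() condition. */
static int tx_desc_done(const uint8_t upper_data[4])
{
	return (le32_load(upper_data) & E1000_TXD_STAT_DD) != 0;
}
```

For the observed value 0x300, bit 0 is clear, so the check fails and the loop
body never runs; a value such as 0x301 would pass.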

Thanks in advance,
Joe 

On 07/09/12 16:51, Joe Jin wrote:
> Hi list,
> 
> I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when doing
> an scp test. The issue is easy to reproduce on a SUN FIRE X2270 M2: just copying
> a big file (>500M) from another server hits it at once.
> 
> Would you please help with this?
> 
> device info:
> # lspci -s 05:00.0 
> 05:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
> 
> # lspci -s 05:00.0 -n
> 05:00.0 0200: 8086:10bc (rev 06)
> 
> # ethtool -i eth0
> driver: e1000e
> version: 2.0.0-NAPI
> firmware-version: 5.10-2
> bus-info: 0000:05:00.0
> 
> # ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: on
> udp fragmentation offload: off
> generic segmentation offload: on
> generic-receive-offload: on
> 
> kernel log:
> -----------
> e1000e 0000:05:00.0: eth0: Detected Hardware Unit Hang:
>   TDH                  <6c>
>   TDT                  <81>
>   next_to_use          <81>
>   next_to_clean        <6b>
> buffer_info[next_to_clean]:
>   time_stamp           <fffc7a23>
>   next_to_watch        <71>
>   jiffies              <fffc8c0c>
>   next_to_watch.status <0>
> MAC Status             <80387>
> PHY Status             <792d>
> PHY 1000BASE-T Status  <3c00>
> PHY Extended Status    <3000>
> PCI Status             <10>
> e1000e 0000:05:00.0: eth0: Detected Hardware Unit Hang:
>   TDH                  <6c>
>   TDT                  <81>
>   next_to_use          <81>
>   next_to_clean        <6b>
> buffer_info[next_to_clean]:
>   time_stamp           <fffc7a23>
>   next_to_watch        <71>
>   jiffies              <fffc9bac>
>   next_to_watch.status <0>
> MAC Status             <80387>
> PHY Status             <792d>
> PHY 1000BASE-T Status  <3c00>
> PHY Extended Status    <3000>
> PCI Status             <10>
> ------------[ cut here ]------------
> WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x225/0x230()
> Hardware name: SUN FIRE X2270 M2
> NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
> Modules linked in: autofs4 hidp rfcomm bluetooth rfkill lockd sunrpc cpufreq_ondemand acpi_cpufreq mperf be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi video sbs sbshc acpi_pad acpi_ipmi ipmi_msghandler parport_pc lp parport e1000e(U) snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device igb snd_pcm_oss serio_raw snd_mixer_oss snd_pcm tpm_infineon snd_timer snd soundcore snd_page_alloc i2c_i801 iTCO_wdt i2c_core pcspkr i7core_edac iTCO_vendor_support ioatdma ghes dca edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage sd_mod crc_t10dif sg ahci libahci ext3 jbd mbcache [last unloaded: microcode]
> Pid: 0, comm: swapper Not tainted 2.6.39-200.24.1.el5uek #1
> Call Trace:
>  [<c07d9ac5>] ? dev_watchdog+0x225/0x230
>  [<c045ba61>] warn_slowpath_common+0x81/0xa0
>  [<c07d9ac5>] ? dev_watchdog+0x225/0x230
>  [<c045bb23>] warn_slowpath_fmt+0x33/0x40
>  [<c07d9ac5>] dev_watchdog+0x225/0x230
>  [<c07d98a0>] ? dev_activate+0xb0/0xb0
>  [<c0468e82>] call_timer_fn+0x32/0xf0
>  [<c04bceb0>] ? rcu_check_callbacks+0x80/0x80
>  [<c046a76d>] run_timer_softirq+0xed/0x1b0
>  [<c07d98a0>] ? dev_activate+0xb0/0xb0
>  [<c0461a81>] __do_softirq+0x91/0x1a0
>  [<c04619f0>] ? local_bh_enable+0x80/0x80
>  <IRQ>  [<c0462295>] ? irq_exit+0x95/0xa0
>  [<c087f8b8>] ? smp_apic_timer_interrupt+0x38/0x42
>  [<c08784f5>] ? apic_timer_interrupt+0x31/0x38
>  [<c046007b>] ? do_exit+0x11b/0x370
>  [<c065eae4>] ? intel_idle+0xa4/0x100
>  [<c078d9b9>] ? cpuidle_idle_call+0xb9/0x1e0
>  [<c0411d77>] ? cpu_idle+0x97/0xd0
>  [<c085cbbd>] ? rest_init+0x5d/0x70
>  [<c0b07a7a>] ? start_kernel+0x28a/0x340
>  [<c0b074b0>] ? obsolete_checksetup+0xb0/0xb0
>  [<c0b070a4>] ? i386_start_kernel+0x64/0xb0
> ---[ end trace 5502b55cd4d4e5cb ]---
> e1000e 0000:05:00.0: eth0: Reset adapter
> e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> 
> Thanks,
> Joe
> 


-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing 

^ permalink raw reply

* [v2 PATCH] qlge: fix endian issue
From: roy.qing.li @ 2012-07-10  8:02 UTC (permalink / raw)
  To: netdev

From: Li RongQing <roy.qing.li@gmail.com>

Commit 6d29b1ef introduced a bug: ntohs is __be16_to_cpu,
not cpu_to_be16.

We always apply htons to IP_OFFSET and IP_MF, then compare
with the network packet.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
v2 : Change my name
 drivers/net/ethernet/qlogic/qlge/qlge_main.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 09d8d33..7c520fa 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -1546,7 +1546,7 @@ static void ql_process_mac_rx_page(struct ql_adapter *qdev,
 			struct iphdr *iph =
 				(struct iphdr *) ((u8 *)addr + ETH_HLEN);
 			if (!(iph->frag_off &
-				cpu_to_be16(IP_MF|IP_OFFSET))) {
+				htons(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG,
 					     qdev->ndev,
@@ -1654,7 +1654,7 @@ static void ql_process_mac_rx_skb(struct ql_adapter *qdev,
 			/* Unfragmented ipv4 UDP frame. */
 			struct iphdr *iph = (struct iphdr *) skb->data;
 			if (!(iph->frag_off &
-				ntohs(IP_MF|IP_OFFSET))) {
+				htons(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG,
 					     qdev->ndev,
@@ -1968,7 +1968,7 @@ static void ql_process_mac_split_rx_intr(struct ql_adapter *qdev,
 		/* Unfragmented ipv4 UDP frame. */
 			struct iphdr *iph = (struct iphdr *) skb->data;
 			if (!(iph->frag_off &
-				ntohs(IP_MF|IP_OFFSET))) {
+				htons(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG, qdev->ndev,
 					     "TCP checksum done!\n");
-- 
1.7.1
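
A minimal user-space sketch of the check the patch settles on: the mask is
converted to network byte order with htons() and compared against frag_off as
it sits in the IPv4 header. The helper name is hypothetical and the two
constants are defined locally for the sketch. Note that htons() and ntohs()
perform the same byte swap, so the change is about expressing the right
direction (and type annotation) rather than about different generated code.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define IP_MF		0x2000	/* "more fragments" flag */
#define IP_OFFSET	0x1FFF	/* fragment offset mask */

/* frag_off_be is the field exactly as stored in the header (network order). */
static int ip_is_fragment(uint16_t frag_off_be)
{
	return (frag_off_be & htons(IP_MF | IP_OFFSET)) != 0;
}
```

The driver only marks the checksum as verified when this test is false, that
is, for unfragmented datagrams.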

^ permalink raw reply related

* Re: TCP transmit performance regression
From: Eric Dumazet @ 2012-07-10  8:28 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller
In-Reply-To: <CACVXFVPgqtSN3BrEXRxSv4yxaxCni495SxZNXBmYQpagmxk2tQ@mail.gmail.com>

On Tue, 2012-07-10 at 15:22 +0800, Ming Lei wrote:

> Kernel stack size is 8KB or more, so could you find process creation failure
> in your ChromeBooks machine at the same time?

I believe you are mixing up a lot of things.

Have you ever heard of socket limits?

Not all available RAM on a machine is there for whoever wants it, thank God.

No: the TCP stack was dropping frames because of socket limits.

Only because skbs were fat (8KB allocated/truesize, for a single 1500-byte
frame).

If the application is fast and reads skbs as soon as they arrive, no problem is
detected.

But if the application is slow, or a TCP packet is lost on the network,
many packets are queued into the ofo queue. Eventually not enough room is
available -> we drop incoming frames, and the sender has to retransmit them.

So instead of loading your web pages as fast as possible, you have to
wait for retransmits.

So you see nothing at all, no kernel logs, no failed memory attempts.

It's only slower than necessary.
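
The effect described above can be sketched as a toy model, assuming each
queued frame is charged its full truesize against a fixed receive budget. The
function name and the numbers in the note below are illustrative only; the
real accounting in the TCP stack is more involved.

```c
#include <stddef.h>

/* Count frames dropped when `arriving` frames are queued without being read,
 * each charged `truesize` bytes against a budget of `rcvbuf` bytes. */
static size_t dropped_frames(size_t rcvbuf, size_t truesize, size_t arriving)
{
	size_t used = 0, dropped = 0;

	for (size_t i = 0; i < arriving; i++) {
		if (used + truesize > rcvbuf)
			dropped++;		/* no room: sender must retransmit */
		else
			used += truesize;	/* frame queued, budget charged */
	}
	return dropped;
}
```

With a 256KB budget and 100 frames queued while the application is not
reading, an 8KB truesize drops 68 frames, while a 2KB truesize drops none.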

^ permalink raw reply

* Re: [RFC PATCH] bridge: netfilter: fix skb->nf_bridge NULL panic in br_nf_forward_finish
From: Lin Ming @ 2012-07-10  8:34 UTC (permalink / raw)
  To: Massimo Cetra
  Cc: Massimo Cetra, Eric Dumazet, netdev, Stephen Hemminger,
	David S. Miller, Julian Anastasov
In-Reply-To: <4FFBD289.7050909@navynet.it>

On Tue, Jul 10, 2012 at 2:58 PM, Massimo Cetra <mcetra@navynet.it> wrote:
> On 09/07/2012 14:00, Lin Ming wrote:
>
>>> i spent a couple of days trying to figure out how to reproduce but you
>>> were
>>> quicker and smarter than me.
>>
>>
>> Could you also test it ? :-)
>>
>
> Of course.
>
> I have already installed a 3.5-rc and a 3.2.22 with this patch and, so far,
> I see no problems.
>
> I'm only waiting a couple of days before reporting, to be sure the issue is
> gone.

Then could you reply to the thread below after you confirm the issue is gone?

http://marc.info/?l=linux-netdev&m=134165707424765&w=2

It would be nice to add your "Reported-and-tested-by:".

Thanks,
Lin Ming

>
>
> Massimo

^ permalink raw reply

* Re: [PATCH v2] bridge: netfilter: fix skb->nf_bridge NULL panic in br_nf_forward_finish
From: Simon Horman @ 2012-07-10  8:41 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Lin Ming, Massimo Cetra, Eric Dumazet, netdev, Stephen Hemminger,
	David S. Miller
In-Reply-To: <alpine.LFD.2.00.1207071322490.5927@ja.ssi.bg>

On Sat, Jul 07, 2012 at 01:27:49PM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Sat, 7 Jul 2012, Lin Ming wrote:
> 
> > On Sat, 2012-07-07 at 12:48 +0300, Julian Anastasov wrote:
> > > 
> > > 	Very good. Thanks for tracking and fixing this bug.
> > > Can you send a copy to Simon Horman <horms@verge.net.au>
> > > with correct Subject. As this change can go to stable
> > > kernels you can also improve the comments, for example:
> > > 
> > > ipvs: fix oops on NAT reply in br_nf context
> > > 
> > > 	IPVS should not reset skb->nf_bridge in FORWARD hook
> > > by calling nf_reset for NAT replies. It triggers oops in
> > > br_nf_forward_finish.
> > > 
> > > [here follows your corrected description including
> > > the stack trace]
> > 
> > How about below? Can I have your ACK?
> > I'll resend this patch in another mail.
> 
> 	Very good. You can add my
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>

Thanks, I will queue this up in my ipvs tree and see
about getting it included in 3.5

It seems to me that this problem has been present since 2.6.37
and thus is stable material.

^ permalink raw reply

* Re: net-next kernel NULL pointer dereference at fib_rules_tclass
From: Lin Ming @ 2012-07-10  8:42 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: David Miller, netdev, Shlomo Pongratz, Amir Vadai, Erez Shitrit
In-Reply-To: <alpine.LRH.2.00.1207101008270.9760@ogerlitz.voltaire.com>

On Tue, Jul 10, 2012 at 3:16 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> Hi Dave,
>
> Using latest net-next (061a5c316b6526dbc729049a16243ec27937cc31) I
> get the below crash during the boot cycle. The crash happens on a set of
> nodes which use igb for their onboard 1g nic, as soon as the device goes
> up. Another group, that uses a 2nd lab, where the nodes use bnx2 for 1g
> NIC doesn't get this crash, but the kernel there is built by a different
> .config .

Hi,

I got a similar panic, but not at boot time.
I'll look for the cause.

Regards,
Lin Ming

>
> Or.
>
> Bringing up loopback interface:  [  OK  ]
> Bringing up interface eth1:
> Determining IP information for eth1...IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
> igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> Starting system logger: BUG: unable to handle kernel NULL pointer dereference at 00000000000000ac
> IP: [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
> PGD 223171067 PUD 22353e067 PMD 0
> Oops: 0000 [#1] SMP
> CPU 0
> Modules linked in:
>  ipv6 dm_mirror dm_region_hash dm_log uinput igb ptp pps_core mlx4_ib ib_mad ib_core mlx4_en mlx4_core sg kvm_intel kvm microcode pcspkr rng_core ioatdma dca shpchp dm_mod button sr_mod ext3 jbd sd_mod usb_storage ata_piix libata scsi_mod ehci_hcd uhci_hcd floppy [last unloaded: scsi_wait_scan]
>
> Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc5-12540-g061a5c3-dirty #94 Supermicro X7DWU/X7DWU
> RIP: 0010:[<ffffffff81320393>]  [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
> RSP: 0018:ffff88022fc03a30  EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffff88022fc03b54 RCX: 0000000000000050
> RDX: 0000000000000020 RSI: 0000000000000001 RDI: ffff88022fc03a40
> RBP: ffff88022fc03a30 R08: ffff88022fc03a70 R09: ffff88022fc03a40
> R10: 0000000000000020 R11: ffff880225390a80 R12: 0000000000000001
> R13: ffff88021cc7a000 R14: 0000000000000000 R15: ffff8802269c26c0
> FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000000000ac CR3: 0000000222aeb000 CR4: 00000000000007f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81613410)
> Stack:
>  ffff88022fc03ac0 ffffffff81318956 ffff8802fd010010 ffff8802232d5a80
>  ffff880222add880 ffff880223269a98 0000000000000020 ffff880200000000
>  0000000100000000 ffff000000000000 12311eac2540eaf0 ffff88027e001eac
> Call Trace:
>  <IRQ>
>
>  [<ffffffff81318956>] fib_validate_source+0x170/0x2a5
>  [<ffffffff812e6603>] ip_route_input_common+0x6fe/0xd12
>  [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
>  [<ffffffff812e8461>] ip_rcv_finish+0x151/0x457
>  [<ffffffff812e8380>] ? ip_rcv_finish+0x70/0x457
>  [<ffffffff812e89a1>] ip_rcv+0x23a/0x260
>  [<ffffffff812beae7>] __netif_receive_skb+0x3ac/0x415
>  [<ffffffff812be86f>] ? __netif_receive_skb+0x134/0x415
>  [<ffffffff81312ae5>] ? inet_gro_receive+0x81/0x23f
>  [<ffffffff812b68da>] ? skb_free_head+0x47/0x49
>  [<ffffffff812c035d>] netif_receive_skb+0xee/0xf7
>  [<ffffffff812c071d>] ? dev_gro_receive+0x15f/0x2fb
>  [<ffffffff812c063a>] ? dev_gro_receive+0x7c/0x2fb
>  [<ffffffff81065644>] ? trace_hardirqs_on+0xd/0xf
>  [<ffffffff812c044c>] napi_skb_finish+0x24/0x56
>  [<ffffffff812c0bf0>] napi_gro_receive+0x10f/0x11e
>  [<ffffffffa0216e85>] igb_poll+0x843/0xae5 [igb]
>  [<ffffffff812c0e01>] ? net_rx_action+0x14c/0x1ee
>  [<ffffffff812c0d76>] net_rx_action+0xc1/0x1ee
>  [<ffffffff8102f746>] __do_softirq+0xff/0x1de
>  [<ffffffff813631cc>] call_softirq+0x1c/0x26
>  [<ffffffff81003090>] do_softirq+0x38/0x80
>  [<ffffffff8102f41f>] irq_exit+0x4e/0x83
>  [<ffffffff810028f9>] do_IRQ+0x98/0xaf
>  [<ffffffff8135b52c>] common_interrupt+0x6c/0x6c
>  <EOI>
>
>  [<ffffffff810083ec>] ? mwait_idle+0x13c/0x208
>  [<ffffffff810083e3>] ? mwait_idle+0x133/0x208
>  [<ffffffff810088d1>] cpu_idle+0x6e/0xab
>  [<ffffffff81343e13>] rest_init+0xc7/0xce
>  [<ffffffff81343d4c>] ? csum_partial_copy_generic+0x16c/0x16c
>  [<ffffffff8167fbf3>] start_kernel+0x332/0x33f
>  [<ffffffff8167f6f6>] ? kernel_init+0x19d/0x19d
>  [<ffffffff8167f2b4>] x86_64_start_reservations+0xb8/0xbd
>  [<ffffffff8167f3a6>] x86_64_start_kernel+0xed/0xf4
> Code: 81 31 c0 e8 a5 bb dd ff 48 83 c4 28 31 c0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 90 90 90 48 8b 57 20 55 31 c0 48 89 e5 48 85 d2 74 06 <8b> 82 8c 00 00 00 c9 c3 8b 47 7c 33 46 14 85 87 80 00 00 00 55
> RIP  [<ffffffff81320393>] fib_rules_tclass+0xf/0x17
>  RSP <ffff88022fc03a30>
> CR2: 00000000000000ac
> ---[ end trace e7c6714b8de1c341 ]---
> Kernel panic - not syncing: Fatal exception in interrupt


* Re: [PATCH] ipvs: fix oops on NAT reply in br_nf context
From: Simon Horman @ 2012-07-10  8:51 UTC (permalink / raw)
  To: Lin Ming
  Cc: Julian Anastasov, Massimo Cetra, Eric Dumazet, David S. Miller,
	netdev
In-Reply-To: <1341656770.8543.3.camel@chief-river-32>

On Sat, Jul 07, 2012 at 06:26:10PM +0800, Lin Ming wrote:
> IPVS should not reset skb->nf_bridge in FORWARD hook
> by calling nf_reset for NAT replies. It triggers oops in
> br_nf_forward_finish.
> 
> [  579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> [  579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
> [  579.781792] PGD 218f9067 PUD 0 
> [  579.781865] Oops: 0000 [#1] SMP 
> [  579.781945] CPU 0 
> [  579.781983] Modules linked in:
> [  579.782047] 
> [  579.782080] 
> [  579.782114] Pid: 4644, comm: qemu Tainted: G        W    3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard  /30E8
> [  579.782300] RIP: 0010:[<ffffffff817b1ca5>]  [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
> [  579.782455] RSP: 0018:ffff88007b003a98  EFLAGS: 00010287
> [  579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
> [  579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
> [  579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
> [  579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
> [  579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
> [  579.783177] FS:  0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
> [  579.783306] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
> [  579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
> [  579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
> [  579.783919] Stack:
> [  579.783959]  ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
> [  579.784110]  ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
> [  579.784260]  ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
> [  579.784477] Call Trace:
> [  579.784523]  <IRQ> 
> [  579.784562] 
> [  579.784603]  [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
> [  579.784707]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
> [  579.784797]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
> [  579.784906]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
> [  579.784995]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
> [  579.785175]  [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
> [  579.785179]  [<ffffffff817ac417>] __br_forward+0x97/0xa2
> [  579.785179]  [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
> [  579.785179]  [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
> [  579.785179]  [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
> [  579.785179]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
> [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
> [  579.785179]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
> [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
> [  579.785179]  [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54
> [  579.785179]  [<ffffffff817ad62a>] br_handle_frame+0x213/0x229
> [  579.785179]  [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
> [  579.785179]  [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
> [  579.785179]  [<ffffffff816e69fc>] process_backlog+0x99/0x1e2
> [  579.785179]  [<ffffffff816e6800>] net_rx_action+0xdf/0x242
> [  579.785179]  [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
> [  579.785179]  [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
> [  579.785179]  [<ffffffff8188812c>] call_softirq+0x1c/0x30
> 
> The steps to reproduce are as follows:
> 
> 1. On Host1, set up bridge br0 (192.168.1.106)
> 2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
> 3. Start IPVS service on Host1
>    ipvsadm -A -t 192.168.1.106:80 -s rr
>    ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
> 4. Run apache benchmark on Host2(192.168.1.101)
>    ab -n 1000 http://192.168.1.106/
> 
> ip_vs_reply4
>   ip_vs_out
>     handle_response
>       ip_vs_notrack
>         nf_reset()
>         {
>           skb->nf_bridge = NULL;
>         }
> 
> Actually, IPVS wants in this case just to replace nfct
> with untracked version. So replace the nf_reset(skb) call
> in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.
> 
> Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
> Signed-off-by: Julian Anastasov <ja@ssi.bg>

Actually, I'll queue up this version for 3.5 rather than the previous one
as it has a better title.

As per my previous comment (repeated here for reference) it seems to me
that this problem has been present since 2.6.37 and thus is stable material.
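
For readers following the call chain quoted above, here is a minimal
userspace sketch of the difference between resetting everything and
dropping only the conntrack reference. The struct and helper names
(mock_skb, mock_nf_reset, mock_drop_nfct, mock_bridge_state_usable) are
hypothetical stand-ins and not the kernel's definitions; the real
sk_buff, nf_reset() and nf_conntrack_put() also handle reference counts,
which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two sk_buff fields discussed above. */
struct mock_skb {
	void *nfct;      /* conntrack reference */
	void *nf_bridge; /* bridge netfilter state */
};

/* Models the behaviour blamed for the oops: both fields are cleared. */
static void mock_nf_reset(struct mock_skb *skb)
{
	assert(skb != NULL);
	skb->nfct = NULL;
	skb->nf_bridge = NULL;
}

/* Models the proposed behaviour: only the conntrack reference is dropped. */
static void mock_drop_nfct(struct mock_skb *skb)
{
	assert(skb != NULL);
	skb->nfct = NULL;
}

/* Returns 1 if a later bridge hook would still find its state attached. */
static int mock_bridge_state_usable(const struct mock_skb *skb)
{
	return skb->nf_bridge != NULL;
}
```

With the first helper a later bridge hook finds no state attached, which
matches the NULL dereference in br_nf_forward_finish; with the second
helper the bridge state survives.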


* Re: [PATCH] net: cgroup: fix access the unallocated memory in netprio cgroup
From: Gao feng @ 2012-07-10  8:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: nhorman, davem, linux-kernel, netdev, lizefan, tj, Eric Dumazet
In-Reply-To: <1341893650.3265.3974.camel@edumazet-glaptop>


> Hi Gao
> 
> Is it still needed to call update_netdev_tables() from write_priomap() ?
> 

Yes, I think it's needed, because read_priomap will show all of the net devices.

But we may add a netdev after creating a netprio cgroup, so the newly added netdev's
priomap will not be allocated. If we don't call update_netdev_tables in write_priomap,
we may access this unallocated memory.


* [PATCH] tc: filter: validate filter priority in userspace.
From: Li Wei @ 2012-07-10  8:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev


We use the high 16 bits of tcm_info to pass the prio value to the
kernel, so its range is [0, 0xffff]. Without validation in tc, when
the user passes a larger (>65535) priority, the actual priority set
in the kernel would confuse the user.

So, add validation to ensure prio is within that range.
---
 tc/tc_filter.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tc/tc_filter.c b/tc/tc_filter.c
index 207302f..04c3b82 100644
--- a/tc/tc_filter.c
+++ b/tc/tc_filter.c
@@ -105,7 +105,7 @@ int tc_filter_modify(int cmd, unsigned flags, int argc, char **argv)
 			NEXT_ARG();
 			if (prio)
 				duparg("priority", *argv);
-			if (get_u32(&prio, *argv, 0))
+			if (get_u32(&prio, *argv, 0) || prio > 0xFFFF)
 				invarg(*argv, "invalid priority value");
 		} else if (matches(*argv, "protocol") == 0) {
 			__u16 id;
-- 
1.7.1
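
The truncation described in the commit message can be sketched in plain
C. This assumes tc packs the value roughly the way TC_H_MAKE(prio << 16,
protocol) does; the mask names and the helpers pack_info, unpack_prio and
prio_is_valid are made up for illustration and are not part of iproute2.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: high 16 bits carry prio, low 16 bits carry protocol. */
#define MAJ_MASK 0xFFFF0000U
#define MIN_MASK 0x0000FFFFU

/* Pack prio and protocol into one 32-bit word; high prio bits are lost. */
static uint32_t pack_info(uint32_t prio, uint32_t protocol)
{
	return ((prio << 16) & MAJ_MASK) | (protocol & MIN_MASK);
}

/* Recover the priority as the receiving side would see it. */
static uint32_t unpack_prio(uint32_t info)
{
	return info >> 16;
}

/* The range check the patch adds, as a helper: 1 when prio fits. */
static int prio_is_valid(uint32_t prio)
{
	assert(sizeof(prio) == 4);
	return prio <= 0xFFFF;
}
```

Under these assumptions a requested priority of 65537 silently becomes
1 after packing, which is the confusing result the range check rejects.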


* Re: net-next kernel NULL pointer dereference at fib_rules_tclass
From: David Miller @ 2012-07-10  9:00 UTC (permalink / raw)
  To: mlin; +Cc: ogerlitz, netdev, shlomop, amirv, erezsh
In-Reply-To: <CAF1ivSbw50US9dPxs63C8_hjdBq6K6_He7_Foi5bW1MvefunHw@mail.gmail.com>

From: Lin Ming <mlin@ss.pku.edu.cn>
Date: Tue, 10 Jul 2012 16:42:29 +0800

> On Tue, Jul 10, 2012 at 3:16 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>> Hi Dave,
>>
>> Using latest net-next (061a5c316b6526dbc729049a16243ec27937cc31) I
>> get the below crash during the boot cycle. The crash happens on a set of
>> nodes which use igb for their onboard 1g nic, as soon as the device goes
>> up. Another group, that uses a 2nd lab, where the nodes use bnx2 for 1g
>> NIC doesn't get this crash, but the kernel there is built by a different
>> .config .
> 
> Hi,
> 
> I got a similar panic, but not at boot time.
> I'll look for the cause.

Don't worry about it, I am sure that I added this bug and therefore
I will fix it.


* Re: [PATCH net-next 6/6] r8169: support RTL8168G
From: Francois Romieu @ 2012-07-10  9:00 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1341904369-5277-1-git-send-email-hayeswang@realtek.com>

(you should include a Signed-off-by)

Hayes Wang <hayeswang@realtek.com> :
> 1. Remove rtl_ocpdr_cond. No waiting is needed for mac_ocp_{write / read}.

Nit: it would not hurt to do a better job than me and save some commit noise
getting these things right before they pollute the history. :o)

> 2. Set ocp_base to OCP_STD_PHY_BASE after rtl8168g_1_hw_phy_config.

Can't it be stuffed into the firmware ?

The code does not explicitly switch from the PHY access context to
the extra OCP registers one and anything else in rtl8168g_1_hw_phy_config
seems to directly use the addresses it needs. So I'd expect the current
imbalance to come from the firmware, where it would make as much sense to
fix it
-> no imbalance after the firmware is applied
-> no useless instruction if the firmware is not used

-- 
Ueimor


* Re: [RFC PATCH net-next] ipvs: add missing lock in ip_vs_ftp_init_conn()
From: Simon Horman @ 2012-07-10  9:05 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Xiaotian Feng, netdev, lvs-devel, netfilter-devel, netfilter,
	coreteam, linux-kernel, Xiaotian Feng, Wensong Zhang,
	Pablo Neira Ayuso, Patrick McHardy, David S. Miller
In-Reply-To: <alpine.LFD.2.00.1207030952340.1749@ja.ssi.bg>

On Tue, Jul 03, 2012 at 10:12:41AM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Thu, 28 Jun 2012, Xiaotian Feng wrote:
> 
> > We met a kernel panic in 2.6.32.43 kernel:
> > 
> > [2680191.848044] IPVS: ip_vs_conn_hash(): request for already hashed, called from run_timer_softirq+0x175/0x1d0
> > <snip>
> > [2680311.849009] general protection fault: 0000 [#1] SMP
> > [2680311.853001] RIP: 0010:[<ffffffff815f155c>]  [<ffffffff815f155c>] ip_vs_conn_expire+0xdc/0x2f0
> > [2680311.853001] RSP: 0018:ffff880028303e70  EFLAGS: 00010202
> > [2680311.853001] RAX: dead000000200200 RBX: ffff8801aad00b80 RCX: 0000000000001d90
> > [2680311.853001] RDX: dead000000100100 RSI: 000000004fd59800 RDI: ffff8801aad00c08
> > <snip>
> > [2680311.853001] Call Trace:
> > [2680311.853001]  <IRQ>
> > [2680311.853001]  [<ffffffff815f1480>] ? ip_vs_conn_expire+0x0/0x2f0
> > [2680311.853001]  [<ffffffff8104e2a5>] run_timer_softirq+0x175/0x1d0
> > [2680311.853001]  [<ffffffff81021a48>] ? lapic_next_event+0x18/0x20
> > [2680311.853001]  [<ffffffff81049a13>] __do_softirq+0xb3/0x150
> > [2680311.853001]  [<ffffffff8100cc5c>] call_softirq+0x1c/0x30
> > [2680311.853001]  [<ffffffff8100ea9a>] do_softirq+0x4a/0x80
> > [2680311.853001]  [<ffffffff81049957>] irq_exit+0x77/0x80
> > [2680311.853001]  [<ffffffff81021f2c>] smp_apic_timer_interrupt+0x6c/0xa0
> > [2680311.853001]  [<ffffffff8100c633>] apic_timer_interrupt+0x13/0x20
> > [2680311.853001]  <EOI>
> > [2680311.853001]  [<ffffffff81013b52>] ? mwait_idle+0x52/0x70
> > [2680311.853001]  [<ffffffff8100a7b0>] ? enter_idle+0x20/0x30
> > [2680311.853001]  [<ffffffff8100ac62>] ? cpu_idle+0x52/0x80
> > [2680311.853001]  [<ffffffff816d504d>] ? start_secondary+0x19d/0x280
> > 
> > rax and rdx is LIST_POISON1 and LIST_POISON2, so kernel is list_del() on an already deleted
> > connection and result the general protect fault.
> > 
> > The "request for already hashed" warning, told us someone might change the connection flags
> > incorrectly, like described in commit aea9d711, it changes the connection flags, but doesn't
> > put the connection back to the list. So ip_vs_conn_hash() throw a warning and return.
> > Later, when ip_vs_conn_expire fire again, ip_vs_conn_unhash() will find the HASHED connection
> > and list_del() it, then kernel panic happened.
> > 
> > After code review, the only chance that kernel change connection flag without protection is
> > in ip_vs_ftp_init_conn().
> > 
> > Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
> > Cc: Wensong Zhang <wensong@linux-vs.org>
> > Cc: Simon Horman <horms@verge.net.au>
> > Cc: Julian Anastasov <ja@ssi.bg>
> > Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> > Cc: Patrick McHardy <kaber@trash.net>
> > Cc: "David S. Miller" <davem@davemloft.net> 
> 
> 	For the fix below:
> 
> Acked-by: Julian Anastasov <ja@ssi.bg>
> 
> 	Simon, the change looks ok. ip_vs_ftp_init_conn is called
> from context where cp->lock is not locked (no double lock), so it
> should be safe for the backup.
> 
> 	Only that the comment is not specifying that we
> fix a problem in the backup server.

Thanks.

I have pushed this to my ipvs branch and will see about getting it included in 3.5.

It appears that this problem has been present since (at least) 2.6.37 and
my feeling is that it is -stable material.
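
As background for the LIST_POISON values in the quoted report, here is a
small userspace sketch of list poisoning. It is not the kernel's list.h:
the mock_ helpers and the poison constants are illustrative assumptions
(the kernel's values depend on the architecture, and the dead... prefix
seen in RAX/RDX comes from an added offset).

```c
#include <assert.h>
#include <stdint.h>

struct list_node {
	struct list_node *next, *prev;
};

/* Illustrative poison values; not the exact kernel constants. */
#define POISON_NEXT ((struct list_node *)(uintptr_t)0x00100100)
#define POISON_PREV ((struct list_node *)(uintptr_t)0x00200200)

static void mock_list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

/* Insert n right after head. */
static void mock_list_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and poison its pointers so a second unlink faults loudly. */
static void mock_list_del(struct list_node *n)
{
	assert(n->next != POISON_NEXT && n->prev != POISON_PREV);
	n->next->prev = n->prev;
	n->prev->next = n->next;
	n->next = POISON_NEXT;
	n->prev = POISON_PREV;
}

static int mock_node_is_poisoned(const struct list_node *n)
{
	return n->next == POISON_NEXT || n->prev == POISON_PREV;
}
```

In this sketch the assert stands in for the fault: deleting the same
node twice would otherwise write through the poison pointers, which is
the kind of access that produced the general protection fault above.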



* Re: [PATCH] net: cgroup: fix access the unallocated memory in netprio cgroup
From: Eric Dumazet @ 2012-07-10  9:15 UTC (permalink / raw)
  To: Gao feng; +Cc: nhorman, davem, linux-kernel, netdev, lizefan, tj, Eric Dumazet
In-Reply-To: <4FFBED84.1030905@cn.fujitsu.com>

On Tue, 2012-07-10 at 16:53 +0800, Gao feng wrote:
> > Hi Gao
> > 
> > Is it still needed to call update_netdev_tables() from write_priomap() ?
> > 
> 
> Yes, I think it's needed,because read_priomap will show all of the net devices,
> 
> But we may add the netdev after create a netprio cgroup, so the new added netdev's
> priomap will not be allocated. if we don't call update_netdev_tables in write_priomap,
> we may access this unallocated memory.
> 

I realize my question was not clear.

If we write in write_priomap() a field of a single netdevice,
why should we allocate memory for all netdevices on the machine ?

So the question was: do we really need to call
update_netdev_tables(alldevs), instead of extend_netdev_table(dev)?
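
To make the question concrete, here is a rough userspace sketch of the
"extend only the device being written" idea. Everything in it is
hypothetical: struct mock_dev, mock_extend_table and mock_write_prio only
model the shape of the approach and do not reflect the real netprio
cgroup data structures or their locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-device priority table, indexed by cgroup id. */
struct mock_dev {
	unsigned int *prio;
	size_t len;
};

/* Grow one device's table to need_len entries; 0 on success, -1 on failure. */
static int mock_extend_table(struct mock_dev *dev, size_t need_len)
{
	unsigned int *p;

	if (need_len <= dev->len)
		return 0;
	p = realloc(dev->prio, need_len * sizeof(*p));
	if (!p)
		return -1;
	/* Zero only the newly added tail. */
	memset(p + dev->len, 0, (need_len - dev->len) * sizeof(*p));
	dev->prio = p;
	dev->len = need_len;
	return 0;
}

/* Write path: extend only the device being written, then store. */
static int mock_write_prio(struct mock_dev *dev, size_t idx, unsigned int prio)
{
	assert(dev != NULL);
	if (mock_extend_table(dev, idx + 1))
		return -1;
	dev->prio[idx] = prio;
	return 0;
}
```

A write to one device leaves every other device's table untouched, so
no memory is allocated for devices that are never written.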


* Re: [GIT PULL net] IPVS
From: Simon Horman @ 2012-07-10  9:20 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Hans Schillstrom, Jesper Dangaard Brouer
In-Reply-To: <20120430092722.GA6866@1984>

On Mon, Apr 30, 2012 at 11:27:22AM +0200, Pablo Neira Ayuso wrote:
> On Fri, Apr 27, 2012 at 09:53:54AM +0900, Simon Horman wrote:
> > Hi Pablo,
> > 
> > please consider the following 5 changes for 3.4, they are all bug fixes.
> > I would also like these changes considered for stable.
> 
> Please, ping me again once these have hit Linus tree to ask for
> -stable submission.

Sorry for letting this slip through the cracks.

Please consider the following commits which are in Linus's tree for stable.
Or I can submit them directly if that is easier.

There are 7 patches listed below. The first 5 were the patches in this
pull request. The last two were patches in a git pull request
a few days earlier.


commit 8537de8a7ab6681cc72fb0411ab1ba7fdba62dd0
Author: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date:   Thu Apr 26 07:47:44 2012 +0200

    ipvs: kernel oops - do_ip_vs_get_ctl
    
    Change order of init so netns init is ready
    when register ioctl and netlink.
    
    Ver2
    	Whitespace fixes and __init added.
    
    Reported-by: "Ryan O'Hara" <rohara@redhat.com>
    Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>

commit 582b8e3eadaec77788c1aa188081a8d5059c42a6
Author: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date:   Thu Apr 26 09:45:35 2012 +0200

    ipvs: take care of return value from protocol init_netns
    
    ip_vs_create_timeout_table() can return NULL
    All functions protocol init_netns is affected of this patch.
    
    Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>

commit 4b984cd50bc1b6d492175cd77bfabb78e76ffa67
Author: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date:   Thu Apr 26 09:45:34 2012 +0200

    ipvs: null check of net->ipvs in lblc(r) shedulers
    
    Avoid crash when registering shedulers after
    the IPVS core initialization for netns fails. Do this by
    checking for present core (net->ipvs).
    
    Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>

commit 39f618b4fd95ae243d940ec64c961009c74e3333
Author: Julian Anastasov <ja@ssi.bg>
Date:   Wed Apr 25 00:29:58 2012 +0300

    ipvs: reset ipvs pointer in netns
    
    	Make sure net->ipvs is reset on netns cleanup or failed
    initialization. It is needed for IPVS applications to know that
    IPVS core is not loaded in netns.
    
    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>

commit 8d08d71ce59438a6ef06be5db07966e0c144b74e
Author: Julian Anastasov <ja@ssi.bg>
Date:   Wed Apr 25 00:29:59 2012 +0300

    ipvs: add check in ftp for initialized core
    
    	Avoid crash when registering ip_vs_ftp after
    the IPVS core initialization for netns fails. Do this by
    checking for present core (net->ipvs).
    
    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>

commit 8f9b9a2fad47af27e14b037395e03cd8278d96d7
Author: Julian Anastasov <ja@ssi.bg>
Date:   Fri Apr 13 18:08:43 2012 +0300

    ipvs: fix crash in ip_vs_control_net_cleanup on unload
    
    	commit 14e405461e664b777e2a5636e10b2ebf36a686ec (2.6.39)
    ("Add __ip_vs_control_{init,cleanup}_sysctl()")
    introduced regression due to wrong __net_init for
    __ip_vs_control_cleanup_sysctl. This leads to crash when
    the ip_vs module is unloaded.
    
    	Fix it by changing __net_init to __net_exit for
    the function that is already renamed to ip_vs_control_net_cleanup_sysctl.
    
    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit 7118c07a844d367560ee91adb2071bde2fabcdbf
Author: Sasha Levin <levinsasha928@gmail.com>
Date:   Sat Apr 14 12:37:46 2012 -0400

    ipvs: Verify that IP_VS protocol has been registered
    
    The registration of a protocol might fail, there were no checks
    and all registrations were assumed to be correct. This lead to
    NULL ptr dereferences when apps tried registering.
    
    For example:
    
    [ 1293.226051] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    [ 1293.227038] IP: [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
    [ 1293.227038] PGD 391de067 PUD 6c20b067 PMD 0
    [ 1293.227038] Oops: 0000 [#1] PREEMPT SMP
    [ 1293.227038] CPU 1
    [ 1293.227038] Pid: 19609, comm: trinity Tainted: G        W    3.4.0-rc1-next-20120405-sasha-dirty #57
    [ 1293.227038] RIP: 0010:[<ffffffff822aacb0>]  [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
    [ 1293.227038] RSP: 0018:ffff880038c1dd18  EFLAGS: 00010286
    [ 1293.227038] RAX: ffffffffffffffc0 RBX: 0000000000001500 RCX: 0000000000010000
    [ 1293.227038] RDX: 0000000000000000 RSI: ffff88003a2d5888 RDI: 0000000000000282
    [ 1293.227038] RBP: ffff880038c1dd48 R08: 0000000000000000 R09: 0000000000000000
    [ 1293.227038] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a2d5668
    [ 1293.227038] R13: ffff88003a2d5988 R14: ffff8800696a8ff8 R15: 0000000000000000
    [ 1293.227038] FS:  00007f01930d9700(0000) GS:ffff88007ce00000(0000) knlGS:0000000000000000
    [ 1293.227038] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 1293.227038] CR2: 0000000000000018 CR3: 0000000065dfc000 CR4: 00000000000406e0
    [ 1293.227038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 1293.227038] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 1293.227038] Process trinity (pid: 19609, threadinfo ffff880038c1c000, task ffff88002dc73000)
    [ 1293.227038] Stack:
    [ 1293.227038]  ffff880038c1dd48 00000000fffffff4 ffff8800696aada0 ffff8800694f5580
    [ 1293.227038]  ffffffff8369f1e0 0000000000001500 ffff880038c1dd98 ffffffff822a716b
    [ 1293.227038]  0000000000000000 ffff8800696a8ff8 0000000000000015 ffff8800694f5580
    [ 1293.227038] Call Trace:
    [ 1293.227038]  [<ffffffff822a716b>] ip_vs_app_inc_new+0xdb/0x180
    [ 1293.227038]  [<ffffffff822a7258>] register_ip_vs_app_inc+0x48/0x70
    [ 1293.227038]  [<ffffffff822b2fea>] __ip_vs_ftp_init+0xba/0x140
    [ 1293.227038]  [<ffffffff821c9060>] ops_init+0x80/0x90
    [ 1293.227038]  [<ffffffff821c90cb>] setup_net+0x5b/0xe0
    [ 1293.227038]  [<ffffffff821c9416>] copy_net_ns+0x76/0x100
    [ 1293.227038]  [<ffffffff810dc92b>] create_new_namespaces+0xfb/0x190
    [ 1293.227038]  [<ffffffff810dca21>] unshare_nsproxy_namespaces+0x61/0x80
    [ 1293.227038]  [<ffffffff810afd1f>] sys_unshare+0xff/0x290
    [ 1293.227038]  [<ffffffff8187622e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 1293.227038]  [<ffffffff82665539>] system_call_fastpath+0x16/0x1b
    [ 1293.227038] Code: 89 c7 e8 34 91 3b 00 89 de 66 c1 ee 04 31 de 83 e6 0f 48 83 c6 22 48 c1 e6 04 4a 8b 14 26 49 8d 34 34 48 8d 42 c0 48 39 d6 74 13 <66> 39 58 58 74 22 48 8b 48 40 48 8d 41 c0 48 39 ce 75 ed 49 8d
    [ 1293.227038] RIP  [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
    [ 1293.227038]  RSP <ffff880038c1dd18>
    [ 1293.227038] CR2: 0000000000000018
    [ 1293.379284] ---[ end trace 364ab40c7011a009 ]---
    [ 1293.381182] Kernel panic - not syncing: Fatal exception in interrupt
    
    Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>



* Re: [RFC PATCH] ppp: add support for L2 multihop / tunnel switching
From: James Chapman @ 2012-07-10  9:32 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: netdev, linux-ppp
In-Reply-To: <20120709141511.GL19462@kvack.org>

On 09/07/12 15:15, Benjamin LaHaise wrote:
> On Mon, Jul 09, 2012 at 12:52:15PM +0100, James Chapman wrote:
>> As a mechanism for switching PPP interfaces together, this patch is
>> good. For L2TP though, I prefer an approach that would be applicable for
>> all L2TP traffic types, not just PPP.
> 
> *nod*  This seems like a reasonable consideration.
> 
>> L2TP supports many different pseudowire types, and this patch will only
>> be useful for tunnel switching between PPP pseudowires. Whereas if we
>> implement it within the L2TP core, rather than in the PPP code, we would
>> get switching between all pseudowire types. If we add this patch and
>> then subsequently add switching between other pseudowires in the L2TP
>> core (which we're likely to want to do), then we're left with two
>> different interfaces for doing L2TP tunnel switching in the kernel.
> 
> At least for ethernet pseudowires, it can already be implemented by using 
> an ethernet bridge device.  Besides PPP and ethernet pseudowires, what 
> other types are supported at present by the L2TP core?

Only those two at the moment, but others (ATM etc) can be added if and
when there is demand. To do this at an L2TP level avoids using two
linked PPP interfaces in the case of PPP and two bridged l2tpeth
interfaces in the case of ethernet. I envisage a new L2TP netlink API to
join the datapaths of two L2TP sessions together with no devices being
needed. It would work for all L2TP session types, now and in the future.

>>> The reasoning behind using dev_queue_xmit() rather than outputting directly 
>>> to another PPP channel is to enable the use of the traffic shaping and 
>>> queuing features of the kernel on multihop sessions.
>>
>> I'm not sure about using a pseudo packet type to do this. For L2TP, it
>> would seem better to add netfilter/tc support for L2TP data packets,
>> which would let people add rules for, say, traffic in L2TP tunnel x /
>> session y. This would avoid the need for ETH_P_PPP and you could then
>> output directly to the ppp channel.
> 
> The downside of an L2TP specific method is that all the mechanisms need to 
> be duplicated, resulting in a much higher maintenance overhead for the 
> code and functionality, not to mention all the tool changes to go along 
> with that.

Could the same argument be applied to other protocols which have
netfilter/tc support already? Adding support for L2TP would seem
consistent with other protocol implementations. It would also mean that
the same rules would work for all L2TP session types.

> As for the pseudo packet type, it may indeed be better to avoid the pseudo 
> packet type for known PPP packet types.  One of the benefits of going the 
> network device route is that it makes it much easier to implement additional 
> functionality like lawful intercept, which would be yet more functionality 
> that would have to be implemented if the mechanism is L2TP specific.  The 
> pseudo packet type would still be needed for forwarding PPP frames that the 
> kernel doesn't know about (all the *CP packet types and MLPPP come to mind)
> 
> I had thought about doing the packet forwarding in a manner similar to the 
> bridging code -- that is, as a pseudowire bridge in the network core that 
> only works between 2 devices.  That approach might work better for L2TP, as 
> it would be able to pass packets of any type between the 2 endpoints.

For L2TP, I think it should be possible to avoid having devices for
switched L2TP sessions.

> 
> 		-ben
> 
-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development




* Re: [PATCH] net: cgroup: fix access the unallocated memory in netprio cgroup
From: Gao feng @ 2012-07-10  9:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: nhorman, davem, linux-kernel, netdev, lizefan, tj, Eric Dumazet
In-Reply-To: <1341911707.3265.4603.camel@edumazet-glaptop>

On 2012-07-10 17:15, Eric Dumazet wrote:
> On Tue, 2012-07-10 at 16:53 +0800, Gao feng wrote:
>>> Hi Gao
>>>
>>> Is it still needed to call update_netdev_tables() from write_priomap() ?
>>>
>>
>> Yes, I think it's needed,because read_priomap will show all of the net devices,
>>
>> But we may add the netdev after create a netprio cgroup, so the new added netdev's
>> priomap will not be allocated. if we don't call update_netdev_tables in write_priomap,
>> we may access this unallocated memory.
>>
> 
> I realize my question was not clear.
> 
> If we write in write_priomap() a field of a single netdevice,
> why should we allocate memory for all netdevices on the machine ?
> 
> So the question was : Do we really need to call
> update_netdev_tables(alldevs), instead of extend_netdev_table(dev)
> 
> 

I get it.

You are right. Indeed, we only need to call extend_netdev_table
for the netdev which we want to change.

I also read commit f5c38208d32412d72b97a4f0d44af0eb39feb20b
and found out why we need delayed allocation.

I will send a v2 patch.

Thanks!


* [PATCH iproute2] tc: u32: Fix icmp_code off.
From: Hiroaki SHIMODA @ 2012-07-10  9:53 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

The off of icmp_code is not 20 but 21. Also offmask should be 0 unless
nexthdr+ is specified.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
---
 tc/f_u32.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tc/f_u32.c b/tc/f_u32.c
index 975c0b5..7a04634 100644
--- a/tc/f_u32.c
+++ b/tc/f_u32.c
@@ -531,7 +531,7 @@ static int parse_ip(int *argc_p, char ***argv_p, struct tc_u32_sel *sel)
 		res = parse_u8(&argc, &argv, sel, 20, 0);
 	} else if (strcmp(*argv, "icmp_code") == 0) {
 		NEXT_ARG();
-		res = parse_u8(&argc, &argv, sel, 20, 1);
+		res = parse_u8(&argc, &argv, sel, 21, 0);
 	} else
 		return -1;
 
-- 
1.7.8.6
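
The offset arithmetic behind this fix can be checked with a short
sketch. It assumes an IPv4 header without options (20 bytes), which is
what the fixed offsets in parse_ip() imply; struct icmp_start and the two
helper functions are illustrative names, not part of tc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First bytes of an ICMP header: type, code, checksum. */
struct icmp_start {
	uint8_t type;
	uint8_t code;
	uint16_t checksum;
};

/* IPv4 header length when no options are present. */
#define IPV4_MIN_HDR_LEN 20

/* Byte offset of the ICMP type field from the start of the IP header. */
static size_t icmp_type_off(void)
{
	return IPV4_MIN_HDR_LEN + offsetof(struct icmp_start, type);
}

/* Byte offset of the ICMP code field from the start of the IP header. */
static size_t icmp_code_off(void)
{
	assert(sizeof(struct icmp_start) == 4);
	return IPV4_MIN_HDR_LEN + offsetof(struct icmp_start, code);
}
```

The code field sits one byte after the type field, so its offset is 21
rather than 20.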


* [PATCH] bridge: fix endian
From: roy.qing.li @ 2012-07-10  9:56 UTC (permalink / raw)
  To: netdev; +Cc: yoshfuji

From: Li RongQing <roy.qing.li@gmail.com>

mld->mld_maxdelay is in network byte order, so we should use ntohs, not htons.

CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 net/bridge/br_multicast.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index b665812..2d9a066 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1160,7 +1160,7 @@ static int br_ip6_multicast_query(struct net_bridge *br,
 			goto out;
 		}
 		mld = (struct mld_msg *) icmp6_hdr(skb);
-		max_delay = msecs_to_jiffies(htons(mld->mld_maxdelay));
+		max_delay = msecs_to_jiffies(ntohs(mld->mld_maxdelay));
 		if (max_delay)
 			group = &mld->mld_mca;
 	} else if (skb->len >= sizeof(*mld2q)) {
-- 
1.7.1
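
A userspace sketch of the two conversions, using the POSIX ntohs() and
htons() from <arpa/inet.h>. The helper names field_to_host and
field_misconverted are made up for illustration. As far as I can tell
the two calls perform the same 16-bit operation on any given machine
(both swap or both do nothing), so the change looks like a matter of
stating the correct direction rather than changing the computed value.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Correct direction: a wire (network order) field read into host order. */
static uint16_t field_to_host(uint16_t wire)
{
	return ntohs(wire);
}

/* The misuse: host-to-network conversion applied to a wire value. */
static uint16_t field_misconverted(uint16_t wire)
{
	return htons(wire);
}

/* Returns 1 when both helpers agree for the given host-order value. */
static int conversions_agree(uint16_t host_value)
{
	uint16_t wire = htons(host_value);

	assert(ntohs(wire) == host_value);
	return field_to_host(wire) == field_misconverted(wire);
}
```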


* getting warn once around skb_try_coalesce
From: Or Gerlitz @ 2012-07-10  9:54 UTC (permalink / raw)
  To: David Miller, Eric Dumazet
  Cc: netdev@vger.kernel.org, Shlomo Pongratz, Erez Shitrit

Hi Dave, Eric,

Another trace that I see here with net-next is this one-time warning. I
always get it on the passive side of TCP, and it seems related to GRO. It
happens only with IPoIB, not with mlx4_en or igb (when igb gets to work on
net-next...).

The latest commit in this area is bad43ca8325f493dcaa0896c2f036276af059c7e
"net: introduce skb_try_coalesce()" from Eric.

Or.

-----------[ cut here ]------------
WARNING: at net/core/skbuff.c:3413 skb_try_coalesce+0x1f8/0x31d()
Hardware name: X7DWU
Modules linked in: drbd lru_cache cn autofs4 sunrpc 8021q ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath uinput mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb sg joydev kvm microcode pcspkr rng_core ioatdma dm_mod dca floppy shpchp button sr_mod ext3 jbd usb_storage sd_mod ata_piix libata scsi_mod ehci_hcd uhci_hcd [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper/1 Tainted: G          I 3.5.0-rc1-00107-gf5bae8a-dirty #57
Call Trace:
  <IRQ>  [<ffffffff8102ab65>] warn_slowpath_common+0x80/0x98
  [<ffffffff8102ab92>] warn_slowpath_null+0x15/0x17
  [<ffffffff812c5a73>] skb_try_coalesce+0x1f8/0x31d
  [<ffffffff8130a6ad>] tcp_try_coalesce+0x4c/0xa0
  [<ffffffff8130a759>] tcp_queue_rcv+0x58/0xe1
  [<ffffffff8130d4ca>] tcp_data_queue+0x1bd/0xa8d
  [<ffffffff8130ecba>] tcp_rcv_established+0x646/0x6fc
  [<ffffffff81314fd7>] ? tcp_v4_rcv+0x427/0xa1b
  [<ffffffff81314892>] tcp_v4_do_rcv+0xd8/0x3f6
  [<ffffffff8136aefb>] ? _raw_spin_lock_nested+0x41/0x48
  [<ffffffff813151a5>] tcp_v4_rcv+0x5f5/0xa1b
  [<ffffffff812f8626>] ip_local_deliver_finish+0x1a1/0x2b2
  [<ffffffff812f84ba>] ? ip_local_deliver_finish+0x35/0x2b2
  [<ffffffff812f87a9>] ip_local_deliver+0x72/0x79
  [<ffffffff812f820d>] ip_rcv_finish+0x399/0x3b1
  [<ffffffff812f845f>] ip_rcv+0x23a/0x260
  [<ffffffff812cd086>] __netif_receive_skb+0x3b2/0x41b
  [<ffffffff812cce0e>] ? __netif_receive_skb+0x13a/0x41b
  [<ffffffff812ce93c>] netif_receive_skb+0xee/0xf7
  [<ffffffff81322512>] ? inet_compat_ioctl+0x1e/0x1e
  [<ffffffff812ceb90>] napi_gro_complete+0x133/0x140
  [<ffffffff812ceaab>] ? napi_gro_complete+0x4e/0x140
  [<ffffffff812ced3d>] dev_gro_receive+0x1a0/0x2fb
  [<ffffffff812cec19>] ? dev_gro_receive+0x7c/0x2fb
  [<ffffffff812cf1c5>] napi_gro_receive+0x105/0x11e
  [<ffffffffa02ed6d4>] ipoib_ib_handle_rx_wc+0x243/0x277 [ib_ipoib]
  [<ffffffffa02ee84e>] ipoib_poll+0xa9/0x12d [ib_ipoib]
  [<ffffffff812cf355>] net_rx_action+0xc1/0x1ee
  [<ffffffff81031e4a>] __do_softirq+0xff/0x1de
  [<ffffffff813735cc>] call_softirq+0x1c/0x30
  [<ffffffff81003174>] do_softirq+0x38/0x80
  [<ffffffff81031b23>] irq_exit+0x4e/0x83
  [<ffffffff810029dd>] do_IRQ+0x98/0xaf
  [<ffffffff8136b92c>] common_interrupt+0x6c/0x6c
  <EOI>  [<ffffffff8100850c>] ? mwait_idle+0x13c/0x208
  [<ffffffff81008503>] ? mwait_idle+0x133/0x208
  [<ffffffff810089f1>] cpu_idle+0x6e/0xab
  [<ffffffff81363763>] start_secondary+0x1b9/0x1bd
---[ end trace fdf1b0e917b37732 ]---

^ permalink raw reply

